secushare Scalability

If you prefer to watch a video instead of reading an article, here it is: Scalable and Privacy-Respectful Distributed Systems - Our Chance to Avoid Cloud Computing?

Scalability by Multicast

Once we have a first iteration of this technology running, we can move on to add the extras that make this fully scalable. We already laid out the architecture pretty well, but it still makes sense to use multicast distribution trees (not to be confused with IP Multicast or LAN Multicast!) to make sure distribution of popular data will not get stuck in bottlenecks.

Our PSYC project has gained some experience with application-level multicast, so the easiest path is to deploy a multicast overlay on top of the routing technology as we have it, by running PSYC over it. That's what our new pubsub and multicast APIs for GNUnet do.

Next we can dig deeper and optimize the routing technology to support such distribution trees as efficiently and as safely as possible: We don't want the structure of our distribution trees to be visible to an observer. At the same time, user interface designers can start making our dreams come true visually.

So this is the step that is likely to yield humanity-wide scalability. Unfortunately we cannot provide mathematical proof; so far we are just talking from experience. And it isn't really rocket science: after all, BitTorrent and cloud technology scale in similar ways.
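To make the bottleneck argument concrete, here is a minimal sketch of an application-level distribution tree. It is not the PSYC or GNUnet multicast implementation: the names `build_tree`, `depth` and `max_sends`, the fixed fan-out and the breadth-first placement are all assumptions made for illustration. The sketch also ignores churn and, importantly, the requirement that the tree structure stay hidden from observers.

```python
from collections import deque

def build_tree(subscribers, fanout):
    """Place subscribers breadth-first so that no node relays to more
    than `fanout` children. Returns {node: [children]}; the publisher
    is the hypothetical root node "source"."""
    children = {"source": []}
    open_slots = deque(["source"])  # nodes that can still take children
    for sub in subscribers:
        parent = open_slots[0]
        children[parent].append(sub)
        children[sub] = []
        open_slots.append(sub)
        if len(children[parent]) == fanout:
            open_slots.popleft()  # this parent is full
    return children

def depth(children, node="source"):
    """Number of relay hops from `node` to its farthest descendant."""
    kids = children[node]
    return 0 if not kids else 1 + max(depth(children, k) for k in kids)

def max_sends(children):
    """Largest number of copies any single node has to send."""
    return max(len(kids) for kids in children.values())

if __name__ == "__main__":
    subs = [f"peer{i}" for i in range(1000)]
    unicast = build_tree(subs, fanout=len(subs))  # source sends to everyone
    tree = build_tree(subs, fanout=10)            # peers relay for each other
    print(max_sends(unicast), depth(unicast))     # 1000 1
    print(max_sends(tree), depth(tree))           # 10 3
```

With direct delivery the publisher sends one copy per subscriber. With a fan-out of 10, no node sends more than 10 copies and 1000 subscribers are reached within 3 hops, so the per-node load stays constant while the hop count grows only logarithmically with the audience.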

Protocol efficiency

Of course we shouldn't be wasting CPU power parsing notoriously inefficient formats like XML or JSON. Parsing is not a bottleneck, but since we also want to be energy-efficient on small devices, let's pick a protocol that performs better under such conditions.
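As a rough illustration of the difference, the following sketch encodes the same event once as JSON and once in a fixed binary layout. The event fields and the header layout are invented for this example; this is neither the PSYC syntax nor any format used by secushare. It only shows that length-prefixed fields can be read at known offsets, without scanning and unescaping text.

```python
import json
import struct

# Hypothetical event, for illustration only.
event = {"sender": 4711, "channel": 42,
         "timestamp": 1500000000, "body": "hello"}

# Assumed layout (network byte order): u32 sender, u32 channel,
# u64 timestamp, u16 body length, followed by the UTF-8 body.
HEADER = struct.Struct("!IIQH")

def encode_json(ev):
    return json.dumps(ev).encode("utf-8")

def encode_binary(ev):
    body = ev["body"].encode("utf-8")
    header = HEADER.pack(ev["sender"], ev["channel"],
                         ev["timestamp"], len(body))
    return header + body

def decode_binary(data):
    # Fixed offsets: no tokenizing, no string-to-number conversion.
    sender, channel, timestamp, n = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + n].decode("utf-8")
    return {"sender": sender, "channel": channel,
            "timestamp": timestamp, "body": body}

if __name__ == "__main__":
    as_json = encode_json(event)
    as_binary = encode_binary(event)
    print(len(as_json), len(as_binary))
    assert decode_binary(as_binary) == event
```

The binary form is shorter and its decoder does a constant amount of work for the header. Whether that matters in practice depends on the device and the message rate, which this sketch does not measure.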

Ignorance in the Standards Community

We have been warning about the importance of addressing scalability upfront in both the federated and the distributed networking worlds, and yet standards are being written that simply do not take the problem into account. Neither of the two new 'social web' standards provides means for scalable delivery of events:

  • Audience targeting as specified in ActivityStreams could be helpful for building distribution tree logic on top, but neither it nor the ActivityPub specification mentions any need or use for distribution trees, let alone specifies how to do them. Instead they have plenty of exceptions from the 'subscribe/follower' paradigm that would be natural for multicast: things like 'to', 'cc' and 'bcc' make a distribution protocol unnecessarily complex, reduce efficiency and introduce vulnerabilities (most likely allowing for spam). So the standards are already bloated before they even fulfil their primary purpose.
  • The competing WebMention standard only has a notion of a 'target', which could be generalized to mean a multicast distribution group, but there's no mention of anything like that.

The outcome is that these new standards only work for small communities or for cloud-based systems. Scaling up to serve the whole human race without centralization becomes nearly impossible. There is plenty of scientific work documenting the importance of serious distribution strategies, yet the protocol design community continues to have a blind spot in that regard, seven years after we presented "Scalability & Paranoia in a Decentralized Social Network" at a Federated Social Web conference.