The Power of Centralization

"Blockchain" here seems to mean "decentralized personal data". We looked at about 80 such projects and concluded that: 1. It's far harder to pull off than it sounds 2. Even if you can get adoption, it's unlikely that decentralization will improve privacy.

Mr Narayanan, author of 2009's pivotal "De-anonymizing Social Networks", points us to his 2012 research collaboration focusing on the strengths of centralization and how waves of decentralized tech hypes have failed to address them: "A Critical Look at Decentralized Personal Data Architectures" by Arvind Narayanan, Solon Barocas, Vincent Toubiana, Helen Nissenbaum and Dan Boneh. We first cite his recent comments on the topic, then discuss the paper itself.

Introduction: History of Decentralization Attempts

We're now in wave 3 of attempts to decentralize personal data. Wave 1 was in the late '90s; the companies called themselves "infomediaries". […] A couple of years later, they were all dead and the whole model forgotten. Wave 2 of decentralization attempts was in 2009–10, when concerns about Facebook privacy reached a fever pitch (perhaps even more than today). Literally dozens of decentralized social networks such as Diaspora were launched. At that time, we were surprised by the historical ignorance of the tech community, and we wanted to see what we could learn from the failures so that we don't keep repeating our mistakes and reinventing the wheel. Welcome to wave 3 of personal data decentralization attempts. There's no evidence that we've learnt from past mistakes. Worse, it looks like all the VC and ICO money behind "blockchain" is distorting sound thinking.

The attention economy: A mechanism that replaces rationality with crowd obedience. Especially when money is in the game.

Developers, regulators, and consumer advocates have looked to alternative decentralized architectures as the natural response to threats posed by these centralized services. The result has been a great variety of solutions that include personal data stores (PDS), infomediaries, Vendor Relationship Management (VRM) systems, and federated and distributed social networks. And yet, for all these efforts, decentralized personal data architectures have seen little adoption. This position paper attempts to account for these failures, challenging the accepted wisdom in the web community on the feasibility and desirability of these approaches.

Oh yes, please, we have been haunted by federation and blockchain — there's always a new fad to distract people from technology that could actually work.

The search for alternatives to centralized aggregation of personal data began in the late 1990s which saw a wave of so-called ‘negotiated privacy techniques’ including commercial ‘infomediaries’. These entities would store consumers’ data and help facilitate the drafting of contracts that set the terms of the exchange and use of data.

At least one fad we missed.

Decentralized social networking has been a largely parallel, sometimes overlapping line of development with similar motivations. We subdivide such social networks into federated (ecosystem of interoperable implementations in the client-server model) and distributed (peer-to-peer). The term distributed social networking is frequently but incorrectly used to describe all decentralized social networks.

Yes, terrible! So many people don't understand the difference. Even Wikipedia lumps it all together and doesn't make any distinction. It even goes as far as to define distributed to mean the same as federated, which is utter nonsense. So much for the impartiality and accuracy of Wikipedia, which have long gone out the window.

While some early thinking in the semantic web community could be classified in this category,¹ for the most part decentralized social networking appears not to have anticipated the success of mainstream commercial, centralized social networks, but rather developed as a response to it.

¹) The Internet Archive lists a version of the Friend of a Friend (FOAF) project from August 2003, and other efforts may be older.

We actually participated in that discourse back in 2003, trying to convince the semantic web folks that it would at least take federated chat servers to do an "open" Friendster clone. We needed six more years to realize ourselves that federation was not going to fly. But let's get to the meat of the study:

A Unified View of the Data

An architecture without a single point of data aggregation, management and control has several technical disadvantages. First is functionality: there are several types of computations that are hard or impossible without a unified view of the data. Detection of fraud and spam, search, collaborative filtering, identification of trending topics and other types of analytics are all examples.

We think that a unified view of all private and intimate data isn't just a breach of civil rights, it undermines the stability of a democratic open society. Therefore we expect humanity to arrive at the comprehension that this kind of data gathering must be illegal. GDPR is a careful first step in that direction, but we actually need this to be taken seriously on a constitutional level.

What we can still have is a unified view of public information, allowing all sorts of third parties to run algorithms on the open data of a Twitter clone, which — if implemented within secushare — can still protect metadata (the fact that you are following somebody's public expressions). That should provide for trending topics and possibly still allow for a tad too much in analytics.

Collaborative search and filtering, by contrast, are better implemented within our private view of the social graph. It will reflect our "filter bubble", but in an age of psychological mass manipulation that is an advantage: we see an honest representation of the collective knowledge of our social surroundings because manipulators cannot effectively infiltrate it with troll armies. As a consequence, SPAM and fraud must come from somebody in our graph. We can see it, we can remove it, we can pay attention to others removing rotten leaves and branches of the graph.
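The idea of filtering within a private view of the social graph can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the adjacency-dictionary representation, the function names and the two-hop limit are hypothetical and do not describe secushare's actual data model.

```python
from collections import deque


def reachable_within(graph, me, max_hops):
    """Return every peer reachable from `me` in at most `max_hops` steps."""
    seen = {me}
    frontier = deque([(me, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for peer in graph.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, dist + 1))
    return seen


def accept(sender, graph, me, max_hops=2):
    """Accept content only if its sender is inside our view of the graph."""
    return sender in reachable_within(graph, me, max_hops)


def prune(graph, me, bad_peer):
    """Return a copy of the graph with our own edge to `bad_peer` removed."""
    pruned = {node: list(peers) for node, peers in graph.items()}
    pruned[me] = [p for p in pruned.get(me, []) if p != bad_peer]
    return pruned


# Hypothetical example graph: "bob" has let a spammer into his contacts.
graph = {"me": ["alice", "bob"], "alice": ["carol"], "bob": ["spammer"]}
cleaned = prune(graph, "me", "bob")
```

In this toy graph the spammer is reachable only through "bob", so removing that one edge removes the whole branch, while a sender with no path into the graph is never accepted in the first place.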

Trade-off between Consistency and Availability

Decentralized systems also suffer from inherently higher network unreliability, resulting in a tradeoff between consistency and availability (formalized as the CAP theorem); they may also be slower from the user’s point of view. The need for synchronized clocks and minimizing data duplication are other challenges.

The complexity in the design of GNUnet shows how much effort went into dealing with these challenges. In CAP terms, secushare emphasizes Availability and Partition tolerance at the expense of Consistency. This is achieved by maintaining a local database of graph data which is updated by means of incoming pubsub streams.

This also addresses user perception: since all interaction is with the local data, secushare appears lightning fast – except when cloud computation would actually be faster than a local calculation, but that sort of centralized AI computation must become illegal for private and social data anyway. Luckily, when doing social communications among humans, people are used to asynchronicity in reactions – to them it doesn't make a big difference if the delay in a reply is caused by a temporary network failure or because the other side went out to get themselves a cup of coffee.
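The local-database approach described above can be sketched roughly as follows. This is a toy model under stated assumptions, not secushare's or GNUnet's implementation: the `LocalReplica` class, the `(key, version, value)` event shape and the rule that a higher version wins are all hypothetical, and the sketch assumes a single publisher per key so that version numbers never collide.

```python
class LocalReplica:
    """Hypothetical local store fed by a pubsub stream."""

    def __init__(self):
        self.state = {}  # key -> (version, value)

    def apply(self, event):
        """Merge one incoming event; stale or duplicate versions are ignored."""
        key, version, value = event
        current = self.state.get(key)
        if current is None or version > current[0]:
            self.state[key] = (version, value)

    def read(self, key):
        """Always answered from local data, even while the network is down."""
        entry = self.state.get(key)
        return None if entry is None else entry[1]


# Two events published on the same (hypothetical) channel, in this order.
e1 = ("alice/status", 1, "at work")
e2 = ("alice/status", 2, "getting coffee")

online, flaky = LocalReplica(), LocalReplica()
online.apply(e1)
online.apply(e2)

flaky.apply(e2)  # e1 was delayed by a network failure
flaky.apply(e1)  # the late event is older, so it does not overwrite e2
```

Reads never block on the network, which is the availability side of the trade-off. The price is that a replica cut off from the stream may show older state until the missing events arrive.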

Other distributed tools such as Retroshare and Briar have chosen a similar path of local storage, but their distribution strategy is opportunistic: they will exchange whatever data with whichever peer they connect – and refuse data which isn't of interest to them. Therefore, intermediate nodes may filter content that endpoints would be interested in. Secure Scuttlebutt also follows this model, however to compensate for the filter effect, large backbone nodes collecting everything have been created, re-introducing centralization and the opportunity for social data harvesting.
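The filter effect of opportunistic distribution can be shown with a deliberately simplified model. It assumes a linear chain of peers and a single topic per item; the `gossip` function and the peer names are hypothetical and do not reproduce the actual behaviour of Retroshare, Briar or Secure Scuttlebutt.

```python
def gossip(chain, interests, topic):
    """Pass one item along a chain of peers.

    Each peer stores and forwards the item only if it is interested in
    the topic. Returns the list of peers that end up holding the item.
    """
    holders = [chain[0]]  # the origin always holds its own item
    for peer in chain[1:]:
        if topic not in interests[peer]:
            break  # an uninterested peer refuses the item, so it travels no further
        holders.append(peer)
    return holders


chain = ["a", "b", "c"]

# "b" sits between two peers that care about "cats" but does not care itself.
filtered = gossip(chain, {"a": {"cats"}, "b": {"dogs"}, "c": {"cats"}}, "cats")

# With an interested intermediate peer the item reaches everyone.
delivered = gossip(chain, {"a": {"cats"}, "b": {"cats"}, "c": {"cats"}}, "cats")
```

In the first run peer "c" never receives content it would have wanted, which is the gap that collect-everything backbone nodes are meant to close.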

In addition to secushare's apparently unique pubsub multicast model, GNUnet's vast toolset offers the possibility to choose other trade-offs regarding the CAP theorem whenever it makes more sense for the application. The use of GNS and DHT, for example, can increase Consistency while reducing Availability.

The CAP theorem may have been presented in 1999, but reasoning on the topic of netsplits in the IRC community goes back to at least 1990. IRC was designed to expect database consistency yet forced to provide availability, resulting in its habit of throwing away the entire database each time there is a network failure. That's how we understood that the large shared database was IRC's greatest flaw, leading to the design of PSYC.

The Fallacy of Open Standardization

Shapiro notes two benefits of standardization: greater realization of network effects and protection of buyers from stranding, and one cost: constraints on variety and innovation, and argues that the impact on competition can be either a benefit or a cost.²

²) C. Shapiro, Setting Compatibility Standards: Cooperation or Collusion?, Oxford University Press, 2001.

We discussed the problematic aspects of open standards regarding PGP and the broken Internet in general. XMPP is a spectacular example of a whole set of bad ideas being declared a standard, impeding many brains from working on better solutions for over a decade. Open standards may introduce buyer protection whenever there is anything to buy, but in the case of free and open source software it makes no sense to stifle innovation and keep suboptimal technology in use. Open standards make sense wherever proprietary software does.

Interoperability is a laudable goal; it could enhance social utility, as we have mentioned earlier. However, it has frequently been reduced to the notion of open standards. […] One major impediment is that there are too many standards to choose from.

It makes no sense at all to settle on standards when none of them has actually solved the challenge. They are usually lousy on both privacy and scalability, not just one of the two. The best interoperability is when everybody is working on just one software, collaboratively. Tor may be considered a good example. Re-implementations in other languages are kept tightly integrated with the reference code.

The Cloud Economies of Scale

Centralized systems have significant economies of scale which encompasses hosting costs, development costs and maintenance costs (e.g., combating malware and spam), branding and advertising.

Malware and SPAM are symptoms of the broken Internet. As long as it is as easy as collecting email addresses to expose people to SPAM and malware, we will continue to have plenty of so-called cybercrime. If, instead, it takes a cryptographic relationship between entities to be allowed to communicate in the first place, there is no straightforward and easy way to anonymously send SPAM and malware to ordinary citizens. Businesses still have an interest in answering customer questions anonymously, but even then there is no such thing as phishing if there are neither links to malevolent websites nor executable mail attachments.

That is why those two do not apply to secushare. Yet cloud technology also has hardware optimizations that individual hardware cannot compete with. We are aware that secushare cannot outperform cloud technology. It is an ethical choice to understand the societal damage caused by cloud technology when used for social purposes, to refrain from using it and to question its legal viability.
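The requirement of a cryptographic relationship before any communication, mentioned above, can be sketched in a few lines. This is an assumption-laden illustration: a pre-shared key with HMAC stands in for whatever key exchange and signature scheme a real system would use, and the `Inbox` class and `seal` helper are hypothetical names, not part of secushare or GNUnet.

```python
import hashlib
import hmac


def seal(key, body):
    """Compute the authentication tag a sender attaches to a message."""
    return hmac.new(key, body, hashlib.sha256).digest()


class Inbox:
    """Hypothetical inbox that only accepts messages from known contacts."""

    def __init__(self):
        self.contacts = {}  # contact id -> shared secret key (bytes)

    def add_contact(self, contact_id, key):
        """Record the key agreed with a contact when the relationship was set up."""
        self.contacts[contact_id] = key

    def receive(self, sender, body, tag):
        """Return True only if the sender is a contact and the tag verifies."""
        key = self.contacts.get(sender)
        if key is None:
            return False  # no prior relationship, so the message is dropped
        return hmac.compare_digest(seal(key, body), tag)


alice_key = b"a" * 32    # placeholder key material for the example
mallory_key = b"m" * 32  # a key the inbox owner never agreed to

inbox = Inbox()
inbox.add_contact("alice", alice_key)
```

Harvesting an identifier is not enough here: a stranger is rejected for lack of a relationship, and someone impersonating a contact is rejected because the tag does not verify.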

Network Effects

A related point in the context of social networks: we hypothesize that the network effect is stronger for centralized systems due to tighter integration.

This is one more reason why we recommend the mandatory inclusion of distributed social networking technology on newly sold devices. The recommendation follows implicitly from our list of ethical requirements, since we don't know of any other technology that could fulfil them. If the new social networking platform is integrated into your brand new phone in place of SMS text messaging, then it liberates the user from having to connect to a website and go through a registration/login procedure.

Band-Aids for Broken Business Models?

Path dependence is another key economic issue: even if we assume that centralized and decentralized architectures represent equally viable equilibria, which one is actually reached might be entirely a consequence of historical accident. Most of the systems under our purview – unlike, say, email – were initially envisioned as commercial applications operating under central control, and it is unsurprising they have stayed that way.

This is another reason why we are not convinced that "regulating" social platforms would be more helpful than making their toxic business model illegal altogether.

Cognitive Overload

A variety of cognitive factors hinder adoption of decentralized systems as well. First, the fact that decentralized systems typically require software installation is a significant barrier. Second, more control over personal data almost inevitably translates to more decisions, which leads to cognitive overload. Third, since users lack expertise in software configuration, security vulnerabilities may result. A related point is that users may be unable to meaningfully verify privacy guarantees provided through cryptography.

First, the fact that it has become common practice to provide a web browser rather than a distributed social network merely stems from a historical flow of events: distributed social networks are so hard to implement, we still don't have them. The first one that actually delivers on scalability, privacy and anonymity could easily be made the standard and shipped everywhere. In fact, web browsers have themselves become a source of evil and the web needs to be redesigned according to ethical parameters – enforced by legislation, since ethics hardly ever wins on the capitalist market.

Second, we need all the algorithms that companies like Facebook have been applying to us to be transparently implemented on our own devices, operating on our own data, without serving anybody else's interests. This will not only prevent cognitive overload, it will actually produce better results than what we are getting from companies whose algorithms, by business needs, can't be sincere.

Third, that is why we recommend liquid democracy for software development so that "constitutional" software like operating systems and social networks can evolve under full public scrutiny, yet efficiently thanks to the liquid democratic paradigm. In such a process of software commons it should no longer work out to sneak in vulnerabilities, possibly by flooding software repositories with unintelligible amounts of code changes, like large corporations usually do with their open source products. Given these new preconditions, we believe software can actually be safe most of the time.

Fourth, if the users do not understand the value in trusting their own device and its cryptography rather than some company, they will first flock to whoever makes noisier promises – ergo, the corporate world – and that's where we are now, with Facebook. Individual action doesn't scale, and software development by a small bunch of enthusiasts can't really compete, so once again regulatory intervention and proper funding of ethical software is overdue. The repeated presence of fallacious hypes distracting funding from the actually important projects also needs to be dealt with, for example by using liquid democracy for transparent, collectively and rationally elaborated funding decisions rather than picking "experts" in secret.

Distributed Information Flow

Finally, we find that decentralized social networking systems in particular fare poorly in terms of mapping the norms of information flow. Access control provides a very limited conception of privacy. We provide several examples.

First is the idea of “degrees of publicness.” […]

With its concept of aspects and channels, secushare offers much finer grained degrees of distribution than Facebook.

Second, in current social networks privacy is achieved not only through technical defenses but also through “nudges”.³ When there are multiple software implementations, users cannot rely on their friends’ software providing these nudges.

³) For example, if a person is about to enter information on a website — whether it’s a name, a birthday, a photograph, a credit card, or a Social Security number — a message would come up warning the user that his or her information might be compromised if he or she continues. From that point, the user would decide whether or not to proceed.

Nudges make less sense when posting information to your friends and only to your friends; that just isn't like entrusting something to an arbitrary website. Still, nudges could be implemented in secushare as well, under AGPL licensing. However, we don't recommend multiple software implementations and we disallow proprietary derivatives.

Third, distributed social networks reveal users’ content consumption to their peers who host the content (unless they have a “push” architecture where users always download accessible content, whether they view it or not, which is highly inefficient.)

Yes, we have a push architecture, but by the selection of channels we decide how intensely we follow people, without revealing precisely which of their information we looked at. Also, accessible content isn't useless just because it hasn't been scrutinized – it still feeds into the local search engine and "little big data" algorithms.

Finally, decentralized social networks make reputation management and “privacy through obscurity” (in the sense of "The Case for Online Obscurity") harder, due to factors such as the difficulty of preventing public, federated data from showing up in search results.

Not applicable. secushare is as obscure as it gets if it takes a "whistleblower" in your social surroundings to expose private bits, facing subsequent exclusion from the network.

For any hope of absolute control, users must, at a minimum, host data on their own device resident on their physical property. […] Furthermore, the software running the services must be open-source, and be audited by third-party certification authorities, or by “the crowd”. […] The user also needs the time and knowhow to configure redundant backups, manage software security, etc.

secushare is intended to be able to recover its data from the network in case of loss of a device. Software security is a problem when using proprietary operating systems that are a known threat to civil liberty anyhow, so future devices should be mandated not to sport unverifiable operating systems.

Further still, hardware might have backdoors, and therefore needs an independent trust mechanism as well. Finally, almost all decentralized architectures face the problem of “downstream abuse”, which is that the user has no technical means to exercise control over use and retransmission of data once it has been shared.

Yes, secure production of trustworthy civil hardware is an ongoing topic of research in legislative options and societal organization: How can we guard hardware manufacturing just as well as we guard mints? Can we design a civil rights defender chip (don't confuse with the entirely different Clipper chip) that would ensure that proprietary apps running in sandboxes cannot access more data and resources than they are allowed to?

Other than that, almost all computers share the problem of "downstream abuse", so the only ultimate way to protect something from your friends is to not share it online.