
The Distributed Cloud
A Foundation for Planetary-Scale Computing

Mark Drummond


The emergence of computing clouds has put a renewed emphasis on the issue of scale in computing. The enormous size of the Web, together with ever more demanding requirements such as freshness (results in seconds, not weeks), means that massive resources are required to handle enormous datasets in a timely fashion. Datacenters are now considered the new units of computing power, e.g. Google's warehouse-scale computer. The number of organizations able to deploy such resources is ever shrinking. Wowd aims to demonstrate that there is an even bigger scale of computing than any yet imagined: planetary-scale distributed clouds. Such clouds can be deployed by motivated collections of users, instead of a handful of gigantic organizations.


Background

Clouds have emerged as a major trend in computing, as an answer to the ever-increasing scale of resources required to handle Web-sized tasks. The definition of a cloud is still not firmly established, so let us start with ours. We consider a cloud to be a collection of computing resources where it is possible to allocate and provision additional resources in an incremental and seamless way, with no disruption to the deployed applications. In this key respect, a cloud is not simply a group of servers co-located at some datacenter, since with such a collection it is neither simple nor clear how to deploy additional machines for many tasks.

Consider, for example, a server supporting a Relational Database Management System (RDBMS). A large increase in the number of records in the database cannot be handled simply by adding machines, since the underlying database needs to be partitioned such that all operations and queries perform satisfactorily across all of the machines. The solution requires significant re-engineering of the database application (the sketch at the end of this section makes the problem concrete). Clouds, by contrast, are collections of machines where it is possible to dynamically scale and provision additional resources for the underlying applications with no change or disruption to their operation. Some, such as Google, consider the datacenters that underlie clouds to be a new form of "warehouse-scale computer" (source: "The Datacenter as a Computer", Google Inc., 2009). Clearly, the number of organizations capable of deploying such resources is small, and getting smaller, due to prohibitive cost.

Consider, as an example, P2P networks. Since the very inception of P2P, these networks have been associated with a rather narrow scope of activities, principally the sharing of media content. The scale of computing occurring in such networks at every moment is truly staggering. However, there is a common (mis)perception that such massive distributed systems are good only for a very limited set of activities, specifically the sharing of (often illicit) content. Our goal is to demonstrate that distributed networks can be the basis for tremendously powerful distributed clouds, quite literally of planetary scale. At that scale, the power provided by such a cloud actually dwarfs the power of even the biggest proprietary clouds.
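To make the re-partitioning problem concrete, consider the simplest placement scheme, hash(key) mod N. The following sketch (plain Python; the table size and key names are hypothetical, chosen only for illustration) shows why simply adding a machine is disruptive: growing a cluster from 4 to 5 machines reassigns roughly 80% of the existing records.

```python
# Why adding machines to a naively sharded database is disruptive:
# with hash(key) mod N placement, growing N reassigns most records.
# Key names and counts are hypothetical, for illustration only.
import hashlib

def shard(key: str, num_machines: int) -> int:
    """Stable hash of the record key, mapped to a machine index."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return int(digest, 16) % num_machines

keys = [f"user:{i}" for i in range(100_000)]
before = {k: shard(k, 4) for k in keys}   # 4 machines
after = {k: shard(k, 5) for k in keys}    # one machine added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of records must move to a different machine")
# For N -> N+1 machines, roughly N/(N+1) of all keys change shards
# (here ~80%), which is exactly the kind of wholesale data shuffling
# and re-engineering that a cloud is supposed to make unnecessary.
```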


Distributed vs. Proprietary Clouds

Planetary-scale distributed clouds have different properties than proprietary clouds. First, note that proprietary clouds are much more homogeneous and (very) tightly coupled compared to distributed ones. In their white paper on datacenter-scale computing, Luiz Andre Barroso and Urs Hoelzle consider each datacenter to be a monolithic warehouse-scale computer.

The key to any computer is the coupling, communication bandwidth, and latency among its components. In a datacenter, one might consider individual servers to be very tightly coupled through a very reliable network. Yet such a network has severe design limitations, e.g. all machines must communicate over links of a fixed 1 or 10 Gbps bandwidth. The network imposes very significant constraints on adding machines to the cloud and places a limit on how far the scaling can go.

An important point about the scaling of proprietary clouds is the very clear distinction between computing within a single datacenter and computing across multiple datacenters. A good real-world example of this distinction is Google search: a query is always answered from a single datacenter, never across multiple ones. In the above-referenced white paper, the authors do not consider multiple datacenters to form a single computer, viewing them instead as a set of networked computers, with the caveat that in the future they might need to re-examine how they draw those boundaries.

We do not enforce such a distinction. Indeed, we view the connectivity limits across datacenters, or across individual machines anywhere on the planet, as the constraints among components of a planetary-scale computer. As such, we simply have to live with wide variance in the performance capabilities of the parts of the cloud, much like the huge speed differential between RAM and disk within an individual computer. Proper performance of an individual computer is predicated on a good design balance among its components: CPU, RAM, disk, peripherals, and the connectivity constraints among them. The constraints within a datacenter, or warehouse-scale computer, are very similar in spirit: it is essential to balance the design of the components and the communication channels among them to achieve optimal performance. It is only natural to extend such design thinking well beyond a single datacenter, to the planetary scale. For example, the connectivity constraints of individual nodes may be much stricter than within a datacenter, yet the aggregate bandwidth and path redundancy of the communication medium of a planetary-scale computer (the Internet) are vastly larger.
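The RAM-versus-disk analogy can be made quantitative with a standard expected-access-time model. The sketch below (plain Python; all latency figures are rough illustrative assumptions, not measurements) treats remote peers the way a single machine treats slow storage: performance remains acceptable only if the design keeps most accesses on the fast tier, which is precisely the design-balance argument above.

```python
# Back-of-envelope model: expected access latency in a two-tier system.
# Analogy: RAM vs. disk inside one computer, or local vs. remote access
# inside a planetary-scale cloud. All figures are illustrative assumptions.

def average_latency(fast_ns: float, slow_ns: float, hit_rate: float) -> float:
    """Expected latency given the fraction of accesses served by the fast tier."""
    return hit_rate * fast_ns + (1.0 - hit_rate) * slow_ns

ram, disk = 100.0, 10e6        # ~100 ns RAM vs. ~10 ms spinning disk
local, remote = 0.1e6, 150e6   # ~0.1 ms local node vs. ~150 ms wide-area RTT

for hit in (0.90, 0.99, 0.999):
    print(f"hit rate {hit:.1%}: "
          f"single machine ~{average_latency(ram, disk, hit) / 1e6:.3f} ms, "
          f"distributed ~{average_latency(local, remote, hit) / 1e6:.3f} ms")
# In both cases the slow tier dominates unless the design keeps the
# overwhelming majority of accesses on the fast tier: the same balance
# problem, restated at planetary scale.
```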


[Figure: Left, centralized cloud at small scale; right, distributed cloud at small scale.]

[Figure: Left, centralized cloud at large scale; right, distributed cloud at large scale.]

Aggregate Resources

The resources contained in a distributed cloud are truly staggering. Consider a collection of 1M users. Such a group, while significant, would not be among the largest distributed networks in existence today. If each user runs an application that uses 200 MB of RAM on their machine, the aggregate RAM available in the system is 200 TB. Assuming a contribution of 1 Mbps of bandwidth per user, the aggregate bandwidth of the system is 1,000 Gbps. Assuming 10 GB of disk space per user, the aggregate available disk space is 10 PB. Of course, one need not stop at 1M users! Indeed, the largest existing systems, such as Skype, now surpass 10M simultaneously online machines. With the worldwide Internet population having already passed 1B people, it is easy to envision a system with tens of millions or perhaps even hundreds of millions of participants. The computing power of such a system would be truly planetary-class!
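These aggregates are simple multiplication, and the same arithmetic extends directly to larger populations. A minimal sketch (plain Python; the per-user contributions are the assumptions stated above):

```python
# Aggregate resources of a distributed cloud: per-user contribution x users.
# Per-user figures are the assumptions used in the text above.
RAM_PER_USER_GB = 0.2      # 200 MB of RAM
BW_PER_USER_MBPS = 1.0     # 1 Mbps of bandwidth
DISK_PER_USER_GB = 10.0    # 10 GB of disk

for users in (1_000_000, 10_000_000, 100_000_000):
    ram_tb = users * RAM_PER_USER_GB / 1_000        # GB  -> TB
    bw_gbps = users * BW_PER_USER_MBPS / 1_000      # Mbps -> Gbps
    disk_pb = users * DISK_PER_USER_GB / 1_000_000  # GB  -> PB
    print(f"{users:>11,} users: {ram_tb:>8,.0f} TB RAM, "
          f"{bw_gbps:>8,.0f} Gbps, {disk_pb:>8,.1f} PB disk")
# 1M users   ->    200 TB RAM,   1,000 Gbps,    10 PB disk
# 100M users -> 20,000 TB RAM, 100,000 Gbps, 1,000 PB disk
```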


At first, it might seem that such a system would necessarily be unreliable and inconsistent, since any given participant can choose to join the network or leave it at any time. But this impression is wrong, as any user of BitTorrent can attest. In fact, the aggregate reliability of such a system is unsurpassed, because of the massive redundancy employed. The key is to design the interaction of system components in a way that leverages strengths, such as aggregate resources, and deals effectively with constraints, such as the unreliability and bandwidth limitations of individual nodes and the communication latencies between them.

New Alternatives

The scaling requirements of more traditional architectures are driving the development of new approaches. For example, Relational Database Management Systems have been the backbone of data access for decades. However, their scaling limitations have led to the development of approaches such as key-value stores. Another example is Random Access Memory (RAM): in-memory data grids have arisen from the need to effectively leverage the aggregate RAM of large collections of machines.

Our Solution

Wowd is building a distributed cloud with the goal of reaching planetary scale, with tens or even hundreds of millions of participants. We are also developing a set of key applications to run on that cloud, including search, discovery, and recommendation. We want to demonstrate that a planetary-scale distributed cloud is the perfect platform for applications that process data from the entire Web in real time. Some of these applications may be surprising; search is one. The overall latency in computing the answer to a query is a key parameter, and it might seem (very) surprising that an answer can be computed in under, say, one second on a planetary-scale cloud. It turns out that, with the right design, this is entirely possible. We mentioned above some of the commonly perceived issues with distributed clouds. We have developed methods to deal with these limitations very effectively, specifically (a sketch of the underlying arithmetic follows the list):

• Unreliability of individual nodes is handled by large redundancy.

• Individual bandwidth limitations are dealt with by partitioning data into a large number of small pieces.

• Communication latencies are handled by limiting the number of hops on critical paths.
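A back-of-envelope sketch of the arithmetic behind these three techniques (plain Python; the node uptime, piece size, and latency figures are illustrative assumptions, not actual system parameters):

```python
# Rough arithmetic behind the three mitigation techniques above.
# All parameters are illustrative assumptions, not measured system values.

# 1. Redundancy: an item replicated on k independent nodes is unavailable
#    only if all k replicas are offline at once.
node_uptime = 0.5                       # pessimistic: online half the time
for k in (5, 10, 20):
    p_unavailable = (1.0 - node_uptime) ** k
    print(f"{k:>2} replicas: unavailable with probability {p_unavailable:.2e}")

# 2. Partitioning: splitting data into many small pieces keeps any single
#    transfer within an individual node's limited bandwidth.
dataset_gb = 10_000.0                   # a 10 TB dataset
piece_mb = 1.0                          # split into 1 MB pieces
pieces = dataset_gb * 1_000 / piece_mb
seconds_per_piece = piece_mb * 8 / 1.0  # at 1 Mbps per node
print(f"{pieces:,.0f} pieces; each transfers in ~{seconds_per_piece:.0f} s at 1 Mbps")

# 3. Latency: end-to-end latency grows with overlay hops, so a sub-second
#    answer requires a short critical path.
hop_rtt_ms = 150.0                      # rough wide-area round trip per hop
for hops in (2, 3, 6):
    print(f"{hops} hops on the critical path: ~{hops * hop_rtt_ms:.0f} ms")
# At ~150 ms per hop, staying under one second allows at most ~6 hops,
# which is why the critical path must be kept short by design.
```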


Conclusion

In summary, our goal is to demonstrate that clouds are not the province of a chosen and powerful few, but that massive clouds, capable of supporting even web search, can be created from much more diverse individual machines on a much wider scale. The aggregate power of such distributed clouds will dwarf the size and power of proprietary clouds. We strongly believe that the advent of distributed clouds will usher in a new era of cloud computing, characterized by decentralized and democratic access to computational resources.