Cloud Computing Workshop 2013, ITU
1: Big Data and Warehouse-scale Computing
Zubair Nabi
April 17, 2013
Zubair Nabi 1: Big Data and Warehouse-scale Computing April 17, 2013 1 / 23
Outline
1 Introduction
2 Ecosystem
From the very beginning
From the dawn of civilization to the year 2003, we created 5EB of information
We now create the same amount of data every 2 days!
By 2012, we had spawned 2.7ZB of data
Following the same trend, we will have 8ZB by 2015
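The figures above can be turned into rates with a little arithmetic. The sketch below is only an illustration: it assumes decimal units (1 ZB = 1000 EB) and a steady compound growth between 2012 and 2015, neither of which the slide states, and the variable names are made up for the example.

```python
# Figures quoted on the slide; decimal units are an assumption (1 ZB = 1000 EB).
EB_PER_ZB = 1000

eb_per_two_days = 5   # "the same amount of data every 2 days"
zb_2012 = 2.7         # "By 2012, we had spawned 2.7ZB"
zb_2015 = 8.0         # "we will have 8ZB by 2015"

# Creation rate implied by 5 EB every 2 days, expressed per year.
zb_per_year = eb_per_two_days / 2 * 365 / EB_PER_ZB

# Compound annual growth factor needed to go from 2.7 ZB to 8 ZB in 3 years.
years = 2015 - 2012
annual_growth = (zb_2015 / zb_2012) ** (1 / years)

print(f"{zb_per_year:.2f} ZB created per year at 5 EB every 2 days")
print(f"{(annual_growth - 1) * 100:.0f}% implied annual growth, 2012 to 2015")
```

Under these assumptions, the 2015 projection corresponds to growth of a little over 40% per year.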
Big Data
Large datasets whose processing and storage requirements exceed all traditional paradigms and infrastructure
  On the order of exabytes and beyond
Generated by web 2.0 applications, sensor networks, scientific applications, financial applications, etc.
Radically different tools needed to record, store, process, and visualize
Moving away from the desktop
Offloaded to the “cloud”
Example: Facebook’s “Haystack”
65 billion photos
4 images of different sizes stored for each photo
  For a total of 260 billion images and 20PB of storage
1 billion new photos uploaded each week (increment of 60TB)
At peak traffic 1 million images served per second
An image request is like finding a needle in a haystack
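As a sanity check on the Haystack numbers, the totals and the average image sizes they imply can be recomputed. This is a rough sketch that assumes decimal units (1 PB = 10^15 bytes) and uses illustrative variable names; it is not taken from the Haystack paper.

```python
# Decimal units are an assumption; the slide does not say which it uses.
KB, TB, PB = 10**3, 10**12, 10**15

photos = 65 * 10**9        # "65 billion photos"
sizes_per_photo = 4        # "4 images of different sizes"
total_storage = 20 * PB    # "20PB of storage"

weekly_photos = 10**9      # "1 billion new photos uploaded each week"
weekly_storage = 60 * TB   # "increment of 60TB"

images = photos * sizes_per_photo
avg_stored_image_kb = total_storage / images / KB
avg_new_image_kb = weekly_storage / (weekly_photos * sizes_per_photo) / KB

print(f"{images:,} stored images")
print(f"{avg_stored_image_kb:.0f} KB per stored image on average")
print(f"{avg_new_image_kb:.0f} KB per newly uploaded image on average")
```

The two averages do not agree with each other; the slide gives no breakdown that would explain the gap, so the sketch only reports what the quoted figures imply.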
More examples
The LHC at CERN generates 22PB of data annually (after throwing away around 99% of readings)
The Square Kilometre Array (under construction) is expected to generate hundreds of PB each day
Farecast, a part of Bing, searches through 225 billion flight and price records to advise customers on their ticket purchases
The amount of annual traffic flowing over the Internet is around 700EB
Walmart handles in excess of 1 million transactions every hour (25PB in total)
400 million Tweets every day
Big data ecosystem
Presentation layer
Application layer: frameworks + storage
Operating system layer
Virtualization layer (optional)
Network layer (intra- and inter-data center)
Physical infrastructure
Can roughly be called the “cloud”
Presentation Layer
Acts as the user-facing end of the entire ecosystem
Forwards user queries to the backend (potentially the rest of the stack)
Can be both local and remote
For most web 2.0 applications, the presentation layer is a web portal
For instance, the Google search website is a presentation layer: it takes user queries, forwards them to a scatter-gather application, and presents the results to the user (within a time bound)
Made up of many technologies, such as HTTP, HTML, AJAX, etc.
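The scatter-gather pattern with a time bound can be sketched in a few lines. Everything here is hypothetical: the in-memory SHARDS stand in for remote backend servers, and the function names and the one-second timeout are assumptions made for the example, not a description of how Google implements search.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical shards: each maps a search term to matching document ids.
SHARDS = [
    {"cloud": ["doc1", "doc4"], "data": ["doc2"]},
    {"cloud": ["doc7"]},
    {"data": ["doc9"], "scale": ["doc3"]},
]


def search_shard(shard, term):
    """Look the term up in one shard (stands in for a remote call)."""
    return shard.get(term, [])


def scatter_gather(term, timeout_s=1.0):
    """Send the query to every shard and merge whatever returns in time."""
    pool = ThreadPoolExecutor(max_workers=len(SHARDS))
    # Scatter: one lookup per shard, all running concurrently.
    futures = [pool.submit(search_shard, shard, term) for shard in SHARDS]
    # Gather: wait up to the time bound, then use only the finished lookups.
    done, _late = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on stragglers
    results = []
    for future in done:
        results.extend(future.result())
    return sorted(results)


print(scatter_gather("cloud"))
```

Shards that miss the deadline are simply left out of the answer, which trades completeness for a bounded response time.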
Application Layer
Serves as the back-end
Either computes a result for the user, or fetches a previously computed result or content from storage
The execution is predominantly distributed
The computation itself might entail cross-disciplinary (across sciences) technology
Computation
Can be a custom solution, such as a scatter-gather application
Might also be an existing data-intensive computation framework, such as MapReduce, Dryad, MPI, etc., or a stream processing system, such as Storm, S4, etc.
Analytics engines: R, MATLAB, etc.
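To give a feel for the MapReduce programming model mentioned above, here is a minimal single-process sketch of word count. It only illustrates the map, shuffle, and reduce phases; the function names are made up for the example, and a real framework would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, then reduce each group of values."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big compute"]))
```

The user supplies only the map and reduce functions; grouping by key is the part a framework would handle.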
Storage
1 Relational database management systems (RDBMS): MySQL, Oracle DB, IBM DB2, etc. (structured data)
2 NoSQL: Key-value stores, document stores, graphs, tables, etc. (semi-structured and unstructured data)
  Document stores: MongoDB, CouchDB, etc.
  Graphs: FlockDB, etc.
  Key-value stores: Dynamo, Cassandra, Voldemort, etc.
  Tables: BigTable, HBase, etc.
3 NewSQL: The best of both worlds: Spanner, VoltDB, etc.
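The difference between the data models can be shown with a toy example. This sketch uses only the Python standard library as a stand-in: sqlite3 plays the relational store, a dict plays the key-value store, and a list of JSON objects plays the document store. The table, keys, and records are invented for illustration and say nothing about the APIs of the systems named above.

```python
import json
import sqlite3

# Relational: fixed schema, queried with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada', 'Lahore')")
row = db.execute("SELECT name FROM users WHERE city = ?", ("Lahore",)).fetchone()

# Key-value: an opaque value that can only be looked up by its key.
kv_store = {}
kv_store["user:1"] = b"Ada|Lahore"
value = kv_store["user:1"]

# Document: self-describing records whose fields can vary per record.
documents = [
    json.loads('{"id": 1, "name": "Ada", "city": "Lahore"}'),
    json.loads('{"id": 2, "name": "Lin", "tags": ["admin"]}'),
]
in_lahore = [d["name"] for d in documents if d.get("city") == "Lahore"]

print(row, value, in_lahore)
```

The relational store enforces the schema up front, while the key-value and document stores leave interpretation of the data to the application.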
Operating System Layer
Consists of the traditional operating system stack with the usual suspects: Windows, variants of *nix, etc.
Alternatives exist, though, specialized for the cloud or multicore systems
Virtualization Layer
Allows multiple operating systems to run on top of the same physical hardware
Enables infrastructure sharing, isolation, and optimized utilization
Different allocation strategies possible
Easier to dedicate CPU and memory but not the network
Allocation either in the form of VMs or containers
Examples: VMware, Xen, LXC, etc.
Network Layer
Connects the entire ecosystem together
Consists of the entire protocol stack
Tenants assigned to Virtual LANs
Multiple protocols available across the stack
Physical Infrastructure Layer
The physical hardware itself
Servers and network elements
Mechanisms for power distribution, wiring, and cooling
Servers are connected in various topologies using different interconnects
Dubbed datacenters
“We must treat the datacenter itself as one massive warehouse-scale computer” – Luiz André Barroso and Urs Hölzle
Example: Google
All that infrastructure enables Google to:
Index 20 billion web pages a day
Handle in excess of 3 billion search queries daily
Provide email storage to 425 million Gmail users
Serve 3 billion YouTube videos a day
References
1 Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. 2010. Finding a needle in Haystack: Facebook’s photo storage. In Proceedings of the 9th USENIX conference on Operating systems design and implementation (OSDI’10). USENIX Association, Berkeley, CA, USA.
2 Urs Hoelzle and Luiz Andre Barroso. 2009. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (1st ed.). Morgan and Claypool Publishers.