Big Data Tutorial
QCon São Paulo 2013
Everything old is new again
Everything old is technically feasible
The ability to summon hundreds or thousands of machines with an API call is what brings parallel computing to everyone...
...combined with virtually limitless cloud storage, Big Data is now accessible to everyone, not just big companies.
Tweet @jedberg with feedback!
Jeremy Edberg
What is reddit?
What is Netflix?
Netflix is the world’s leading Internet television network with nearly 38 million members in 40 countries enjoying more than one billion hours of TV shows and movies per month, including original series. For one low monthly price, Netflix members can watch as much as they want, anytime, anywhere, on nearly any Internet-connected screen.
Source: http://ir.netflix.com
What You Will Learn
• Why Big Data? How is it useful, and what can it do for you?
• SQL and NoSQL: what's the difference, what are the pros and cons, and how do you move from one to the other?
• Practical steps to keep your Big Data systems reliable.
• NoSQL and other Big Data technologies such as HBase/HDFS, BigTable, MongoDB, S3, Redis, Cassandra, Hadoop, Pig, Hive, Flume and more.
This is your workshop
• We’ll be together for 3+ hours
• You (or your employer) paid a lot of money to be here
• Let’s make it worth your while!
Let’s make this awesome together
• Ask questions
• Let me know if you want me to move on or go into more detail
• Keep it interactive!
Schedule
Introduction to Big Data and its uses
Survey of Big Data Technology
Real-Time Data Systems
Demo: Cassandra in Action -- Building and using a data model
Building reliable Big Data systems
Wrap up, conclusions, questions
What is Big Data?
• The tools and processes of managing and utilizing large datasets.
• (with virtualized resources)
• Structured and Unstructured data
(I’ll ask this again at the end)
Simple vs. Complex
Flu outbreak
Data Wants to be Free
Data is the most important asset your business will have.
Privacy
• That sharing comes at a cost, and that’s privacy.
• Some people value privacy over utility, and some don’t.
• Teenagers don’t seem to value privacy at all.
So how can Big Data help me?
Security
How Big Data transformed the dairy industry
How India’s “Satyamev Jayate” uses Big Data to power its TV show.
Trend Analysis
Actionable Metrics
Other Metrics
• Pennies earned
• Pageviews
• Votes / comments / links
How Big Data can make your business more successful
• Use Big Data for real-time analysis that delivers better experiences to your customers.
• Sometimes information is more valuable when it is shared.
• We are awash in good answers, but good questions are scarce.
• Keep your data clean on the way in.
• Where does Big Data create value in your company?
What's possible, and what's difficult, for companies that adopt Big Data approaches to storage and analysis
• Data gravity: as your data gets bigger, you need to move your application closer to it.
• Moving from SQL to NoSQL
Data
What does Netflix do with it all?
We store it!
• Cache (memcached)
• Cassandra
• RDS (MySQL)
RDS (Relational Database Service)
Cassandra
Overview
[Diagram: Data Collection Pipeline and Data Processing Pipeline]
Chukwa/Honu messages / min
63 billion messages a day
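The chart is labeled in messages per minute while the headline figure is per day, so a quick conversion helps. This is plain arithmetic and assumes the 63 billion messages are spread evenly over 24 hours; real traffic is not uniform, so peak rates are higher than these averages.

```python
# Convert a daily message volume into average per-minute and per-second rates.
# Assumes traffic is spread evenly across the day (a simplification).

MESSAGES_PER_DAY = 63_000_000_000  # figure quoted on the slide

def average_rates(per_day: int) -> tuple[float, float]:
    """Return (messages per minute, messages per second) averaged over one day."""
    per_minute = per_day / (24 * 60)
    per_second = per_day / (24 * 60 * 60)
    return per_minute, per_second

if __name__ == "__main__":
    per_min, per_sec = average_rates(MESSAGES_PER_DAY)
    print(f"{per_min:,.0f} messages/min")  # 43,750,000 messages/min
    print(f"{per_sec:,.0f} messages/sec")  # 729,167 messages/sec
```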
Hive
```sql
-- Top 5 most-watched videos for the week of June 11-17, 2012
select videoID, count(*) as c
from events
where dateint >= 20120611 and dateint <= 20120617
  and event = "Watched" and result = "SUCCESS"
group by videoID
order by c desc
limit 5;
```
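For readers newer to Hive, that query counts successful "Watched" events per video over one week and keeps the five most-watched. The same aggregation over a small in-memory list is sketched below; the row layout (a dict with `videoID`, `dateint`, `event` and `result` keys) simply mirrors the query's column names and is an assumption, not Netflix's actual schema.

```python
from collections import Counter

def top_watched(events, start, end, limit=5):
    """Count successful 'Watched' events per video within [start, end] (inclusive dateints)."""
    counts = Counter(
        e["videoID"]
        for e in events
        if start <= e["dateint"] <= end      # WHERE dateint >= start AND dateint <= end
        and e["event"] == "Watched"
        and e["result"] == "SUCCESS"
    )
    # most_common() sorts by count descending, like ORDER BY c DESC LIMIT n
    return counts.most_common(limit)

sample = [
    {"videoID": 1, "dateint": 20120611, "event": "Watched", "result": "SUCCESS"},
    {"videoID": 1, "dateint": 20120612, "event": "Watched", "result": "SUCCESS"},
    {"videoID": 2, "dateint": 20120613, "event": "Watched", "result": "SUCCESS"},
    {"videoID": 2, "dateint": 20120613, "event": "Watched", "result": "FAILURE"},
    {"videoID": 3, "dateint": 20120620, "event": "Watched", "result": "SUCCESS"},  # outside range
]

print(top_watched(sample, 20120611, 20120617))  # [(1, 2), (2, 1)]
```

Hive does the same work, but distributes the filter and the per-key counting across a Hadoop cluster instead of one process.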
A/B Testing
• Online data: test cell allocation, test metadata, start/end date, UI directives
• Offline data: test tracking, retention, fraction viewed, pages viewed
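Test cell allocation has to be stable: a member should land in the same cell on every visit. One common technique, shown here as an illustrative sketch rather than Netflix's actual allocator, is to hash the member ID together with the test ID and map the hash onto the configured number of cells. The function and parameter names are hypothetical.

```python
import hashlib

def allocate_cell(member_id: str, test_id: str, num_cells: int) -> int:
    """Deterministically map a member to one of num_cells test cells (1-based).

    Hashing member_id together with test_id keeps a member's cell stable for a
    given test while spreading members independently across different tests.
    """
    if num_cells < 1:
        raise ValueError("num_cells must be at least 1")
    digest = hashlib.sha256(f"{test_id}:{member_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # first 8 bytes as an unsigned integer
    return bucket % num_cells + 1

if __name__ == "__main__":
    cell = allocate_cell("member-42", "test-7", num_cells=3)
    assert cell == allocate_cell("member-42", "test-7", num_cells=3)  # stable across calls
    print(f"member-42 is in cell {cell} of test-7")
```

Because no per-member state is stored, any server can compute the allocation; the online test metadata (cell count, start/end date) is the only shared input.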
Atlas
AWS Usage (Ice)
Dollar amounts have been carefully removed
Chronos
Netflix Dataoven
• Data pipelines
  • From cloud services: ~100 billion events/day (Ursula)
  • From C* (Cassandra): terabytes of dimension data (Aegisthus)
• Data warehouse: over 2 petabytes
• Hadoop clusters on AWS EMR: 1300 nodes, 800 nodes, multiple 150-node clusters
• RDS: metadata
• Gateways
• Tools
Genie: Goals
• Open up the data engineering infrastructure
  • Self-service for SLA/production jobs
• Abstraction/management of back-end resources
  • Hadoop/Hive/Pig as a service
  • Eliminate “gateway” bottlenecks
Genie: Set of Services
• Job execution
  • RESTful API to run Hadoop, Hive and Pig jobs
  • Abstracting out cluster details from clients
  • Horizontal scalability via auto-scaling groups in the cloud
Genie: Set of Services
• Resource Configuration/Management
• Management of cluster status
• Repository of configurations (for cluster, Hive, Pig)
• Mapping of jobs to clusters
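The idea of mapping jobs to clusters, while tracking cluster status, can be illustrated with a tag-matching sketch: each cluster advertises tags (its purpose, the job types it accepts), and a job is routed to the first healthy cluster whose tags cover what the job asks for. The `Cluster` type, the tag names and the matching rule are hypothetical and are not Genie's real API or data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Cluster:
    name: str
    status: str                               # e.g. "UP" or "OUT_OF_SERVICE"
    tags: set[str] = field(default_factory=set)

def pick_cluster(clusters: list[Cluster], required_tags: set[str]) -> Cluster | None:
    """Return the first UP cluster whose tags include every required tag, else None."""
    for cluster in clusters:
        if cluster.status == "UP" and required_tags <= cluster.tags:
            return cluster
    return None

clusters = [
    Cluster("prod-sla", "UP", {"sla", "hive", "pig"}),
    Cluster("adhoc", "UP", {"adhoc", "hive"}),
    Cluster("adhoc-old", "OUT_OF_SERVICE", {"adhoc", "hive", "pig"}),
]

print(pick_cluster(clusters, {"adhoc", "hive"}).name)  # adhoc
print(pick_cluster(clusters, {"adhoc", "pig"}))        # None (only match is out of service)
```

Clients only name the capabilities they need, so clusters can be replaced or resized without changing job submissions.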
Data Gravity
• Coined by Dave McCrory
• First described here: http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
What is Data Gravity?
Source: nationalgeographic.com
Data Gravity and you
• The bigger your dataset, the harder it is to move from anywhere to anywhere
• Also, how do you move that data without affecting your running application?
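A back-of-the-envelope calculation shows why size matters: transfer time is roughly dataset size divided by usable bandwidth. The sketch assumes decimal units (1 PB = 10^15 bytes, 10 Gbit/s = 10^10 bits/s) and ignores protocol overhead, so real transfers take longer; the `utilization` parameter models leaving bandwidth free for the running application.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float, utilization: float = 1.0) -> float:
    """Estimate how many days it takes to copy size_bytes over a network link."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    seconds = (size_bytes * 8) / (link_bits_per_sec * utilization)
    return seconds / 86_400

PETABYTE = 10**15
TEN_GBPS = 10 * 10**9

# 2 PB (the warehouse size quoted for Netflix) over one dedicated 10 Gbit/s link:
print(f"{transfer_days(2 * PETABYTE, TEN_GBPS):.1f} days")       # 18.5 days
# Same copy if only half the link can be spared for the running application:
print(f"{transfer_days(2 * PETABYTE, TEN_GBPS, 0.5):.1f} days")  # 37.0 days
```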
reddit’s data gravity problem
• We had a lot of data that was ever-growing
• We were so resource constrained we couldn’t move it without hurting our application
Netflix’s data gravity problem
• Needed the data in the datacenter
• We were “Roman Riding” (one foot on each of two horses: running in the datacenter and the cloud at the same time) for a long time
Source: http://horseandman.com
Questions?
Schedule
Introduction to Big Data and its uses
Survey of Big Data Technology
Real-Time Data Systems
Demo: Cassandra in Action -- Building and using a data model
Building reliable Big Data systems
Wrap up, conclusions, questions
SQL vs. NoSQL
• NoSQL systems generally hold unstructured data, and storage is schemaless
• Eventually consistent systems
• Horizontally scalable
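Eventually consistent systems let replicas disagree for a while and reconcile later. One simple reconciliation rule is last-write-wins by timestamp, which is how Cassandra resolves conflicting values for the same cell. The sketch below uses plain dictionaries of `(timestamp, value)` pairs as a simplified stand-in for replica state.

```python
Version = tuple[int, str]  # (write timestamp, value)

def merge_lww(*replicas: dict[str, Version]) -> dict[str, Version]:
    """Merge replica states key by key, keeping the version with the newest timestamp."""
    merged: dict[str, Version] = {}
    for replica in replicas:
        for key, version in replica.items():
            # Compare (timestamp, value) tuples: newest timestamp wins; on a timestamp
            # tie the larger value wins, so every replica picks the same result.
            if key not in merged or version > merged[key]:
                merged[key] = version
    return merged

replica_a = {"user:1:email": (100, "old@example.com"), "user:1:name": (105, "Ana")}
replica_b = {"user:1:email": (110, "new@example.com")}

# Both merge orders converge to the same state.
assert merge_lww(replica_a, replica_b) == merge_lww(replica_b, replica_a)
print(merge_lww(replica_a, replica_b)["user:1:email"])  # (110, 'new@example.com')
```

The deterministic tie-break is what makes the merge order-independent, which is the property "eventually consistent" relies on.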
SQL vs. NoSQL
• SQL systems have structured data and fixed schemas
• ACID compliant (I’d rather put my $$ here than in an eventually consistent system!)
• Generally have to scale up (bigger machines); not as good at scaling out
CAP Theorem
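• Consistency
• Availability
• Partition tolerance
A distributed system cannot guarantee all three at once: during a network partition it must give up either consistency or availability.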
Key/Value vs. Document Store
• Key/Value is just like the hash table data structure you are used to
• Great for use with object-oriented languages
• Redis, Cassandra, S3
Key/Value vs. Document Store
• Stores whole documents with certain properties, often in JSON or XML
• Good for large chunks of data, like things scraped from the web
• MongoDB, CouchDB
JSON
• JavaScript Object Notation
• Originally a subset of JavaScript, now a standard cross-platform document format
• Lots of parsers in many languages
• Fills a similar role to XML, but is less verbose
```json
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street", "city": "New York",
    "state": "NY", "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" },
    { "type": "fax", "number": "646 555-4567" }
  ]
}
```
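Because parsers exist for most languages, consuming a document like this takes only a few lines. Here it is loaded with Python's standard `json` module, and a nested field plus the list of phone numbers are read out.

```python
import json

document = """
{"firstName": "John", "lastName": "Smith", "age": 25,
 "address": {"streetAddress": "21 2nd Street", "city": "New York",
             "state": "NY", "postalCode": "10021"},
 "phoneNumber": [{"type": "home", "number": "212 555-1234"},
                 {"type": "fax", "number": "646 555-4567"}]}
"""

person = json.loads(document)                 # JSON objects become dicts, arrays become lists
print(person["address"]["city"])              # New York
phones = {p["type"]: p["number"] for p in person["phoneNumber"]}
print(phones["fax"])                          # 646 555-4567

# Serializing back out: the parsed structure round-trips unchanged.
assert json.loads(json.dumps(person)) == person
```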
The Technologies
Cassandra
Cassandra Architecture
How it works
• Replication factor
• Quorum reads / writes
• Bloom Filter for fast negative lookups
• Immutable files for fast writes
• Seed nodes
• Multi-region
• Gossip protocol
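Replication factor and quorum reads/writes interact through simple arithmetic. With replication factor RF, a quorum is `RF // 2 + 1` replicas, and a read is guaranteed to overlap the latest acknowledged write whenever the number of replicas read (R) plus the number written (W) exceeds RF. A small sketch of that rule; the function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for a given replication factor."""
    return replication_factor // 2 + 1

def reads_see_latest_write(replication_factor: int, read_replicas: int, write_replicas: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

RF = 3
q = quorum(RF)                                 # 2
print(q, reads_see_latest_write(RF, q, q))     # 2 True   (quorum reads + quorum writes)
print(reads_see_latest_write(RF, 1, 1))        # False    (ONE/ONE: faster, may read stale data)
print(reads_see_latest_write(RF, 1, RF))       # True     (write to all, read from one)
```

With RF = 3, quorum operations still succeed while one replica is down, which is why quorum is a common middle ground between consistency and availability.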
Cassandra Benefits
• Fast writes
• Fast negative lookups
• Easy incremental scalability
• Distributed: no single point of failure (SPoF)
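Fast negative lookups come from Bloom filters: a compact bit array that can answer "definitely not present" without touching disk, at the cost of occasional false positives. Cassandra keeps one per SSTable so a read can skip files that cannot contain the key. The minimal sketch below is not Cassandra's implementation; the default sizes and the double-hashing scheme (deriving k positions from one SHA-256 digest) are choices made for this example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # make h2 odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "probably present, go check disk".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

bf = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    bf.add(row_key)

assert all(bf.might_contain(k) for k in ("user:1", "user:2", "user:3"))  # never a false negative
print(bf.might_contain("user:999"))  # almost certainly False with only 3 keys in 1024 bits
```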
Things Netflix stores in Cassandra
• Track service-level calls
• Instrument the low-level HTTP client
• Call graph (who is calling whom)
• Request processing vs. perceived latency
• Payload marshalling/unmarshalling: duration, size, etc.
• Service results: status, error code, exception, etc.
Things Netflix stores in Cassandra
• Video Quality
• Network issues
• Usage History
• Playback Errors
Why Cassandra?
• Availability over consistency
• Writes over reads
• We know Java
• Open source + support
astyanax
• Netflix Cassandra Java client
• High-level abstractions for Cassandra
• https://github.com/Netflix/astyanax
Hadoop
Image from searchworks.org
http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
HBase
• Open source clone of Google’s BigTable (a sparse, distributed multi-dimensional sorted map)
• Integrated with Hadoop for easy reads and writes.
• Distributed
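BigTable's data model, which HBase copies, is a sparse, sorted map from (row, column, timestamp) to value. A toy in-memory version, assuming nothing about HBase's actual client API, shows why range scans over sorted row keys are cheap and why unwritten cells cost nothing; the row and column names in the usage lines are hypothetical:

```python
from bisect import bisect_left, insort


class SortedMapTable:
    """Toy BigTable-style table: (row, column, timestamp) -> value, kept in sorted order."""

    def __init__(self) -> None:
        self._keys = []   # sorted (row, column, -timestamp): newest version sorts first
        self._cells = {}  # only written cells exist, so the map is sparse

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        key = (row, column, -timestamp)
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row: str, column: str):
        """Newest version of a cell, or None if the cell was never written."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row: str, stop_row: str):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row, in order."""
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = SortedMapTable()
table.put("com.example/a", "anchor:home", 1, "v1")
table.put("com.example/a", "anchor:home", 2, "v2")
table.put("com.example/b", "contents:html", 1, "<html>")
print(table.get("com.example/a", "anchor:home"))  # v2
```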
Overview
[Diagram: a data collection pipeline and a data processing pipeline, with boxes labeled Application, Collectors, and Hive]
Hive
```sql
select videoid, count(*) as c
from events
where dateint >= 20120611 and dateint <= 20120617
  and event = "Watched" and result = "SUCCESS"
group by videoid
order by c desc
limit 5;
```
Hive
```sql
INSERT OVERWRITE TABLE user_active SELECT user.* FROM user WHERE user.active = 1;
```
Pig
```pig
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
store B into 'id.out';
```
Oozie
```xml
<property><name>cassandra.thrift.address</name><value>${cassandraHost}</value></property>
<property><name>cassandra.thrift.port</name><value>${cassandraPort}</value></property>
<property><name>cassandra.partitioner.class</name><value>org.apache.cassandra.dht.RandomPartitioner</value></property>
<property><name>cassandra.consistencylevel.read</name><value>${cassandraReadConsistencyLevel}</value></property>
<property><name>cassandra.consistencylevel.write</name><value>${cassandraWriteConsistencyLevel}</value></property>
<property><name>cassandra.range.batch.size</name><value>${cassandraRangeBatchSize}</value></property>
```
Other “NoSQL” solutions
• Memcache
• Redis
• CouchDB
• MongoDB
• DynamoDB
• Voldemort
• Riak
• Zookeeper
• S3
• Postgres
I love memcache
I make heavy use of memcached
[Diagram: cache nodes A, B, and C holding keys 1, 2, and 3]
[Diagram: the same cache nodes and keys, with a new node D added]
[Diagram: the same cache nodes and keys with node D, plus EVCache]
Redis
• Stores the entire database in RAM
• Supports complex data structures
• Writes to disk periodically
• Fast and predictable with small datasets
Redis data structures
• strings
• hashes
• lists
• sets and sorted sets
Redis use cases
• As a drop-in replacement for memcache
• Ephemeral data that you’re ok with losing
• Performance falls off a cliff when the dataset gets bigger than RAM
CouchDB
• Document oriented database
• Stores JSON-like objects and supports deep queries
• Not easy to scale horizontally
• Uses JavaScript MapReduce functions, called “views”, for data access
CouchDB use cases
• You have a large dataset and you want easy access to attributes
• Prototyping or just starting out
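CouchDB views are JavaScript map functions, stored in design documents, that emit key/value pairs, with an optional reduce step. The Python stand-in below mimics only those semantics (map every document, order by key, reduce per key); it is not CouchDB's API, and the documents are hypothetical:

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional


def run_view(docs: Iterable[dict],
             map_fn: Callable[[dict], Iterable[tuple]],
             reduce_fn: Optional[Callable[[list], object]] = None):
    """Emulate a CouchDB-style view: map every document, order by key, optionally reduce per key."""
    rows = [pair for doc in docs for pair in map_fn(doc)]
    rows.sort(key=lambda kv: kv[0])  # view rows come back ordered by key
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


def posts_by_author(doc):
    # The "map" step: emit (key, value) pairs; documents that don't match emit nothing.
    if doc.get("type") == "post":
        yield doc["author"], 1


docs = [
    {"type": "post", "author": "alice"},
    {"type": "post", "author": "bob"},
    {"type": "comment", "author": "bob"},
    {"type": "post", "author": "alice"},
]
print(run_view(docs, posts_by_author, sum))  # {'alice': 2, 'bob': 1}
```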
MongoDB
• Document store, similar to CouchDB
• JSON-like objects that are easy to work with
• JavaScript query language
MongoDB use cases
• Similar to CouchDB
• Less scalable than CouchDB
• Favors speed over durability
Voldemort
• Open source clone of Amazon’s Dynamo database (not DynamoDB)
• Consistent key hashing for fast lookups and easy horizontal scaling
• Built-in versioning
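Consistent hashing places nodes and keys on the same hash ring; each key belongs to the first node clockwise from it, so adding a node only takes over keys from its neighbours instead of reshuffling everything. A generic sketch, not Voldemort's exact partitioning scheme; the MD5 hash, the 100 virtual points per node, and the node names are assumptions:

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Keys map to the first node point clockwise from their hash on the ring."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self.vnodes = vnodes  # several points per node smooth out the load
        self._ring = []       # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]  # wrap around past the last point


ring = ConsistentHashRing(["A", "B", "C"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("D")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to D")
```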
Voldemort use cases
• Places where eventual consistency is OK
• Like an Amazon shopping cart, for example!
• Or LinkedIn!
• Sometimes multiple conflicting answers come back; it is up to the client to figure out the right one
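Dynamo-style stores such as Voldemort detect those conflicting answers with vector clocks: a version is superseded only if another version has seen every write it has; otherwise both come back as siblings for the client to merge. A sketch under that model, using the shopping-cart union merge described for Dynamo; the cart contents, node names, and helper functions are hypothetical:

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]
Version = Tuple[Clock, List[str]]  # (vector clock, cart contents)


def increment(clock: Clock, node: str) -> Clock:
    """Copy of clock with node's counter bumped: a write coordinated by that node."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: Clock, b: Clock) -> bool:
    """True if a has seen everything b has (a >= b for every node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def siblings(versions: List[Version]) -> List[Version]:
    """Drop versions strictly superseded by another; the rest are concurrent conflicts."""
    return [v for v in versions
            if not any(descends(o[0], v[0]) and o[0] != v[0] for o in versions)]


def merge_carts(conflicts: List[Version], node: str) -> Version:
    """Client-side resolution in the shopping-cart style: union the items."""
    clock: Clock = {}
    items = set()
    for c, cart in conflicts:
        for n, count in c.items():
            clock[n] = max(clock.get(n, 0), count)
        items.update(cart)
    return increment(clock, node), sorted(items)


base = (increment({}, "A"), ["book"])
v1 = (increment(base[0], "A"), ["book", "pen"])   # written via node A
v2 = (increment(base[0], "B"), ["book", "lamp"])  # concurrently written via node B
conflicts = siblings([base, v1, v2])
print(len(conflicts))               # 2: neither v1 nor v2 descends from the other
print(merge_carts(conflicts, "A"))  # ({'A': 3, 'B': 1}, ['book', 'lamp', 'pen'])
```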
Riak
• Like Voldemort (Amazon’s Dynamo paper)
• Uses a gossip protocol like Cassandra
• Query in Erlang or JavaScript
Riak use cases
• Similar to Voldemort, where eventual consistency is ok
Zookeeper
• Specialized key/value store
• Presents like a file system
• Distributed for reliability and fast reads
• At the expense of writes, which get slower as more nodes are added
Zookeeper use cases
• System configuration
Postgres
Sample Schema
• link_thing: int id, timestamp date, int ups, int downs, bool deleted, bool spam
• link_data: int thing_id, string name, string value, char kind
The thing layer
• Postgres is used like a key/value store
• Thing table has denormalized data
• Data table has arbitrary keys
• Lots of indexes tuned for our specific queries
• Thing and data tables are on the same box, but don’t have to be
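The thing layer can be sketched with SQLite: fixed, frequently queried columns live in the thing table, and arbitrary attributes become rows in the data table, so new keys need no schema change and lookups behave like key/value gets. The columns follow the sample schema; the SQLite types, the attribute values, and the meaning of `kind` (here a one-letter type tag) are assumptions, since the talk does not define them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE link_thing (
    id      INTEGER PRIMARY KEY,
    date    TIMESTAMP,
    ups     INTEGER DEFAULT 0,
    downs   INTEGER DEFAULT 0,
    deleted BOOLEAN DEFAULT 0,
    spam    BOOLEAN DEFAULT 0
);
CREATE TABLE link_data (
    thing_id INTEGER NOT NULL REFERENCES link_thing(id),
    name     TEXT NOT NULL,
    value    TEXT,
    kind     CHAR(1),
    PRIMARY KEY (thing_id, name)
);
""")


def save_link(link_id: int, date: str, attrs: dict) -> None:
    """Fixed columns go to the thing table; everything else becomes data rows."""
    with conn:  # one transaction covering both tables
        conn.execute("INSERT INTO link_thing (id, date) VALUES (?, ?)", (link_id, date))
        conn.executemany(
            "INSERT INTO link_data (thing_id, name, value, kind) VALUES (?, ?, ?, ?)",
            # Hypothetical meaning of `kind`: first letter of the Python type name.
            [(link_id, k, str(v), type(v).__name__[0]) for k, v in attrs.items()],
        )


def load_link(link_id: int) -> dict:
    """Fetch a thing by key, then attach its arbitrary data attributes."""
    ups, downs, deleted, spam = conn.execute(
        "SELECT ups, downs, deleted, spam FROM link_thing WHERE id = ?", (link_id,)
    ).fetchone()
    data = dict(conn.execute(
        "SELECT name, value FROM link_data WHERE thing_id = ?", (link_id,)))
    return {"id": link_id, "ups": ups, "downs": downs,
            "deleted": bool(deleted), "spam": bool(spam), **data}


save_link(1, "2013-01-01", {"title": "Big Data Tutorial", "url": "http://example.com"})
print(load_link(1)["title"])  # Big Data Tutorial
```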
Moving from Postgres to Cassandra
• We were lucky -- we already used key/value
• But it wasn’t completely straightforward
• Some things are a lot easier relationally
• Like taking counts of things
Tips for moving successfully
• No normalization
• Your app will have to do a lot of what your database used to do
• Denormalize
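Without joins or cheap relational counts, the application writes each fact into every table a query needs and maintains counts itself. A sketch with plain dicts standing in for column families; the table names and `record_vote` are hypothetical:

```python
from collections import defaultdict

# Hypothetical query-shaped "tables": row key -> {column name -> value}
votes_by_link = defaultdict(dict)  # answers "who voted on this link?"
votes_by_user = defaultdict(dict)  # answers "what has this user voted on?"
vote_counts = defaultdict(int)     # answers "how many votes does this link have?"


def record_vote(user_id: str, link_id: str, direction: int) -> None:
    """Write the same fact into every table that needs it; the app keeps them in sync."""
    previous = votes_by_link[link_id].get(user_id)
    votes_by_link[link_id][user_id] = direction
    votes_by_user[user_id][link_id] = direction
    if previous is None:
        vote_counts[link_id] += 1  # what SQL would have computed with COUNT(*)


record_vote("alice", "link1", 1)
record_vote("bob", "link1", 1)
record_vote("alice", "link1", -1)  # changing a vote must not change the count
print(vote_counts["link1"])  # 2
```

The cost of this design is that every write path must remember every table; a missed update leaves the copies disagreeing, which is exactly the work the database used to do.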
Schedule
Introduction to Big Data and its uses
Survey of Big Data Technology
Real-Time Data Systems
Demo: Cassandra in Action -- Building and using a data model
Building reliable Big Data systems
Wrap up, conclusions, questions
Hadoop -- Past its prime?
• MapReduce was pioneered by Google, then an open source clone (Hadoop) came along
• Google has mostly moved on to more real-time systems
Google Projects
• Dremel a.k.a. BigQuery
• Percolator
• Pregel
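Pregel computes on graphs in synchronized supersteps: each vertex reads the messages sent to it in the previous step, updates its value, and messages its neighbours only if something changed; the job ends when no messages are in flight. A single-process simulation of the "propagate the maximum value" example used to introduce Pregel; this is not Pregel's API, and the four-vertex graph is hypothetical:

```python
def pregel_max(graph, values):
    """graph: vertex -> list of neighbours; values: vertex -> initial value."""
    values = dict(values)
    # Superstep 0: every vertex sends its value to its neighbours.
    messages = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            messages[n].append(values[v])
    supersteps = 1
    while any(messages.values()):
        outbox = {v: [] for v in graph}
        for v, inbox in messages.items():
            if not inbox:
                continue  # a halted vertex with no mail stays halted
            best = max(inbox)
            if best > values[v]:
                values[v] = best
                for n in graph[v]:
                    outbox[n].append(best)  # only vertices that changed send messages
        messages = outbox
        supersteps += 1
    return values, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
values, steps = pregel_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1})
print(values, steps)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6} 4
```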
Other real-time projects
• Storm -- Twitter
• Turbine -- Netflix
• Redshift -- Amazon
NoSQL + SQL + Hadoop
• The latest trend in Big Data
• Putting a layer of SQL on top of a distributed data store
• Finally splitting the query layer from the data layer!
Schedule
Introduction to Big Data and its uses
Survey of Big Data Technology
Real-Time Data Systems
Demo: Cassandra in Action -- Building and using a data model
Building reliable Big Data systems
Wrap up, conclusions, questions
Building a Data Model
• What questions do you want to ask of your data?
• Don’t try to normalize anything
• Instead of changing a value, keep a record of what happened
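Keeping a record of what happened means appending events rather than updating rows, then deriving the current (or any past) state from them. A minimal sketch using cache names from the example that follows; the statuses, timestamps, and class names are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StatusEvent:
    timestamp: int
    cache: str
    status: str


class StatusLog:
    """Append-only history: a status change is a new event, never an UPDATE."""

    def __init__(self) -> None:
        self._events: List[StatusEvent] = []

    def record(self, timestamp: int, cache: str, status: str) -> None:
        self._events.append(StatusEvent(timestamp, cache, status))

    def status_at(self, cache: str, when: int) -> Optional[str]:
        """Rebuild the state at any moment from the events up to that time."""
        latest = None
        for e in self._events:
            if e.cache == cache and e.timestamp <= when:
                if latest is None or e.timestamp >= latest.timestamp:
                    latest = e
        return latest.status if latest else None

    def history(self, cache: str) -> List[Tuple[int, str]]:
        return [(e.timestamp, e.status) for e in self._events if e.cache == cache]


log = StatusLog()
log.record(100, "SJC2", "healthy")
log.record(200, "SJC2", "degraded")
log.record(300, "SJC2", "healthy")
print(log.status_at("SJC2", 250))  # degraded
```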
Let’s build a telemetry system!
• This is a slightly modified real-world example of something we built to support the Netflix Open Connect project
Background
• Caches all over the world
• Named like ORD1, LAX1, SJC2, etc.
• We need to collect about 20 metrics for each cache on a regular basis
The questions
• Get the last 3 runs for SJC2 and show the collected data
• What caches did we see on the last run and what are their details?
The tables
• collected_properties: keys, with columns Healthy, other, load, up
• collection_cache_by_times: keys, with columns cache1, cache2, cache3, cache4, cache5, ...
• collections_by_cache: keys, with columns 1, 2, 3, 4, 5, ...
Python Code Walkthrough
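The walkthrough code itself survives only as slide images, so here is a hedged reconstruction of how the three tables could answer the two questions. It assumes collections_by_cache is keyed by cache name with one column per run, collection_cache_by_times is keyed by run with one column per cache seen, and collected_properties is keyed by run and cache with one column per metric; those key layouts, the "run:cache" key format, and the sample metrics are interpretations, and plain dicts stand in for Cassandra:

```python
from collections import defaultdict

# Each column family: row key -> {column name -> value}. Cassandra keeps columns
# sorted within a row; this sketch sorts when reading instead.
collections_by_cache = defaultdict(dict)       # cache name -> {run id: ""}
collection_cache_by_times = defaultdict(dict)  # run id -> {cache name: ""}
collected_properties = defaultdict(dict)       # "run:cache" -> {property: value}


def record_run(run_id: int, seen: dict) -> None:
    """Store one collection run, denormalized into all three column families."""
    for cache, metrics in seen.items():
        collections_by_cache[cache][run_id] = ""
        collection_cache_by_times[run_id][cache] = ""
        collected_properties[f"{run_id}:{cache}"].update(metrics)


def last_runs(cache: str, n: int = 3) -> list:
    """Q1: the last n runs for one cache, newest first, with the data collected."""
    runs = sorted(collections_by_cache.get(cache, {}), reverse=True)[:n]
    return [(run, collected_properties.get(f"{run}:{cache}", {})) for run in runs]


def caches_in_last_run() -> dict:
    """Q2: the caches seen on the most recent run, and their details."""
    if not collection_cache_by_times:
        return {}
    latest = max(collection_cache_by_times)
    return {cache: collected_properties.get(f"{latest}:{cache}", {})
            for cache in sorted(collection_cache_by_times[latest])}


for run in range(1, 5):
    seen = {"SJC2": {"Healthy": "yes", "load": str(run / 10)}}
    if run < 4:
        seen["ORD1"] = {"Healthy": "no", "load": "0.9"}  # ORD1 missing from run 4
    record_run(run, seen)

print([run for run, _ in last_runs("SJC2")])  # [4, 3, 2]
print(list(caches_in_last_run()))             # ['SJC2']
```

Each question reads exactly one row of one table plus direct key lookups, which is the point of designing the tables around the questions.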
[Diagram: Files (rows of number pairs such as "129837 95014") → Queue → Data Loaders (×4) → Queue → Data Processors (×4) → DB]
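The queue-separated stages can be sketched with Python threads: loaders take files off one queue and emit parsed records onto a second, bounded queue, and processors drain that queue into the store. The input uses the number pairs from the slide's sample files, but treating each line as a key and a value, the worker counts, and the in-memory "DB" are assumptions; the talk does not show what the real loaders and processors did:

```python
import queue
import threading

STOP = object()  # sentinel telling a worker there is no more input


def data_loader(file_q: queue.Queue, record_q: queue.Queue) -> None:
    """Take a file (here: a list of lines) off the first queue, parse it, emit records."""
    while True:
        lines = file_q.get()
        if lines is STOP:
            return
        for line in lines:
            key, value = line.split()
            record_q.put((int(key), value))


def data_processor(record_q: queue.Queue, db: dict, lock: threading.Lock) -> None:
    """Consume records and write them to the in-memory stand-in for the DB."""
    while True:
        record = record_q.get()
        if record is STOP:
            return
        key, value = record
        with lock:
            db.setdefault(key, []).append(value)


def run_pipeline(files, loaders: int = 4, processors: int = 4) -> dict:
    file_q: queue.Queue = queue.Queue()
    record_q: queue.Queue = queue.Queue(maxsize=100)  # bounded: back-pressure on loaders
    db, lock = {}, threading.Lock()
    loader_threads = [threading.Thread(target=data_loader, args=(file_q, record_q))
                      for _ in range(loaders)]
    processor_threads = [threading.Thread(target=data_processor, args=(record_q, db, lock))
                         for _ in range(processors)]
    for t in loader_threads + processor_threads:
        t.start()
    for f in files:
        file_q.put(f)
    for _ in loader_threads:
        file_q.put(STOP)  # one stop signal per loader
    for t in loader_threads:
        t.join()
    for _ in processor_threads:
        record_q.put(STOP)  # queued after every record, so nothing is dropped
    for t in processor_threads:
        t.join()
    return db


files = [["129837 95014", "43534 10020", "345345 90069"],
         ["980345 10001", "1098445 59390", "9084309 32901", "43534 98898"]]
db = run_pipeline(files)
print(sorted(db[43534]))  # ['10020', '98898']
```

Because each stage only talks to a queue, loaders and processors can be scaled independently, and a slow DB backs up into the bounded queue instead of exhausting memory.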
Schedule
Introduction to Big Data and its uses
Survey of Big Data Technology
Real-Time Data Systems
Demo: Cassandra in Action -- Building and using a data model
Building reliable Big Data systems
Wrap up, conclusions, questions
Building a Reliable Data Store
"If it won't scale, it'll fail." -- paradrox
1 > 2 > 3
Going from two to three is hard
1 > 2 > 3
Going from one to two is harder
1 > 2 > 3
If possible, plan for 3 or more from the beginning.
Going multi-zone
Benefits of Amazon’s Zones
• Loosely connected
• Low latency between zones
• 99.95% uptime guarantee per region
Going Multi-region
Leveraging Multi-region
• 100% uptime is theoretically possible.
• You have to replicate your data
• This will cost money
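The arithmetic behind "theoretically possible": if the service stays up whenever at least one region is up, combined availability is 1 - (1 - a)^n for n regions of availability a. This sketch reuses the 99.95% per-region figure from the zones slide and assumes region outages are independent, which correlated failures such as a bad global deploy will violate:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def combined_availability(per_region: float, regions: int) -> float:
    """Up if at least one region is up, assuming region outages are independent."""
    return 1 - (1 - per_region) ** regions


for regions in (1, 2, 3):
    a = combined_availability(0.9995, regions)
    downtime = (1 - a) * MINUTES_PER_YEAR
    print(f"{regions} region(s): {a:.10f} available, about {downtime:.4f} minutes down per year")
```

One region allows roughly 262.8 minutes of downtime a year and two regions about 0.13 minutes; the number approaches 100% without reaching it, and each extra region is another full copy of the data to pay for.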
Reliability and $$
Alert Systems
[Diagram: alerting architecture with CORE Agents and another team's agent, Atlas, the alerting api, the CORE Event Gateway, a Paging Service, and Amazon SES]
Automate all the things!
• Application startup
• Configuration
• Code deployment
• System deployment
Automation
• Standard base image
• Tools to manage all the systems
• Automated code deployment
Netflix has moved the granularity from the instance to the cluster
The Netflix SOA
[Diagram: Customer Device (PC, PS3, TV…); Web Site or Discovery API; User Data; Personalization; Streaming API; DRM; QoS Logging; OpenConnect CDN Boxes; CDN Management and Steering; Content Encoding; Consumer Electronics; AWS Cloud Services; CDN Edge Locations; Browse, Play, Watch]
The Netflix way
• Everything is “built for three”
• Fully automated build tools to test and make packages
• Fully automated machine image bakery
• Fully automated image deployment
The Monkey Theory
• Simulate things that go wrong
• Find things that are different
The simian army
• Chaos -- Kills random instances
• Chaos Gorilla -- Kills zones
• Chaos Kong -- Kills regions
• Latency -- Degrades network and injects faults
• Conformity -- Looks for outliers
• Circus -- Kills and launches instances to maintain zone balance
• Doctor -- Fixes unhealthy resources
• Janitor -- Cleans up unused resources
• Howler -- Yells about bad things like Amazon limit violations
• Security -- Finds security issues and expiring certificates
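The Chaos entry above boils down to a simple loop: in each cluster, pick one instance at random and terminate it, so failure handling gets exercised all the time instead of only during real outages. The sketch below is a minimal in-memory illustration under stated assumptions. The `terminate` callback, the cluster dictionary, and the opt-out set are hypothetical. This is not Netflix's Chaos Monkey code or API.

```python
import random
from typing import Callable, Dict, List, Optional


def unleash_chaos(
    clusters: Dict[str, List[str]],
    terminate: Callable[[str], None],
    opted_out: frozenset = frozenset(),
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Terminate one randomly chosen instance per cluster.

    `clusters` maps a cluster name to its instance IDs. `terminate` is a
    hypothetical callback that actually kills an instance. Clusters in
    `opted_out` are skipped, and so are empty clusters.
    Returns {cluster: victim_instance_id} so each run can be audited.
    """
    rng = rng or random.Random()
    victims: Dict[str, str] = {}
    for name, instances in clusters.items():
        if name in opted_out or not instances:
            continue
        victim = rng.choice(instances)
        terminate(victim)  # a real tool would call the cloud provider here
        victims[name] = victim
    return victims
```

Passing a seeded `random.Random` makes a run reproducible, and returning the victims gives you something to post-mortem if a "random" kill turns out to expose a real single point of failure.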
Circuit Breakers
Be liberal in what you accept, strict in what you send
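A circuit breaker wraps calls to a dependency. After repeated failures it "opens" and serves a fallback right away instead of waiting on a broken service. After a cool-down it lets a trial call through. The sketch below is a minimal illustration of that pattern. The thresholds, the fallback callable, and the class itself are illustrative assumptions. This is not the API of Netflix's Hystrix or any other library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """closed -> open after `max_failures` consecutive errors;
    open -> half-open after `reset_after` seconds;
    a successful call closes the circuit again."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # fail fast: don't touch the sick dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

The fallback is where "be liberal in what you accept" shows up in practice. A degraded but valid response (cached data, a default list) is usually better than propagating the failure upstream.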
Incident Reviews
Ask the key questions:
• What went wrong?
• How could we have detected it sooner?
• How could we have prevented it?
• How can we prevent this class of problem in the future?
• How can we improve our behavior for next time?
Cassandra Architecture
Database Resiliency with Sharding
Horizontal vs. Vertical
Sharding
• reddit split writes across four master databases
  • Links/Accounts/Subreddits, Comments, Votes and Misc
• Each has at least one slave in another zone
• Avoid reading from the master if possible
• Wrote their own database access layer, called the “thing” layer
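The split above is functional sharding. Each kind of data gets its own master for writes, and reads go to a replica in another zone when possible. The sketch below is a minimal routing function under stated assumptions. The host names, the type-to-shard table, and the round-robin replica choice are hypothetical. This is not reddit's actual "thing" layer.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shard:
    master: str
    replicas: List[str]

    def __post_init__(self) -> None:
        # Round-robin over replicas; fall back to the master if there are none.
        self._cycle = itertools.cycle(self.replicas or [self.master])

    def read_host(self) -> str:
        return next(self._cycle)


# Hypothetical hosts: one master per data type, each with a replica in another zone.
SHARDS: Dict[str, Shard] = {
    "links": Shard("links-master.zone-a", ["links-replica.zone-b"]),
    "comments": Shard("comments-master.zone-a", ["comments-replica.zone-b"]),
    "votes": Shard("votes-master.zone-a", ["votes-replica.zone-b"]),
    "misc": Shard("misc-master.zone-a", ["misc-replica.zone-b"]),
}

# Accounts and subreddits share the links shard; anything unknown goes to misc.
TYPE_TO_SHARD = {"link": "links", "account": "links", "subreddit": "links",
                 "comment": "comments", "vote": "votes"}


def host_for(thing_type: str, write: bool) -> str:
    """Writes always go to the master; reads avoid it when a replica exists."""
    shard = SHARDS[TYPE_TO_SHARD.get(thing_type, "misc")]
    return shard.master if write else shard.read_host()
```

Keeping the routing in one access layer is what makes a split like this manageable. Application code asks for a type of thing and never hard-codes which database holds it.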
Queues are your friend
• Votes
• Comments
• Thumbnail scraper
• Precomputed queries
• Spam
  • processing
  • corrections
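Everything in the list above shares one idea. The work does not have to finish inside the web request. The request enqueues a message and returns, and a background worker applies it later. The sketch below is a minimal single-process illustration using Python's standard `queue` and `threading` modules. It assumes a hypothetical `(link_id, direction)` vote message. A production system would use a real message broker and durable storage instead of an in-memory `Counter`.

```python
import queue
import threading
from collections import Counter
from typing import Optional, Tuple

Vote = Tuple[str, int]  # hypothetical message: (link_id, +1 or -1)

votes: "queue.Queue[Optional[Vote]]" = queue.Queue()
scores: Counter = Counter()


def handle_vote_request(link_id: str, direction: int) -> str:
    """Web tier: enqueue and return immediately instead of writing to the DB."""
    votes.put((link_id, direction))
    return "accepted"


def vote_worker() -> None:
    """Background worker: drains the queue and applies each vote."""
    while True:
        item = votes.get()
        if item is None:  # sentinel value: shut down cleanly
            votes.task_done()
            break
        link_id, direction = item
        scores[link_id] += direction
        votes.task_done()


worker = threading.Thread(target=vote_worker, daemon=True)
worker.start()
```

The user sees "accepted" right away. If the database is slow, the queue absorbs the backlog instead of the web servers.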
Pain Points
Higher and more varied network latency
Workaround: Fewer network calls, ask for more data at a time.
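Batching is the usual way to apply this workaround: replace N round trips with one request that carries N keys. The sketch below uses a hypothetical `RemoteStore` that only counts round trips and does no real networking. It shows the difference in the number of calls.

```python
from typing import Dict, Iterable, List


class RemoteStore:
    """Stand-in for a remote key-value service; counts round trips."""

    def __init__(self, data: Dict[str, str]):
        self._data = data
        self.round_trips = 0

    def get(self, key: str) -> str:
        self.round_trips += 1
        return self._data[key]

    def multi_get(self, keys: Iterable[str]) -> Dict[str, str]:
        self.round_trips += 1  # one request carries every key; missing keys are omitted
        return {k: self._data[k] for k in keys if k in self._data}


def fetch_one_by_one(store: RemoteStore, keys: List[str]) -> Dict[str, str]:
    return {k: store.get(k) for k in keys}


def fetch_batched(store: RemoteStore, keys: List[str],
                  batch_size: int = 100) -> Dict[str, str]:
    result: Dict[str, str] = {}
    for i in range(0, len(keys), batch_size):
        result.update(store.multi_get(keys[i:i + batch_size]))
    return result
```

With 250 keys, the one-by-one version pays the network latency 250 times, while the batched version pays it three times. Capping the batch size keeps any single response from growing without bound.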
Pain Points
EBS sometimes slowed down a bit
Workaround: Use caching and replication with read slaves to avoid relying on a single disk, or better yet, avoid the need for EBS altogether.
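One way to read the caching half of this workaround is a read-through cache. Reads check the cache first, and on a miss they go to one of several read replicas rather than the master, so no single slow disk sits on the read path. The sketch below is minimal and makes several assumptions. Each replica is a hypothetical callable that reads one key, and invalidation after writes is left to the caller.

```python
import random
from typing import Callable, Dict, List, Optional


class ReadThroughCache:
    """Check the cache first; on a miss, read from a random replica and cache it."""

    def __init__(self, replicas: List[Callable[[str], Optional[str]]],
                 rng: Optional[random.Random] = None):
        self.replicas = replicas  # hypothetical: each callable reads one key
        self.cache: Dict[str, str] = {}
        self.rng = rng or random.Random()

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.rng.choice(self.replicas)(key)
        if value is not None:  # don't cache misses
            self.cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        self.cache.pop(key, None)  # call this after a write goes to the master
```

Spreading misses across replicas means one degraded volume slows only a fraction of cache misses instead of every read.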
Pain Points
Instances go away sometimes or become so slow that you want to make them go away.
Workaround: Avoid single points of failure and make sure your servers have automated configuration.
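Avoiding a single point of failure means a client always has more than one place to send a request. It should move on when an instance has gone away or is too slow. The sketch below is a minimal failover helper. It assumes each endpoint is a hypothetical callable that raises on failure or timeout.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when every redundant endpoint has failed; carries the errors."""


def call_with_failover(endpoints: Sequence[Callable[[], T]]) -> T:
    """Try each redundant endpoint in turn and return the first success."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint()
        except Exception as exc:  # dead or too-slow instance: try the next one
            errors.append(exc)
    raise AllEndpointsFailed(errors)
```

Collecting the errors instead of discarding them helps the incident review afterwards, because you can see whether every endpoint failed the same way.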
Protip
The environment in a public cloud is inherently more variable (co-tenants, abusive or heavy users, etc.)
Make sure your code is written to handle this -- state should be kept somewhere shared and redundant, not on the instance.
Protip
Security was not the first thought when a lot of the cloud systems were designed
Make it your first thought though. A little planning goes a long way. Use security groups judiciously and keep those keys safe!
Protip
Keep track of those limits!
To prevent someone from consuming too much, all resources have per-account limits. Keep track of them and get them raised before you need them. Make sure to catch the limit exceptions too.
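Tracking limits can be as simple as comparing current usage with each per-account limit and flagging anything near the ceiling early, while still catching the error if a request is refused. The sketch below makes several assumptions. The resource names, the 80% threshold, the alert callback, and the `LimitExceeded` exception are all hypothetical. None of them is a real cloud provider's API.

```python
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")


class LimitExceeded(Exception):
    """Hypothetical error raised when the provider refuses a request over a limit."""


def limits_to_raise(usage: Dict[str, int], limits: Dict[str, int],
                    threshold: float = 0.8) -> List[str]:
    """Resources whose usage is at or above `threshold` of their limit."""
    return sorted(r for r, limit in limits.items()
                  if usage.get(r, 0) >= threshold * limit)


def create_or_alert(create: Callable[[], T],
                    alert: Callable[[str], None]) -> Optional[T]:
    """Attempt a resource creation; on a limit error, alert instead of crashing."""
    try:
        return create()
    except LimitExceeded as exc:
        alert(f"limit hit: {exc}")
        return None
```

Running `limits_to_raise` on a schedule gives you lead time to request an increase. `create_or_alert` covers the case where you were too late.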
Cause chaos
Best Practices
• Keep data in multiple Availability Zones
• Avoid keeping state on a single instance
• Take frequent snapshots of EBS disks
• No secret keys on the instance
• Different functions in different Security Groups
Autoscaling
(Diagram label: Traffic Peak)
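Autoscaling adds instances as traffic climbs toward a peak and removes them afterwards. One common rule, target tracking, sizes the group so that average utilization stays near a target: desired = ceil(load / (capacity per instance * target utilization)), clamped between a minimum and a maximum. The sketch below illustrates that rule. The target of 0.6 and the size limits of 3 and 100 are illustrative assumptions, not Netflix's settings.

```python
import math


def desired_instances(load_rps: float, rps_per_instance: float,
                      target_utilization: float = 0.6,
                      min_size: int = 3, max_size: int = 100) -> int:
    """Instances needed to serve `load_rps` at `target_utilization`, clamped."""
    if rps_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    needed = math.ceil(load_rps / (rps_per_instance * target_utilization))
    return max(min_size, min(max_size, needed))
```

Targeting well under 100% utilization leaves headroom for the minutes it takes new instances to boot while the peak is still building.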
What about private clouds?
• Some of the problems you don’t have: noisy neighbors, lack of physical access
• Problem you do have: you pay for your own spare capacity instead of someone else paying for it
A taxonomy of Big Data and next-generation storage solutions
• Noisy neighbors are a problem.
• Efficiency is necessary and getting better
Schedule
Introduction to Big Data and its uses
Survey of Big Data Technology
Real-Time Data Systems
Demo: Cassandra in Action -- Building and using a data model
Building reliable Big Data systems
Wrap up, conclusions, questions
What is Big Data?
• The tools and processes of managing and utilizing large datasets.
• (with virtualized resources)
• Structured and Unstructured data
(What’s missing?)
This is where the slide on what you should have learned would go.
I’m more interested in what you actually learned.
More Netflix details
• http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html
• http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
• http://techblog.netflix.com/2011/03/cloud-connect-keynote-complexity-and.html
Just a quick reminder...
(Some of) Netflix is open source:
https://github.com/netflix
Including astyanax:
https://github.com/Netflix/astyanax
reddit is open source too:
https://github.com/reddit
Patches are now being accepted!
Netflix is hiring
http://jobs.netflix.com/jobs.html
Please don’t forget to vote!
Voting is how we know what to present to you next time. :)
You can contact me here for questions:
Email: jedberg@{gmail,netflix}.com
Twitter: @jedberg
Web: www.jedberg.net
Facebook: facebook.com/jedberg
LinkedIn: www.linkedin.com/in/jedberg