Making Sense of BIG DATA with Hadoop

Upload: chen-gwen-shapira

Post on 26-Jan-2015

Page 1: Making Sense of Big data with Hadoop

Making Sense of

BIG DATA with Hadoop

Page 2: Making Sense of Big data with Hadoop

© 2012 Pythian

● 13 years with a pager
● Oracle ACE Director
● Oak Table member
● Senior consultant for Pythian
● @gwenshap
● http://www.pythian.com/news/author/shapira/
● [email protected]

Page 3: Making Sense of Big data with Hadoop

Pythian Recognized Leader:

• Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server

• Work with over 165 multinational companies such as LinkShare Corporation, IGN Entertainment, CrowdTwist, TinyCo and Western Union to help manage their complex IT deployments

Expertise:

• One of the world’s largest concentrations of dedicated, full-time DBA expertise. Employ 7 Oracle ACEs/ACE Directors. Heavily involved in the MySQL community, driving the MySQL Professionals Group and sit on the IOUG Advisory Board for MySQL.

• Hold 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC

Global Reach & Scalability:

• 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response

Page 4: Making Sense of Big data with Hadoop

What is Big Data?

Page 5: Making Sense of Big data with Hadoop

MORE DATA THAN YOU CAN HANDLE

Page 6: Making Sense of Big data with Hadoop

MORE DATA THAN RELATIONAL DATABASES CAN HANDLE

Page 7: Making Sense of Big data with Hadoop

MORE DATA THAN RELATIONAL DATABASES CAN HANDLE CHEAPLY

Page 8: Making Sense of Big data with Hadoop

Data Arriving at Fast Rates
Typically Unstructured
Stored without Aggregation
Analyzed in Real Time
For Reasonable Cost

Page 9: Making Sense of Big data with Hadoop

Complex Data Architecture

Page 10: Making Sense of Big data with Hadoop

Your Data

is NOT as BIG

as you think

Page 11: Making Sense of Big data with Hadoop

Why Big Data?
Why Hadoop?

Page 12: Making Sense of Big data with Hadoop

BECAUSE WE CAN

Page 13: Making Sense of Big data with Hadoop

More Data Beats Smarter Algorithms

Page 14: Making Sense of Big data with Hadoop

Email

Photos

Tweets

Job postings

Blog posts

Medical imaging

Sensors

Video

Tags

Scanned docs

Page 15: Making Sense of Big data with Hadoop

Data is Messy

Page 16: Making Sense of Big data with Hadoop

An Imperial College Team found:
• 3,000 patients under 19 were treated in geriatric clinics
• between 15,000 and 20,000 men have been admitted to obstetric wards
• and almost 10,000 to gynecology wards

http://www.straightstatistics.org/blog/2012/04/06/why-are-so-many-men-pregnant
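The findings above are the kind of thing a simple plausibility check can surface before any analysis runs. The following is a minimal sketch, not the Imperial College team's method: the `Admission` record, its field names, and the two rules are hypothetical and chosen only to mirror the examples on this slide.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    # Hypothetical record layout, assumed for illustration only.
    patient_id: str
    age: int
    sex: str   # "M" or "F"
    ward: str  # e.g. "geriatric", "obstetric", "gynecology"


def implausible(record):
    """Return the reasons a record looks wrong (empty list if none)."""
    reasons = []
    if record.ward == "geriatric" and record.age < 19:
        reasons.append("patient under 19 in geriatric clinic")
    if record.ward in ("obstetric", "gynecology") and record.sex == "M":
        reasons.append(f"male patient in {record.ward} ward")
    return reasons


records = [
    Admission("p1", 82, "F", "geriatric"),
    Admission("p2", 7, "M", "geriatric"),
    Admission("p3", 34, "M", "obstetric"),
    Admission("p4", 29, "F", "obstetric"),
]

# Keep only the records that break at least one rule.
flagged = {r.patient_id: implausible(r) for r in records if implausible(r)}
print(flagged)
```

Flagging rather than deleting keeps the raw data intact, which matches the "store without aggregation" idea from earlier in the deck.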

Page 17: Making Sense of Big data with Hadoop

Unstructured Eventually Structured

Data

Page 18: Making Sense of Big data with Hadoop

Scalable Storage
+
Massive Parallel Processing
+
Reasonable Cost

Page 19: Making Sense of Big data with Hadoop

Hadoop: Platform for distributed computing

Page 20: Making Sense of Big data with Hadoop

Hadoop is Scalable. But not fast.

Page 21: Making Sense of Big data with Hadoop

Much Ado about Hadoop

Page 22: Making Sense of Big data with Hadoop

Assumptions
• Lots of data
• Large Files
• Unstructured
• Scan entire files
• Unreliable Hardware
• Adding servers = increase capacity

Page 23: Making Sense of Big data with Hadoop

Principles
• Bring Code to Data
• Share Nothing

Page 24: Making Sense of Big data with Hadoop

HDFS
• Distributed
• Replicated
• Big Files
• Write Once
• Read Entire File

Page 25: Making Sense of Big data with Hadoop

/users/shapira/log-1, blocks {1,4,5}
/users/shapira/log-2, blocks {2,3,6}

[Figure: blocks 1 to 6 spread across the data nodes, each block stored as three replicas]
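The file-to-block mapping and three-way replication shown on this slide can be sketched in a few lines. This is a toy model under stated assumptions, not HDFS code: `ToyNameNode`, the node names, the tiny block size, and the round-robin placement are all invented for illustration (real HDFS placement is rack-aware), and the block IDs it assigns are sequential rather than the interleaved ones on the slide.

```python
class ToyNameNode:
    """Toy stand-in for a NameNode plus its DataNodes (illustrative only)."""

    def __init__(self, datanode_names, replication=3):
        if replication > len(datanode_names):
            raise ValueError("need at least as many nodes as replicas")
        self.datanodes = {name: {} for name in datanode_names}  # node -> {block_id: bytes}
        self.files = {}  # path -> ordered list of block ids
        self.replication = replication
        self._next_block_id = 1

    def write(self, path, data, block_size):
        # Write once: an existing path cannot be overwritten.
        if path in self.files:
            raise FileExistsError(f"{path} already exists (write once)")
        names = list(self.datanodes)
        block_ids = []
        for offset in range(0, len(data), block_size):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Place each replica on a different node (simple round-robin).
            for k in range(self.replication):
                node = names[(block_id + k) % len(names)]
                self.datanodes[node][block_id] = data[offset:offset + block_size]
            block_ids.append(block_id)
        self.files[path] = block_ids

    def read(self, path, dead_nodes=()):
        # Read the entire file, block by block, from any live replica.
        chunks = []
        for block_id in self.files[path]:
            holders = [name for name, blocks in self.datanodes.items()
                       if block_id in blocks and name not in dead_nodes]
            if not holders:
                raise IOError(f"block {block_id} has no live replica")
            chunks.append(self.datanodes[holders[0]][block_id])
        return b"".join(chunks)


cluster = ToyNameNode(["node1", "node2", "node3", "node4"])
cluster.write("/users/shapira/log-1", b"aaaabbbbcccc", block_size=4)
cluster.write("/users/shapira/log-2", b"ddddeeeeffff", block_size=4)

# With three replicas per block, the file survives the loss of two nodes.
survived = cluster.read("/users/shapira/log-1", dead_nodes={"node1", "node2"})
print(cluster.files, survived)
```

The point of the sketch is the "Unreliable Hardware" assumption: because every block lives on three nodes, losing two of the four nodes still leaves a readable copy.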

Page 26: Making Sense of Big data with Hadoop

Map Reduce

[Figure: Hadoop Job. Two jobs (StartJob 1, StartJob 2) each run several Map tasks, then Combine, then optional Reduce steps ("Reduce?"), and stop, producing Results.]
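The Map, Combine, and Reduce stages in the diagram can be illustrated with a single-process word count. This is a sketch of the programming model only, assuming in-memory lists in place of HDFS input splits; the function names are made up here and are not the Hadoop API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combine(pairs):
    """Combine: pre-aggregate one mapper's output before it is shuffled."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Shuffle: group all values by key across every mapper's output."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    combined = []
    for split in splits:  # each split would go to a separate mapper
        mapped = [pair for line in split for pair in map_phase(line)]
        combined.extend(combine(mapped))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(combined).items())


splits = [["big data big"], ["data beats algorithms", "big"]]
counts = run_job(splits)
print(counts)
```

The combine step is optional, as the question marks in the diagram suggest for Reduce: dropping it changes how much data is shuffled, not the final counts.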

Page 27: Making Sense of Big data with Hadoop

Implementation
• Balance disks, cores and RAM
• High Bandwidth
• More nodes or better nodes?

Page 28: Making Sense of Big data with Hadoop

It’s about the Ecosystem
• Sqoop
• Flume
• Hive
• Pig
• HBase

Page 29: Making Sense of Big data with Hadoop

Use Cases

Page 30: Making Sense of Big data with Hadoop

Use Case: Log processing
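As a rough illustration of this use case, the sketch below counts HTTP status codes in web server logs, with a map step per line and a reduce step that sums the counts. The log format (a common access-log layout), the sample lines, and the `MALFORMED` bucket are assumptions made for this example; the slide does not say which logs or fields are processed.

```python
import re
from collections import Counter

# Assumed layout: host ident user [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+) [^"]*" (\d{3}) \S+')


def map_line(line):
    """Map: emit (status, 1), or ("MALFORMED", 1) when the line does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return "MALFORMED", 1
    _path, status = match.groups()
    return status, 1


def reduce_counts(pairs):
    """Reduce: sum the emitted counts per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


sample_lines = [
    '10.0.0.1 - - [26/Jan/2015:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [26/Jan/2015:10:15:33 +0000] "GET /missing HTTP/1.1" 404 512',
    'corrupted line without the expected fields',
    '10.0.0.1 - - [26/Jan/2015:10:15:40 +0000] "POST /login HTTP/1.1" 200 87',
]

status_counts = reduce_counts(map_line(line) for line in sample_lines)
print(status_counts)
```

Counting unparseable lines instead of discarding them silently gives a cheap measure of how messy the input is.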

Page 31: Making Sense of Big data with Hadoop

Use Case: ETL

OLTP DWH

BI

Page 32: Making Sense of Big data with Hadoop

Use Case: Recommendations

Page 33: Making Sense of Big data with Hadoop

Use Case: Listening to the crowd

Page 34: Making Sense of Big data with Hadoop

Our customers use Hadoop for:
• Storing lots of pre-processed data
• Merging different data types
• Scalable data processing
• Advanced data processing

Page 35: Making Sense of Big data with Hadoop

Big Data in your Company

Page 36: Making Sense of Big data with Hadoop

Easy case:
Your CTO heard about Big Data
And is eager to invest.
You have a Big Budget.

Page 37: Making Sense of Big data with Hadoop

Require

Acquire

Organize

Analyze

Serve

Measure

Page 38: Making Sense of Big data with Hadoop

Require

Hadoop, NoSQL, OLTP

RDBMS

Hadoop, BI, R

BI, NoSQL, Oracle

Measure

Page 39: Making Sense of Big data with Hadoop

Data Scientist =
Sneaky BI
Disregards Silos
Cool Toys

Page 40: Making Sense of Big data with Hadoop

Mining Tools:
• Machine Learning
• Cluster Detection
• Regression
• Graph Analysis
• Visualization

Page 41: Making Sense of Big data with Hadoop

http://nicolasrapp.com/?p=1118

Page 42: Making Sense of Big data with Hadoop

http://www.orgnet.com/slumlords.html

Page 43: Making Sense of Big data with Hadoop

Want to do more with your data?
Don’t know where to start?
No budget?

No problem!

Page 44: Making Sense of Big data with Hadoop

Sneak Hadoop to Your Business
• Find an important business problem
• Acquire data (be sneaky!)
• Get the tools: R, Hadoop, Tableau
• Laptops, desktops, test servers
• Analyze data
• Make pretty charts
• Get business used to it
• Wait for an Outage
• PROFIT!

Page 45: Making Sense of Big data with Hadoop

Oracle Big Data
The “ETL Machine”

Page 46: Making Sense of Big data with Hadoop

Hardware
18 servers
216 cores
864G RAM
648T disks
Infiniband
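A quick back-of-the-envelope calculation shows what the hardware totals on this slide mean per server. The per-server figures follow directly from the totals; the replication factor of 3 is an assumption (it is the HDFS default, but the slide does not state how the appliance is configured), and the resulting capacity is only an upper bound that ignores the operating system and temporary space.

```python
# Totals as listed on the slide.
servers = 18
total_cores = 216
total_ram_gb = 864
total_disk_tb = 648

# Assumption: HDFS default replication; not stated on the slide.
replication = 3

cores_per_server = total_cores // servers      # 12
ram_gb_per_server = total_ram_gb // servers    # 48
disk_tb_per_server = total_disk_tb // servers  # 36

# Upper bound on distinct data once every block is stored three times.
max_unique_data_tb = total_disk_tb / replication  # 216.0

print(cores_per_server, ram_gb_per_server, disk_tb_per_server, max_unique_data_tb)
```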

Page 47: Making Sense of Big data with Hadoop

Software
Oracle NoSQL
Cloudera Hadoop Distribution
Oracle Loader for Hadoop
Data Integrator for Hadoop
Direct Connector for Hadoop
Oracle Connector for R

Page 48: Making Sense of Big data with Hadoop

Cores, Storage, Infiniband and Software
Make Oracle Big Data
The Ultimate ETL Machine

Page 49: Making Sense of Big data with Hadoop

Thank you & Q&A

To contact us…

1-866-PYTHIAN

[email protected]

To follow us…

http://www.pythian.com/news/

http://www.facebook.com/pages/The-Pythian-Group/

http://twitter.com/pythian

http://www.linkedin.com/company/pythian