![Page 2: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/2.jpg)
Cloudera: The Leader in Big Data Management, Powered by Apache Hadoop™
The Leading Open Source Distribution of Apache Hadoop
Powerful Suite of System & Data Management Software
Built for the Enterprise
Founded: 2008
Employees: 450+
Customers: Over 50% of the Fortune 50 and 65% of the Fortune 500 plus top US intelligence and defense agencies
Partner Ecosystem: 700+ in hardware, software, and services
Education: 15,000+ trained annually; developers, admins, analysts, data scientists
Community: Founders and top supporters of the Hadoop open source ecosystem
2 ©2013 Cloudera, Inc. All Rights Reserved.
![Page 3: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/3.jpg)
Cloudera’s Mission: Help Organizations Gain Value from All Their Data
Solve data problems.
Solve problems with data.
Ask Bigger Questions.
![Page 4: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/4.jpg)
Why is This Happening Now?
![Page 5: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/5.jpg)
IT’S ALL (BIG) DATA (NOT)
10TB to 10PB
![Page 6: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/6.jpg)
Complications of Status Quo
Structure Storage Network Silos
![Page 7: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/7.jpg)
The Story of “T”
OLTP
Enterprise Applications
ODS
Data Warehouse
Query Extract
Transform
Load
Transform
Business Intelligence
![Page 8: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/8.jpg)
Volume, Velocity, Variety = Problems
OLTP
Enterprise Applications
Data Warehouse
Query Extract
Transform
Load
Transform
1. Slow Data Transformations = Missed ETL SLAs.
2. Slow Queries = Frustrated Business Users.
3. Must Archive. Archived data has a ton of latent value.
Business Intelligence
![Page 9: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/9.jpg)
Data Warehouse Optimization
OLTP
Enterprise Applications
ODS
Data Warehouse
Query (High $/Byte)
Cloudera
Transform
Query History
Active Storage
ETL Business Intelligence
![Page 10: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/10.jpg)
Our Vision: The Android of Big Data
Integration and Data Collection
Storage for All of your Data (Structured or Unstructured)
Metadata Management
Security
Batch Processing
Interactive SQL
Interactive Search
Machine Learning
…
Partner Apps
Processing & Analytics
Resource Management
Cloudera Enterprise | The Platform for Big Data
![Page 11: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/11.jpg)
Agility/Flexibility
Schema-on-Write (RDBMS):
• Prescriptive Data Modeling:
  • Create static DB schema
  • Transform data into RDBMS
  • Query data in RDBMS format
• New columns must be added explicitly before new data can propagate into the system.
• Good for Known Unknowns (Repetition)
Schema-on-Read (Hadoop):
• Descriptive Data Modeling:
  • Copy data in its native format
  • Create schema + parser
  • Query data in its native format (does ETL on the fly)
• New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
• Good for Unknown Unknowns (Exploration)
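The schema-on-read idea can be illustrated with a minimal sketch. It is not Hadoop code: the JSON-lines records, the field names, and the `read_with_schema` helper are all hypothetical, chosen only to show raw data being stored as-is and a schema being applied at query time.

```python
import json

# Raw events kept in their native format (JSON lines), copied in as-is.
# The records and field names are made up for illustration.
RAW_EVENTS = [
    '{"user": "ann", "bytes": 120}',
    '{"user": "bob", "bytes": 300, "country": "EG"}',
    '{"user": "ann", "bytes": 80, "country": "US"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time (hypothetical helper).

    Each raw record is parsed on the fly and projected onto the requested
    fields; a field missing from a record comes back as None.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in schema}


# First schema: describes only the fields known up front.
v1 = list(read_with_schema(RAW_EVENTS, ["user", "bytes"]))

# Later schema: adding "country" needs no reload of the stored data.
# The field appears retroactively for every record that already carried it.
v2 = list(read_with_schema(RAW_EVENTS, ["user", "bytes", "country"]))
```

In a schema-on-write system the `country` column would have to be added to the table, and the data reloaded, before those values became queryable.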
![Page 12: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/12.jpg)
Scalable Technology + Scalable Development
Grows without requiring developers to re-architect their algorithms/application
AUTO SCALE
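A rough sketch of what "scalable development" can mean in practice, using a map/reduce style word count in plain Python rather than Hadoop itself. The function names and the `num_partitions` parameter (a stand-in for cluster size) are assumptions made for this illustration. The map and reduce logic is written once and is unchanged whether the input is split one way or many ways.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition of the input."""
    return Counter(word for line in lines for word in line.split())


def merge_counts(a, b):
    """Reduce step: combine two partial counts."""
    return a + b


def word_count(lines, num_partitions):
    """Split the input into num_partitions chunks, then map and reduce.

    num_partitions is a hypothetical stand-in for the number of nodes;
    the map and reduce functions do not depend on it.
    """
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


LINES = ["big data", "data beats algorithms", "big big data"]
small = word_count(LINES, 1)   # one "node"
large = word_count(LINES, 3)   # three "nodes", same application code
```

Both calls produce the same counts; only the degree of parallelism changes, which is the property the slide attributes to the platform.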
![Page 13: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/13.jpg)
Economics: Return on Byte
Low ROB (but still a ton of aggregate value)
High ROB
![Page 14: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/14.jpg)
Cloudera Impala
BEFORE IMPALA
• With Impala:
  • Interactive ANSI-92 SQL queries
  • Native distributed query engine
  • Optimized for low latency
• Provides:
  • Answers as fast as you can ask
  • Everyone can ask questions of all data
  • Big data storage and analytics together
WITH IMPALA
• Unified storage:
  • Supports HDFS and HBase
  • Flexible file formats and schemas
• Unified Metastore
• Unified Security
• Unified Client Interfaces:
  • ODBC/JDBC
  • SQL syntax
  • Hue Beeswax Web UI
BATCH PROCESSING
USER INTERFACE
REAL-TIME ACCESS
![Page 15: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/15.jpg)
But What about the RDBMS?
“Use the right tool for the right job”
Optimize existing EDW systems for high-performance operational analytics
MOVE TO CLOUDERA
• Historical Data
• Data Processing
• Ad Hoc Exploration
• Transformation/Batch
KEEP IN EDW
• Operational Analytics
• Reporting
• Multi-statement Transactions
![Page 16: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/16.jpg)
Legacy Information Architecture
Enterprise Applications
OLTP Systems
Networked Storage
ETL Grid
Data Warehouse
BI & Reporting
![Page 17: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/17.jpg)
New Information Architecture
Enterprise Applications
OLTP Systems
Networked Storage
ETL Grid
Data Warehouse
BI & Reporting
![Page 18: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/18.jpg)
The New Enterprise Big Data Stack
![Page 19: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/19.jpg)
Maturity Path
Operational Efficiency Competitive Advantage
ETL Acceleration
EDW Optimization
Deep BI Exploration
Historical Compliance
Agility of Schema
Not Only SQL
Any Data Type
Consolidation Data Hub
Business IT
![Page 20: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/20.jpg)
Beyond Data Warehousing
COMMUNICATIONS: Location-based advertising
HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
LAW ENFORCEMENT & DEFENSE: Threat analysis, social media monitoring, photo analysis
EDUCATION & RESEARCH: Experiment sensor analysis
FINANCIAL SERVICES: Risk & portfolio analysis; new products
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; website optimization
UTILITIES: Smart meter analysis for network capacity
CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot; customer service
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
LIFE SCIENCES: Clinical trials; genomics
RETAIL: Consumer sentiment; optimized marketing
AUTOMOTIVE: Auto sensors reporting location, problems
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; warranty analysis
OIL & GAS: Drilling exploration sensor analysis
![Page 21: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/21.jpg)
Benefit 1: Flexibility
• Store any data
• Run any analysis
• Keeps pace with the rate of change of incoming data
Benefit 2: Scalability
• Proven growth to PBs/1,000s of nodes
• No need to rewrite queries, automatically scales
• Keeps pace with the rate of growth of incoming data
Benefit 3: Economics
• Cost per TB at a fraction of other options
• Keep all of your data alive in an active archive
• Powering the “data beats algorithm” movement
The Cloudera Platform for Big Data
Key Use Cases:
• Transformation Offload (aka ETL/ELT Offload)
• Exploratory Archive (aka Active Archive)
![Page 22: Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera Hadoop in the... · Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com . Cloudera](https://reader030.vdocuments.us/reader030/viewer/2022041018/5ecc71722bb5662794203255/html5/thumbnails/22.jpg)
Dr. Amr Awadallah, CTO/Founder, @awadallah, aaa@cloudera.com