TRANSCRIPT
Wednesday, October 6, 2010
Evolving a New Analytical Platform
What Works and What’s Missing
Jeff Hammerbacher
Chief Scientist, Cloudera
October 10, 2010
My Background
Thanks for Asking
▪ [email protected]
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”
Presentation Outline
▪ 1. Defining the Platform
  ▪ BI: Science for Profit
  ▪ Need tools for whole research cycle
  ▪ SQL Server 2008 R2: defining the platform
▪ 2. State of the Platform Ecosystem
▪ 3. Foundations for a New Implementation
  ▪ HDFS and MapReduce
  ▪ Evolution of Hadoop
▪ 4. Future Developments
▪ Questions and Discussion
1. Defining the Platform
BI is looking more like science (for profit)
Jim Gray: Science entering Fourth Paradigm
“We have to do better at producing tools to support the whole research cycle”
RDBMS only a small part of this tool set
Example: SQL Server 2008 R2
RDBMS: SQL Server
ETL: SQL Server Integration Services
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
CEP: StreamInsight
OLAP: PowerPivot
MDM: Master Data Services
Collaboration: SharePoint
What do we call this unified suite?
LAMP Stack for Analytical Data Management
For today: Analytical Data Platform
2. The State of the Platform Ecosystem
Who makes up the platform ecosystem?
Platform Providers
Infrastructure Providers
Application Developers
End Users
Content Providers
What is new about the ecosystem today?
Content Providers
1. > 95% of enterprise data is unstructured
2. Data volumes growing rapidly
Infrastructure Providers
1. Cloud
2. Warehouse-Scale Computers
Platform Providers
1. Open source
2. Driven by consumer web properties
Application Developers
1. Data Scientists
2. Diversity of languages
End Users
1. Browser is the client
2. Tell a story about the business
3. Foundations for a New Implementation
New foundations: HDFS and MapReduce
2005: Doug/Mike start project inside Nutch
2006: Doug joins Yahoo!
2007: Make Hadoop scale
Jim Gray’s “Fourth Paradigm” lecture
Yahoo! makes Pig open source
Randy Bryant’s “DISC” lecture
Powerset makes HBase open source
2008: Make Hadoop fast
First Hadoop Summit
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
Facebook makes Hive open source
“MapReduce: A Major Step Backwards”
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
First Hadoop World NYC
Yahoo! sorts a petabyte with Hadoop
Cloudera adds training, support, services
“The Unreasonable Effectiveness of Data”
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Yahoo! completes enterprise-class security
Datameer and Karmasphere funded
Quest, Talend, Netezza, and more integrate
Cloudera releases Cloudera Enterprise
Hadoop will be an Analytical Data Platform
4. Future Developments
Capture: Web and Intranet Documents
Curate: Unified Metadata
Curate: Workflow and Scheduling
Curate: Indexes and Materialized Views
Curate: Learn Structure from Data
Analyze: Mesos-enabled frameworks
Analyze: Link working set and historical data
Analyze: Iterative in-memory analysis
Analyze: Low-latency queries on Avro data
All behind a single user interface
Hue
Making Many Computers Feel Like One
(c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved. 1.0