Disrupting Big Data with Apache Spark in the Cloud
![Page 1: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/1.jpg)
Disrupting Big Data with Apache Spark in the Cloud
Ali Ghodsi, Co-Founder and CEO
![Page 2: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/2.jpg)
The Dawn of Advanced Analytics
Watson, SIRI/assistants, self-driving cars
Not just sci-fi, important applications for businesses
![Page 3: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/3.jpg)
Analytics Transforming Industries
Predictive Analytics
• Predict Product Revenue
• Customer Assessment
• Targeted Advertising
Anomaly Detection
• Fraud Detection
• Risk Assessment
• Equipment Failure
Data-Driven Real-time Analytics Applications
![Page 4: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/4.jpg)
Today’s Data Reality
HADOOP DATA LAKES DATA HUBS
CLOUD STORAGE
DATA WAREHOUSES
Siloed, Fast-Growing Size, Cost
![Page 5: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/5.jpg)
The Analytics Gap
Industrial, Media, Pharma
HADOOP DATA LAKES DATA HUBS
CLOUD STORAGE
DATA WAREHOUSES
Siloed, Fast-Growing Size, Cost
Real-time Data-Driven Analytics Applications
![Page 6: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/6.jpg)
Why is there a gap?
Real-time Data-Driven Analytics Applications
1. Manage Data Infrastructure
• Create, tune, monitor compute clusters.
• Securely access silos of disparate data sources.
• Enforce proper data governance.
2. Empower Teams to Be Productive
• Securely share big data clusters among analysts.
• Interactively explore data and prototype ideas.
• Debug, troubleshoot, version-control big data applications.
3. Establish Production-Ready Applications
• Set up robust data pipelines for ETL/ELT.
• Productionize real-time applications with high availability (HA) and fault tolerance (FT).
• Build, serve, maintain advanced machine learning models.
Siloed, Fast-Growing Size, Cost
![Page 7: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/7.jpg)
Databricks Cloud-Hosted Platform
1. Just-in-Time Data Platform (Agile)
• Separate compute & storage
• Integrate existing data stores
• Efficient cache on first access
2. Integrated Workspace (Democratize Big Data)
• Interactive notebooks, dashboards, reports
• Real-time exploration, machine learning, graph use cases
3. Automated Apache Spark Management (Production-Ready)
• Workflow scheduler for ML, streaming, SQL, ETL
• High availability, fault-tolerant, performance-optimized
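The "efficient cache on first access" idea can be illustrated with a minimal read-through cache. This is only a sketch under stated assumptions, not how Databricks implements it: `FirstAccessCache`, `fetch_from_store`, and the example path are hypothetical names invented for illustration.

```python
from typing import Callable, Dict


class FirstAccessCache:
    """Reads from the remote store on first access, then serves the local copy."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                 # hypothetical remote read function
        self._local: Dict[str, bytes] = {}  # local copies keyed by path
        self.remote_reads = 0               # counts trips to the remote store

    def read(self, path: str) -> bytes:
        if path not in self._local:
            # First access: go to the existing data store and keep a copy.
            self.remote_reads += 1
            self._local[path] = self._fetch(path)
        return self._local[path]


def fetch_from_store(path: str) -> bytes:
    # Hypothetical stand-in for a slow read from cloud storage or a warehouse.
    return f"contents of {path}".encode()


cache = FirstAccessCache(fetch_from_store)
first = cache.read("s3://example-bucket/events.parquet")
second = cache.read("s3://example-bucket/events.parquet")
```

Only the first `read` reaches the underlying store; the second is served from the local copy, which is the property the bullet describes.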
![Page 8: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/8.jpg)
HADOOP / DATA LAKES
DATA WAREHOUSES
YOUR STORAGE
CLOUD STORAGE
Databricks Just-in-Time Data Platform
INTEGRATED WORKSPACE
DASHBOARDS: Reports
NOTEBOOKS: GitHub, viz, collaboration
BI TOOLS
JUST-IN-TIME PROCESSING
POWERED BY APACHE
CLUSTERS: Auto-scaled, resilient, multi-tenant
DATA INTEGRATION: secure and fast data source integrations
INTERFACES: REST APIs & BI tools
DATABRICKS SERVICES
+
YOUR CUSTOM SPARK APPS
PRODUCTION JOBS
DATA LAKE, DATA HUB
![Page 9: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/9.jpg)
The Challenge of Securing Analytics
End-to-end security a challenge for enterprises
Secure file management
Secure table management
Secure cluster management
Secure job workflows
Secure dashboard, report, and notebook management
Today there are piecemeal solutions, but no comprehensive solution
![Page 10: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/10.jpg)
Databricks Enterprise Security (DBES)
Holistic end-to-end security for Data Analytics
Files, Tables, Clusters, Workflows, Notebooks, Dashboards, Reports
DBES provides
• Role-based access control
• Auditing and governance
• Integrated identity-management
• Encryption on-disk and on-the-wire
The First End-to-End Security Solution for Apache Spark
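To make "role-based access control" and "auditing" concrete, here is a minimal sketch of the general technique. It is not the DBES API: the `AccessPolicy` class, the role name, and the resource names are assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessPolicy:
    """Hypothetical RBAC policy: users hold roles, roles hold (resource, action) grants."""

    role_grants: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    audit_log: List[Tuple[str, str, str, bool]] = field(default_factory=list)

    def grant(self, role: str, resource: str, action: str) -> None:
        self.role_grants.setdefault(role, set()).add((resource, action))

    def assign(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user: str, resource: str, action: str) -> bool:
        # Allowed if any of the user's roles carries the requested grant.
        allowed = any(
            (resource, action) in self.role_grants.get(role, set())
            for role in self.user_roles.get(user, set())
        )
        # Every decision is recorded, whether allowed or denied.
        self.audit_log.append((user, resource, action, allowed))
        return allowed


policy = AccessPolicy()
policy.grant("analyst", "table:sales", "read")  # hypothetical role and resource
policy.assign("alice", "analyst")

can_read = policy.is_allowed("alice", "table:sales", "read")
can_write = policy.is_allowed("alice", "table:sales", "write")
```

The same check can front any of the resource types listed on the slide (files, tables, clusters, workflows, notebooks), and the audit log keeps a record of denied as well as granted requests.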
![Page 11: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/11.jpg)
Enterprise use-cases
Prevent credit card fraud
Predict energy demand based on massive weather data
Predict player churn; predict network outages
Use natural language processing to extract author graph
Generate tailored programs based on big data
![Page 12: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/12.jpg)
Thank you.
![Page 13: Disrupting Big Data with Apache Spark in the Cloud](https://reader035.vdocuments.us/reader035/viewer/2022062522/587c04c61a28ab7c668b74f5/html5/thumbnails/13.jpg)
Try Apache Spark with Databricks
http://databricks.com/try
Try the latest version of Apache Spark and a preview of Spark 2.0