cwin17 frankfurt / cloudera
TRANSCRIPT
1© Cloudera, Inc. All rights reserved.
Connected Services
Stefan Lipp / Jochen Faltermeier
CWIN 2017 - Frankfurt
Cloudera at-a-glance
Customer success: large enterprises fueling growth
• 48% customer growth (last 4 years)
• 140%+ net expansion (Global 8000 customers)
• Expansion driven by data and new use cases
Open partner network: best-of-breed solutions
• 3000+ partners
• Vast ecosystem of solution & service providers
First to market: open source innovation
• Founded 2008
• 1600+ Clouderans
• Global team doing business in 28 countries
• Big data innovators from Google, Yahoo and Oracle
The data-driven enterprise
Explosion of data and devices (IoT)
• 30B connected devices
• 440x more data
Transformation of IT infrastructure
• open source
• cloud
• machine learning
$200B total market (source: IDC Worldwide Big Data and Business Analytics Market Through 2020)
We believe data can make what is impossible
today, possible tomorrow
We empower people to transform complex data into clear and actionable insights
DRIVE CUSTOMER INSIGHTS
CONNECT PRODUCTS & SERVICES (IoT)
PROTECT BUSINESS
We deliver the modern platform for machine learning and analytics optimized for the cloud
• RUNS ANYWHERE: Cloud, Multi-cloud, On-premises
• SCALABLE: Elastic, Cost-effective, Lower TCO
• ENTERPRISE GRADE: Secure, Performant, Compliant
Cloudera powering data-driven customers
• DRIVE CUSTOMER INSIGHTS: Delivering greater value through improved customer understanding
• CONNECT PRODUCTS & SERVICES (IoT): Powering predictive analytics to increase performance and reduce fleet downtime
• PROTECT BUSINESS: Creating new revenue streams with an advanced anti-fraud solution
Introduction
Navistar is a leading manufacturer of commercial trucks, buses, defense vehicles and engines. Since 1831, our history has been interwoven with some of the most defining moments in world history. Whether it was America's westward expansion or WWII, we were there, pushing the limits of what's possible and driving history forward. But that doesn't mean we're stuck in the past. We're determined to keep delivering smart, sustainable technologies - because we believe that innovation defines America's future, too.
The Data Challenge & Pre-Hadoop Challenge
In late 2013, Navistar launched OnCommand™ Connection. OnCommand™ Connection is part of the OnCommand™ family of fleet management services from Navistar.
OnCommand™ Connection leverages data feeds from telematics service providers and marries them with meteorological, geographical, engineering, vehicle usage, traffic, historical warranty, service and part inventory data to provide:
• Real-time vehicle performance data streamlined within a single portal
• Service advisories and scheduling before problems occur
• Optimized service plans and part delivery to the nearest dealer when problems do occur
We now actively monitor more than 300,000 vehicles and are adding to that total daily.
Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime
• OnCommand Connection is collecting telematics and geolocation data across the fleet
• Reduced maintenance costs to $.03 per mile from $.12-$.15 per mile
• Centralizing data from 13 systems with varying frequency and semantic definitions
• Real-time visibility of ca. 300,000 trucks in order to improve uptime and vehicle performance
MANUFACTURING » SERVICE IMPROVEMENT » PREDICTIVE ANALYTICS » PROCESS IMPROVEMENT
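As a quick check of the maintenance cost figures quoted on this slide, the relative reduction can be worked out directly. This is only arithmetic on the slide's numbers ($.12-$.15 before, $.03 after); the function name is illustrative.

```python
# Figures quoted on the slide, in US dollars per mile.
COST_BEFORE_LOW = 0.12
COST_BEFORE_HIGH = 0.15
COST_AFTER = 0.03


def reduction_pct(before: float, after: float) -> float:
    """Relative cost reduction in percent."""
    return (before - after) / before * 100.0


low = reduction_pct(COST_BEFORE_LOW, COST_AFTER)
high = reduction_pct(COST_BEFORE_HIGH, COST_AFTER)
print(f"Reduction: {low:.0f}% to {high:.0f}%")  # Reduction: 75% to 80%
```

In other words, the quoted figures correspond to a 75% to 80% lower maintenance cost per mile.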
Benefits & Impact
Quantifying Hadoop’s impact:
• By having literally all of our data in one place, we can perform analytics on an ad-hoc basis. Historically, simple questions required months to answer as we built out subject areas and transformed data.
• Our “Publish” Cluster brings the data to the consumer and it is certified. We have reduced hard dollar spending on proprietary hardware and expensive disk solutions, but also soft dollars in our speed to deliver answers.
• We can evaluate “what if” scenarios without the risk of impacting production processes.
• We can evaluate billions of rows of data and deliver answers in hours not weeks.
Data/Software > Analytics > Automation > AI is eating the world
„the innovation foodchain“ Marc Andreessen
Navistar IR Deck – H1 2017
− Connected services to reduce maintenance cost and improve vehicle uptime
− Advanced driver assistance systems and platooning to improve fuel efficiency and safety
− Automated record-keeping to enhance driver productivity
#1 Telematics provider with 130 billion miles of driving data collected from black boxes in connected cars
Challenge:
• Drive analytics on 12 million miles of driving data collected every hour
Solution:
• Telematics solution based on Cloudera to process data from black boxes
• Analytics around driving behavior, risks, location, braking patterns, contextual elements and crash information
• Provide Usage Based Insurance services
TELEMATICS » CONNECTED VEHICLES » INSURANCE TELEMATICS » PREDICTIVE ANALYTICS
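To make "analytics around braking patterns" concrete, here is a minimal sketch that flags harsh braking events in a series of speed samples. It is an illustration only, not the provider's method: the sample layout, the function name and the -3.0 m/s² threshold are all assumptions.

```python
from typing import List, Tuple

# Hypothetical threshold: decelerations stronger than this count as harsh braking.
HARSH_BRAKE_MS2 = -3.0


def harsh_braking_events(samples: List[Tuple[float, float]]) -> List[float]:
    """Return timestamps (s) at which the deceleration since the previous
    sample exceeds the threshold.

    samples: (timestamp_s, speed_m_per_s) pairs, sorted by time.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order readings
        accel = (v1 - v0) / dt
        if accel < HARSH_BRAKE_MS2:
            events.append(t1)
    return events


# Made-up trip data from a hypothetical black box, one sample per second.
trip = [(0.0, 20.0), (1.0, 20.0), (2.0, 15.0), (3.0, 14.0), (4.0, 9.5)]
print(harsh_braking_events(trip))  # [2.0, 4.0]
```

Counts of such events per mile driven are one plausible input to a usage-based insurance score.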
Connected Car Telematics for Insurance
CASE STUDY
DATA-DRIVEN PROCESS
IoT & Connected Products
The IoT Ecosystem & Architecture
IoT Gateway
• Edge-Processing
• Edge-Analytics
IoT Data Storage, Processing & Analytics
Centralized IoT Analytics
• Time Series Data, Trends
• Machine Learning
• Context Enrichment
• Deeper business insights
Distributed Data Processing & Analytics
• Cloud & On-Premise
• Analytics at the edge
• For immediate response
Diagram labels: Connected Things, Data Center, Cloud, IoT Analytics, Enterprise Data Sources
Combining sensor data with contextual data is the key to value creation from IoT
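A minimal sketch of what combining sensor data with contextual data can look like: each reading is joined with master data for the device that sent it. The record layout, device IDs and asset attributes below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class Reading:
    device_id: str
    temperature_c: float


# Hypothetical contextual data, as it might come from an enterprise system.
ASSETS: Dict[str, Dict[str, str]] = {
    "truck-17": {"model": "X1", "dealer": "Frankfurt"},
}


def enrich(readings: Iterable[Reading],
           assets: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Attach asset context to each reading; unknown devices pass through as-is."""
    for r in readings:
        context = assets.get(r.device_id, {})
        yield {"device_id": r.device_id,
               "temperature_c": r.temperature_c,
               **context}


rows = list(enrich([Reading("truck-17", 104.5), Reading("truck-99", 88.0)], ASSETS))
for row in rows:
    print(row)
```

The enriched row for truck-17 now carries the model and nearest dealer, which is what lets an alert be turned into a service action.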
The Cloudera Platform for IoT – Data Mgmt. Value Chain
Data Sources » Data Ingest » Data Storage & Processing » Serving, Analytics & Machine Learning
Data sources: Connected Things / Data Sources, Structured Data Sources
ENTERPRISE DATA HUB
• Apache Kafka: Stream or batch ingestion of IoT data
• Apache Sqoop: Ingestion of data from relational sources
• Apache Hadoop: Storage (HDFS) & deep batch processing
• Apache Kudu: Storage & serving for fast changing data
• Apache HBase: NoSQL data store for real time applications
• Apache Spark: Stream & iterative processing, ML
• Apache Impala: MPP SQL for fast analytics
• Cloudera Search: Real time search
Security, Scalability & Easy Management
Deployment Flexibility: Datacenter, Cloud
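As an illustration of the ingest step, the sketch below publishes sensor readings to Kafka as JSON using the kafka-python client. The topic name, broker address and message layout are assumptions, and the client import is deferred so that the encoding helper can be tried without a broker.

```python
import json
from typing import Any, Dict, Iterable


def encode_reading(reading: Dict[str, Any]) -> bytes:
    """Serialize one reading as UTF-8 JSON with stable key order."""
    return json.dumps(reading, sort_keys=True).encode("utf-8")


def publish(readings: Iterable[Dict[str, Any]],
            topic: str = "iot-readings",              # hypothetical topic name
            bootstrap_servers: str = "localhost:9092"  # hypothetical broker
            ) -> None:
    """Send readings to Kafka. Requires kafka-python and a reachable broker."""
    from kafka import KafkaProducer  # imported lazily, only when publishing

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers,
                             value_serializer=encode_reading)
    for reading in readings:
        producer.send(topic, value=reading)
    producer.flush()   # block until buffered messages are delivered
    producer.close()


print(encode_reading({"device_id": "truck-17", "temperature_c": 104.5}))
```

Downstream, a Spark job or a Kudu writer would consume the same topic; that part is omitted here.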
Cloudera for IoT – Key Innovations / Differentiators
• Kudu: Real-Time Analytics. Ideal for real-time analytics on IoT and time series data. Simplifies Lambda architectures for running real-time analytics on streaming data
• Shared Data Experience (SDX). Preserve business flexibility and data portability and minimize cloud lock-in by running in any one of the three major public cloud providers or in private cloud
• Data Science Workbench. Collaborative hub for enterprise data science and an integrated development environment for running Python, R, & Scala with support for Spark
Kudu – Fast Analytics on Fast Data
Real-time use cases that fall between HDFS and HBase were difficult to manage
• HDFS: Fast Scans, Analytics and Processing of Stored Data
• HBase: Fast On-Line Updates & Data Serving
• Kudu: Fast Analytics (on fast-changing or frequently-updated data)
• Arbitrary Storage (Active Archive)
Diagram axes: Pace of Data (Unchanging, Append-Only, Fast Changing / Frequent Updates) vs. Pace of Analysis (up to Real-Time)
Diagram labels: Complex Hybrid Architectures, Analytic Gap
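The gap between append-only and updatable storage can be illustrated with a toy model in plain Python. This is not Kudu's API; the class and function names are made up, and the point is only that an append-only log must be rescanned to find current values, while a keyed table that supports upserts can be scanned directly.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, float]  # (device_id, timestamp, value)


def latest_from_log(log: List[Event]) -> Dict[str, float]:
    """Append-only storage: each query rescans the log for the newest value per key."""
    newest: Dict[str, Tuple[int, float]] = {}
    for device, ts, value in log:
        if device not in newest or ts >= newest[device][0]:
            newest[device] = (ts, value)
    return {device: value for device, (_, value) in newest.items()}


class KeyedTable:
    """Mutable keyed storage: an upsert replaces the row in place."""

    def __init__(self) -> None:
        self.rows: Dict[str, float] = {}

    def upsert(self, device: str, value: float) -> None:
        self.rows[device] = value

    def scan(self) -> Dict[str, float]:
        return dict(self.rows)


events: List[Event] = [("a", 1, 1.0), ("b", 1, 5.0), ("a", 2, 2.0)]

table = KeyedTable()
for device, _, value in events:  # events arrive in timestamp order here
    table.upsert(device, value)

print(latest_from_log(events))  # {'a': 2.0, 'b': 5.0}
print(table.scan())             # {'a': 2.0, 'b': 5.0}
```

Both paths give the same answer, but the log version does work proportional to the full history on every query, which is the kind of cost hybrid architectures were built to hide.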
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
EXTENSIBLE SERVICES
CORE SERVICES: DATA ENGINEERING | OPERATIONAL DATABASE | ANALYTIC DATABASE | DATA SCIENCE
SHARED DATA EXPERIENCE: DATA CATALOG | INGEST & REPLICATION | SECURITY | GOVERNANCE | WORKLOAD MANAGEMENT
SHARED STORAGE: S3 | ADLS | HDFS | KUDU
• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads
• Consistent governance – enables secure self-service access to all relevant data and increases compliance
• Easy workload management – increases user productivity and boosts job predictability
• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration
• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions
Open platform services
Built for multi-function analytics | Optimized for cloud
SHARED DATA EXPERIENCE
Shared: Data, Operations, Governance, Security, Metadata
Data Engineering Data Science Deployment
Data Wrangling
Visualization and Analysis
Model Training & Testing Batch Scoring
Online Scoring
Serving
Data Governance
Curation
Processing
Acquisition
Reports, Dashboards
Dev: Collaboration, Version Control Ops: Deployment, Scheduling, Orchestration
Support the complete data science workflow
From data to exploration to action
Cloudera Data Science Workbench
Self-service data science for the enterprise
Accelerates data science from development to production with:
● Secure self-service data access
● On-demand compute
● Support for Python, R, and Scala
● Project dependency isolation for multiple library versions
● Workflow automation, version control, collaboration and sharing
A modern data science architecture
● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine” containers which:
  ○ Host Kerberos-authenticated Python/R/Scala runtimes
  ○ Interact with Spark via YARN client mode (Driver runs in container, workers on CDH)
● Single-cluster only (for now)
Diagram: Cloudera Manager; gateway nodes running CDSW (Master, Engines); CDH nodes running CDH (Hive, HDFS, ...)
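For reference, "YARN client mode" corresponds to two standard Spark properties. The sketch below shows how a Python session could request them through PySpark; the helper names and the application name are assumptions, and a workbench session may well have these settings preconfigured.

```python
from typing import Dict


def yarn_client_conf(app_name: str) -> Dict[str, str]:
    """Spark properties for a driver that runs locally (here: in the engine
    container) while executors are scheduled by YARN on the cluster."""
    return {
        "spark.app.name": app_name,
        "spark.master": "yarn",
        "spark.submit.deployMode": "client",
    }


def start_session(conf: Dict[str, str]):
    """Create a SparkSession from the given properties.

    Requires pyspark and a configured YARN cluster, so the import is deferred.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = yarn_client_conf("cdsw-demo")  # hypothetical application name
print(conf)
```

Calling start_session(conf) from a session on a gateway node would then start the driver in that session's process and request executors from YARN.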
Accelerated deep learning on-demand with GPUs
Multi-tenant GPU support on-premises or cloud
“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud
Diagram: Data Science Workbench (GPU, CPU) for single-node training; CDH (CPU) for distributed training, scoring
Open Ecosystem Black Box
An open ecosystem for agility and innovation
Run anywhere. Deploy any way.
• Simple: Simplifies operations, Works with your tools
• Unified: Hybrid or multi cloud, Platform-as-a-Service
• Enterprise: Proven at scale, Trusted security
Real-time Analytics or Operational Analytics?
my definition
“apply logic and mathematics real-time on data to improve operations”
Model Analyze Repeat
• Aggregate relational, NoSQL, structured & unstructured data
• Accelerate data science from exploration to production using R, Python, Spark and more
• Deploy pipelines and models on-premises or in the cloud
Seeking Abnormal Behavior
• Serve real-time data at scale for real-time decision making
• Stream processing & analytics on changing operational data
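One simple way to seek abnormal behavior in a stream is a rolling z-score: each new value is compared with the mean and standard deviation of the most recent values. The sketch below is an illustration under assumed parameters (window of 5, threshold of 3), not a method taken from the slides.

```python
from collections import deque
from math import sqrt
from typing import Deque, Iterable, Iterator, Tuple


def anomalies(stream: Iterable[float],
              window: int = 5,
              threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for values more than `threshold` standard
    deviations away from the mean of the previous `window` values."""
    recent: Deque[float] = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((v - mean) ** 2 for v in recent) / window
            std = sqrt(var)
            if std > 0 and abs(x - mean) / std > threshold:
                yield i, x
        recent.append(x)  # the window then slides forward by one value


# Made-up sensor readings with one obvious outlier.
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 25.0, 10.1]
print(list(anomalies(readings)))  # [(6, 25.0)]
```

Because the function is a generator over an iterable, it can be fed from a live stream as well as from a list.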