©2017 Avanade Inc. All Rights Reserved.
A Different Kind of Race: Big Data Platforms in Automotive Manufacturing
Thomas Pagel, Principal Technologist, Data & Analytics
Franziska Weng, Consultant, Data & Analytics
©2017 Avanade Inc. All Rights Reserved. <Restricted> See Avanade’s Data Management Policy
Agenda
• The Avanade Story
• How we do Analytics in the Cloud
• Proof of Concept Overview
• Analytical Processing
• Results/Lessons Learned
• Q&A
The Avanade Story
Innovative digital services, business solutions and design-led experiences delivered through the power of people and the Microsoft ecosystem
Avanade by the numbers (FY16)
$2.4 B: Sales
20%: Average annual growth
1,200+: Client partners worldwide, typically mid- to large-scale enterprises and government agencies
43%: Of Global 500 companies as clients
Created in 2000 as a joint venture between Accenture and Microsoft – today majority owned by Accenture. We help clients to maximize their performance and realize their vision. Our innovative solutions help to improve productivity and efficiency.
Avanade group in Germany, Austria and Switzerland
19 locations with more than 850 employees
Our professionals combine technology, business and industry expertise to build and deploy solutions to realize results for clients and their customers.
The Avanade group includes:
• Avanade
• Infoman
• KCS.net
1 nearshore development center in Bratislava, Slovakia
Avanade Global Delivery Network: fast, efficient development and delivery of quality services
More than 29,000 professionals in over 80 locations
Recognized for great work, as well as being a great place to work
Avanade’s innovative spirit, leadership and commitment to high ethical standards have resulted in a broad range of industry honors.
Microsoft awards:
Alliance SI Partner of the Year: 2016, 2015, 2014, 2013, plus 8 more since 2001
Microsoft Customer Relationship Management (CRM) Partner of the Year: 2015
Dynamics Inner Circle and President's Club: 2015, 2014, 2013, 2012, plus 6 more since 2005
Country Partner of the Year: 2015 Italy, 2013 Germany and Spain
Microsoft Dynamics Reseller of the Year: 2015
Mobility Partner of the Year: 2014, 2013, 2012
Microsoft Service Partner of the Year: 2013
Industry Recognition
Top Employer Recognition
How we do Analytics in the Cloud
Avanade's Modern Analytics Platform (AMAP): an industrialized managed service for IoT/analytics solutions
AMAP
▪ Built on Microsoft Azure PaaS components
▪ Azure Data Platform + Azure ML + Power BI
▪ Combines the shared-service labor cost model with consumption-based compute cost
▪ Provides connectivity for industrial devices and business integration
▪ Uses the client's Microsoft EA for Azure
➢ Any data from any device or any resource in any format can be loaded securely into the Azure IoT/analytics platform
Avanade Modern Analytics Platform Capabilities
Avanade’s sophisticated Analytic platform gets you there
Combine real-time and advanced analytics with existing systems & data
Bridge the skills gap on advanced techniques
Ready on Demand
Provide the business the data and analytics platform capabilities
Augment your Talent Pool
Accelerate Insights
Answer Business Questions
Manage, upkeep and running of existing systems
Outcomes
Challenges
Hire / Train People
Deliver Digital Platforms
Buy Hardware & Software
Set Strategy, Goals & Questions
Data Management
Support & Enablement
Self-Service & Discovery
...Meanwhile customer needs are changing
Innovate & Accelerate Insights
The AMAP Approach
Traditional Approach
Staying out front
Pressure is on to realize results
Innovate & Accelerate Insights
Quick and Comprehensive
AMAP Managed Services
Develop & Design Experiences
Big Data/IoT/Streams
Advanced Analytics
Set Strategy, Goals & Questions
Avanade Modern Analytic Platform (AMAP) is a Platform and a Managed Service
Powering new solutions and offerings including Digital Marketing Analytics, IoT Analytics, Self-Service Analytics and custom client solutions
Proof of Concept Overview
The PoC: Azure vs. AWS vs. Teradata
The customer asked three service providers to demonstrate the capabilities of their teams and their preferred platform:
• Avanade/Accenture on Microsoft Azure
• Another consulting company on Amazon Web Services
• Teradata with their ASTER Appliance
Introduction: Camshaft drive chain failure prediction 1/2
Situation: A German car manufacturer has frequent issues with a specific engine: its camshaft drive chain lengthens and fails, leading to engine failures and frequent workshop visits.
Introduction: Camshaft drive chain failure prediction 2/2
Goals: A German car manufacturer wants to identify potential camshaft issues early in order to perform predictive maintenance.
• Analytics goals: Predict lengthening of the camshaft drive chain and identify potential causes (oil, soot deposition, climate conditions, overly long service intervals)
• Business goals: Predict warranty cases; save costs and increase customer satisfaction through early replacement of the camshaft drive chain without impact on the driver's experience
• IT goals: Evaluate the Azure Big Data Analytics stack as well as TCO
Analytical Questions
Is it possible to…
… select features relevant for analytics processes?
… predict camshaft drive chain failures?
… identify reasons for camshaft drive chain failures?
… tell from sensor readings when chain lengthening begins?
Analytical Processing
Big Data Analytics Case Approach: using the Azure platform capabilities to realize analytics results fast
Set-up
• Define analysis pilot goals & scope
• Set up infrastructure
• Data transfer
• Outcome: Analytical data transferred
Analytical briefing workshop
• Discuss analysis pilot goals & scope
• Identify relevant data sources
• Basic data understanding
• Outcome: Use case briefing done
Data sourcing & data management
• Data understanding
• Design data structure
• Data quality checks
• Build data structure
• Outcome: Analytical data structure ready
Insight generation and validation
• Calculate models and build reports
• Evaluate models with test data
• Outcome: Analytical results ready
Presentation of Analytical Solution
• Presentation of proof of concept results and potential next steps
• Recommendations for the client
• Outcome: Final use case presentation and next steps roadmap
Proof of Concept: Sequence File to Text Conversion
Apache Spark is an open-source cluster-computing framework that reads Hadoop storage and file formats such as sequence files. You can use it interactively from the Scala, Python and R shells.
import org.apache.hadoop.io._
// Path of the sequence file to convert
val fn = "<filename>"
// Read the key/value pairs; the values hold the raw XML bytes
val sf2 = sc.sequenceFile[MapWritable, BytesWritable](fn)
// Decode only the valid bytes: getBytes() returns the padded backing array,
// copyBytes() returns a copy trimmed to the actual length
val xml = sf2.map(row => Text.decode(row._2.copyBytes()))
Additionally, we needed to use Hive to strip line breaks from the text-formatted XML data and to drop everything before the XML declaration.
INSERT INTO TABLE <target table>
SELECT REGEXP_EXTRACT( REGEXP_REPLACE( REGEXP_REPLACE( <column name of the sequence file results table>, '\\n', ''), '\\r', ''), '^.*(<\\?xml.*)', 1)
FROM <sequence file result table>;
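The cleanup logic can be checked on a single record before running it over the full table. The following is a minimal Python sketch of the same idea, not the Hive job itself; the function name `clean_xml_record` and the sample record are illustrative assumptions. It applies the same two replacements and the same extraction pattern as the statement above.

```python
import re

def clean_xml_record(raw):
    """Strip line breaks, then keep everything from the XML declaration onward."""
    # Same effect as the two nested REGEXP_REPLACE calls
    flat = raw.replace("\n", "").replace("\r", "")
    # Same pattern as the REGEXP_EXTRACT call; group 1 starts at "<?xml"
    match = re.search(r"^.*(<\?xml.*)", flat)
    # None signals that the record contains no XML declaration
    return match.group(1) if match else None

# Hypothetical record: leftover key bytes in front of a multi-line XML document
sample = 'key-bytes<?xml version="1.0"?>\r\n<car>\n<id>1</id>\n</car>'
print(clean_xml_record(sample))
```

Because the leading `.*` is greedy, a record that contains several XML declarations keeps only the text from the last one, which is worth checking against the real data.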
Proof of Concept: XML Text to Table Structure Conversion
A good way to convert terabytes of complex XML to table-structured data is to apply XSLT using an HDInsight Apache Spark cluster and Scala.
import com.elsevier.spark_xml_utils.xslt.XSLTProcessor
// Key/value pairs; the values hold the XML documents
val xmlKeyPair = sc.sequenceFile[String, String]("<filename of xml data>")
// Read the whole stylesheet (collect.head would keep only its first line)
val stylesheet = sc.textFile("<filename of xsl>").collect.mkString("\n")
val srctitles = xmlKeyPair.mapPartitions(recsIter => {
  // Create the processor once per partition, not once per record
  val proc = XSLTProcessor.getInstance(stylesheet)
  recsIter.map(rec => proc.transform(rec._2))
})
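To make the target shape concrete, here is a small Python sketch of what "XML to table structure" means for one document. It is only an illustration: it uses the standard library's ElementTree rather than XSLT, and the element and attribute names (`vehicle`, `reading`, `vin`, `sensor`) are invented for the example, since the real schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET

def xml_to_rows(xml_text):
    """Flatten one nested document into one flat row per <reading> element."""
    root = ET.fromstring(xml_text)
    vin = root.get("vin", "")  # parent attribute repeated on every row
    rows = []
    for reading in root.iter("reading"):
        rows.append({
            "vin": vin,
            "sensor": reading.get("sensor", ""),
            "value": (reading.text or "").strip(),
        })
    return rows

# Hypothetical document with two sensor readings for one car
sample = (
    '<vehicle vin="V1">'
    '<reading sensor="oil_temp">91.5</reading>'
    '<reading sensor="rpm">2400</reading>'
    '</vehicle>'
)
rows = xml_to_rows(sample)
for row in rows:
    print(row)
```

An XSLT stylesheet expresses the same mapping declaratively, which is easier to maintain when the XML is deeply nested and the mapping changes often.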
Proof of Concept: Clean and Preprocess Data
Azure SQL Data Warehouse is the technology you should choose, because:
+ You can pause it overnight and pay less.
+ It is about as fast as an HDInsight Apache Spark cluster.
+ You can use PolyBase to conveniently load data from Azure Blob Storage.
Cleaning and preprocessing tasks performed in Azure SQL DWH:
1. Load data from an external table using PolyBase
2. Unique car filtering
3. Relevant sample selection
4. Target variable generation
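Steps 2 to 4 of the preprocessing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SQL that ran in the data warehouse: "unique car filtering" is assumed to mean keeping the most recent record per car, "relevant sample selection" is assumed to mean filtering on the engine type, and the field names (`vin`, `engine`, `read_date`) and the `failure_vins` set are hypothetical.

```python
from datetime import date

def preprocess(records, engine_type, failure_vins):
    """Deduplicate cars, select the relevant sample, and attach the target variable."""
    # Step 2 (assumed meaning): keep only the most recent record per car
    latest = {}
    for rec in records:
        kept = latest.get(rec["vin"])
        if kept is None or rec["read_date"] > kept["read_date"]:
            latest[rec["vin"]] = rec
    # Step 3 (assumed meaning): keep only cars with the engine under investigation
    selected = [r for r in latest.values() if r["engine"] == engine_type]
    # Step 4: target variable, 1 if the car had a known chain failure, else 0
    return [{**r, "chain_failure": int(r["vin"] in failure_vins)} for r in selected]

# Hypothetical input: V1 appears twice, V2 has a different engine
records = [
    {"vin": "V1", "engine": "E1", "read_date": date(2016, 1, 1)},
    {"vin": "V2", "engine": "E2", "read_date": date(2016, 1, 5)},
    {"vin": "V1", "engine": "E1", "read_date": date(2016, 2, 1)},
    {"vin": "V3", "engine": "E1", "read_date": date(2016, 1, 9)},
]
result = preprocess(records, "E1", {"V3"})
for row in result:
    print(row["vin"], row["read_date"], row["chain_failure"])
```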
Visual Data Inspection using Power BI
Visual data inspection can be done using Power BI. Additionally, the Quick Insights feature can show you some interesting information. (https://powerbi.microsoft.com/en-us/documentation/powerbi-service-auto-insights/)
Analytical Processing using Azure Machine Learning
Root-Cause Analysis
• Feature data and label from the same point in time
• Which features' behavior is related to the camshaft chain failure?
Predictive Analysis
• Feature data from one week before the label data
• Which features' past behavior is related to the camshaft chain failure?
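The two analyses differ only in how feature rows are paired with labels: at the same point in time for root-cause analysis, or one week earlier for prediction. A minimal Python sketch of that pairing follows; the data layout (dictionaries keyed by car and date) and the name `pair_features_with_labels` are assumptions made for the example.

```python
from datetime import date, timedelta

def pair_features_with_labels(features, labels, lag_days=0):
    """Pair each label with the feature row observed lag_days before it."""
    pairs = []
    for (vin, label_date), label in labels.items():
        feature_key = (vin, label_date - timedelta(days=lag_days))
        # Labels without a matching feature row are skipped
        if feature_key in features:
            pairs.append((features[feature_key], label))
    return pairs

# Hypothetical readings for one car, one week apart
features = {
    ("V1", date(2016, 3, 1)): {"oil_temp": 90.0},
    ("V1", date(2016, 3, 8)): {"oil_temp": 104.0},
}
labels = {("V1", date(2016, 3, 8)): 1}

root_cause = pair_features_with_labels(features, labels, lag_days=0)
predictive = pair_features_with_labels(features, labels, lag_days=7)
print(root_cause)
print(predictive)
```

With `lag_days=0` the model sees the readings taken when the failure was recorded; with `lag_days=7` it only sees what was known a week before, which is the information available for an early warning.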
Processing steps: Target variable labeling, Format adjustment, Column selection, Sample splitting, Modeling, Pearson Correlation Analysis
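Pearson correlation, named in the steps above, measures how strongly a feature moves linearly with the label. The sketch below computes it in plain Python and ranks features by absolute correlation. It is an illustration only: the feature names and values are invented, and it does not reproduce how Azure Machine Learning implements the analysis.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant column; treated as 0 in this sketch
    return cov / sqrt(var_x * var_y)

def rank_features(columns, label):
    """Sort feature names by absolute correlation with the label, strongest first."""
    scores = {name: pearson(values, label) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical feature columns and failure label for four cars
columns = {
    "soot": [1.0, 2.0, 3.0, 4.0],
    "noise": [2.0, 2.0, 2.0, 2.0],
    "service_interval": [4.0, 1.0, 3.0, 2.0],
}
label = [0, 0, 1, 1]
ranked = rank_features(columns, label)
for name, score in ranked:
    print(name, round(score, 3))
```

Pearson correlation only captures linear relationships, so a feature with a low score here can still matter in a non-linear model.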
Sample Results (most important Features/Starting Values)
Sample Results (Prediction/Root Cause Analysis)
Results/Lessons Learned
Guess who won?
It took the client 2 years to achieve the same quality of results that Avanade/Accenture provided after 2 weeks of analytical work.
Business Goals - Conclusion
Success dimension: The Azure platform's unique proposition
Cost: You pay only for the resources you actually use; costs are transparent on a monthly bill.
Flexibility: The architecture is highly flexible. The client can decide on the toolset and can also bring in its own technologies as virtual machines (IaaS) if necessary.
Security: Azure uses market-leading, regulator-compliant security to keep the client's data safe in motion, at rest and in use. Additional security layers as well as management and monitoring can be provided by Accenture.
Productivity: Instead of spending time and money on installation, operations and management, the platform is handled as a service. The client's staff can spend their time on their core competency to drive value.
Key Lessons Learned
Around the project:
• Data provisioning can take quite a while
• Complex XML -> Complex transformation/analytics
• Include the business know-how
• Test your assumptions and hypotheses: no blind belief in “will never happen”
Around Azure:
• Azure enables agile projects – fail fast, learn fast
• Find the appropriate tools to tackle the challenge
• Azure often offers more than one tool for the same purpose
• Even on Hadoop, Azure is very up-to-date and competitive
• Azure ML is powerful and easy but lacks some control (better with R)
• Azure SQL DWH was already great as a preview
• PaaS offerings are often a great alternative to IaaS-based solutions
• Big Data does not necessarily equal Hadoop
Questions?
Just visit us at our booth!
Thank you ☺