Audi's journey to an enterprise big data platform

Strata Data 2018 - London Audi's journey to an enterprise big data platform Matthias Graunitz (AUDI AG, Germany) Carsten Herbe (Audi Business Innovation GmbH, Germany)

Page 1: Audi's journey to an enterprise big data platform

Strata Data 2018 - London

Audi's journey to an enterprise big data platform

Matthias Graunitz (AUDI AG, Germany)

Carsten Herbe (Audi Business Innovation GmbH, Germany)

Page 2: Audi's journey to an enterprise big data platform

AUDI AG Strata Data London 2018 - Audi's journey to an enterprise big data platform

WHO ARE WE?

Page 3: Audi's journey to an enterprise big data platform

Audi Group

Audi, Lamborghini, Ducati and Italdesign

Page 4: Audi's journey to an enterprise big data platform

Vorsprung is our promise

Strategy 2025

Page 5: Audi's journey to an enterprise big data platform

Audi Business Innovation GmbH

...is the development, establishment, sales and operation of innovative concepts, products and services, as well as the holding of shares in the field of future mobility.

Audi mobility innovations

Audi on demand

Audi balanced technologies

Audi e-gas

Audi customer IT solutions

Page 6: Audi's journey to an enterprise big data platform

About us

Matthias Graunitz, AUDI AG

» Center of Competence Big Data & BI

» Big Data Architect

» 10+ years Data Warehousing & BI

Carsten Herbe, Audi Business Innovation GmbH

» Data Platform & Solution Architecture

» Hadoop since 2013

» 10+ years Data Warehousing & BI

Page 7: Audi's journey to an enterprise big data platform

2 YEARS AGO…

STARTING BIG DATA AT AUDI

Page 8: Audi's journey to an enterprise big data platform

Analytical Capabilities by 2015

AAP – AUDI ANALYTIC PLATFORM

Data Domains: Finance, Purchase, Production, Quality, Sales, Car Data

Programs, Projects, Data Scientists

Capability areas: Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions

Capabilities: Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage

Page 10: Audi's journey to an enterprise big data platform

Analytical Capabilities by 2015

AAP – AUDI ANALYTIC PLATFORM

Additions to the capability map: Cloud Platform; File Systems (HDFS); Stream Processing; Machine Learning

Page 11: Audi's journey to an enterprise big data platform

Our first Hadoop Cluster 2015

Hadoop (DEV)    per node   Sum
# data nodes    1          4
RAM             128 GB     0.5 TB
Cores           24         96
HDD*            40 TB      160 TB

* Raw capacity without replication and FS overhead!
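The footnote matters when reading the table: raw disk capacity overstates what HDFS can actually hold. As a rough sketch, assuming the HDFS default replication factor of 3 and a hypothetical 10% allowance for file-system overhead (neither number is stated on the slide), usable capacity could be estimated like this:

```python
def cluster_totals(nodes: int, ram_gb: int, cores: int, hdd_tb: float) -> dict:
    """Multiply per-node figures by the node count (the 'Sum' column)."""
    return {
        "ram_tb": nodes * ram_gb / 1024,
        "cores": nodes * cores,
        "raw_hdd_tb": nodes * hdd_tb,
    }


def usable_hdfs_tb(raw_tb: float, replication: int = 3, overhead: float = 0.10) -> float:
    """Estimate usable HDFS capacity from raw capacity.

    replication=3 is the HDFS default; overhead=0.10 is an assumed
    allowance for OS, logs and temp space, not a figure from the slide.
    """
    return raw_tb * (1 - overhead) / replication


# Per-node figures taken from the 2015 DEV cluster table.
dev_2015 = cluster_totals(nodes=4, ram_gb=128, cores=24, hdd_tb=40)
print(dev_2015)
print(round(usable_hdfs_tb(dev_2015["raw_hdd_tb"]), 1))
```

Under those assumptions the 160 TB of raw disk shrinks to roughly 48 TB of usable space.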

Page 12: Audi's journey to an enterprise big data platform

Our first attempt to walk with Big Data Technologies

SCREWDRIVER ANALYSIS

COMPANY CAR ANALYSIS

Page 13: Audi's journey to an enterprise big data platform

ENTERPRISE INTEGRATION VS SPEED OF DELIVERY

Page 14: Audi's journey to an enterprise big data platform

Securing the Cluster as a multi-tenant environment

Step by step by step towards our target architecture …

» Access Control: ACLs
» User Management: Local OS users
» Basic Security: iptables + ssh tunneling
» Authentication: LDAP for Hive
» Protection from outside: Knox
» Protection from inside: Kerberos
» Access Control: File Attributes
» Dedicated network: BI Zone
» Access Control & Audit: Ranger
» User Management: LDAP

Page 15: Audi's journey to an enterprise big data platform

Password Hell

Legend: password required, no password required, next step

Services: Hive, WebHDFS, SparkUI, HDFS/YARN, Knox

Nodes: EDGE NODE 1 - 2, NAME NODE 1 - 2, DATA NODE 1 - X

Steps: SSH 2 EdgeNode, kinit

» Audi Active Directory [AD User]: Named User, Technical Hive User
» OS Level [Local User]: OS Named User, Technical Hive User, Technical Project User, Hadoop User
» Hadoop KDC [Kerberos Principal]: Named User, Technical Hive User, Technical Project User, Hadoop User

Page 16: Audi's journey to an enterprise big data platform

DATA INGESTION

Page 17: Audi's journey to an enterprise big data platform

Data ingestion: technical requirements from projects, security and ops

INGESTION
» Streaming data
» Batch data
» Easy writing to HDFS/DWH

DECOUPLING
» Data sources should not be directly coupled to analytical backend jobs
» This allows adding new analytical jobs without changing the source

HA & BUFFERING
» Data ingestion must be available 24x7
» Data must be buffered (persisted) in case the backend or a backend job is not available

SECURITY
» Source systems must not connect directly to the data zone (Hadoop, DWH) (required by IT Sec)
» Authentication + data-in-motion encryption (multi-tenancy)
» Protocol must be auditable
» Some data sources run in the cloud

SCALABILITY
» Amount of data will increase over time for most projects
» Number of projects will increase
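The decoupling and buffering requirements above describe what a persistent message log provides. As a toy illustration only (an in-memory stand-in, not Kafka and not the authors' implementation; all class, method and consumer names are hypothetical), a source appends to a log and each backend job reads from its own offset:

```python
from collections import defaultdict


class MessageLog:
    """Toy append-only log with one read offset per consumer."""

    def __init__(self) -> None:
        self._records: list[str] = []
        self._offsets: dict[str, int] = defaultdict(int)

    def append(self, record: str) -> None:
        # The source only knows the log, never the backend jobs.
        self._records.append(record)

    def poll(self, consumer: str, max_records: int = 10) -> list[str]:
        # Each consumer advances its own offset, so a job that was
        # down simply resumes where it stopped.
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = MessageLog()
for i in range(3):
    log.append(f"event-{i}")

first = log.poll("hdfs-writer")        # existing job reads everything so far
log.append("event-3")                  # the source keeps producing
late = log.poll("new-analytics-job")   # job added later, source unchanged
rest = log.poll("hdfs-writer")         # picks up only the new record
```

A real deployment would additionally persist and replicate the log, which is what covers the 24x7 and buffering requirements; the sketch only shows why new jobs can be added without touching the source.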

Page 18: Audi's journey to an enterprise big data platform

Solution: Kerberized Confluent Kafka Platform

Network zones: Data Source network #1 … Data Source network #n, AAP Messaging Zone, BI Data Zone

Firewalls: FW SRC#1 … FW SRC#n, FW MSG, FW BI

» Kafka Client: Kerberos, BIN / push
» Schema Registry: HTTP, authentication: none
» Kafka Broker: Kerberos, BIN
» Zookeeper: Kerberos
» DataProxy KDC
» HDFS Connector: Kerberos, BIN / pull
» Spark Streaming: Kerberos, BIN / pull
» Hadoop KDC
» HDFS: Kerberos

Legend: authentication, encrypted (SSL), not encrypted, protocol / direction, firewall, pain point
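For readers unfamiliar with what "Kerberized" means on the client side, the following is a minimal sketch of the standard Apache Kafka client properties for Kerberos (SASL/GSSAPI) over SSL. The property names are the ones documented for the Kafka Java client; the broker address, principal, keytab path and the helper function names are hypothetical and not taken from the talk.

```python
def kerberized_client_properties(bootstrap: str, principal: str, keytab: str) -> dict:
    """Build Apache Kafka client properties for Kerberos over SSL."""
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        f'useKeyTab=true storeKey=true keyTab="{keytab}" principal="{principal}";'
    )
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",         # Kerberos auth on an SSL-encrypted link
        "sasl.mechanism": "GSSAPI",              # GSSAPI is the SASL mechanism for Kerberos
        "sasl.kerberos.service.name": "kafka",   # assumed service name of the broker principal
        "sasl.jaas.config": jaas,
    }


def to_properties_file(props: dict) -> str:
    """Render the settings in Java .properties syntax."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


props = kerberized_client_properties(
    bootstrap="broker1.example.com:9093",            # hypothetical host and port
    principal="project-user@EXAMPLE.COM",            # hypothetical principal
    keytab="/etc/security/keytabs/project.keytab",   # hypothetical path
)
print(to_properties_file(props))
```

Using a keytab rather than a password is what lets technical users such as connectors and streaming jobs authenticate without interactive login.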

Page 19: Audi's journey to an enterprise big data platform

Kafka Distributed Connector: unsecured REST API

Edge Node

» Connector Java Process with Bob's Kafka keytab and Bob's HDFS keytab
» User Bob: sink config Bob (HTTP), HDFS Sink Bob, topic Bob, Bob's data
» User Eve: sink config Eve, File Sink Eve (Bob's data); source config Eve, HDFS Source Eve

Legend: evil connection, good connection
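The problem sketched on this slide is that the Connect worker's REST API accepts connector configurations from anyone who can reach it, while the connectors run inside one Java process that holds Bob's keytabs. As an illustration (the host, topic and file names are hypothetical; the request is built but deliberately not sent), this is roughly what creating a file sink on someone else's topic would look like:

```python
import json
import urllib.request

# Hypothetical host; 8083 is the default port of the Kafka Connect REST API.
CONNECT_URL = "http://edge-node.example.com:8083/connectors"


def build_connector_request(name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for Kafka Connect."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        CONNECT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Eve's connector: no credentials of her own appear anywhere in the request.
eve_request = build_connector_request(
    "file-sink-eve",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "bob-topic",          # hypothetical name for Bob's topic
        "file": "/tmp/bobs-data.txt",   # hypothetical output path
    },
)
print(eve_request.get_method(), eve_request.full_url)
print(eve_request.data.decode("utf-8"))
```

Because the request carries no identity, the worker has no way to tell Bob's configurations from Eve's, so access to the REST endpoint itself has to be restricted.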

Page 20: Audi's journey to an enterprise big data platform

TODAY

CURRENT STATE

Page 21: Audi's journey to an enterprise big data platform

Architecture & Network Zones – Data Ingestion

Diagram elements: Data Proxy, Messaging Zone, BI Data Zone, Data Warehouse, System A, System, Cloud App, HDFS Connector, Spark Streaming, S3 Backup

Legend: encrypted (SSL), not encrypted

Page 22: Audi's journey to an enterprise big data platform

Architecture & Network Zones – User & Developer Access

Diagram elements: PIPE, BI Data Zone, Deployment Zone, BI Application Zone, AAP Data Warehouse, Audi Office LAN, Audi Laptop, Data Mining, Dashboarding, AAP Remote Desktop

Legend: encrypted (SSL), not encrypted

Page 23: Audi's journey to an enterprise big data platform

Hadoop Cluster Sizing Production 2017

Hadoop (PROD)     per node   Sum
# data nodes      1          12
RAM               512 GB     6 TB
Cores             24         288
HDD*              96 TB      9.216 TB

Kafka (PROD)      per node   Sum
# broker nodes    1          4
RAM               32 GB      128 GB
Cores             6          24
HDD*              4 TB       16 TB

* Raw capacity without replication and FS overhead!

Page 24: Audi's journey to an enterprise big data platform

Current state

Organisational Tasks

Page 25: Audi's journey to an enterprise big data platform

Organisational Tasks

Data Ownership & Data Governance (Data Domain Model with clear responsibility in each domain)

Lifecycle Management for each Shared Service in strong collaboration with the projects and programs

Defined SLAs for each Shared Service based on general availability, data loss, confidentiality and verifiability

Different Development Lifecycles for car and backend systems

Use of Open Source Software and Support requirements from IT continuity

Balance between multi-tenant environment and flexibility

Very long lifecycle of cars (> 10 years) with various built-in software versions

Page 26: Audi's journey to an enterprise big data platform

TOMORROW

WHAT’S UP NEXT

Page 27: Audi's journey to an enterprise big data platform

Hybrid Approach for the AAP

Environments: Public Cloud, On Premise / private Cloud

Zones: Entry Zone, Application Zone, Data Zone, Messaging Zone

VPCs: Swarm VPC, Analytical VPC

Components: Web GATEWAY, RDP GATEWAY, Full Client (Tableau, BO, etc.), Web Client (Tableau, BO, etc.), HDP, Data Warehouse, Kafka, Knox, Ingestor, Ingestor 1*, Repositories, Data Inventory

Connections: Internet, Direct Cloud Connect

Users: Business User

Page 28: Audi's journey to an enterprise big data platform

WE ARE HIRING

https://www.audi.com/corporate/de/karriere/einstieg-bei-audi.html

https://karriere.audibusinessinnovation.com/