Fifth Elephant 2017 Data Pipeline Workshop

Uploaded by ketan-khairnar on 22-Jan-2018

Page 1: Fifth elephant 2017 Data Pipeline workshop

Unless you measure it, you can’t improve it.

Data pipelines to track KPIs and KRAs for your business

What are we improving?

• Airbnb clone: yourbnb [architecture discussion – 15 mins]

• Initial setup verification – 10 mins

• Workshop phases
  • Basic instrumentation
    1. Add host metrics and visualization [15 mins]
    2. App/services – instrumentation with meters, gauges, counters, histograms [15 mins]
    3. Audit trails, deployment history [10 mins]

• Event sourcing
  • Theory and approach – 10 mins
  • Introducing events, measurements, metrics and logs – 5 mins discussion + 15 mins hands-on

• Data pipeline
  • Architectural pattern and options – 10 mins
  • Changes in ingestion and publishing – 20 mins

• Dashboards – 45 mins

1. Monitor all the infrastructure

• Gather system performance stats (CPU, I/O, network) and send them to a common data store

• Visualize these stats

Tech Stack

• Metrics – app and system health library

• Compute – S3 and Lambda

• Visualization – Grafana

• Storage – InfluxDB / Druid [TBD]
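
The deck does not name a collector, so here is a minimal, stdlib-only sketch of the "gather and send" step, assuming InfluxDB is chosen as the store. It samples CPU load average and disk usage and formats them as InfluxDB line protocol; I/O and network counters would need /proc parsing or a library such as psutil. The function and measurement names are hypothetical, and a production setup would more likely use an agent such as Telegraf and would escape spaces and commas in tag values, which this sketch skips.

```python
import os
import shutil
import socket
import time


def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as InfluxDB line protocol (no escaping of spaces/commas)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_parts = []
    for k, v in sorted(fields.items()):
        # Integer fields carry an "i" suffix; everything else is written as a float.
        if isinstance(v, int) and not isinstance(v, bool):
            field_parts.append(f"{k}={v}i")
        else:
            field_parts.append(f"{k}={float(v)}")
    head = f"{measurement},{tag_str}" if tag_str else measurement
    return f"{head} {','.join(field_parts)} {ts_ns}"


def collect_host_metrics(path="/"):
    """Sample load average and disk usage with the stdlib (os.getloadavg is Unix-only)."""
    load1, load5, load15 = os.getloadavg()
    disk = shutil.disk_usage(path)
    host = socket.gethostname()
    ts = time.time_ns()  # line protocol's default timestamp precision is nanoseconds
    return [
        to_line_protocol("system_load", {"host": host},
                         {"load1": load1, "load5": load5, "load15": load15}, ts),
        to_line_protocol("disk", {"host": host, "path": path},
                         {"total": disk.total, "used": disk.used, "free": disk.free}, ts),
    ]
```

The resulting lines could be batched and written to InfluxDB over HTTP, or dropped into S3 for a Lambda function to load, in line with the tech stack above; neither transport is shown here.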

2. Monitor services

• Add metrics for each service, e.g. for a web API: requests per second for each endpoint and the distribution of response status codes (200, 401, 503, etc.)

• Average response time

• Version information for each service and its update history
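
The "app and system health library" is not named in the deck. Purely as an illustration, here is a hypothetical in-process registry that tracks the per-endpoint service metrics listed above: a request counter, a mean-rate meter (requests per second), the status-code distribution, and latency samples from which the average and a nearest-rank percentile (histogram-style) are derived. It keeps every sample in memory, whereas real metrics libraries typically use decaying or windowed reservoirs.

```python
import math
import time
from collections import Counter, defaultdict


class ServiceMetrics:
    """Hypothetical per-endpoint metrics registry (counter, meter, histogram)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.requests = Counter()              # counter: endpoint -> total requests
        self.status = defaultdict(Counter)     # endpoint -> {HTTP status: count}
        self.latencies_ms = defaultdict(list)  # histogram samples: endpoint -> [ms, ...]

    def record(self, endpoint, status_code, latency_ms):
        """Call once per handled request, e.g. from web middleware."""
        self.requests[endpoint] += 1
        self.status[endpoint][status_code] += 1
        self.latencies_ms[endpoint].append(latency_ms)

    def requests_per_second(self, endpoint):
        """Meter: mean request rate since the registry was created."""
        elapsed = self._clock() - self._start
        return self.requests[endpoint] / elapsed if elapsed > 0 else 0.0

    def avg_response_ms(self, endpoint):
        samples = self.latencies_ms[endpoint]
        return sum(samples) / len(samples) if samples else 0.0

    def percentile_ms(self, endpoint, pct):
        """Nearest-rank percentile over all samples (no reservoir or decay)."""
        samples = sorted(self.latencies_ms[endpoint])
        if not samples:
            return 0.0
        rank = max(1, math.ceil(pct / 100 * len(samples)))
        return samples[rank - 1]
```

Injecting the clock keeps the rate deterministic in tests; in a running service the default monotonic clock would be used. The endpoint path in the tests, such as `/api/listings`, is made up.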

Tech Stack

• Metrics – app and system health library

• Compute – S3 and Lambda

• Visualization – Grafana

• Storage – InfluxDB / Druid [TBD]

3. Audit Trails

• Change capture system

• Annotations / Markers
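
One possible reading of this slide, assumed here rather than taken from the deck: record every deployment in an append-only audit log, derive each service's version history from it, and show each deployment as a marker on dashboards. The class and field names are hypothetical, and `to_annotation` only builds a dict mirroring the `time` (epoch milliseconds), `tags` and `text` fields of Grafana's HTTP annotations API; nothing is sent.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class DeploymentEvent:
    service: str
    version: str
    deployed_by: str
    ts_ms: int = field(default_factory=lambda: int(time.time() * 1000))


class AuditTrail:
    """Append-only audit log: entries are only ever appended, never updated or deleted."""

    def __init__(self):
        self._lines = []  # stand-in for an append-only file or table (JSON lines)

    def append(self, event):
        self._lines.append(json.dumps(asdict(event), sort_keys=True))

    def history(self, service):
        """Deployment (version) history of one service, oldest first."""
        events = (DeploymentEvent(**json.loads(line)) for line in self._lines)
        return [e for e in events if e.service == service]


def to_annotation(event):
    """Dashboard marker body shaped like a Grafana annotation (time, tags, text)."""
    return {
        "time": event.ts_ms,
        "tags": ["deployment", event.service],
        "text": f"{event.service} {event.version} deployed by {event.deployed_by}",
    }
```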

4. Polyglot Persistence

• Host and service telemetry – time series database

• Master data – document store / RDBMS

• App logs – Elasticsearch
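
Polyglot persistence means each kind of data is written to the store that suits it. A minimal sketch of the write-side routing, assuming three record kinds that mirror the bullets above; `InMemoryStore` is a hypothetical stand-in for real time series, document/RDBMS and Elasticsearch clients, whose APIs are not modeled.

```python
from typing import Mapping, Protocol


class Store(Protocol):
    def write(self, record: dict) -> None: ...


class InMemoryStore:
    """Fake store standing in for a real database client."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def write(self, record):
        self.records.append(record)


class PolyglotRouter:
    """Send each record to the store configured for its kind."""

    def __init__(self, routes: Mapping[str, Store]):
        self._routes = routes  # kind -> store

    def write(self, kind, record):
        try:
            store = self._routes[kind]
        except KeyError:
            raise ValueError(f"no store configured for kind {kind!r}") from None
        store.write(record)
```

Keeping the routing table explicit makes it easy to swap a store (for example InfluxDB for Druid, still marked TBD above) without touching producers.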

5. Data Pipeline

• Architectural paradigm

• Event logs as the system of record

• Open source options

• Implementation
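
Treating the event log as the system of record means every change is appended to an ordered, immutable log, and each downstream system (time series store, search index, dashboards) reads it at its own pace from its own offset, so any of them can be rebuilt by replaying from offset 0. The in-memory sketch below is loosely modeled on Kafka-style offsets; it is not the API of any specific open source option.

```python
class EventLog:
    """Append-only log; each event gets a monotonically increasing offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read(self, offset, max_events=100):
        return self._events[offset:offset + max_events]


class Consumer:
    """A downstream reader that tracks its own offset and can replay from any point."""

    def __init__(self, log, handler, offset=0):
        self.log, self.handler, self.offset = log, handler, offset

    def poll(self):
        batch = self.log.read(self.offset)
        for event in batch:
            self.handler(event)
        self.offset += len(batch)
        return len(batch)  # number of events processed in this poll
```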

6. Event Sourcing and CQRS
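
The slide gives only the title. As a rough illustration using a hypothetical yourbnb listing: with event sourcing, current state is never stored directly but rebuilt by replaying events (the write side); with CQRS, a separate read model is updated from the same events and shaped for one query. The event types and fields are invented for this example.

```python
from collections import defaultdict


def apply(state, event):
    """Write side: fold one event into the listing's state."""
    kind = event["type"]
    if kind == "ListingCreated":
        return {"listing_id": event["listing_id"], "nightly_price": event["price"], "bookings": 0}
    if kind == "PriceChanged":
        return {**state, "nightly_price": event["price"]}
    if kind == "ListingBooked":
        return {**state, "bookings": state["bookings"] + 1}
    return state  # unknown event types are ignored


def rebuild(events):
    """Current state = replay of every event, oldest first."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


class BookingsByCityView:
    """Read side (query model): bookings per city, updated from the same events."""

    def __init__(self):
        self._city_of = {}
        self.counts = defaultdict(int)

    def handle(self, event):
        if event["type"] == "ListingCreated":
            self._city_of[event["listing_id"]] = event["city"]
        elif event["type"] == "ListingBooked":
            self.counts[self._city_of[event["listing_id"]]] += 1
```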

7. A/B testing
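
The slide gives no detail here. A common approach, shown as a sketch rather than the workshop's own method, is deterministic assignment: hash the experiment name and user id so a user always lands in the same variant without storing assignments, then compare a conversion KPI per variant. Experiment names, user ids and the default 50/50 split are assumptions, and no significance test is included.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment, variants=("control", "treatment"), weights=(50, 50)):
    """Deterministic bucketing: the same (experiment, user) always maps to the same variant."""
    if len(variants) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per variant, with a positive total")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % sum(weights)  # bucket in [0, sum(weights))
    cumulative = 0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variants[-1]  # not reached when weights are non-negative


def conversion_rates(assignments, converted):
    """assignments: {user_id: variant}; converted: set of user_ids who converted."""
    totals, wins = Counter(), Counter()
    for user, variant in assignments.items():
        totals[variant] += 1
        if user in converted:
            wins[variant] += 1
    return {variant: wins[variant] / n for variant, n in totals.items()}
```

Including the experiment name in the hash keeps assignments independent across experiments, so one user is not always in "treatment" everywhere.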

8. Dashboards for KPIs
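
KPI dashboards typically chart aggregates derived from the event stream. A minimal sketch, assuming hypothetical `search` and `booking` event types for yourbnb: compute a daily search-to-booking conversion rate that could be stored as a time series and charted in Grafana.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_conversion(events):
    """events: dicts with 'type' ('search' or 'booking') and 'ts' (epoch seconds).

    Returns {UTC date (ISO string): bookings / searches}; days without searches are omitted.
    """
    searches, bookings = defaultdict(int), defaultdict(int)
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        if event["type"] == "search":
            searches[day] += 1
        elif event["type"] == "booking":
            bookings[day] += 1
    return {day: bookings[day] / n for day, n in sorted(searches.items())}
```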