Fifth Elephant 2017 Data Pipeline Workshop
Unless you measure it, you can’t improve it
Data pipelines to track KPIs and KRAs for your business
What are we improving?
• Airbnb clone yourbnb [architecture discussion – 15 mins]
• Initial Setup Verification – 10 mins
• Workshop phases
• Basic Instrumentation
1. Add host metrics and visualization [15 mins]
2. App/Services – Instrumentation with meters, gauges, counters, histograms [15 mins]
3. Audit trails, deployment history – 10 mins
• Event Sourcing
• Theory and approach – 10 mins
• Introduce events, measurements, metrics, logs – 5 mins discussion + 15 mins hands-on
• Data pipeline
• Architectural pattern and options – 10 mins
• Changes in ingestion and publishing – 20 mins
• Dashboards – 45 minutes
1. Monitor all the infrastructure
• Gather system performance stats (CPU, I/O, network) and send them to a common data store
• Visualize these stats
Tech Stack
• Metrics – App and System health library
• Compute - S3 and Lambda
• Visualization – Grafana
• Storage – InfluxDB / Druid [TBD]
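A minimal sketch of the "gather and send" step, assuming Python and only the standard library. The function names, the `host` measurement name and the tag keys are hypothetical, and the output format is InfluxDB line protocol text, since InfluxDB is one of the storage candidates listed above. Shipping the line to the store is left out.

```python
import os
import shutil
import time


def collect_host_stats(path="/"):
    """Return a few basic host metrics using only the standard library."""
    disk = shutil.disk_usage(path)
    stats = {
        "cpu_count": os.cpu_count() or 1,
        "disk_used_pct": round(100.0 * disk.used / disk.total, 2),
    }
    # os.getloadavg() is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        stats["load_1m"] = os.getloadavg()[0]
    return stats


def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one point as line protocol: measurement,tags fields timestamp.

    All field values are written as floats; tag and field names are assumed
    to contain no spaces, commas or equals signs (no escaping is done here).
    """
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in sorted(fields.items()))
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {ts_ns}"


if __name__ == "__main__":
    # "web1" is a placeholder host name.
    print(to_line_protocol("host", {"host": "web1"}, collect_host_stats()))
```

Running this on a schedule on each host would give the visualization layer one time series per host and metric.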
2. Monitor services
• Add metrics for each service, e.g. for a web API: requests per second for each endpoint and the response status distribution (200, 401, 503, etc.)
• Average response time
• Version information for each service and its update history
Tech Stack
• Metrics – App and System health library
• Compute - S3 and Lambda
• Visualization – Grafana
• Storage – InfluxDB / Druid [TBD]
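The service metrics listed above can be sketched with a small in-memory recorder. This is an illustration in plain Python, not the workshop's metrics library: the `ServiceMetrics` class, its method names and the `/listings` endpoint are all hypothetical.

```python
from collections import Counter


class ServiceMetrics:
    """In-memory request counters and latency samples, keyed by endpoint."""

    def __init__(self):
        self.requests = Counter()      # endpoint -> request count
        self.status_codes = Counter()  # (endpoint, status) -> count
        self.latency_ms = {}           # endpoint -> list of samples

    def record(self, endpoint, status, elapsed_ms):
        """Record one handled request."""
        self.requests[endpoint] += 1
        self.status_codes[(endpoint, status)] += 1
        self.latency_ms.setdefault(endpoint, []).append(elapsed_ms)

    def requests_per_second(self, endpoint, window_s):
        """Average request rate, given the length of the observation window."""
        return self.requests[endpoint] / window_s

    def status_distribution(self, endpoint):
        """Map each HTTP status seen for the endpoint to its count."""
        return {s: n for (e, s), n in self.status_codes.items() if e == endpoint}

    def avg_response_ms(self, endpoint):
        """Mean response time in milliseconds, 0.0 if nothing was recorded."""
        samples = self.latency_ms.get(endpoint, [])
        return sum(samples) / len(samples) if samples else 0.0


if __name__ == "__main__":
    m = ServiceMetrics()
    m.record("/listings", 200, 10.0)
    m.record("/listings", 200, 20.0)
    m.record("/listings", 503, 30.0)
    print(m.status_distribution("/listings"), m.avg_response_ms("/listings"))
```

A real service would flush these values to the time series store periodically and reset the counters for the next window.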
3. Audit Trails
• Change capture system
• Annotations / Markers
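One way to picture change capture feeding annotations, as a hedged sketch: each deployment or configuration change is stored as an immutable record, and a marker for the dashboards is derived from it. The `ChangeEvent` fields and the annotation dictionary shape are assumptions made for illustration, not the payload of any particular tool.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ChangeEvent:
    """One audit trail entry: who changed what, and when."""
    service: str
    version: str
    actor: str
    kind: str = "deployment"
    at: str = field(default_factory=_utc_now)


def to_annotation(event):
    """Derive a dashboard marker (hypothetical shape) from a change event."""
    return {
        "time": event.at,
        "tags": [event.kind, event.service],
        "text": f"{event.service} {event.version} by {event.actor}",
    }


if __name__ == "__main__":
    # Service name, version and actor are placeholder values.
    event = ChangeEvent("web-api", "1.4.2", "alice")
    print(json.dumps(asdict(event)))   # audit trail record
    print(json.dumps(to_annotation(event)))  # marker to overlay on graphs
```

Keeping the audit record and the marker separate means the trail stays complete even if the dashboard only shows some kinds of change.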
4. Polyglot Persistence
• Host and Service telemetry in time series database
• Master data – document store /RDBMS
• App Logs – Elasticsearch
5. Data Pipeline
• Architectural paradigm
• Event Logs as system of record
• Open source options
• Implement
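"Event logs as system of record" can be illustrated with a toy append-only log: writers only append, and every downstream view is rebuilt by replaying events from an offset. This is a minimal in-memory sketch. The `EventLog` class and the `booking_created` / `booking_cancelled` event types are hypothetical names for the yourbnb example, and a real pipeline would use a durable log from one of the open source options instead.

```python
import json


class EventLog:
    """Append-only log of JSON-encoded events, addressed by offset."""

    def __init__(self):
        self._lines = []

    def append(self, event_type, payload):
        """Append one event and return its offset. Events are never edited."""
        offset = len(self._lines)
        record = {"offset": offset, "type": event_type, "payload": payload}
        self._lines.append(json.dumps(record))
        return offset

    def replay(self, from_offset=0):
        """Yield decoded events in write order, starting at from_offset."""
        for line in self._lines[from_offset:]:
            yield json.loads(line)


def active_bookings_per_listing(log):
    """Derived view: rebuild booking counts purely from the log."""
    counts = {}
    for event in log.replay():
        listing = event["payload"]["listing_id"]
        if event["type"] == "booking_created":
            counts[listing] = counts.get(listing, 0) + 1
        elif event["type"] == "booking_cancelled":
            counts[listing] = counts.get(listing, 0) - 1
    return counts


if __name__ == "__main__":
    log = EventLog()
    log.append("booking_created", {"listing_id": "L1"})
    log.append("booking_created", {"listing_id": "L1"})
    log.append("booking_cancelled", {"listing_id": "L1"})
    print(active_bookings_per_listing(log))
```

Because the view is derived, it can be dropped and recomputed at any time, and new views can be added later by replaying the same log.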
6. Event Sourcing and CQRS
7. A/B testing
8. Dashboards for KPIs