TechTalk #15 Grokking: The Data Processing Journey at AhaMove
Posted on 21-Jan-2017
TRANSCRIPT
The data processing journey at AhaMove
Presented by Thuc Nguyen - Lead Operation Engineer
On-demand logistics service
Official launch on August 10th 2015
Our engineering team (on launch day)
Data problem 1: Metrics & Reporting
1) Metrics: fulfillment rate, supplier & user growth, intraday dashboard, supplier performance
2) Reporting: for multiple teams such as Business Development, Finance and Accounting,
Partners, Driver Relationship Management...
Solution in the early days
- mostly front-end work to visualize data on web pages
- data exports (on general data collections like orders, users, transactions, etc.)
- incremental calculated caches (in-memory and physical) as the data load got bigger
- specific parameterized metric pages
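The incremental cache idea above can be sketched as follows. This is a hypothetical illustration, not AhaMove's actual code: each new order event updates running counters, so the fulfillment rate is derived from cached totals instead of rescanning every order.

```javascript
// Hypothetical incremental metric cache: each incoming order event
// updates running counters; the fulfillment rate is read from the
// cached totals rather than recomputed over the full collection.
function createFulfillmentCache() {
  const cache = { total: 0, completed: 0 };
  return {
    // Called once per incoming order event.
    update(order) {
      cache.total += 1;
      if (order.status === 'COMPLETED') cache.completed += 1;
    },
    fulfillmentRate() {
      return cache.total === 0 ? 0 : cache.completed / cache.total;
    },
  };
}

// Example: three orders, two completed.
const metrics = createFulfillmentCache();
[{ status: 'COMPLETED' }, { status: 'CANCELLED' }, { status: 'COMPLETED' }]
  .forEach(o => metrics.update(o));
```

The same cache object can be kept in memory for the intraday dashboard and periodically flushed to a physical store.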
Scale-up stage
Exposure of:
- customized data requests
- data sources
- technical and resource limitations
=> A solution should:
- allow our staff to query and get the data they need by themselves
- be simple for non-technical people
- be easy to maintain
Scale-up stage (cont.)
We chose Metabase - an open-source BI tool
- 2 query modes: builder and native query
- data visualization
- saved queries and dashboards
- rich APIs and utilities
- alternatives: SaaS (Chartio, Tableau), OSS (SlamData)
However, Metabase does not handle NoSQL well, especially in native query mode (e.g. limited support for relationship matching and transformation functions).
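To illustrate the relationship-matching gap (this is an illustration, not Metabase code, and the `orders`/`users` collection and field names are hypothetical): what is a one-line join in SQL becomes a multi-stage `$lookup` aggregation in MongoDB, which is exactly the kind of native query that is awkward to write and maintain in a BI tool.

```javascript
// One-line relationship query in SQL:
const sql = `
  SELECT o._id, u.name
  FROM orders o JOIN users u ON o.user_id = u._id`;

// The equivalent MongoDB aggregation (MongoDB 3.2+) needs a
// $lookup stage plus unwind/project stages:
const mongoPipeline = [
  { $lookup: {
      from: 'users',          // joined collection
      localField: 'user_id',  // field on orders
      foreignField: '_id',    // field on users
      as: 'user',
  } },
  { $unwind: '$user' },
  { $project: { _id: 1, 'user.name': 1 } },
];
```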
Scale-up stage (cont.)
Our MongoDB setup:
- MongoDB 3.2 with the WiredTiger storage engine, replication in replica set mode
Pros:
- Flexible data schema
- Strong query language with geo support
- Strong indexing (sparse/partial, TTL expiry)
- Oplog tailing
Cons:
- Ineffective relationship queries
- Poor utility-function support
- Unfriendly to SQL geeks
Scale-up stage (cont.)
We designed a data pipeline to transform MongoDB data into PostgreSQL data:
MongoDB -> MoSQL (a replication tool, syncing via the MongoDB oplog) -> PostgreSQL (Docker image) -> Metabase (Docker image)
Result: fulfilled all reporting requirements, automated 80% of reporting work, and resolved the bottleneck in the data pipeline.
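Conceptually, an oplog-based replicator like MoSQL turns each oplog entry into a SQL statement against the mirror database. The sketch below is a drastically simplified, hypothetical version (real MoSQL is written in Ruby, uses a collection map, and issues parameterized upserts; `JSON.stringify` here merely stands in for proper SQL quoting).

```javascript
// Simplified sketch of oplog-to-SQL translation. Oplog entries carry
// an op type ('i' insert, 'u' update, 'd' delete), a namespace
// ("db.collection"), and the document(s) involved.
function oplogToSql(entry) {
  const table = entry.ns.split('.')[1]; // "ahamove.orders" -> "orders"
  switch (entry.op) {
    case 'i': { // insert: entry.o is the full document
      const cols = Object.keys(entry.o);
      const vals = cols.map(c => JSON.stringify(entry.o[c]));
      return `INSERT INTO ${table} (${cols.join(', ')}) VALUES (${vals.join(', ')});`;
    }
    case 'u': { // update: entry.o2 holds the _id, entry.o the changes
      const doc = entry.o.$set || entry.o;
      const sets = Object.entries(doc).map(([k, v]) => `${k} = ${JSON.stringify(v)}`);
      return `UPDATE ${table} SET ${sets.join(', ')} WHERE _id = ${JSON.stringify(entry.o2._id)};`;
    }
    case 'd': // delete: entry.o holds the _id
      return `DELETE FROM ${table} WHERE _id = ${JSON.stringify(entry.o._id)};`;
  }
}

// Example: an insert oplog entry becomes an INSERT statement.
const stmt = oplogToSql({
  op: 'i', ns: 'ahamove.orders',
  o: { _id: 'a1', status: 'COMPLETED' },
});
```

Tailing the oplog this way keeps PostgreSQL continuously in sync, so Metabase can query fresh data with full SQL joins.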
Results
Our productivity on data-related work is booming: 6 staff members can now write queries (none of them knew SQL before), and every staff member can access their defined metrics.
We reduced client report preparation from 30 minutes to 5 minutes via Google Data Studio, with state-of-the-art presentation.
We have a 2-way integration with Google Sheets, so that other teams like Fund Accounting can easily get the data they want via the IMPORTDATA function.
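On the service side, IMPORTDATA only needs a URL that returns CSV, so an integration like this boils down to rendering query rows as CSV. A minimal sketch (the field names are hypothetical, and the escaping covers only commas, quotes, and newlines):

```javascript
// Render query result rows as CSV, the format Google Sheets'
// IMPORTDATA("https://.../report.csv") expects from the URL it fetches.
function rowsToCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  // Quote fields containing commas, quotes, or newlines.
  const escape = v => /[",\n]/.test(String(v))
    ? `"${String(v).replace(/"/g, '""')}"`
    : String(v);
  const lines = rows.map(r => headers.map(h => escape(r[h])).join(','));
  return [headers.join(','), ...lines].join('\n');
}

// Example: a daily order count report.
const csv = rowsToCsv([
  { date: '2017-01-20', orders: 120 },
  { date: '2017-01-21', orders: 135 },
]);
```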
Data problem 2: Geospatial Analytics
Geospatial analytics help us answer common questions such as:
- Which areas have low or high demand in a specific time frame?
- Which areas have low or high supply at a given time?
- Can we ask our drivers to move from low-demand areas to high-demand ones?
- How do we present such data: administrative areas (districts, wards), heatmaps, hexagons, pin maps?
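The demand questions above come down to spatial binning. A hedged sketch (grid size and coordinates are illustrative; hexagon binning works the same way with a hex index in place of a square grid):

```javascript
// Bucket pickup points into ~0.01-degree grid cells and count orders
// per cell; cells with high counts are high-demand areas that can be
// colored on a map layer.
function demandByCell(pickups, cellSize = 0.01) {
  const counts = new Map();
  for (const { lat, lng } of pickups) {
    const key = `${Math.floor(lat / cellSize)}:${Math.floor(lng / cellSize)}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts; // cell key -> order count
}

// Example: two pickups fall in the same cell, one in another.
const cells = demandByCell([
  { lat: 10.7762, lng: 106.7003 },
  { lat: 10.7791, lng: 106.7014 },
  { lat: 10.8005, lng: 106.6504 },
]);
```

The same per-cell counts over driver locations give the supply side, and the demand/supply ratio per cell drives the "move drivers to high-demand areas" suggestion.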
Geospatial analytics stack
MongoDB -> CartoDB (a geo-database) -> Carto.js (visualization layer) -> Leaflet (interactive base map)
The main technologies we are using are Carto (SaaS) and Leaflet (a front-end library).
Summary
- We’re making use of open-source software as collective intelligence to minimize our maintenance effort.
- We’re still exploring new ways to process and present data, and we think chat apps are a potential channel for this.
- Our team motto: ‘Keep things simple’
THANK YOU FOR LISTENING
AHAMOVE - YOUR PRIVATE MOVER