A Driver-Rider Matching Application Based on Apache Samza
A Driver-Rider Matching service based on Apache Samza
Pan Kong, Zhangfan Dong, and Kaiyue Sun
Design goals
• Functionality
  Drivers can update their locations to the server periodically
  Riders can send ride requests to find the nearest driver
• Performance
  Scalable, low latency
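The two functional goals above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `DriverLocations`, the (latitude, longitude) message shape, and the use of haversine distance to define "nearest" are all assumptions, since the slides do not say how distance is computed.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class DriverLocations:
    """Hypothetical in-memory map of driver_id -> latest (lat, lon)."""

    def __init__(self):
        self._latest = {}

    def update(self, driver_id, lat, lon):
        # A periodic update simply overwrites the previous position.
        self._latest[driver_id] = (lat, lon)

    def nearest(self, lat, lon):
        # Linear scan over all known drivers; returns None if there are none.
        if not self._latest:
            return None
        return min(
            self._latest,
            key=lambda d: haversine_km(lat, lon, *self._latest[d]),
        )
```

A linear scan is the simplest choice and is only reasonable for small driver counts; a spatial index would be the usual next step, but that goes beyond what the slides describe.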
Problems
• User locations are transitory data; there is no need to store them persistently in a database
• The database becomes a bottleneck under high workload (exclusive access, locks)
• A matching engine embedded within the web server hogs the server's computing resources
Three-tier architecture
Presentation Tier: UI on Browsers
Application Tier: Web Server, Matching Engine
Data Tier: User Profiles
Stream processing framework
Presentation Tier: UI on Browsers
Application Tier: Web Server
Stream Processing Tier: Apache Samza, Matching Engine
User Profiles
Desired properties
• Samza is partitioned and distributed at every level, and is therefore inherently scalable
• Samza guarantees low latency in big data processing compared to batch processing systems
Jakob Homan http://www.slideshare.net/blueboxtraveler/apache-samza
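The scalability claim rests on partitioning: messages are spread across partitions by key, so each partition can be handled by a separate task. A minimal sketch of key-based partitioning follows. The function names are hypothetical, and `zlib.crc32` is only a stand-in for a stable hash; it is not the hash Kafka or Samza actually use.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition using a stable hash (stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (key, value) messages by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)
```

Because the hash is deterministic, every message with the same key lands in the same partition, which is what lets a task keep the state for that key locally instead of in a shared database.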
Architecture & Implementation
Components: Client UI, HTTP Server, REST Proxy, Kafka, Samza (Matching Engine), Query Server
Input Topics: Driver Location, Ride Request
Output Topics: Match
Message flows (labels from the diagram):
• HTTP Request: driver updates, ride requests
• Push Kafka Msg: driver updates, ride requests
• Pull Kafka Msg: driver updates, ride requests
• Push Kafka Msg: matching result
• Pull Kafka Msg: matching result
• HTTP Response: matching result
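The flow in this diagram (two input topics consumed by the matching engine, one output topic carrying matches) can be imitated with an in-memory toy. Everything here is an assumption made for illustration: `InMemoryBroker` is a stand-in for Kafka, the topic names and message fields are invented, and squared coordinate distance replaces whatever metric the real engine uses. The `process` method loosely mirrors the per-message callback style of a Samza task, but it is not the Samza API.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy stand-in for Kafka: one FIFO queue per topic."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def push(self, topic, message):
        self.topics[topic].append(message)

    def pull(self, topic):
        queue = self.topics[topic]
        return queue.popleft() if queue else None


class MatchingTask:
    """Hypothetical matching engine: keeps driver positions in local state."""

    def __init__(self, broker):
        self.broker = broker
        self.drivers = {}  # driver_id -> (lat, lon)

    def process(self, topic, message):
        if topic == "driver-location":
            self.drivers[message["driver_id"]] = (message["lat"], message["lon"])
        elif topic == "ride-request":
            if not self.drivers:
                return  # no driver known yet; this sketch drops the request
            lat, lon = message["lat"], message["lon"]
            best = min(
                self.drivers,
                key=lambda d: (self.drivers[d][0] - lat) ** 2
                + (self.drivers[d][1] - lon) ** 2,
            )
            self.broker.push(
                "match", {"rider_id": message["rider_id"], "driver_id": best}
            )


def run_once(broker, task):
    """Drain both input topics, driver updates first, then ride requests."""
    for topic in ("driver-location", "ride-request"):
        message = broker.pull(topic)
        while message is not None:
            task.process(topic, message)
            message = broker.pull(topic)
```

In this toy, the HTTP server's role is played by whoever calls `broker.push` on the input topics, and the query server's role by whoever calls `broker.pull("match")`.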
User Interface
[Screenshots: Home page, Driver location updating page, Ride requesting page]
Performance Testing
[Throughput plots: driver_update_only workload, ride_request_only workload, mixed workload]
[Latency plots: driver_update_only workload, ride_request_only workload, mixed workload]
Legend:
Samza: implementation based on Apache Samza
MongoDB: implementation based on three-tier architecture
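The slides do not say how throughput and latency were measured. One common way to derive both from per-request timestamps is sketched below; the function name, the sample format, and the choice of mean latency are assumptions, not the authors' test harness.

```python
def summarize(samples):
    """Summarize a workload run.

    samples: list of (sent_at, completed_at) timestamps in seconds.
    Returns throughput over the whole run and the mean per-request latency.
    """
    if not samples:
        raise ValueError("no samples to summarize")
    latencies = [done - sent for sent, done in samples]
    # Wall-clock span from the first request sent to the last one completed.
    elapsed = max(done for _, done in samples) - min(sent for sent, _ in samples)
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return {
        "throughput_per_s": len(samples) / elapsed,
        "mean_latency_s": sum(latencies) / len(latencies),
    }
```

Running the same function over the driver_update_only, ride_request_only, and mixed workloads for each implementation would give comparable numbers for both systems.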