insight recent demo
TRANSCRIPT
- 1. Crowd Detector Reza Asad Insight Data Engineering June 2015
- 2. Motivation Avoid waiting time in crowded areas.
- 3. Data Let's imagine we had data about people's locations. This could be collected from people's cell phones. How can we use such data?
- 4. Naive Approach
- 5. Demo
- 6. Data But such data is not available to me... Solution: engineer the data! Take data from Yelp. Perform a random walk.
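The slide does not say how the random walk was implemented. A minimal sketch, assuming each simulated person starts at a seed location (for example a business coordinate taken from Yelp) and moves by a small uniform offset each step. The function name `random_walk`, the step size `step_deg`, and the seed coordinates are all hypothetical.

```python
import random


def random_walk(lat, lon, steps, step_deg=0.0005, seed=None):
    """Return a list of (lat, lon) positions starting at the given point.

    step_deg is an assumed maximum offset per step, in degrees.
    """
    rng = random.Random(seed)
    path = [(lat, lon)]
    for _ in range(steps):
        # Move by a small uniform offset in each direction.
        lat += rng.uniform(-step_deg, step_deg)
        lon += rng.uniform(-step_deg, step_deg)
        path.append((lat, lon))
    return path


# Hypothetical seed point in San Francisco.
path = random_walk(37.7749, -122.4194, steps=5, seed=42)
for point in path:
    print(point)
```

Each generated position could then be published as one location message for the pipeline.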
- 7. Pipeline Data
- 8. Engineering Challenges Choosing K?
- 9. Engineering Challenges The area of SF: 46.87 sq mi. For the purpose of this project, each cluster is 0.09 sq mi. This means k is roughly 500 (46.87 / 0.09 ≈ 521).
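The estimate of k can be reproduced with a short calculation, reading both areas as square miles. This is only the arithmetic from the slide; the variable names are illustrative.

```python
SF_AREA_SQ_MI = 46.87       # area of San Francisco, from the slide
CLUSTER_AREA_SQ_MI = 0.09   # target area covered by one cluster, from the slide

# Number of clusters needed to tile the city at this granularity.
k_estimate = SF_AREA_SQ_MI / CLUSTER_AREA_SQ_MI

# Round to the nearest hundred, matching the "roughly 500" on the slide.
k = int(round(k_estimate, -2))

print(f"k estimate: {k_estimate:.1f}, rounded: {k}")
```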
- 10. Engineering Challenges Parameters to tune: the time it takes to produce the messages; the processing time for k-means in Spark Streaming; the update interval for a fixed data point in the database.
- 11. Goal Tune the parameters in order to have a stable system. The total delay after processing each batch must be constant and comparable to the batch interval. You can check this in the Spark API.
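The stability criterion above can be expressed as a simple check over observed per-batch total delays. This is a sketch under stated assumptions: the function `is_stable`, the `tolerance` factor, and the 10% growth allowance are hypothetical choices, and the delay values would have to be read from Spark's own batch statistics.

```python
def is_stable(total_delays_s, batch_interval_s, tolerance=1.5):
    """Return True if delays stay near the batch interval and do not grow.

    total_delays_s: observed total delay per batch, in seconds.
    tolerance: assumed factor for "comparable to the batch interval".
    """
    if len(total_delays_s) < 2:
        raise ValueError("need at least two batches to judge stability")
    half = len(total_delays_s) // 2
    first, second = total_delays_s[:half], total_delays_s[half:]
    avg_first = sum(first) / len(first)
    avg_second = sum(second) / len(second)
    # Delay must stay within a small multiple of the batch interval.
    bounded = max(total_delays_s) <= tolerance * batch_interval_s
    # Later batches must not be noticeably slower than earlier ones.
    not_growing = avg_second <= avg_first * 1.1
    return bounded and not_growing


print(is_stable([2.0, 2.1, 1.9, 2.0], batch_interval_s=2.0))
print(is_stable([2.0, 3.0, 5.0, 8.0], batch_interval_s=2.0))
```

A steadily growing delay means batches are queuing up faster than they are processed, which is the unstable case the slide warns about.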
- 12. Tackling Challenges Having multiple producers and consumers. Kafka is fast at sending messages and is not the bottleneck. Establishing some safe limits: using spark.streaming.receiver.maxRate to control the input rate. Understanding the complexity of the process in Spark Streaming. Choosing the right batch interval.
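As an illustration of the rate limit, `spark.streaming.receiver.maxRate` caps the number of records per second that each receiver accepts, so it also bounds the size of every batch. The sketch below builds the matching `spark-submit --conf` arguments and computes that bound. The helper name and the numeric values (1000 records per second, a 2 second batch interval, 3 receivers) are hypothetical, not the settings used in the project.

```python
def spark_submit_conf_args(conf):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical values chosen for illustration only.
max_rate = 1000          # records per second, per receiver
batch_interval_s = 2     # Spark Streaming batch interval in seconds
num_receivers = 3        # one receiver per consumer

conf = {"spark.streaming.receiver.maxRate": max_rate}
args = spark_submit_conf_args(conf)

# Upper bound on the records a single batch can contain under this limit.
max_records_per_batch = max_rate * batch_interval_s * num_receivers

print(" ".join(args))
print("max records per batch:", max_records_per_batch)
```

Knowing this upper bound makes it easier to pick a batch interval long enough for the k-means step to finish before the next batch arrives.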
- 13. Raw Data
- 14. Data Process Data filtering in Spark Streaming
- 15. Data Process
- 16. About Me Long time ago - B.S. in pure math, University of Toronto. More recent - M.S. in applied math, University of British Columbia. The exciting now - A data engineer who wants to go camping with other data engineers.