SDI/ISTC Seminar
Yi Pan
LinkedIn
Yi Pan graduated from UCI with a Ph.D. in Computer Science in
2008. Since then, he has worked on distributed platforms for
Internet applications. He started at Yahoo!, working on Yahoo!'s
NoSQL database project and leading the development of multiple
features, such as real-time notification of database
updates, secondary indexes, and live migration from legacy
systems to NoSQL databases. Later, he led the development of
the Cloud Messaging System, which is used heavily as a
pub-sub service and transaction log for distributed databases at
Yahoo!. In 2014, he joined LinkedIn and became the lead
of the Apache Samza team, which provides a scalable
stream processing service for the whole company.
Building a Lambda-less Stream Processing System using Local States and Windowing

This talk will provide an overview of LinkedIn's distributed stream processing platform, including Samza, Kafka, and Databus. It will first cover the high-level scenarios for stream processing at LinkedIn, followed by detailed requirements around scalability, re-processing, accuracy of results, and ease of programmability. We will then focus on the requirements of stateful stream processing applications and explain how Samza's state management allows us to build applications that meet all of these requirements. The key concepts, architecture, and usage in LinkedIn's stream processing pipeline will be explained, including state management in Samza and the use and configuration of Kafka and Databus as input/output and as a change log. We will also discuss in detail how we leverage a reliable, replayable messaging system (i.e., Kafka) together with durable state management in Samza to build a Lambda-less stream processing platform. The key mechanism for achieving a unified processing model across batch and real-time streams is windowing. We will also dive into the requirements for, and our solutions to, windowing a real-time stream.
Thursday, April 14, 2016
RMCIC 4th Floor Panther Hollow Room
12:00 - 1:00 pm
VISITOR HOSTS: Majd Sakr, Garth Gibson
VISITOR COORD: Majd Sakr, msakr@cs.cmu.edu, 412-268-1161
For more information or questions: Karen Lindenfelser, 8-6716, karen@ece.cmu.edu
http://www.pdl.cmu.edu/SDI/