building a company-wide data pipeline on apache kafka - engineering for 150 billion messages per day

Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day Yuto Kawamura LINE Corp

Upload: line-corporation

Post on 17-Mar-2018


TRANSCRIPT

Page 1: Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day

Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion

messages per day

Yuto Kawamura, LINE Corp

Page 2

Speaker introduction

• Yuto Kawamura

• Senior software engineer of LINE server development

• Apache Kafka contributor

• Joined: Apr, 2015 (about 3 years)

Page 3

About LINE

• Messaging service

• More than 200 million active users1 in countries where LINE has top market share, such as Japan, Taiwan and Thailand

•Many family services

•News

•Music

• LIVE (Video streaming)

1 As of June 2017. Sum of 4 countries: Japan, Taiwan, Thailand and Indonesia.

Page 4

Agenda

• Introducing LINE server

• Data pipeline w/ Apache Kafka

Page 5

LINE Server Engineering is about …

• Scalability

• Many users, many requests, lots of data

• Reliability

• LINE is already a communication infrastructure in multiple countries

Page 6

Scale metric: message delivery

LINE Server

25 billion / day (API calls: 80 billion / day)

Page 7

Scale metric: Accumulated data (for analysis)

40PB

Page 8

Messaging System Architecture Overview

LINE Apps

LEGY JP

LEGY DE

LEGY SG

Thrift RPC/HTTP

talk-server

Distributed Data Store

Distributed async task processing

Page 9

LEGY

• LINE Event Delivery Gateway

• API Gateway/Reverse Proxy

• Written in Erlang

• Deployed to many data centers all over the world

• Features focused on needs of implementing a messaging service

• Zero-latency code hot swapping without closing client connections

• Durability thanks to Erlang processes and message passing

• A single instance holds 100K+ connections => a single instance failure has a huge impact

Page 10

talk-server

• Java based web application server

• Implements most of messaging functionality + some other features

• Java 8 + Spring + Thrift RPC + Tomcat 8

Page 11

Datastore with Redis and HBase

• LINE’s hybrid datastore = Redis (in-memory DB, home-brew clustering) + HBase (persistent distributed key-value store)

• Cascading failure handling

• Async write in app

• Async write from background task processor

• Data correction batch

Primary/Backup

talk-server

Cache/Primary

Dual write
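The dual-write pattern above writes the cache/primary (Redis) synchronously on the request path, writes the persistent backup (HBase) asynchronously so a slow or failing backup does not cascade into request latency, and relies on a data-correction batch to reconcile misses later. A minimal sketch of that idea, using in-memory maps as stand-ins for the two stores (class and method names are illustrative, not LINE's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DualWriteStore {
    // Stand-ins for Redis (cache/primary) and HBase (persistent backup).
    private final Map<String, String> cachePrimary = new ConcurrentHashMap<>();
    private final Map<String, String> persistentBackup = new ConcurrentHashMap<>();
    private final ExecutorService asyncWriter = Executors.newSingleThreadExecutor();

    public void put(String key, String value) {
        cachePrimary.put(key, value);         // synchronous: the request path depends only on this
        asyncWriter.submit(() -> {            // asynchronous: backup lag/failure doesn't block the request
            persistentBackup.put(key, value); // on failure, a correction batch would reconcile later
        });
    }

    public String get(String key) {
        String v = cachePrimary.get(key);
        return v != null ? v : persistentBackup.get(key); // fall back to backup on cache miss
    }

    public void shutdown() throws InterruptedException {
        asyncWriter.shutdown();
        asyncWriter.awaitTermination(1, TimeUnit.SECONDS); // drain pending async writes
    }

    public static void main(String[] args) throws InterruptedException {
        DualWriteStore store = new DualWriteStore();
        store.put("user:123", "Alice");
        store.shutdown();
        System.out.println(store.get("user:123")); // Alice
    }
}
```

The key design point is that only the cache/primary write sits on the request path; everything else is deferred.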

Page 12

Message Delivery

LEGY

LEGY

talk-server

Storage

1. Find nearest LEGY

2. sendMessage(“Bob”, “Hello!”)

3. Proxy request

4. Write to storage

talk-server

X. fetchOps()

6. Proxy request

7. Read message

8. Return fetchOps() with message

5. Notify message arrival

Alice

Bob

Page 13

There is a lot of internal communication involved in processing a user’s request

talk-server

Threat detection system

Timeline Server

Data Analysis

Background Task

processing

Request

Page 14

Communication between internal systems

• Communication for querying, transactional updates:

• Query authentication/permission

• Synchronous updates

• Communication for data synchronization, update notification:

• Notify user’s relationship update

• Synchronize data update with another service

talk-server

Auth

Analytics

Another Service

HTTP/REST/RPC

Page 15

Apache Kafka

• A distributed streaming platform

• (In a narrow sense) A distributed, persistent message queue that supports the Pub-Sub model

• Built-in load distribution

• Built-in fail-over on both server(broker) and client
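The built-in load distribution comes from partitioning: each record is routed to one partition of a topic by its key, and each consumer in a group owns a disjoint subset of partitions. A simplified sketch of key-based routing (Kafka's real default partitioner hashes the serialized key with murmur2; this stand-in just shows the route-by-key idea):

```java
public class PartitionRouter {
    // Route a record key to one of numPartitions partitions.
    // Simplified stand-in for Kafka's default partitioner.
    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit instead of Math.abs, which overflows on Integer.MIN_VALUE.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6;
        // The same key always lands on the same partition,
        // which is what preserves per-key ordering.
        int p1 = partitionFor("user-123", partitions);
        int p2 = partitionFor("user-123", partitions);
        System.out.println(p1 == p2);                  // true
        System.out.println(p1 >= 0 && p1 < partitions); // true
    }
}
```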

Page 16

How it works

Producer

Brokers

Consumer

Topic

Topic

Consumer

Consumer

Producer

AuthEvent event = AuthEvent.newBuilder()
    .setUserId(123)
    .setEventType(AuthEventType.REGISTER)
    .build();
producer.send(new ProducerRecord("events", userId, event));

Properties props = new Properties();
props.put("group.id", "group-A");
consumer = new KafkaConsumer(props);
consumer.subscribe(Collections.singletonList("events"));
consumer.poll(100); // => Record(key=123, value=...)

Page 17

Consumer GroupA

Pub-Sub

Brokers

Consumer

Topic

Topic

Consumer

Consumer GroupB

Consumer

ConsumerRecords[A, B, C…]

Records[A, B, C…]

• Multiple consumer “groups” can independently consume a single topic
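This fan-out works because a topic is a persisted log and each group tracks its own read offset: one group consuming a record does not remove it for the others. A toy in-memory model of that bookkeeping (not the Kafka API, just the offset-per-group idea):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopicLog {
    private final List<String> log = new ArrayList<>();           // the topic's record log
    private final Map<String, Integer> offsets = new HashMap<>(); // committed offset per group

    public void produce(String record) { log.add(record); }

    // Each group reads from its own offset; groups never interfere.
    public List<String> poll(String groupId) {
        int from = offsets.getOrDefault(groupId, 0);
        List<String> records = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(groupId, log.size()); // commit this group's new position
        return records;
    }

    public static void main(String[] args) {
        TopicLog topic = new TopicLog();
        topic.produce("A");
        topic.produce("B");
        topic.produce("C");
        // Both groups independently see the full stream.
        System.out.println(topic.poll("group-A")); // [A, B, C]
        System.out.println(topic.poll("group-B")); // [A, B, C]
        System.out.println(topic.poll("group-A")); // [] (nothing new for group-A)
    }
}
```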

Page 18

Example: UserActivityEvent

Page 19

Scale metric: Events produced into Kafka

Service Service

Service

Service

Service

Service

150 billion msgs / day

(3 million msgs / sec)

Page 20

Our Kafka needs to be highly performant

• Use cases are sensitive to delivery latency

• Broker latency impacts throughput as well

• because a Kafka topic is a queue
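Because a partition is consumed in order, each fetch must complete before the next one begins, so broker response latency directly caps per-partition consumer throughput: roughly records-per-fetch divided by fetch latency. A back-of-the-envelope illustration (the numbers are hypothetical, not LINE's actual figures):

```java
public class ThroughputBound {
    // Upper bound on per-partition throughput when fetches are sequential:
    // records per fetch / fetch latency.
    public static long maxRecordsPerSec(int recordsPerFetch, int fetchLatencyMs) {
        return (long) recordsPerFetch * 1000 / fetchLatencyMs;
    }

    public static void main(String[] args) {
        System.out.println(maxRecordsPerSec(1000, 10)); // 100000 records/sec at 10 ms latency
        System.out.println(maxRecordsPerSec(1000, 20)); // 50000: doubling latency halves throughput
    }
}
```

This is why a GC pause or a blocking syscall on the broker shows up as a throughput problem, not just a latency problem.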

Page 21

… wasn’t a built-in property

• KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex

• // TODO fill-in

Page 22

Performance Engineering Kafka

• Application Level:

• Read and understand code

• Patch it to eliminate bottleneck

• JVM Level:

• JVM profiling

• GC log analysis

• JVM parameters tuning

• OS Level:

• Linux perf

• Delay Accounting

• SystemTap

Page 23

e.g., Investigating slow sendfile(2)

• Observe sendfile syscall’s duration

• => found that sendfile is blocking Kafka’s event-loop

• => patch Kafka to eliminate blocking sendfile

stap -e '
...
probe syscall.sendfile { d[tid()] = gettimeofday_us() }
probe syscall.sendfile.return {
  if (d[tid()]) {
    st <<< gettimeofday_us() - d[tid()]
    delete d[tid()]
  }
}
probe end { print(@hist_log(st)) }
'

Page 24

and we contribute it back

Page 25

More interested?

• Kafka Summit SF 2017

• One Day, One Data Hub, 100 Billion Messages: Kafka at LINE

• https://youtu.be/X1zwbmLYPZg

• Google “kafka summit line”

Page 26

Summary

• Large scale + high reliability = difficult and exciting engineering!

• LINE’s architecture will keep evolving with OSS

• … and there are more challenges

• Multi-IDC deployment

• More performance and reliability improvements

Page 27

End of presentation. Any questions?