Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion messages per day


Post on 17-Mar-2018


TRANSCRIPT

  • Building a company-wide data pipeline on Apache Kafka - engineering for 150 billion

    messages per day

    Yuto Kawamura

    LINE Corp

  • Speaker introduction

    Yuto Kawamura

    Senior software engineer of LINE server development

    Apache Kafka contributor

Joined: Apr 2015 (about 3 years ago)

  • About LINE: Messaging service

    More than 200 million active users¹ in countries with top market share like Japan, Taiwan and Thailand

    Many family services

    News

    Music

    LIVE (Video streaming)

    ¹ As of June 2017. Sum of 4 countries: Japan, Taiwan, Thailand and Indonesia.

  • Agenda

    Introducing LINE server

    Data pipeline w/ Apache Kafka

  • LINE Server Engineering is about

    Scalability

    Many users, many requests, many data

    Reliability

    LINE is already a communication infrastructure in multiple countries

  • Scale metric: message delivery

    LINE Server

    25 billion messages/day (API calls: 80 billion/day)

  • Scale metric: Accumulated data (for analysis)

    40PB

  • Messaging System Architecture Overview

    LINE Apps

    LEGY JP

    LEGY DE

    LEGY SG

    Thrift RPC/HTTP

    talk-server

    Distributed Data Store

    Distributed async task processing

  • LEGY LINE Event Delivery Gateway

    API Gateway/Reverse Proxy

    Written in Erlang

    Deployed to many data centers all over the world

    Features focused on needs of implementing a messaging service

    Zero latency code hot swapping w/o closing client connections

    Durability thanks to Erlang processes and message passing

    A single instance holds 100K+ connections => a single-instance failure has a huge impact

  • talk-server

    Java based web application server

    Implements most of messaging functionality + some other features

    Java8 + Spring + Thrift RPC + Tomcat8

  • Datastore with Redis and HBase

    LINE's hybrid datastore = Redis (in-memory DB, home-brew clustering) + HBase (persistent distributed key-value store)

    Cascading failure handling

    Async write in app

    Async write from background task processor

    Data correction batch

    Primary/Backup

    talk-server

    Cache/Primary

    Dual write
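    The dual-write path above can be sketched as follows; this is an illustrative in-memory model, not LINE's actual code (names such as `DualWriteStore` are invented). Maps stand in for Redis (cache/primary) and HBase (backup); failed backup writes are handed to a background retry queue, matching the "async write from background task processor" and "data correction batch" bullets:

    ```java
    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative sketch of the Redis (cache/primary) + HBase (backup) dual write.
    // Real LINE code differs; in-memory maps stand in for the actual stores.
    class DualWriteStore {
        final Map<String, String> primary = new ConcurrentHashMap<>(); // Redis role
        final Map<String, String> backup = new ConcurrentHashMap<>();  // HBase role
        // Failed backup writes are queued for the background task processor;
        // a periodic data-correction batch would cover anything it misses.
        final Queue<String[]> retryQueue = new ConcurrentLinkedQueue<>();
        volatile boolean backupHealthy = true;

        void put(String key, String value) {
            primary.put(key, value);           // synchronous write to the primary
            try {
                if (!backupHealthy) throw new RuntimeException("backup down");
                backup.put(key, value);        // async in reality; inline here
            } catch (RuntimeException e) {
                retryQueue.add(new String[] {key, value}); // hand off, don't fail the request
            }
        }

        void drainRetries() {                  // the background task processor's job
            String[] kv;
            while ((kv = retryQueue.poll()) != null) backup.put(kv[0], kv[1]);
        }
    }
    ```

    The point of the design is that a backup-store outage degrades durability temporarily but never fails the user-facing write.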

  • Message Delivery

    LEGY

    talk-server

    Storage

    1. Find nearest LEGY

    2. sendMessage(Bob, "Hello!")

    3. Proxy request

    4. Write to storage

    5. Notify message arrival

    X. fetchOps() (long polling)

    6. Proxy request

    7. Read message

    8. Return fetchOps() with message

    Alice

    Bob
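    The fetchOps() step works as long polling: Bob's client keeps the call open and the server completes it when a new operation (message) arrives. A minimal in-memory sketch of that interaction, with invented names (`MessageDelivery` is not the real API) and per-user blocking queues standing in for storage plus the LEGY notification path:

    ```java
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Minimal long-polling sketch: sendMessage stores the message and wakes the
    // recipient's pending fetchOps() call.
    class MessageDelivery {
        final Map<String, BlockingQueue<String>> inbox = new ConcurrentHashMap<>();

        void sendMessage(String to, String text) {            // steps 2-4
            inbox.computeIfAbsent(to, k -> new LinkedBlockingQueue<>()).add(text);
        }

        // Blocks up to timeoutMs waiting for a new operation (steps X, 6-8);
        // returns null on timeout, after which the client re-issues the call.
        String fetchOps(String user, long timeoutMs) {
            try {
                return inbox.computeIfAbsent(user, k -> new LinkedBlockingQueue<>())
                            .poll(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
    ```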

  • There's a lot of internal communication involved in processing a user's request

    talk-server

    Threat detection system

    Timeline Server

    Data Analysis

    Background task processing

    Request

  • Communication between internal systems

    Communication for querying, transactional updates:

    Query authentication/permission

    Synchronous updates

    Communication for data synchronization, update notification:

    Notify user relationship updates

    Synchronize data updates with another service

    talk-server

    Auth

    Analytics

    Another Service

    HTTP/REST/RPC

  • Apache Kafka

    A distributed streaming platform

    (narrow sense) A distributed persistent message queue which supports Pub-Sub model

    Built-in load distribution

    Built-in fail-over on both server (broker) and client

  • How it works

    Producer

    Brokers

    Consumer

    Topic

    Topic

    Consumer

    Consumer

    Producer

    AuthEvent event = AuthEvent.newBuilder()
        .setUserId(123)
        .setEventType(AuthEventType.REGISTER)
        .build();
    producer.send(new ProducerRecord<>("events", userId, event));

    consumer = new KafkaConsumer<>(props); // props: "group.id" -> "group-A"
    consumer.subscribe(Collections.singletonList("events"));
    consumer.poll(100); // => Record(key=123, value=...)

  • Consumer GroupA

    Pub-Sub

    Brokers

    Consumer

    Topic

    Topic

    Consumer

    Consumer GroupB

    Consumer

    Records[A, B, C]

    Records[A, B, C]

    Multiple consumer groups can independently consume a single topic
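    The independent-consumption property can be illustrated with a toy single-partition model (my sketch, not Kafka's implementation): the topic is an append-only log, and each consumer group tracks its own offset, so every group sees every record exactly once:

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy model of Kafka's pub-sub semantics: an append-only log plus a
    // per-group committed offset. (One partition; real Kafka shards the log
    // and distributes partitions across a group's consumers.)
    class TopicModel {
        final List<String> log = new ArrayList<>();
        final Map<String, Integer> groupOffsets = new HashMap<>();

        void produce(String record) { log.add(record); }

        List<String> poll(String groupId) {
            int offset = groupOffsets.getOrDefault(groupId, 0);
            List<String> records = new ArrayList<>(log.subList(offset, log.size()));
            groupOffsets.put(groupId, log.size()); // commit: advances this group only
            return records;
        }
    }
    ```

    After producing A, B, C, both group-A and group-B receive Records[A, B, C]; a repeated poll by the same group returns nothing new, since consumption only moves that group's own offset.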

  • Example: UserActivityEvent

  • Scale metric: Events produced into Kafka

    Service Service

    Service

    Service

    Service

    Service

    150 billion msgs / day

    (3 million msgs / sec)
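    As a sanity check on these figures (my arithmetic, not from the talk): 150 billion messages per day averages out to about 1.7 million messages per second, so the quoted 3 million msgs/sec presumably refers to peak rather than average rate:

    ```java
    // Average per-second rate implied by a per-day message count.
    class RateCheck {
        static long avgPerSec(long msgsPerDay) {
            return msgsPerDay / 86_400L; // seconds in a day
        }
    }
    ```

    avgPerSec(150_000_000_000L) gives 1,736,111, i.e. roughly 1.7M msgs/sec on average versus the 3M msgs/sec peak figure.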

  • Our Kafka needs to be highly performant

    Some usages are sensitive to delivery latency

    Broker latency impacts throughput as well,

    because a Kafka topic is a queue
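    Why latency caps throughput in a queue can be seen via Little's law (my illustration, not from the talk): a partition is consumed in order, so with a bounded number of in-flight requests, sustainable throughput is roughly the in-flight count divided by per-request latency:

    ```java
    // Little's law sketch: ordered (queue) consumption with a bounded number of
    // in-flight requests sustains at most inFlight / latency messages per second.
    class QueueThroughput {
        static double maxRate(int inFlight, double latencySeconds) {
            return inFlight / latencySeconds;
        }
    }
    ```

    With 5 in-flight requests, 1 ms of broker latency allows about 5,000 requests/sec per partition; doubling latency to 2 ms halves that, which is why a GC pause on a broker shows up as a throughput drop, not just a latency blip.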

  • wasn't a built-in property

    KAFKA-4614 Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex


  • Performance Engineering Kafka

    Application Level:

    Read and understand code

    Patch it to eliminate bottleneck

    JVM Level:

    JVM profiling

    GC log analysis

    JVM parameters tuning

    OS Level:

    Linux perf

    Delay Accounting

    SystemTap

  • e.g., Investigating slow sendfile(2)

    Observe sendfile syscall durations

    => found that sendfile was blocking Kafka's event loop

    => patched Kafka to eliminate the blocking sendfile

    stap -e '
      ...
      probe syscall.sendfile { d[tid()] = gettimeofday_us() }
      probe syscall.sendfile.return {
        if (d[tid()]) { st <<< gettimeofday_us() - d[tid()]; delete d[tid()] }
      }
    '
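    The fix described above follows a general pattern: keep potentially blocking I/O off the event-loop thread. A sketch of that pattern in plain Java (illustrative only, not Kafka's actual patch; the names are invented):

    ```java
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Illustrative pattern only: the event loop hands potentially blocking work
    // (e.g. a sendfile that must fault pages in from disk) to a dedicated pool
    // and completes the response asynchronously, so other connections on the
    // same event-loop thread are never stalled behind one slow disk read.
    class NonBlockingEventLoop {
        final ExecutorService ioPool = Executors.newFixedThreadPool(4);

        CompletableFuture<String> handle(String request) {
            // The event-loop thread only schedules; the blocking read runs elsewhere.
            return CompletableFuture.supplyAsync(() -> blockingRead(request), ioPool);
        }

        String blockingRead(String request) { return "data:" + request; }
    }
    ```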

  • and we contribute it back

  • More interested?

    Kafka Summit SF 2017

    One Day, One Data Hub, 100 Billion Messages: Kafka at LINE

    https://youtu.be/X1zwbmLYPZg

    Google "kafka summit line"


  • Summary: Large scale + high reliability = difficult and exciting

    engineering!

    LINE's architecture will keep evolving with OSS,

    and there are more challenges:

    Multi-IDC deployment

    More and more performance and reliability improvements

  • End of presentation. Any questions?