Big Data Logging Pipeline with Apache Spark and Kafka

Post on 16-Apr-2017

Category:

Data & Analytics

TRANSCRIPT

  • Shipping YaaS logs with Apache Spark and Kafka

    Dogukan Sonmez

    Senior Software Engineer @hybris Software

    @dogukansonmez

  • Agenda

    Introduction to YaaS

    Architecture of Logging pipeline

    Technology behind logging pipeline

    Challenges

    Recap

    Q&A

  • What is YaaS

  • SAP hybris as a Service (YaaS)

    A micro-service based Business PaaS

    Integrated with hybris and SAP Solutions

    Build

    Publish

    Fast

  • yaas.io

  • Architecture of Logging pipeline

  • Technology behind logging pipeline

    High Throughput messaging

    Broker

    Distributed

    Scalable

    Fault Tolerant

    Topic

    Partition

    Replicated

    Offset
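The topic, partition, and offset terms on this slide can be illustrated with a toy in-memory model. This is only a sketch of the concepts, not of Kafka's implementation: the class name `TopicSketch` is hypothetical, and `crc32` stands in for the partitioner's hash purely to get a stable result.

```python
import zlib


class TopicSketch:
    """Toy model of one topic: a fixed number of append-only partition logs."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition.
        # crc32 is an assumption made for this sketch; Kafka's default
        # partitioner uses a different hash function.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        # The offset is the record's position within its partition's log.
        return partition, len(log) - 1

    def read(self, partition, offset):
        # Consumers read forward from an offset; reading removes nothing.
        return self.partitions[partition][offset:]
```

Because reading leaves the log untouched, a consumer only needs to remember an offset per partition to resume where it stopped.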

  • Technology behind logging pipeline

    Micro Batching

    RDD

    Streaming

    DAG

    Reliable

    ML

    Scalable

    Graph

    Fast
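Micro batching means the stream is cut into small fixed-interval batches, and each batch is then processed as an ordinary dataset. The plain-Python sketch below shows only that grouping step; it does not use Spark, and the function name and the `(timestamp_seconds, line)` record shape are assumptions made for illustration.

```python
from itertools import groupby


def micro_batches(records, batch_interval):
    """Group (timestamp_seconds, line) records into fixed-interval batches.

    Yields (window_start_seconds, [lines]) pairs in time order.
    """
    ordered = sorted(records, key=lambda r: r[0])
    windows = groupby(ordered, key=lambda r: int(r[0] // batch_interval))
    for window, group in windows:
        yield window * batch_interval, [line for _, line in group]
```

With a 5 second interval, records stamped 0.5, 1.2 and 4.9 fall into the batch starting at 0, and a record stamped 5.0 opens the next batch.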

  • Big Data pipeline challenges

    Reliability of Kafka

    BEFORE

    - 3 Brokers

    - 3 Zookeeper instances

    - default.replication.factor=2

    - Mainly with Default Configurations

    AFTER

    - 5 Brokers

    - 5 Zookeeper instances

    - unclean.leader.election.enable=false

    - min.insync.replicas=2

    - default.replication.factor=3
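The effect of these numbers can be worked out with two small rules: a ZooKeeper ensemble stays available while a strict majority of its nodes is up, and a producer using `acks=all` has its writes accepted only while the in-sync replica count is at least `min.insync.replicas`. The sketch below encodes just that arithmetic. The function names are hypothetical, and the slides do not say which `acks` setting the producers used, so `acks=all` is an assumption.

```python
def zookeeper_failures_tolerated(ensemble_size):
    # A strict majority must survive, so n nodes tolerate (n - 1) // 2 failures.
    return (ensemble_size - 1) // 2


def writes_survive(replication_factor, min_insync_replicas, failed_replicas):
    # Assumes acks=all producers: a write is accepted only while the number
    # of remaining in-sync replicas is at least min.insync.replicas.
    return replication_factor - failed_replicas >= min_insync_replicas


# AFTER settings from the slide: replication factor 3, min.insync.replicas=2.
after_one_failure = writes_survive(3, 2, failed_replicas=1)
after_two_failures = writes_survive(3, 2, failed_replicas=2)
```

Under these assumptions the AFTER setup keeps accepting writes with one replica down and guarantees every acknowledged write exists on at least two brokers, while the ZooKeeper ensemble tolerates two node failures instead of one.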

  • Big Data pipeline challenges

    Spark Streaming Checkpointing

    BEFORE

    - Spark checkpointing

    - All RDDs serialized and stored in HDFS

    AFTER

    - Custom Kafka checkpointing

      (Only the latest offset is stored in Kafka)
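The idea of checkpointing only the latest offset can be sketched as follows. This is not the speaker's implementation: `OffsetStore` is a hypothetical in-memory stand-in for wherever the offsets are persisted, and the partition log is a plain list. The offset is saved only after the whole batch has been handled, so a crash causes reprocessing rather than data loss (at-least-once delivery).

```python
class OffsetStore:
    """Hypothetical stand-in for the place where latest offsets are persisted."""

    def __init__(self):
        self._offsets = {}

    def load(self, partition):
        # Start from offset 0 when nothing has been stored yet.
        return self._offsets.get(partition, 0)

    def save(self, partition, next_offset):
        self._offsets[partition] = next_offset


def process_partition(log, partition, store, handle, max_records):
    """Handle up to max_records from the stored offset, then checkpoint."""
    start = store.load(partition)
    batch = log[start:start + max_records]
    for record in batch:
        handle(record)
    # Reached only if every record was handled without an exception.
    store.save(partition, start + len(batch))
    return len(batch)
```

On restart, a new process calling `process_partition` with the same store resumes from the saved offset, with no serialized RDD state to restore.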

  • Big Data pipeline challenges

    Elasticsearch indexing big data

    BEFORE

    - Default mapping

    - index.refresh_interval=1s

    - indices.memory.index_buffer_size=10%

    AFTER

    - Custom mapping with disabled norms

    - Mapping using simple analyzer

    - index.refresh_interval=30s

    - indices.memory.index_buffer_size=30%

    - spark.streaming.kafka.maxRatePerPartition=10000
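A rough sketch of what the AFTER settings could look like as an index-creation body, plus the batch-size cap implied by the Spark rate limit. Several things here are assumptions: the slides do not name an Elasticsearch version, so the mapping uses 5.x-style syntax; the type name `log`, the field name `message`, and the partition count and batch interval in the usage note are hypothetical. `indices.memory.index_buffer_size` is a node-level setting (configured in `elasticsearch.yml`), so it does not appear in the index body.

```python
import json


def index_body(refresh_interval="30s"):
    """Index-creation body: slower refresh, simple analyzer, norms disabled."""
    return {
        "settings": {"index": {"refresh_interval": refresh_interval}},
        "mappings": {
            "log": {  # hypothetical type name
                "properties": {
                    "message": {  # hypothetical field name
                        "type": "text",
                        "analyzer": "simple",
                        "norms": False,
                    }
                }
            }
        },
    }


def max_records_per_batch(max_rate_per_partition, partitions, batch_seconds):
    # maxRatePerPartition is a per-second, per-partition limit, so the cap
    # for one micro batch scales with partition count and batch length.
    return max_rate_per_partition * partitions * batch_seconds


body_json = json.dumps(index_body(), indent=2)
```

For example, with the slide's limit of 10000 and an assumed 6 partitions and 5 second batches, `max_records_per_batch(10000, 6, 5)` gives 300000 records per batch, which bounds how much each batch can push at Elasticsearch.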

  • Recap

  • Q&A

  • https://hackingat.hybris.com