Exponea - Kafka and Hadoop as components of architecture

Kafka and Hadoop as components of architecture - Martin Strycek

Uploaded by martinstrycek on 16-Apr-2017

TRANSCRIPT

Kafka

Kafka is a distributed streaming platform.
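To make "distributed streaming platform" a little more concrete: a Kafka topic is split into partitions, each partition is an append-only log in which every record gets a sequential offset, and consumers read forward from an offset they track themselves. The following is only a toy, single-process model of that idea. `ToyTopic`, `produce` and `consume` are hypothetical names, not Kafka's client API, and there is no replication, retention or networking.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Toy model of a Kafka topic: a fixed number of append-only partitions.
    Illustration only, not Kafka's API."""
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One empty log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> tuple:
        # Records with the same key always land in the same partition, so
        # their relative order is preserved. Kafka's default partitioner
        # hashes keys with murmur2; crc32 is used here only as a stand-in.
        p = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # position of the new record
        return p, offset

    def consume(self, partition: int, from_offset: int) -> list:
        # Reading returns everything from the given offset onwards and
        # does not remove records from the log.
        return self.partitions[partition][from_offset:]
```

For example, two `produce` calls with the key `"customer-42"` on a fresh `ToyTopic(num_partitions=3)` go to the same partition at offsets 0 and 1, and `consume(p, 0)` returns both records in order, however many times it is called.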

Hadoop

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
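Hadoop's classic processing model, MapReduce, runs a map function over every input record, groups the intermediate (key, value) pairs by key (the shuffle), and then runs a reduce function once per key. A minimal single-process sketch of that data flow, using word count as the example; the function names are hypothetical and this is not Hadoop's Java API, which distributes each phase across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

For example, `word_count(["Kafka and Hadoop", "Hadoop and Spark"])` returns `{'and': 2, 'hadoop': 2, 'kafka': 1, 'spark': 1}`.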

How Kafka and Hadoop got into Exponea

● Our in-memory database was super fast, but it kept everything in memory

● Our customers were worried that they would have to pay a lot

● They wanted the freedom to run analyses on all of their data

● We had some trouble processing data

Kafka + MapR

● We were appending data to files containing JSON

○ HDFS is built for write-once-read-many access, so appending to files is poorly supported

● We started using Kafka 0.8.2.1

● We had no idea how to monitor the whole stack
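The slides do not say exactly how the append problem was solved. One common pattern for write-once storage, assuming events are first buffered (for example in Kafka) and then drained in batches, is to write each batch as a new, immutable JSON-lines file instead of appending to an existing one. This local-filesystem sketch shows only that pattern; `write_batch`, `read_all` and the `events-<batch>.jsonl` naming are hypothetical, and on a real cluster the write would go through an HDFS or MapR-FS client.

```python
import json
import os


def write_batch(events, directory, batch_id):
    """Write one batch of events as a new JSON-lines file.

    Mode "x" creates the file and raises FileExistsError if it already
    exists, so a finished file is never appended to or overwritten.
    """
    os.makedirs(directory, exist_ok=True)
    # Zero-padded batch ids keep lexical and numeric order identical.
    path = os.path.join(directory, f"events-{batch_id:08d}.jsonl")
    with open(path, "x", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event, sort_keys=True) + "\n")
    return path


def read_all(directory):
    """Read every batch file back, in batch order."""
    events = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".jsonl"):
            with open(os.path.join(directory, name), encoding="utf-8") as f:
                events.extend(json.loads(line) for line in f)
    return events
```

The trade-off of this design is many smaller files instead of a few growing ones, so batches are usually sized (or later compacted) to avoid flooding the NameNode with tiny files.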

Where we are now

How Kafka and Hadoop got into Exponea

● We are using Kafka to stream data to

● We have our first Spark jobs running as part of the application

○ Recommendation

○ Predictions

○ Campaigns overview

○ Loading data to

Recommendation

● We are using Oryx 2

○ But we need multitenancy

● We have MapR

○ But it ships with a different Spark version than Oryx 2 uses

● We are using Oryx 2

○ But it works with a different version of Kafka
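The slides do not say how multitenancy was handled. One possible approach, shown only as a hypothetical sketch, is to run a separate Oryx 2 instance per tenant, each wired to its own pair of Kafka topics, since Oryx 2's layers communicate through an input topic and an update topic. The naming scheme and the `tenant_topics` helper are assumptions; the validation reflects Kafka's rule that topic names use only ASCII letters, digits, `.`, `_` and `-` and are at most 249 characters long.

```python
import re

# Kafka topic names may only contain ASCII letters, digits, '.', '_' and '-',
# and are limited to 249 characters.
_LEGAL_TOPIC = re.compile(r"^[A-Za-z0-9._-]{1,249}$")


def tenant_topics(tenant_id: str, prefix: str = "oryx") -> dict:
    """Hypothetical naming scheme: one input and one update topic per tenant,
    so each tenant's Oryx 2 instance reads and writes only its own topics."""
    names = {
        "input": f"{prefix}.{tenant_id}.input",
        "update": f"{prefix}.{tenant_id}.update",
    }
    for name in names.values():
        if not _LEGAL_TOPIC.match(name):
            raise ValueError(f"illegal Kafka topic name: {name!r}")
    return names
```

Isolating tenants at the topic level keeps one tenant's data out of another tenant's model, at the cost of running and monitoring one set of Oryx 2 layers per tenant.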

What next?

● How about we use

● How about we create better local storage for

● We need another cluster for testing

○ Bare metal? AWS? Google Cloud?

● library to be usable in

● We want to run a workshop for all of you who want to try it out but don’t have a place to do it.

Exponea Culture

● Freedom & responsibility

● Big impact

● Team

● Proficiency

Global ambitions, best company to work for

Thanks!
