Kafka Connect: Real-time Data Integration at Scale with Apache Kafka
By Ewen Cheslack-Postava

Upload: confluent

Post on 16-Apr-2017



Data Integration: getting data to all the right places

Introducing Kafka Connect: large-scale streaming data import/export for Kafka

Delivery Guarantees

Offsets automatically committed and restored

On restart: task checks offsets & rewinds

At-least-once delivery – flush data, then commit

Exactly-once for connectors that support it (e.g. HDFS)
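
The "flush data, then commit" ordering can be illustrated with a small simulation. This is a minimal sketch, not Kafka Connect's actual API: `SinkTaskSketch`, `run`, and the batch size of two are hypothetical names and choices made for illustration only. It shows why a crash between flush and commit produces duplicates after the rewind, but never loses a record.

```python
from dataclasses import dataclass, field


@dataclass
class SinkTaskSketch:
    """Toy sink task (hypothetical): buffer records, flush, then commit."""
    destination: list = field(default_factory=list)  # stands in for the external system
    committed_offset: int = 0                         # where to resume after a restart
    buffer: list = field(default_factory=list)

    def put(self, offset, record):
        self.buffer.append((offset, record))

    def flush(self):
        # Step 1: write buffered data to the destination.
        self.destination.extend(record for _, record in self.buffer)

    def commit(self):
        # Step 2: only after the flush, record how far we got.
        if self.buffer:
            self.committed_offset = self.buffer[-1][0] + 1
        self.buffer.clear()


def run(task, log, crash_before_commit_at=None):
    """Process the log from the last committed offset, in batches of two."""
    start = task.committed_offset  # on restart: check offsets and rewind
    task.buffer.clear()            # in-memory state is lost across a restart
    for offset in range(start, len(log)):
        task.put(offset, log[offset])
        if len(task.buffer) == 2 or offset == len(log) - 1:
            task.flush()
            if crash_before_commit_at == offset:
                return False       # simulated crash between flush and commit
            task.commit()
    return True


log = ["a", "b", "c", "d", "e"]
task = SinkTaskSketch()
run(task, log, crash_before_commit_at=3)  # crash after flushing "c" and "d"
run(task, log)                            # restart resumes from offset 2
print(task.destination)
```

In this run "c" and "d" are written twice because their offsets were never committed before the crash. A connector that supports exactly-once has to make the write and the offset update effectively atomic, or make repeated writes idempotent.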

Converters

Abstract serialization: 1 connector, many serialization formats

Convert between Kafka Connect Data API (Connectors) and serialized bytes (Kafka)

JSON and Avro are currently well supported
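
The separation between connector and serialization format can be sketched as follows. This is an illustration under stated assumptions, not the Connect API: the method names `from_connect_data` and `to_connect_data` are loosely modeled on the idea of a converter and are simplified, and `CompressedJsonConverter` is a hypothetical stand-in for a binary format such as Avro, used only because it needs nothing beyond the standard library.

```python
import json
import zlib


class JsonConverter:
    """Serializes records as UTF-8 JSON bytes."""

    def from_connect_data(self, value):
        return json.dumps(value, sort_keys=True).encode("utf-8")

    def to_connect_data(self, data):
        return json.loads(data.decode("utf-8"))


class CompressedJsonConverter:
    """Hypothetical stand-in for a binary format such as Avro."""

    def from_connect_data(self, value):
        return zlib.compress(json.dumps(value, sort_keys=True).encode("utf-8"))

    def to_connect_data(self, data):
        return json.loads(zlib.decompress(data).decode("utf-8"))


def source_connector_poll():
    """Connector code: emits structured records and never touches bytes."""
    return [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]


def run_pipeline(converter):
    # The framework, not the connector, applies the configured converter.
    serialized = [converter.from_connect_data(r) for r in source_connector_poll()]
    return [converter.to_connect_data(b) for b in serialized]


# The same connector code works unchanged with either format.
for converter in (JsonConverter(), CompressedJsonConverter()):
    print(type(converter).__name__, run_pipeline(converter))
```

Swapping the format is a matter of choosing a different converter; the connector function itself is not modified.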

Connectors Today

Confluent Open Source – HDFS, JDBC

Connector Hub: connectors.confluent.io

Examples: MySQL, MongoDB, Twitter, Solr, S3, MQTT, Bloomberg, Apache Ignite, and more

Connectors from the Hackathon

Jenkins connector – Aravind Yarram (Equifax)

Twitter semantic analysis and visualization – Ashish Singh (Cloudera)

Brain monitoring device connector – Silicon Valley Data Science

DynamoDB, Cassandra, Slack, Splunk, and many more

Coming soon…

Improved connector control via REST API, standardized configs, metrics

Single record transformations

Data pipelines in an app - embedded mode & Kafka Streams integration

Many more connectors
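
The idea of single record transformations, small functions applied to one record at a time between connector and Kafka, might look like the sketch below. The slides do not describe the eventual API, so everything here is an assumption: `mask_field`, `rename_field`, and `apply_chain` are hypothetical helpers, and records are modeled as plain dictionaries.

```python
def mask_field(name):
    """Return a transformation that hides the value of one field."""
    def transform(record):
        if name not in record:
            return record
        return {**record, name: "****"}
    return transform


def rename_field(old, new):
    """Return a transformation that renames one field."""
    def transform(record):
        if old not in record:
            return record
        renamed = {k: v for k, v in record.items() if k != old}
        renamed[new] = record[old]
        return renamed
    return transform


def apply_chain(record, transforms):
    # Each transformation sees exactly one record and returns a new one;
    # no state is shared across records.
    for transform in transforms:
        record = transform(record)
    return record


chain = [mask_field("ssn"), rename_field("ts", "timestamp")]
record = {"id": 7, "ssn": "123-45-6789", "ts": 1492300800}
print(apply_chain(record, chain))
```

Because each step is stateless and works on a single record, steps can be chained by configuration. Anything that needs to look across records (joins, aggregations) belongs in a stream processor rather than in such a chain.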

THANK YOU

@ewencp @confluentinc

Try it out: http://confluent.io/download

More like this, but in blog form: http://confluent.io/blog