Kafka Connect: Real-time Data Integration at Scale with Apache Kafka, Ewen Cheslack-Postava

Post on 16-Apr-2017

Kafka Connect: Real-time Data Integration at Scale with Apache Kafka

By Ewen Cheslack-Postava

Data Integration: getting data to all the right places

Introducing Kafka Connect: large-scale streaming data import/export for Kafka

Delivery Guarantees

Offsets automatically committed and restored

On restart: task checks offsets & rewinds

At least once delivery – flush data, then commit

Exactly once for connectors that support it (e.g. HDFS)
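
The "flush data, then commit" ordering can be illustrated with a small simulation. This is only a sketch in plain Python, not the Kafka Connect API: `FakeSink`, `SketchSinkTask`, and `run` are hypothetical names, and the "log" is an in-memory list standing in for a Kafka partition. It shows why a crash between flush and commit causes redelivery (duplicates) rather than data loss.

```python
class FakeSink:
    """Hypothetical stand-in for an external system; not a Kafka Connect class."""

    def __init__(self):
        self.written = []

    def flush(self, records):
        self.written.extend(records)


class SketchSinkTask:
    """Hypothetical task that buffers records, flushes them, then commits the offset."""

    def __init__(self, sink, committed_offset):
        self.sink = sink
        self.committed_offset = committed_offset  # where a restart resumes reading
        self.position = committed_offset
        self.buffer = []

    def put(self, offset, value):
        self.buffer.append((offset, value))
        self.position = offset + 1

    def flush_and_commit(self, crash_before_commit=False):
        # Step 1: make the data durable in the external system.
        self.sink.flush(self.buffer)
        self.buffer = []
        if crash_before_commit:
            raise RuntimeError("simulated crash after flush, before commit")
        # Step 2: only now record progress.
        self.committed_offset = self.position


def run(log, sink, start_offset, crash_before_commit=False):
    """Process the log from start_offset and return the last committed offset."""
    task = SketchSinkTask(sink, start_offset)
    for offset in range(start_offset, len(log)):
        task.put(offset, log[offset])
    try:
        task.flush_and_commit(crash_before_commit)
    except RuntimeError:
        pass  # the commit never happened, so the old offset is still in effect
    return task.committed_offset


log = ["a", "b", "c"]
sink = FakeSink()
committed = run(log, sink, 0, crash_before_commit=True)  # flushed, not committed
committed = run(log, sink, committed)                    # restart rewinds to offset 0
delivered = [value for _, value in sink.written]
```

After the simulated restart every record has reached the sink at least once, and some twice. A connector aiming for exactly-once delivery would additionally need the sink to deduplicate or to store offsets atomically with the data; that part is not modeled here.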

Converters

Abstract serialization: 1 connector, many serialization formats

Convert between Kafka Connect Data API (Connectors) and serialized bytes (Kafka)

JSON and Avro are currently well supported
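
The idea of "1 connector, many serialization formats" can be sketched as follows. The real converter interface is Java; the classes below (`SketchJsonConverter`, `SketchCompressedConverter`) and the `connector_emit` function are hypothetical Python stand-ins, and the schema/payload envelope is a simplified assumption rather than the exact wire format. The point is that the connector code produces structured data once and never touches bytes.

```python
import json
import zlib


class SketchJsonConverter:
    """Hypothetical converter: structured data <-> JSON bytes."""

    def from_connect_data(self, schema, value):
        envelope = {"schema": schema, "payload": value}
        return json.dumps(envelope, sort_keys=True).encode("utf-8")

    def to_connect_data(self, data):
        envelope = json.loads(data.decode("utf-8"))
        return envelope["schema"], envelope["payload"]


class SketchCompressedConverter(SketchJsonConverter):
    """Hypothetical second format: the same envelope, zlib-compressed."""

    def from_connect_data(self, schema, value):
        return zlib.compress(super().from_connect_data(schema, value))

    def to_connect_data(self, data):
        return super().to_connect_data(zlib.decompress(data))


def connector_emit(converter):
    """Connector-side code: builds structured data and delegates serialization."""
    schema = {"type": "struct", "fields": [{"field": "id", "type": "int32"}]}
    value = {"id": 42}
    return converter.from_connect_data(schema, value)


round_trips = []
for converter in (SketchJsonConverter(), SketchCompressedConverter()):
    data = connector_emit(converter)          # bytes as they would be stored in Kafka
    schema, value = converter.to_connect_data(data)
    round_trips.append((type(data), schema, value))
```

Swapping the converter changes the bytes written but not the connector, which is the separation the slide describes.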

Connectors Today

Confluent Open Source – HDFS, JDBC

Connector Hub: connectors.confluent.io

Examples: MySQL, MongoDB, Twitter, Solr, S3, MQTT, Bloomberg, Apache Ignite, and more

Connectors from the Hackathon

Jenkins connector – Aravind Yarram (Equifax)

Twitter semantic analysis and visualization – Ashish Singh (Cloudera)

Brain monitoring device connector – Silicon Valley Data Science

DynamoDB, Cassandra, Slack, Splunk, and many more

Coming soon…

Improved connector control via REST API, standardized configs, metrics

Single record transformations

Data pipelines in an app - embedded mode & Kafka Streams integration

Many more connectors
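
The slides only name "single record transformations" as an upcoming feature, so the following is a guess at the general shape of the idea, not the eventual API: each transformation is a function from one record to one record, and a chain applies them in order between the connector and Kafka. All names here (`mask_field`, `rename_field`, `apply_chain`) are hypothetical.

```python
def mask_field(field):
    """Build a transform that hides the value of one field, if present."""
    def transform(record):
        updated = dict(record)  # copy so the input record is left untouched
        if field in updated:
            updated[field] = "****"
        return updated
    return transform


def rename_field(old, new):
    """Build a transform that renames one field, if present."""
    def transform(record):
        updated = dict(record)
        if old in updated:
            updated[new] = updated.pop(old)
        return updated
    return transform


def apply_chain(record, transforms):
    """Apply each single-record transform in order and return the result."""
    for transform in transforms:
        record = transform(record)
    return record


chain = [mask_field("ssn"), rename_field("uid", "user_id")]
original = {"uid": 7, "ssn": "123-45-6789"}
result = apply_chain(original, chain)
```

Because each step sees exactly one record and keeps no state, such transformations stay lightweight; anything that needs joins or aggregation across records would belong in a stream processor instead.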

THANK YOU

@ewencp @confluentinc

Try it out: http://confluent.io/download

More like this, but in blog form: http://confluent.io/blog
