Kafka y Python

17/05/2016

Upload: paradigma-digital


TRANSCRIPT

Page 1: Kafka y python

Kafka y Python

17/05/2016

Page 2: Kafka y python

Python Madrid · Python y Kafka

Kafka y python

Page 3: Kafka y python


Who am I?

Software Engineer @Paradigma Digital

@lvaroleonaleonsan

Page 4: Kafka y python


Kafka \ Origin

2011 - Open source

2012 - Apache Incubator graduation

2014 - Confluent ($6.9M)

Page 5: Kafka y python


Kafka \ What is it?

If you think of Hadoop as long-term memory, the question then is how you get the memories in there to begin with.

Apache Kafka is like the central nervous system, which collects all of these messages from the underlying systems and transmits them into the memory vault, or storage.

- Eric Vishria

Page 6: Kafka y python


Kafka \ Motivation

To be able to act as a unified platform for handling all the real-time data feeds a large company might have.

Event Tracking

Application Logs

Application Messages

Application Monitoring Data

Page 7: Kafka y python


Kafka \ How to?

● Distributed, the essence
● Scalable
● Efficient
● Durable, fault tolerant

Page 8: Kafka y python


Kafka \ Basics

[Diagram: producers (P) publishing to the Kafka Cluster, consumers (C) reading from it]

● Producers
● Brokers
● Consumers

Page 9: Kafka y python


Kafka \ Cluster: Topics & Partitions

● Topics
● Partitions
● Message

[Diagram: Kafka cluster with topics T1 and T2, each split into partitions (T1P0, T1P1, T2P0, T2P1) holding ordered, numbered messages]

Page 10: Kafka y python


Kafka \ Partitions & Replication

● Replication factor
○ Leader
○ Followers
● ISR
○ In-sync policies

[Diagram: the same partition replicated across Broker 1 and Broker 2, one replica acting as leader and the other as follower]

Page 11: Kafka y python


Kafka \ Producers

● Publish messages
● Choose partitions
○ policies
● Producer configuration (example below)
○ ACKs
○ Retries
○ Batch size
○ ...
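A hedged sketch of those producer settings with kafka-python (broker address, topic name and values are assumptions, not from the slides):

# Illustrative kafka-python producer configuration
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',  # assumed local broker
    acks='all',        # wait until all in-sync replicas acknowledge the write
    retries=3,         # retry transient send failures
    batch_size=16384,  # bytes buffered per partition before a batch is sent
)

# Partition choice: explicit, by key hash, or round-robin (the default policies).
future = producer.send('events', key=b'user-42', value=b'hello')
metadata = future.get(timeout=10)  # block until the broker acknowledges
print(metadata.topic, metadata.partition, metadata.offset)
producer.flush()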

Page 12: Kafka y python


Kafka \ Consumers

● “Subscribe” to a feed
● Consumer groups (example below)
○ Queue
○ Publish-subscribe
● Order guarantees

[Diagram: Kafka cluster with Partition 0 on Broker 1 and Partition 1 on Broker 2, read by consumers in two consumer groups]
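A minimal consumer-group sketch with kafka-python (topic, group and broker names are assumptions); consumers sharing a group_id split the partitions among themselves (queue semantics), while distinct group_ids each receive every message (publish-subscribe):

# Illustrative kafka-python consumer in a group
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'events',                            # "subscribe" to a feed
    bootstrap_servers='localhost:9092',
    group_id='demo-group',               # members of the same group share the partitions
    auto_offset_reset='earliest',        # start from the oldest retained message if no offset is committed
)

# Ordering is guaranteed per partition, not across the whole topic.
for record in consumer:
    print(record.partition, record.offset, record.value)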

Page 13: Kafka y python


Kafka \ Efficiency

● Small I/O problem
○ Message sets
● Message set compression (example below)
○ policies
● Standard binary message format
○ Transfer without modifications
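A hedged sketch of the batching and compression knobs exposed by kafka-python (the values are illustrative, not from the slides):

# Batching and compression settings in kafka-python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    linger_ms=50,             # wait up to 50 ms so small messages are grouped into one message set
    batch_size=32768,         # larger batches amortise the small-I/O cost
    compression_type='gzip',  # the whole message set is compressed, not each message individually
)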

Page 14: Kafka y python


Kafka \ Python Clients

● kafka-python
○ Kafka 0.8+, 0.9 recommended
○ Python 2.6+
○ Python 3.3+

https://github.com/dpkp/kafka-python

● pykafka
○ Kafka 0.8.2+
○ Python 2.7+
○ Python 3.4+

https://github.com/Parsely/pykafka

Page 15: Kafka y python


Kafka \ Python Clients \ Kafka-python

● Producer

class kafka.KafkaProducer:
    def __init__(self, **configs)
    def send(self, topic, value=None, key=None, partition=None)
    def flush(self, timeout=None)

● class RecordAccumulator
● class Partitioner
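A possible usage sketch for the methods listed above (topic name and serializer are assumptions, not from the slides):

# Sending records with kafka-python's KafkaProducer
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

# send() is asynchronous: records go through the internal RecordAccumulator and are batched per partition.
producer.send('events', value={'user': 42, 'action': 'click'})
producer.send('events', value={'user': 7, 'action': 'view'}, partition=0)  # explicit partition
producer.flush(timeout=30)  # block until buffered records have been sent (or the timeout expires)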

Page 16: Kafka y python


Kafka \ Python Clients \ Kafka-python

● Consumer
○ message iterator

class kafka.KafkaConsumer(six.Iterator):
    def __init__(self, *topics, **configs)
    def __next__(self)
    def subscribe(self, topics=(), pattern=None, listener=None)
    def unsubscribe(self)
    def assign(self, partitions)
    def seek(self, partition, offset)
    def commit(self, offsets=None)
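A possible usage sketch for the consumer methods above (topic, group and offsets are assumptions, not from the slides):

# Subscribing, seeking and committing with kafka-python's KafkaConsumer
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    group_id='demo-group',
    enable_auto_commit=False,  # offsets are committed manually with commit()
)

# Option A: dynamic subscription; the group coordinator assigns partitions.
consumer.subscribe(topics=['events'])

# Option B (instead of A): manual assignment plus an explicit starting offset.
# consumer.assign([TopicPartition('events', 0)])
# consumer.seek(TopicPartition('events', 0), 10)

for record in consumer:    # __next__() drives the message iterator
    print(record.offset, record.value)
    consumer.commit()      # commit the offsets consumed so far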

Page 17: Kafka y python


Kafka \ Python Clients \ Kafka-python

● Cluster
○ client manages some cluster metadata

class kafka.ClusterMetadata:
    def __init__(self, **configs)
    def available_partitions_for_topic(self, topic)
    def leader_for_partition(self, partition)
    def partitions_for_broker(self, broker_id)
    def update_metadata(self, metadata)

● ConsumerCoordinator
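ClusterMetadata is used internally by the client; a hedged way to peek at the same information from user code is the public consumer API (broker and topic names are assumptions):

# Inspecting cluster metadata through kafka-python's public API
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers='localhost:9092')
print(consumer.topics())                        # topics known to the cluster
print(consumer.partitions_for_topic('events'))  # set of partition ids, e.g. {0, 1}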

Page 18: Kafka y python


Kafka \ Python Clients \ pyKafka

● Producer

class pykafka.Producer:
    def __init__(self, ...)
    def produce(self, message, partition_key=None)

● Consumer

class pykafka.SimpleConsumer:
    def __init__(self, ...)
    def consume(self, block=True)
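A minimal pykafka sketch exercising the two classes above (host and topic name are assumptions, not from the slides):

# Producing and consuming one message with pykafka
from pykafka import KafkaClient

client = KafkaClient(hosts='127.0.0.1:9092')
topic = client.topics[b'events']

# Producer.produce(message, partition_key=None) expects bytes.
with topic.get_sync_producer() as producer:
    producer.produce(b'hello from pykafka')

# SimpleConsumer.consume(block=True) returns one message at a time.
consumer = topic.get_simple_consumer()
message = consumer.consume(block=True)
print(message.offset, message.value)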

Page 19: Kafka y python


Kafka \ Python Clients \ pyKafka

● Example

client = pykafka.KafkaClient(. . .)
topic = client.topics[. . .]
producer = topic.get_sync_producer()

. . .

consumer = topic.get_simple_consumer()
for message in consumer:
    print(message.offset, message.value)

Page 20: Kafka y python


Kafka \ Python Clients \ Demo

Demo Time

Page 21: Kafka y python


Kafka y Python \ Thanks

Thank you for your attention

Page 22: Kafka y python


Kafka y Python \ Questions

¿ ?

Page 23: Kafka y python


Kafka \ The Cluster