postgresql real time streaming in · sourcing - change data capture is a form of derived event...

26
Real time streaming in PostgreSQL Kaushik Iyer

Upload: others

Post on 24-May-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Real time streaming in PostgreSQL Kaushik Iyer

Page 2: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Agenda- Initial approach- Streaming techniques- Updated pipeline- Observations

Page 3: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Effective replication of data

All the necessary data is in postgresql

JDBC Connector to effectively pull the data

Perform ETL and push to ElasticSearch

Page 4: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Drawbacks

- Snapshot mechanism involved a lot of trial and error.

- A pertinent lag of 5 minutes was present.- Inconsistency in user experience.

- Non computable during bulk loads.

- We were storing relational data as such in a Document store.

Page 5: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Event Sourcing

Page 6: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Components of a Event Sourcing pipeline

Page 7: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture
Page 8: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Change Data Capture

Page 9: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Postgres

Page 10: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Components of a CDC pipeline

Database Capture change Buffer

Log store

Page 11: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

CDC vs Event Sourcing

- Change Data Capture is a form

of derived event sourcing.

- We cannot add event

generators throughout the

platform.

- Change Data Capture is more

flexible.

Page 12: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Available CDC software

- LinkedIn DataBus

- Stitch data

- Qlik data

- Oracle GoldenGate

- Netflix Delta

Page 13: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Debezium

Page 14: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Key features

- Detailed message structure, with a plethora of metadata.

- Requires no change to the schema of tables.

- Robust snapshot mechanisms.

- Built in filters and masking options.

- Monitoring through JMX.

- Embedded variant also available.

Page 15: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Debezium sample Event

Page 16: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

The CDC Pipeline

Page 17: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

PostgreSQL

- Logical Replication

- Publisher

- Subscriber

- Log Sequence numbers

- Replication Slots

- Logical Decoding

- Wal2json

- Pgoutput

Publication

SLOT

Decoder

Sender

Page 18: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Transaction Log

- We are using Apache Kafka as our transaction log.

- Kafka Connect runs Debezium.

- Kafka REST to manage the lifecycle of connector.

Page 19: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Sample Debezium configuration

Page 20: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

PostgreSQL

Kafka

Kafka RESTKafka ConnectDebezium

Elasticsearch

Page 21: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Results and observations

Page 22: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Performance in bulk loads

Page 23: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Performance of logical decoders

Page 24: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Dependency on number of users

Page 25: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Improvement

- The lag is now in the order of ms- Effective decoupling.- High fault tolerance.- Setup details:

- M4.xlarge container - ~ $0.2/hour

Page 26: PostgreSQL Real time streaming in · Sourcing - Change Data Capture is a form of derived event sourcing. - We cannot add event generators throughout the platform. - Change Data Capture

Thank youEmail: [email protected]: KaushikIyer16Twitter: @kaushiiyerLinkedin: www.linkedin.com/in/kaushiiyer