propagate data changes without triggers!

31
Propagate Data Changes Without Triggers! Jonathan S. Katz NYC PostgreSQL User Group December 14, 2017

Upload: others

Post on 15-May-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Propagate Data Changes Without Triggers!

Propagate Data Changes Without Triggers!

Jonathan S. KatzNYC PostgreSQL User Group

December 14, 2017

Page 2: Propagate Data Changes Without Triggers!

About• CTO, VenueBook

• Co-Organizer, NYC PostgreSQL User Group (NYCPUG)

• Director, United States PostgreSQL Association

• Volunteer on postgresql.org

• Co-Founder, PGConf US

• @jkatz05

2

Page 3: Propagate Data Changes Without Triggers!

Imagine...

• We manage the space at Workbench!

• We have a set of operating hours where we can book events

• Only one event can be booked at a time

3

Page 4: Propagate Data Changes Without Triggers!

The Problem: Availability

12/14/2017 00:00 12/15/2017 00:00

4

Page 5: Propagate Data Changes Without Triggers!

The Problem: Availability

12/14/2017 00:00 12/15/2017 00:00

12/14/2017 00:00 12/15/2017 00:00

12/14/2017 00:00 12/15/2017 00:00

12/14/2017 00:00 12/15/2017 00:00

5

Page 6: Propagate Data Changes Without Triggers!

Easy, Right?

6

Page 7: Propagate Data Changes Without Triggers!

But...• Availability

• Just for one day - what about other days?

• What happens with data in the past?

• What happens with data in the future?

• Unavailability

• Ensure no double-bookings

• Overlapping Events?

• Just one space

7

Page 8: Propagate Data Changes Without Triggers!

🤔

8

Page 9: Propagate Data Changes Without Triggers!

Managing Availability

• Can create rules that can generate availability

9

availability_ruleid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)days_of_week <int[]>start_time <time>end_time <time>generate_weeks_into_future <int> DEFAULT 52

roomid <serial> PRIMARY KEYname <int><text>

Page 10: Propagate Data Changes Without Triggers!

Managing Availability

• The rules can then determines what the availability is for a given date

10

availability_ruleid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)days_of_week <int[]>start_time <time>end_time <time>generate_weeks_into_future <int> DEFAULT 52

roomid <serial> PRIMARY KEYname <int>

availabilityid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)availability_rule_id <int> REFERENCES (availabilityrule)available_date <date>available_range <tstzrange>

<text>

Page 11: Propagate Data Changes Without Triggers!

Managing Availability

• We need to know when a room is being used

11

availability_ruleid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)days_of_week <int[]>start_time <time>end_time <time>generate_weeks_into_future <int> DEFAULT 52

roomid <serial> PRIMARY KEYname <int>

availabilityid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)availability_rule_id <int> REFERENCES (availabilityrule)available_date <date>available_range <tstzrange>

unavailabilityid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)unavailable_date <date>unavailable_range <tstzrange>

<text>

Page 12: Propagate Data Changes Without Triggers!

Managing Availability

• And we can have a calendar set up for quick lookups

12

availability_ruleid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)days_of_week <int[]>start_time <time>end_time <time>generate_weeks_into_future <int> DEFAULT 52

roomid <serial> PRIMARY KEYname <text>

availabilityid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)availability_rule_id <int> REFERENCES (availabilityrule)available_date <date>available_range <tstzrange>

unavailabilityid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)unavailable_date <date>unavailable_range <tstzrange>

calendarid <serial> PRIMARY KEYroom_id <int> REFERENCES (room)status <text> DOMAIN: {available, unavailable, closed}calendar_date <date>calendar_range <tstzrange>

Page 13: Propagate Data Changes Without Triggers!

Semi-out-of-scope but...

• GiST vs SP-GiST on the tstzrange types

13

Page 14: Propagate Data Changes Without Triggers!

Keeping the Calendar in Sync

• Triggers!

• Triggers can fire BEFORE and AFTER write operations [INSERT, UPDATE, DELETE]

• Triggers must successfully execute before transaction commits

14

Page 15: Propagate Data Changes Without Triggers!

Demo #1: The Setup

15

Page 16: Propagate Data Changes Without Triggers!

Demo #2: Basic Management

16

Page 17: Propagate Data Changes Without Triggers!

Demo #2 Lessons• [Test your live demos before running them, and you will have

much success!]

• availability_rule inserts took some time, ~500ms

• availability: INSERT 52

• calendar: INSERT 52 from nontrivial function

• Updates on individual availability / unavailability are not too painful

• Lookups are faaaaaaaast

17

Page 18: Propagate Data Changes Without Triggers!

Demo #3: Go Big or Go Home

18

Page 19: Propagate Data Changes Without Triggers!

Demo #3 Lessons

• Even with only 100 more rooms with a few set of rules, rule generation time increased 30%

• Lookups are still lightning fast!

19

Page 20: Propagate Data Changes Without Triggers!

Logical Decoding• Added in PostgreSQL 9.4

• Replays all logical changes made to the database

• Create a logical replication slot in your database

• Only one receiver can consume changes from one slot at a time

• Slot keeps track of last change that was read by a receiver

• If receiver disconnects, slot will ensure database holds changes until receiver reconnects

20

Page 21: Propagate Data Changes Without Triggers!

Logical Decoding Out of the Box

• A logical replication slot has a name and a decoder

• PostgreSQL comes with the "test" decoder

• Have to write a custom parser to read changes from test decoder

21

Page 22: Propagate Data Changes Without Triggers!

Decoder Examples

• wal2json: https://github.com/eulerto/wal2json

• jsoncdc: https://github.com/posix4e/jsoncdc

22

Page 23: Propagate Data Changes Without Triggers!

Driver Support

• C: libpq

• pg_recvlogical

• PostgreSQL functions

• Python: psycopg2 - version 2.7

• JDBC: version 42

23

Page 24: Propagate Data Changes Without Triggers!

Demo #4: Prerequisites• wal2json

• In postgresql.conf (requires restart):

• wal_level = logical

• max_wal_senders = 2

• max_replication_slots = 2

• In pg_hba.conf, use these DEVELOPMENT ONLY settings (requires reload):

• local replication jkatz trust

• In the databases streaming changes, run:

• SELECT * FROM pg_create_logical_replication_slot('calendar', 'wal2json');

• ONLY WORKS ON TABLES WITH PRIMARY KEYS

24

Page 25: Propagate Data Changes Without Triggers!

Demo #4: Watch the Changes Fly By

25

Page 26: Propagate Data Changes Without Triggers!

Demo #4 Lessons

• Every change in the database is streamed

• Need to be aware of the logical decoding format

26

Page 27: Propagate Data Changes Without Triggers!

Thoughts

• We know it takes time to regenerate calendar

• Want to ensure change propagates, but want to make sure user has great experience

27

Page 28: Propagate Data Changes Without Triggers!

Demo #5: Calendar Streaming

Changes

28

Page 29: Propagate Data Changes Without Triggers!

Demo #5 Lessons• Logical decoding allows the bulk inserts to occur

significantly faster from a transactional view

• DELETEs are tricky if you need to do anything other than using the PRIMARY KEY

• Based on implementation, changes applied serially

• Potential bottleneck for long running queries

• Use a distributed streaming tool like Kafka to perform follow-up queries

29

Page 30: Propagate Data Changes Without Triggers!

Conclusions• Triggers will keep your data in sync but can have significant performance overhead

• Utilizing a logical replication slot can eliminate trigger overhead and transfer the computational load elsewhere

• Not a panacea: still need to use good architectural patterns!

• We also inadvertently covered a lot of other PostgreSQL goodies!

• Range types

• Recursive queries

• generate_series

• LATERAL

30

Page 31: Propagate Data Changes Without Triggers!

Thank You! Questions?

31

@jkatz05