SLOs for Data-Intensive Services Yoann Fouquet Booking.com

Post on 30-Jun-2020


Page 1: for Data-Intensive Services SLOs - USENIX · for Data-Intensive Services Yoann Fouquet Booking.com Agenda 1 SLO Refresher 2 Our reservation system 3 SLO definition journey 4 Benefits

SLOs for Data-Intensive Services

Yoann Fouquet, Booking.com

● Agenda

1 SLO Refresher

2 Our reservation system

3 SLO definition journey

4 Benefits

● SLIs, SLOs

Service Level Indicator (SLI): a quantitative measure

Example: availability

Service Level Objective (SLO): SLI ≥ target

Example: availability measured over 1 week is at least 99.99%
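To make the SLI/SLO distinction concrete, here is a minimal sketch in Python. The function names and the request counts are illustrative assumptions, not part of the talk; the only thing taken from the slide is the relation "SLO met when SLI ≥ target".

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the window."""
    if total_requests == 0:
        # Assumption: a window without traffic counts as fully available.
        return 1.0
    return good_requests / total_requests


def slo_is_met(sli: float, target: float) -> bool:
    """SLO: the indicator must be at or above the target."""
    return sli >= target


# Hypothetical one-week window with one million requests.
week_sli = availability_sli(good_requests=999_950, total_requests=1_000_000)
week_ok = slo_is_met(week_sli, target=0.9999)
```

The same two-step shape (measure an indicator, compare it with a target) applies to any other indicator, not only availability.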

● Scale highlights

1,500,000+ experiences booked every 24 hours

23 years since launch (founded in 1996)

50,000+ physical servers across 4 datacenters

● Reservation system

[Diagram labels: Search Service; Reservation Service; Creation, Modification, ...; Search queries; Data nodes (several); Gateway; Stream (two)]

● First SLOs

[Diagram labels: Search Service; Reservation Service; Availability, Latency; Data nodes (several); Gateway; Availability, Latency; Reservation success rate; Stream (two)]

● Stakeholders' reaction: Reservation service

● Stakeholders' reaction: Search service

● Missing SLOs

[Diagram labels: Stream (two); Search Service; Reservation Service; Data nodes (several); Gateway]

Freshness?

Accuracy?

Consistency?

Durability?

● Consistency SLO

[Diagram labels: Search Service; Reservation Service; Data nodes (several); Gateway; Probe]

Probe steps: get order IDs; search orders and compare

● Consistency SLO

99.99% of reservations are consistent among all data nodes
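The probe-based measurement can be sketched as follows. This is a sketch under assumptions, not the Booking.com implementation: each data node is modelled as a hypothetical lookup function from an order ID to that node's copy of the reservation, and a reservation counts as consistent only when every node returns the same non-empty copy.

```python
from typing import Callable, Iterable, Mapping, Optional

Record = Mapping[str, object]
# Hypothetical interface: order ID -> this node's copy, or None if absent.
NodeLookup = Callable[[str], Optional[Record]]


def is_consistent(order_id: str, nodes: list) -> bool:
    """True when all nodes hold the same, non-missing copy of the order."""
    copies = [lookup(order_id) for lookup in nodes]
    if not copies:
        return False
    first = copies[0]
    return first is not None and all(copy == first for copy in copies)


def consistency_sli(order_ids: Iterable[str], nodes: Iterable[NodeLookup]) -> float:
    """Fraction of probed reservations that are consistent across all nodes."""
    ids = list(order_ids)
    node_list = list(nodes)
    if not ids:
        return 1.0  # assumption: nothing probed means nothing inconsistent
    consistent = sum(1 for oid in ids if is_consistent(oid, node_list))
    return consistent / len(ids)


# Two in-memory stand-ins for data nodes (illustrative data only).
node_a = {"1": {"status": "ok"}, "2": {"status": "ok"}}.get
node_b = {"1": {"status": "ok"}, "2": {"status": "cancelled"}}.get
probe_result = consistency_sli(["1", "2"], [node_a, node_b])
```

The resulting ratio would then be compared with the 99.99% target.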

● Consistency SLO (2nd attempt)

[Diagram labels: Search Service; Data nodes (several); Gateway: compare]

● Consistency SLO (2nd attempt)

99.99% of search results are consistent
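In this second attempt the comparison moves to the gateway and is applied to search results rather than to individual reservations. A minimal sketch, assuming each data node is a hypothetical function that returns the set of order IDs matching a query, and that the gateway fans a query out to more than one node:

```python
from typing import Callable, FrozenSet, Sequence

# Hypothetical interface: query string -> set of matching order IDs.
SearchFn = Callable[[str], FrozenSet[str]]


class ConsistencyCounter:
    """Counts compared searches and how many returned identical results."""

    def __init__(self) -> None:
        self.compared = 0
        self.consistent = 0

    def record(self, results: Sequence[FrozenSet[str]]) -> None:
        self.compared += 1
        if len(set(results)) <= 1:  # every node returned the same set
            self.consistent += 1

    def sli(self) -> float:
        return self.consistent / self.compared if self.compared else 1.0


def search_and_compare(query: str, nodes: Sequence[SearchFn],
                       counter: ConsistencyCounter) -> FrozenSet[str]:
    """Query every node (at least one expected), record agreement, return one answer."""
    results = [search(query) for search in nodes]
    counter.record(results)
    return results[0]


# Illustrative stand-ins for three data nodes.
node_a = lambda q: frozenset({"r1", "r2"})
node_b = lambda q: frozenset({"r1", "r2"})
node_c = lambda q: frozenset({"r1"})

counter = ConsistencyCounter()
search_and_compare("hotel=42", [node_a, node_b], counter)
search_and_compare("hotel=42", [node_a, node_c], counter)
```

Measuring at the gateway means the indicator reflects what callers of the search service actually receive.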

● Freshness SLO

[Diagram labels: Reservation Service; Data nodes (several); Gateway; Probe]

Probe steps: get recent orders; search orders

● Freshness SLO

99.9% of reservations are available within xx seconds
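A freshness indicator of this kind can be computed from pairs of timestamps. The sketch below assumes the probe records, for each recent reservation, when it was created and when it first became searchable. The threshold is left as a parameter because the slide does not give a value; the 5-second figure in the example is arbitrary.

```python
from typing import Iterable, Optional, Tuple

# (created_at, searchable_at) in seconds; searchable_at is None if never seen.
Sample = Tuple[float, Optional[float]]


def freshness_sli(samples: Iterable[Sample], threshold_s: float) -> float:
    """Fraction of reservations that became searchable within threshold_s."""
    total = 0
    fresh = 0
    for created_at, searchable_at in samples:
        total += 1
        if searchable_at is not None and searchable_at - created_at <= threshold_s:
            fresh += 1
    return fresh / total if total else 1.0


# Illustrative samples: two fresh, one late, one never found.
samples = [(0.0, 2.0), (10.0, 11.0), (20.0, 40.0), (30.0, None)]
sli = freshness_sli(samples, threshold_s=5.0)
```

A reservation that never shows up counts against the indicator, so the same measurement also surfaces lost updates.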

● Accuracy/Durability SLO

● Accuracy/Durability SLO

● Current data SLOs

[Diagram labels: Stream (two); Search Service; Reservation Service; Data nodes (several); Gateway]

Data freshness

Data consistency

● Reservation SLOs

[Diagram labels: Stream (two); Reservation Service; Hadoop MR Durability; Consumer; Probe]

Completeness

Latency

● Availability / Latency SLOs

● Availability / Latency SLOs: Buckets (manual)

Query 1, Query 5, ...: SLO latency 50 ms, SLO availability

Query 8, Query 2, ...: SLO latency 100 ms, SLO availability

Query 3, Query 4, Query 6, Query 7, ...: No objectives

● Availability / Latency SLOs: Buckets (automated)

Score ≤ X AND Timeout ≥ x: SLO latency 50 ms, SLO availability

X ≤ Score ≤ Y AND Timeout ≥ y: SLO latency 100 ms, SLO availability

Score ≥ Y OR Low timeout: No objectives
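The automated bucketing rules can be written as a small classifier. This is a sketch under assumptions: the slide does not say how the score is computed or what X, Y, x and y are, so they appear here as parameters with made-up values. "Low timeout" is interpreted as a timeout below the bucket's minimum, and a score exactly equal to X is given to the first bucket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Objectives:
    latency_ms: int
    availability: bool = True  # the bucket also carries an availability SLO


@dataclass(frozen=True)
class Thresholds:
    score_x: float      # X on the slide
    score_y: float      # Y on the slide
    timeout_x_ms: int   # x on the slide
    timeout_y_ms: int   # y on the slide


def assign_bucket(score: float, timeout_ms: int,
                  t: Thresholds) -> Optional[Objectives]:
    """Return the objectives for a query type, or None for 'no objectives'."""
    if score <= t.score_x and timeout_ms >= t.timeout_x_ms:
        return Objectives(latency_ms=50)
    if t.score_x <= score <= t.score_y and timeout_ms >= t.timeout_y_ms:
        return Objectives(latency_ms=100)
    # High score, or a timeout too low for the bucket the score points to.
    return None


# Made-up thresholds, for illustration only.
limits = Thresholds(score_x=10, score_y=100, timeout_x_ms=200, timeout_y_ms=500)
fast = assign_bucket(score=5, timeout_ms=300, t=limits)
slow = assign_bucket(score=50, timeout_ms=600, t=limits)
none = assign_bucket(score=500, timeout_ms=1000, t=limits)
```

Compared with the manual buckets, a rule like this lets new query types receive objectives without someone assigning them by hand.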

Was it worth it?

● Auto. Mitigation

[Diagram labels: Stream (two); Reservation Service; Search Service; Gateway; Data nodes (several); Freshness Probe; Stop traffic]
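One way to read this slide is that the freshness probe's output drives routing: a data node that falls too far behind stops receiving traffic. The sketch below illustrates that reading only. The `Gateway` class, the lag values and the rule "never drain every node" are assumptions added for the example, not details from the talk.

```python
from typing import Dict, Iterable, Set


class Gateway:
    """Hypothetical stand-in for the gateway's set of serving data nodes."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.serving: Set[str] = set(nodes)

    def stop_traffic(self, node: str) -> None:
        self.serving.discard(node)


def mitigate(gateway: Gateway, lag_by_node: Dict[str, float],
             max_lag_s: float) -> Set[str]:
    """Stop traffic to stale nodes; return the nodes that were drained."""
    stale = {node for node, lag in lag_by_node.items() if lag > max_lag_s}
    # Assumption: if every serving node is stale, keep serving rather than
    # draining the whole pool.
    if gateway.serving <= stale:
        return set()
    drained = stale & gateway.serving
    for node in drained:
        gateway.stop_traffic(node)
    return drained


gw = Gateway(["node-a", "node-b", "node-c"])
first = mitigate(gw, {"node-a": 1.0, "node-b": 120.0, "node-c": 2.0}, max_lag_s=30.0)
second = mitigate(gw, {"node-a": 100.0, "node-c": 100.0}, max_lag_s=30.0)
```

The guard in `mitigate` reflects a design choice: serving slightly stale results is treated as preferable to serving none.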

● Auto. Repair

[Diagram labels: Stream (two); Search Service; Reservation Service; Gateway; Hadoop MR Dump; Daily snapshot push; Data nodes (several); Completeness Probe; Re-process; Fix]
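The repair loop suggested by the labels (completeness probe, re-process, fix) can be sketched as a set difference followed by a replay. This assumes the daily snapshot is treated as the reference set of reservation IDs and that a hypothetical `reprocess` callback re-sends one reservation; neither detail is stated on the slide.

```python
from typing import Callable, Iterable, Set


def completeness_sli(snapshot_ids: Iterable[str], node_ids: Iterable[str]) -> float:
    """Fraction of snapshot reservations that are present on the data node."""
    expected = set(snapshot_ids)
    if not expected:
        return 1.0
    return len(expected & set(node_ids)) / len(expected)


def repair(snapshot_ids: Iterable[str], node_ids: Iterable[str],
           reprocess: Callable[[str], None]) -> Set[str]:
    """Re-process every reservation missing from the node; return their IDs."""
    missing = set(snapshot_ids) - set(node_ids)
    for reservation_id in sorted(missing):  # sorted for a deterministic order
        reprocess(reservation_id)
    return missing


# Illustrative data: the node is missing two of four reservations.
snapshot = ["r1", "r2", "r3", "r4"]
on_node = ["r1", "r3"]
replayed = []
before = completeness_sli(snapshot, on_node)
fixed = repair(snapshot, on_node, replayed.append)
```

Running the probe again after the replay would show whether completeness is back at its target.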

● Biggest gains

Awareness

Confidence

Thank you!

All references to “Booking.com”, including any mention of “us”, “we” and “our”, refer to Booking.com BV, the company behind Booking.com™

We’re Hiring

careers.booking.com