Scylla Summit 2017: From Elasticsearch to Scylla at Zenly
TRANSCRIPT
From Elasticsearch to Scylla at Zenly
10 months in production
Head of infrastructure, Zenly - https://zen.ly
Jean-Baptiste Dalido
Jean-Baptiste Dalido
● Head of infrastructure at Zenly, now a Snap Inc. company
● Experience with mobile startups, previously Appgratis and Batch.com
● Worked with Cassandra in production for quite some time
FROM ELASTICSEARCH
X Use cases for X Databases
▪ Elasticsearch, as a main database
▪ Redis, for cache
▪ Cassandra, for gigantic datasets (chat messages)
▪ Postgres, for results of heavy computations
▪ Name it, it was in production ...
Elasticsearch as the main database?
▪ Easy data-modelling
▪ Search as a query language
o offsets data-model changes and deficiencies
o comes at a cost: search is not cheap
▪ KV Semantics
▪ Sharding/Replication
▪ Document versioning
Why leave?
▪ The data model became more and more stable
▪ Particular workload: heavy updates
▪ Unification of databases
o Lower operational cost (onboarding, maintenance…)
o Simpler monitoring
▪ No real need for Elasticsearch features anymore
WAIT, WHY NOT STICK WITH CASSANDRA?
Total time for which application threads were stopped: (too many) seconds
AKA: Stop the world
Cassandra?
▪ Garbage collection is a deal-breaker for small clusters
▪ Can be overcome with a high number of nodes
▪ Proficiency requires knowledge and experience
▪ Cassandra without GC might be a close-to-perfect database
MOVING TO SCYLLA (after days of debate, and no GC)
WE WERE TOLD NOT TO USE DOCKER FOR PRODUCTION (but we did it anyway)
First migration, cluster in Kubernetes
▪ Easily deploy/update within our infrastructure
o 12 Kubernetes clusters in production
▪ Except for Kafka, everything runs in Docker
▪ Scale within minutes
▪ Networking, logs, everything is included
First migration, cluster in Kubernetes
▪ Unexpectedly high latencies, and load was through the roof
▪ ScyllaDB’s team to the rescue
▪ It’s all about Seastar internals
Seastar and Kubernetes
▪ Seastar is sharded: one shard per core
▪ Networking and data are sharded per core
▪ CPU pinning was not available in Kubernetes
o keep an eye on the CPU Manager
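To make "one shard, one core" concrete, here is a minimal Python sketch of a shared-nothing store: every partition key is owned by exactly one shard, and each shard keeps its data in a private structure that no other shard touches. The names `shard_of` and `ShardedStore` are hypothetical, and the MD5-modulo routing is a simplifying assumption; Scylla derives the shard from the key's Murmur3 token with its own mapping. The sketch also hints at why pinning matters: every request for a key must be served by the one shard that owns it, so if that shard's core is shared with other workloads, all of its keys plausibly slow down together.

```python
import hashlib
from typing import Dict, List, Optional


def shard_of(partition_key: str, num_shards: int) -> int:
    """Pick the shard (core) that owns a partition key.

    Simplification: MD5 of the key, modulo the shard count. Scylla derives
    shards from Murmur3 tokens with a different mapping.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy shared-nothing store: one private dict per shard, no locks."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # Each shard owns its own data; no shard reads another shard's dict.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_of(key, self.num_shards)][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_of(key, self.num_shards)].get(key)


# Usage: 10 shards, mirroring one shard per core on a 10-CPU machine.
store = ShardedStore(num_shards=10)
for i in range(100):
    store.put(f"user:{i}", f"profile-{i}")
print(store.get("user:42"))  # profile-42
print([len(s) for s in store.shards])  # keys per shard; the split depends on the hash
```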
Moving, part 2
New infrastructure
▪ 3 clusters
▪ 2 single-region clusters
▪ 1 dual-region cluster
o Redundancy
o Read-heavy workload without impact on production
▪ 7 machines per cluster/region
o 10 CPUs
o 36 GB of RAM
o 2 x 375 GB local SSDs
▪ No Docker
GCP Deployment
▪ Ubuntu 16.04 LTS
▪ NVMe
o Cascade failure on GCP machines with kernel 4.8.0
o Unexpected Chaos Monkey
▪ Resiliency theory proved to be right
▪ Failure did not provoke any downtime
Clusters as feature isolation
▪ More nodes
▪ Better overview of workload
o Feature teams can scale to their needs
o Developers can improve performance and see the impact
o Offer data teams a way to query production clusters
▪ Resiliency if implemented the right way
Uptime
▪ 100% uptime for our 3 clusters
o 0 cases of major failure due to ScyllaDB
▪ A few crashes, always machine-related
▪ No impact on production
Everything is about performance
300k requests per second on 7 nodes
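As a rough back-of-the-envelope reading of that figure, assuming traffic is spread evenly across nodes and that these are the 10-CPU machines described under "New infrastructure" (the slide does not state either explicitly):

```latex
\frac{300\,000\ \text{req/s}}{7\ \text{nodes}} \approx 42\,900\ \text{req/s per node},
\qquad
\frac{42\,900\ \text{req/s}}{10\ \text{cores}} \approx 4\,300\ \text{req/s per core (shard)}
```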
Latency distribution over the same traffic
▪ p95 under 1 ms (measurements provided by Google Cloud Platform)
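The slide does not say how the p95 figure is computed. As a reminder of what the number means, here is a minimal Python sketch of the nearest-rank method (an assumption; monitoring systems often estimate percentiles from histograms instead): p95 is the smallest latency such that at least 95% of requests took that long or less. The function name `percentile` is hypothetical.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of latency samples, in the samples' unit."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(p% of n); multiplying before dividing avoids float drift.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Usage: 95 fast requests and 5 slow ones.
samples = [0.4] * 95 + [3.0] * 5
print(percentile(samples, 95))  # 0.4 -> "p95 under 1 ms" holds
print(percentile(samples, 99))  # 3.0 -> the slow tail only shows up at p99
```

Note that a p95 under 1 ms says nothing about the remaining 5% of requests, which is why tail percentiles are usually tracked alongside it.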
Scylla is production ready
▪ It’s fast, so fast
o room to make mistakes
o room to experiment
▪ You don’t need to be a Cassandra ninja
▪ It’s a supported product (Scylla 2 is out)
▪ Keep an eye on Kubernetes
Go beyond the usual use cases
▪ Using Scylla as a cache instead of Redis?
▪ I/O scheduler
▪ Memstore
▪ Cache
▪ Persistence beyond memory
▪ Amazing performance
▪ Do you still need a cache?
We don’t :)
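The reasoning behind dropping the separate cache can be modelled with a toy: an in-memory LRU layer in front of durable storage, so hot keys are served from memory while cold keys fall back to disk instead of disappearing. This is a minimal Python sketch under stated assumptions: `CachedStore` is hypothetical, the `backing` dict stands in for on-disk data, and Scylla's actual row cache, memtables and I/O scheduler are far more involved than a single LRU.

```python
from collections import OrderedDict
from typing import Dict, Optional


class CachedStore:
    """Toy model: an LRU cache in front of a durable backing store."""

    def __init__(self, capacity: int, backing: Dict[str, str]) -> None:
        self.capacity = capacity
        self.backing = backing  # stands in for persistent (on-disk) data
        self.cache: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def put(self, key: str, value: str) -> None:
        self.backing[key] = value  # durable copy survives cache eviction
        self._remember(key, value)

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        value = self.backing.get(key)  # slow path: read from "disk"
        if value is not None:
            self._remember(key, value)
        return value

    def _remember(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used


# Usage: a cache that holds only two keys.
store = CachedStore(capacity=2, backing={})
store.put("a", "1")
store.put("b", "2")
store.put("c", "3")  # evicts "a" from memory; it stays in backing
print(store.get("a"))  # 1 (miss: reloaded from backing)
print(store.get("a"))  # 1 (hit)
print(store.hits, store.misses)  # 1 1
```

The design point is that eviction only costs latency, never data, which is what a separate Redis layer in front of a database would otherwise be providing.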
THANK YOU
@jbaptistedalido
Please stay in touch