scaling db access for 100s of billions of queries per day · 25.06.2019 · microservice connect...
TRANSCRIPT
Scaling DB Access for 100s of billions of queries per day
1
TLDR - Hera
Open Source
High Eff iciency
Reliable Access
to data stores
2
Out Of DB Connections?
• Scaling
• Multiplexing
• Sharding
• Resiliency
• Manageability
• Our switch to Go Lang
About Petrica
• Works @ PayPal
• developing the database proxy.
• database backed publish-subscriber messaging system
• PayPal Here backend
• Golang contributor
• Previously @
• Worked on distributed systems, embedded system, desktop apps
©2019 PayPal Inc. Confidential and proprietary. 3
About Kenneth
• Works @ PayPal
• paypal.de
• PayPal Verisign Two Factor Authentication
• Coded shared memory for sharding
• Migrated to Go (hera)
• Previously @
• Visualization of Genomic Metabolism, Artificial Chemistry, and EPICS
Monitoring
©2019 PayPal Inc. Confidential and proprietary. 4
©2019 PayPal Inc. Confidential and proprietary. 5
About PayPal – 2018 facts
©2019 PayPal Inc. Confidential and proprietary. 6
100+Currencies
200+Markets
267MActive Customer Accounts
9.9BPayments Transactions
$578B Total Payments Volume
Up 27% YoY*Total Payment Volume
2,700Applications
4,500Engineers
17,000Releases
PayPal OLTP Environment scale
©2015 PayPal Inc. Confidential and proprietary. 7
2000+ Database Instances
100+ Billion Calls/day
3000+Database Nodes
74 PB Total Storage
300+ Schemas
10 +Availability Zones
200,000 +Application Servers
1+ Billion Calls/hour
8
Key Take-aways
• Massive scale
• Complexity
• Growth
©2019 PayPal Inc. Confidential and proprietary. 9
Mic ro se rv i ce
connect
request
request
request
disconnect
request
response
Typical Database Access by Application
©2019 PayPal Inc. Confidential and proprietary. 10
Mic ro se rv i ce
connect
request
request
request
disconnect
request
response
100 ms
1 ms
1 ms
1 ms
103 ms
Typical Database Access by Application
©2019 PayPal Inc. Confidential and proprietary. 11
Mic ro se rv i ce
connect
request
request
request
request
response
100 ms
~0 ms
1 ms
1 ms
~3 ms
Po o l
1 ms
Connection is pre-established, cached and re-used for each request
release
lease
Typical Database Access by Application
From Few to Thousands Microservices
©2019 PayPal Inc. Confidential and proprietary. 12
Mic ro se rv i ce
Configure the pool cache large
enough for availability
From Few to Thousands Microservices
©2019 PayPal Inc. Confidential and proprietary. 13
Mic ro se rv i ce
M i c ro se rv i ce
Configure the pool cache large
enough for availability
Add more service when scaling up
©2019 PayPal Inc. Confidential and proprietary. 14
Mic ro se rv i ce
Mic ro se rv i ce
M i c ro se rv i ce
Configure the pool cache large
enough for availability
Add more service when scaling up
From Few to Thousands Microservices
©2019 PayPal Inc. Confidential and proprietary. 15
Mic ro se rv i ce
Mic ro se rv i ce
M i c ro se rv i ce
M i c ro se rv i ce
Configure the pool cache large
enough for availability
Add more service when scaling up
From Few to Thousands Microservices
©2019 PayPal Inc. Confidential and proprietary. 16
Mic ro se rv i ce
Mic ro se rv i ce
Mic ro se rv i ce
M i c ro se rv i ce
M i c ro se rv i ce
Configure the pool cache large
enough for availability
Add more service when scaling up
Hit a connection limit
From Few to Thousands Microservices
17
Solutions?
• Lower the connection pool
size in each microservice
node
• Improve microservice code
to use the connection more
efficiently
• Buy better database
hardware
Partitioned Connection Pool
©2019 PayPal Inc. Confidential and proprietary. 18
Mic ro se rv i ce
M i c ro se rv i ce
Request
Request
While servicing requests, the
microservices use only some
DB connections
Partitioned Connection Pool
©2019 PayPal Inc. Confidential and proprietary. 19
Mic ro se rv i ce
M i c ro se rv i ce
Request
Request
While servicing requests, the
microservices use only some
DB connections
In some node all the
connections can be temporary
busy
Partitioned Connection Pool
©2019 PayPal Inc. Confidential and proprietary. 20
Mic ro se rv i ce
While servicing requests, the
microservices use only some
DB connections
In some node all the
connections can be temporary
busy
Client requests will incur
latency though the database
has enough capacity
Mic ro se rv i ce
Request
Request
21
Solutions?
Scale with shared pool !!!
S h a r e d c o n n e c t i o n p o o l
©2019 PayPal Inc. Confidential and proprietary. 22
Mic ro se rv i ce
M i c ro se rv i ce
Request
Request
Shared pool
©2019 PayPal Inc. Confidential and proprietary. 23
Se rv i ce
Service request
Service
response
SQL request
SQL request
SQL request
SQL request
DB connection
Database connection is
allocated but mostly idle
Typical service request
©2019 PayPal Inc. Confidential and proprietary. 24
Se rv i ce
Service request
Service
response
SQL request
SQL request
SQL request
SQL request
SQL request
SQL request
SQL request
DB connection
Serving other requests when client seems idle?
Multiplexing
©2019 PayPal Inc. Confidential and proprietary. 25
Mic ro se rv i ce
M i c ro se rv i ce
Request
RequestSha red Po o l
Services perceive that they have a database connection
Multiplexing
©2019 PayPal Inc. Confidential and proprietary. 26
Mic ro se rv i ce
M i c ro se rv i ce
Request
RequestSha red Po o l
Services perceive that they have a database connection
Hera – Highly Efficient Reliable Access
©2019 PayPal Inc. Confidential and proprietary. 27
Mic ro se rv i ce
M i c ro se rv i ce
Request
Request
Hera
Shared connection pool multiplexer
to data s tores
HERAMicroservice-
First
DB proxy
28
Principles and constraints
• Correctness
• Resiliency, -ilities, performance
• Minimal configuration
• Thin client
• Minimal change to wire protocol
29
Let’s talk about other features!
Scale
Features
30
• Multiplexing
• Read write split
• Sharding
Read-Write Split
©2019 PayPal Inc. Confidential and proprietary. 31
RW
RW RW
RW
Oracle RAC - Real
Application Cluster
Each database is exchanging
lock information
Read-Write Split
©2019 PayPal Inc. Confidential and proprietary. 32
RW RW
Oracle RAC - Real
Application Cluster
R
R R
Hera No de
Separate write queries
and direct to one node
Reduces traffic between
DB nodes
Sharding
33
• Client vs Server
• Finding the Shard
• Sample Query Run
• Convert Legacy App
• Move Scuttle Bin’s Data
Sharding: Client vs Server
©2019 PayPal Inc. Confidential and proprietary. 34
C++ Client
C++ Lib
Database (shard 0) Database (shard 1) Database (shard 2)
Sharding: Same Logic in Java Client
©2019 PayPal Inc. Confidential and proprietary. 35
C++ Client
C++ Lib
Database (shard 0) Database (shard 1) Database (shard 2)
Java Client
Java Lib
Sharding: Duplicate Logic in Go Client
©2019 PayPal Inc. Confidential and proprietary. 36
C++ Client
C++ Lib
Database (shard 0) Database (shard 1) Database (shard 2)
Java Client Go Client
Java Lib Go Lib
Sharding: Copy Logic in Python
©2019 PayPal Inc. Confidential and proprietary. 37
C++ Client
C++ Lib
Database (shard 0) Database (shard 1) Database (shard 2)
Java Client Go Client
Java Lib Go Lib
Python Client
Python Lib
Sharding: Node.js Client Too
©2019 PayPal Inc. Confidential and proprietary.38
Node.js ClientC++ Client
C++ Lib
Database (shard 1) Database (shard 2)
Java Client Go Client
Java Lib Go Lib
Python Client
Python Lib Node.js Lib
Database (shard 0)
WorkerWorker
WorkerWorker
WorkerWorker
WorkerWorker
WorkerWorker
WorkerWorker
Node.js Client
Sharding: Keep Logic in Server
© 2019 PayPal Inc. Confidential and proprietary. 39
Hera Multiplexer
C++ Client
C++ Lib
Database (shard 0) Database (shard 1) Database (shard 2)
Java Client Go Client
Hera Java Lib Hera Go Lib
Python Client
Python Lib Node.js Lib
Sharding: Implementation
©2019 PayPal Inc. Confidential and proprietary. 40
1. Shard Key: account_id=2000
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
…
…
…
©2019 PayPal Inc. Confidential and proprietary. 41
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
1. Shard Key: account_id=2000
2. Murmur3Hash(2000)%1024=280
…
…
…
Sharding: Implementation
©2019 PayPal Inc. Confidential and proprietary. 42
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
1. Shard Key: account_id=2000
2. Murmur3Hash(2000)%1024=280
3. Select shard_id from
hera_shard_map where
scuttle_id = 280;
shard_id
---------
1
…
…
…
Sharding: Implementation
©2019 PayPal Inc. Confidential and proprietary. 43
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
1. Shard Key: account_id=2000
2. Murmur3Hash(2000)%1024=280
3. Select shard_id from
hera_shard_map where scuttle_id =
280;
shard_id
---------
1
4. TWO_TASK_1 = LOAN_SH1
TWO_TASK_1=‘tcp(loan-sh1:
3306)/loan’
…
…
…
Sharding: Implementation
Sharding: Converting Application Query
- Shard Key: account_id
- select * from loan, appfile where loan.id = ? and appfile.loan_id = loan.id
©2019 PayPal Inc. Confidential and proprietary. 44
Sharding Compatible Query
- Shard Key: account_id
- select * from loan, appfile where loan.id = ? and appfile.loan_id = loan.id and loan.account_id = ? and appfile.account_id = loan.account_id
©2019 PayPal Inc. Confidential and proprietary. 45
Sharding: Hera JDBC SQL Rewrite
- Shard Key: account_id
- select * from loan, appfile where loan.id = :loan_id and appfile.loan_id = loan.id and loan.account_id = :account_id and appfile.account_id = loan.account_id
- Hera can now pick out the shard key name and value
©2019 PayPal Inc. Confidential and proprietary. 46
Sharding: Legacy App Conversion
©2019 PayPal Inc. Confidential and proprietary. 47
• All queries directed to shard 0
• Logs queries that don’t bind to shard
key
…
Sha rd 0
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
…
…
Sharding: Legacy App Conversion
©2019 PayPal Inc. Confidential and proprietary. 48
• Whitelist sends one value to a
specific shard
• Limits risk of failures to 1 value
• Hera uses Shard 1, but same DB
• Fast rollback on errors
• Repeat for larger sets
…
Sha rd 0
Hash function (shard key value)
%K
Scuttle bin to logical shard
mapping
Logical to physical shard mapping
Shard
Key
DB Shard
0
DB Shard
N
Whitelist
Sha rd 1
…
…
Sharding: Physical Whitelist
©2019 PayPal Inc. Confidential and proprietary. 49
• Validates permissions and data
copy
…
Sha rd 0
Hash function (shard key value)
%K
Scuttle bin to logical shard
mapping
Logical to physical shard mapping
Shard
Key
DB Shard
0
DB Shard
N
Whitelist
Sha rd 1
…
Sharding: Logical Move for One Scuttle Bin
©2019 PayPal Inc. Confidential and proprietary. 50
• If successful, do a physical data move next
…
Sha rd 0
Hash function (shard key value)
%K
Scuttle bin to logical shard
mapping
Logical to physical shard mapping
Shard
Key
DB Shard
0
DB Shard
N
Whitelist
Sha rd 1
…
…
Sharding: Moving Scuttle Bin
©2019 PayPal Inc. Confidential and proprietary. 51
• Start data copy
• Typically, tables are partitioned by
scuttle bin
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
…
…
…
Sharding: Moving Scuttle Bin
©2019 PayPal Inc. Confidential and proprietary. 52
• Start data copy
• Typically, tables are partitioned by
scuttle bin
• Block writes
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
…
…
Sharding: Moving Scuttle Bin
©2019 PayPal Inc. Confidential and proprietary. 53
• Start data copy
• Typically, tables are partitioned by
scuttle bin
• Block writes
• Data fully copied
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
…
…
…
Sharding: Moving Scuttle Bin
©2019 PayPal Inc. Confidential and proprietary. 54
• Start data copy
• Typically, tables are partitioned by
scuttle bin
• Block writes
• Data fully copied
• Update map
• Use scuttle bin in new location
…
Sha rd 0 Sha rd 1 Sha rd M
Hash function (shard key value) %K
Scuttle bin to logical shard mapping
Logical to physical shard mapping
Shard Key
DB Shard
0
DB Shard
N
…
…
…
55
Sharding: Take Away
• Queries must bind shard key & value
• Manage risk converting legacy code
• Controlled data redistribution
Resiliency
Principles
56
• Deal with exceptional
cases
▪ Surge in load
▪ Slow SQL requests
▪ Issues with the DB
Resiliency
Principles
57
• Deal with exceptional
cases
▪ Surge in load
▪ Slow SQL requests
▪ Issues with the DB
• Limit or avoid impact
• Self heal
Resiliency
Features
58
• Bouncer
• Surge Queue
• Slow Query Eviction
• Read Replica
• Transparent Application
Failover
Resiliency: Oversubscription - Bouncer
©2019 PayPal Inc. Confidential and proprietary. 59
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
Se rv i ce 4
Service 4 connection is
tcp accepted and then
closed without SSL.
Resiliency: Oversubscription – Surge Queue
©2019 PayPal Inc. Confidential and proprietary. 60
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
Service 3 can make a
request
It waits--queued
Resiliency: Oversubscription – Surge Queue
©2019 PayPal Inc. Confidential and proprietary. 61
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
Generally, 1ms later,
Service 3 gets run.
Resiliency: Oversubscription – Surge Queue
©2019 PayPal Inc. Confidential and proprietary. 62
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
Waiting more than 1s gives
an error to Service 3.
Resiliency: Slow Query Eviction
©2019 PayPal Inc. Confidential and proprietary. 63
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
1s
Slow Query
When Hera is overloaded
Resiliency: Slow Query Eviction
©2019 PayPal Inc. Confidential and proprietary. 64
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
1s
Slow Query
Hundred of queries can run
once slow query is
removed.
Monitoring Usage
©2019 PayPal Inc. Confidential and proprietary. 65
Se rv i ce 1
Hera No de
Se rv i ce 2
Se rv i ce 3
Each Hera NodeFree DB
Connections
(acpt)
Time
Resiliency: Eviction – Actual event
©2019 PayPal Inc. Confidential and proprietary. 66
Each Hera NodeFree DB
Connections
(acpt)
Time
Happened few years back, when automatic eviction went unnoticed.
Evicting expensive query makes space for many typical queries.
Read Replicas
©2019 PayPal Inc. Confidential and proprietary. 67
MUX R1 Pool
Po o l 1
R1 PoolR1 Pool
MUX R1 PoolR1 Pool
R2 Pool
Po o l 2
Hera
Appl icat ion
Read Repl ica 1
Read Repl ica 2
Wri te DB
F a i l o v e r
m o d u l e
B u s i n e s s
l o g i c
R1 & R2: different data replication lag
Transparent Application Failover – Read Replica Retry
©2019 PayPal Inc. Confidential and proprietary. 68
MUX
R1 PoolR1 Pool
R1 Pool
R1 PoolR1 Pool
R2 Pool
Po o l
Hera
Appl icat ion
B u s i n e s s
l o g i c
Read Repl ica 1
Read Repl ica 2
Write DB
• Build failover on server side
• Also failover when query is slow
Manageability
69
• Maintenance without customer
impact
• Recommend clients recycle
often
• Oracle RAC maintenance
Recycle Connections– Node Going Away
©2019 PayPal Inc. Confidential and proprietary. 70
Hera No de
Se rv i ce 2
Se rv i ce 3
Hera No de
Hera No de
TCP Load
Balancer
Recycle Connections for Manageability
©2019 PayPal Inc. Confidential and proprietary. 71
Hera No de
Se rv i ce 2
Se rv i ce 3
Hera No de
Hera No de
TCP Load
Balancer
Recycle Connections– Node Returns
©2019 PayPal Inc. Confidential and proprietary. 72
Hera No de
Se rv i ce 2
Se rv i ce 3
Hera No de
TCP Load
Balancer
Hera No de
DB Maintenance
©2019 PayPal Inc. Confidential and proprietary. 73
RW 2
Oracle RAC - Real
Application Cluster
3
4 1
Hera No de
• Preparing Oracle RAC node 3
maintenance
• DBAs remove node 3 from Oracle
configs
• Insert into hera_maint (inst_id,
status, status_time, module,
machine) values (3, ‘F’, [unix
epoch]+2, [hera pool name], [host]
DB Maintenance
©2019 PayPal Inc. Confidential and proprietary. 74
RW 2
Oracle RAC -Real
Application Cluster
3
4 1
Hera No de
• After [unix epoch] +2
Take Aways
• Keep controls near those who need
it
• Avoid being unnecessarily involved
75
Transition to Go
- Written in C++, asynchronous code
- Scalable
- Efficient
- Complex
©2019 PayPal Inc. Confidential and proprietary. 76
Previous version
Transition to Go
- Written in C++, asynchronous code
- Scalable
- Efficient
- Complex
©2019 PayPal Inc. Confidential and proprietary. 77
Previous version
Transition to GoNew changes
- New functional requirements
- New non functional requirements (efficiency)
- Written in C++, asynchronous code
- Scalable
- Efficient
- Complex
©2019 PayPal Inc. Confidential and proprietary. 78
Previous version
Transition to GoNew changes
- New functional requirements
- New non functional requirements (efficiency)
- Go programing is synchronous
- Scale efficiently out of the box
- Does it meet our requirements?
- Is stop-the-world garbage collection time an issue?
How about writing in Go?
Proof-of-concept
1st Release: No Sharding
Full feature with C++
dependencies
Full feature all Go
©2019 PayPal Inc. Confidential and proprietary. 79
Going with Go
Transition to Go
- Running in production for 1 year
- Latency and CPU met the requirements
- No clients change
- Had some issues and learnings
©2019 PayPal Inc. Confidential and proprietary. 80
Mission Accomplished
func routine() {
time.Sleep(time.Second * 10)
}
func main() {
routines := 10000
wg.Add(routines)
for i := 0; i < routines; i++ {
go func() {
routine(i)
wg.Done()
}()
}
wg.Wait()
}
©2019 PayPal Inc. Confidential and proprietary. 81
Spawn 10000 go-routines sleep for 10 seconds to simulate processing
Running on my 2 cores test machine
How many OS threads I expect at runtime?
Learning Example
func routine() {
time.Sleep(time.Second * 10)
}
func main() {
routines := 10000
wg.Add(routines)
for i := 0; i < routines; i++ {
go func() {
routine(i)
wg.Done()
}()
}
wg.Wait()
}
©2019 PayPal Inc. Confidential and proprietary. 82
Spawn 10000 go-routines sleep for 10 seconds to simulate processing
Running on my 2 cores test machine
Learning Example
$> ./mytest &
[1] 12345
$> ps –L –C mytest –no-headers | wc –l
6
$>
func routine() {
//time.Sleep(time.Second * 10)
syscall.Select(0, nil, nil, nil, &syscall.Timeval{Sec: 10})
}
func main() {
routines := 10000
wg.Add(routines)
for i := 0; i < routines; i++ {
go func() {
routine(i)
wg.Done()
}()
}
wg.Wait()
}
©2019 PayPal Inc. Confidential and proprietary. 83
Learning ExampleSpawn 10000 go-routines sleep for
10 seconds to simulate processing
Running on my 2 cores test machine
Using system call to sleep how many OS threads I expect at runtime?
func routine() {
//time.Sleep(time.Second * 10)
syscall.Select(0, nil, nil, nil, &syscall.Timeval{Sec: 10})
}
func main() {
routines := 10000
wg.Add(routines)
for i := 0; i < routines; i++ {
go func() {
routine(i)
wg.Done()
}()
}
wg.Wait()
}
©2019 PayPal Inc. Confidential and proprietary. 84
Learning ExampleSpawn 10000 go-routines sleep for
10 seconds to simulate processing
Running on my 2 cores test machine
$ ./mytest
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion
runtime stack:
….
. . . “A Go program creates a new thread only when a goroutine is
ready to run but all the existing threads are blocked in system calls,
cgo calls, or are locked to other goroutines due to use of
runtime.LockOSThread”
©2019 PayPal Inc. Confidential and proprietary. 85
from “runtime/debug” func SetMaxThreads()
Learning Example
86
Key Take Away • System calls lock the OS
thread. Ex: os.File read
• When using C/C++ library,
be aware of the systems
calls (mutexes, file reads,
socket reads, sleep)
• Golang locks, socket reads,
sleep are not locking the OS
thread
• In your load tests, verify the
number of OS threads
Open source
github.com/paypal/hera
✓ Code, documentation, examples
✓ Standard clients
• Java: JDBC driver
• Golang: database/sql driver
Google group heradatalink
©2019 PayPal Inc. Confidential and proprietary. 87
github.com/paypal/hera
Internal Mirror
Build
Deploy
PayPal Config
Acknowledgements
Kamlakar
Mukundan
Liana
Sid
Yaping
Shuping
Stephanie
Varun
Qing
Chun
Dwain
©2019 PayPal Inc. Confidential and proprietary. 88
Paresh
Srini
Dheeraj
Pradeep
Pramod
Samrat
Akash
Ashwin
and countless others
Next Steps
Need more database connections?
Hera can help scale
• Multiplexing
• Sharding
• Resiliency
• Manageability
©2019 PayPal Inc. Confidential and proprietary. 89
Open Source github.com/paypal/hera
Google Groups – heradatalink
Our Future
• Scaling Tools
• More Clients
©2019 PayPal Inc. Confidential and proprietary. 90
github.com/paypal/hera Google group: heradatalink