the shard revisited: tools and techniques used at etsy
DESCRIPTION
This goes over an overview of the architecture, and then goes into the development data problem. It also talks about some tools we use to do data migrations and schema changes.TRANSCRIPT
![Page 1: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/1.jpg)
[email protected] / @johngoulah
The Shard RevisitedTools and Techniques Used at Etsy
Tuesday, November 12, 13
![Page 2: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/2.jpg)
Tuesday, November 12, 13
A marketplace for people around the world to connect, buy, and sell unique goodsEtsy is the marketplace that we all make together, and our mission is to re-imagine commerce in ways that build a more fulfilling and lasting world
![Page 3: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/3.jpg)
1.5B+ page views / mo.
895MM sales in 2012
60MM+ unique visitors/mo.
1M+ shops / 200 countries
Tuesday, November 12, 13
![Page 4: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/4.jpg)
Tuesday, November 12, 13
this talk consists of the architecture, our dev data problem/solution, and other toolsbig cluster, 35 shards
![Page 5: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/5.jpg)
100K+ queries/sec avg
6TB InnoDB buffer pool
30TB+ data stored
99.9% queries under 1ms
~1.8Gbps outbound (plain text)
Tuesday, November 12, 13
1/3 RAM not dedicated to the pool (OS, disk, network buffers, etc)
![Page 6: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/6.jpg)
~100 MySQL servers1100 15K rpm disks / 1600+ CPU’s
Server SpecHP DL 380 G8
96GB RAM16 spindles / 2TB RAID 10
24 CoreTuesday, November 12, 13
16 x 146GB
![Page 7: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/7.jpg)
Architecture
Tuesday, November 12, 13
2 key concerns when you reach scale....
![Page 8: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/8.jpg)
Redundancy
Tuesday, November 12, 13
the duplication of critical components of a system with the intention of increasing reliabilityexample: jet engines
![Page 9: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/9.jpg)
Master - Master
R/W R/W
Tuesday, November 12, 13
duplication of critical components....
![Page 10: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/10.jpg)
Master - Master
R/W R/W
Side A Side BTuesday, November 12, 13
we call these sides “replicants”
![Page 11: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/11.jpg)
Scalability
Tuesday, November 12, 13
the ability of a system to handle growing amount of work in a capable manner(grocery store example)
![Page 12: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/12.jpg)
shard 1 shard 2 shard N
. . .
Tuesday, November 12, 13
horizontal scaling
![Page 13: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/13.jpg)
shard 1 shard 2 shard N
shard N + 1
. . .
Tuesday, November 12, 13
horizontal scaling
![Page 14: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/14.jpg)
shard 1 shard 2 shard N
shard N + 1
. . .
Migrate Migrate Migrate
Tuesday, November 12, 13
horizontal scaling
![Page 15: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/15.jpg)
Bird’s-Eye View
Tuesday, November 12, 13
http://www.flickr.com/photos/feuilllu/36612719/sizes/l/in/photostream/
![Page 16: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/16.jpg)
tickets index
shard 1 shard 2 shard N
Tuesday, November 12, 13
3 main components
couple others, dbaux, dbtasks
![Page 17: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/17.jpg)
tickets index
shard 1 shard 2 shard N
Unique IDs
Tuesday, November 12, 13
![Page 18: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/18.jpg)
tickets index
shard 1 shard 2 shard N
Shard Lookup
Tuesday, November 12, 13
![Page 19: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/19.jpg)
tickets index
shard 1 shard 2 shard N
Store/Retrieve Data
Tuesday, November 12, 13
![Page 20: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/20.jpg)
Basics
Tuesday, November 12, 13
what is sharding?
![Page 21: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/21.jpg)
users_groups
user_id group_id
1 A
1 B
2 A
2 C
3 A
3 B
3 C
Tuesday, November 12, 13
![Page 22: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/22.jpg)
users_groups
user_id group_id
1 A
1 B
2 A
2 C
3 A
3 B
3 C
Tuesday, November 12, 13
creating horizontal partitions from a table
![Page 23: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/23.jpg)
users_groups
user_id group_id
1 A
1 B
2 A
2 C
user_id group_id
3 A
3 B
3 C
3 A
3 B
3 C
Tuesday, November 12, 13
![Page 24: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/24.jpg)
users_groups
user_id group_id
1 A
1 B
2 A
2 C
user_id group_id
3 A
3 B
3 C
shard 1
shard 2
Tuesday, November 12, 13
![Page 25: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/25.jpg)
Index Servers
Tuesday, November 12, 13
have to be able to find the data, these simply exist to look up where the data isto answer the question: what shard is the data on?
http://www.flickr.com/photos/mamsy/4175783446/sizes/l/in/photostream/
![Page 26: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/26.jpg)
index
shard 1 shard 2 shard N
Tuesday, November 12, 13
want to find details for a user
![Page 27: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/27.jpg)
index
shard 1 shard 2 shard N
select shard_id from user_index where user_id = X
Tuesday, November 12, 13
first get the shard id, have the PK
![Page 28: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/28.jpg)
index
shard 1 shard 2 shard N
select shard_id from user_index where user_id = X
returns 1
Tuesday, November 12, 13
![Page 29: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/29.jpg)
index
shard 1 shard 2 shard N
select join_date from users where user_id = X
Tuesday, November 12, 13
![Page 30: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/30.jpg)
index
shard 1 shard 2 shard N
select join_date from users where user_id = X
returns 2012-02-05
Tuesday, November 12, 13
![Page 31: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/31.jpg)
Ticket ServersTuesday, November 12, 13
http://www.flickr.com/photos/rexroof/5126088323/sizes/l/in/photostream/
![Page 32: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/32.jpg)
Globally Unique ID
Tuesday, November 12, 13
can’t use auto-increment with distributed system, hand out globally unique id’s
![Page 33: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/33.jpg)
CREATE TABLE `tickets` ( `id` bigint(20) unsigned NOT NULL auto_increment, `stub` char(1) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`)) ENGINE=MyISAM
Tuesday, November 12, 13
only myisam tables, leverage myisam engine's lack of concurrency
![Page 34: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/34.jpg)
REPLACE INTO tickets (stub) VALUES ('a');SELECT LAST_INSERT_ID();
Ticket Generation
Tuesday, November 12, 13
since value ‘a’ exists, it replaces the row with the same value (and bumps the id)
if an old row in the table has the same value as a new row for aPK or a UNIQUE index, the old row is deleted before the new row is inserted
![Page 35: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/35.jpg)
REPLACE INTO tickets (stub) VALUES ('a');SELECT LAST_INSERT_ID();
SELECT * FROM tickets;
Ticket Generation
id stub
4589294 a
Tuesday, November 12, 13
![Page 36: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/36.jpg)
auto-increment-increment = 2auto-increment-offset = 1
auto-increment-increment = 2auto-increment-offset = 2
tickets A
tickets B
Tuesday, November 12, 13
ODD:offset=1EVEN: offset=2 http://openclipart.org/detail/94723/database-symbol-by-rg1024
![Page 37: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/37.jpg)
auto-increment-increment = 2auto-increment-offset = 1
auto-increment-increment = 2auto-increment-offset = 2
tickets A
tickets B
NOT master-masterTuesday, November 12, 13
failure is ok, only lose last ticket idcan bring another server up with new offset
http://openclipart.org/detail/94723/database-symbol-by-rg1024
![Page 38: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/38.jpg)
Shards
Tuesday, November 12, 13
shards hold the majority of the data
http://www.flickr.com/photos/merrickb/63999750/sizes/o/in/photostream/
![Page 39: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/39.jpg)
Object Hashing....aka pinning data to one side of the shard
Tuesday, November 12, 13
after we determine the shard we have to determine side A or side B given the replicant indexalso helps keep connections to a (relative) minimum since all stuff sharded by a specific instance will then pick the same side
![Page 40: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/40.jpg)
user_id : 500
A B
Tuesday, November 12, 13
so we know the shard, now which replicantobject id in this case is user_idside a/b are replicants
![Page 41: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/41.jpg)
user_id : 500 % (# active replicants)
A B
Tuesday, November 12, 13
![Page 42: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/42.jpg)
user_id : 500 % (# active replicants)
'etsy_index_A' => 'mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw', 'etsy_index_B' => 'mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw', 'etsy_shard_001_A' => 'mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_001_B' => 'mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_A' => 'mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_B' => 'mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_003_A' => 'mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_003_B' => 'mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
A B
Tuesday, November 12, 13
each master master pair in the config
![Page 43: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/43.jpg)
user_id : 500 % (2)
A B
Tuesday, November 12, 13
![Page 44: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/44.jpg)
user_id : 500 % (2) == 0
A B
Tuesday, November 12, 13
![Page 45: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/45.jpg)
user_id : 500 % (2) == 0 select ... insert ...update ...
A B
Tuesday, November 12, 13
![Page 46: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/46.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
A B
Tuesday, November 12, 13
![Page 47: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/47.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
select ... insert ...update ...
500select ... insert ...update ...
501A B
Tuesday, November 12, 13
![Page 48: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/48.jpg)
FailureTuesday, November 12, 13
http://www.flickr.com/photos/44124348109@N01/6467405231/
![Page 49: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/49.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
A B
Tuesday, November 12, 13
![Page 50: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/50.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
A B
Tuesday, November 12, 13
![Page 51: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/51.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
A B
Tuesday, November 12, 13
![Page 52: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/52.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
'etsy_index_A' => 'mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw', 'etsy_index_B' => 'mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw', 'etsy_shard_001_A' => 'mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_001_B' => 'mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_A' => 'mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_B' => 'mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_003_A' => 'mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_003_B' => 'mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
A B
Tuesday, November 12, 13
![Page 53: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/53.jpg)
user_id : 500 % (2) == 0user_id : 501 % (2) == 1
'etsy_index_A' => 'mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw', 'etsy_index_B' => 'mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw', 'etsy_shard_001_A' => 'mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_001_B' => 'mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_A' => 'mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_B' => 'mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_003_A' => 'mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_003_B' => 'mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
A B
Tuesday, November 12, 13
![Page 54: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/54.jpg)
user_id : 500 % (1) == 0user_id : 501 % (1) == 0
A B
Tuesday, November 12, 13
![Page 55: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/55.jpg)
VariantsTuesday, November 12, 13
variants are mirrors of the same data in different tables
http://www.flickr.com/photos/garibaldi/522196113/sizes/o/in/photostream/
![Page 56: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/56.jpg)
shard 1
user_id group_id
1 A
1 B
2 A
2 C
shard 2
user_id group_id
3 A
3 B
4 A
5 C
SELECT user_id FROM users_groups WHERE group_id = ‘A’
Tuesday, November 12, 13
![Page 57: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/57.jpg)
shard 1
user_id group_id
1 A
1 B
2 A
2 C
shard 2
user_id group_id
3 A
3 B
4 A
5 C
SELECT user_id FROM users_groups WHERE group_id = ‘A’Broken!
Tuesday, November 12, 13
![Page 58: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/58.jpg)
shard 1
user_id group_id
1 A
1 B
2 A
2 C
shard 2
user_id group_id
3 A
3 B
4 A
5 C
SELECT user_id FROM users_groups WHERE group_id = ‘A’Broken!
JOIN
Tuesday, November 12, 13
![Page 59: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/59.jpg)
users_groupsuser_id group_id
1 A
1 B
2 A
2 C
groups_usersgroup_id user_id
A 1
A 3
A 2
3 A
3 B
3 C
B 3
B 1
C 2
C 3
Tuesday, November 12, 13
mirror the data, map users to groups, groups to users
![Page 60: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/60.jpg)
indexuser_id shard_id
1 1
2 1
3 2
4 3
group_id shard_id
A 1
B 2
C 2
D 3
users_groups_index groups_users_index
separate indexes for different slices of data
Tuesday, November 12, 13
![Page 61: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/61.jpg)
indexuser_id shard_id
1 1
2 1
3 2
4 3
group_id shard_id
A 1
B 2
C 2
D 3
users_groups_index groups_users_index
shard 3user_id group_id
4 A
4 B
4 C
4 D
Tuesday, November 12, 13
look up the groups a user is part of
![Page 62: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/62.jpg)
Dev Data
Tuesday, November 12, 13
now lets talk about development data
![Page 63: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/63.jpg)
The Problem
Tuesday, November 12, 13
hit this a few years ago, every big company probably has this issue
![Page 64: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/64.jpg)
DATA
Tuesday, November 12, 13
sync prod to dev, until prod data gets too big
http://www.flickr.com/photos/uwwresnet/6280880034/sizes/l/in/photostream/
![Page 65: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/65.jpg)
Some Approaches
subsets of data
generated data
Tuesday, November 12, 13
subsets have to end somewhere (a shop has favorites that are connected to people, connected to shops, etc)generated data can be time consuming to fake
![Page 66: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/66.jpg)
But...
Tuesday, November 12, 13
but there is a problem with both of those approaches
![Page 67: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/67.jpg)
Edge CasesTuesday, November 12, 13
what about testing edge cases, difficult to diagnose bugs?hard to model the same data set that produced a user facing bug
http://www.flickr.com/photos/kalexanderson/6199793967/sizes/o/in/photostream/
![Page 68: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/68.jpg)
Complexity
Tuesday, November 12, 13
another issue is testing problems at scale, complex and large gobs of datareal social network ecosystem can be difficult to generate (favorites, follows) (activity feed, “similar items” search gives better results in prod)
http://www.flickr.com/photos/doug88888/4687906267/sizes/o/in/photostream/
![Page 69: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/69.jpg)
Copy prod data to dev ?
Tuesday, November 12, 13
what most people do before data gets too big, almost 3 days to sync 30Tb over 1Gbps link, close to 10 hrs over 10Gbps bringing prod dataset to dev was expensive hardware/maint, keeping parity with prod, and applying schema changes would take at least as long
![Page 70: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/70.jpg)
Use Production
(sometimes)
instead....
Tuesday, November 12, 13
so we did what we saw as the last resort - used production not for greenfield development, more for mature features and diagnosing bugswe still have a dev database but the data is sparse and unreliable
![Page 71: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/71.jpg)
Tuesday, November 12, 13
goes without saying this can be dangerous, and people have to be aware they are doing it
http://instagram.com/p/d8nw9aNqlt/http://www.flickr.com/photos/stuckincustoms/432361985/sizes/l/in/photostream/
![Page 72: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/72.jpg)
dev shard
introducing....
Tuesday, November 12, 13
dev shard, shard used for initial writes of data created when coming from dev env
![Page 73: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/73.jpg)
tickets index
shard 1 shard 2 shard N
Tuesday, November 12, 13
![Page 74: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/74.jpg)
tickets index
shard 1 shard 2 shard N
DEV shard
Tuesday, November 12, 13
![Page 75: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/75.jpg)
shard 1 shard 2 shard N
DEV shard
www.etsy.com www.goulah.vm
Initial Writes
Tuesday, November 12, 13
![Page 76: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/76.jpg)
shard 1 shard 2 shard N
DEV shard
www.etsy.com www.goulah.vm
Initial Writes
Tuesday, November 12, 13
writes from etsy.com go everywhere -except- dev shard
![Page 77: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/77.jpg)
shard 1 shard 2 shard N
DEV shard
www.etsy.com www.goulah.vm
Initial Writes
Tuesday, November 12, 13
writes from my vm -only- go to dev shard
![Page 78: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/78.jpg)
mysql proxy
Tuesday, November 12, 13
![Page 79: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/79.jpg)
Tuesday, November 12, 13
proxy hits all of the shards/index/tickets
http://www.oreillynet.com/pub/a/databases/2007/07/12/getting-started-with-mysql-proxy.html
![Page 80: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/80.jpg)
explicitly enabled
% dev_proxy onDev-Proxy config is now ON. Use 'dev_proxy off' to turn it off.
Tuesday, November 12, 13
Not on all the time
![Page 81: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/81.jpg)
visual notifications
Tuesday, November 12, 13
![Page 82: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/82.jpg)
Tuesday, November 12, 13
notify engineers they are using the proxy, this is read-only mode
![Page 83: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/83.jpg)
read/write mode
Tuesday, November 12, 13
![Page 84: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/84.jpg)
Tuesday, November 12, 13
read-write mode, needed for login and other things that write data
![Page 85: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/85.jpg)
% ./bin/myscript YOU CURRENTLY HAVE THE READ WRITE PROXY TURNED ON AND ARE RUNNING A CLI SCRIPT!!!You must type the phrase 'read write proxy' and press enter to continue...
Tuesday, November 12, 13
![Page 86: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/86.jpg)
known input/output
Tuesday, November 12, 13
we know where all of the queries from dev originate from
http://www.flickr.com/photos/medevac71/4875526920/sizes/l/in/photostream/
![Page 87: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/87.jpg)
dangerous/unnecessary queries
(DEV) etsy_rw@jgoulah [test]> select * from fred_test;
ERROR 9001 (E9001): Selects from tables must have where clauses
Tuesday, November 12, 13
-- filter dangerous queries - (queries without a WHERE)-- remove unnecessary queries - (instead of DELETE, have a flag, ALTER statements don’t run from dev)
![Page 88: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/88.jpg)
logging
Tuesday, November 12, 13
basics of anomaly detection is log collection
![Page 89: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/89.jpg)
2013-04-22 18:05:43 485370821 devproxy --
/* DEVPROXY source=10.101.194.19:40198
uuid=c309e8db-ca32-4171-9c4a-6c37d9dd3361
[htSp8458VmHlC] [etsy_index_B] [browse.php] */
SELECT id FROM table;
date thread id
source ip
unique id generated by proxy
app request id dest. shard script
Tuesday, November 12, 13
![Page 90: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/90.jpg)
Tuesday, November 12, 13
![Page 91: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/91.jpg)
stealth data
Tuesday, November 12, 13
hiding data from users (favorites go on dev and prod shard, making sure test user/shops don’t show up in search)
http://www.flickr.com/photos/davidyuweb/8063097077/sizes/h/in/photostream/
![Page 92: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/92.jpg)
overlays
Tuesday, November 12, 13
An overlay is a local copy of production data If there are overlays in place in dev, it will send the queries to the local db instead (it does this by overriding looking up the shard on index, and checks for table/pk pair).
![Page 93: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/93.jpg)
prod
user_id group_id
1 A
1 B
2 A
2 C
user_id group_id
3 A
3 B
3 C
3 A
3 B
3 C
dev
copy
store in memcache: <table, pk>Tuesday, November 12, 13
Any time we write to the other shards from dev, the shard migration copies to be affected rows to their local mysql instance over the dev proxyand then stores the table/pk for subsequent lookup
![Page 94: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/94.jpg)
Delayed Slaves
Tuesday, November 12, 13
pt-slave-delay watches a slave and starts and stops its replication SQL thread as necessary to hold it
http://www.flickr.com/photos/xploded/141295823/sizes/o/in/photostream/
![Page 95: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/95.jpg)
4 hour delay behind master
produce row based binary logs
Delayed Slaves
allow for quick recovery
Tuesday, November 12, 13
role of the delayed slavealso source of BCP (business continuity planning - prevention and recovery of threats)
![Page 96: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/96.jpg)
pt-slave-delay --daemonize
--pid /var/run/pt-slave-delay.pid --log /var/log/pt-slave-delay.log
--delay 4h --interval 1m --nocontinue
Tuesday, November 12, 13
last 3 options most important, 4h delay, interval is how frequently it should check whether slave should be started or stopped nocontinue - don’t continue replication normally on exit (don’t catch up with master)user/pass eliminated for brevity
![Page 97: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/97.jpg)
R/W R/W
Slave
Shard Pair
pt-slave-delayrow based binlogs
Tuesday, November 12, 13
![Page 98: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/98.jpg)
R/W R/W
Slave
Shard Pair
HDFS
VerticaParse/
Transform
Tuesday, November 12, 13
in addition can use slaves to send data to other stores for offline queries1)parse each binlog file to generate sequence file of row changes2)apply the row changes to a previous set for the latest version
![Page 99: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/99.jpg)
Schema Changes
Tuesday, November 12, 13
alters take forever, lock rows being altered (this is why we have new things like online schema change)
![Page 100: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/100.jpg)
shard 1 shard 2 shard N
Tuesday, November 12, 13
LOTS of servers to apply changes to PLUS the alter problem
![Page 101: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/101.jpg)
shard 1 shard 2 shard N
Tuesday, November 12, 13
apply to a side that is inactive
![Page 102: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/102.jpg)
Schemanator
Tuesday, November 12, 13
![Page 103: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/103.jpg)
Tuesday, November 12, 13
!! explain the config push process a bitalso this is used to apply the alters
![Page 104: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/104.jpg)
Tuesday, November 12, 13
![Page 105: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/105.jpg)
shard 1 shard 2 shard N
Tuesday, November 12, 13
![Page 106: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/106.jpg)
shard 1 shard 2 shard N
SET SQL_LOG_BIN = 0; ALTER TABLE user ....
Tuesday, November 12, 13
![Page 107: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/107.jpg)
Tuesday, November 12, 13
![Page 108: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/108.jpg)
Tuesday, November 12, 13
check two things in test phase:- schema applies to blank db- table validates against our sql standards
![Page 109: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/109.jpg)
shard migration
Tuesday, November 12, 13
migration of data from one shard to another
![Page 110: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/110.jpg)
Why?
Tuesday, November 12, 13
why migrate data?
![Page 111: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/111.jpg)
Prevent disk from filling
Tuesday, November 12, 13
![Page 112: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/112.jpg)
Prevent disk from fillingHigh traffic objects (shops, users)
Tuesday, November 12, 13
high traffic == disk usage and I/O util
![Page 113: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/113.jpg)
Prevent disk from fillingHigh traffic objects (shops, users)Shard rebalancing
Tuesday, November 12, 13
rebalancing when adding new shards or shards fill unequally
![Page 114: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/114.jpg)
When?
Tuesday, November 12, 13
![Page 115: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/115.jpg)
Tuesday, November 12, 13
users per shard
![Page 116: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/116.jpg)
Balance
Tuesday, November 12, 13
how many users on each shard
![Page 117: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/117.jpg)
# migrate_object User 5307827 2
per object migration <object type> <object id> <shard>
Tuesday, November 12, 13
![Page 118: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/118.jpg)
# migrate_pct User 25 3 6
percentage migration <object type> <percent> <old shard> <new shard>
Tuesday, November 12, 13
![Page 119: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/119.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 1 0 0
Tuesday, November 12, 13
![Page 120: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/120.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 1 1 0
•Lock
Tuesday, November 12, 13
explain about the lock, what happens in app, reads vs. writes
![Page 121: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/121.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 1 1 0
•Lock•Migrate
Tuesday, November 12, 13
![Page 122: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/122.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 1 1 0
•Lock•Migrate•Checksum
Tuesday, November 12, 13
checksum is a count(*) on each table
![Page 123: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/123.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 1 1 0
•Lock•Migrate•Checksum
Tuesday, November 12, 13
![Page 124: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/124.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 2 0 1
•Lock•Migrate•Checksum•Unlock
Tuesday, November 12, 13
![Page 125: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/125.jpg)
index
shard 1 shard 2 shard N
user_id shard_id migration_lock old_shard_id
1 2 0 1
•Lock•Migrate•Checksum•Unlock•Delete (from old shard)
Tuesday, November 12, 13
deletes are out of band, auto-back off by looking at connection metrics
![Page 126: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/126.jpg)
Logical Shards
Tuesday, November 12, 13
Writing data into the new shard, deleting data from the old shard and then optimizing every single table is a large amount of workInstead can run a mysql process with many databases
![Page 127: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/127.jpg)
dbshard38mysqldb_300db_301db_302db_303db_304db_305
....
Tuesday, November 12, 13
with this, slave replication is multiplied by the number of logical shards per box (assuming even distribution of writes)
![Page 128: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/128.jpg)
'etsy_shard_001_A' => 'mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_001_B' => 'mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_A' => 'mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw', 'etsy_shard_002_B' => 'mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw',
'etsy_shard_100_A' => 'mysql:host=dbshard50.ny4.etsy.com;port=3306;dbname=etsy_shard_100;user=etsy_rw', 'etsy_shard_100_B' => 'mysql:host=dbshard51.ny4.etsy.com;port=3306;dbname=etsy_shard_100;user=etsy_rw', 'etsy_shard_101_A' => 'mysql:host=dbshard50.ny4.etsy.com;port=3306;dbname=etsy_shard_101;user=etsy_rw', 'etsy_shard_101_B' => 'mysql:host=dbshard51.ny4.etsy.com;port=3306;dbname=etsy_shard_101;user=etsy_rw',
same mysql instance different database/schema
Tuesday, November 12, 13
![Page 129: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/129.jpg)
Advantages
• multi threaded slave• simpler migrations
Tuesday, November 12, 13
In MySQL 5.6 we have multi-threaded slave but it can only do parallel processing if we have multiple MySQL schemas (databases).
The cons is we have many more logical shards to maintain
![Page 130: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/130.jpg)
Logical Shard Migrations
Tuesday, November 12, 13
Lets walk through a logical shard migration...
![Page 131: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/131.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
Tuesday, November 12, 13
Suppose dbshard 41/42 have shard dbs 300 - 312 and we want to move half of them to a new shard pair (61/62)
![Page 132: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/132.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
restore backup
db_300db_301
....db_312
db_300db_301
....db_312
Tuesday, November 12, 13
We restore last night's backup from 41 onto 61 and 62
![Page 133: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/133.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
db_300db_301
....db_312
db_300db_301
....db_312
slave
slave
Tuesday, November 12, 13
Set up 62 to slave from 61, and 61 to slave from 41 starting from where the backup stopped.
![Page 134: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/134.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
db_300db_301
....db_312
db_300db_301
....db_312
slave
slave
Tuesday, November 12, 13
Once 61 and 62 are all caught up, change the config such dbshard42 is disabled, and all writes/reads go to dbshard41
![Page 135: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/135.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
db_307
....db_312
db_300db_301
....db_312
slave
slave
config:db_307-312change from
dbshard 41 to 61
Tuesday, November 12, 13
Then change the config for db_307 through 312 on dbshard41 to point to dbshard61.
![Page 136: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/136.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
db_307
....db_312
db_300db_301
....db_312
slave
Tuesday, November 12, 13
Reset dbshard61 slave to point to dbshard62 instead. So now we have Master-Master going.
![Page 137: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/137.jpg)
dbshard41
db_300db_301
....db_312
dbshard42
db_300db_301
....db_312
dbshard61
dbshard62
db_307
....db_312
db_307
....db_312
slaveconfig:db_307-312change from
dbshard 42 to 62
Tuesday, November 12, 13
Change db_307 through 312 on dbshard42 in the config to point to dbshard62.
![Page 138: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/138.jpg)
dbshard41
db_300db_301
....db_306
dbshard42
db_300db_301
....db_306
dbshard61
dbshard62
db_307
....db_312
db_307
....db_312
slave
Tuesday, November 12, 13
And we're done. Drop db_307 through db_312 on dbshard41/42, re-enable writes on 42
![Page 139: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/139.jpg)
Other Tools
Tuesday, November 12, 13
![Page 140: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/140.jpg)
mysqlsummary
Tuesday, November 12, 13
essentially just reformatting show processlist
![Page 141: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/141.jpg)
% mysqlsummary.pl --host dbshard31
Details for dbshard31==================================
COMMAND SUMMARY=============== Sleep 211 (96.79%) Execute 2 (0.92%) Connect 2 (0.92%) Binlog Dump 2 (0.92%) Query 1 (0.46%)
Tuesday, November 12, 13
![Page 142: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/142.jpg)
HOST SUMMARY============ meteor03 10 (4.59%) meteor01 8 (3.67%) web0228 3 (1.38%) api05 3 (1.38%) worker05 3 (1.38%) worker12 3 (1.38%)
SCRIPT SUMMARY============== Job: ShopStats/calculate 1 (0.46%) Job: NewsFeed/refresh 1 (0.46%)
SQL SUMMARY=========== select 1 (0.46%) SELECT 1 (0.46%) SHOW 1 (0.46%)
Tuesday, November 12, 13
![Page 143: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/143.jpg)
COMMAND TIMINGS===============----------------------------------------------------------------------+ HOST: worker19, USER: , DB: 2, TIME: 4----------------------------------------------------------------------select * from activity where owner_id = 7395036 and owner_type_id = 2 and deleted = 0 and creation_time >= 1382226430 and public = 1 order by creation_time desc limit 0,50
----------------------------------------------------------------------+ HOST: worker27, USER: , DB: 2, TIME: 4----------------------------------------------------------------------SELECT * FROM shop_stats WHERE shop_id = 5902046 AND currency_code = 'USD' AND sales_year = 2012 AND id != 2432609442
Tuesday, November 12, 13
![Page 144: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/144.jpg)
ORM REPL
Tuesday, November 12, 13
![Page 145: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/145.jpg)
% php-repl[1] etsy-php> EtsyORM::getFinder('User');
→ object(EtsyModel_UserFinder)( 0 => 'countAll( SELECT count(*) FROM User )', 1 => 'findByLoginName ( $login_name )', 2 => 'findByEmail ( $primary_email )',...
Tuesday, November 12, 13
![Page 146: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/146.jpg)
qtop
Tuesday, November 12, 13
we send queries over UDP from our ORM, stick them in a db and to analyze laterrequest context: request id, logged in user-id, what script is executingavoid the perf hit of slow query log, and its realtime across all shards because it originates from the client
![Page 147: The Shard Revisited: Tools and Techniques Used at Etsy](https://reader033.vdocuments.us/reader033/viewer/2022052823/5553f7ceb4c90544428b4aad/html5/thumbnails/147.jpg)
Thank you
etsy.com/jobs
Tuesday, November 12, 13