![Page 1: Socialite, the Open Source Status Feed Part 3: Scaling the Data Feed](https://reader033.vdocuments.us/reader033/viewer/2022061105/540024a78d7f7261088b49dc/html5/thumbnails/1.jpg)
Building a Social Platform
Part 3: Scaling the Data Feed
Socialite
• Reference implementation
  – Various fanout feed models
  – User graph implementation
  – Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
  – https://dropwizard.github.io/dropwizard/
• Built-in benchmarking
https://github.com/10gen-labs/socialite
Architecture
(Architecture diagram; recoverable labels: Graph Service, Proxy, Content Proxy)
Feed Service
• Two main functions:
  – Aggregating “followed” content for a user
  – Forwarding a user’s content to “followers”
• Common implementation models:
  – Fanout on read
    • Query content of all followed users on the fly
  – Fanout on write
    • Add to a “cache” of each user’s timeline for every post
    • Various storage models for the timeline
Fanout On Read
Pros
– Simple implementation
– No extra storage for timelines
Cons
– Timeline reads (typically) hit all shards
– Often involves reading more data than required
– May require additional indexing on Content
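To make the trade-off concrete, here is a minimal in-memory sketch of fanout on read in plain JavaScript. The `content` and `follows` structures and the `readTimeline` function are hypothetical stand-ins, not Socialite's code; the field names `_a` (author) and `_m` (message) follow the slides, and a small integer `_id` plays the role of an ObjectId's rough time ordering. Every read scans all content, which is the in-memory analogue of hitting every shard.

```javascript
// Hypothetical in-memory stand-ins for the content collection and user graph.
// Field names follow the slides: _a = author, _m = message. A growing integer
// _id stands in for ObjectId's rough insertion-time ordering.
const content = [
  { _id: 1, _a: "ian", _m: "earlier from ian" },
  { _id: 2, _a: "djw", _m: "message from daz" },
  { _id: 3, _a: "ian", _m: "message from ian" },
  { _id: 4, _a: "bob", _m: "not followed by jsr" },
];
const follows = { jsr: ["djw", "ian"], ian: ["djw"] };

// Fanout on read: nothing is precomputed. Each timeline request scans content
// for posts by followed authors, sorts newest first and keeps `limit` posts.
function readTimeline(user, limit) {
  const followed = new Set(follows[user] || []);
  return content
    .filter((post) => followed.has(post._a))
    .sort((a, b) => b._id - a._id)
    .slice(0, limit);
}

console.log(readTimeline("jsr", 2).map((p) => p._m));
// [ 'message from ian', 'message from daz' ]
```

Against MongoDB, the filter step would roughly correspond to a query such as `{_a: {$in: followedUsers}}` sorted by `_id`, which is likely why the Cons mention extra indexing on Content.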
Fanout On Write
Pros
– Timeline can be a single-document read
– Dormant users easily excluded
– Working set minimized
Cons
– Fanout for large follower lists can be expensive
– Additional storage for materialized timelines
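The write path of fanout on write can be sketched the same way. Again the names (`followers`, `timelines`, `post`) are hypothetical and everything lives in memory; the point is that a post costs one content insert plus one timeline update per follower, while a read becomes a single lookup.

```javascript
// Hypothetical in-memory model. followers[author] lists who follows that author;
// timelines[user] is that user's materialized timeline, newest first.
const followers = { djw: ["jsr", "ian"], ian: ["jsr"] };
const content = [];
const timelines = {};

// Fanout on write: one content insert plus one timeline update per follower.
// Returns the number of timeline updates, which grows with the follower count.
function post(author, id, message) {
  const doc = { _id: id, _a: author, _m: message };
  content.push(doc);
  const targets = followers[author] || [];
  for (const user of targets) {
    if (!timelines[user]) timelines[user] = [];
    timelines[user].unshift(doc); // prepend so the newest post comes first
  }
  return targets.length;
}

// Reading is a single lookup; no query over the content collection is needed.
function readTimeline(user) {
  return timelines[user] || [];
}

post("ian", 1, "earlier from ian");
post("djw", 2, "message from daz");
console.log(readTimeline("jsr").map((p) => p._m));
// [ 'message from daz', 'earlier from ian' ]
```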
Fanout On Write
• Three different approaches
  – Time buckets
  – Size buckets
  – Cache
• Each has different pros & cons
Timeline Buckets - Time
Upsert to time range buckets for each user:
```
> db.timed_buckets.find().pretty()
{
  "_id" : { "_u" : "jsr", "_t" : 516935 },
  "_c" : [
    { "_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz" },
    { "_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian" }
  ]
}
{
  "_id" : { "_u" : "ian", "_t" : 516935 },
  "_c" : [
    { "_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz" }
  ]
}
{
  "_id" : { "_u" : "jsr", "_t" : 516934 },
  "_c" : [
    { "_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian" }
  ]
}
```
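A rough sketch of the time-bucket upsert, assuming a bucket width of one hour: the slides show `_t` values such as 516935 but don't say what unit they are in, so `BUCKET_MS` is a hypothetical choice. A `Map` keyed by user and bucket number stands in for the `timed_buckets` collection; in MongoDB the same step could be an update on `{_id: {_u, _t}}` with `$push` onto `_c` and `upsert: true`.

```javascript
// Hypothetical bucket width: the slides don't say what unit `_t` uses.
const BUCKET_MS = 60 * 60 * 1000; // one hour

// Stand-in for the timed_buckets collection: "user|t" -> {_id: {_u, _t}, _c: [...]}
const buckets = new Map();

// Upsert: create the (user, time range) bucket if it is missing, then append the post.
function addToTimedBucket(user, postDoc, timestampMs) {
  const t = Math.floor(timestampMs / BUCKET_MS);
  const key = `${user}|${t}`;
  if (!buckets.has(key)) {
    buckets.set(key, { _id: { _u: user, _t: t }, _c: [] });
  }
  buckets.get(key)._c.push(postDoc);
  return t;
}

// Collect up to `count` posts, newest first, walking back one bucket at a time.
// `oldestT` bounds the walk so a quiet user's empty ranges don't loop forever.
function readTimedTimeline(user, nowMs, count, oldestT) {
  const result = [];
  for (let t = Math.floor(nowMs / BUCKET_MS); t >= oldestT && result.length < count; t--) {
    const bucket = buckets.get(`${user}|${t}`);
    if (bucket) result.push(...bucket._c.slice().reverse()); // newest first inside a bucket
  }
  return result.slice(0, count);
}

addToTimedBucket("jsr", { _id: "da7", _a: "ian", _m: "earlier from ian" }, 10 * BUCKET_MS + 5);
addToTimedBucket("jsr", { _id: "dc1", _a: "djw", _m: "message from daz" }, 11 * BUCKET_MS + 1);
addToTimedBucket("jsr", { _id: "dd2", _a: "ian", _m: "message from ian" }, 11 * BUCKET_MS + 2);
console.log(readTimedTimeline("jsr", 11 * BUCKET_MS + 30, 3, 0).map((p) => p._id));
// [ 'dd2', 'dc1', 'da7' ]
```

Because bucket sizes follow each user's activity, busy users get large documents and quiet users get many small ones, which is the inconsistency that size buckets address.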
Timeline Buckets - Size
More complex, but more consistently sized:
```
> db.sized_buckets.find().pretty()
{
  "_id" : ObjectId("...122"),
  "_c" : [
    { "_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz" },
    { "_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian" },
    { "_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian" }
  ],
  "_s" : 3,
  "_u" : "jsr"
}
{
  "_id" : ObjectId("...011"),
  "_c" : [
    { "_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz" }
  ],
  "_s" : 1,
  "_u" : "ian"
}
```
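Size buckets trade a slightly more complex write for documents of bounded size. The sketch below assumes a cap of three posts per bucket (`MAX_BUCKET_SIZE`, hypothetical) and keeps the slide's `_c`, `_s` and `_u` fields. One way to express the write in MongoDB, not confirmed by the slides as Socialite's approach, is an update filtered on `{_u: user, _s: {$lt: MAX}}` with `$push` and `$inc: {_s: 1}` plus `upsert: true`, so that a full bucket causes a new one to be inserted.

```javascript
// Hypothetical cap; the slides don't give the bucket size Socialite uses.
const MAX_BUCKET_SIZE = 3;

// Stand-in for sized_buckets: {_id, _c: [...posts], _s: post count, _u: user}
const sizedBuckets = [];
let nextBucketId = 1; // increasing ids, so a higher _id means a newer bucket

// Append to the user's bucket that still has room; once it is full, start a new one.
function addToSizedBucket(user, postDoc) {
  let bucket = sizedBuckets.find((b) => b._u === user && b._s < MAX_BUCKET_SIZE);
  if (!bucket) {
    bucket = { _id: nextBucketId++, _c: [], _s: 0, _u: user };
    sizedBuckets.push(bucket);
  }
  bucket._c.push(postDoc);
  bucket._s += 1;
  return bucket._id;
}

// Newest first: walk the user's buckets from newest to oldest.
function readSizedTimeline(user, count) {
  const mine = sizedBuckets.filter((b) => b._u === user).sort((a, b) => b._id - a._id);
  const result = [];
  for (const b of mine) {
    result.push(...b._c.slice().reverse());
    if (result.length >= count) break;
  }
  return result.slice(0, count);
}

for (let i = 1; i <= 5; i++) addToSizedBucket("jsr", { _id: i, _m: `post ${i}` });
console.log(sizedBuckets.map((b) => b._s)); // [ 3, 2 ]
console.log(readSizedTimeline("jsr", 4).map((p) => p._id)); // [ 5, 4, 3, 2 ]
```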
Timeline - Cache
Store a limited cache, fall back to fanout on read
– Create single cache doc on demand with upsert
– Limit size of cache with $slice
– Timeout docs with TTL for inactive users
```
> db.timeline_cache.find().pretty()
{
  "_c" : [
    { "_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz" },
    { "_id" : ObjectId("...dd2"), "_a" : "ian", "_m" : "message from ian" },
    { "_id" : ObjectId("...da7"), "_a" : "ian", "_m" : "earlier from ian" }
  ],
  "_u" : "jsr"
}
{
  "_c" : [
    { "_id" : ObjectId("...dc1"), "_a" : "djw", "_m" : "message from daz" }
  ],
  "_u" : "ian"
}
```
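The cache model combines both approaches. This sketch assumes the cache document is created on a user's first read (by running fanout on read), that later posts are pushed only into caches that already exist, and that a missing or expired cache falls back to fanout on read. `CACHE_SIZE` and `TTL_MS` are hypothetical values. In MongoDB the trimming would map to `$push` with `$each`, `$position` and `$slice`, and expiry to a TTL index (`expireAfterSeconds`) on a last-access date field; here both are emulated in memory.

```javascript
// Hypothetical limits; the slides don't state Socialite's actual values.
const CACHE_SIZE = 3;     // max posts per cached timeline (what $slice enforces)
const TTL_MS = 60 * 1000; // idle time before a cache doc expires (what a TTL index does)

// Minimal stand-ins for the content collection and the follow graph.
const content = [];
const followers = { djw: ["jsr", "ian"], ian: ["jsr"] };
const follows = { jsr: ["djw", "ian"], ian: ["djw"] };

// user -> { _c: posts newest first, lastAccess: ms }
const cache = new Map();

function fanoutOnRead(user) {
  const followed = new Set(follows[user] || []);
  return content
    .filter((p) => followed.has(p._a))
    .sort((a, b) => b._id - a._id)
    .slice(0, CACHE_SIZE);
}

// Drop cache docs idle for longer than TTL_MS.
function expire(now) {
  for (const [user, doc] of cache) {
    if (now - doc.lastAccess > TTL_MS) cache.delete(user);
  }
}

// Write path: insert the content, then push it into followers' existing caches,
// trimming each to CACHE_SIZE (like $push with $position: 0 and $slice).
function post(author, id, message, now) {
  expire(now);
  const doc = { _id: id, _a: author, _m: message };
  content.push(doc);
  for (const user of followers[author] || []) {
    const c = cache.get(user);
    if (c) c._c = [doc, ...c._c].slice(0, CACHE_SIZE);
  }
}

// Read path: serve from the cache if present; otherwise rebuild it on demand
// with fanout on read (the upsert case).
function readTimeline(user, now) {
  expire(now);
  const hit = cache.has(user);
  if (!hit) cache.set(user, { _c: fanoutOnRead(user), lastAccess: now });
  const c = cache.get(user);
  c.lastAccess = now;
  return { hit, posts: c._c };
}

post("ian", 1, "earlier from ian", 0);
console.log(readTimeline("jsr", 1000).hit); // false: built by fanout on read
post("djw", 2, "message from daz", 2000);
post("ian", 3, "message from ian", 3000);
post("djw", 4, "newest from daz", 4000);
const r = readTimeline("jsr", 5000);
console.log(r.hit, r.posts.map((p) => p._id)); // true [ 4, 3, 2 ]
console.log(readTimeline("jsr", 5000 + TTL_MS + 1).hit); // false: expired and rebuilt
```

Because the cache can always be rebuilt from content and the user graph, losing it costs extra reads but never data.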
Embedding vs Linking Content
Embedded content for direct access
– Great when content is small and predictable in size
Link to content, store only metadata
– Read only desired content on demand
– Further stabilizes cache document sizes
```
> db.timeline_cache.findOne({"_id" : "jsr"})
{
  "_c" : [
    { "_id" : ObjectId("...dc1") },
    { "_id" : ObjectId("...dd2") },
    { "_id" : ObjectId("...da7") }
  ],
  "_id" : "jsr"
}
```
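Linking turns a timeline read into two steps: fetch the small cache document, then fetch only the posts that will actually be shown. A minimal in-memory sketch follows; `contentById` and `resolvePage` are hypothetical names. Against MongoDB the second step could be a single `find` with `{_id: {$in: ids}}`, but that query does not return documents in the order of the `$in` list, so the cache's order has to be reapplied, which is what mapping over `ids` does here.

```javascript
// Hypothetical stand-in for the content collection, keyed by post _id.
const contentById = new Map([
  ["dc1", { _id: "dc1", _a: "djw", _m: "message from daz" }],
  ["dd2", { _id: "dd2", _a: "ian", _m: "message from ian" }],
  ["da7", { _id: "da7", _a: "ian", _m: "earlier from ian" }],
]);

// Linked cache document: references only, as in the timeline_cache example.
const timelineCache = { _id: "jsr", _c: [{ _id: "dc1" }, { _id: "dd2" }, { _id: "da7" }] };

// Resolve just the first `count` references, keeping the cache's order and
// skipping any post whose content no longer exists.
function resolvePage(cacheDoc, count) {
  const ids = cacheDoc._c.slice(0, count).map((ref) => ref._id);
  return ids.map((id) => contentById.get(id)).filter(Boolean);
}

console.log(resolvePage(timelineCache, 2).map((p) => p._m));
// [ 'message from daz', 'message from ian' ]
```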
Socialite Feed Service
• Implemented four models as plugins
  – FanoutOnRead
  – FanoutOnWrite – Buckets (size)
  – FanoutOnWrite – Buckets (time)
  – FanoutOnWrite – Cache
• Switchable by config
• Store content by reference or value
• Benchmark-able back to back
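Since the models share the same two operations, swapping them by configuration and benchmarking them back to back can be sketched with a small registry. The factory functions, the `FEED_MODELS` table and the operation counters are hypothetical and far simpler than Socialite's Java plugins; they count logical storage operations rather than measuring time.

```javascript
// Hypothetical feed-model plugins sharing one interface: post() and timeline().
// Each counts logical storage operations so models can be compared.
function makeFanoutOnRead(graph) {
  const content = [];
  const ops = { writes: 0, reads: 0 };
  return {
    ops,
    post(author, doc) {
      content.push({ ...doc, _a: author });
      ops.writes += 1; // content insert only
    },
    timeline(user, limit) {
      ops.reads += 1; // one content query (touches every shard in a real cluster)
      const followed = new Set(graph.follows[user] || []);
      return content.filter((p) => followed.has(p._a)).slice(-limit).reverse();
    },
  };
}

function makeFanoutOnWriteCache(graph, cacheSize) {
  const caches = {};
  const ops = { writes: 0, reads: 0 };
  return {
    ops,
    post(author, doc) {
      ops.writes += 1; // content insert (content storage omitted in this sketch)
      for (const u of graph.followers[author] || []) {
        caches[u] = [{ ...doc, _a: author }, ...(caches[u] || [])].slice(0, cacheSize);
        ops.writes += 1; // one cache update per follower
      }
    },
    timeline(user, limit) {
      ops.reads += 1; // single cache document read
      return (caches[user] || []).slice(0, limit);
    },
  };
}

// Model chosen by name, as a config value would select it.
const FEED_MODELS = {
  FanoutOnRead: (graph) => makeFanoutOnRead(graph),
  FanoutOnWriteCache: (graph) => makeFanoutOnWriteCache(graph, 50),
};

// Replay the same workload against a model and report its operation counts.
function benchmark(modelName, graph, posts, readers) {
  const feed = FEED_MODELS[modelName](graph);
  for (const [author, doc] of posts) feed.post(author, doc);
  for (const user of readers) feed.timeline(user, 10);
  return feed.ops;
}

const graph = {
  follows: { jsr: ["djw", "ian"], ian: ["djw"] },
  followers: { djw: ["jsr", "ian"], ian: ["jsr"] },
};
const posts = [["djw", { _id: 1 }], ["ian", { _id: 2 }]];
for (const name of ["FanoutOnRead", "FanoutOnWriteCache"]) {
  console.log(name, benchmark(name, graph, posts, ["jsr", "ian"]));
}
```

With this toy workload, fanout on read records 2 writes and 2 reads, while the cache model records 5 writes (2 content inserts plus 3 follower updates) for the same 2 reads.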
Benchmark by feed type
Benchmarking the Feed
• Biggest challenge: scaling the feed
• High cost of "fanout on write"
• Operations per post by a popular user:
  – Content collection insert: 1
  – Timeline cache: on average, 130+ cache document updates
• Scatter-gather (slowest shard determines latency)
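The cost of a popular user's post can be estimated by grouping the cache updates by the shard that owns each follower's timeline. The `shardFor` hash below is a hypothetical stand-in (MongoDB's hashed sharding uses its own hash function), the 130 followers echo the average quoted above, and six shards matches the timeline cache cluster described later; `scatterGatherLatency` just encodes the rule that the slowest shard determines latency.

```javascript
// Hypothetical shard routing: a simple string hash mapped onto numShards shards.
// (MongoDB's hashed sharding uses its own hash; this only illustrates the spread.)
function shardFor(userId, numShards) {
  let h = 0;
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % numShards;
}

// Writes caused by one post under fanout on write: one content insert plus one
// cache update per follower, grouped by the shard owning each follower's cache doc.
function planFanout(followerIds, numShards) {
  const perShard = new Map();
  for (const id of followerIds) {
    const s = shardFor(id, numShards);
    perShard.set(s, (perShard.get(s) || 0) + 1);
  }
  return { contentInserts: 1, cacheUpdates: followerIds.length, perShard };
}

// Scatter-gather: per-shard batches run in parallel, so the post completes only
// when the slowest shard responds.
function scatterGatherLatency(perShardLatenciesMs) {
  return Math.max(...perShardLatenciesMs);
}

const followerIds = Array.from({ length: 130 }, (_, i) => `user${i}`);
const plan = planFanout(followerIds, 6);
console.log(plan.contentInserts, plan.cacheUpdates, plan.perShard.size);
```

With 130 followers spread over six shards, a single post will normally touch every shard, so its latency is bounded by the slowest one.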
Benchmarking the Feed
• Timeline is different from content!
  – "It's a cache"
IT CAN BE REBUILT!
Benchmarking the Feed
• MongoDB as a cache
Benchmarking the Feed
IT CAN BE REBUILT!
Effect of removing the cache, forcing a fallback to fanout on read, and then rebuilding the cache:
Benchmarking the Feed
• Results (last two weeks)
  – Ran load with one million users
  – Ran load with ten million users (currently running)
  – Average send rates of 1K/s and 2K/s; reads of 10K–20K/s
• 22 AWS c3.2xlarge servers (7.5GB RAM)
  – 18 across six shards (3 content, 3 user graph)
  – 4 mongos and app machines
• 2 c3.4xlarge servers (30GB RAM)
  – Timeline feed cache (six shards)
Summary
Socialite
• Real working implementation
  – Implements all components
  – Configurable models and options
• Built-in benchmarking
• Questions?
  – We will be at "Ask The Experts" this afternoon!
https://github.com/10gen-labs/socialite
Thank You!