How Medium uses Neo4j
Nathaniel Felsen
May 20th, 2015
Post on 14-Apr-2017

About Me

Nathaniel Felsen
Data & DevOps engineer

[email protected]
@faitlezen

Agenda

• What is Medium, and what problem are we solving with Neo4j?
• Why did we pick Neo4j?
• Our architecture
• Steps taken and obstacles encountered while going live
• Improving Neo4j's performance
• Live demo

• Easiest, fastest way to create a beautiful story
• Seamless integration of photos, audio & video
• Optimized for web, tablet & mobile

Medium is a beautiful publishing experience.

Medium is a home for influential contributors.

Medium is a place for important ideas.

• Follow, share, recommend
• Personalized story feed
• Customized daily emails & notifications

Medium is a network that builds audience.

Datastore Selection Process

DynamoDB

Pros
• Expertise
• Already used to store user info
• No maintenance
• No hardware

Cons
• Need to nail the schema ≠ Experimentation
• Limited ways of querying data
• Things like short paths between users won't perform well

Relational database

Pros
• Used by lots of people and heavily vetted
• Less ramp-up for learning the query language
• Strong community

Cons
• Using a relational database to create graphs
• Sharding

FlockDB

Pros
• Expertise
• SQL Lite syntax
• Open Source / Free

Cons
• Not maintained anymore
• 2 tiers model
• Deal with sharding in the near future

Neo4j

Pros
• Easy to start
• Easy to experiment
• Good community
• Enterprise edition: HA, Backup, Support

Cons
• Not free
• No expertise in house
• Requires hardware

Architecture

Our Social Service Architecture

Nodes
• User
• Post
• Collection

Relationships
• Edited
• Wrote
• Published
• Recommended
• Followed
• …
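
The node and relationship types listed above can be sketched as a tiny in-memory property graph. This is only an illustration in Python, not Medium's code or the Neo4j API: the Graph class, the node ids, the properties, and the uppercase relationship names are all assumptions made for the example.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph: labeled nodes, typed directed relationships."""

    def __init__(self):
        self.nodes = {}                     # node id -> (label, properties)
        self.out_edges = defaultdict(list)  # node id -> [(relationship type, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def relate(self, source, rel_type, target):
        # Both endpoints must already exist before they can be connected.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("unknown node")
        self.out_edges[source].append((rel_type, target))

    def neighbors(self, source, rel_type):
        return [t for (r, t) in self.out_edges[source] if r == rel_type]


# Hypothetical sample data using the node types from the slide.
g = Graph()
g.add_node("alice", "User", name="Alice")
g.add_node("bob", "User", name="Bob")
g.add_node("post-1", "Post", title="Hello graphs")
g.add_node("col-1", "Collection", title="Databases")

g.relate("alice", "WROTE", "post-1")
g.relate("bob", "RECOMMENDED", "post-1")
g.relate("bob", "FOLLOWED", "alice")
g.relate("alice", "EDITED", "col-1")
```

Here g.neighbors("bob", "RECOMMENDED") walks Bob's outgoing relationships of one type, which is the basic step the demo queries later in the deck build on.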

Use queues for the writes

Writes go to the master only

If you lose your master, you need to wait for a new election
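
One way to picture why a queue in front of the master helps: writes that arrive during an election simply wait in the queue and are applied once a master is back. The sketch below is a simplified Python model under that assumption. Cluster and WriteQueue are hypothetical names, and a real deployment would use an actual message queue and the Neo4j HA cluster rather than these stand-ins.

```python
from collections import deque


class Cluster:
    """Hypothetical stand-in for an HA cluster: writes fail while no master is elected."""

    def __init__(self):
        self.master_available = True
        self.applied = []

    def write(self, op):
        if not self.master_available:
            raise ConnectionError("no master elected yet")
        self.applied.append(op)


class WriteQueue:
    """Buffers writes and applies them to the master in arrival order."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.pending = deque()

    def enqueue(self, op):
        self.pending.append(op)

    def drain(self):
        """Apply queued writes in order; stop and keep the rest if the master is down."""
        applied = 0
        while self.pending:
            try:
                self.cluster.write(self.pending[0])
            except ConnectionError:
                break
            self.pending.popleft()
            applied += 1
        return applied


cluster = Cluster()
queue = WriteQueue(cluster)
queue.enqueue(("bob", "FOLLOWED", "alice"))
queue.drain()

cluster.master_available = False   # master lost, election in progress
queue.enqueue(("bob", "RECOMMENDED", "post-1"))
stalled = queue.drain()            # nothing is applied, the write stays queued

cluster.master_available = True    # new master elected
resumed = queue.drain()            # the queued write is applied now
```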

Productionising Neo4j

Capacity planning

Initial Data Import

Metrics / Monitoring
• Architecture
• Systems
• Neo4j
• Dataset
• Java
• Services that interact with Neo4j

Log aggregation / Indexing

• ElasticSearch
• Logstash
• Logstash forwarder
• Kibana

Backups

• Incremental Backup
• Full Backup

Runbook / Playbook

Getting optimal performance with Neo4j

Talk to support

What Neo4j is good at and not as good at

• Long traversal
• WHERE NOT
• Dense / super node

Cypher Tricks
http://watch.neo4j.org/video/84900121

Tune the configuration over time

• Java Garbage collection (stop the world)

• Neo4j settings

Cache Settings

Neo4j 2.0 and 2.1

Neo4j 2.2

Server Plugins & Unmanaged Extensions

• Easy to deploy
• The server's functionality can be extended by adding plugins
• RESTful Web Services (JAX-RS)
• Put more logic in the code, like caching
• Sharp tool
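
The caching point can be illustrated with a small time-based cache that avoids re-running an expensive traversal for every request. Real unmanaged extensions are written in Java against JAX-RS. The Python sketch below only shows the idea, and TTLCache, expensive_traversal, and the 60 second lifetime are assumptions made for the example.

```python
import time


class TTLCache:
    """Keeps computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so the example can control time
        self.entries = {}     # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]     # still fresh: skip the expensive call
        value = compute()
        self.entries[key] = (now + self.ttl, value)
        return value


calls = []


def expensive_traversal():
    # Stand-in for a costly graph query; records each real execution.
    calls.append(1)
    return ["post-1", "post-2"]


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get_or_compute("top-stories", expensive_traversal)
second = cache.get_or_compute("top-stories", expensive_traversal)  # served from cache
fake_now[0] = 61.0
third = cache.get_or_compute("top-stories", expensive_traversal)   # expired, recomputed
```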

Demo

Followers who recommended a story
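
The slide does not include the query itself. As a rough sketch of the logic, assuming it means "among the users who follow a given user, which ones recommended a given story", the same answer can be computed in plain Python as a set intersection. The edge list and user names below are made up.

```python
# Each tuple is (source, relationship, target); the data is made up for illustration.
EDGES = [
    ("bob", "FOLLOWED", "alice"),
    ("carol", "FOLLOWED", "alice"),
    ("dave", "FOLLOWED", "bob"),
    ("bob", "RECOMMENDED", "post-1"),
    ("dave", "RECOMMENDED", "post-1"),
    ("carol", "RECOMMENDED", "post-2"),
]


def followers_who_recommended(edges, user, post):
    """Users who follow `user` and also recommended `post`."""
    followers = {s for (s, r, t) in edges if r == "FOLLOWED" and t == user}
    recommenders = {s for (s, r, t) in edges if r == "RECOMMENDED" and t == post}
    return sorted(followers & recommenders)


result = followers_who_recommended(EDGES, "alice", "post-1")
```

In a graph database this is a two-hop pattern match rather than two scans over an edge list, which is where the traversal model pays off.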

Top Recommended stories
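
A minimal sketch of the ranking behind a "top recommended" list, assuming it is a simple count of Recommended relationships per story. The sample data and the tie-breaking rule (story id) are assumptions, not something stated in the talk.

```python
from collections import Counter

# (user, post) pairs, one per Recommended relationship; made-up sample data.
RECOMMENDATIONS = [
    ("bob", "post-1"),
    ("dave", "post-1"),
    ("carol", "post-2"),
    ("alice", "post-1"),
    ("bob", "post-2"),
    ("erin", "post-3"),
]


def top_recommended(recommendations, limit):
    """Return the `limit` most recommended posts with their counts."""
    counts = Counter(post for (_user, post) in recommendations)
    # Sort by count (descending), then by post id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


top = top_recommended(RECOMMENDATIONS, 2)
```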

People Recommended to follow
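
One common way to suggest people to follow is to look two hops out: users followed by the people you already follow, ranked by how many of your follows lead to them. The talk does not say which rule Medium uses, so the sketch below is an assumption, with made-up data.

```python
from collections import Counter

# user -> set of users they follow; made-up sample data.
FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob": {"dave", "erin"},
    "carol": {"dave", "alice"},
    "dave": set(),
    "erin": set(),
}


def suggest_follows(follows, user):
    """Rank users two hops away by the number of paths that reach them."""
    already = follows[user] | {user}
    scores = Counter()
    for followed in follows[user]:
        for candidate in follows.get(followed, set()):
            if candidate not in already:
                scores[candidate] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


suggestions = suggest_follows(FOLLOWS, "alice")
```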

Collaborative Filtering
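
A small sketch of user-based collaborative filtering over Recommended relationships: stories recommended by users whose tastes overlap with yours are scored by the size of that overlap. This is a textbook variant chosen for illustration, not necessarily the one shown in the demo, and the data is made up.

```python
from collections import Counter

# user -> set of posts they recommended; made-up sample data.
RECOMMENDED = {
    "alice": {"post-1", "post-2"},
    "bob": {"post-1", "post-2", "post-3"},
    "carol": {"post-2", "post-4"},
    "dave": {"post-5"},
}


def recommend_posts(recommended, user):
    """Score unseen posts by how much their recommenders overlap with `user`."""
    mine = recommended[user]
    scores = Counter()
    for other, theirs in recommended.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared taste, ignore this user
        for post in theirs - mine:
            scores[post] += overlap
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


picks = recommend_posts(RECOMMENDED, "alice")
```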

[email protected]

Questions? Feedback?
We are hiring