LinkedIn Graph Presentation
DESCRIPTION
Chris Conrad (Senior Engineering Manager) and Igor Perisic (Senior Director of Engineering) from LinkedIn gave this talk at UC Santa Barbara in 2012.
TRANSCRIPT
![Page 1: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/1.jpg)
The Evolution of the Professional Graph at LinkedIn
Chris Conrad, Senior Engineering Manager, Social Graph
Igor Perisic, Sr. Director of Engineering, SNA
![Page 2: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/2.jpg)
LinkedIn
• The site officially launched on May 5, 2003. At the end of the first month in operation, LinkedIn had a total of 4,500 members in the network.
• As of January 9, 2013, LinkedIn operates the world’s largest professional network on the Internet with more than 200 million members in over 200 countries and territories.
• As of September 30, 2012, LinkedIn counts executives from all 2012 Fortune 500 companies as members; its corporate talent solutions are used by 85 of the Fortune 100 companies.
• As of the school year ending May 2012, there are over 20 million students and recent college graduates on LinkedIn. They are LinkedIn's fastest-growing demographic.
![Page 3: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/3.jpg)
In the beginning…
![Page 4: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/4.jpg)
The Cloud
• Cloud is the original name of our graph engine
• Responsible for read scaling graph queries (and it used to do search, too)
• Stored 4 primary sets of data:
[Diagram labels: Cloud; Member Data; Group Membership; Network Cache; Connections]
![Page 5: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/5.jpg)
What was wrong?
• Large memory footprint
– Network cache used simple but inefficient data structures
– The size and density of the graph were increasing
• Garbage Collector woes
– Large JVM heap caused long GC pauses
– Long GC pauses reduced availability, resulting in site outages
![Page 6: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/6.jpg)
C++ Graph
• First project: migrate the network cache to a new data structure to reduce memory usage
• Second project: implement a C++ JNI library to move the graph data off heap
• Result: Drastic reduction in JVM heap utilization
[Diagram labels: Cloud; Java Heap; libGraphJNI.so; Member Data; Group Membership; Network Cache; Connections]
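The memory argument on this slide can be illustrated with a small sketch. The slides do not say which data structure replaced the original network cache, so the compressed adjacency layout below (two flat integer arrays) is an assumption, and it is written in Python rather than the C++/JNI the talk describes. The names `build_compact_graph` and `connections_of` are hypothetical.

```python
from array import array


def build_compact_graph(edges, num_members):
    """Pack an undirected connection list into two flat integer arrays.

    Assumption: member ids are dense integers in range(num_members).
    """
    degree = [0] * num_members
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    # offsets[m] .. offsets[m + 1] delimits member m's slice of `neighbors`.
    offsets = array("q", [0] * (num_members + 1))
    for m in range(num_members):
        offsets[m + 1] = offsets[m] + degree[m]

    neighbors = array("i", [0] * offsets[num_members])
    cursor = list(offsets[:num_members])  # next free slot for each member
    for a, b in edges:
        neighbors[cursor[a]] = b
        cursor[a] += 1
        neighbors[cursor[b]] = a
        cursor[b] += 1
    return offsets, neighbors


def connections_of(offsets, neighbors, member):
    """Return the slice of the flat array holding this member's connections."""
    return neighbors[offsets[member]:offsets[member + 1]]


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
offsets, neighbors = build_compact_graph(edges, num_members=4)
print(list(connections_of(offsets, neighbors, 2)))  # [0, 1, 3]
```

Storing every connection as a fixed-width integer in one contiguous array avoids a separate object per member and per edge, which is the kind of saving a "simple but inefficient" per-object structure gives up.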
![Page 7: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/7.jpg)
Several million users later
![Page 8: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/8.jpg)
New Problems
• Growth
– The size and density of the graph were increasing
– We were running out of memory
– We were running out of CPU cycles
– Proliferation of services increased the overhead of maintaining the client-side software load balancer
– As of September 30, 2012, LinkedIn has 3,177 full-time employees located around the world. LinkedIn started off 2012 with about 2,100 full-time employees worldwide, up from around 1,000 at the beginning of 2011 and about 500 at the beginning of 2010.
• C++ code had a much higher maintenance cost
– Coredumps are much less friendly than a NullPointerException
– LinkedIn didn’t have the expertise or infrastructure to support C++ development
![Page 9: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/9.jpg)
Split cloud
• cloud-session: Move the load balancing logic into a service we control
• rgraph: Extract the C++ graph into its own service
[Diagram labels: Cloud; Java Heap; Member Data; Group Membership; Network Cache; rgraph; libGraphJNI.so; Connections; cloud-session]
![Page 10: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/10.jpg)
New problems, same as the old
• rgraph instances still had a large memory footprint
– The density of the graph was increasing
– We were running out of memory
– We were running out of CPU cycles
• cloud-session’s software load balancer implementation was essentially a single point of failure
![Page 11: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/11.jpg)
Distribute the Graph
• Introduce Norbert, a new cluster management system
• Partition the graph data
• Partition the network cache service
[Diagram labels: Cloud; Java Heap; Member Data; cloud-session; dgraph; Connections; Group Membership; Network Cache Service]
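A minimal sketch of what partitioning the graph data could look like. The slides do not state the partitioning scheme or how Norbert routes requests, so the modulo-on-member-id rule, the partition count, and all function names below are assumptions made for illustration only.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(member_id, num_partitions=NUM_PARTITIONS):
    """Assumed routing rule: a member's data lives on member_id mod N."""
    return member_id % num_partitions


def partition_connections(connections, num_partitions=NUM_PARTITIONS):
    """Split an undirected connection list into per-partition adjacency maps."""
    partitions = [defaultdict(set) for _ in range(num_partitions)]
    for a, b in connections:
        # Each endpoint's adjacency set is stored on that endpoint's partition.
        partitions[partition_for(a, num_partitions)][a].add(b)
        partitions[partition_for(b, num_partitions)][b].add(a)
    return partitions


def get_connections(partitions, member_id):
    """Route a lookup to the single partition that owns the member."""
    shard = partitions[partition_for(member_id, len(partitions))]
    return shard.get(member_id, set())


connections = [(1, 2), (1, 5), (2, 6), (5, 6)]
partitions = partition_connections(connections)
print(sorted(get_connections(partitions, 1)))  # [2, 5]
print(sorted(partitions[1]))                   # members owned by partition 1: [1, 5]
```

With this layout a single member's connection list needs one partition, while a query that fans out over many members (for example a second-degree network) has to touch several partitions and merge the results.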
![Page 12: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/12.jpg)
Mission Accomplished
![Page 13: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/13.jpg)
So now what?
![Page 14: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/14.jpg)
My Connections
![Page 15: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/15.jpg)
Common Connections
![Page 16: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/16.jpg)
My Network
![Page 17: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/17.jpg)
How am I connected?
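The "Common Connections" and "How am I connected?" slides show product features without describing the queries behind them. As a rough sketch under that caveat, common connections can be read as a set intersection and a connection path as a breadth-first search. The function names, the toy graph, and the three-degree search limit are assumptions, not details from the talk.

```python
from collections import deque


def common_connections(graph, a, b):
    """Members connected to both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


def connection_path(graph, source, target, max_degree=3):
    """Shortest chain of connections from source to target, or None.

    max_degree caps the number of hops explored (an assumed limit).
    """
    if source == target:
        return [source]
    parent = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        member, depth = frontier.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the hop limit
        for neighbor in sorted(graph.get(member, ())):
            if neighbor in parent:
                continue
            parent[neighbor] = member
            if neighbor == target:
                # Walk the parent links back to the source.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((neighbor, depth + 1))
    return None


graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "cy", "dee"},
    "cy": {"ana", "bo"},
    "dee": {"bo", "eli"},
    "eli": {"dee"},
}
print(common_connections(graph, "ana", "bo"))  # {'cy'}
print(connection_path(graph, "ana", "eli"))    # ['ana', 'bo', 'dee', 'eli']
```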
![Page 18: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/18.jpg)
What is the professional graph?
• LinkedIn connections
• Current and past co-workers
• University colleagues and alumni
• Group members
• And what about geography, industry and skill overlap?
![Page 19: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/19.jpg)
New requirements
• Members aren’t the only type of node in the professional graph
• LinkedIn connections aren’t the only type of edge in the professional graph
• We already supported groups and group membership
![Page 20: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/20.jpg)
Making changes was hard
• Code was rigid
– Data was stored using class hierarchies; introducing new data types was prohibitively slow
– Queries were built by combining object instances
• BDBJE
• Everything was back in the heap
– Garbage collection time was starting to go up
– GC pauses no longer caused outages, but flapping introduced high developer and operational overhead
![Page 21: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/21.jpg)
Graph as a Service
• Custom persistence engine
– Log structured
– Memory-mapped files keep data out of the Java heap
– Data described using a DDL-like schema
• Custom SQL-like query language
– Query language understands DDL
– Text-based language reduces code changes
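To make the "log structured" and "memory mapped" bullets concrete, here is a minimal sketch of an append-only edge log backed by a memory-mapped file. It uses Python's `mmap` and `struct` modules in place of the JVM equivalent. The slides give no on-disk format, so the record layout, the header, the fixed segment size, and the `EdgeLog` class are all hypothetical.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<qqq")  # hypothetical record: source id, destination id, created-at
HEADER = struct.Struct("<q")    # byte offset of the end of the written log


class EdgeLog:
    """Append-only log of edge records stored in a memory-mapped file."""

    def __init__(self, path, capacity_bytes=1 << 16):
        new_file = not os.path.exists(path)
        self._file = open(path, "w+b" if new_file else "r+b")
        if new_file:
            self._file.truncate(capacity_bytes)  # preallocate one fixed-size segment
        self._map = mmap.mmap(self._file.fileno(), 0)
        if new_file:
            HEADER.pack_into(self._map, 0, HEADER.size)  # log starts right after the header

    def append(self, source, destination, created_at):
        (end,) = HEADER.unpack_from(self._map, 0)
        if end + RECORD.size > len(self._map):
            raise IOError("log segment is full")
        # Records are only ever written at the tail, never updated in place.
        RECORD.pack_into(self._map, end, source, destination, created_at)
        HEADER.pack_into(self._map, 0, end + RECORD.size)

    def scan(self):
        (end,) = HEADER.unpack_from(self._map, 0)
        for offset in range(HEADER.size, end, RECORD.size):
            yield RECORD.unpack_from(self._map, offset)

    def close(self):
        self._map.flush()
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "edges.log")
log = EdgeLog(path)
log.append(1, 2, 1000)
log.append(1, 3, 2000)
log.close()

reopened = EdgeLog(path)
records = list(reopened.scan())
reopened.close()
print(records)  # [(1, 2, 1000), (1, 3, 2000)]
```

The mapped bytes are managed by the operating system's page cache rather than by the language runtime's heap, which is the property the slide highlights for avoiding garbage collection pressure.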
![Page 22: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/22.jpg)
Graph Queries
• Company(:id)[CompanyFollowers]
• Member(:id)[MemberToMember{CreatedAt > :t}]
• Member(:id)[topN(MemberToMember, Score, 10)]
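The slide lists these queries without defining their semantics. The sketch below shows one plausible reading: `NodeType(:id)[EdgeType]` selects edges of that type leaving the node, `{...}` filters on edge attributes, and `topN` keeps the highest-scoring edges. That reading, the `Edge` class, and the helper functions are assumptions and not LinkedIn's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    edge_type: str
    source: int
    destination: int
    attributes: dict = field(default_factory=dict)


def edges_from(edges, edge_type, source, predicate=None):
    """Assumed reading of NodeType(:id)[EdgeType{predicate}]."""
    return [
        e for e in edges
        if e.edge_type == edge_type
        and e.source == source
        and (predicate is None or predicate(e.attributes))
    ]


def top_n(edges, edge_type, source, attribute, n):
    """Assumed reading of NodeType(:id)[topN(EdgeType, attribute, n)]."""
    matches = edges_from(edges, edge_type, source)
    return sorted(matches, key=lambda e: e.attributes[attribute], reverse=True)[:n]


edges = [
    Edge("MemberToMember", 1, 2, {"CreatedAt": 100, "Score": 0.9}),
    Edge("MemberToMember", 1, 3, {"CreatedAt": 200, "Score": 0.4}),
    Edge("MemberToMember", 1, 4, {"CreatedAt": 300, "Score": 0.7}),
    Edge("CompanyFollowers", 50, 1),
    Edge("CompanyFollowers", 50, 3),
]

# Company(:id)[CompanyFollowers] with :id = 50
followers = [e.destination for e in edges_from(edges, "CompanyFollowers", 50)]

# Member(:id)[MemberToMember{CreatedAt > :t}] with :id = 1, :t = 150
recent = [e.destination for e in edges_from(
    edges, "MemberToMember", 1, lambda attrs: attrs["CreatedAt"] > 150)]

# Member(:id)[topN(MemberToMember, Score, 2)] with :id = 1
strongest = [e.destination for e in top_n(edges, "MemberToMember", 1, "Score", 2)]

print(followers, recent, strongest)  # [1, 3] [3, 4] [2, 4]
```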
![Page 23: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/23.jpg)
What do we have in common?
![Page 24: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/24.jpg)
How am I connected?
![Page 25: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/25.jpg)
What’s next?
• Online schema migration
• Automated repartitioning and data migration
• Automated provisioning
• Hierarchical data partitioning
• Monitoring and statistics
• Query optimization
• Query fragment caching
• Result set caching
• Query parallelization
• Very large data set handling
• …
![Page 26: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/26.jpg)
And we’re still growing
[Chart: LinkedIn Members (Millions), 2004 to 2011; values shown: 2, 4, 8, 17, 32, 55, 90]
• 200M+
• 25th most visited website worldwide (comScore 6-12)
• Company pages: >2.6M
• 63% non U.S.
• 2/sec
• 85% of Fortune 100 companies use LinkedIn to hire
![Page 28: LinkedIn Graph Presentation](https://reader033.vdocuments.us/reader033/viewer/2022060108/55502097b4c905af648b52de/html5/thumbnails/28.jpg)
Q&A