Data infrastructure and Hadoop at LinkedIn

Post on 02-Nov-2014

TRANSCRIPT

  • 1. Big data and Hadoop. September 2012. Hari Shankar Menon, Software engineer, LinkedIn
  • 2. About me: LinkedIn Engineering, Data warehouse team. Previously, Software engineer @Clickable; worked on building the reporting and analytics platform on Hadoop and HBase. Hadoop and open-source enthusiast.
  • 3. Agenda: About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  • 4. Our mission: Connect the world's professionals to make them more productive and successful
  • 5. LinkedIn by numbers: 175M+ members; ~2/sec new members joining; >2M Company Pages; 85% of Fortune 100 companies use LinkedIn to hire**; ~4.2B professional searches in 2011. [Chart: LinkedIn Members (Millions), 2004 to 2010: 2, 4, 8, 17, 32, 55, 90] *as of Nov 4, 2011 **as of June 30, 2011
  • 6. About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  • 7. What is big data? (*Chart from Philip Russom, Research Director, TDWI)
  • 8. Infrastructure technologies: Search technologies (SenseiDB, Zoie, Bobo); Primary data store (Front-end); Document-oriented store; Distributed key-value store; Distributed PubSub messaging; Database change replication
  • 9. Open source: http://data.linkedin.com/opensource
  • 10. About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  • 11. What is Hadoop; Evolution of Hadoop; Impact
  • 12. Hadoop@LinkedIn. Recommendation systems: Generating recommendations; Modeling; A/B Testing; Grandfathering. Data warehouse/ETL: Raw data storage; Aggregations; Heavy lifting. Data sciences: Strategic analyses; Experimentation sandbox
  • 13. The Recommendations opportunity. [Diagram labels: Relevance/Latency; Offline computation; Caching; Pandora Search for People; Events You May Be Interested In; Groups browse maps]
  • 14. Improving recommendations: Mathematical modeling; A/B Testing; Grandfathering
  • 15. Hadoop in the Data warehouse: Longer retention; Source of truth; Complex transformations; Lower retention; Ad-hoc analysis; Algorithmic computations
  • 16. Hadoop in Data Sciences: Deep dives; Sandbox; Hackday projects
  • 17. Data Insights - 1: Job migration after financial collapse

  • 18. Data Insights - 2

  • 19. Data Insights - 3
  • 20. About LinkedIn; Data Infrastructure overview; Hadoop@LinkedIn; Challenges
  • 21. Challenges: 1. User adoption of new technologies 2. Real-time processing 3. Graph/Network algorithms 4. Making data accessible
  • 22. User adoption
  • 23. Real-time processing. Challenges: Random reads/writes; Warm-up time. Solutions: Parts of the problem that can be moved offline?; HBase, Voldemort
  • 24. Map-reduce-incompatible problems: Graph problems; Traditional joins
  • 25. Making data accessible: Hadoop; Tons of data
  • 26. Finally! No silver bullet. Hadoop: Offline processing; Scalability by design
  • 27. www.linkedin.com/in/harisreekumar www.linkedin.com/company/linkedin/careers
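
Note on slide 24 (traditional joins): the slides do not show code, so the following is only a minimal sketch, in plain Python, of a generic reduce-side join. It may help illustrate why a relational-style join is awkward in map-reduce: both inputs must be tagged, shuffled by the join key, and re-paired in the reducer. The sample data and all function names (map_phase, shuffle, reduce_phase, reduce_side_join) are hypothetical and are not taken from the talk or from any LinkedIn system.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample inputs (not from the slides).
members = [(1, "Ada"), (2, "Grace")]
views = [(1, "page_a"), (1, "page_b"), (3, "page_c")]


def map_phase(members, views):
    """Emit (join_key, (source_tag, value)) pairs from both inputs."""
    for member_id, name in members:
        yield member_id, ("M", name)
    for member_id, page in views:
        yield member_id, ("V", page)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Inner join: pair each member record with each view record for one key."""
    names = [v for tag, v in values if tag == "M"]
    pages = [v for tag, v in values if tag == "V"]
    return [(key, name, page) for name, page in product(names, pages)]


def reduce_side_join(members, views):
    """Run the map, shuffle, and reduce steps in-process and collect the rows."""
    groups = shuffle(map_phase(members, views))
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(key, groups[key]))
    return joined


if __name__ == "__main__":
    for row in reduce_side_join(members, views):
        print(row)
```

In a real cluster the shuffle moves every record of both inputs across the network, which is one reason joins are commonly listed among the poorer fits for map-reduce.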