Data Infrastructure and Hadoop at LinkedIn

Post on 02-Nov-2014

Big data and Hadoop

September 2012

Hari Shankar Menon

Software engineer

LinkedIn

About me

• LinkedIn Engineering, Data warehouse team
• Previously, Software engineer @Clickable: worked on building the reporting and analytics platform on Hadoop and HBase
• Hadoop and open-source enthusiast

Agenda

• About LinkedIn
• Data Infrastructure overview
• Hadoop@LinkedIn
• Challenges

Our mission

Connect the world’s professionals to make them more productive and successful

LinkedIn by numbers

LinkedIn Members (Millions)
2004: 2
2005: 4
2006: 8
2007: 17
2008: 32
2009: 55
2010: 90

• LinkedIn members: 175M+
• Fortune 100 companies that use LinkedIn to hire: 85%
• Company Pages: >2M**
• New members joining: ~2/sec
• Professional searches in 2011: ~4.2B

* as of Nov 4, 2011
** as of June 30, 2011

What is big data?

* Chart from Philip Russom, Research Director, TDWI

Infrastructure technologies

• Primary data store (front-end)
• Distributed key-value store
• Document-oriented store
• Distributed PubSub messaging
• Database change replication: Databus
• Search technologies: SenseiDB, Zoie, Bobo

Open source

http://data.linkedin.com/opensource

• What is Hadoop
• Evolution of Hadoop
• Impact

Hadoop@LinkedIn

• Recommendation systems
  – Generating recommendations
  – Modeling
  – A/B Testing
  – Grandfathering
• Data warehouse/ETL
  – Raw data storage
  – Aggregations
  – Heavy lifting
• Data sciences
  – Strategic analyses
  – Experimentation sandbox

• Pandora Search for People
• Events You May Be Interested In
• Groups browse maps

The Recommendations opportunity

• Relevance/Latency

• Offline computation

• Caching
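The three bullets above can be read as a pipeline: compute recommendations offline in batch, then serve them from a cache so that request latency stays low. The sketch below is a minimal illustration under that reading, not LinkedIn's implementation. The co-view counting, the function `compute_recommendations_offline`, and the class `RecommendationCache` are all hypothetical names invented for this example, and a plain dictionary stands in for whatever store would hold the precomputed results.

```python
from collections import Counter, defaultdict


def compute_recommendations_offline(views, top_n=2):
    """Batch step (hypothetical): for each item, rank the other items
    that were viewed by the same members."""
    items_by_member = defaultdict(set)
    for member, item in views:
        items_by_member[member].add(item)

    co_counts = defaultdict(Counter)
    for items in items_by_member.values():
        for a in items:
            for b in items:
                if a != b:
                    co_counts[a][b] += 1

    # Sort by count (descending), then by name, so the output is deterministic.
    return {
        item: [other for other, _ in
               sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, counts in co_counts.items()
    }


class RecommendationCache:
    """Serving step (hypothetical): an in-memory stand-in for a cache lookup."""

    def __init__(self, precomputed):
        self._store = dict(precomputed)

    def get(self, item):
        # A lookup does no computation at request time.
        return self._store.get(item, [])


views = [
    ("m1", "jobA"), ("m1", "jobB"),
    ("m2", "jobA"), ("m2", "jobB"), ("m2", "jobC"),
    ("m3", "jobA"), ("m3", "jobC"),
]
cache = RecommendationCache(compute_recommendations_offline(views))
print(cache.get("jobB"))
```

The expensive pairwise counting happens once, offline; the online path is a single dictionary lookup, which is the relevance/latency trade-off the slide points at.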

Improving recommendations

• Mathematical modeling

• A/B Testing

• Grandfathering
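As an illustration of the A/B testing bullet only, here is a minimal sketch of one common way to split members between two variants: hash the member and experiment identifiers into a stable bucket. The slides do not say how LinkedIn assigns variants, so the function `assign_variant`, its parameters, and the 100-bucket scheme are assumptions made for this example.

```python
import hashlib


def assign_variant(member_id, experiment, treatment_percent):
    """Deterministically place a member in 'treatment' or 'control'.

    Hypothetical helper: the same (experiment, member_id) pair always
    lands in the same bucket, so a member sees a consistent variant.
    """
    key = f"{experiment}:{member_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_percent else "control"


# Example: expose a new recommendation model to roughly 10% of members.
for member_id in (101, 102, 103):
    print(member_id, assign_variant(member_id, "new-model", 10))
```

Including the experiment name in the hash keeps assignments independent across experiments, so a member in the treatment group of one test is not automatically in the treatment group of another.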

Hadoop in the Data warehouse

• Source of truth
• Lower retention
• Ad-hoc analysis

• Longer retention
• Complex transformations
• Algorithmic computations

Hadoop in Data Sciences

• Deep dives

• Sandbox

• Hackday projects

Data Insights - 1

Job migration after financial collapse

Data Insights - 2

Data Insights - 3

Challenges

1. User adoption of new technologies
2. Real-time processing
3. Graph/Network algorithms
4. Making data accessible

User adoption

Real-time processing

• Challenges
  – Random reads/writes
  – Warm-up time
• Solutions
  – Parts of the problem that can be moved offline?
  – HBase, Voldemort

Map-reduce-incompatible problems

• Graph problems
• Traditional joins
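To make the point about traditional joins concrete, the sketch below simulates a reduce-side join in plain Python, which is one common way to express a join in the map-reduce model. It does not use the Hadoop API; `map_phase`, `shuffle`, and `reduce_phase` are hypothetical stand-ins for the framework's stages, and the sample data is invented.

```python
from collections import defaultdict


def map_phase(members, views):
    """Tag each record with its source table and emit it under the join key."""
    for member_id, name in members:
        yield member_id, ("member", name)
    for member_id, page in views:
        yield member_id, ("view", page)


def shuffle(pairs):
    """Group all values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every member record with every view record."""
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "member"]
        pages = [value for tag, value in groups[key] if tag == "view"]
        for name in names:
            for page in pages:
                yield key, name, page


members = [(1, "Ana"), (2, "Raj")]
views = [(1, "/jobs"), (2, "/groups"), (1, "/events")]
joined = list(reduce_phase(shuffle(map_phase(members, views))))
print(joined)
```

Every record from both inputs has to be tagged, shuffled by key, and regrouped before any output is produced, which is why a join that is a single clause in SQL becomes a comparatively heavy job in this model.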

Making data accessible

• Hadoop Tons of data

Finally!

No Silver bullet

Hadoop Offline processing

Scalability by design

www.linkedin.com/in/harisreekumar

www.linkedin.com/company/linkedin/careers
