Pig at LinkedIn

Post on 20-Jan-2015

DESCRIPTION

Pig at LinkedIn, by Chris Riccomini of LinkedIn. Pig is an integral part of data analytics at LinkedIn. Learn about LinkedIn’s analytic stack, and see how Pig is used to design, develop, and deliver data products at LinkedIn. We’ll explore a successful example of Pig deployment at LinkedIn, pain points, and integration with Azkaban, Voldemort, Hadoop, and the rest of LinkedIn’s ecosystem.

TRANSCRIPT

Pig at LinkedIn

Chris Riccomini

9/29/10

Who?

What?

LinkedIn Analytics

Pig at LinkedIn

Why?

Production Quality

Streaming

Serialization

VoldemortStorage ~ Avro

views = LOAD '/data/awesome' USING VoldemortStorage();

Voldemort ♥ Pig

Partitioning

YYYY/MM/DD

Last N days?

views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1');
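The slides do not say how 'date.range' is resolved. As a rough illustration only, the Python sketch below expands a window into YYYY/MM/DD partition paths. It assumes that days.ago marks the end of the window (counted back from today) and that num.days is the number of consecutive days ending there. The function name date_range_paths and its parameters are hypothetical, not part of VoldemortStorage.

```python
from datetime import date, timedelta


def date_range_paths(base, num_days, days_ago, today):
    """Return one base/YYYY/MM/DD path per day in the window, oldest first.

    Assumed semantics (not confirmed by the slides): the window ends
    `days_ago` days before `today` and spans `num_days` consecutive days.
    """
    end = today - timedelta(days=days_ago)
    days = [end - timedelta(days=offset) for offset in range(num_days)]
    return [f"{base}/{d:%Y/%m/%d}" for d in reversed(days)]


# Hypothetical usage: a 3-day window ending yesterday, relative to the talk date.
paths = date_range_paths(
    "/data/etl/tracking/extracted/profile-view", 3, 1, date(2010, 9, 29)
)
```

Under these assumptions, num.days=90;days.ago=1 would select the 90 daily directories ending yesterday, which keeps a partially written directory for today out of the input.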

Some-file-YYYY-MM-DD

member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
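How #LATEST is resolved is not shown in the slides. One plausible reading is that it selects the entry with the most recent date suffix among files named Some-file-YYYY-MM-DD. The Python sketch below shows that reading only. The function resolve_latest and the file names in the usage example are hypothetical.

```python
import re
from datetime import datetime

# Matches a trailing -YYYY-MM-DD suffix, as in Some-file-YYYY-MM-DD.
DATED_NAME = re.compile(r"-(\d{4}-\d{2}-\d{2})$")


def resolve_latest(names):
    """Return the name carrying the most recent date suffix.

    Assumption: #LATEST means "newest dated entry"; undated names are ignored.
    """
    dated = []
    for name in names:
        match = DATED_NAME.search(name)
        if match:
            day = datetime.strptime(match.group(1), "%Y-%m-%d").date()
            dated.append((day, name))
    if not dated:
        raise ValueError("no dated entries found")
    return max(dated)[1]


# Hypothetical directory listing.
latest = resolve_latest([
    "member_position-2010-09-27",
    "member_position-2010-09-29",
    "member_position-2010-09-28",
    "README",
])
```

Parsing the suffix into a date, instead of sorting the raw strings, also rejects malformed suffixes such as 2010-13-40 with an error.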

Scheduling

Azkaban

type=pig

pig.script=myscript.pig

Ad hoc?

Future at LinkedIn

Wishes

Dates

Fix Data Types

JSON

Cross Platform

Questions?

• criccomini@linkedin.com
• http://www.riccomini.name
• http://www.sna-projects.com
• http://www.project-voldemort.com
• @criccomini
• LinkedIn is Hiring! Email me!
