Pig at LinkedIn
DESCRIPTION
Pig at LinkedIn by Chris Riccomini from LinkedIn
Pig is an integral part of data analytics at LinkedIn. Learn about LinkedIn’s analytic stack, and see how Pig is used to design, develop, and deliver data products at LinkedIn. We’ll explore a successful example of Pig deployment at LinkedIn, pain points, and integration with Azkaban, Voldemort, Hadoop, and the rest of LinkedIn’s ecosystem.
TRANSCRIPT
Pig at LinkedIn
Chris Riccomini
9/29/10
Who?
What?
LinkedIn Analytics
Pig at LinkedIn
Why?
Production Quality
Streaming
Serialization
VoldemortStorage ~ Avro
views = LOAD '/data/awesome' USING VoldemortStorage();
Voldemort ♥ Pig
Partitioning
YYYY/MM/DD
Last N days?
views = LOAD '/data/etl/tracking/extracted/profile-view' USING VoldemortStorage('date.range', 'num.days=90;days.ago=1');
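The slide does not say how `date.range` is resolved, so the following is only a sketch of one plausible reading: `days.ago` picks the last day of the window, counted back from today, and `num.days` is how many daily `YYYY/MM/DD` partitions the window covers, inclusive. The function name `partition_paths` and its parameters are hypothetical and are not part of VoldemortStorage.

```python
from datetime import date, timedelta


def partition_paths(base, today, num_days, days_ago):
    """Return daily partition paths under `base`, oldest first.

    Assumed semantics (not confirmed by the slides): the window ends
    `days_ago` days before `today` and spans `num_days` days inclusive.
    """
    end = today - timedelta(days=days_ago)
    paths = []
    for offset in range(num_days - 1, -1, -1):
        day = end - timedelta(days=offset)
        # Partitions are laid out as <base>/YYYY/MM/DD.
        paths.append(f"{base}/{day:%Y/%m/%d}")
    return paths


if __name__ == "__main__":
    for path in partition_paths(
        "/data/etl/tracking/extracted/profile-view",
        today=date(2010, 9, 29),
        num_days=3,
        days_ago=1,
    ):
        print(path)
```

Under this reading, `num.days=90;days.ago=1` would cover the 90 daily directories ending yesterday, so a script always sees complete days and never a partially written current day.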
Some-file-YYYY-MM-DD
member_position = LOAD '/data/etl/replicated/member/member_position/#LATEST' USING VoldemortStorage();
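How `#LATEST` is resolved is not shown on the slides. A minimal sketch, assuming it simply selects the entry whose `YYYY-MM-DD` suffix is the most recent, might look like this. The name `resolve_latest` and the sample file names are hypothetical.

```python
import re
from datetime import datetime

# Matches a trailing -YYYY-MM-DD suffix, as in "Some-file-YYYY-MM-DD".
DATED_NAME = re.compile(r"-(\d{4}-\d{2}-\d{2})$")


def resolve_latest(names):
    """Return the name with the most recent date suffix.

    Names without a date suffix are ignored. Raises ValueError when no
    dated name is present.
    """
    dated = []
    for name in names:
        match = DATED_NAME.search(name)
        if match:
            day = datetime.strptime(match.group(1), "%Y-%m-%d").date()
            dated.append((day, name))
    if not dated:
        raise ValueError("no dated entries found")
    # Tuples compare by date first, so max() picks the newest entry.
    return max(dated)[1]


if __name__ == "__main__":
    listing = ["some-file-2010-09-27", "some-file-2010-09-28", "README"]
    print(resolve_latest(listing))
```

Parsing the suffix into a real date, instead of comparing strings, also rejects malformed suffixes such as a month of 13.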
Scheduling
Azkaban
type=pig
pig.script=myscript.pig
Ad hoc?
Future at LinkedIn
Wishes
Dates
Fix Data Types
JSON
Cross Platform
Questions?
• [email protected]
• http://www.riccomini.name
• http://www.sna-projects.com
• http://www.project-voldemort.com
• @criccomini
• LinkedIn is Hiring! Email me!