Apache Pig Introduction

Posted on 31-May-2015

Jackson Oliveira (@cyber_jso), Software Engineer

APACHE PIG

A High Level Analysis Platform

Which can be plugged into Hadoop

How does it work?

What is the point of using Pig?!

MapReduce (MR) is not difficult in theory...

But the reality can be different...

We want it to be easy to understand

Users = LOAD 'users' USING PigStorage('\t') AS (name, age);

Filtered = FILTER Users BY age >= 18 AND age <= 25;

Pages = LOAD 'pages' AS (user, url);

Joined = JOIN Filtered BY name, Pages BY user;

Grouped = GROUP Joined BY url;

Summed = FOREACH Grouped GENERATE group, COUNT(Joined) AS clicks;

Sorted = ORDER Summed BY clicks DESC;
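
To make the data flow concrete, here is a minimal sketch in plain Python of what the script above computes. It is not how Pig executes the script (Pig compiles it into jobs on the cluster). The sample rows and the function name `top_urls` are made up for illustration.

```python
from collections import Counter

# Made-up sample rows standing in for the 'users' and 'pages' inputs.
users = [("alice", 22), ("bob", 17), ("carol", 25), ("dave", 31)]
pages = [
    ("alice", "example.com/a"),
    ("alice", "example.com/b"),
    ("bob", "example.com/a"),
    ("carol", "example.com/a"),
    ("dave", "example.com/b"),
]


def top_urls(users, pages):
    # FILTER Users BY age >= 18 AND age <= 25
    filtered = [(name, age) for name, age in users if 18 <= age <= 25]
    # JOIN Filtered BY name, Pages BY user (inner join on equal keys)
    joined = [
        (name, age, user, url)
        for name, age in filtered
        for user, url in pages
        if name == user
    ]
    # GROUP Joined BY url, then COUNT(Joined) AS clicks for each group
    clicks = Counter(url for _, _, _, url in joined)
    # ORDER Summed BY clicks DESC
    return sorted(clicks.items(), key=lambda pair: pair[1], reverse=True)


result = top_urls(users, pages)
```

Each Python step maps to one line of the Pig script, which is the point of the slide: the script reads as a sequence of named transformations.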

Also easy to extend (UDFs)...
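
As an illustration, a UDF can be a small function in a script file. The sketch below is written in Python and assumes Pig's Jython script engine, which is expected to supply the `outputSchema` decorator. The file name `udfs.py` and the function `domain_of` are hypothetical, and the fallback decorator exists only so the file also runs outside Pig.

```python
# udfs.py: a hypothetical UDF module; names here are illustrative only.
try:
    # Assumption: Pig's Jython engine defines outputSchema before running this file.
    outputSchema
except NameError:
    # Fallback no-op decorator so the module can be run and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("domain:chararray")
def domain_of(url):
    """Return the lower-cased host part of a URL-like string, or None if empty."""
    if not url:
        return None
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.split("/", 1)[0].lower()
```

In a Pig script, such a file would typically be loaded with something like `REGISTER 'udfs.py' USING jython AS myfuncs;` and then called as `myfuncs.domain_of(url)` inside a `FOREACH ... GENERATE`.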

It takes care of the execution plan for you

When to use Apache Pig?

If you want things done faster

An active community

You might need to rethink complicated things
