Apache PIG introduction

Jackson Oliveira @cyber_jso Software Engineer APACHE PIG

Upload: jackson-dos-santos-olveira

Posted on 31-May-2015


TRANSCRIPT

Page 1: Apache PIG introduction

Jackson Oliveira, @cyber_jso, Software Engineer

APACHE PIG

Page 2: Apache PIG introduction

A High Level Analysis Platform

Page 3: Apache PIG introduction

Which can be plugged on Hadoop

Page 4: Apache PIG introduction
Page 5: Apache PIG introduction

How does it work?

Page 6: Apache PIG introduction

How does it work?

Page 7: Apache PIG introduction

What is the point of using PIG?

Page 8: Apache PIG introduction

MapReduce (MR) is not difficult in theory...
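To make the "in theory" part concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is plain Python rather than Hadoop code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in order on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["pig on hadoop", "Pig latin"]))
```

The model itself fits in a few functions. The sketch leaves out everything a real cluster job also has to handle, such as job configuration, serialization, and chaining several jobs together.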

Page 9: Apache PIG introduction

But the reality can be different...

Page 10: Apache PIG introduction

We want it to be easy to understand

Users = LOAD 'users' USING PigStorage('\t') AS (name, age);

Filtered = FILTER Users BY age >= 18 AND age <= 25;

Pages = LOAD 'pages' AS (user, url);

Joined = JOIN Filtered BY name, Pages BY user;

Grouped = GROUP Joined BY url;

Summed = FOREACH Grouped GENERATE group, COUNT(Joined) AS clicks;

Sorted = ORDER Summed BY clicks DESC;
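To show what the script computes, the following sketch mirrors each statement in plain Python on tiny in-memory data. It only illustrates the data flow and is not how Pig executes the script. The function name `top_urls` and the sample records are hypothetical.

```python
from collections import Counter


def top_urls(users, pages):
    """Count page visits by users aged 18 to 25, most visited URL first."""
    # FILTER Users BY age >= 18 AND age <= 25
    # A Counter keeps duplicates, so repeated names behave like join rows.
    matches = Counter(name for name, age in users if 18 <= age <= 25)

    # JOIN Filtered BY name, Pages BY user; GROUP BY url; COUNT(Joined)
    clicks = Counter()
    for user, url in pages:
        if matches[user]:
            clicks[url] += matches[user]

    # ORDER Summed BY clicks DESC
    return sorted(clicks.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data: (name, age) and (user, url) records.
    users = [("ana", 20), ("bob", 30), ("cy", 18)]
    pages = [
        ("ana", "a.com"),
        ("cy", "a.com"),
        ("bob", "b.com"),
        ("ana", "c.com"),
        ("cy", "a.com"),
    ]
    print(top_urls(users, pages))
```

With this sample data, `bob` falls outside the age range, so `b.com` never appears in the result.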

Page 11: Apache PIG introduction

Also easy to extend (UDFs)...
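As a rough sketch of what a UDF can look like, here is a small Python function of the kind Pig can register through its Jython support. The function `url_domain` and its schema string are hypothetical. The `outputSchema` decorator is assumed to be supplied by Pig when the script is registered, and the fallback definition is there only so the file also runs outside Pig.

```python
try:
    # Assumption: available when the file is run with Pig's Python helpers.
    from pig_util import outputSchema
except ImportError:
    def outputSchema(schema):
        """Stand-in decorator so the file can be run and tested outside Pig."""
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("domain:chararray")
def url_domain(url):
    """Return the lower-cased host part of a URL, or None for null input."""
    if url is None:
        return None
    # Drop an optional scheme such as "http://", then cut at the first slash.
    rest = url.split("://", 1)[-1]
    host = rest.split("/", 1)[0]
    return host.lower()


if __name__ == "__main__":
    print(url_domain("http://Example.com/a/b"))
```

In a Pig script, a file like this would typically be registered with a statement such as REGISTER 'udfs.py' USING jython AS myudfs; and then called as myudfs.url_domain(url). The file name and namespace here are assumptions.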

Page 12: Apache PIG introduction

It takes care of the execution plan for you

Page 13: Apache PIG introduction

When to use Apache Pig?

Page 14: Apache PIG introduction

If you want things done faster

Page 15: Apache PIG introduction

An active community

Page 16: Apache PIG introduction

You might need to rethink complicated things