poster hadoop summit 2011: pig embedding in scripting languages

1

PIG Scripting Making Pig Turing-complete through embedding in a scripting language Julien Le Dem - Yahoo Solution 2: using Pig scripting Embedded Pig calls in Python Python functions are automatically available as UDFs Python variables can be used in the pig scripts $n, $i, ... UDFs use standard Python constructs (tuple/list/dictionary ) automatically converted to Pig. UDFs take the elements of the tuple as parameters, not a tuple. The output schema is specified using a decorator (it can be a function if you need to manipulate the input schema) Unfortunately this solution does not fit here. It requires 7 artifacts: - 3 Java UDFs: 3 classes must be compiled and packaged in a jar. The average UDF size is 50 to 100 lines of code. - 3 Pig scripts in 3 separate files: init, main loop content, finalization. The average Pig script size is 5 to 10 lines. - Main program: executes and coordinates the Pig scripts. It contains about 50 lines of code. 1 file required: Modification flow: Modification flow: Overview Example: Transitive closure - Iterative process: requires a loop and a termination condition - Requires multiple join/group by: typical Pig usage - Requires User Defined Functions - Map/Reduce: a solution to process big data. - Pig: makes it easier to manipulate data on the grid. - Pig scripting: makes Pig easier with iterative algorithms and User Defined Functions. PIG-928 (in 0.8) Solution 1: plain Pig Pig execution model 7 files required: This solution does fit on the poster. It requires 1 artifact, containing: - 3 UDFs provided as functions: The average UDF size is 8 to 12 lines of code. - 3 Pig queries: init, main loop content, finalization. Each Pig query adds an average 5 to 10 lines. - Main function: executes and coordinates the Pig scripts. It adds about 10 lines to Access to the output of pig scripts

Upload: julien-le-dem

Post on 29-Jun-2015

1.024 views

Category:

Technology

0 download

Report

Download

Tags:

Embed Size (px):

DESCRIPTION

Poster presented at Hadoop summit 2011

TRANSCRIPT

Page 1: Poster Hadoop summit 2011: pig embedding in scripting languages

PIG ScriptingMaking Pig Turing-complete through embedding in a scripting language

Julien Le Dem - Yahoo

Solution 2: using Pig scripting

Embedded Pig calls in Python

Python functions are automatically available as UDFs

Python variables can be used in the pig scripts $n, $i, ...

UDFs use standard Python constructs (tuple/list/dictionary) automatically converted to Pig.

UDFs take the elements of the tuple as parameters, not a tuple.

The output schema is specified using a decorator (it can be a function if you need to manipulate the input schema)

Unfortunately this solution does not fit here.

It requires 7 artifacts:

- 3 Java UDFs: 3 classes must be compiled and packaged in a jar. The average UDF size is 50 to 100 lines of code.

- 3 Pig scripts in 3 separate files: init, main loop content, finalization. The average Pig script size is 5 to 10 lines.

- Main program: executes and coordinates the Pig scripts. It contains about 50 lines of code.

1 file required:

Modification flow: Modification flow:

OverviewExample: Transitive closure

- Iterative process: requires a loop and a termination condition

- Requires multiple join/group by: typical Pig usage

- Requires User Defined Functions

- Map/Reduce: a solution to process big data.

- Pig: makes it easier to manipulate data on the grid.

- Pig scripting: makes Pig easier with iterative algorithms and User Defined Functions.

Jiras: PIG-1479, PIG-1794 (in 0.9), PIG-928 (in 0.8)

Solution 1: plain Pig

Pig execution model

7 files required:

This solution does fit on the poster.

It requires 1 artifact, containing:

- 3 UDFs provided as functions: The average UDF size is 8 to 12 lines of code.

- 3 Pig queries: init, main loop content, finalization. Each Pig query adds an average 5 to 10 lines.

- Main function: executes and coordinates the Pig scripts. It adds about 10 lines to the script.

Access to the output of pig scripts

Scripting - Home | IHO · Part 50 - Scripting 7 Figure 2 - Scripting Catalogue / Host interaction within a Scripting Domain 50-6.1 Distribution The distribution mechanism of a scripting

Scripting - Client-side vs. Server-side Scripting

Adobe Photoshop CC Scripting Guide · 8 2 Photoshop Scripting Basics This chapter provides an overview of scripting for Photoshop, describes scripting support for the scripting languages

Event Guide · PEPPA PIG ˜LIVE˚ Peppa Pig is a loveable, cheeky little pig who lives with her little brother George, Mummy Pig and Daddy Pig. Join Peppa Pig as she plays games and

Dachis Group Pig Hackday: Pig 202

The Sheep-Pig (Babe, the Gallant Pig)

APIs for Scripting - 2014 - Perforce · APIs for Scripting APIs for Scripting v Instance Methods ..... 32

AE6382 Introduction to Scripting AE 6382. What is a scripting l Scripting is the process of programming using a scripting language l A scripting language,

Keysight Technologies De-Embedding and Embedding …literature.cdn.keysight.com/litweb/pdf/5980-2784EN.pdf · Keysight Technologies De-Embedding and Embedding S ... Advanced Design

Pig Veterinary Society THE CASUALTY PIG - … Pig - April 2013-1... · 1 Pig Veterinary Society THE CASUALTY PIG Interim Update April 2013

Client side scripting and server side scripting

Embedding Environmental Sustainability - Family Repositoryfamilyrepository.lppkn.gov.my/601/1/Embedding Environmental Sustainability.pdf · Embedding Environmental Sustainability

Scripting versus Programming Scripting 1 bash

11 Jornadas Data Mining & Business Intelligence€¦ · • Scripting (Python) • Programación R • SQL / No SQL • Procesamiento paralelo • MapReduce •• Hive / Pig •

Embedding Pig in scripting languages

Open Basic Interpreter User Guide Version 1obasic.sourceforge.net/read_eng.pdf · Open Basic is developed for embedding into user application as a scripting language. Open Basic is

Next Basic AI Technique: Scripting Scripting

Introduction to Linux Scripting (Part 2): Advanced Scripting

Embedding High-Level Formal Speci cations into Applications · a scripting language that allows easy embedding of domain speci c languages (DSLs). For this, we picked and embraced

Pig and Hive Hadoop with Programming in · 2019-11-14 · for Hadoop-- Pig and Hive •Pig is a "scripting language" that excels in specifying a processing pipeline that is automatically

Scripting Languages. Client Side Scripting Languages Server side Scripting Languages

Apache Big Data Europe 2016 Power Pig with Spark...Apache Pig Procedural scripting language Pig Latin: similar to sql Heavily used for ETL Schema / No schema data, Pig eats everything

Distributed Systems - Rutgers Universitypxk/417/notes/content/20-graph-computing-slides.pdfApache Pig •Why? –Make it easy to use MapReduce via scripting instead of Java –Make

Scripting Languages By Md. Mahfuzur Rahman. Outline Overview of Scripting Languages Different Scripting Languages JavaScript (A Client-side Scripting

Introduction to Apache Pig - ut€¦ · Introduction to Apache Pig Pelle Jakovits 28 September 2016, Tartu. Outline •MapReduce recollection •Apache Pig –How to run Pig –Pig

MODULE 2 - pace.edu.in · Apache Pig Apache Pig is high level language that enables programmer to write complex MapReduce transformation using simple scripting language. Pig often

scripting. Animate audio and video embedding, and ActionScript … · 2020-05-17 · animation for television programs, online video, websites, web applications, rich internet applications,

More Scripting Techniques Scripting Process Example Script 1

MapReduce vs Pig | MapReduce Pig Integration

1 Topics: Scripting What is game scripting Scripting Features Torque Script

Adobe Photoshop CC 2014 Scripting · PDF file8 2 Photoshop Scripting Basics This chapter provides an overview of scripting for Photoshop, describes scripting support for the scripting

Developing Pig on Tez - events.static.linuxfound.org...Developing Pig on Tez Mark Wagner Committer, Apache Pig LinkedIn Cheolsoo Park VP, Apache Pig Netflix. What is Pig Apache project

Adobe Photoshop CC Scripting · PDF file8 2 Photoshop Scripting Basics This chapter provides an overview of scripting for Photoshop, describes scripting support for the scripting languages

Introduction to Apache Hadoop & Pig - SALSAHPCsalsahpc.indiana.edu/CloudCom2010/slides/PDF/tutorials/Yahoo... · Hadoop & Pig Milind Bhandarkar ... (hadoop, pig) (apache, pig) (hadoop,