karmasphere hadoop-productivity-tools

Upload: hadoop-user-group

Post on 20-Jan-2015

TRANSCRIPT

Page 1: Karmasphere hadoop-productivity-tools

This slide intentionally left blank.

Page 2: Karmasphere hadoop-productivity-tools

State-of-the-Art Productivity Tools for Developers & Analysts

Shevek

Page 3: Karmasphere hadoop-productivity-tools

About Karmasphere

● Productivity suite for Developers and Analysts.

● Point-and-drool GUI for Hadoop, Hive, Cascading, Pig.

● MapReduce development and debugging on-cluster.

● Integrated with Eclipse and NetBeans IDEs.

● Interface between a human (you!) and a Hadoop cluster.

● Does the boring, tedious or repetitive bits.

● Finds the errors fast before you do.

● Works anywhere with anything.

HALP!

Karmasphere

Hockey sticks!

Page 4: Karmasphere hadoop-productivity-tools

The Idea

● Collect Underpants

● ....?

● Profit

But what goes in the middle?

Page 5: Karmasphere hadoop-productivity-tools

The Problem

● Collect Data

● Convert to MapReduce

● Execute

● Debug

● Tune

● … Profit

Get someone else to do it!

Page 6: Karmasphere hadoop-productivity-tools

How long will it take?

● Performance

Of what? Surely not the computer.

Page 7: Karmasphere hadoop-productivity-tools

Computational Performance

Time (faster considered better)

Make this algorithm as fast as you can.

Page 8: Karmasphere hadoop-productivity-tools

Analytics Performance

But what about this bit?

Or this bit?

Analytics is slightly different.

Page 9: Karmasphere hadoop-productivity-tools

Analytics Performance

But what about this bit?

Or this bit?

That the human understands the problem does not mean that the computer understands the problem.

Page 10: Karmasphere hadoop-productivity-tools

Analytics Performance

But what about this bit?

Or this bit?

The computer knowing the answer is not the same as the human understanding the answer.

Page 11: Karmasphere hadoop-productivity-tools

Common MapReduce Challenges

● How do I write a Hadoop job?

● Did my job work?

● If it didn't throw an exception, it worked. Right?

● Did I get the correct answer?

● Are you sure?

● Do you have enough information to prove that?

● … to your accountants or customers?

● What happened? or What do I need to know?

● Please note, this feature is now officially called the “Job Profiler”, not the “What?! Window.”
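
The "if it didn't throw an exception, it worked" trap can be made concrete with counters. The following is a minimal sketch in plain Python, not Hadoop or Karmasphere code: the names `map_line` and `run_job` and the tab-separated `query<TAB>count` input format are assumptions made for illustration. The mapper silently skips malformed lines, so the job never fails, and only the counters show that input was dropped.

```python
from collections import Counter, defaultdict


def map_line(line, counters):
    """Parse one 'query<TAB>count' line; yield (query, count) or nothing."""
    counters["records_in"] += 1
    try:
        query, raw_count = line.rstrip("\n").split("\t")
        count = int(raw_count)
    except ValueError:
        # No exception escapes: the bad record simply disappears.
        counters["records_skipped"] += 1
        return
    yield query, count


def run_job(lines):
    """Run map, shuffle and reduce in memory; return (totals, counters)."""
    counters = Counter()
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for line in lines:
        for key, value in map_line(line, counters):
            grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    counters["groups_out"] = len(totals)
    return totals, counters


if __name__ == "__main__":
    # Hypothetical sample input; the third line has a non-numeric count.
    sample = ["hadoop\t3", "hive\t2", "hadoop\tthree", "pig\t1", "hadoop\t4"]
    totals, counters = run_job(sample)
    print(totals)
    print(dict(counters))
```

Comparing `records_in` against `records_skipped` is the kind of evidence the slide asks for: the totals look plausible either way, but only the counters say how much input they were computed from.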

Page 12: Karmasphere hadoop-productivity-tools

Karmasphere Studio

Page 13: Karmasphere hadoop-productivity-tools

Karmasphere Studio

Page 14: Karmasphere hadoop-productivity-tools

Common Analytical Tasks

So common, in fact, that ...

group, sort, aggregate, intersection, unique, limit, scan, join, function, hash, materialize, condition, set operations, store, cat, index

Page 15: Karmasphere hadoop-productivity-tools

High Level Languages

Hive, Pig, Cascading

Page 16: Karmasphere hadoop-productivity-tools

Cascading

A workflow based language

Perfect for dylsexics like me.

Page 17: Karmasphere hadoop-productivity-tools

Pig

An imperative scripting language

data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);

queries_group =
    GROUP data
    BY query
    PARALLEL $reducers;

queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(data.count) AS count;

queries_ordered =
    ORDER queries_sum
    BY count DESC
    PARALLEL $reducers;

Simple and accessible to all.
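
For readers who do not know Pig Latin, the dataflow of the script on this slide can be approximated in plain Python. This is an in-memory sketch only, not how Pig executes the script: the function name `top_queries` and the sample rows are hypothetical, and the `PARALLEL $reducers` hint has no counterpart here.

```python
from collections import defaultdict


def top_queries(rows):
    """Mirror the Pig script: group by query, sum the counts, sort descending."""
    sums = defaultdict(int)
    for query, count in rows:   # LOAD ... AS (query, count)
        sums[query] += count    # GROUP ... BY query, then SUM(data.count)
    # ORDER ... BY count DESC
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data standing in for '$input'.
    rows = [("hadoop", 3), ("hive", 2), ("hadoop", 4), ("pig", 1)]
    for query, total in top_queries(rows):
        print(query, total)
```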

Page 18: Karmasphere hadoop-productivity-tools

Hive

An SQL-like language

FROM (
  FROM (
    FROM src src1
    SELECT src1.key AS c1, src1.value AS c2
    WHERE src1.key > 10 AND src1.key < 20
  ) a
  FULL OUTER JOIN (
    FROM src src2
    SELECT src2.key AS c3, src2.value AS c4
    WHERE src2.key > 15 AND src2.key < 25
  ) b
  ON (a.c1 = b.c3)
  SELECT a.c1 AS c1, a.c2 AS c2, b.c3 AS c3, b.c4 AS c4
) c
SELECT c.c1, c.c2, c.c3, c.c4

I can parse that in my head, honest.
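
For those who cannot parse it in their heads, here is a rough sketch of what the query computes, written in plain Python. It assumes `src` is a list of `(key, value)` pairs with integer, non-null keys, which the slide does not state; the function name `full_outer_join` and the sample rows are hypothetical. Hive does not guarantee any row order for this query, so the order produced here is incidental.

```python
def full_outer_join(src):
    """Filter src twice, then full-outer-join the two subsets on key."""
    a = [(k, v) for k, v in src if 10 < k < 20]   # subquery a: c1, c2
    b = [(k, v) for k, v in src if 15 < k < 25]   # subquery b: c3, c4

    rows = []
    matched_b = set()  # indexes of b rows that joined with at least one a row
    for c1, c2 in a:
        matches = [(i, row) for i, row in enumerate(b) if row[0] == c1]
        if not matches:
            rows.append((c1, c2, None, None))     # left side only
        for i, (c3, c4) in matches:
            matched_b.add(i)
            rows.append((c1, c2, c3, c4))         # both sides
    for i, (c3, c4) in enumerate(b):
        if i not in matched_b:
            rows.append((None, None, c3, c4))     # right side only
    return rows


if __name__ == "__main__":
    # Hypothetical sample table.
    src = [(12, "a"), (17, "b"), (22, "c"), (30, "d")]
    for row in full_outer_join(src):
        print(row)
```

Only keys 16 to 19 can appear on both sides, because that is where the two filter ranges overlap; every other surviving row is padded with nulls (`None` here).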

Page 19: Karmasphere hadoop-productivity-tools

Karmasphere Analyst

FROM (
  FROM src SELECT src.key, src.value WHERE src.key < 100
  UNION ALL
  FROM src SELECT src.* WHERE src.key > 100
) unioninput
INSERT OVERWRITE DIRECTORY 'union.out' SELECT unioninput.*
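
As a rough illustration of the query's semantics, the same union can be sketched in plain Python. It assumes `src` is a list of `(key, value)` pairs with integer keys; the function name `union_all` and the sample rows are hypothetical, and the write to the `union.out` directory is left out.

```python
def union_all(src):
    """Concatenate the rows with key < 100 and the rows with key > 100."""
    below = [(k, v) for k, v in src if k < 100]
    above = [(k, v) for k, v in src if k > 100]
    # UNION ALL keeps duplicates; it does not deduplicate like UNION.
    return below + above


if __name__ == "__main__":
    # Hypothetical sample table, including a duplicate and a key of exactly 100.
    src = [(99, "x"), (100, "y"), (101, "z"), (99, "x")]
    print(union_all(src))
```

Two details are easy to miss in the SQL: rows with a key of exactly 100 match neither branch and are dropped, and duplicate rows survive.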

Page 20: Karmasphere hadoop-productivity-tools

Karmasphere Analyst

Page 21: Karmasphere hadoop-productivity-tools

Conclusions

How long does it take to get your answers?

Page 22: Karmasphere hadoop-productivity-tools

How to get involved

● Getting started as a Hadoop Java Developer?

● Download Karmasphere Studio FREE!

● Deploying Hadoop jobs in production?

● Use Karmasphere Studio Professional Edition.

● Want to use high level languages like SQL?

● Talk to us about Karmasphere Analyst.

● Join the beta programme!

Page 23: Karmasphere hadoop-productivity-tools

Questions, Errata, Heckling

● Some questions suggested by others:

● Where can I download Karmasphere Studio Community Edition?

– Visit http://www.karmasphere.com/ for free downloads and great justice.

● What about building production-ready jobs for enterprise deployment?

– Ask us about introductory offers on Karmasphere Studio Professional Edition.

● How can I use graphical SQL on Hadoop?

– Talk to us about the Karmasphere Analyst Sekrit(!) Beta.

● Some questions I thought up:

● How do I (something awfully complicated)?

– Please talk to us, we enjoy the challenges.

● Is there any tea on this spaceship?

● And some from the audience, please!

● I get paid by the answer. I need questions.

Page 24: Karmasphere hadoop-productivity-tools
Page 25: Karmasphere hadoop-productivity-tools

BAY AREA HADOOP USER GROUP; KARMASPHERE® PRODUCTION

KARMASPHERE STUDIO

PRODUCTIVITY SUITE FOR DEVELOPERS AND ANALYSTS

SHEVEK, CTO, KARMASPHERE

MARTIN HALL, CEO, KARMASPHERE

DARREN ARONOFSKY, CLAUDE BESSON, METALLICA, ENNIO MORRICONE, JK ROWLING

JACQUELINE DURRAN, JIM HENSON, INDUSTRIAL LIGHT AND MAGIC