karmasphere hadoop-productivity-tools

Post on 20-Jan-2015


This slide intentionally left blank.

State-of-the-Art Productivity Tools for Developers & Analysts

Shevek

About Karmasphere

● Productivity suite for Developers and Analysts.
● Point-and-drool GUI for Hadoop, Hive, Cascading, Pig.

● MapReduce development and debugging on-cluster.

● Integrated with Eclipse and NetBeans IDEs.

● Interface between a human (you!) and a Hadoop cluster.
● Does the boring, tedious or repetitive bits.

● Finds the errors fast, before you do.

● Works anywhere with anything.

HALP!

Karmasphere

Hockey sticks!

The Idea

● Collect Underpants
● ....?
● Profit

But what goes in the middle?

The Problem

● Collect Data
● Convert to MapReduce
● Execute
● Debug
● Tune
● … Profit
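
"Convert to MapReduce" means restating the problem as a map function, a shuffle that groups values by key, and a reduce function. A minimal single-process Python sketch of that data flow, using word count; this is not Hadoop's API, and `run_mapreduce`, `word_mapper` and `sum_reducer` are hypothetical names:

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Single-process simulation of the map, shuffle and reduce phases."""
    groups = defaultdict(list)
    # Map: each record emits (key, value) pairs.
    # Shuffle: every emitted value is collected under its key.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: each key's values collapse to one result; keys are visited in
    # sorted order, loosely mirroring Hadoop sorting keys before the reduce.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_mapper(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word, case-folded.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key: str, values: list[int]) -> int:
    # Total the 1s emitted for this word.
    return sum(values)


print(run_mapreduce(["the quick fox", "the lazy dog", "The end"], word_mapper, sum_reducer))
# {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

On a real cluster the map and reduce calls run on many machines and the shuffle crosses the network, which is where the Execute, Debug and Tune steps come in.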

Get someone else to do it!

How long will it take?

● Performance

Of what? Surely not the computer.

Computational Performance

Time (faster considered better)

Make this algorithm as fast as you can.

Analytics Performance

But what about this bit?

Or this bit?

Analytics is slightly different.

That the human understands the problem does not mean that the computer understands the problem.

The computer knowing the answer is not the same as the human understanding the answer.

Common MapReduce Challenges

● How do I write a Hadoop job?
● Did my job work?

● If it didn't throw an exception, it worked. Right?

● Did I get the correct answer?
● Are you sure?
● Do you have enough information to prove that?
● … to your accountants or customers?

● What happened? Or: what do I need to know?
● Please note, this feature is now officially called the “Job Profiler”, not the “What?! Window.”
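
"It didn't throw an exception" is not evidence of a correct answer. One cheap kind of evidence is an invariant check that reconciles a job's output with its input. The hypothetical Python sketch below assumes a job that sums per-query counts. The two checks (totals match, no key lost) and the function names are illustrative choices, not Karmasphere features:

```python
from collections import Counter


def aggregate(rows: list[tuple[str, int]]) -> dict[str, int]:
    """Sum counts per query: what the job under test is meant to compute."""
    totals = Counter()
    for query, count in rows:
        totals[query] += count
    return dict(totals)


def reconcile(rows: list[tuple[str, int]], output: dict[str, int]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    # Invariant 1: summing per query must not create or lose any counts.
    input_total = sum(count for _, count in rows)
    output_total = sum(output.values())
    if input_total != output_total:
        problems.append(f"count mismatch: input {input_total}, output {output_total}")
    # Invariant 2: every query seen in the input must appear in the output.
    missing = {query for query, _ in rows} - set(output)
    if missing:
        problems.append(f"queries missing from output: {sorted(missing)}")
    return problems
```

Checks like these produce a record you can show to someone else (accountants, customers), which a silent successful exit does not.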

Karmasphere Studio

Common Analytical Tasks

So common, in fact, that ...

group

sort

aggregate

intersection

unique

limit

scan

join

function

hash

materialize

condition

set operations

store

catindex
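
Most of these tasks are small operations over records that can be combined. The following minimal local Python sketch runs several of them (scan, condition, group, aggregate, sort, limit, unique, join, set intersection) on made-up in-memory data. The data and variable names are hypothetical. On Hadoop, each step would have to become part of a MapReduce job, which is the work the high-level languages take off your hands.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records: (user, page) visits and (user, country) profiles.
visits = [("ann", "home"), ("bob", "search"), ("ann", "search"), ("cy", "home")]
profiles = [("ann", "UK"), ("bob", "US")]

# scan + condition: keep only visits to the search page
searches = [v for v in visits if v[1] == "search"]

# group + aggregate: visits per user (groupby needs input sorted by the key)
by_user = sorted(visits, key=itemgetter(0))
visit_counts = {user: len(list(rows)) for user, rows in groupby(by_user, key=itemgetter(0))}

# sort + limit: the two busiest users, ties broken by name
top_two = sorted(visit_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2]

# unique: distinct pages
pages = sorted({page for _, page in visits})

# join: attach each visitor's country (inner join, so "cy" is dropped)
country = dict(profiles)
joined = [(user, page, country[user]) for user, page in visits if user in country]

# intersection (a set operation): users who both visited and have a profile
known_visitors = sorted({u for u, _ in visits} & {u for u, _ in profiles})
```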

High Level Languages

Hive, Pig, Cascading

Cascading

A workflow-based language

Perfect for dylsexics like me.
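
Cascading itself is a Java API in which a job is assembled from pipes that data flows through. What follows is not Cascading's API. It is only a hypothetical Python sketch of the workflow idea: small stages connected so that each stage's output feeds the next. The names `assemble`, `tokenize`, `drop_short` and `count` are invented here.

```python
from typing import Callable, Iterable

Stage = Callable[[Iterable], Iterable]


def assemble(*stages: Stage) -> Stage:
    """Connect stages so each one's output flows into the next, like pipes."""
    def run(records: Iterable) -> Iterable:
        for stage in stages:
            records = stage(records)
        return records
    return run


def tokenize(lines):
    # Split every line into words.
    return (word for line in lines for word in line.split())


def drop_short(words):
    # Filter stage: discard words of two characters or fewer.
    return (w for w in words if len(w) > 2)


def count(words):
    # Aggregate stage: count occurrences of each remaining word.
    totals = {}
    for w in words:
        totals[w] = totals.get(w, 0) + 1
    return totals


word_count_flow = assemble(tokenize, drop_short, count)
print(word_count_flow(["to be or not to be", "not now"]))
```

Because the workflow is an explicit chain of named stages, it can be drawn as a diagram, inspected, or reordered without rewriting the stages themselves.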

Pig

An imperative scripting language

data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);

queries_group =
    GROUP data
    BY query
    PARALLEL $reducers;

queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(data.count) AS count;

queries_ordered =
    ORDER queries_sum
    BY count DESC
    PARALLEL $reducers;

Simple and accessible to all.
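
For comparison, here is a rough single-machine Python equivalent of the Pig script. It assumes the input has already been parsed into (query, count) pairs. `PARALLEL $reducers` has no counterpart because nothing here is distributed, and the function name `top_queries` is hypothetical.

```python
from collections import defaultdict


def top_queries(rows: list[tuple[str, int]]) -> list[tuple[str, int]]:
    # GROUP data BY query, then FOREACH ... GENERATE group, SUM(data.count)
    sums = defaultdict(int)
    for query, count in rows:
        sums[query] += count
    # ORDER queries_sum BY count DESC. Python's sort is stable, so ties keep
    # first-seen order; Pig makes no such promise about tie order.
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


print(top_queries([("hadoop", 3), ("pig", 5), ("hadoop", 4), ("hive", 1)]))
```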

Hive

An SQL-like language

FROM (
  FROM (
    FROM src src1
    SELECT src1.key AS c1, src1.value AS c2
    WHERE src1.key > 10 and src1.key < 20
  ) a
  FULL OUTER JOIN (
    FROM src src2
    SELECT src2.key AS c3, src2.value AS c4
    WHERE src2.key > 15 and src2.key < 25
  ) b
  ON (a.c1 = b.c3)
  SELECT a.c1 AS c1, a.c2 AS c2, b.c3 AS c3, b.c4 AS c4
) c
SELECT c.c1, c.c2, c.c3, c.c4

I can parse that in my head, honest.
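
To see what the query actually returns, here is a small Python sketch of the same logic. It filters `src` into the ranges (10, 20) and (15, 25), then full-outer-joins them on key, padding the unmatched side with NULLs (`None`). The sample rows are made up. It also assumes keys are unique within each side, which Hive does not require.

```python
def select_range(src, low, high):
    """Rows of src whose key lies strictly between low and high (the WHERE clauses)."""
    return [(key, value) for key, value in src if low < key < high]


def full_outer_join(left, right):
    """FULL OUTER JOIN on key; a side with no match contributes (None, None).

    Assumes keys are unique within each side, to keep the sketch short.
    Output is sorted by key for readability; Hive guarantees no order.
    """
    left_map, right_map = dict(left), dict(right)
    rows = []
    for key in sorted(left_map.keys() | right_map.keys()):
        a = (key, left_map[key]) if key in left_map else (None, None)
        b = (key, right_map[key]) if key in right_map else (None, None)
        rows.append(a + b)  # (c1, c2, c3, c4)
    return rows


src = [(12, "val_12"), (17, "val_17"), (22, "val_22")]  # hypothetical sample rows
result = full_outer_join(select_range(src, 10, 20), select_range(src, 15, 25))
for row in result:
    print(row)
```

Only key 17 falls in both ranges, so it is the only row with all four columns filled.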

Karmasphere Analyst

FROM (
  FROM src SELECT src.key, src.value WHERE src.key < 100
  UNION ALL
  FROM src SELECT src.* WHERE src.key > 100
) unioninput
INSERT OVERWRITE DIRECTORY 'union.out' SELECT unioninput.*
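
A local Python mirror of this UNION ALL query on a few made-up (key, value) rows makes one detail easy to miss: a row whose key is exactly 100 satisfies neither WHERE clause and silently disappears. The function name and sample data are hypothetical.

```python
def union_all_query(src):
    """Mirror of the two-branch UNION ALL query over (key, value) rows."""
    below = [(key, value) for key, value in src if key < 100]  # first branch
    above = [(key, value) for key, value in src if key > 100]  # second branch
    # UNION ALL concatenates the branches and keeps duplicate rows;
    # a SQL UNION DISTINCT would remove them.
    return below + above


sample = [(50, "a"), (100, "b"), (150, "c"), (50, "a")]
print(union_all_query(sample))  # the key-100 row is in neither branch
```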

Conclusions

How long does it take to get your answers?

How to get involved

● Getting started as a Hadoop Java Developer?
● Download Karmasphere Studio FREE!

● Deploying Hadoop jobs in production?
● Use Karmasphere Studio Professional Edition.

● Want to use high level languages like SQL?
● Talk to us about Karmasphere Analyst.
● Join the beta programme!

Questions, Errata, Heckling

● Some questions suggested by others:
● Where can I download Karmasphere Studio Community Edition?

– Visit http://www.karmasphere.com/ for free downloads and great justice.

● What about building production-ready jobs for enterprise deployment?

– Ask us about introductory offers on Karmasphere Studio Professional Edition.

● How can I use graphical SQL on Hadoop?

– Talk to us about the Karmasphere Analyst Sekrit(!) Beta.

● Some questions I thought up:
● How do I (something awfully complicated)?

– Please talk to us, we enjoy the challenges.

● Is there any tea on this spaceship?

● And some from the audience, please!
● I get paid by the answer. I need questions.

BAY AREA HADOOP USER GROUP
KARMASPHERE® PRODUCTION

KARMASPHERE STUDIO
PRODUCTIVITY SUITE FOR DEVELOPERS AND ANALYSTS

SHEVEK, CTO, KARMASPHERE
MARTIN HALL, CEO, KARMASPHERE

DARREN ARONOFSKY, CLAUDE BESSON, METALLICA, ENNIO MORRICONE, JK ROWLING

JACQUELINE DURRAN, JIM HENSON, INDUSTRIAL LIGHT AND MAGIC
