karmasphere hadoop-productivity-tools
This slide intentionally left blank.
State-of-the-Art Productivity Tools for Developers & Analysts
Shevek
About Karmasphere
● Productivity suite for Developers and Analysts.
● Point-and-drool GUI for Hadoop, Hive, Cascading, Pig.
● MapReduce development and debugging on-cluster.
● Integrated with Eclipse and NetBeans IDEs.
● Interface between a human (you!) and a Hadoop cluster.
● Does the boring, tedious or repetitive bits.
● Finds the errors fast before you do.
● Works anywhere with anything.
HALP!
Karmasphere
Hockey sticks!
The Idea
● Collect Underpants
● ....?
● Profit
But what goes in the middle?
The Problem
● Collect Data
● Convert to MapReduce
● Execute
● Debug
● Tune
● … Profit
Get someone else to do it!
How long will it take?
● Performance
Of what? Surely not the computer.
Computational Performance
Time (faster considered better)
Make this algorithm as fast as you can.
Analytics Performance
But what about this bit?
Or this bit?
Analytics is slightly different.
Analytics Performance
But what about this bit?
Or this bit?
That the human understands the problem does not mean that the computer understands the problem.
Analytics Performance
But what about this bit?
Or this bit?
The computer knowing the answer is not the same as the human understanding the answer.
Common MapReduce Challenges
● How do I write a Hadoop job?
● Did my job work?
● If it didn't throw an exception, it worked. Right?
● Did I get the correct answer?
● Are you sure?
● Do you have enough information to prove that?
● … to your accountants or customers?
● What happened? or What do I need to know?
● Please note, this feature is now officially called the “Job Profiler”, not the “What?! Window.”
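One way to approach "Did I get the correct answer?" is to check an invariant that should hold between a job's input and its output, rather than trusting the absence of an exception. The sketch below is illustrative only: `check_sum_invariant` and the sample records are hypothetical and are not part of Hadoop or Karmasphere Studio.

```python
def check_sum_invariant(input_records, output_records):
    """Return True if the aggregated output accounts for every input count.

    Both arguments are iterables of (query, count) pairs. This is a
    hypothetical helper for illustration, not a Hadoop or Karmasphere API.
    """
    total_in = sum(count for _, count in input_records)
    total_out = sum(count for _, count in output_records)
    return total_in == total_out


# Hypothetical sample data: raw input and two candidate job outputs.
raw = [("hadoop", 3), ("pig", 1), ("hadoop", 2)]
good_output = [("hadoop", 5), ("pig", 1)]
bad_output = [("hadoop", 3), ("pig", 1)]  # a record was silently dropped

print(check_sum_invariant(raw, good_output))
print(check_sum_invariant(raw, bad_output))
```

Neither candidate output would have raised an exception, yet only the first one preserves the total.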
Karmasphere Studio
Common Analytical Tasks
So common, in fact, that ...
group
sort
aggregate
intersection
unique
limit
scan
join
function
hash
materialize
condition
set operations
store
cat
index
High Level Languages
Hive Pig Cascading
Cascading
A workflow based language
Perfect for dylsexics like me.
Pig
An imperative scripting language
data = LOAD '$input' AS (query:CHARARRAY, count:INT);
queries_group = GROUP data BY query PARALLEL $reducers;
queries_sum = FOREACH queries_group GENERATE group AS query, SUM(data.count) AS count;
queries_ordered = ORDER queries_sum BY count DESC PARALLEL $reducers;
Simple and accessible to all.
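For readers unfamiliar with Pig, the script above groups rows by query, sums the counts in each group, and orders the totals in descending order. A minimal single-machine sketch of the same computation follows; the function name `sum_queries` and the sample rows are hypothetical, and the `PARALLEL $reducers` hint has no equivalent here.

```python
from collections import defaultdict


def sum_queries(rows):
    """Group (query, count) rows by query, sum the counts, sort descending."""
    totals = defaultdict(int)
    for query, count in rows:          # GROUP data BY query
        totals[query] += count         # SUM(data.count)
    # ORDER queries_sum BY count DESC
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input standing in for the file loaded from '$input'.
rows = [("hadoop", 3), ("pig", 1), ("hadoop", 2), ("hive", 4)]
result = sum_queries(rows)
print(result)
```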
Hive
An SQL-like language
FROM (
  FROM (
    FROM src src1
    SELECT src1.key AS c1, src1.value AS c2
    WHERE src1.key > 10 and src1.key < 20
  ) a
  FULL OUTER JOIN (
    FROM src src2
    SELECT src2.key AS c3, src2.value AS c4
    WHERE src2.key > 15 and src2.key < 25
  ) b
  ON (a.c1 = b.c3)
  SELECT a.c1 AS c1, a.c2 AS c2, b.c3 AS c3, b.c4 AS c4
) c
SELECT c.c1, c.c2, c.c3, c.c4
I can parse that in my head, honest.
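To unpack what the query asks for: it filters `src` twice (keys between 10 and 20, and keys between 15 and 25, both exclusive), then performs a full outer join of the two filtered sets on key. The sketch below mimics that logic in memory, assuming integer keys; `full_outer_join` and the sample table are hypothetical, and Hive itself makes no promise about output row order.

```python
def full_outer_join(src):
    """Emulate the nested query on a list of (key, value) rows."""
    a = [(k, v) for k, v in src if 10 < k < 20]   # subquery a: c1, c2
    b = [(k, v) for k, v in src if 15 < k < 25]   # subquery b: c3, c4
    out = []
    matched_b = set()
    for c1, c2 in a:
        hits = [i for i, (c3, _) in enumerate(b) if c3 == c1]
        if hits:
            for i in hits:
                matched_b.add(i)
                out.append((c1, c2, b[i][0], b[i][1]))
        else:
            # Row from a with no partner in b: right side is NULL.
            out.append((c1, c2, None, None))
    for i, (c3, c4) in enumerate(b):
        if i not in matched_b:
            # Row from b with no partner in a: left side is NULL.
            out.append((None, None, c3, c4))
    return out


# Hypothetical stand-in for the src table.
src = [(12, "val_12"), (17, "val_17"), (22, "val_22"), (30, "val_30")]
joined = full_outer_join(src)
for row in joined:
    print(row)
```

Key 12 appears only on the left, 17 on both sides, 22 only on the right, and 30 is filtered out of both subqueries.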
Karmasphere Analyst
FROM (
  FROM src SELECT src.key, src.value WHERE src.key < 100
  UNION ALL
  FROM src SELECT src.* WHERE src.key > 100
) unioninput
INSERT OVERWRITE DIRECTORY 'union.out' SELECT unioninput.*
Karmasphere Analyst
Conclusions
How long does it take to get your answers?
How to get involved
● Getting started as a Hadoop Java Developer?
● Download Karmasphere Studio FREE!
● Deploying Hadoop jobs in production?
● Use Karmasphere Studio Professional Edition.
● Want to use high level languages like SQL?
● Talk to us about Karmasphere Analyst.
● Join the beta programme!
Questions, Errata, Heckling
● Some questions suggested by others:
● Where can I download Karmasphere Studio Community Edition?
– Visit http://www.karmasphere.com/ for free downloads and great justice.
● What about building production-ready jobs for enterprise deployment?
– Ask us about introductory offers on Karmasphere Studio Professional Edition.
● How can I use graphical SQL on Hadoop?
– Talk to us about the Karmasphere Analyst Sekrit(!) Beta.
● Some questions I thought up:
● How do I (something awfully complicated)?
– Please talk to us, we enjoy the challenges.
● Is there any tea on this spaceship?
● And some from the audience, please!
● I get paid by the answer. I need questions.
BAY AREA HADOOP USER GROUP; KARMASPHERE® PRODUCTION
KARMASPHERE STUDIO
PRODUCTIVITY SUITE FOR DEVELOPERS AND ANALYSTS
SHEVEK, CTO, KARMASPHERE
MARTIN HALL, CEO, KARMASPHERE
DARREN ARONOFSKY, CLAUDE BESSON, METALLICA, ENNIO MORRICONE, JK ROWLING
JACQUELINE DURRAN, JIM HENSON, INDUSTRIAL LIGHT AND MAGIC