Hadoop gets Groovy

© Hortonworks Inc. 2012 Hadoop gets Groovy Steve Loughran– Hortonworks stevel at hortonworks.com @steveloughran Berlin, June 2012

Upload: steve-loughran

Post on 12-Nov-2014

DESCRIPTION

Presentation on using Hadoop with the Groovy Language from Berlin Buzzwords 2012

TRANSCRIPT

Page 1: Hadoop gets Groovy

© Hortonworks Inc. 2012

Hadoop gets Groovy

Steve Loughran, Hortonworks
stevel at hortonworks.com
@steveloughran

Berlin, June 2012

Page 2: Hadoop gets Groovy

[Diagram with two labels, "Hadoop Skills" and "Groovy Skills". Names placed on it: Doug, Owen; Arun, Jakob; @steveloughran; James Strachan; Guillaume Laforge]

Where are you in this diagram?

Page 3: Hadoop gets Groovy

Grumpy : Groovy Hadoop Library

• Something lightweight for testing

• Wanted to play in the M/R layer

• Already using Groovy

• Liked: JVM integration, tooling, libraries, IntelliJ IDEA, books…

git@github.com:steveloughran/grumpy.git

Page 4: Hadoop gets Groovy

What is Groovy?

A dynamic language within the JVM

• Java++: maps, lists, tuples, closures

• Flavours of Ruby and Python: 'duck' typing, Grails, (scripting)

A way to do things in the JVM that Sun didn't imagine

Page 5: Hadoop gets Groovy

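Can use & subclass Java classes:

import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Mapper

// Emits ("lines", 1) for every input line; a summing reducer then yields the line count.
class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    static final def emitKey = new Text("lines")
    static final def one = new IntWritable(1)

    void map(LongWritable key, Text value, Mapper.Context context) {
        context.write(emitKey, one)
    }
}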

Page 6: Hadoop gets Groovy

Closures & lists

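import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Reducer

// Type parameters and a void return type make this a true override of Reducer.reduce().
class CountReducer2 extends Reducer<Text, IntWritable, Text, IntWritable> {
    void reduce(Text k, Iterable<IntWritable> values, Reducer.Context ctx) {
        def sum = values.collect() { it.get() }.sum()
        ctx.write(k, new IntWritable(sum))
    }
}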

Page 7: Hadoop gets Groovy

Closures & lists

values.collect() { it.get() }.sum()

List<values> -> List<int> -> int
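To make the List<values> -> List<int> -> int chain concrete, here is a minimal sketch that runs without Hadoop on the classpath. IntBox is a hypothetical stand-in for IntWritable (only its get() method is mimicked); none of it is taken from the Grumpy source.

```groovy
// Hypothetical stand-in for Hadoop's IntWritable: wraps an int behind get().
class IntBox {
    private final int v
    IntBox(int v) { this.v = v }
    int get() { v }
}

def values = [new IntBox(1), new IntBox(1), new IntBox(3)]

// Step 1: List<IntBox> -> List<Integer>, unwrapping each element
def unwrapped = values.collect { it.get() }

// Step 2: List<Integer> -> a single number
def total = unwrapped.sum()

println "unwrapped=${unwrapped} total=${total}"
```

Two details are worth keeping in mind. sum() on an empty list returns null, so sum(0) is the safer form wherever the input might be empty. In a real reducer, collecting it.get() rather than it also matters, because Hadoop reuses the same Writable instance while iterating over the values.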

Page 8: Hadoop gets Groovy

Result: MR jobs in Groovy

In:
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:20,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:21,Franky Panky
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:24,Vas
gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:32,Fleurball
gate1,f1191b79236083ce59981e049d863604,,2006-10-30,16:06:36,vklaptop
gate1,b45c7795f5be038dda8615ab44676872,,2006-10-30,16:06:37,Franky Panky
gate1,eef1836efddf8dbfe5e2a3cd5c13745f,,2006-10-30,16:06:38,Vas
gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:43,
gate1,2afaf990ce75f0a7208f7f012c8d12ad,,2006-10-30,16:06:54,Smiley

Out: 163,198,223 device sightings!
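A record like the ones above can be split into named fields in a few lines of Groovy. This is only a sketch: the field names (gate, device, date, time, name) are guesses from the sample rows, the third column is empty in every row shown so it is skipped, and the slides do not show the parser the talk actually used.

```groovy
// Splits one CSV record into a map. Field names are assumptions, not from the talk.
def parse = { String line ->
    // limit -1 keeps trailing empty fields; plain split(',') would drop an empty name
    def f = line.split(',', -1)
    [gate: f[0], device: f[1], date: f[3], time: f[4], name: f[5]]
}

def named = parse('gate1,b46cca4d3f5f313176e50a0e38e7fde3,,2006-10-30,16:06:17,Fleurball')
def unnamed = parse('gate1,02e73779c77fcd4e9f90a193c4f3e7ff,,2006-10-30,16:06:23,')

println named
println unnamed
```

The -1 limit is the part that is easy to miss: two of the sample rows end in a comma, and without it those rows would yield five fields instead of six.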

Page 9: Hadoop gets Groovy

Why no Pig? Sliding Window Debounce

void map(LongWritable key, BlueEvent event, Mapper.Context context) {
    BlueEvent ev2 = window.insert(event)
    List<BlueEvent> expired = window.purgeExpired(event)
    expired.each { evt -> emit(context, evt) }
}

void cleanup(Mapper.Context context) {
    window.each { evt -> emit(context, evt) }
}
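The slide does not show the window class itself, so the following is a guess at one way such a debounce window could work, not the talk's implementation. Sighting and DebounceWindow are hypothetical names, the sketch assumes sightings arrive in time order, and it purges before inserting (the slide inserts first) so that a stale entry is never merged with a new sighting.

```groovy
// Hypothetical stand-in for BlueEvent: one merged run of sightings of a device.
class Sighting {
    String device
    long firstSeen
    long lastSeen
    int hits = 1
}

// Merges repeat sightings of a device that fall within 'span' ms of the previous one.
class DebounceWindow {
    final long span
    private final Map<String, Sighting> open = [:]

    DebounceWindow(long span) { this.span = span }

    // Removes and returns every entry not seen for more than 'span' ms before 'now'.
    List<Sighting> purgeExpired(long now) {
        def expired = open.values().toList().findAll { now - it.lastSeen > span }
        expired.each { open.remove(it.device) }
        expired
    }

    // Starts a new entry, or extends the existing one for the same device.
    Sighting insert(String device, long time) {
        def cur = open[device]
        if (cur == null) {
            cur = new Sighting(device: device, firstSeen: time, lastSeen: time)
            open[device] = cur
        } else {
            cur.lastSeen = time
            cur.hits++
        }
        cur
    }

    // Equivalent of the cleanup() step: hand back whatever is still open.
    List<Sighting> drain() {
        def rest = open.values().toList()
        open.clear()
        rest
    }
}

def window = new DebounceWindow(60000L)
def emitted = []
def input = [['a', 0L], ['a', 20000L], ['b', 30000L], ['a', 100000L]]
for (rec in input) {
    String dev = rec[0]
    long t = rec[1]
    emitted.addAll(window.purgeExpired(t))
    window.insert(dev, t)
}
emitted.addAll(window.drain())
println emitted.collect { "${it.device} x${it.hits}" }
```

With a 60 second span, the two early sightings of device a collapse into one entry, and the sighting at 100000 ms starts a fresh one. The per-mapper state is what makes this awkward to express as a plain Pig script, which may be the point behind the slide title.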

Page 10: Hadoop gets Groovy

Device sightings by day for 2007

[Chart: device sightings per day across 2007. Annotations: Dec 15; Aug 27; Tue-Wed peak days]

Page 11: Hadoop gets Groovy

Improving Hadoop APIs

// conf[key] = val; Groovy maps subscript assignment to putAt()
Configuration.metaClass.putAt = { key, val ->
    set(key.toString(), val.toString())
}

// conf[key]
Configuration.metaClass.getAt = { key -> get(key) }

// conf.add([k1: v1, k2: v2])
Configuration.metaClass.add = { map ->
    map.each { elt ->
        set((elt.key).toString(), (elt.value).toString())
    }
}

Page 12: Hadoop gets Groovy

& Configuration gets better

conf['mapscript'] = new File(src).text

String scriptText = conf['mapscript']

conf.add([ window:60000, 'redscript':reduceScript ])

Extending this to the Job class is trickier; subclassing works better
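As an illustration of the subclassing route, the subscript syntax can also come from ordinary methods declared on a subclass: Groovy translates conf[key] into getAt(key) and conf[key] = value into putAt(key, value). The sketch below uses BaseConf, a hypothetical stand-in with string get/set methods, rather than Hadoop's Configuration or Job, and GroovyConf is likewise an invented name.

```groovy
// Hypothetical stand-in for a Java class that only offers string get/set.
class BaseConf {
    protected final Map<String, String> props = [:]
    void set(String key, String value) { props.put(key, value) }
    String get(String key) { props.get(key) }
}

// The subclass adds the operator methods Groovy looks for.
class GroovyConf extends BaseConf {
    String getAt(String key) { get(key) }                               // conf[key]
    void putAt(String key, Object value) { set(key, value.toString()) } // conf[key] = value
    void add(Map map) { map.each { k, v -> set(k.toString(), v.toString()) } }
}

def conf = new GroovyConf()
conf['window'] = 60000
conf.add([mapscript: 'emit(key, 1)', limit: 10])

println conf['window']
println conf['mapscript']
```

Values are stored as strings in both cases, mirroring the toString() calls on the slides, so conf['window'] comes back as '60000' rather than as a number.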

Page 13: Hadoop gets Groovy

New today! Script-driven MR jobs!

protected void setup(Mapper.Context ctx) {
    this.ctx = ctx
    this.conf = ctx.configuration
    ScriptCompiler comp = new ScriptCompiler(conf)
    String scriptText = conf['mapscript']
    map = comp.parse(scriptText, this, ctx)
}

protected void map(Writable key, Writable value, Mapper.Context ctx) {
    map.setProperty('key', key)
    map.setProperty('value', value)
    map.run()
}
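The parse-once, run-per-record pattern above can be tried outside Hadoop with Groovy's own GroovyShell. This sketch assumes nothing about the talk's ScriptCompiler class, which is not shown on the slides; the script text and the emitted list are made up for the example.

```groovy
// Parse the script text once, as setup() does on the slide.
def shell = new GroovyShell()
Script mapScript = shell.parse('emitted << [key, value.length()]')

// A shared list standing in for the job's output (hypothetical).
def emitted = []
mapScript.setProperty('emitted', emitted)

// Then bind key and value and run the script once per record, as map() does.
def records = [a: 'hello', b: 'hi']
records.each { k, v ->
    mapScript.setProperty('key', k)
    mapScript.setProperty('value', v)
    mapScript.run()
}

println emitted
```

Script.setProperty() stores the value in the script's binding, which is why the undeclared names key, value and emitted resolve inside the script text.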

Page 14: Hadoop gets Groovy

Things to consider

• Performance: Groovy 2 on Java 7

• 'False friends': types, if(), exceptions

• If you can use Pig, use it.

• Use Groovy for testing and for extending Hadoop classes (output formatter, etc.)

• Play with YARN and Giraph with it
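The slide names the 'false friends' without spelling them out. The sketch below shows one plausible reading of each, using standard Groovy behaviour that tends to surprise Java developers; which cases the speaker actually had in mind is an assumption.

```groovy
// if(): any value is allowed, and "Groovy truth" decides the outcome
def truthy = { value -> value ? 'true' : 'false' }
def results = [truthy(''), truthy([]), truthy(0), truthy(null), truthy('false')]

// types: == calls equals(), while identity is is()
def a = new String('lines')
def b = new String('lines')
def sameValue = (a == b)
def sameObject = a.is(b)

// types: dividing two integers yields a BigDecimal, not a truncated int
def half = 1 / 2

// exceptions: checked exceptions need no throws clause and no forced catch
def risky = { throw new IOException('not declared anywhere') }
def caught = false
try {
    risky()
} catch (IOException e) {
    caught = true
}

println results
println "sameValue=${sameValue} sameObject=${sameObject} half=${half} caught=${caught}"
```

The empty string, the empty list, zero and null all count as false, while the non-empty string 'false' counts as true.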

Page 15: Hadoop gets Groovy

Questions?

hortonworks.com

Page 16: Hadoop gets Groovy

hortonworks.com

Page 17: Hadoop gets Groovy

Performance?

• Groovy 1 over-introspects

• HLL hides a lot of overhead

• If your work is I/O bound, less important

• Speed of development vs execution

• Need to benchmark on Java 7
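The slides do not say which Groovy 2 feature is expected to help. One candidate is static compilation, available since Groovy 2.0 through the @CompileStatic annotation, which removes dynamic dispatch from the annotated code. The class below is a made-up example, and whether it is faster for a given job is exactly what would need benchmarking.

```groovy
import groovy.transform.CompileStatic

// Hypothetical hot-loop helper; @CompileStatic makes the calls inside it statically bound.
@CompileStatic
class LineLengths {
    static long total(List<String> lines) {
        long sum = 0
        for (String line : lines) {
            sum += line.length()
        }
        sum
    }
}

def result = LineLengths.total(['ab', 'cde'])
println result
```

The trade-off is that metaclass tricks such as the Configuration extensions no longer apply inside statically compiled code, so it suits inner loops better than glue code.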
