ewan higgs - fosdem€¦ · native n16 script myscript.sh elide arguments with environment...

34
Hanythingondemand – Hadoop clusters on HPC clusters FOSDEM, 31 January 2016 Ewan Higgs DICT - UGent, VSC [email protected] http://www.ugent.be/hpc - http://www.vscentrum.be Empowering reseachers

Upload: others

Post on 09-Aug-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

Hanythingondemand – Hadoop clusters on HPC clusters

FOSDEM, 31 January 2016

Ewan HiggsDICT - UGent, VSC

[email protected]://www.ugent.be/hpc - http://www.vscentrum.be

Empowering reseachers

Page 2: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

About Me

Ewan Higgs

Big Data Coordinator for HPC

Ghent University

Page 3: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

Agenda

● What is hanythingondemand?● Why have we made hanythingondemand?● Wade into hod (no deep dives)● Use cases● Developer things

Page 4: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

4

HOD

HOD – Hanythingondemand– https://github.com/hpcugent/hanythingondemand

Run a Hadoop cluster in our HPC clusters

Extensive good documentation:– https://hod.readthedocs.org

Page 5: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

5

hod - commands

hod create – Create a new cluster.

hod connect – Connect to your cluster.

hod batch – create a new cluster to run a script.

hod list – list your clusters.

Page 6: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

6

Create a Hadoop Cluster

Create a new cluster$ hod create ­­label mycluster ­n 4 ­­dist Hadoop­2.6.0­cdh5.4.5­native 

Connect to the cluster

$ hod connect mycluster

Run jobs on the cluster:

$ yarn jar wordcount.jar WordCount wordcount/input wordcount/output 

Page 7: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

7

Batch

Create a new cluster and run a script$ hod batch ­­dist Hadoop­2.6.0­cdh5.4.5­native ­n16 ­­script myscript.sh

Elide arguments with environment variables:export HOD_BATCH_DIST=Hadoop­2.6.0­cdh5.4.5­native

hod batch ­n16 ­­script myscript.sh

Page 8: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

8

IPython Notebook

Create an IPython Notebook$ hod create ­­dist IPython­notebook­3.2.3 ­n2

To use, make an SSH tunnel to the head node and set a proxy.

Page 9: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

9

Why?

Why not just buy a big data system?

Why not just go cloud?

Page 10: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

10

HPC | Big Data

Page 11: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

11

European HPC

Tier-1regionalnational

Tier-2university

Tier-0Europe

Tier-3desktop

Page 12: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

12

Pokemon clusters

raichu delcatty phanpy golett swalot

# nodes 64 158 16 200 128

# cores 1024 2528 384 4800 2560

Interconnect Ethernet Infiniband FDR Infiniband FDR Infiniband FDR-10 Infiniband FDR

CPU Intel Xeon Sandy Bridge

Intel Xeon Sandy Bridge

Intel Xeon Haswell Intel Xeon Haswell Intel Xeon Haswell

Clock (GHz) 2.6 2.6 2.5 2.5 2.6

Memory per node (GiB) 32 64 512 64 128

Installed 2012 2013 2015 2015 2016

HPC-UGent:STEVIN

infrastructure

Page 13: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

13

Tier 1

muk (Ghent) Tier1b (Leuven) Swalot (Tier2)

# nodes 528 580 128

# cores 8448 16240 2560

Interconnect Infiniband FDR Infiniband EDR Infiniband FDR

CPU Intel Xeon Sandy Bridge

Intel Xeon Broadwell Intel Xeon Haswell

Clock (GHz) 2.6 3.2 2.6

Memory per node (GiB) 64 128/256 128

Installed 2013 2016 2016

Page 14: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

14

Hadoop

Page 15: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

15

HOD

Page 16: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

16

Disk Locality?

Page 17: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

17

Big Data in Practice

…as much as 90% most jobs fit on a single node based on a report from Cloudera’s (n.b.: data from 2012).

But locality of data...

Page 18: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

18

Wade into HOD

Wade into HOD

Page 19: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

19

HOD - Overview

Page 20: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

20

What's a dist?

Page 21: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

21

hod.conf

Page 22: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

22

nodemanager.conf

Page 23: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

23

Big Feature

Auto generated configurations

Page 24: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

24

Config Overrides

Page 25: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

25

User Stories

User Stories

Page 26: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

26

Halvade

Page 27: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

27

Halvade

Page 28: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

28

Big Data Course

Page 29: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

29

Developer stuff

Code

Limitations

Community

Page 30: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

30

Code

https://github.com/hpcugent/hanythingondemand● Python 2.7● GPL v2● ~80% code coverage● Jenkins builds

Page 31: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

31

Limitations

● Only PBS/Torque

● Server coding in Python2 and without twisted

Page 32: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

32

Community

Would you like to use this at your site?

Are there any tools you need?

Do you need slurm or Grid Engine?

Page 33: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

33

Wrapping up

● HOD lets HPC users use Hadoop ecosystem

● Auto generated configurations

● Being used for actual research

● HPC clusters can make good Big Data clusters

● Check it out!

Page 34: Ewan Higgs - FOSDEM€¦ · native n16 script myscript.sh Elide arguments with environment variables: export HOD_BATCH_DIST=Hadoop2.6.0cdh5.4.5native hod batch n16 script myscript.sh

34

Thanks

Questions?

Further contact:

[email protected]