Hadoop Tutorial

Upload: sanasri87

Post on 30-May-2018


  • 8/14/2019 Ha Do Op Tutorial

    Hands-On Hadoop Tutorial

    Chris Sosa
    Wolfgang Richter
    May 23, 2008

    General Information

    Hadoop uses HDFS, a distributed file system based on GFS, as its shared filesystem

    HDFS architecture divides files into large chunks (~64MB) distributed across data servers

    HDFS has a global namespace
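
To make the chunking idea concrete, here is a minimal sketch of how a file's byte range could be cut into 64 MB pieces. The function name and the (offset, length) representation are assumptions for illustration only; HDFS does this internally and its actual bookkeeping is not shown in the slides.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the approximate chunk size quoted on the slide


def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes.

    Hypothetical helper: it only illustrates the arithmetic of chunking,
    not how HDFS assigns chunks to data servers.
    """
    chunks = []
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


if __name__ == "__main__":
    # A 200 MB file needs three full chunks plus one partial chunk.
    for offset, length in split_into_chunks(200 * 1024 * 1024):
        print(offset, length)
```

Each of these pieces would then live on some data server, which is why a single large file can be spread across many machines.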

    General Information (contd)

    Provided a script for your convenience
      Run source /localtmp/hadoop/setupVars from centurion064
      Changes all uses of {somePath}/command to just command

    Go to http://www.cs.virginia.edu/~cbs6n/hadoop for web access. These slides and more information are also available there.

    Once you use the DFS (put something in it), relative paths are from /usr/{your usr id}. E.g. if your id is tb28, your home dir is /usr/tb28
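
The relative-path rule can be sketched as a small function. This is only an illustration of the rule as the slide states it: the function name is hypothetical, and the /usr home root is taken directly from the slide rather than from any Hadoop API.

```python
import posixpath


def resolve_dfs_path(path, user_id, home_root="/usr"):
    """Resolve a DFS path the way the slide describes.

    Absolute paths are kept; relative paths are interpreted against the
    user's DFS home directory, {home_root}/{user_id}. Hypothetical helper,
    not part of Hadoop.
    """
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(home_root, user_id, path))


if __name__ == "__main__":
    print(resolve_dfs_path("input/data.txt", "tb28"))
    print(resolve_dfs_path("/tmp/shared.txt", "tb28"))
```

With the id tb28 from the slide, a relative path such as input/data.txt resolves under /usr/tb28, while an absolute path is left alone.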


    Master Node

    Hadoop currently configured with centurion064 as the master node

    Master node
      Keeps track of namespace and metadata about items
      Keeps track of MapReduce jobs in the system

    Slave Nodes

    Centurion064 also acts as a slave node

    Slave nodes
      Manage blocks of data sent from master node
      In terms of GFS, these are the chunkservers

    Currently centurion060 is also ...

    Hadoop Paths

    Hadoop is locally installed on each machine
      Installed location is in /localtmp/hadoop/hadoop-0.15.3
      Slave nodes store their data in /localtmp/hadoop/hadoop-dfs (this is automatically created by the DFS)
      /localtmp/hadoop is owned by group gbg (someone in this group, or a CS admin, must administer this)

    Files are divided into 64 MB chunks (this is configurable)

    Starting / Stopping Hadoop

    For the purposes of this tutorial, we assume you have run the setupVars from earlier

    start-all.sh starts all slave nodes and master node

    stop-all.sh stops all slave nodes and master node

    Using HDFS (1/2)

    hadoop dfs
      [-ls ]
      [-du ]
      [-cp ]
      [-rm ]
      [-put ]
      [-copyFromLocal ]
      [-moveFromLocal ]
      [-get [-crc] ]
      [-cat ]
      [-copyToLocal [-crc] ]
      [-moveToLocal [-crc] ]
      [-mkdir ]
      [-touchz ]
      [-test -[ezd] ]
      [-stat [format] ]
      [-help [cmd]]
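
As a rough illustration of how these options turn into command lines, the sketch below builds argument lists for a few of them. The argument placeholders are missing from this transcript, so the argument order used here (for example, local source before DFS destination for -put) and all file names are assumptions; dfs_command is a hypothetical helper, not part of Hadoop.

```python
def dfs_command(option, *args):
    """Build the argument list for `hadoop dfs -<option> <args...>`.

    Hypothetical helper: it only assembles the command line and does not
    check that the option exists or that the arguments are valid.
    """
    return ["hadoop", "dfs", "-" + option, *args]


# A typical round trip, assuming setupVars has put `hadoop` on the PATH:
# create a directory, upload a local file, list it, then print it.
commands = [
    dfs_command("mkdir", "input"),
    dfs_command("put", "local.txt", "input/local.txt"),
    dfs_command("ls", "input"),
    dfs_command("cat", "input/local.txt"),
]

if __name__ == "__main__":
    for argv in commands:
        print(" ".join(argv))
```

On a machine where Hadoop is installed, each list could be handed to subprocess.run(argv, check=True) instead of being printed.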

    Using HDFS (2/2)

    Want to reformat?
      Easy
      hadoop namenode -format

    Basically we see most commands look similar
      hadoop {some command} {options}
      If you just type hadoop you get all possible commands (including undocumented ones, hooray)

    To Add Another Slave

    This adds another data node / job execution site to the pool
      Hadoop dynamically uses filesystem underneath it
      If more space is available on the HDD, HDFS will try to use it when it needs to

    Modify the slaves file
      In centurion064:/localtmp/hadoop/hadoop-0.15.3/conf

    Copy code installation dir to newMachine:/localtmp/hadoop/hadoop-0.15.3 (very small)

    Restart Hadoop
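
The "modify the slaves file" step can be sketched as below. The slaves file lists one hostname per line; the starting contents shown here are an assumption based on the machines named in these slides, and add_slave is a hypothetical helper that works on the file's text rather than touching the real file.

```python
def add_slave(slaves_text, hostname):
    """Return the slaves file text with hostname appended, one host per line.

    Hypothetical helper: blank lines are dropped and a host that is already
    listed is not added twice.
    """
    hosts = [line.strip() for line in slaves_text.splitlines() if line.strip()]
    if hostname not in hosts:
        hosts.append(hostname)
    return "\n".join(hosts) + "\n"


if __name__ == "__main__":
    # Assumed current contents of conf/slaves on the master.
    current = "centurion064\ncenturion060\n"
    print(add_slave(current, "newMachine"), end="")
```

After writing the new text back to the slaves file and copying the installation directory to the new machine, the restart step picks up the extra node.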

    Configure Hadoop

    Can configure in {$installation dir}/conf
      hadoop-default.xml for global
      hadoop-site.xml for site specific (overrides global)
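
The override relationship between the two files can be sketched as a simple merge in which site values win. Hadoop performs this merge itself when it loads its configuration; the code below is only an illustration, the helper names are hypothetical, and the property names and values in the sample XML are illustrative rather than taken from the slides.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse <property><name>/<value> pairs from a configuration document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


def effective_config(default_xml, site_xml):
    """Merge two configuration documents; site values override the defaults."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged


# Illustrative stand-ins for hadoop-default.xml and hadoop-site.xml.
DEFAULT_XML = """<configuration>
  <property><name>dfs.block.size</name><value>67108864</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>"""

config = effective_config(DEFAULT_XML, SITE_XML)

if __name__ == "__main__":
    for name, value in sorted(config.items()):
        print(name, "=", value)
```

Only the property set in the site document changes; everything else keeps its global default, which is why site-specific edits belong in hadoop-site.xml.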

    That's it for Configuration!

    Real-time Access