TRANSCRIPT
Linux Clusters Institute: Turning an HPC Cluster into a
Big Data Cluster
Fang (Cherry) Liu, PhD [email protected]
A Partnership for an Advanced Computing Environment (PACE)
OIT/ART, Georgia Tech
18-22 May 2015
Targets for this session
Target audience: HPC system admins who want to support a Hadoop cluster
Points of interest:
• Big Data is Common
• Challenges and Tools
• Hadoop vs HPC
• Hadoop Core Characteristics
• Hadoop EcoSystem
  • Core parts – HDFS and MapReduce
  • Distributions
  • Projects
• Hadoop Basic Operations
• Configure a Hadoop Cluster on an HPC cluster – PACE Hadoop Cluster
• Hadoop Advanced Operations
Big Data is Common
• The entire rendering of Avatar reportedly required over 1 petabyte of storage space according to BBC's Clickbits, the equivalent of 500 hard drives of 2 TB each, or a 32-year-long MP3 file. The computing core, 34 racks, each with four chassis of 32 machines, adds up to some 40,000 processors and 104 terabytes of RAM.
  http://thenextweb.com/2010/01/01/avatar-takes-1-petabyte-storage-space-equivalent-32-year-long-mp3/
• Facebook's data warehouses grow by "over half a petabyte ... every 24 hours" (2012)
  http://www.theregister.co.uk/2012/11/09/facebook_open_sources_corona/
• An average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day (2009).
  http://dl.acm.org/citation.cfm?id=1327492
Big Data: Three challenges
• Volume (Storage) • The size of the data
• Velocity (Processing speed) • The latency of data processing relative to the growing demand of interactivity
• Variety (structured vs. unstructured data) • The diversity of sources, formats, quality, structures
What Tools to Use?
• Hadoop is open-source software for reliable, scalable, distributed computing:
  • Created by Doug Cutting and Michael Cafarella while at Yahoo, named after Cutting's son's toy elephant
  • Written in Java
  • Scales to thousands of machines with linear scalability
  • Uses a simple programming model (MapReduce)
  • Fault tolerant (HDFS)
• Spark
  • A fast and general engine for large-scale data processing; it claims to run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
  • It provides high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming.
  • It runs on Hadoop, Mesos, standalone, or in the cloud.
  • It can access data sources including HDFS, Cassandra, HBase, Hive, and Amazon S3.
Hadoop vs. Conventional HPC
Hadoop | HPC

Hadoop: a cluster of machines that collocates the data with the compute nodes.
HPC:    a cluster of machines that access a shared filesystem.

Hadoop: moves computation to the data.
HPC:    moves data to the computation; network bandwidth is the bottleneck.

Hadoop: MapReduce operates at a higher level; the data flow is implicit.
HPC:    Message Passing Interface (MPI) explicitly handles the mechanics of the data flow.

Hadoop: fault tolerance through data replication; it is easy to rerun the Map or Reduce tasks.
HPC:    checkpointing and recovery must be managed explicitly.

Hadoop: restricted to data-processing problems.
HPC:    can solve more complex algorithms.
Hadoop Core Characteristics
• Distribution - instead of building one big supercomputer, storage and processing are spread across a cluster of smaller machines that communicate and work together.
• Horizontal scalability -‐ it is easy to extend a Hadoop cluster by just adding new machines. Every new machine increases total storage and processing power of the Hadoop cluster.
• Fault-tolerance - Hadoop continues to operate even when a few hardware or software components fail to work properly.
• Cost-optimization - Hadoop runs on standard hardware; it does not require expensive servers.
• Programming abstraction - Hadoop takes care of all the messy details related to distributed computing. Using a high-level API, users can focus on implementing the business logic that solves their real-world problems.
• Data locality - don't move large datasets to where the application is running; run the application where the data already is.
Hadoop EcoSystem
[Figure: Hadoop ecosystem stack. Disks at the bottom; HDFS over them; MapReduce and HBase on top of HDFS; analysis tools (Impala, Pig, Scalding, Mahout, Hive); workflow tools (Azkaban, Oozie); Hue as the front end.]
Hadoop EcoSystem – Core part
• HDFS: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on a commodity cluster of machines. HDFS is highly fault tolerant and is useful for processing large data sets.
• MapReduce: MapReduce is a software framework for processing large, petabyte-scale data sets on a cluster of commodity hardware. When MapReduce jobs are run, Hadoop splits the input and locates the nodes on the cluster. The actual tasks are then run at or close to the node where the data resides, so that the data is as close to the computation as possible. This avoids transferring huge amounts of data across the network, so that the network does not become a bottleneck or get flooded.
Hadoop EcoSystem - Distributions
• Apache: The purely open-source distribution of Hadoop, maintained by the community at the Apache Software Foundation.
• Cloudera: Cloudera's distribution of Hadoop, built on top of Apache Hadoop. The distribution includes capabilities such as management, security, high availability and integration with a wide variety of hardware and software solutions. Cloudera is the leading distributor of Hadoop.
• Hortonworks: Also builds on the open-source Apache Hadoop, with claims to enterprise readiness. It also claims to be the only distribution that is available for Windows servers.
• MapR: A Hadoop distribution with some unique features, most notably the ability to mount the Hadoop cluster over NFS.
• Amazon EMR: Amazon's hosted version of MapReduce is called Elastic MapReduce. It is part of Amazon Web Services (AWS). EMR allows a Hadoop cluster to be deployed and MapReduce jobs to be run in the AWS cloud with just a few clicks.
Hadoop EcoSystem - Related Projects
• Pig: A high-level language for analyzing large data sets which eases the development of MapReduce jobs. Hundreds of lines of MapReduce code can be replaced with just a few lines of Pig. At Yahoo, more than 60% of Hadoop usage is on Pig.
• Hive: A data warehouse framework that supports querying of large data sets stored in Hadoop. Hive provides a high-level SQL-like language called HiveQL.
• HBase: A distributed, scalable data store based on Hadoop. HBase is a distributed, versioned, column-oriented database modeled after Google's BigTable.
• Mahout: A scalable machine learning library. Mahout utilizes Hadoop to achieve massive scalability.
• YARN: The next generation of MapReduce, a.k.a. MapReduce 2. The MapReduce framework was overhauled using YARN to overcome the scalability bottlenecks of the earlier version of MapReduce when run over very large clusters (thousands of nodes).
• Oozie: A workflow scheduler system that eases the creation and management of sequences of MapReduce jobs.
• Flume: A distributed, reliable and available service for collecting, aggregating and moving log data to HDFS. This is typically useful in systems where log data needs to be moved to HDFS periodically for processing.
Hadoop Basics - HDFS
• HDFS is designed to store a very large amount of information (terabytes or petabytes). This requires spreading the data across a large number of machines. It also supports much larger file sizes than NFS.
• HDFS should store data reliably. If individual machines in the cluster malfunction, data should still be available.
• HDFS should provide fast, scalable access to this information. It should be possible to serve a larger number of clients by simply adding more machines to the cluster.
• HDFS should integrate well with Hadoop MapReduce, allowing data to be read and computed upon locally when possible.
Hadoop Basics – HDFS (Cont.)
Hadoop Basics - MapReduce
Mapping and reducing tasks run on nodes where individual records of data are already present.
Example : Word Count
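The slide's word-count figure is not reproduced in this transcript, but the same map, shuffle, reduce flow can be mimicked with a plain shell pipeline. This is a local analogy only, not Hadoop itself:

```shell
# Local analogy of MapReduce word count:
#   map:     emit one record (word) per line   -> tr
#   shuffle: group identical keys together     -> sort
#   reduce:  sum the counts per key            -> uniq -c
printf 'the cat sat\nthe cat ran\n' | tr ' ' '\n' | sort | uniq -c
# prints (leading spaces vary): 2 cat / 1 ran / 1 sat / 2 the
```

In Hadoop, each pipeline stage runs in parallel across many nodes, and the "shuffle" moves intermediate records over the network between mappers and reducers.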
Basic HDFS Commands
• Create a directory in HDFS • hadoop fs –mkdir <paths> (e.g. hadoop fs –mkdir /user/hadoop/dir1)
• List files • hadoop fs –ls <args> (e.g. hadoop fs –ls /user/hadoop/dir1)
• Upload data from local system to HDFS • hadoop fs -‐put <localsrc> ... <HDFS_dest_Path> (e.g. hadoop fs –put ~/foo.txt /user/hadoop/dir1/foo.txt)
• Download file from HDFS • hadoop fs -‐get <hdfs_src> <localdst> (e.g. hadop fs –get /user/hadoop/dir1/foo.txt /home/)
• Check how much space u@liza@on in a HDFS dir • hadoop fs –du URI (e.g. hadoop fs –du /user/hadoop )
• Get help • hadoop fs -‐help
Case Study: PACE Hadoop Cluster
• Consists of 5 x 24-core Altus 2704 servers, each with 128 GB of RAM. Each node has a 3 TB local RAID (6 x 500 GB disks).
• All nodes are connected via a 40 Gb/s InfiniBand connection.
• One node, named hadoop-nn1, serves as NameNode + JobTracker + DataNode.
• The remaining four nodes serve as DataNodes:
  • hadoop-dn1
  • hadoop-dn2
  • hadoop-dn3
  • hadoop-dn4
• Check the cluster status and running jobs at:
  • Overview: http://<NameNode fully qualified name>:50070
  • ResourceManager (job tracking): http://<NameNode fully qualified name>:8088
Installing Hadoop on an academic cluster
• Download a release from the official website http://hadoop.apache.org/; the most recent release is 2.7.0 as of April 21, 2015.
• Put the binary distribution in an NFS location so that all participating nodes can access it.
• Add the local disk on each node to serve as the HDFS file system.
• Configure the nodes into name nodes and data nodes.
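A minimal sketch of the download-and-share step as shell commands. The paths here are illustrative assumptions, not the actual PACE layout; substitute your own NFS mount point:

```shell
# Hypothetical stand-in for an NFS-mounted install location,
# e.g. /usr/local/packages/hadoop/2.7.0 in the real cluster.
NFS_PREFIX=/tmp/nfs-demo/usr/local/packages/hadoop/2.7.0
mkdir -p "$NFS_PREFIX"

# Fetch and unpack the release into the shared location
# (network step shown as comments; run on a node with internet access):
#   wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz
#   tar -xzf hadoop-2.7.0.tar.gz -C "$NFS_PREFIX" --strip-components=1

# Every participating node then points at the same shared install:
mkdir -p /tmp/nfs-demo
echo "export HADOOP_PREFIX=$NFS_PREFIX" >> /tmp/nfs-demo/hadoop-profile.sh
```

Because the binaries live on NFS, only the per-node local disks (for HDFS data) and per-node configuration differ across the cluster.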
Configuring HDFS
• All configuration files are in ${HADOOP_PREFIX}/etc/hadoop (e.g. /usr/local/packages/hadoop/2.6/etc/hadoop)

• In core-site.xml:

  Key             Value                        Example
  fs.defaultFS    protocol://servername:port   hdfs://127.0.0.1:9000
  hadoop.tmp.dir  pathname                     /dfs/hadoop/tmp

• In hdfs-site.xml:

  Key              Value                   Example
  dfs.name.dir     pathname                /dfs/hadoop/name
  dfs.data.dir     pathname                /dfs/hadoop/data
  dfs.replication  number of replicas      3
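Spelled out as XML, the two tables above correspond to property entries like the following; the values are the example values from the tables, so the hostname and paths are illustrative:

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/dfs/hadoop/tmp</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/dfs/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/dfs/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```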
Configuring YARN
• In ${HADOOP_PREFIX}/etc/hadoop/yarn-site.xml
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value><hostname>:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value><hostname>:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value><hostname>:8040</value>
</property>
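One entry the slides do not show: for MapReduce jobs to actually run on YARN, mapred-site.xml commonly also needs the framework name set. This is stated here as an assumption about the setup, not something from the slides:

```xml
<!-- ${HADOOP_PREFIX}/etc/hadoop/mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```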
Configure Slaves file ${HADOOP_HOME}/etc/hadoop/slaves
[Figure: NameNode hadoop-nn1 with DataNodes hadoop-dn1, hadoop-dn2, hadoop-dn3, hadoop-dn4.]

Per-node slaves files:
hadoop-nn1: hadoop-nn1 hadoop-dn1 hadoop-dn2 hadoop-dn3 hadoop-dn4
hadoop-dn1: hadoop-dn1
hadoop-dn2: hadoop-dn2
hadoop-dn3: hadoop-dn3
hadoop-dn4: hadoop-dn4
Configure environment
• Add the following two lines to ${HADOOP_PREFIX}/etc/hadoop/hadoop-env.sh
  export JAVA_HOME=/usr/local/packages/java/1.7.0
  export HADOOP_LOG_DIR=/nv/ap2/logs/hadoop
• Add the following line to ${HADOOP_PREFIX}/etc/hadoop/yarn-env.sh
  export YARN_LOG_DIR=/nv/ap2/logs/hadoop
• Add the following line to ${HADOOP_PREFIX}/etc/hadoop/mapred-env.sh
  export HADOOP_MAPRED_LOG_DIR=/nv/ap2/logs/hadoop
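The three edits above can be scripted when provisioning many clusters. A sketch: the log paths are the slide's values, while the config directory here is a stand-in temp path rather than the real ${HADOOP_PREFIX}/etc/hadoop:

```shell
# Stand-in config directory; in a real cluster use ${HADOOP_PREFIX}/etc/hadoop.
CONF=/tmp/demo-hadoop-conf
mkdir -p "$CONF"
touch "$CONF/hadoop-env.sh" "$CONF/yarn-env.sh" "$CONF/mapred-env.sh"

# hadoop-env.sh: Java location and Hadoop daemon log directory
cat >> "$CONF/hadoop-env.sh" <<'EOF'
export JAVA_HOME=/usr/local/packages/java/1.7.0
export HADOOP_LOG_DIR=/nv/ap2/logs/hadoop
EOF

# yarn-env.sh and mapred-env.sh: point their logs at the same place
echo 'export YARN_LOG_DIR=/nv/ap2/logs/hadoop'          >> "$CONF/yarn-env.sh"
echo 'export HADOOP_MAPRED_LOG_DIR=/nv/ap2/logs/hadoop' >> "$CONF/mapred-env.sh"
```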
Starting HDFS
• Create a user named hadoop, and start all Hadoop services as the hadoop user to avoid security issues.
• Format the file system based on the above configuration:
  ${HADOOP_HOME}/bin/hdfs namenode -format <cluster_name>
• Start HDFS
  start-dfs.sh
  Make sure it's running correctly: ps -ef | grep -i hadoop
• Start YARN
  start-yarn.sh
  Make sure it's running correctly: ps -ef | grep -i resourcemanager
User management on Hadoop Cluster
• Log in as the hadoop user
  ssh hadoop@hadoop-nn1
• Create a directory for a given user
  hadoop fs -mkdir /user/userID
• Set directory ownership to the given user
  hadoop fs -chown -R userID:groupID /user/userID
• Change the permissions to user-only
  hadoop fs -chmod -R 700 /user/userID
User runs a job
• The user logs in to the Hadoop NameNode and submits the job from there
  ssh userID@hadoop-nn1
• Load the Hadoop environment variables, such as Java, Python, HADOOP_PREFIX, HADOOP_YARN_HOME
  module load hadoop/2.6.0
• Upload the input file from the local file system to HDFS
  hadoop fs -put example.txt /user/userID
• Run wordcount on example.txt
  hadoop jar /usr/local/packages/hadoop/2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/userID/ /user/userID/testout
• Check the result
  hadoop fs -cat /user/userID/testout/* > output
Stopping HDFS
• Stop the ResourceManager
  stop-yarn.sh
• Stop the NameNode and DataNodes
  stop-dfs.sh
Note: Sometimes a manual shutdown on individual DataNodes may be required.
Hadoop Cluster overview
Hadoop DataNodes
Hadoop Job Tracker History
References:
• Yahoo Hadoop Tutorial https://developer.yahoo.com/hadoop/tutorial/index.html
• "Introduction to Data Science", Bill Howe, University of Washington