Hadoop Distributed File System (HDFS) implementation in GENI

Wei Kou – University of Connecticut
Madhav – Missouri University of Science and Technology
Sheyda – University of Missouri Kansas City
Min Sang Yoon – Iowa State University




Contents

• Introduction to Hadoop

• Hadoop configuration in GENI (single site)

• Multiple sites configuration

• Simulation results


HDFS (Hadoop Distributed File System)

• Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.

• HDFS is composed of a single name node and a cluster of data nodes.

• HDFS splits each file into blocks and distributes them across the data nodes; the MapReduce programming model is used to process the distributed data.
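To make the idea of distributing a single file concrete, here is a minimal sketch of how a file could be cut into blocks and assigned to data nodes. It assumes the Hadoop 2.x defaults of 128 MB blocks (`dfs.blocksize`) and three replicas (`dfs.replication`). The round-robin placement and every worker name other than `worker-0` are hypothetical; the real name node uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # Hadoop 2.x default block size (dfs.blocksize)
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def place_blocks(file_size_mb, data_nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [nodes holding a replica]} for one file.

    Round-robin placement is a simplification for illustration only;
    it is not the policy the HDFS name node actually applies.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Each replica of a block goes to a different node.
        placement[block] = [
            data_nodes[(block + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


# Hypothetical node names for a five-node cluster.
nodes = ["worker-0", "worker-1", "worker-2", "worker-3", "worker-4"]
layout = place_blocks(1024, nodes)  # a 1024 MB file becomes 8 blocks
```

With these assumptions a 1024 MB file yields eight blocks, each stored on three different data nodes.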


Hadoop configuration in GENI (single site)

- One master node
- 5 data nodes
- Configured the cluster

Master node: 172.16.1.1

Data node addresses as labeled in the figure: 172.16.1.10, 172.16.1.14, 172.16.1.12, 172.16.1.13, 172.16.1.12
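As an illustration of what configuring the cluster involves, the sketch below generates a minimal `core-site.xml` whose `fs.defaultFS` property points every node at the name node on the master. The master address comes from the slide; the port 9000 is an assumption, since the slides do not state which port was used.

```python
import xml.etree.ElementTree as ET

MASTER_IP = "172.16.1.1"  # master node address from the slide
NAMENODE_PORT = 9000      # assumption: the slides do not state the port


def core_site_xml(master_ip=MASTER_IP, port=NAMENODE_PORT):
    """Build a minimal core-site.xml that points clients at the name node."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{master_ip}:{port}"
    return ET.tostring(configuration, encoding="unicode")


xml_text = core_site_xml()
print(xml_text)
```

The same file would typically be copied to every data node so that all of them register with the same name node.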


Hadoop configuration in GENI (single site)

• We configured HDFS with a total capacity of 128 GB.


Hadoop configuration in GENI (single site)

• Each data node allocates 25.6 GB to HDFS (5 × 25.6 GB = 128 GB).
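The capacity figures can be checked with a short calculation. The sketch below also estimates how much distinct data fits once replication is taken into account, assuming the HDFS default replication factor of 3 (`dfs.replication`); the slides do not state which factor was used.

```python
DATA_NODES = 5      # from the slides
PER_NODE_GB = 25.6  # from the slides
REPLICATION = 3     # assumption: HDFS default, not stated in the slides


def raw_capacity_gb(nodes=DATA_NODES, per_node_gb=PER_NODE_GB):
    """Total disk space contributed to HDFS by all data nodes."""
    return nodes * per_node_gb


def usable_capacity_gb(raw_gb, replication=REPLICATION):
    """Distinct data that fits when every block is stored `replication` times."""
    return raw_gb / replication


raw = raw_capacity_gb()
usable = usable_capacity_gb(raw)
print(f"raw: {raw:.1f} GB, usable with {REPLICATION} replicas: {usable:.1f} GB")
```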


Hadoop configuration in GENI (single site)

[Screenshots: file distribution command; file list on HDFS; panels labeled Master and Worker-0]
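The exact command in the screenshot is not recoverable from the transcript. A plausible equivalent, assuming the standard HDFS shell, is `hdfs dfs -put` to copy a local file into HDFS and `hdfs dfs -ls` to list it. The file name and target directory below are hypothetical.

```python
import shlex


def put_command(local_path, hdfs_dir):
    """Argument list for copying a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def list_command(hdfs_dir):
    """Argument list for listing the files stored under an HDFS directory."""
    return ["hdfs", "dfs", "-ls", hdfs_dir]


# Hypothetical file name and HDFS directory, for illustration only.
put_line = shlex.join(put_command("dataset.bin", "/data"))
ls_line = shlex.join(list_command("/data"))
print(put_line)
print(ls_line)
```

On a node with Hadoop installed, the argument lists could be passed to `subprocess.run` instead of being printed.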


Hadoop configuration in GENI (multiple sites)

• Purpose: to observe how physical distance affects network performance.

• We created four slices, each configured across different sites.

• The master node is located at the same site (GPO) in all scenarios.

• Two data nodes are placed at the same site as the master node, and three data nodes are placed at a different site.

• All nodes are connected to the same subnet.

Case 1: GPO (master) – Texas A&M
Case 2: GPO (master) – UC Davis
Case 3: GPO (master) – Wayne State University
Case 4: GPO (master) – University of Florida

[Figure: GPO (master) – UC Davis]


Hadoop configuration in GENI (multiple sites)

[Figure: site labels Wayne State University, UC Davis, Texas A&M, University of Florida]


Simulation configuration

• We generated a 1 GB dataset for each case.

• We measured the data transmission time for each case.

• 128 GB HDFS capacity.

• 25.6 GB from each data node.
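The slides do not say how the transmission time was taken. One simple way to measure it, sketched here as an assumption, is to wrap the upload step in a monotonic timer. The helper names `timed` and `format_duration` are hypothetical.

```python
import time


def timed(action):
    """Run action() and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def format_duration(seconds):
    """Render a duration as minutes and seconds, or seconds only if under a minute."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s" if minutes else f"{secs} s"


# Stand-in workload; a real run would call the upload step here instead.
result, elapsed = timed(lambda: sum(range(1000)))
print(format_duration(elapsed))
```

In a real measurement the callable passed to `timed` would start the file upload and return once it completes.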


Simulation Result

[Screenshot: distribution time (case 2 result)]

[Screenshot: distribution time (case 3 result)]

Case                            Distance    Distribution time
Single site                     0 miles     19 s
GPO – Wayne State University    717 miles   7 min
GPO – University of Florida     1220 miles  7 min 36 s
GPO – Texas A&M                 1862 miles  7 min 55 s
GPO – UC Davis                  3027 miles  8 min 30 s
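The measurements above can be turned into two derived figures: the effective throughput of each case and the extra time per additional mile between the nearest and farthest remote sites. This sketch assumes the dataset is 1 GB read as 1024 MB; the function names are hypothetical.

```python
# Measurements copied from the results table: (case, miles, minutes, seconds).
RESULTS = [
    ("Single site", 0, 0, 19),
    ("GPO - Wayne State University", 717, 7, 0),
    ("GPO - University of Florida", 1220, 7, 36),
    ("GPO - Texas A&M", 1862, 7, 55),
    ("GPO - UC Davis", 3027, 8, 30),
]

DATASET_MB = 1024  # assumption: the 1 GB dataset is 1024 MB


def total_seconds(minutes, seconds):
    return 60 * minutes + seconds


def throughput_mb_per_s(minutes, seconds):
    """Effective transfer rate for the whole dataset."""
    return DATASET_MB / total_seconds(minutes, seconds)


def seconds_per_mile(near, far):
    """Extra distribution time per extra mile between two multi-site cases."""
    _, miles_a, min_a, sec_a = near
    _, miles_b, min_b, sec_b = far
    extra_time = total_seconds(min_b, sec_b) - total_seconds(min_a, sec_a)
    return extra_time / (miles_b - miles_a)


for case, miles, m, s in RESULTS:
    print(f"{case:30s} {miles:5d} mi {total_seconds(m, s):4d} s "
          f"{throughput_mb_per_s(m, s):6.2f} MB/s")

slope = seconds_per_mile(RESULTS[1], RESULTS[4])
print(f"{slope:.3f} s per additional mile")
```

Under these assumptions, going from a single site to any remote site adds about 400 seconds, while the 2310 additional miles between Wayne State University and UC Davis add only 90 seconds.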


Simulation Result

[Bar chart: distribution time (sec) for Case 1 to Case 5; vertical axis from 0 to 600]


Conclusion & future work

• The Hadoop Distributed File System can be implemented successfully in GENI.

• We observed a relationship between physical distance and network transfer time.

• However, the effect of physical distance was less significant than we expected.

• Other factors should be considered more carefully when deciding how to distribute load across the network.