
7701 Greenbelt Road, Suite 400, Greenbelt, MD 20770 Tel: (301) 614-8600 Fax: (301) 614-8601 www.sgt-inc.com

©2015 SGT, Inc. All Rights Reserved

SGT WHITE PAPER

Setting up Hadoop with MongoDB on Windows 7 64-bit

HCCP Big Data Lab


Tools and Technologies used in this article:

1. Apache Hadoop 2.2.0 source code

2. Windows 7 OS

3. Microsoft Windows SDK v7.1

4. Maven 3.1.1

5. Protocol Buffers 2.5.0

6. Cygwin

7. JDK 1.6

8. MongoDB

9. Mongo-Hadoop Connector

10. Mongo Java Driver

Build Hadoop bin distribution for Windows

1. Download and install Microsoft Windows SDK v7.1.

2. Download and install JDK 1.6 (it must be the JDK, not the JRE).

3. Download and install Unix command-line tool Cygwin.

4. Download and install Maven 3.1.1.

   a. Unzip the distribution archive, e.g. apache-maven-3.2.1-bin.zip, to the directory where you wish to install Maven. These steps use version 3.2.1 as their example; substitute the version you downloaded. They also assume you chose C:\Program Files\Apache Software Foundation; the subdirectory apache-maven-3.2.1 will be created from the archive.

   b. Add the M2_HOME environment variable: open the system properties (WinKey + Pause), select the "Advanced" tab and click the "Environment Variables" button, then add the M2_HOME variable to the user variables with the value C:\Program Files\Apache Software Foundation\apache-maven-3.2.1. Be sure to omit any quotation marks around the path, even if it contains spaces.

   c. In the same dialog, add the M2 environment variable to the user variables with the value %M2_HOME%\bin.

   d. Optional: in the same dialog, add the MAVEN_OPTS environment variable to the user variables to specify JVM properties, e.g. the value -Xms256m -Xmx512m. This environment variable can be used to supply extra options to Maven.

   e. In the same dialog, update or create the Path environment variable in the user variables and prepend the value %M2%; to make Maven available on the command line.

   f. In the same dialog, make sure that JAVA_HOME exists in your user variables or in the system variables, that it is set to the location of your JDK 1.6 installation (not a JRE), and that %JAVA_HOME%\bin is in your Path environment variable.

SGT Innovation Center Hadoop & MongoDB on Windows 7 64‐bit White Paper ©2015 SGT, Inc. All Rights Reserved

   g. Open a new command prompt (WinKey + R, then type cmd) and run mvn --version to verify that Maven is correctly installed.

5. Download Protocol Buffers 2.5.0 and extract to a folder (say c:\protobuf).

6. Add the environment variables JAVA_HOME, M2_HOME and Platform if they have not been added already.

Add Environment Variables:

Note:

1. The variable name Platform is case sensitive. Its value is either x64 or Win32, for building on a 64-bit or a 32-bit system respectively.

2. If the JDK installation path contains a space, use the Windows short name (e.g. 'PROGRA~1' for 'Program Files') in the JAVA_HOME environment variable.

Edit the Path variable to add the bin directory of Cygwin (e.g. C:\cygwin64\bin), the bin directory of Maven (e.g. C:\maven\bin) and the installation path of Protocol Buffers (e.g. c:\protobuf).

Edit Path Variable:
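Before starting the build, it can help to check these variables programmatically. The following is a minimal standalone sketch, not part of the original procedure: the class name EnvCheck is hypothetical, and it assumes a Java 11 or later runtime (separate from the JDK 1.6 used for the build). It flags a missing JAVA_HOME, M2_HOME or Platform, a Platform value other than x64 or Win32, and a JAVA_HOME that still contains a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: checks the build environment variables described in step 6. */
final class EnvCheck {
    private EnvCheck() {}

    /** Returns a human-readable list of problems; an empty list means all checks passed. */
    static List<String> problems(Map<String, String> env) {
        List<String> problems = new ArrayList<>();
        for (String name : List.of("JAVA_HOME", "M2_HOME", "Platform")) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                problems.add(name + " is not set");
            }
        }
        // The value must be exactly "x64" or "Win32" (compared case-sensitively).
        String platform = env.get("Platform");
        if (platform != null && !platform.isEmpty()
                && !platform.equals("x64") && !platform.equals("Win32")) {
            problems.add("Platform must be x64 or Win32, found: " + platform);
        }
        // A space in JAVA_HOME should be replaced by the 8.3 short name (e.g. PROGRA~1).
        String javaHome = env.get("JAVA_HOME");
        if (javaHome != null && javaHome.contains(" ")) {
            problems.add("JAVA_HOME contains a space; use the short name instead");
        }
        return problems;
    }
}
```

A small main method could print EnvCheck.problems(System.getenv()) in the same command prompt you will build from. The sketch compares the Platform value exactly, but it does not inspect how the variable name itself is capitalised.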


7. Download hadoop-2.2.0-src.tar.gz and extract it to a folder with a short path (e.g. c:\hdfs) to avoid runtime problems caused by the maximum path length limitation in Windows. To extract a tar file in Windows, open Cygwin and cd to the directory that contains hadoop-2.2.0-src.tar.gz, then enter mkdir -p /cygdrive/c/hdfs followed by tar xvzf hadoop-2.2.0-src.tar.gz -C /cygdrive/c/hdfs --strip-components=1 (the last option drops the archive's top-level hadoop-2.2.0-src folder). The files should now be extracted to C:\hdfs.
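To see whether the extraction folder is short enough, a sketch like the following walks the extracted tree and lists every path whose absolute form reaches a length limit. The class name LongPaths is hypothetical, the code assumes Java 11 or later, and the 260-character figure is the classic Windows MAX_PATH; treat that limit as an assumption about your system.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: finds extracted paths that are at or beyond a length limit. */
final class LongPaths {
    /** Classic Windows MAX_PATH (260 characters, including the terminating NUL). */
    static final int WINDOWS_MAX_PATH = 260;

    private LongPaths() {}

    /** Lists every path under root (root included) whose absolute form has at least limit characters. */
    static List<Path> over(Path root, int limit) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.map(Path::toAbsolutePath)
                    .filter(p -> p.toString().length() >= limit)
                    .collect(Collectors.toList());
        }
    }
}
```

For example, an empty result from LongPaths.over(Path.of("C:\\hdfs"), LongPaths.WINDOWS_MAX_PATH) suggests the extracted sources are clear of the limit. The build later creates deeper target folders, so leaving some margin is sensible.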

8. A patch needs to be applied to C:\hdfs\hadoop-common-project\hadoop-auth\pom.xml: open the file with any text editor and add the following highlighted section after line 57:


9. Select Start --> All Programs --> Microsoft Windows SDK v7.1 and open the Windows SDK 7.1 Command Prompt. Change directory to the Hadoop source code folder (c:\hdfs). Execute mvn package with the options -Pdist,native-win -DskipTests -Dtar to create a Windows binary tar distribution.

Windows SDK 7.1 Command Prompt

Setting SDK environment relative to C:\Program Files\Microsoft SDKs\Windows\v7.1\.
Targeting Windows 7 x64 Debug

C:\Program Files\Microsoft SDKs\Windows\v7.1>cd c:\hdfs

C:\hdfs>mvn package -Pdist,native-win -DskipTests -Dtar
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Apache Hadoop Main
[INFO] Apache Hadoop Project POM
[INFO] Apache Hadoop Annotations
[INFO] Apache Hadoop Assemblies
[INFO] Apache Hadoop Project Dist POM
[INFO] Apache Hadoop Maven Plugins
[INFO] Apache Hadoop Auth
[INFO] Apache Hadoop Auth Examples
[INFO] Apache Hadoop Common
[INFO] Apache Hadoop NFS
[INFO] Apache Hadoop Common Project

Note: Only the first few lines of the very long Maven log are shown here. This build step requires an Internet connection, because Maven downloads all the required dependencies.

10. If everything goes well in the previous step, the native distribution hadoop-2.2.0.tar.gz will be created inside the C:\hdfs\hadoop-dist\target directory.


Install Hadoop

1. Extract hadoop-2.2.0.tar.gz to a folder (say c:\hadoop).

2. Add the environment variable HADOOP_HOME and edit the Path variable to add the bin directory of HADOOP_HOME (e.g. C:\hadoop\bin).

Add Environment Variables:

Configure Hadoop

Make the following changes to configure Hadoop:

File: C:\hadoop\etc\hadoop\core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

fs.defaultFS:

The name of the default file system: a URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.
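The scheme/authority split described above can be seen with java.net.URI. This sketch (hypothetical class DefaultFsUri, plain JDK) only mirrors the description; Hadoop performs its own lookup internally. For hdfs://localhost:9000 it reports scheme hdfs, host localhost, port 9000 and the property name fs.hdfs.impl.

```java
import java.net.URI;

/** Sketch: splits an fs.defaultFS value the way the description above explains. */
final class DefaultFsUri {
    private DefaultFsUri() {}

    /** Property that names the FileSystem class for the URI's scheme, e.g. fs.hdfs.impl. */
    static String implProperty(URI uri) {
        return "fs." + uri.getScheme() + ".impl";
    }

    static String describe(String defaultFs) {
        URI uri = URI.create(defaultFs);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " implProperty=" + implProperty(uri);
    }
}
```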

File: C:\hadoop\etc\hadoop\hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hadoop/data/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hadoop/data/dfs/datanode</value>
    </property>
</configuration>

dfs.replication:

Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.

dfs.namenode.name.dir:

Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.


dfs.datanode.data.dir:

Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.

Note: Create the namenode and datanode directories under C:\hadoop\data\dfs\.
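The directory note can also be scripted. This minimal sketch (hypothetical class DfsDirs, Java 11 or later) creates both folders under a root you pass in. With the file:/hadoop/... values above, the root is assumed to be \hadoop\data\dfs on the drive Hadoop runs from (the format output later in this paper reports \hadoop\data\dfs\namenode), e.g. C:\hadoop\data\dfs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: creates the namenode and datanode folders used in hdfs-site.xml. */
final class DfsDirs {
    private DfsDirs() {}

    /** root is the parent folder, e.g. C:\hadoop\data\dfs; safe to call more than once. */
    static void create(Path root) throws IOException {
        // createDirectories also creates missing parents and does nothing if the folder exists.
        Files.createDirectories(root.resolve("namenode"));
        Files.createDirectories(root.resolve("datanode"));
    }
}
```

Usage would be DfsDirs.create(Path.of("C:\\hadoop\\data\\dfs")).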

File: C:\hadoop\etc\hadoop\yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.application.classpath</name>
        <value>
            %HADOOP_HOME%\etc\hadoop,
            %HADOOP_HOME%\share\hadoop\common\*,
            %HADOOP_HOME%\share\hadoop\common\lib\*,
            %HADOOP_HOME%\share\hadoop\mapreduce\*,
            %HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
            %HADOOP_HOME%\share\hadoop\hdfs\*,
            %HADOOP_HOME%\share\hadoop\hdfs\lib\*,
            %HADOOP_HOME%\share\hadoop\yarn\*,
            %HADOOP_HOME%\share\hadoop\yarn\lib\*
        </value>
    </property>
</configuration>

yarn.nodemanager.aux-services:

The auxiliary service name. This setup uses mapreduce_shuffle.

yarn.nodemanager.aux-services.mapreduce.shuffle.class:

The auxiliary service class to use. Default value is org.apache.hadoop.mapred.ShuffleHandler.

yarn.application.classpath:

CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries.
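To check what the yarn.application.classpath entries above point to on your machine, this sketch (hypothetical class YarnClasspath, Java 11 or later) splits the value on commas, trims the line breaks and indentation, and substitutes %VAR% references from an environment map. It is only an inspection aid under that assumption; YARN performs its own expansion when it launches containers. Unknown variables are left as written rather than guessed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: expands %VAR% references in a comma-separated classpath value. */
final class YarnClasspath {
    private static final Pattern VAR = Pattern.compile("%([^%]+)%");

    private YarnClasspath() {}

    static List<String> expand(String value, Map<String, String> env) {
        List<String> entries = new ArrayList<>();
        for (String raw : value.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas and blank lines
            }
            Matcher m = VAR.matcher(entry);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                // Keep an unknown %VAR% unchanged instead of inventing a value.
                String replacement = env.getOrDefault(m.group(1), m.group(0));
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            entries.add(sb.toString());
        }
        return entries;
    }
}
```

For example, YarnClasspath.expand(value, System.getenv()) would list the concrete folders, which you can then compare against C:\hadoop\share\hadoop.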

File: C:\hadoop\etc\hadoop\mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

mapreduce.framework.name:

The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
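All four files share the same configuration/property/name/value layout. If you prefer to generate them rather than edit them by hand, a sketch using only the JDK's DOM and Transformer APIs could look like this. The class name SiteXml is hypothetical, and the output omits the license comment and the xml-stylesheet line, which only matter to human readers and browsers.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Sketch: renders name/value pairs as a Hadoop-style *-site.xml document. */
final class SiteXml {
    private SiteXml() {}

    /** Properties are written in the map's iteration order (use a LinkedHashMap to control it). */
    static String render(Map<String, String> properties) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element configuration = doc.createElement("configuration");
            doc.appendChild(configuration);
            for (Map.Entry<String, String> e : properties.entrySet()) {
                Element property = doc.createElement("property");
                Element name = doc.createElement("name");
                name.setTextContent(e.getKey());
                Element value = doc.createElement("value");
                value.setTextContent(e.getValue());
                property.appendChild(name);
                property.appendChild(value);
                configuration.appendChild(property);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

For instance, rendering a map with mapreduce.framework.name set to yarn and writing the result with Files.writeString to C:\hadoop\etc\hadoop\mapred-site.xml would produce the same property as the file above.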

Format namenode

Before you start Hadoop for the first time (and only then), the namenode needs to be formatted.

Command Prompt

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\abhijitg>cd c:\hadoop\bin

c:\hadoop\bin>hdfs namenode -format
13/11/03 18:07:47 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ABHIJITG/x.x.x.x
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.2.0
STARTUP_MSG: classpath = <classpath jars here>
STARTUP_MSG: build = Unknown -r Unknown; compiled by ABHIJITG on 2013-11-01T13:42Z
STARTUP_MSG: java = 1.7.0_03
************************************************************/
Formatting using clusterid: CID-1af0bd9f-efee-4d4e-9f03-a0032c22e5eb
13/11/03 18:07:48 INFO namenode.HostFileManager: read includes:
HostSet(
)
13/11/03 18:07:48 INFO namenode.HostFileManager: read excludes:
HostSet(
)
13/11/03 18:07:48 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
13/11/03 18:07:48 INFO util.GSet: Computing capacity for map BlocksMap
13/11/03 18:07:48 INFO util.GSet: VM type = 64-bit
13/11/03 18:07:48 INFO util.GSet: 2.0% max memory = 888.9 MB
13/11/03 18:07:48 INFO util.GSet: capacity = 2^21 = 2097152 entries
13/11/03 18:07:48 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
13/11/03 18:07:48 INFO blockmanagement.BlockManager: defaultReplication = 1
13/11/03 18:07:48 INFO blockmanagement.BlockManager: maxReplication = 512
13/11/03 18:07:48 INFO blockmanagement.BlockManager: minReplication = 1
13/11/03 18:07:48 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
13/11/03 18:07:48 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
13/11/03 18:07:48 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
13/11/03 18:07:48 INFO blockmanagement.BlockManager: encryptDataTransfer = false
13/11/03 18:07:48 INFO namenode.FSNamesystem: fsOwner = ABHIJITG (auth:SIMPLE)
13/11/03 18:07:48 INFO namenode.FSNamesystem: supergroup = supergroup
13/11/03 18:07:48 INFO namenode.FSNamesystem: isPermissionEnabled = true
13/11/03 18:07:48 INFO namenode.FSNamesystem: HA Enabled: false
13/11/03 18:07:48 INFO namenode.FSNamesystem: Append Enabled: true
13/11/03 18:07:49 INFO util.GSet: Computing capacity for map INodeMap
13/11/03 18:07:49 INFO util.GSet: VM type = 64-bit
13/11/03 18:07:49 INFO util.GSet: 1.0% max memory = 888.9 MB
13/11/03 18:07:49 INFO util.GSet: capacity = 2^20 = 1048576 entries
13/11/03 18:07:49 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/11/03 18:07:49 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
13/11/03 18:07:49 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/11/03 18:07:49 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
13/11/03 18:07:49 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
13/11/03 18:07:49 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
13/11/03 18:07:49 INFO util.GSet: Computing capacity for map Namenode Retry Cache
13/11/03 18:07:49 INFO util.GSet: VM type = 64-bit
13/11/03 18:07:49 INFO util.GSet: 0.029999999329447746% max memory = 888.9 MB
13/11/03 18:07:49 INFO util.GSet: capacity = 2^15 = 32768 entries
13/11/03 18:07:49 INFO common.Storage: Storage directory \hadoop\data\dfs\namenode has been successfully formatted.
13/11/03 18:07:49 INFO namenode.FSImage: Saving image file \hadoop\data\dfs\namenode\current\fsimage.ckpt_0000000000000000000 using no compression
13/11/03 18:07:49 INFO namenode.FSImage: Image file \hadoop\data\dfs\namenode\current\fsimage.ckpt_0000000000000000000 of size 200 bytes saved in 0 seconds.
13/11/03 18:07:49 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/11/03 18:07:49 INFO util.ExitUtil: Exiting with status 0
13/11/03 18:07:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ABHIJITG/x.x.x.x
************************************************************/

Start HDFS (Namenode and Datanode)

Command Prompt

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-dfs

Two separate Command Prompt windows will be opened automatically to run Namenode and Datanode.


Start MapReduce aka YARN (Resource Manager and Node Manager)

Command Prompt

C:\Users\abhijitg>cd c:\hadoop\sbin
c:\hadoop\sbin>start-yarn
starting yarn daemons

Similarly, two separate Command Prompt windows will be opened automatically to run the Resource Manager and the Node Manager.

Verify Installation

If everything goes well, you will be able to open the Resource Manager at http://localhost:8088, the Node Manager at http://localhost:8042 and the Namenode at http://localhost:50070.

Node Manager: http://localhost:8042/


Namenode: http://localhost:50070
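The same check can be automated with the JDK's HttpURLConnection. In this sketch (hypothetical class WebUiCheck, Java 11 or later) any 2xx or 3xx response counts as "up", and a refused or timed-out connection counts as "down"; the status-code rule is an assumption, not something Hadoop defines.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch: reports whether a web UI answers an HTTP GET with a 2xx or 3xx status. */
final class WebUiCheck {
    private WebUiCheck() {}

    static boolean isUp(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int code = conn.getResponseCode();
            return code >= 200 && code < 400;
        } catch (IOException e) {
            // Connection refused or timed out: the daemon is not reachable.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

With the daemons running, WebUiCheck.isUp("http://localhost:50070", 2000) and WebUiCheck.isUp("http://localhost:8042", 2000) would be expected to return true.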

Install MongoDB

Download MongoDB and unzip the file to any location, preferably a simple path without spaces (Cygwin doesn’t work well with spaces in file names), such as C:\mongodb. You will also need to create a directory to store MongoDB's data. Create a folder called data and, inside it, a folder called db, so that you have the default directory structure C:\data\db. (You can use another location for the data, but you will then need to override the default location, for example with mongod --dbpath, when you start MongoDB.)


Add “C:\mongodb\bin” to the “Path” system variable.

To start MongoDB, open a Windows command prompt and type “mongod”.

In a second command prompt, type “mongo”.


If it is set up properly, the shell connects and the prompt changes from the Windows prompt “C:\Users\UserName>” (or whatever directory you’re in) to the MongoDB prompt “>”.

Install Mongo-Hadoop Connector

Download the Mongo-Hadoop Connector and unzip it into your Cygwin user directory (e.g. C:\cygwin64\home\UserName\mongo-hadoop). In a Cygwin terminal, go to this directory by entering the command “cd ~/mongo-hadoop”.

Open the file ~/mongo-hadoop/build.sbt with Notepad++ (or another text editor; if you use a standard Windows text editor, you may have to convert the line endings with unix2dos and dos2unix) and change the line:

hadoopRelease in ThisBuild := "x.x"

To:

hadoopRelease in ThisBuild := "2.2"

Save the file and close it. To compile the connector, enter “./sbt package” in the Cygwin terminal while still in the same directory.
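If you would rather script the build.sbt edit, a small regular-expression rewrite of the file contents is enough. This sketch (hypothetical class SbtRelease, Java 11 or later) assumes the line has exactly the form shown above. It replaces only the quoted value and leaves the file's line endings untouched, which avoids the unix2dos/dos2unix round trip.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: rewrites the quoted value on the hadoopRelease line of build.sbt. */
final class SbtRelease {
    // Group 1 keeps everything up to the opening quote, so only the version text changes.
    private static final Pattern LINE =
            Pattern.compile("(?m)^([ \\t]*hadoopRelease in ThisBuild := )\"[^\"]*\"");

    private SbtRelease() {}

    static String setRelease(String buildSbt, String version) {
        Matcher m = LINE.matcher(buildSbt);
        if (!m.find()) {
            throw new IllegalArgumentException("no hadoopRelease line found");
        }
        // Line endings are not matched, so CRLF or LF files keep their style.
        return m.replaceFirst("$1\"" + Matcher.quoteReplacement(version) + "\"");
    }
}
```

In practice you would read the file with Files.readString, pass the text through setRelease(text, "2.2") and write it back with Files.writeString.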


Note: An Internet connection is required, because the script also downloads files.

Note: If you are using a 64-bit version of Windows, Cygwin will warn that the compiler is using a Windows path but prefers a Cygwin path. You can ignore this warning; the connector will still compile. However, if you are using a 32-bit version of Windows, the connector won’t build properly without a Cygwin path. To fix this, open ~/mongo-hadoop/sbt and, right before the final command is run (execRunner, at line 460), change the value of sbt_jar to the Cygwin location of your sbt-launch.jar (something like /cygdrive/c/home/UserName/.sbt/launch/0.12.2/sbt-launch.jar).

The build process should generate a JAR file in ~/mongo-hadoop/core/target. Copy this JAR to the lib directory on each node in your Hadoop cluster; for Hadoop 2.2.0 this is $HADOOP_HOME/share/hadoop/common/lib (%HADOOP_HOME%\share\hadoop\common\lib on Windows). Download the latest stable version of the MongoDB Java driver and place its JAR in the same directory.
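Because the copy must be repeated on every node, a sketch like the following (hypothetical class JarCopier, Java 11 or later) copies every *.jar from one folder into HADOOP_HOME\share\hadoop\common\lib, overwriting older copies. You would run it once for ~/mongo-hadoop/core/target and once for the folder holding the downloaded MongoDB Java driver; where that driver folder lives is an assumption about your layout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Sketch: copies every *.jar in a folder into Hadoop's common lib directory. */
final class JarCopier {
    private JarCopier() {}

    /** Returns the paths of the copied JARs inside hadoopHome/share/hadoop/common/lib. */
    static List<Path> copyJars(Path fromDir, Path hadoopHome) throws IOException {
        Path lib = hadoopHome.resolve("share").resolve("hadoop").resolve("common").resolve("lib");
        Files.createDirectories(lib);
        List<Path> copied = new ArrayList<>();
        // The glob matches only files ending in .jar directly inside fromDir.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(fromDir, "*.jar")) {
            for (Path jar : jars) {
                Path target = lib.resolve(jar.getFileName());
                Files.copy(jar, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
            }
        }
        return copied;
    }
}
```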

Testing the Mongo-Hadoop Connector

Everything should now be installed correctly, so all that is left is to test the configuration. To do this, we will run the Treasury_Yield example that comes with the mongo-hadoop connector.

If you don’t already have MongoDB and Hadoop running, start both of them in the way described above.

Open a Windows command prompt and enter the following (replace the path with your own path to yield_historical_in.json):

mongoimport -d mongo_hadoop -c yield_historical.in c:\cygwin64\home\UserName\mongo-hadoop\examples\treasury_yield\src\main\resources\yield_historical_in.json

To verify that the import worked, go to the mongo shell (mongo.exe) that should still be running and type:

use mongo_hadoop 


db.yield_historical.in.find().forEach(printjson) 

You should see a long list of MongoDB documents that look like the following:

{
        "_id" : ISODate("2010-09-30T00:00:00Z"),
        "dayOfWeek" : "THURSDAY ",
        "bc3Year" : 0.64,
        "bc5Year" : 1.27,
        "bc10Year" : 2.53,
        "bc20Year" : 3.38,
        "bc1Month" : 0.14,
        "bc2Year" : 0.42,
        "bc3Month" : 0.16,
        "bc30Year" : 3.69,
        "bc1Year" : 0.27,
        "bc7Year" : 1.91,
        "bc6Month" : 0.19
}

From your Cygwin home directory, find the file mongo-hadoop/examples/treasury_yield/src/main/resources/mongo-treasury_yield.xml. Open the file and change the output value class for the mapper (lines 91-95) to:

<property>
    <!-- Output value class for the mapper [optional] -->
    <name>mongo.job.mapper.output.value</name>
    <value>org.apache.hadoop.io.DoubleWritable</value>
</property>

Input & Output Databases: in this file, line 21, “<value>mongodb://127.0.0.1/mongo_hadoop.yield_historical.in</value>”, is your input MongoDB database.collection, and line 26, “<value>mongodb://127.0.0.1/mongo_hadoop.yield_historical.out</value>”, is your output MongoDB database.collection. If the output database or collection doesn’t exist yet, it will be created automatically.

Note: This file contains many important options that must be set to run a MapReduce job in Hadoop. For this example they are already set, but when you create your own MapReduce jobs you will have to set the following:


Class for Mapper: This must match the fully qualified name of the Java Mapper class in the JAR file.

Class for the Reducer: This must match the fully qualified name of the Java Reducer class in the JAR file.

Input Format Class: This should always be set to com.mongodb.hadoop.MongoInputFormat if the input comes from MongoDB.

Output Format Class: This should always be set to com.mongodb.hadoop.MongoOutputFormat if the output is written to MongoDB.

Output Key Class for the Output Format: This must be the fully qualified name of the class of the output key being used.

Output Key Class for the Mapper: This must match the third type argument of the Mapper class that your mapper extends. For example, the TreasuryYieldMapper.java file declares:

public class TreasuryYieldMapper
    extends Mapper<Object, BSONObject, IntWritable, DoubleWritable> {
    // ...
}

so the matching value in the XML file is:

<value>org.apache.hadoop.io.IntWritable</value>

Output Value Class for the Mapper: This must match the fourth type argument of the Mapper class that your mapper extends. For example, the TreasuryYieldMapper.java file declares:

public class TreasuryYieldMapper
    extends Mapper<Object, BSONObject, IntWritable, DoubleWritable> {
    // ...
}

so the matching value in the XML file is:

<value>org.apache.hadoop.io.DoubleWritable</value>
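The rule above (third type argument for the output key class, fourth for the output value class) can be checked with reflection. To keep the sketch runnable without Hadoop on the classpath, it uses a hypothetical stand-in generic class with the same four type parameters as Hadoop's Mapper; the class name MapperTypes is also hypothetical. With the real JARs on the classpath you would pass TreasuryYieldMapper.class instead and would expect org.apache.hadoop.io.IntWritable and org.apache.hadoop.io.DoubleWritable back.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

/** Sketch: reads the output key/value type arguments a mapper class declares. */
final class MapperTypes {
    /** Hypothetical stand-in with the same type parameters as Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>. */
    static class StandInMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {}

    private MapperTypes() {}

    /** Returns {output key class name, output value class name} for a direct subclass. */
    static String[] outputClasses(Class<?> mapperClass) {
        Type superType = mapperClass.getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalArgumentException(mapperClass + " does not declare type arguments");
        }
        Type[] args = ((ParameterizedType) superType).getActualTypeArguments();
        // Index 2 is the third type argument (output key), index 3 the fourth (output value).
        return new String[] {args[2].getTypeName(), args[3].getTypeName()};
    }
}
```

The returned names are exactly the strings that belong in the corresponding value elements of the XML file.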

Open the file mongo-hadoop/examples/treasury_yield/src/main/java/com/mongodb/hadoop/examples/treasury/TreasuryYieldXMLConfig.java and change lines 28-30 to:


        // Configuration.addDefaultResource("src/examples/hadoop-local.xml");
        Configuration.addDefaultResource("mongo-defaults.xml");
        Configuration.addDefaultResource("mongo-treasury_yield.xml");
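Configuration.addDefaultResource looks these files up by name on the classpath, so they need to end up inside the built JAR (or otherwise on the classpath). A quick sketch (hypothetical class ResourceCheck, plain JDK) reports which named resources a given class loader cannot find. Which loader Hadoop actually consults at runtime is an assumption here; the context class loader is a reasonable first guess.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: lists the named resources that a class loader cannot find. */
final class ResourceCheck {
    private ResourceCheck() {}

    static List<String> missing(ClassLoader loader, String... names) {
        List<String> absent = new ArrayList<>();
        for (String name : names) {
            // getResource returns null when no resource with this name is visible.
            if (loader.getResource(name) == null) {
                absent.add(name);
            }
        }
        return absent;
    }
}
```

Run with the example JAR on the classpath, ResourceCheck.missing(Thread.currentThread().getContextClassLoader(), "mongo-defaults.xml", "mongo-treasury_yield.xml") would be expected to return an empty list.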

Once this is done, cd to ~/mongo-hadoop to compile this example and run ./sbt treasury-example/package. This should create the JAR file ~/mongo-hadoop/examples/treasury_yield/target/treasury-example_2.2.0-1.2.0.jar. Copy this file to the %HADOOP_HOME% directory that you set in Windows. Open a command prompt and enter:

cd %HADOOP_HOME%
hadoop jar treasury-example_2.2.0-1.2.0.jar com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig

If all goes well, you should be able to see the output in the mongo shell. In mongo.exe, enter the following:

use mongo_hadoop 

db.yield_historical.out.find().forEach(printjson)

You should now see the output from the map reduce job.
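To make the job's output easier to interpret, here is an in-memory sketch of what the map and reduce steps compute. It assumes, based on our reading of the connector's example source rather than anything stated in this paper, that TreasuryYieldMapper emits the year of each document's _id as its IntWritable key and the document's bc10Year value as its DoubleWritable value, and that the reducer averages the values for each year; check TreasuryYieldMapper.java and the reducer in your checkout before relying on this. The class TreasuryYieldSketch and its Doc type are hypothetical stand-ins for the BSON documents.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Sketch: the Treasury yield job's map and reduce steps, simulated in memory. */
final class TreasuryYieldSketch {
    /** Hypothetical stand-in for one input document: its _id date and bc10Year value. */
    static final class Doc {
        final LocalDate id;
        final double bc10Year;

        Doc(LocalDate id, double bc10Year) {
            this.id = id;
            this.bc10Year = bc10Year;
        }
    }

    private TreasuryYieldSketch() {}

    /** Map: key = year of _id, value = bc10Year. Reduce: average of the values per year. */
    static Map<Integer, Double> averageByYear(List<Doc> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                d -> d.id.getYear(),
                TreeMap::new,
                Collectors.averagingDouble(d -> d.bc10Year)));
    }
}
```

Under these assumptions, each document in yield_historical.out corresponds to one year of input data.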

Sources:

http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os

http://maven.apache.org/download.cgi

http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
