Create AWS account - homepages.cwi.nl/.../BDBA-ISfBD-AWS-Intro.docx


Upload: nguyenanh

Post on 31-Jan-2018


Page 1: Create AWS account - homepages.cwi.nlhomepages.cwi.nl/.../BDBA-ISfBD-AWS-Intro.docx  · Web viewPlease don’t hesitate to contact me (Stefan.Manegold@cwi.nl) in case of any problems

BD&BA – BDI&T Practical

Disclaimer: the screens you see live might not look identical to the screenshots provided here. However, the important buttons and fields should still be recognizable. Likewise, some version numbers might have changed; in that case, simply use the latest or default one. Please don’t hesitate to contact me (Stefan.Manegold@cwi.nl) in case of any problems.

Create AWS account

http://aws.amazon.com

Continue with the signup process, choosing the “Basic (Free)” support plan. It will ask for a credit card number, which will not be billed during this course. Once you're done, send the email address you used to sign up for AWS to [email protected] to have the charges taken over by ABS.

Accessing AWS console

Select a region; "US East" worked fine for me, but any other region should be fine as well.

Configure Firewall

Go to EC2 in AWS Console

First, create a key pair for protected authorized access. Go to "Key Pair"

Click on “Create Key Pair” and choose a name for your key pair

Save the key pair to your local machine using your browser’s download dialog. Remember the directory in which your browser saves the text file containing your key pair; you’ll need it later.

Then, go to "Security Groups" and click "Create Security Group"

The Hue Web interface (details below) runs on TCP port 8888, hence we need to create a firewall rule that allows access to this port. Please note that this is not suitable for a production environment. Name and description are of course arbitrary.
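For readers who prefer scripting over clicking, the same firewall rule can be sketched in Python. This is only an illustration under several assumptions: it uses the boto3 library (not part of this practical), the source address 203.0.113.7/32 is a documentation placeholder, and the security group ID passed to `open_hue_port` would have to be your own. The helper names are made up for this sketch.

```python
def hue_ingress_rule(cidr: str, port: int = 8888) -> dict:
    """Build one inbound TCP rule in the shape boto3's EC2 client expects."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": "Hue web interface"}],
    }


def open_hue_port(group_id: str, cidr: str) -> None:
    """Add the rule to an existing security group (needs AWS credentials)."""
    import boto3  # imported lazily so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,  # your own security group ID
        IpPermissions=[hue_ingress_rule(cidr)],
    )


# Placeholder address: allow only one machine instead of the whole internet.
rule = hue_ingress_rule("203.0.113.7/32")
print(rule)
```

Restricting the rule to your own address is one way to soften the "not suitable for production" caveat; the console form offers the same choice in its source field.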

Start EMR Cluster

Go to EMR in AWS Console

Create Cluster

Select "advanced options". We need to set some configuration.

This is the "Advanced cluster configuration screen":

Step 1: Software & Steps

Select Amazon Software Release 4.2.0 (or 4.1.0, 3.11.0, 3.10.0, 3.9.0); the Pig script we’re using later on does not seem to work properly with newer versions.

Select Hue to be installed. Most probably this is nowadays done automatically with software release 3.9.0; see the tick next to “Hue 3.7.1” in the above screenshot.

Step 2: Hardware

Select an instance type (m2.xlarge, m3.xlarge, or whatever the current default is; any of these is fine for now) and a cluster size (2 in total: 1 master, 1 core).

Step 3: General Cluster Settings

Just click “Next”.

Step 4: Security

First select the key pair you created earlier:

Then assign the security group we created earlier to the Master node:

This is what it should look like. Note that your security group ID will be different.

And finally, go for launch: click on “Create cluster”.

You will then see a screen like this:

You can check the "Hardware" tab to see instances launching. You can also check the EC2 service to see your instances starting.

Now you need to wait until the cluster status shows "Waiting". This can take a while; 15 minutes is not unheard of.

While you're waiting, calculate how much running this cluster will cost per hour:
https://aws.amazon.com/elasticmapreduce/pricing/
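The calculation itself is simple arithmetic: on EMR, each instance is billed at its EC2 price plus an EMR surcharge, multiplied by the number of instances. A minimal sketch follows; the two prices are placeholders chosen for easy arithmetic, not real AWS prices, so look up the current values for your instance type and region on the pricing page.

```python
def cluster_cost_per_hour(instance_count: int, ec2_price: float, emr_price: float) -> float:
    """Hourly cost: every instance pays the EC2 price plus the EMR surcharge."""
    return instance_count * (ec2_price + emr_price)


# Placeholder prices in USD per instance-hour (NOT real AWS prices).
EC2_PRICE = 0.25
EMR_PRICE = 0.0625

# The cluster in this practical has 2 instances: 1 master and 1 core.
hourly = cluster_cost_per_hour(2, EC2_PRICE, EMR_PRICE)
print(f"{hourly:.3f} USD per hour")  # prints "0.625 USD per hour"
```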

Hue

Since we opened the firewall, we can access the Hue web interface directly. In the above screen, just right of “Connections:”, “Hue” should be a clickable link. If so, just click on it. If not, copy the address just right of “Master public DNS:” into a new browser tab and append “:8888” as the port number; for the above screenshot, that would be
http://ec2-52-28-188-141.eu-central-1.compute.amazonaws.com:8888
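The manual fallback amounts to gluing the master's public DNS name and the Hue port together. A small sketch, with `hue_url` being a helper name invented for this illustration:

```python
HUE_PORT = 8888  # the TCP port opened in the security group earlier


def hue_url(master_public_dns: str, port: int = HUE_PORT) -> str:
    """Build the Hue address from the value shown next to "Master public DNS:"."""
    host = master_public_dns.strip()  # tolerate stray spaces from copy and paste
    return f"http://{host}:{port}"


url = hue_url("ec2-52-28-188-141.eu-central-1.compute.amazonaws.com")
print(url)
```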

You will see the Hue Web interface login screen:

Coming up with an arbitrary (simple, no special characters such as @) user name and a complex enough password is left as an exercise to the reader.

Afterwards, you will see the following screen, or a similar (possibly simpler) one with newer software releases.

Uploading data

We are going to use the King James version of the Bible as sample data. Download the text file from http://homepages.cwi.nl/~manegold/UvA-ABS-MBA-BDBA-BDIT-2017/kjv.txt

Go to the “File Browser” (sometimes also labeled “HDFS Browser”) and upload the file.

You will see your file uploaded if all goes well.

Running your first EMR Pig job - Wordcount

We are going to find the most common words in the Bible. To this end, we are going to run an Apache Pig script on the kjv.txt file using MapReduce.

Enter the following script:

a = LOAD 'kjv.txt';
b = FOREACH a GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
c = GROUP b BY word;
d = FOREACH c GENERATE COUNT(b) AS ct, group;
e = ORDER d BY ct DESC;
STORE e INTO 'bible_wordcount';

Then execute the script (what does it do? Can you tell?) by clicking the triangle:
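Once you have formed your own answer, you may want to compare it with a rough single-machine equivalent in Python. This is a sketch, not a faithful port: the delimiter set below only approximates what Pig's TOKENIZE splits on, the function name is invented, and the sample input is a single verse rather than the full file.

```python
import re
from collections import Counter

# Rough stand-in for Pig's TOKENIZE: split on whitespace and a few
# punctuation characters. Pig's exact delimiter set may differ.
DELIMITERS = re.compile(r'[\s",()*]+')


def word_count(lines):
    """Return (count, word) pairs sorted by descending count."""
    counts = Counter()
    for line in lines:                        # a = LOAD ...
        for word in DELIMITERS.split(line):   # b = FOREACH ... TOKENIZE
            if word:                          # skip empty split results
                counts[word] += 1             # c and d: GROUP, then COUNT
    pairs = [(ct, word) for word, ct in counts.items()]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)  # e = ORDER


sample = ["In the beginning God created the heaven and the earth."]
result = word_count(sample)
print(result[0])  # prints (3, 'the')
```

Like the Pig script, this version is case-sensitive and leaves most punctuation attached to the words, so "earth." and "earth" count as different words.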

The script will dutifully start executing and show the log:

Downloading results

Once your job has finished, you can inspect the results using the File Browser again. You will see a new folder, bible_wordcount.

Inside, there are files part-r-..., which have the word counts in them. Tadaa.

Before you shut down, you could look around a bit (if there is time)

● Check the Hue job log to see the MapReduce jobs created from your Pig script. What were their statistics?

● Check the EC2/EMR interfaces to see the various performance parameters.

● If you have time, you could try to change the Pig script such that only the 100 most frequent words are shown and words with a length smaller than three are omitted.
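To check the result of the last exercise locally, the downloaded output can be post-processed in Python. The sketch below assumes the script stored its output in Pig's default tab-separated form, one "count<TAB>word" pair per line; the function name and the sample lines (with made-up counts) are illustrative only. For real data, pass an open file handle for one of the downloaded part-r-... files.

```python
def top_words(lines, limit=100, min_length=3):
    """Keep words of at least min_length characters, return the top `limit`."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue                          # ignore blank lines
        count_text, _, word = line.partition("\t")
        if len(word) >= min_length:
            rows.append((int(count_text), word))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Made-up sample lines in the assumed "count<TAB>word" layout.
sample = ["5\tthe\n", "4\tof\n", "3\tand\n", "1\tearth\n"]
print(top_words(sample, limit=2))  # prints [(5, 'the'), (3, 'and')]
```

Comparing this output with what your modified Pig script stores is a quick way to see whether the filter and the limit ended up in the right order.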

Cluster Shutdown

Do not forget to shut down your EMR cluster again, since it will otherwise keep incurring costs. Go back to the AWS Web interface, select the EMR service, select your cluster, and click "Terminate".

Shutdown Verification

It is important to make sure that no EC2 instances or EMR clusters are left running at the end. Check the EC2 console to make sure 0 instances are running, and check the EMR console to make sure no clusters are running.
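The same check can be sketched in Python, assuming the boto3 library and configured AWS credentials (neither is required for this practical). The two helper functions only inspect response dictionaries, so they can be tried without an AWS connection; `check_account` is the part that actually calls AWS, and it looks at the first page of results in the default region only. All function names and the sample IDs are invented for this sketch.

```python
# EMR cluster states that still cost money or are about to stop doing so.
ACTIVE_CLUSTER_STATES = ("STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING", "TERMINATING")


def running_instance_ids(describe_instances_response: dict) -> list:
    """IDs of instances that are pending or running in an EC2 response."""
    ids = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance["State"]["Name"] in ("pending", "running"):
                ids.append(instance["InstanceId"])
    return ids


def active_cluster_ids(list_clusters_response: dict) -> list:
    """IDs of clusters that are not yet terminated in an EMR response."""
    return [
        cluster["Id"]
        for cluster in list_clusters_response.get("Clusters", [])
        if cluster["Status"]["State"] in ACTIVE_CLUSTER_STATES
    ]


def check_account():
    """Query AWS (first result page only); both lists should be empty."""
    import boto3  # imported lazily so the helpers work without boto3

    ec2 = boto3.client("ec2")
    emr = boto3.client("emr")
    return (
        running_instance_ids(ec2.describe_instances()),
        active_cluster_ids(emr.list_clusters()),
    )


# Offline demonstration with made-up responses.
fake_ec2 = {"Reservations": [{"Instances": [
    {"InstanceId": "i-aaa", "State": {"Name": "terminated"}},
    {"InstanceId": "i-bbb", "State": {"Name": "running"}},
]}]}
fake_emr = {"Clusters": [{"Id": "j-XYZ", "Status": {"State": "TERMINATED"}}]}
print(running_instance_ids(fake_ec2), active_cluster_ids(fake_emr))
```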
