Seminar Map/Reduce

20.10.2010

Prof. Johann-Christoph Freytag, Ph. D.

Rico Bergmann

o intro   o talks   o paper   o Clouds   o Map/Reduce   o themes

contact

• Prof. Johann-Christoph Freytag Ph.D.

– Professor, Chair of Databases and Information Systems (DBIS), RUD25

• Rico Bergmann

– research assistant, Chair of Databases and Information Systems (DBIS), RUD25

– room 4‘222 (please make an appointment)

– Mail: bergmann@informatik.hu-berlin.de

organisation

• weekly

• first talk: 3 Nov 2010

• RUD25 room 4‘112

• Wednesdays, 13:00-15:00

• conditions for a certificate

– presentation

– term paper

– regular attendance

organisation

• presentation of a theme

– 60 minutes presentation

– 30 minutes questions and feedback

– send your slides by the Monday before your talk to bergmann@informatik.hu-berlin.de

• term paper

– due by: 13 Feb 2011

questions?

some recommendations

How to give a good talk (very briefly)?

How to write a good term paper (even shorter)?

presentations

• a good talk

– is interesting

– has a logical and observable structure

– has a take-home-message

bad talks …

• have too much text or too many pictures

• have no colors or too many colors

• have no substance or too much substance

• can be found nearly everywhere

the perfect talk …

• does not exist

• but: try to get close to perfect

• you can make every theme interesting (businessmen know this)

• you may even lie, if it helps your audience to understand you

talk - introduction

• first slide: title, name of the speaker

• motivation for the talk

– this is the key to attention

– it is the appetizer for your audience

WHOOOMP

research talks

• are NOT business talks

• don‘t sell

• present information

– not too much (OutOfMemoryError)

– not too little (under-utilization)

• guide your audience

– from known things

– step by step

– to your key message(s)

understanding your talk

• use examples

• use graphics, diagrams, pictures

– they should be intuitive

– and must be explained

• take pauses

• ….

• and have a go

slides

• big font (28 pt. or more)

• sans serif

• no sentences

• page numbers

• high-contrast

effect of your talk

• what determines how much your audience retains

– 30% content

– 30% facial expression

– 40% gestures

source: http://www.ifi.uzh.ch/groups/req/ftp/wap/WAP-Praesentationstechnik.pdf

term paper

• is scientific „antiseptic“

• structure (recommended):

– introduction (motivation, definitions …)

– main part (describe the solution)

– conclusion (discussion, open issues …)

• describe in your own words

style

• use short sentences

• be concise

– but don‘t leave out important information

• cite correctly

• give it a clear structure

• visualize (and explain each graphic)

• give examples

questions?

About Clouds

Cloud Computing

CC - definition

• combination of clusters and Grids

• on-demand computing

• mostly virtualized nodes (VMs)

• parallel and distributed

• dynamically provisioned

• presented as one or more unified computing resource(s)

source: „Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility“, Buyya et al., Future Generation Computer Systems, Vol. 25, Issue 6, 2009

Why Clouds?

• datacenter utilization is normally around 5% – 20% [AFG+10]

• sell compute power

– and storage

• industry needs a new hype after SOA

[AFG+10] Armbrust, M., Fox, A., Griffith, R. et al., „Above the Clouds: A Berkeley View of Cloud Computing“, UC Berkeley RAD Lab, 2010

CC ingredients

• heterogeneous computers

• commodity hardware

• virtualisation

• high-speed network

Cloud stack

source: http://www.saasblogs.com/images/uploads/2008/12/cloud_stack.gif

Cloud Service Providers

• Google AppEngine

• Amazon Elastic Compute Cloud (EC2)

• Microsoft Azure

• force.com

• Google Docs

• … and many more

questions?

MapReduce

MapReduce (logical)

[figure: file → Map (M) → partitions → Reduce (R) → files]
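
The logical flow on this slide (file → Map → partitions → Reduce → files) can be sketched in a few lines of Python. This is only a single-process illustration under stated assumptions: the word-count task, the names `map_fn`, `partition`, `reduce_fn` and `run_job`, and the byte-sum partitioner are hypothetical choices for this example and are not taken from the slides or from any MapReduce implementation.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit one (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(pairs, num_reducers):
    """Route each key to exactly one partition and group its values."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Toy deterministic partitioner (assumption); real systems
        # typically apply a hash function to the key.
        index = sum(key.encode("utf-8")) % num_reducers
        partitions[index][key].append(value)
    return partitions


def reduce_fn(key, values):
    """Reduce: combine all values seen for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    """Run map, partition and reduce; return one output dict per reducer."""
    intermediate = [pair for line in lines for pair in map_fn(line)]
    output_files = []
    for part in partition(intermediate, num_reducers):
        output_files.append(dict(reduce_fn(k, v) for k, v in sorted(part.items())))
    return output_files


if __name__ == "__main__":
    for number, result in enumerate(run_job(["a rose is a rose", "is it a rose"])):
        print(f"reducer {number}: {result}")
```

Each reducer produces its own output, which mirrors the several "files" on the right-hand side of the figure; a key never appears in more than one of them.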

MapReduce attributes

• fault tolerance

• implicit parallelisation

• data locality

• schema free

• robustness (skips „bad records“)
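
Two of these attributes, implicit parallelisation and skipping „bad records“, can be illustrated with a toy map phase. This is a minimal sketch, not how any real framework does it: the record format `key,value`, the function names, and the use of a thread pool on one machine are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_record(record):
    """User-written map function: parse 'key,value'.

    Raises ValueError for malformed records; contains no threading logic.
    """
    key, value = record.split(",")
    return key, int(value)


def safe_map(record):
    """Runner-side wrapper: a failing record is skipped instead of aborting the job."""
    try:
        return parse_record(record)
    except ValueError:
        return None


def run_map_phase(records, workers=4):
    """Apply the user function to all records in parallel.

    Returns the successfully mapped pairs (in input order) and the
    number of skipped records.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe_map, records))
    good = [r for r in results if r is not None]
    return good, len(results) - len(good)


if __name__ == "__main__":
    pairs, skipped = run_map_phase(["a,1", "broken", "b,2", "c,x"])
    print(pairs, "skipped:", skipped)
```

The point of the sketch is the division of labour: the user writes only `parse_record`, while parallel execution and the handling of bad input live in the runner.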

questions?

themes for your talk

themes

1. MapReduce programming model

2. distributed file systems (GFS, HDFS)

3. Hadoop (without HDFS)

4. MapReduce vs. PDBMS (a comparison of MapReduce and Parallel DBMS)

5. HadoopDB (architectural hybrid of MapReduce and DBMS)

themes

6. Hive (a data warehouse on top of MapReduce with an SQL-like QL)

7. Pig/PigLatin (dataflow system with SQL-like QL)

8. PACT/Nephele (project Stratosphere - a database system in the Cloud – work in progress)

9. MapReduce Online (extension of the MapReduce model for online aggregation)

themes

10. Map-Reduce-Merge (extension of the MapReduce model for joins)

11. SQL/MapReduce (uses MapReduce model for UDF-programming inside a DBMS - Aster Data nCluster)

12. MapReduce for Multi-Cores / Multi-Processors (Evaluation of the MapReduce model on Multi-Core and Multi-Processor systems – project Phoenix)

themes

13. Dryad/DryadLINQ (the Microsoft approach to Cloud Computing – execution system and a QL)

14. MapReduce and functional programming (the MapReduce model discussed from a functional programming perspective)
