![Page 1: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/1.jpg)
Cloud Computing
Other High-level parallel processing languages
Keke Chen
![Page 2: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/2.jpg)
Outline
- Sawzall
- Dryad and DryadLINQ (Microsoft, abandoned)
- Hive
![Page 3: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/3.jpg)
Sawzall
- Simplifies MapReduce programming
- Filters + aggregators: the filter plays the mapper role, the aggregator plays the reducer role
![Page 4: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/4.jpg)
Example
(Figure: a Sawzall program annotated with its mapper and reducer parts and the note "Convert the input record to float".)
![Page 5: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/5.jpg)
Input
- A Sawzall program works on a single record at a time
  - It acts as a filter over the data stream
- Input can be parsed into
  - Values, e.g., float
  - Data structures
- Syntax: variable: type = input
    x: float = input;
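The per-record model above can be imitated in Python. This is only a sketch, not real Sawzall: the function names `process_record` and `run`, the table names `count`, `total`, and `sum_of_squares`, and the sample records are all assumptions made for illustration. The per-record function sees exactly one record and can only emit values; the summing happens outside it.

```python
from collections import defaultdict


def process_record(raw, emit):
    """Per-record logic (the filter / mapper role): sees one record only."""
    x = float(raw)                  # counterpart of `x: float = input;`
    emit("count", 1)
    emit("total", x)
    emit("sum_of_squares", x * x)


def run(records):
    """Feed every record through the filter; each table sums what it receives."""
    tables = defaultdict(float)     # table name -> running sum (aggregator role)

    def emit(table, value):
        tables[table] += value

    for raw in records:
        process_record(raw, emit)
    return dict(tables)


# Hypothetical input records, one number per record.
result = run(["1.5", "2.5", "4.0"])
print(result)
```

Because `process_record` keeps no state between records, the records could be split across many machines and the partial sums merged afterwards, which is the property the filter-plus-aggregator design relies on.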
![Page 6: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/6.jpg)
Aggregators
- Definition: table agg_name of data_type/variable
- Examples:
    c: table collection of string;
    s: table sample(100) of string;
    s: table sum of {count: int, revenue: float};
- More aggregators: maximum, quantile, top, unique
![Page 7: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/7.jpg)
Indexed aggregators
- Similar to "group by"; the index is the group id
- Example:
    t1: table sum[country: string] of int;
    country: string = input;
    emit t1[country] <- 1;
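The "group by" behavior of an indexed sum table can be sketched in Python with a `Counter` keyed by the index. This is an illustrative stand-in rather than Sawzall itself; the function name `count_by_country` and the sample country codes are assumptions.

```python
from collections import Counter


def count_by_country(records):
    """Count records per country, imitating an indexed sum table."""
    t1 = Counter()                 # stands in for `t1: table sum[country: string] of int`
    for country in records:        # each record is a single country string
        t1[country] += 1           # counterpart of `emit t1[country] <- 1;`
    return dict(t1)


# Hypothetical input: one country code per record.
counts = count_by_country(["US", "CN", "US", "FR", "US"])
print(counts)
```

Each distinct index value gets its own running sum, so the result has one entry per group, as a SQL `GROUP BY country` with `COUNT(*)` would.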
![Page 8: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/8.jpg)
More examples
    proto "querylog.proto"
    queries_per_degree: table sum[lat: int][lon: int] of int;
    log_record: QueryLogProto = input;
    loc: Location = locationinfo(log_record.ip);
    emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;
![Page 9: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/9.jpg)
Performance
- Single-CPU speed
- Also 51 times slower than compiled C++
![Page 10: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/10.jpg)
Performance
![Page 11: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/11.jpg)
Dryad and DryadLINQ
- Dryad provides a low-level parallel data flow processing interface
  - Acyclic data flow graphs
  - Data communication methods include pipes, files, messages, and shared memory
- DryadLINQ
  - A high-level language for application developers
  - It hides the data flow details
![Page 12: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/12.jpg)
Job = Directed Acyclic Graph
(Figure: a job graph of processing vertices connected by channels (file, pipe, shared memory), running from inputs to outputs.)
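The idea of a job as a directed acyclic graph can be sketched in a few lines of Python. This is not the Dryad API: the function `run_job`, the vertex names, and the sample data are hypothetical, and channels are modeled as plain in-memory values rather than files, pipes, or shared memory. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) is used to run each vertex only after its upstream vertices have finished.

```python
from graphlib import TopologicalSorter


def run_job(vertices, edges, inputs):
    """Run a DAG job.

    vertices: name -> function taking a list of incoming channel values
    edges:    list of (source, destination) channels
    inputs:   name -> input data for vertices with no incoming channel
    """
    preds = {name: [] for name in vertices}
    for src, dst in edges:
        preds[dst].append(src)

    results = {}
    # static_order() yields every vertex after all of its predecessors
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(preds).static_order():
        channels = [results[p] for p in preds[name]]
        if not channels:                      # source vertex: read job input
            channels = [inputs[name]]
        results[name] = vertices[name](channels)
    return results


# Hypothetical job: two readers feed a merge vertex, which feeds a total vertex.
vertices = {
    "read_a": lambda ch: [int(v) for v in ch[0]],
    "read_b": lambda ch: [int(v) for v in ch[0]],
    "merge": lambda ch: sorted(ch[0] + ch[1]),
    "total": lambda ch: sum(ch[0]),
}
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "total")]
out = run_job(vertices, edges, {"read_a": ["3", "1"], "read_b": ["2"]})
print(out["merge"], out["total"])
```

In this sketch the vertices run one after another in a single process; a real scheduler would run independent vertices such as the two readers in parallel on different machines.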
![Page 13: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/13.jpg)
Runtime
- Services
  - Name server
  - Daemon
- Job Manager (JM)
  - Centralized coordinating process
  - User application to construct graph
  - Linked with Dryad libraries for scheduling vertices
- Vertex executable
  - Dryad libraries to communicate with JM
  - User application sees channels in/out
  - Arbitrary application code, can use local FS
![Page 14: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/14.jpg)
Graph operators
![Page 15: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/15.jpg)
Hive
- Developed by Facebook (open source)
- Mimics the SQL language
- Built on Hadoop/MapReduce
![Page 16: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/16.jpg)
Hive data model: table etc.
- Table
  - Similar to a DB table, stored in Hadoop directories
  - Built-in compression, serialization/deserialization
- Partitions
  - Groups in the table
  - Subdirectory in the table directory
- Buckets
  - Files in the partition directory
  - Key (column) based partition
- Layout: /table/partition/bucket1
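How a row maps to a location in this layout can be sketched as follows. This is a simplified illustration, not Hive's implementation: the function `bucket_path`, the use of `zlib.crc32` as the hash, and the `bucketN` file naming are assumptions chosen to match the `/table/partition/bucket1` pattern above. Hive uses its own hash function and file names.

```python
import zlib


def bucket_path(table, partition_col, partition_value, key, num_buckets):
    """Return an illustrative storage path for one row.

    The partition selects a subdirectory of the table directory;
    the hash of the bucketing key selects a file inside it.
    """
    # Assumption: CRC32 stands in for whatever hash the real system uses.
    bucket = zlib.crc32(str(key).encode("utf-8")) % num_buckets
    return f"/{table}/{partition_col}={partition_value}/bucket{bucket}"


# Hypothetical table bucketed on userid into 4 buckets.
path = bucket_path("status_updates", "ds", "2009-03-20", key=12345, num_buckets=4)
print(path)
```

The same key always lands in the same bucket file, so a query that filters or samples on the bucketing column can skip the other files in the partition directory.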
![Page 17: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/17.jpg)
Hive data model: column types
- Integers, floating-point numbers, generic strings, dates, and booleans
- Nestable collection types: array and map
![Page 18: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/18.jpg)
![Page 19: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/19.jpg)
Architecture
The Metastore stores the schema of databases. It uses a non-HDFS data store.
![Page 20: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/20.jpg)
Query processing
Steps (similar to a DBMS):
1. Parser
2. Semantic analyzer
3. Logical plan generator (algebra tree)
4. Optimizer
5. Physical plan generator (to MapReduce jobs)
![Page 21: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/21.jpg)
Operations: DDL and DML
- HiveQL: SQL-like, with slightly different syntax
- User-defined filtering and aggregation functions
  - Java only
- Map/reduce plugin for streaming processing
  - Can be implemented in any language
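A streaming plugin is an ordinary program that reads rows as lines of text and writes result rows as lines of text, which is why any language works. The sketch below shows what such a mapper might look like in Python. The function name `run_mapper`, the assumption that rows are tab-separated with the text in the last column, and the sample rows are all illustrative.

```python
import io
import sys


def run_mapper(infile, outfile):
    """Read tab-separated rows and write one (word, 1) row per word."""
    for line in infile:
        columns = line.rstrip("\n").split("\t")
        text = columns[-1]                    # assumption: last column holds the text
        for word in text.lower().split():
            outfile.write(f"{word}\t1\n")


# Demo with in-memory streams. A deployed script would instead call
# run_mapper(sys.stdin, sys.stdout) so the framework can pipe rows through it.
demo_out = io.StringIO()
run_mapper(io.StringIO("1\tHello cloud\n2\thello Hive\n"), demo_out)
print(demo_out.getvalue(), end="")
```

A matching reducer would read the `word<TAB>1` rows, grouped by word, and sum the counts.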
![Page 22: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/22.jpg)
Example: Facebook status updates
- Tables:
    status_updates(userid int, status string, ds string)
    profiles(userid int, school string, gender int)
- Operations
  - Load data:
      LOAD DATA LOCAL INPATH '/logs/status_updates' INTO TABLE status_updates PARTITION (ds='2009-03-20')
  - Count status updates by school and by gender
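The logic behind counting status updates by school and by gender is a join of the two tables on `userid` followed by two group-by counts. The Python sketch below imitates that logic on in-memory rows. It is not HiveQL and not the query from the slides: the function `summarize` and all sample rows, including the school names, are invented for illustration.

```python
from collections import Counter


def summarize(status_updates, profiles, ds):
    """Join status updates with profiles on userid, then count per group."""
    by_user = {userid: (school, gender) for userid, school, gender in profiles}
    school_counts, gender_counts = Counter(), Counter()
    for userid, _status, row_ds in status_updates:
        if row_ds != ds or userid not in by_user:
            continue                          # inner join, restricted to one partition
        school, gender = by_user[userid]
        school_counts[school] += 1
        gender_counts[gender] += 1
    return dict(school_counts), dict(gender_counts)


# Hypothetical rows matching the two table schemas above.
status_updates = [
    (1, "studying", "2009-03-20"),
    (2, "at the gym", "2009-03-20"),
    (1, "still studying", "2009-03-20"),
    (3, "old post", "2009-03-19"),
]
profiles = [(1, "school_a", 0), (2, "school_b", 1), (3, "school_b", 1)]

schools, genders = summarize(status_updates, profiles, "2009-03-20")
print(schools, genders)
```

Both summaries are produced in a single pass over the joined rows, so the join does not have to be recomputed for the second count.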
![Page 23: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/23.jpg)
More query examples
![Page 24: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/24.jpg)
Query examples
![Page 25: Cloud Computing](https://reader035.vdocuments.us/reader035/viewer/2022062721/568135ce550346895d9d3529/html5/thumbnails/25.jpg)
Query examples – using Hadoop streaming