big data

Big Data Anton Boyko


Post on 02-Jan-2016



TRANSCRIPT

Page 1: Big Data

Big Data

Anton Boyko

Page 2: Big Data

Agenda

• What is Big Data?
• Why Big Data?
• How to Big Data?

Page 3: Big Data

What is Big Data?

Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time.

Gigabytes

Terabytes

Petabytes

Page 4: Big Data

Data growth

Big Data

Volume 10x

Velocity 4.3

Variety 85%

Page 5: Big Data

How to process Big Data?

Traditional way

Appropriate way

Page 6: Big Data

Move data to compute

Page 7: Big Data

Move compute to data

• Fast storage vs. fast CPU and fast networking

• Linear scalability
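The idea of moving compute to data can be imitated in a few lines of C#. This is only an in-memory sketch, not Hadoop code: the `Node` and `ComputeToData` names are hypothetical, and each "node" is simply a list of lines held in memory. Every node scans its own lines where they are stored and returns a single number, so only the small partial counts travel to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a cluster node that stores one partition of the data.
public class Node
{
    private readonly List<string> _localLines;

    public Node(IEnumerable<string> localLines)
    {
        _localLines = localLines.ToList();
    }

    // The computation runs where the data lives; only an int leaves the node.
    public int CountLocally(Func<string, bool> predicate)
    {
        return _localLines.Count(predicate);
    }
}

public static class ComputeToData
{
    // Ship the predicate to every node and sum the small partial results.
    public static int CountMatches(IEnumerable<Node> nodes, Func<string, bool> predicate)
    {
        return nodes.Sum(node => node.CountLocally(predicate));
    }
}
```

In this picture, adding a node adds both storage and scanning capacity at the same time, which is the intuition behind the linear scalability bullet.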

Page 8: Big Data

Map/Reduce workflow

File system (input)
→ Mappers (find matches)
→ Reducers (combine matches)
→ DFS temp
→ Mappers (invert keys and values)
→ Reducer (combine results)
→ File system (output)
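The two chained stages named on this slide can be imitated in memory. The sketch below is one reading of the diagram, not the presenter's code: the names `WorkflowSketch`, `StageOne` and `StageTwo` are hypothetical, a "match" is simply every whitespace-separated word, LINQ's `GroupBy` stands in for the shuffle between mappers and reducers, and the intermediate list stands in for the DFS temp storage.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WorkflowSketch
{
    // Stage 1: mappers emit one entry per word; reducers count entries per word.
    public static List<KeyValuePair<string, int>> StageOne(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(word => word)                       // stands in for the shuffle
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .ToList();                                   // stands in for "DFS temp"
    }

    // Stage 2: mappers invert each pair to (count, word); a single reducer
    // combines the words per count and orders the groups by count, highest first.
    public static List<KeyValuePair<int, List<string>>> StageTwo(
        IEnumerable<KeyValuePair<string, int>> counted)
    {
        return counted
            .Select(pair => new KeyValuePair<int, string>(pair.Value, pair.Key))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .OrderByDescending(g => g.Key)
            .Select(g => new KeyValuePair<int, List<string>>(
                g.Key, g.OrderBy(w => w, StringComparer.Ordinal).ToList()))
            .ToList();
    }
}
```

Calling `WorkflowSketch.StageTwo(WorkflowSketch.StageOne(lines))` chains the stages the way the second job reads what the first job wrote.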

Page 9: Big Data

Map/Reduce – how it works

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.Hadoop.MapReduce;

public class NamespaceMapper : MapperBase
{
    // Override the map method.
    public override void Map(string inputLine, MapperContext context)
    {
        // Match using directives such as "using System.Linq;".
        var reg = new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");
        var matches = reg.Matches(inputLine);

        foreach (Match match in matches)
        {
            // Just emit the namespaces.
            context.EmitKeyValue(match.Value, "1");
        }
    }
}

public class NamespaceReducer : ReducerCombinerBase
{
    // Accepts each key and counts the occurrences.
    public override void Reduce(string key, IEnumerable<string> values, ReducerCombinerContext context)
    {
        // Write back the key together with its count.
        context.EmitKeyValue(key, values.Count().ToString());
    }
}
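To try the same map and reduce logic without a cluster, a small local harness can help. The sketch below is an assumption-laden stand-in, not part of the Hadoop SDK: `LocalNamespaceCount` and its methods are hypothetical names, the Hadoop context objects are replaced by plain return values, and `GroupBy` plays the role of the shuffle that delivers all values for a key to one reducer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical local harness: no Hadoop types, just the same map and reduce logic.
public static class LocalNamespaceCount
{
    // Letters are written as A-Za-z; the range A-z would also admit [ \ ] ^ and `.
    private static readonly Regex UsingDirective =
        new Regex(@"(using)\s[A-Za-z0-9_\.]*\;");

    // Map step: emit (directive, "1") for every using directive on the line.
    public static IEnumerable<KeyValuePair<string, string>> Map(string inputLine)
    {
        foreach (Match match in UsingDirective.Matches(inputLine))
        {
            yield return new KeyValuePair<string, string>(match.Value, "1");
        }
    }

    // Reduce step: the number of values for a key is its occurrence count.
    public static KeyValuePair<string, string> Reduce(string key, IEnumerable<string> values)
    {
        return new KeyValuePair<string, string>(key, values.Count().ToString());
    }

    // Group mapped pairs by key (the shuffle) and reduce each group.
    public static Dictionary<string, string> Run(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => Map(line))
            .GroupBy(pair => pair.Key, pair => pair.Value)
            .Select(g => Reduce(g.Key, g))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

Passing the lines of a few source files to `LocalNamespaceCount.Run` yields a dictionary from each using directive to its count as a string, mirroring the key/value pairs the reducer would emit.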

Page 10: Big Data

Traditional RDBMS vs. Map/Reduce

RDBMS

• Terabytes of data
• Static schema
• Interactive and batch access
• Nonlinear scaling

Map/Reduce

• Exabytes of data (or more)
• Dynamic schema
• Batch access only
• Linear scaling

Page 11: Big Data

Hadoop – implementation of Map/Reduce engine

Page 12: Big Data

Hadoop ecosystem

Page 13: Big Data

Offering

• ODBC for Excel
• PowerPivot
• Windows Server or Windows Azure
• C#, Java, JavaScript

Page 14: Big Data

Demo

Page 15: Big Data

Pricing

Head Node

• Single extra large instance (8 CPU, 14 GB)
• $0.32 per hour
• $238 per month

Compute Node

• One or more large instances (4 CPU, 7 GB)
• $0.16 per hour
• $119 per month
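The monthly figures on this slide are consistent with the hourly rate multiplied by 744 hours (31 days of 24 hours): 0.32 × 744 = 238.08 and 0.16 × 744 = 119.04. A rough cluster estimate can be sketched as follows. The `ClusterCost` class is a hypothetical helper, the rates are the ones quoted on the slide, and the 744-hour month is an assumption inferred from those figures.

```csharp
using System;

public static class ClusterCost
{
    public const decimal HeadNodePerHour = 0.32m;     // rate quoted on the slide
    public const decimal ComputeNodePerHour = 0.16m;  // rate quoted on the slide
    public const int HoursPerMonth = 31 * 24;         // assumption: 744 hours

    // One head node plus the given number of compute nodes, running for one month.
    public static decimal MonthlyCost(int computeNodes)
    {
        if (computeNodes < 1)
        {
            throw new ArgumentOutOfRangeException(
                nameof(computeNodes), "At least one compute node is required.");
        }

        decimal hourly = HeadNodePerHour + ComputeNodePerHour * computeNodes;
        return hourly * HoursPerMonth;
    }
}
```

Under these assumptions, `ClusterCost.MonthlyCost(4)` evaluates to 714.24, that is, roughly $714 per month for one head node and four compute nodes.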

Page 16: Big Data

Questions?

Anton Boyko
