big data testing

BIG DATA TESTING By QA InfoTech

Upload: qainfotech123

Post on 30-Dec-2015


DESCRIPTION

Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analysed with traditional computing techniques.

TRANSCRIPT

Page 1: BIG  DATA TESTING

BIG DATA TESTING

By QA InfoTech

Page 2: BIG  DATA TESTING

Scenario

Page 3: BIG  DATA TESTING

OMG!! Did he just ask me to catch rats in a place full of snakes?


Page 4: BIG  DATA TESTING

Agenda

1. What is Big Data
2. Characteristics of Big Data
3. Meaning of BIG DATA to “US”
4. Hadoop
6. Submitting a Map Reduce Job

Page 5: BIG  DATA TESTING

What is BIG DATA?

• ‘Big Data’ is similar to ‘small data’, but bigger in size

• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

• Walmart handles more than 1 million customer transactions every hour.

• Facebook handles 40 billion photos from its user base.

• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Page 6: BIG  DATA TESTING

Three Characteristics of Big Data: the 3 Vs

• Volume: data quantity

• Velocity: data speed

• Variety: data types

Page 7: BIG  DATA TESTING

What does BIG DATA TESTING mean to Testers?

Take into consideration these 3 perspectives:

• Data

• Infrastructure

• Validation Tools

Page 8: BIG  DATA TESTING

Now the question is: what technology is needed for handling BIG DATA?

1. HADOOP

Page 9: BIG  DATA TESTING

Hadoop & Its Components

• Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing.

Source: http://www.trieuvan.com/apache/hadoop/common/

Page 10: BIG  DATA TESTING

How is Hadoop Helping?

• HDFS: A Java-based distributed file system that can store all kinds of data

• Map Reduce: A software programming model for processing large sets of data in parallel

• YARN: A resource management framework for scheduling and handling resource requests from distributed applications
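The Map Reduce model named above can be illustrated in miniature without a cluster. The following is only a sketch in plain Java, not Hadoop code: the class name `MiniMapReduce`, the `"year,temperature"` record format, and the sample values are all assumptions made for illustration. It walks through the three phases (map, shuffle, reduce) on a small in-memory list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce model; it does not use Hadoop.
class MiniMapReduce {

    // Map phase: turn one "year,temperature" record into a (year, temperature) pair.
    static Map.Entry<String, Integer> map(String record) {
        String[] parts = record.split(",");
        return Map.entry(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    // Shuffle phase: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse each key's values to a single result (here, the maximum).
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), Collections.max(entry.getValue()));
        }
        return result;
    }

    // Runs the three phases in order on an in-memory list of records.
    static Map<String, Integer> run(List<String> records) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(map(record));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> records = List.of("1949,111", "1949,78", "1950,0", "1950,22", "1950,-11");
        System.out.println(run(records)); // prints {1949=111, 1950=22}
    }
}
```

On a real cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle; here everything runs sequentially in one process so the data flow is easy to follow.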

Page 11: BIG  DATA TESTING


This is our input file: Input Sampleset.txt

Page 12: BIG  DATA TESTING


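Map Reduce Program for Max Temperature: Driver Class

// Create the job and tell Hadoop which JAR contains its classes
Job job = new Job();
job.setJarByClass(MaxTemperatureDriver.class);
job.setJobName("Max Temperature");

// Input and output locations come from the command-line arguments
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Plug in the mapper and reducer implementations
job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);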

Page 13: BIG  DATA TESTING


Mapper Class

@Override
public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  String line = value.toString();
  String year = line.substring(15, 19);
  int airTemperature;
  if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
    airTemperature = Integer.parseInt(line.substring(88, 92));
  } else {
    airTemperature = Integer.parseInt(line.substring(87, 92));
  }
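The parsing step in the mapper can be exercised on its own, which is useful when validating input data before running a full job. The sketch below is plain Java with no Hadoop classes. The offsets (15 to 19 for the year, 87 for the sign, 88 to 92 for the digits) are taken from the mapper code; the class `TemperatureRecordParser`, the helper `syntheticLine`, and the zero-padded test lines are hypothetical and do not reflect the real contents of the input file. Since Java 7, `Integer.parseInt` also accepts a leading `+`, so the sign branch matters only on older runtimes; it is kept here to mirror the mapper.

```java
import java.util.Arrays;

// Plain-Java sketch of the mapper's parsing step; no Hadoop classes are used.
class TemperatureRecordParser {

    // The year is read with substring(15, 19), as in the mapper.
    static String parseYear(String line) {
        return line.substring(15, 19);
    }

    // The sign sits at index 87 and the four digits at 88..91, as in the mapper.
    static int parseAirTemperature(String line) {
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    // Hypothetical helper: builds a synthetic 93-character line padded with '0',
    // filling in only the year (4 chars) and the signed temperature (5 chars).
    static String syntheticLine(String year, String signedTemperature) {
        char[] chars = new char[93];
        Arrays.fill(chars, '0');
        year.getChars(0, 4, chars, 15);
        signedTemperature.getChars(0, 5, chars, 87);
        return new String(chars);
    }

    public static void main(String[] args) {
        String warm = syntheticLine("1950", "+0022");
        String cold = syntheticLine("1950", "-0011");
        System.out.println(parseYear(warm) + " " + parseAirTemperature(warm)); // prints 1950 22
        System.out.println(parseYear(cold) + " " + parseAirTemperature(cold)); // prints 1950 -11
    }
}
```

A tester can feed such synthetic lines through the parser to confirm the offsets and sign handling before trusting results from a large run.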

Page 14: BIG  DATA TESTING


Reducer Class

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  int maxValue = Integer.MIN_VALUE;
  for (IntWritable value : values) {
    maxValue = Math.max(maxValue, value.get());
  }
  context.write(key, new IntWritable(maxValue));
}
}
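From a validation point of view, the reducer's max logic can be checked against an independently computed result. The following is a minimal sketch in plain Java, assuming plain `Integer` values in place of Hadoop's `IntWritable`; the class `MaxReduceCheck`, its method names, and the sample values are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Plain-Java check of the reducer's max logic; no Hadoop classes are used.
class MaxReduceCheck {

    // Mirrors the loop in reduce(): start at Integer.MIN_VALUE and keep the larger value.
    static int maxOf(Iterable<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int value : values) {
            maxValue = Math.max(maxValue, value);
        }
        return maxValue;
    }

    // Validation step: compare against a maximum computed by a different route.
    static boolean agreesWithReference(List<Integer> values) {
        return maxOf(values) == Collections.max(values);
    }

    public static void main(String[] args) {
        // Made-up samples covering mixed signs, all-negative values, and a single value.
        List<List<Integer>> samples = List.of(
                List.of(0, 22, -11),
                List.of(-50, -3, -27),
                List.of(78));
        for (List<Integer> sample : samples) {
            System.out.println(sample + " -> " + maxOf(sample)
                    + " (matches reference: " + agreesWithReference(sample) + ")");
        }
    }
}
```

Starting from `Integer.MIN_VALUE` keeps the result correct when every temperature is negative; starting from 0 would silently report 0 in that case.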