big data testing

Post on 30-Dec-2015

BIG DATA TESTING

By QA InfoTech

Scenario

OMG!! Did he just ask me to catch rats in a place full of snakes?

Agenda

1. What is Big Data

2. Characteristics of Big Data

3. Meaning of BIG DATA to “US”

4. Hadoop

6. Submitting a MapReduce Job

What is BIG DATA?

• ‘Big Data’ is similar to ‘small data’, but bigger in size

• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

• Walmart handles more than 1 million customer transactions every hour.

• Facebook handles 40 billion photos from its user base.

• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Three Characteristics of Big Data: the 3 Vs

Volume

• Data quantity

Velocity

• Data speed

Variety

• Data types

What Does BIG DATA TESTING Mean to Testers?

Take into consideration these 3 perspectives:

• Data

• Infrastructure

• Validation Tools

Now the question is: what technology is needed for handling BIG DATA?

1. HADOOP

Hadoop & Its Components

• Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing.

Source: http://www.trieuvan.com/apache/hadoop/common/

How is Hadoop Helping?

• HDFS: A Java-based distributed file system that can store all kinds of data

• MapReduce: A software programming model for processing large sets of data in parallel

• YARN: A resource management framework for scheduling and handling resource requests from distributed applications
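The MapReduce model named above can be sketched in plain Java without any Hadoop classes. This is only an illustration under stated assumptions: the class `MapReduceModelSketch` and its methods are hypothetical names, word counting is a stand-in problem, and everything runs in one process, whereas a real cluster would run the map and reduce calls in parallel on different nodes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the MapReduce model; no Hadoop classes are used.
class MapReduceModelSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values of one key into a single result.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over the given lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, testing=1}
        System.out.println(wordCount(Arrays.asList("big data testing", "big data")));
    }
}
```

Keeping the three phases as separate methods lets a tester check each one on a handful of lines before trusting a run on a full cluster.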

This is our input file: Input Sampleset.txt

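MapReduce Program for Max Temperature: Driver Class

    // Job.getInstance() replaces the deprecated new Job() constructor (Hadoop 2.x and later).
    Job job = Job.getInstance();
    job.setJarByClass(MaxTemperatureDriver.class);
    job.setJobName("Max Temperature");

    // args[0] is the input path, args[1] is the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    // Not shown on the slide; assumed from the reducer's output types (Text, IntWritable).
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Not shown on the slide; submits the job and waits for it to finish.
    System.exit(job.waitForCompletion(true) ? 0 : 1);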

Mapper Class

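    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Fixed-width record: characters 15-18 hold the year.
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        // The transcript is cut off here. Emitting (year, temperature) is assumed
        // from the reducer's input types (Text key, IntWritable values).
        context.write(new Text(year), new IntWritable(airTemperature));
    }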
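From a tester's point of view, the fixed-width parsing in the mapper above can be checked without a cluster. The sketch below copies the same offsets (year at 15 to 18, sign at 87, digits at 88 to 91) into plain static methods. The class `RecordParsingSketch` and the helper `syntheticRecord` are hypothetical, and the synthetic lines are filler built only for this check, not real records from the input file.

```java
// Plain-Java check of the mapper's fixed-width parsing; no Hadoop classes are used.
class RecordParsingSketch {

    // Same offsets as the mapper: characters 15-18 hold the year.
    static String parseYear(String line) {
        return line.substring(15, 19);
    }

    // Same logic as the mapper: sign at index 87, digits at 88-91.
    static int parseTemperature(String line) {
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    // Hypothetical helper: builds a synthetic 93-character record that is all
    // filler except for the two fields the mapper reads.
    static String syntheticRecord(String year, String signedTemp) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) {
            sb.append('0');
        }
        sb.replace(15, 19, year);        // year must be 4 characters
        sb.replace(87, 92, signedTemp);  // sign plus 4 digits, 5 characters
        return sb.toString();
    }

    public static void main(String[] args) {
        String warm = syntheticRecord("1950", "+0022");
        String cold = syntheticRecord("1949", "-0111");
        System.out.println(parseYear(warm) + " " + parseTemperature(warm)); // 1950 22
        System.out.println(parseYear(cold) + " " + parseTemperature(cold)); // 1949 -111
    }
}
```

Lines shorter than 92 characters would make `substring` throw, so a real test suite would probably also cover truncated and malformed records.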

Reducer Class

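    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Start below every possible reading so the first value always wins.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
    } // closes the enclosing reducer class (declaration not shown in the transcript)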
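The reduce step above can be validated in isolation in the same way. The following sketch repeats the loop over plain `Integer` values instead of `IntWritable`. The class `MaxReduceSketch` and the method `maxOf` are hypothetical names, and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;

// Plain-Java version of the reducer's max loop; no Hadoop classes are used.
class MaxReduceSketch {

    // Same loop as the reducer, over plain Integers instead of IntWritable.
    static int maxOf(Iterable<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int value : values) {
            maxValue = Math.max(maxValue, value);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        System.out.println(maxOf(Arrays.asList(0, 22, -11))); // 22
        System.out.println(maxOf(Arrays.asList(-111, -78)));  // -78
        // With no values the sentinel is returned unchanged.
        System.out.println(maxOf(Collections.<Integer>emptyList()) == Integer.MIN_VALUE); // true
    }
}
```

The all-negative case is worth keeping in a test set: initialising the maximum to 0 instead of `Integer.MIN_VALUE` would pass the first check and silently fail the second.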
