Iasi Code Camp, 20 April 2013: Testing Big Data - Anca Sfecla, Embarcadero
Testing Big Data
Prepared by: Anca Andreea Sfecla, Quality Assurance Manager Embarcadero Technologies Romania
@ CODECAMP 2013, 20th April 2013
What is Big Data?
• “Big Data is the frontier of a firm’s ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” - Forrester Research
• “Big data creates a new layer in the economy which is all about information, turning information, or data, into revenue. In 2013, big data is forecast to drive $34 billion of IT spending” – Gartner Research
Big Data Characteristics
• Volume
• Variety
• Velocity
• Value
Big Data Success Stories
• Detecting infections in premature infants up to 24 hours before they exhibit symptoms
• Reducing the cost of sequencing a genome from $10,000 to less than $100
• Predicting flu outbreaks by analyzing massive numbers of Google searches related to flu symptoms
EDW (Enterprise Data Warehouse) versus Big Data
| EDW | Big Data |
|---|---|
| Clean data | Unclean data |
| Gigabytes to terabytes (1 TB = 1000 GB) | Petabytes (1 PB = 1000 TB) to exabytes (1 EB = 1000 PB) |
| Simplified, structured | Complex, semi-structured or unstructured |
| Data from relational databases | Data from non-relational flat-file storage |
| Centralized data | Distributed data |
| Structured database schema | Customized schema, generated on the fly |
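The "generated" schema in the last row is what is often called schema-on-read: raw data is stored as-is, and a schema is applied only when the data is read. The Python sketch below illustrates the idea under stated assumptions; the JSON-lines input, the field names, and the `read_with_schema` helper are hypothetical and are not part of any Hadoop API.

```python
import json

# Hypothetical semi-structured input: one JSON record per line, and the records
# do not all share the same fields (typical of web logs or social data).
RAW_LINES = [
    '{"user": "ana", "action": "click", "ts": 1}',
    '{"user": "dan", "action": "view"}',
    '{"user": "ioana", "ts": 3, "device": "mobile"}',
    'not valid json',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: project each record onto `fields`,
    filling missing fields with None and counting unparseable lines."""
    rows, rejected = [], 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # "unclean" data stays in storage but is skipped here
            continue
        rows.append(tuple(record.get(field) for field in fields))
    return rows, rejected

# Two analyses read the same raw data through two different schemas.
clicks, bad = read_with_schema(RAW_LINES, ["user", "action"])
devices, _ = read_with_schema(RAW_LINES, ["user", "device"])
```

Nothing about the stored lines changes between the two reads; only the projection does, which is the contrast with a fixed relational schema defined before loading.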
Big Data Solutions
Microsoft Big Data Solution
Big Data Processing using Hadoop Framework
Big Data Architecture
• Data sources: web logs, streaming data, social data, transactional data (RDBMS)
• Hadoop: Hive, Pig, MapReduce (job execution), HBase (NoSQL DB), HDFS (Hadoop Distributed File System)
• Processed data
• Data load using Sqoop
• ETL process
• Enterprise Data Warehouse
• Big Data Analytics
1. Pre-Hadoop Processing
Possible problems
• Incorrect data captured from source systems
• Incorrect storage of data
• Incomplete or incorrect replication
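One way to catch these problems is to reconcile the source extract with what actually landed in storage, record by record rather than by count alone. The sketch below is a minimal Python illustration, assuming both sides can be read as tuples of field values; `record_digest` and `compare_loads` are hypothetical helpers, and SHA-256 is used only as a change detector.

```python
import hashlib
from collections import Counter

def record_digest(record):
    """SHA-256 digest of one record, used only to detect changed content."""
    # Fields are joined with the ASCII unit separator, assumed absent from the data.
    return hashlib.sha256("\x1f".join(map(str, record)).encode("utf-8")).hexdigest()

def compare_loads(source_rows, loaded_rows):
    """Summarize differences between source records and their loaded copy."""
    src = Counter(record_digest(r) for r in source_rows)
    dst = Counter(record_digest(r) for r in loaded_rows)
    return {
        "source_count": sum(src.values()),
        "loaded_count": sum(dst.values()),
        "missing": sum((src - dst).values()),     # in the source, not loaded
        "unexpected": sum((dst - src).values()),  # loaded, but not in the source
    }

source = [(1, "ana", "RO"), (2, "dan", "RO"), (3, "ioana", "MD")]
loaded = [(1, "ana", "RO"), (2, "dan", "R0"), (2, "dan", "R0")]  # altered + duplicated
report = compare_loads(source, loaded)
```

Both sides hold three records, so a count-only check would pass, while the digest comparison reports two missing and two unexpected records. The same comparison can be run between two replicas to detect incomplete or incorrect replication.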
2. Map-Reduce Process Validation
Possible problems
• Coding issues in MapReduce jobs
• Jobs that work correctly on a single standalone node but incorrectly on multiple nodes
• Incorrect aggregations, node configurations, and output formats
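The second problem is easy to miss because a single-node run never exercises combining and merging across splits. The following sketch simulates a MapReduce job in plain Python (no Hadoop involved) to show how a combiner that averages per split gives the right answer with one split and a wrong one with two. `run_job` and the combiner/reducer functions are hypothetical; the fix shown, carrying (sum, count) pairs to the reducer, is the usual way to make an average safe to combine.

```python
from collections import defaultdict

def run_job(records, num_splits, combiner, reducer):
    """Simulate a MapReduce job in one process: map and combine each input split
    independently (as separate nodes would), shuffle by key, then reduce."""
    splits = [records[i::num_splits] for i in range(num_splits)]
    shuffled = defaultdict(list)
    for split in splits:
        grouped = defaultdict(list)
        for key, value in split:  # map phase emits (key, value) pairs unchanged
            grouped[key].append(value)
        for key, values in grouped.items():
            shuffled[key].append(combiner(values))  # per-split combine step
    return {key: reducer(partials) for key, partials in shuffled.items()}

# Buggy: average per split, then average the averages.
def buggy_combiner(values):
    return sum(values) / len(values)

def buggy_reducer(partials):
    return sum(partials) / len(partials)

# Correct: carry (sum, count) through the combiner and divide only at the end.
def avg_combiner(values):
    return (sum(values), len(values))

def avg_reducer(partials):
    return sum(s for s, _ in partials) / sum(c for _, c in partials)

records = [("ro", 10), ("ro", 20), ("ro", 60), ("md", 5)]
one_split = run_job(records, 1, buggy_combiner, buggy_reducer)
two_splits = run_job(records, 2, buggy_combiner, buggy_reducer)
fixed = run_job(records, 2, avg_combiner, avg_reducer)
```

With one split the buggy job reports 30.0 for "ro"; with two splits it reports 27.5, while the (sum, count) version returns 30.0 regardless of how the input is split. Running the same job at several split counts and comparing outputs is a cheap regression check for this class of bug.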
3. Data Extract and Load Process
Possible problems
• Incorrectly applied transformation rules
• Incomplete data extract from HDFS
• Incorrect load of HDFS files into analysis tools
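Transformation rules can be tested by having the tester re-implement each rule independently and comparing the result with what was actually loaded. The Python sketch below assumes a hypothetical rule (amounts stored in cents become currency units; country codes are trimmed and upper-cased) and rows keyed by a unique order ID; it reports rows that were not loaded, rows with no source, and rows whose values differ.

```python
from decimal import Decimal

def expected_transform(row):
    """Hypothetical rule, re-implemented by the tester: cents -> currency units,
    country code trimmed and upper-cased."""
    order_id, amount_cents, country = row
    return (order_id, Decimal(amount_cents) / 100, country.strip().upper())

def validate_etl(extracted, loaded):
    """Compare warehouse rows with the rule applied to the HDFS extract, by order ID."""
    expected = {row[0]: expected_transform(row) for row in extracted}
    actual = {row[0]: row for row in loaded}
    common = expected.keys() & actual.keys()
    return {
        "not_loaded": sorted(expected.keys() - actual.keys()),  # incomplete load
        "extra": sorted(actual.keys() - expected.keys()),       # rows with no source
        "wrong_values": sorted(k for k in common if expected[k] != actual[k]),
    }

extracted = [(1, 1999, "ro"), (2, 500, " md"), (3, 120, "RO")]
loaded = [(1, Decimal("19.99"), "RO"), (2, Decimal("5.00"), "md")]
result = validate_etl(extracted, loaded)
```

Here order 3 never reached the target, and order 2 was loaded without the upper-casing step; `Decimal` is used so that monetary values compare exactly rather than through floating-point rounding.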
Reports testing
Possible problems
• Report definitions not set according to requirements
• Report data issues
• Layout and format issues
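Report data issues can be caught by recomputing report figures from the detail data, and layout rules can be checked in the same pass. This Python sketch assumes a hypothetical plain-text report with one `region: total` line per region and totals written with two decimal places; the 0.005 tolerance is an example value for rounding to cents.

```python
import re
from collections import defaultdict

AMOUNT_FORMAT = re.compile(r"-?\d+\.\d{2}")  # assumed layout rule: two decimals

def recompute_totals(detail_rows):
    """Aggregate (region, amount) detail rows independently of the report tool."""
    totals = defaultdict(float)
    for region, amount in detail_rows:
        totals[region] += amount
    return dict(totals)

def check_report(report_lines, detail_rows, tolerance=0.005):
    """Return a list of format, data, and completeness problems in the report."""
    expected = recompute_totals(detail_rows)
    problems, seen = [], set()
    for line in report_lines:
        region, _, value = line.partition(": ")
        seen.add(region)
        if not AMOUNT_FORMAT.fullmatch(value):
            problems.append(f"format: {line!r}")
        elif region not in expected:
            problems.append(f"unknown region: {region}")
        elif abs(float(value) - expected[region]) > tolerance:
            problems.append(
                f"data: {region} shows {value}, expected {expected[region]:.2f}")
    problems.extend(f"missing region: {r}" for r in sorted(expected.keys() - seen))
    return problems

detail = [("North", 100.0), ("North", 50.5), ("South", 20.0), ("East", 7.25)]
issues = check_report(["North: 150.50", "South: 25.00", "East: 7.3"], detail)
```

The call above flags a data issue for South and a format issue for East; omitting a region from the report entirely would instead produce a "missing region" entry.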
Non-Functional Testing
Possible problems
• Imbalance in input splits
• Redundant sorts
• Moving most of the aggregation computation to the reduce phase
• Node failures
• Data corruption
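Imbalance is usually measured by comparing the largest unit of work with the average one, since the slowest split or partition determines when the job finishes. The sketch below computes that ratio in Python. The 1.5 threshold is an arbitrary example, and `toy_partition` is a hypothetical deterministic partitioner standing in for Hadoop's hash-based default (Python's built-in `hash` for strings is randomized per process, so it is avoided here).

```python
from collections import Counter

def imbalance(sizes, threshold=1.5):
    """Return (max/mean ratio, indices of units larger than threshold * mean).

    `sizes` can be records or bytes per input split or per reducer partition.
    `threshold` is an arbitrary example cut-off, not a Hadoop setting."""
    mean = sum(sizes) / len(sizes) if sizes else 0
    if mean == 0:
        return 0.0, []
    ratio = max(sizes) / mean
    stragglers = [i for i, size in enumerate(sizes) if size > threshold * mean]
    return ratio, stragglers

def toy_partition(key, num_reducers):
    """Hypothetical deterministic partitioner: sum of character codes modulo N."""
    return sum(map(ord, key)) % num_reducers

# Input-split imbalance: one split is five times larger than the others (MB).
split_ratio, slow_splits = imbalance([100, 100, 100, 500])

# Key skew: one hot key sends most records to a single reducer.
keys = ["ro"] * 90 + ["md"] * 5 + ["bg"] * 5
counts = Counter(toy_partition(k, 3) for k in keys)
partition_sizes = [counts.get(i, 0) for i in range(3)]
partition_ratio, hot_partitions = imbalance(partition_sizes)
```

The split example yields a ratio of 2.5 with split 3 as the straggler; in the key-skew example, partition 0 receives 95 of the 100 records while partition 1 receives none.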
What is new for the tester
• Semi-structured and unstructured data
• Immense volumes of dynamic, complex data
• Test environment
• Big Data ecosystem
• Pure programming tools
• Non-SQL queries
Testing Big Data
• Big
• Fast
• Complex
• Rewarding
Q&A
Thank you!
Please fill in your evaluation form.