![Page 1: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/1.jpg)
Bug bites Elephant?
Test-driven Quality Assurance
in Big Data Application Development
Dr. Dominik Benz, Inovex GmbH
2013/06/03, Berlin Buzzwords
![Page 2: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/2.jpg)
Who speaks… … the Elephant language?
Class A extends Mapper…
ROI, $$, …
apt-get install…
TDD! Write/execute tests, specify acceptance criteria, …
![Page 3: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/3.jpg)
The road… … to Big Data QA
our Big Data QA problem
the FitNesse approach
test data definition / selection
job & workflow control
result inspection
![Page 4: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/4.jpg)
QA problem
Web Intelligence @ 1&1
‣ Logfiles: ~ 1 billion log events / day, ~ 1 TB (thrift)
‣ Hadoop Cluster: chains of MR jobs, running on 20 nodes / 8 cores / 96 GB RAM (CDH)
‣ DWH: BI reporting, web analytics, …
![Page 5: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/5.jpg)
QA problem
An exemplary workflow
Log Files (thrift) → MR job 1 → Intermediate result (avro) → MR job 2 → … → DWH (RDBMS)
‣ create (sample) input data?
‣ inspect (binary) formats?
‣ control workflows?
![Page 6: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/6.jpg)
QA problem
Existing Approaches

| method | tests what? | issues for our use case |
| --- | --- | --- |
| JUnit | isolated functions | no integration, Java syntax |
| MRUnit | 1 mapper + 1 reducer | "little" integration, Java syntax |
| iTest | hadoop jobs/workflows | Java / Groovy syntax |
| Scripts/CLI | (manual) scripting/inspection | "script chaos", syntax |

FitNesse as suitable addition / solution!
![Page 7: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/7.jpg)
The road… … to Big Data QA
Big Data QA is different!
the FitNesse approach
test data definition / selection
job & workflow control
result inspection
![Page 8: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/8.jpg)
FitNesse In a nutshell
‣ "fully integrated standalone wiki and acceptance testing framework"
‣ "executable" wiki pages (returning test results)
‣ (almost) natural language test specification
‣ connection to SUT via (Java) "Fixtures"
![Page 9: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/9.jpg)
FitNesse Architecture Overview
Browser → FitNesse Server → Fixtures → System under Test
Wiki page (script table row): | check | num results | 3 |
Fixture: public int numResults() { ... }
‣ "Calling Java methods from wiki", compare return values
‣ Integrates with REST, Jenkins…
![Page 10: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/10.jpg)
FitNesse An Exemplary Test
![Page 11: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/11.jpg)
FitNesse Exemplary Test Source
!path /home/inovex/lib/*.jar
| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |
![Page 12: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/12.jpg)
FitNesse Hadoop Fixture Java Code
public class Hadoop {
    public boolean uploadToHdfs(String localFile, String remoteFile) {...}
    public boolean hadoopJobFromJar(String jar, String input, String output) {...}
    public String jobOutput() {...}
    public String numberOfOutputFiles() {...}
}
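The link between the wiki rows and the fixture methods is a naming convention: in a script table row, the cells at alternating positions form the method name and the cells in between are the arguments. The following is a simplified illustration of that convention only; `ScriptRowMapper` and its methods are hypothetical names and not part of FitNesse, whose real implementation handles more cases (keywords such as `check` and `show`, type conversion, and so on).

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration of script-table name mapping; not FitNesse's own code. */
final class ScriptRowMapper {

    /** Cells at even positions (0, 2, ...) carry the words of the method name. */
    static List<String> nameCells(List<String> row) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < row.size(); i += 2) {
            cells.add(row.get(i));
        }
        return cells;
    }

    /** Cells at odd positions (1, 3, ...) carry the arguments. */
    static List<String> argumentCells(List<String> row) {
        List<String> cells = new ArrayList<>();
        for (int i = 1; i < row.size(); i += 2) {
            cells.add(row.get(i));
        }
        return cells;
    }

    /** Joins all words of the name cells into one camelCase Java method name. */
    static String methodName(List<String> nameCells) {
        StringBuilder name = new StringBuilder();
        for (String cell : nameCells) {
            for (String word : cell.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                char first = word.charAt(0);
                // The very first word starts lower-case, every later word upper-case.
                name.append(name.length() == 0
                        ? Character.toLowerCase(first)
                        : Character.toUpperCase(first));
                name.append(word.substring(1));
            }
        }
        return name.toString();
    }
}
```

Applied to the row `| upload | viewLog.csv | to hdfs | /testdata/ |`, the name cells are `upload` and `to hdfs`, which yields `uploadToHdfs` with the arguments `viewLog.csv` and `/testdata/`.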
![Page 13: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/13.jpg)
The road… … to Big Data QA
Big Data QA is different!
FitNesse Wiki test execution!
test data definition / selection
job & workflow control
result inspection
![Page 14: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/14.jpg)
Test Data
CSV
![Page 15: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/15.jpg)
Test Data
Thrift
‣ Big Data: efficient data transfer among heterogeneous sources
‣ Define the interface via IDL; compilers exist for many languages
![Page 16: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/16.jpg)
Test Data
Real World Data
‣ Dev/Test Hadoop cluster: same hardware as prod, but fewer nodes
‣ (random/biased) sampling, e.g. on a daily basis
‣ Feedback loop:
‣ identify "special cases" from real data
‣ include them in (manual) data definition
‣ Gradually increase test coverage / artefact quality
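The random sampling step can be illustrated with reservoir sampling, which draws a fixed-size uniform sample from a day's records in a single pass. This is only a sketch of one possible technique; the slides do not say which sampling method was used, and `LogSampler` is a hypothetical name. Biased sampling (for example, over-representing known special cases) is not covered here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: uniform random sample of k records (reservoir sampling). */
final class LogSampler {

    static <T> List<T> sample(Iterable<T> records, int k, long seed) {
        Random random = new Random(seed); // fixed seed keeps test data reproducible
        List<T> reservoir = new ArrayList<>();
        long seen = 0;
        for (T record : records) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k records.
                reservoir.add(record);
            } else {
                // Keep the new record with probability k / seen,
                // replacing a uniformly chosen slot.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < k) {
                    reservoir.set((int) slot, record);
                }
            }
        }
        return reservoir;
    }
}
```

Using a fixed seed means the same input yields the same sample, which helps when a failing test has to be re-run against identical data.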
![Page 17: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/17.jpg)
The road… … to Big Data QA
Big Data QA is different!
FitNesse Wiki test execution!
Define CSV / thrift / real-world test data!
job & workflow control
result inspection
![Page 18: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/18.jpg)
Job Control
Swiss Army Knife: Shell
‣ Execute arbitrary (shell) commands
‣ Mainly a wrapper around org.apache.commons.exec.CommandLine
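A shell fixture of this kind can be quite small. The sketch below is not the fixture from the talk: it uses the JDK's `ProcessBuilder` instead of Commons Exec so that it has no dependencies, it assumes a POSIX `sh` is available on the test machine, and the class and method names (`ShellSketch`, `execute`, `output`, `exitCode`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal shell fixture; assumes a POSIX "sh" on the PATH. */
final class ShellSketch {
    private String lastOutput = "";
    private int lastExitCode = -1;

    /** Runs the command through "sh -c" and returns true if it exited with 0. */
    public boolean execute(String command) {
        try {
            Process process = new ProcessBuilder("sh", "-c", command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            lastOutput = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            lastExitCode = process.waitFor();
            return lastExitCode == 0;
        } catch (IOException e) {
            lastOutput = e.getMessage();
            lastExitCode = -1;
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            lastExitCode = -1;
            return false;
        }
    }

    /** Combined stdout/stderr of the last command, without surrounding whitespace. */
    public String output() {
        return lastOutput.trim();
    }

    /** Exit code of the last command, or -1 if it could not be run. */
    public int exitCode() {
        return lastExitCode;
    }
}
```

Returning a boolean from `execute` lets a script table row fail when the command fails, while `output` and `exitCode` can be inspected with `show` and `check` rows.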
![Page 19: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/19.jpg)
Job Control
Hadoop Fixture
‣ Hide complexity from test authors
‣ "define" appropriate test language via (Java) method names
‣ re-use other fixtures (Shell, …) internally
![Page 20: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/20.jpg)
Job Control
Workflows & Suites
‣ FitNesse allows grouping tests into suites
‣ Suites can be used to simulate MR processing chains (MR job 1 → MR job 2)
‣ SetupSuite / TearDownSuite for creating / destroying test conditions
‣ Tests can still be executed individually
![Page 21: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/21.jpg)
The road… … to Big Data QA
Big Data QA is different!
FitNesse Wiki test execution!
Define CSV / thrift / real-world data!
Use suites & fixtures for jobs/workflows!
result inspection
![Page 22: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/22.jpg)
Results Data Warehouse / Hive
‣ Validate RDBMS contents (via JDBC)
‣ E.g. for checking the final result
‣ Or use Hive + Hive-Server to query raw data
![Page 23: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/23.jpg)
Results Pig
‣ Execute arbitrary Pig commands from a wiki page
‣ Inspect e.g. binary intermediate results (avro, …)
![Page 24: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/24.jpg)
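Results Pig Fixture extends PigServer
import java.io.IOException;
import org.apache.pig.PigServer;
public class PigConsole extends PigServer {
    // Constructor and AVRO_STORAGE_LOADER are not shown on the slide.
    public void loadAvroFileUsingAlias(String filename, String alias) throws IOException {
        this.registerQuery(alias + " = LOAD '" + filename + "' USING " + AVRO_STORAGE_LOADER + ";");
    }
}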
![Page 25: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/25.jpg)
Results Server Infrastructure
FitNesse Master
‣ TestEnvironments: ProjA, ProjB (each with dev / qs / live)
‣ TestConfigurations: ProjA, ProjB (each with dev / qs / live)
‣ Import / edit tests remotely
‣ Import / edit config remotely
ProjA slaves: Dev ProjA Slave, QS ProjA Slave, Live ProjA Slave
![Page 26: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/26.jpg)
Thank you!
Big Data QA is different!
FitNesse Wiki test execution!
Define CSV / thrift / real-world data!
Inspect results via Pig/Hive
Use suites & fixtures for jobs/workflows!
![Page 27: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/27.jpg)
Want more? Inovex trains you!
Android Developer Training (3 days, Karlsruhe/München)
Hadoop Developer Training (3 days, Karlsruhe/Köln)
Certified Scrum Developer Training (5 days, Köln)
Pentaho Data Integration Training (4 days, München/Köln)
Liferay Portal-Admin Training (3 days, Karlsruhe)
Liferay Portal-Developer Training (4 days, Karlsruhe)
information and registration at
www.inovex.de/offene-trainings
![Page 28: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/28.jpg)
Inovex @bbuzz
Stefan, Kathrin, Bernhard, Jörg, Andrew, Christian, Christian
![Page 29: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/29.jpg)
BACKUP
![Page 30: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/30.jpg)
![Page 31: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/31.jpg)
Results Demo
‣ Download & install FitNesse server
‣ Create CSV log file
‣ Run Hadoop job which counts viewed items
‣ Inspect results with Hive
![Page 32: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/32.jpg)
![Page 33: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/33.jpg)
FitNesse Exemplary Test Source
!path /home/inovex/lib/*.jar
| Table:Log File |
| /home/inovex/viewLog.csv | |
| date | user | product | browser | os |
| 2013-03-12 | john | 1 | ff | win |
| script | Hadoop |
| upload | viewLog.csv | to hdfs | /testdata/ |
| hadoop job from jar | viewLog.jar | [...] |
| show | job output |
| check | number of output files | 3 |
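The `Table:Log File` block above relies on a table fixture that turns wiki rows into a CSV file. The slides do not show that fixture, so the following is only a guess at how it might look: FitNesse passes the rows below the class name to a `doTable(List<List<String>>)` method, and this sketch assumes the first row holds the target path and all remaining rows (including the header) are written as comma-separated lines. `LogFileSketch` and `toCsvLines` are hypothetical names, and no CSV escaping is attempted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical table fixture that writes wiki rows to a CSV file. */
final class LogFileSketch {

    /** Assumption: row 0 is the path row; every later row becomes one CSV line. */
    static List<String> toCsvLines(List<List<String>> table) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i < table.size(); i++) {
            lines.add(String.join(",", table.get(i)));
        }
        return lines;
    }

    /** Entry point called for a table fixture; returns one verdict per cell. */
    public List<List<String>> doTable(List<List<String>> table) {
        String target = table.get(0).get(0);
        try {
            Files.write(Paths.get(target), toCsvLines(table), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Blank verdicts are assumed to leave the wiki cells uncoloured.
        List<List<String>> verdicts = new ArrayList<>();
        for (List<String> row : table) {
            List<String> rowVerdicts = new ArrayList<>();
            for (int c = 0; c < row.size(); c++) {
                rowVerdicts.add("");
            }
            verdicts.add(rowVerdicts);
        }
        return verdicts;
    }
}
```

Keeping the test data in the wiki page next to the job invocation means a test author can read the input and the expected result in one place.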
![Page 34: Bug bites Elephant? - 2013.berlinbuzzwords.de · Bug bites Elephant? Test-driven Quality Assurance in Big Data Application Development Dr. Dominik Benz, Inovex GmbH 2013/06/03, Berlin](https://reader033.vdocuments.us/reader033/viewer/2022050418/5f8db8a72f32f438cb029c9b/html5/thumbnails/34.jpg)
FitNesse An Exemplary Test