March 14, 2016 Sam Siewert
SE420 - Software Quality Assurance
Lecture 9 – Negative Testing, Defect Tracking and Root-Cause Analysis
http://dilbert.com/strip/2010-08-21
Reminders
Assignment #4
Remaining Assignments [Top Down / Bottom-Up]
– #5 – Design, Module Unit Tests and Regression Suite
– #6 – Complete Code, Refine and Run all V&V Tests and Deliver
Track Bugs with Bugzilla - http://prclab.pr.erau.edu/
Import your Project Code into GitHub - https://github.com/
Negative Testing, Defect Tracking and Root-Cause Analysis
http://www.nasa.gov/pdf/65776main_noaa_np_mishap.pdf, http://en.wikipedia.org/wiki/NOAA-19
Integration and Test
Integrate Software Modules [Units] and Hardware Components into Sub-systems
Test Focus on Interfaces [Function, Message, Shared Memory, Hardware], Protocols, and Interoperability of Modules
Test Types – Goals Today
Positive Tests
– Functional Software Interface Tests
  Functions calling Functions – API
  Message Passing – Local Message Queues, Network, Client-Server
  Shared Memory – Synchronization, Buffers
– Hardware Interface Tests
  Drivers and Device Interfaces
  Firmware [ROM Code, Run out of Reset]
Negative Tests
– Software Interface Faults
– Hardware Interface Fault Injection
Diagnostics [Built-in Self-Test]
Unit Interoperability
– Sub-system Resource Testing – Memory, CPU, I/O, Storage, Power
– Protocols – Message Acknowledgement, Command/Response, Background Commands, Peer-to-Peer, etc.
Performance Tests – Profiles and Traces
Bug Tracking, Defect Rate, How to Use for Project and SQA Management
Root-Cause Analysis [RCA]
Wrap-Up – JPL Mars Pathfinder Story
Outline for Every Integration Test
1. Check out Specific Source Code Test Configuration – CMVC Tools, Git
   – Collection of Modules [Units] Tagged by Revision Control – OR Current
2. Build and Link Modules (*.o) and Libraries (*.a) into Sub-system to Test
3. Load / Install Sub-system Code onto Test Hardware Platform of Known Configuration
   – Record key hardware configuration parameters
   – E.g. for I/O HW config – lspci, lsusb
   – General config – hwinfo
   – Linux OS kernel build config – uname -a
   – cat /proc/meminfo, cat /proc/cpuinfo
4. Run Integrated Test(s) [with Gcov, Lcov, Gprof]
5. Review Expected Syslogs and Terminal Output for Each Feature
6. Review Performance Profiles
7. Track Bugs, Anomalies, and Disposition as Defects
Bug Open/Close Rates and Readiness
Controversy – Bug Counts, Closure and Prediction of Phase Transition Readiness
– E.g. Unit to I&T to System Test to Acceptance Test to Shipment
– Can Be Inaccurate due to Unsatisfactory Testing or Lack of Criteria
– Guideline for Project Management [Compared to Guessing!]
– Not All Reported Bugs Become Defects [Test Case Errors, Human Error]
http://www.testandverification.com/DVClub/24_Jan_2011/Greg_Smith.pdf
[Figure: bug open/close rate chart – axes: Test Case Coverage (e.g. Code Path Coverage) vs. Bug Counts (Reported, Not Verified as Defect)]
Root-Cause Analysis
Field Issue – Anomaly, Reported Bug, Data Corruption, …
– Software Defect?
– Hardware Reliability
– User Error
Reproducibility
– Capture Conditions via Logging
– Recreate Scenario in SQA / QA Lab
Trace to Root-Cause
– Assert – Analysis Triggers
– Propose Fixes
– Apply and Regression Test
– Release Maintenance Patch
Case Study – Mars Pathfinder Story
JPL Mission Flown to Mars, Landing on July 4th, 1997
Pathfinder Suffered Rolling Resets Shortly After Landing, During Surface Operations
VxWorks RTOS Used
Reproduction of Anomaly on the Ground
Root-Cause Analysis
Proposed Fix
Data Driven CPU Loading
Root-Cause on Pathfinder was a Combination of Issues
1. Software Re-use and Unfortunate Default in Pipe [INVERSION_SAFE, PRIORITY_ORDER, FIFO_DEFAULT]
2. Unbounded Priority Inversion [Interoperability Issue]
3. Increased Loading Due to Meteorological Analysis of Candidate Landing Sites [Performance and Interoperability]
http://www.cse.uaa.alaska.edu/~ssiewert/archive/IBM-Out-of-print/soc-5.pdf
Note on Data Driven Algorithms and CPU Loading
Real-Time Algorithms Ideally have Fixed Computational Demands per Request
– Provide Predictable Response, Enables Accurate Rate-Monotonic Analysis
– Rate-Monotonic Theory Requires Known C, T, D Inputs [CPU Required, Request Rate, Deadline Relative to Request Time]
Computer Vision and Image Processing Depends on Data from Instrument Observation
– Parsing Scene for Linear Segments [Edges]
– Finding Elliptical or Circular Objects [Craters, Holes, etc.]
– Number of Features Found and Processed will Vary!
– Optical Navigation – Making an Impact: AI Group at JPL
[Figure: Hough Linear Example]
[Figure: Hough Circular Example]
Discussion …
List of Theories for Root Cause [Good List, From OS and General Engineering Judgement]
Suggestions for Teamwork [Good Approaches – Brainstorm, Gather all Cognizant Engineers into One Room – JPL, Wind River, RAD6000]
Scenario and Anomaly [Rolling Resets Shortly After Landing]
Reproduction on Ground System
Root-Cause: Software Re-Use and Lack of Default to Inversion-Safe MUTEX in POSIX Pipes, Triggered by Increased CPU Loading from Meteorological Analysis of Landing Sites
Ground Verification, then Uplink to Enable the Inversion-Safe Option for the Hidden MUTEX
Mission Saved and Quite Successful!
Diagnostic Tests
Primarily Hardware Tests, Driven by Software
Could be OS Test, E.g. During Boot of System
– CPU
– I/O
– Network
– Memory Test
– File System Test
– OS Services
Memory Test
– Simple – Walking 1's, Address Bus Test, Pattern Tests – all Read-after-Write to Each Address
– Advanced – ECC, SoC Drawer Paper
E.g. Linux Boot-up Process for CentOS 6.x
BIST – Built-in Self Tests
SW Driven and Controlled Diagnostics [Firmware]
Key to Hardware Verification – Cooperative Hardware and Firmware Mode
Make Available for Root-Cause Analysis Post-Ship or During I&T and System Testing
E.g. Dell Laptops – LCD BIST
Disk Drive Test-Unit-Ready – sg_turs, T10 TUR
Performance Tests
Profiling
– Gprof – Open source tool [similar to Gcov, but for Profiling]
– VTune – Commercial Tool from Intel
– Logic Analyzer and HP's SPA (Statistical Performance Analysis)
Tracing
– E.g. Timestamps Output to Syslog
Statistics
– top, htop
– iostat
– memstat
Workloads
– Iometer
– stress
Performance – Sysprof
Shows What is Using CPU on the Whole System, Rather than a Profile of One Application
– Sub-System [Service]
Gprof
Simple: compile with the -pg option, run the program, then run gprof on gmon.out to get the analysis
%make
cc -O3 -Wall -pg -msse3 -malign-double -g -c raidtest.c
raidtest.c: In function 'main':
raidtest.c:99: warning: format '%d' expects type 'int', but argument 2 has type 'long unsigned int'
raidtest.c:68: warning: unused variable 'aveRate'
raidtest.c:68: warning: unused variable 'totalRate'
raidtest.c:66: warning: unused variable 'rc'
raidtest.c:212: warning: control reaches end of non-void function
cc -O3 -Wall -pg -msse3 -malign-double -g -c raidlib.c
cc -O3 -Wall -pg -msse3 -malign-double -g -o raidtest raidtest.o raidlib.o
%./raidtest
Will default to 1000 iterations
Architecture validation: sizeof(unsigned long long)=8
RAID Operations Performance Test
Test Done in 453 microsecs for 1000 iterations
2207505.518764 RAID ops computed per second
%ls
Makefile gmon.out raidlib.h raidlib64.c raidtest raidtest.o
Makefile64 raidlib.c raidlib.o raidlib64.h raidtest.c raidtest64
%gprof raidtest gmon.out > raidtest_analysis.txt
Gprof Analysis
1 Million Iterations of RAID Test XOR and Rebuild
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ns/call  ns/call  name
82.13      1.54      1.54                             main
15.47      1.83      0.29  2000001   145.38   145.38  xorLBA
 2.67      1.88      0.05  2000001    25.07    25.07  rebuildLBA

% time: the percentage of the total running time of the program used by this function.
cumulative seconds: a running sum of the number of seconds accounted for by this function and those listed above it.
self seconds: the number of seconds accounted for by this function alone. …
calls: the number of times this function was invoked, if this function is profiled, else blank.
self ms/call: the average number of milliseconds spent in this function per call, …
total ms/call: the average number of milliseconds spent in this function and its descendents per call, …
name: the name of the function. …
RAID Operations Performance Test Test Done in 206417 microsecs for 1000000 iterations 4844562.221135 RAID ops computed per second
Call Graph Profile from Gprof
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.53% of 1.88 seconds

index % time    self  children    called            name
                                                    <spontaneous>
[1]    100.0    1.54      0.34                      main [1]
                0.29      0.00  2000001/2000001         xorLBA [2]
                0.05      0.00  2000001/2000001         rebuildLBA [3]
-----------------------------------------------
                0.29      0.00  2000001/2000001     main [1]
[2]     15.4    0.29      0.00  2000001             xorLBA [2]
-----------------------------------------------
                0.05      0.00  2000001/2000001     main [1]
[3]      2.7    0.05      0.00  2000001             rebuildLBA [3]
-----------------------------------------------

This table describes the call tree of the program, and was sorted by the total amount of time spent in each function and its children…
% time: This is the percentage of the `total' time that was spent in this function and its children…
self: This is the total amount of time spent in this function.
children: This is the total amount of time propagated into this function by its children.
called: This is the number of times the function was called…
Discussion and Q&A
I&T is to Verify and Validate Sub-systems from Integrated SW Units and HW Components, in a Configuration
– Unit Tests Precede
– Integrate and Configure
– Function/Feature Positive Tests
– Negative Testing [Fault Injection]
– Interoperability Testing
– Diagnostics, Root-Cause, and Bug Tracking Critical
New Aspects
– Performance Testing [of Integrated and Configured Sub-systems]
– Determine Readiness for Final Integration and Entry to System Testing
– Provides Regression Test Cases for System Test
Precedes System Test, Where Sub-systems are …
– Fully Integrated
– Configured Similar to Deployment [Perhaps Not Exact – E.g. Spacecraft in Thermal-Vac Testing]
– Stimulated with Tests Replicating Operations