TRANSCRIPT
Regression Testing &
Quality Assurance
Lukas van Ginneken
August 1, 2019
Outline
• The release process and regression testing
• Roles in testing
• The release process
• Branching and merging
• Other types of testing
• Regression testing
• QOR regression testing
• Stability and Repeatability
Testing in commercial EDA
• Alpha testing
– By developers (R&D)
• for bug fixes
• for new features and enhancements
– By application engineers (AE)
• usually on designs supplied by customers
– Regression testing by the QA department
• Beta testing
– Competitive benchmarks
– Acceptance testing by the client EDA department
– Bug reporting
Commercial EDA release flow
[Flow diagram: R&D developers feed QA, which runs regressions; releases go to Applications Engineering, then to the client EDA department and on to designers, with training accompanying the client hand-offs.]
QA
• Runs regression tests
• Only QA is authorized to check in code to the master branch
• Determines the order of merges to the master branch
• Executes the release schedule
• Generally does not develop regression tests
Developers
• Develop unit tests
• Install bug test cases as regression tests
• Run regression tests before submitting a branch
Applications engineering
• Supports customers
• Interface between customers and developers
• Files bug reports
• Verifies or creates test cases for bug reports
• Creates end-to-end and QOR regression tests
• Does not run regression tests
Customers
• Supply designs, libraries, and floor plans
• Provide test cases
• Develop their own acceptance tests
Release train
• Releases run on a schedule
• Don't miss the train
Gates
Check points for merging and releasing:
1. Submit contribution to main branch (Developer)
• Verify the bug is fixed
• Add a new regression test for a new feature
• Run some tests to make sure it didn't break anything else
2. Check-in on main branch (QA)
• Smoke tests to verify the build is not DOA
• Unit tests to verify module by module
• A select set of QOR tests
3. Minor release (critical bug fixes or expedited requests) (QA)
• Feature- or bug-specific tests
• Customer-specific tests
4. Major release (new functionality) (QA)
• Comprehensive tests
Version management and QA
[Diagram: master branch commits 1-5 in parallel with a developer branch (commits 1-3); merging the stale developer branch back into master fails.]
Version management and QA
[Diagram: the developer branch periodically pulls master's commits (4, 6); conflicts are resolved on the developer branch ("resolve conflicts here"), so the final merge into master has fewer problems ("fewer problems here").]
Code coverage testing
• Function coverage
• Statement or line coverage
• Branch coverage
• Loop coverage
• Variable coverage
• Tools: Covtool, gcov, PureCoverage, many others
• A surprisingly large amount of code is not covered
• Code that is not covered is guaranteed to be untested
• Code that is covered is not guaranteed to be bug-free
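The last point can be made concrete with a tiny sketch (a hypothetical `clamp` function, not from the lecture): a test suite can execute every line and still miss a buggy input combination, so full line coverage proves execution, not correctness.

```python
def clamp(x, lo, hi):
    """Clamp x into [lo, hi] -- silently misbehaves when lo > hi."""
    if x < lo:
        x = lo
    if x > hi:
        x = hi
    return x

# These two calls execute every line and both branches...
assert clamp(-5, 0, 10) == 0
assert clamp(15, 0, 10) == 10

# ...yet the untested input combination lo > hi returns a value
# outside any sensible range: clamp(5, 8, 2) == 2.
```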
Memory tests
• Use memory debugger software
• Detect memory errors such as
– Reading uninitialized memory
– Accessing freed memory
– Exceeding array size
– Access outside a malloced block
– Memory leaks (forget to free memory)
• Flags potential problems and “unclean” code
• May not be an actual problem (a bit like lint)
• Examples: dmalloc, PurifyPlus, Electric Fence
Other types of testing
• Performance testing - e.g. gprof, Quantify
• Cache misses
• Static verification - Coverity
• Lint
Input sanity checks
Design sanity checks:
• combinational loops
• disconnected pins
• excessive levels of logic
• latches
• asynchronous logic
• data signals feeding clock inputs
• unconstrained IOs
• bad clock latency
Library sanity checks:
• unroutable pins
• off-grid pins
• use of M2
• timing sense does not match the logic function
• negative slope of NLDM tables
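A check like "combinational loops" is, at heart, cycle detection on the netlist graph. A minimal sketch, with a hypothetical dict-based netlist representation (registers are assumed to be excluded before the graph is built, since paths through them are fine):

```python
def has_combinational_loop(fanout):
    """Detect a cycle in a combinational netlist.

    fanout: dict mapping each gate to the gates it drives
    (hypothetical representation for this sketch).
    """
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on DFS stack / done
    color = {g: WHITE for g in fanout}

    def dfs(g):
        color[g] = GRAY
        for nxt in fanout.get(g, ()):
            if color.get(nxt, WHITE) == GRAY:      # back edge => loop
                return True
            if color.get(nxt, WHITE) == WHITE and nxt in fanout and dfs(nxt):
                return True
        color[g] = BLACK
        return False

    return any(color[g] == WHITE and dfs(g) for g in fanout)
```

For example, `{"a": ["b"], "b": ["c"], "c": ["a"]}` is flagged as a loop, while `{"a": ["b", "c"], "b": ["c"], "c": []}` is not.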
Regression test types
• Smoke tests
– Test that basic functionality is not totally broken
– E.g. can it read a design or execute a command
– Read DEF test, read LEF test, bring up the GUI
• Unit tests
– Test a single unit, e.g. global placer, timer, detailed router
• Bug fix tests
– Test cases that were originally filed with bug reports
– Tedious and time consuming
• Feature tests
– Features that span the system, e.g. dont_touch
• QOR tests
– QOR of separate algorithms
– QOR of whole designs
• Release tests
– Full suite of all tests
– Expensive to run in compute time
– Requires a server farm with a batch job scheduler
Regression tests
• Pass/fail
• Uniform calling method
– Executable
– Directory
– Script (always run.tcl)
• exit 0 = success
• exit N = failure
• Written in TCL
• Direct measurement of performance metrics
• Adjustable error messages (info, warning, error, fatal)
• Grepping log files from TCL
• Diff reports are appropriate for the timer, not for QOR tests
• Stack trace for crashes
• Check regressions into the same repository as the code
– Keeps track of versions
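The calling convention above (exit 0 = success, exit N = failure, plus grepping the log) can be sketched as one harness helper. The function name `classify` and the `Error:`/`Fatal:` severity pattern are assumptions for illustration, not the lecture's actual scripts:

```python
import re

def classify(exit_code, log_text):
    """Pass/fail verdict for one regression run.

    exit 0 = success, exit N = failure (the slides' convention);
    even on exit 0, an "Error:" or "Fatal:" line in the log fails
    the test. The severity prefixes are assumed for this sketch.
    """
    if exit_code != 0:
        return "FAIL"
    if re.search(r"^(Error|Fatal):", log_text, re.MULTILINE):
        return "FAIL"
    return "PASS"
```

A harness would call this after running each test's run.tcl, e.g. `classify(0, "Info: done\n")` passes while `classify(0, "Error: short found\n")` fails.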
Test harness
• Batch processing
• Cluster computing
• e.g. LSF, Slurm, LoadLeveler
QOR Metrics
• Primary metrics
– Timing violations
• WNS worst negative slack
• TNS total negative slack
• Hold time slack
• Max slew / Min slew
• Clock skew
– DRC violations
• shorts, spacing, open, via, min area, antenna
– User constraint violations
• dont_touch
• dont_use
• region violations
• blockage violations
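WNS and TNS follow directly from the endpoint slacks; a small sketch of the standard definitions (helper names are mine, not the lecture's):

```python
def wns(slacks):
    """Worst negative slack: the single worst endpoint slack,
    conventionally reported as 0 when nothing is violating."""
    return min(min(slacks), 0.0)

def tns(slacks):
    """Total negative slack: sum of all violating endpoint slacks."""
    return sum(s for s in slacks if s < 0)

slacks = [0.20, -0.10, -0.30, 0.05]   # example endpoint slacks in ns
# wns(slacks) == -0.30; tns(slacks) is about -0.40
```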
QOR Metrics
• Secondary metrics
– Power (dynamic/leakage)
• Dynamic power consumption is often difficult to determine accurately
– IR drop, EM metrics
• These depend on dynamic power consumption
– Utilization
• Not important, except as an indicator
– Congestion
• Total #overflows
• Worst gcell overflow
– Wirelength
• Half perimeter bounding box
• Steiner tree
• Global routing
– #vias
– Testability
– Run time
– Memory usage
Interpretation of constraints
• Is the clock frequency a hard constraint or merely a suggestion?
• Should the system violate timing to meet an area constraint?
• Should the system violate min/max slew to meet timing or power?
• Setup violations can usually be fixed by slowing down the clock
• Hold violations are baked into the silicon (they can't be fixed by slowing the clock)
• “I gave you impossible constraints but I still want a reasonable result”
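The setup-vs-hold asymmetry in the last two bullets falls out of the simplified single-cycle slack formulas (a textbook simplification that ignores clock skew and uncertainty):

```python
def setup_slack(t_clk, path_delay_max, t_setup):
    """Setup slack grows with the clock period: data must arrive
    t_setup before the *next* clock edge."""
    return t_clk - path_delay_max - t_setup

def hold_slack(path_delay_min, t_hold):
    """Hold slack does not involve the clock period at all: data must
    stay stable t_hold after the *same* clock edge."""
    return path_delay_min - t_hold

# Slowing the clock from 1.0 ns to 2.0 ns fixes a setup violation:
# setup_slack(1.0, 1.1, 0.05) < 0 but setup_slack(2.0, 1.1, 0.05) > 0.
# A hold violation is untouched: hold_slack(0.02, 0.05) < 0 either way.
```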
Challenges in QOR testing
• Stability – Small changes in the input can cause big differences in the output
– Butterfly effect
• Repeatability
• Interpretation of constraints
• Size of the solution space
• Correlation of metrics
– utilization to congestion
– global routing congestion to detailed routing DRCs
– synthesis slack vs post opt slack
Repeatability testing
• Multiple runs with seemingly identical inputs can result in different outputs
• The inputs are not really identical: the differences can be in unexpected and unintended places, such as:
– User name, working directory, date and time
– Host name, host configuration, OS version
– Environment variables
– Search paths
• A common path for divergence is something like:
– char *pwd = strdup(getenv("PWD"));
– This allocates a variable amount of memory, depending on the path name of the current directory
– This causes subsequent malloc() calls to return different addresses
– This causes many objects to have different addresses
– An algorithm uses address-dependent logic, such as sorting or hashing by address
– This changes net order, cell order, or pin order
– This causes heuristics to make different optimizations
– Normal instability amplifies the divergence
• Tracking down the source of the divergence isn't always easy
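The divergence chain above hinges on address-dependent ordering. The same trap is easy to sketch in CPython, where id() is essentially an object's address, so sorting by it yields an order that can differ between runs even though the "inputs" look identical (a deliberately fragile sketch, not production code):

```python
class Cell:
    def __init__(self, name):
        self.name = name

cells = [Cell(n) for n in ("u1", "u2", "u3", "u4")]

# FRAGILE: order depends on where the allocator happened to place each
# object, which can vary run to run (address-space randomization, or
# earlier allocations like the strdup(getenv("PWD")) example above).
by_address = sorted(cells, key=id)

# STABLE: order depends only on the design itself.
by_name = sorted(cells, key=lambda c: c.name)
```

Using a design-derived key (name, index) instead of an address is the usual repeatability fix.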
The stability problem
• Small changes in the input can have big consequences
• Butterfly effect
– Inherent to deterministic nonlinear systems
– Examples: the weather, the stock market, fluid dynamics, discrete optimization heuristics
• "Meaningless" changes, such as:
– Module, file, net, cell, or pin order
– Net, cell, or pin names, or line numbers
• Actual small changes, such as inverting a single pin or moving a pad
• Divergence tends to grow with more optimization
• Discrete decisions are the main causes, e.g.:
– automatic macro and pad placement
– buffer insertion
– routing order
• Smooth algorithms are less unstable, e.g. global placement
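How a "meaningless" ordering change flips a discrete heuristic's result can be shown with first-fit packing — a stand-in for order-sensitive placement and routing decisions, not an actual EDA algorithm:

```python
def first_fit(items, capacity):
    """Greedy first-fit: put each item into the first bin it fits in,
    opening a new bin when none fits."""
    bins = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

# The same multiset of items, in two "meaningless" orderings:
# first_fit([6, 5, 4, 3, 2], 10) uses 2 bins,
# first_fit([2, 3, 4, 5, 6], 10) uses 3 bins.
```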
The “uses” of instability
• Instability is sometimes used as an optimization technique
– Seemingly arbitrary parameter variations cause different results
– Run a number of nearly identical runs in parallel
– Pick the best one at the end
• Seemingly arbitrary parameter or script changes can cause a regression test to pass or fail
• Or they can "fix" bugs
• Can lead to organizational "thrashing": making frequent code changes that appear to be improvements but are actually arbitrary
QOR testing
• Instability means it's difficult to prove a parameter setting is better without doing extensive, well-controlled experiments
• Similar to problems in the medical field: it requires a "study"
• Many data points required
• Each data point takes hours of run time
• Many parameters to tweak: a huge search space
• Hard to separate signal from noise
• The underlying system keeps changing
– which nullifies learned lessons
So what do people actually do?
• Make a tweak and test it.
• Due to instability, this actually has a decent chance of making things better
• Call it good and report the problem solved.
• If it didn't work, try a different tweak.
• Some of these are pure luck
• Some of them are genuine improvements
Tendency to tack on additional fixing steps at the end of the scenario:
• One last sizing
• One last buffering
• One last legalization
• One last detailed route
• Avoids disturbing the process before the point where the problem becomes obvious
QOR regressions
• Create unit tests
• Simple global metric
– Global placement -> HPWL < limit
– Sizing -> TNS == 0
– CTS -> clock skew
– Global routing -> congestion
• Don't diff entire log files or entire designs
• exit 0 if successful
• exit 1 if unsuccessful
• Keep blocks small enough to have reasonable run time
• Keep blocks large enough to be meaningful
• Maybe about 5,000 - 100,000 cells
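A QOR unit test of the "global placement -> HPWL < limit" form reduces to extracting one number from the run log and comparing it to a stored limit. A sketch with an assumed log line format ("HPWL: <value>"), since real tools' log formats vary:

```python
import re

def hpwl_check(log_text, limit):
    """Return 0 (pass) if the reported HPWL is under the limit, else 1,
    matching the slides' exit-code convention.

    Assumes the placer log contains a line like "HPWL: 123456.7";
    the exact format is an assumption for this sketch.
    """
    m = re.search(r"^HPWL:\s*([0-9.]+)", log_text, re.MULTILINE)
    if m is None:
        return 1                      # a missing metric counts as failure
    return 0 if float(m.group(1)) < limit else 1
```

A run.tcl wrapper would feed this the log and `exit` with the returned code, keeping the test pass/fail without diffing the whole log.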
QOR Integration tests
• Must be somewhat challenging
• Usually a benchmark design
• High maintenance
• Use for release testing, not the overnight build