on the origin of bugs - t&vs · continuous integration trunk based development code refactoring...

ON THE ORIGIN OF BUGS Or… Understanding Hardware Bugs And How To Avoid Them. BRYAN DICKMAN, DVCLUB: NOVEMBER 26TH 2019. Where do bugs come from? What are the common ways that bugs are introduced into designs? What can design engineers and verification engineers jointly do to avoid them?


ON THE ORIGIN OF BUGS
Or…

Understanding Hardware Bugs And How To Avoid Them

BRYAN DICKMAN DVCLUB: NOVEMBER 26TH 2019


Where do bugs come from? What are the common ways that bugs are introduced into designs? What can design engineers and verification engineers jointly do to avoid them?


BRYAN DICKMAN: VALYTIC CONSULTING LIMITED

¡ A senior technology manager and people leader

¡ Recognised industry expert with 35 years of experience in the semiconductor industry

¡ 22 years of leading engineering teams at Arm

¡ IP Design-Verification delivery over an era when many new methodologies were introduced into engineering workflows

¡ Development of Design and Verification best practices

¡ Engineering data strategies that exploit modern data science practices to drive rich engineering insights and process/workflow improvements

¡ Senior Director within the Technology Services Group

¡ Experienced developer of people

¡ T&VS Associate

¡ Acuerdo Limited Associate (Joe Convey)


https://www.linkedin.com/in/bryan-dickman-74b1a914/?originalSubdomain=uk


INTRODUCTION AND DISCLAIMERS

¡ The following is a personal perspective

¡ None of the data shown is real data – it is hypothetical

¡ It’s all established thinking

¡ but some of it is more established in software development today

¡ I have a whitepaper on the subject to follow


WHY DO WE STILL HAVE BUGS?

…and jobs as DV engineers?


BUG FREE UTOPIA?

¡ ASSERTION: All complex designs contain bugs

¡ No design can ever be 100% bug-free, no matter how hard you try!

¡ Verification is a time- and resource-limited quest to find as many bugs as possible before shipping

¡ Verification completeness is not generally achievable

¡ Test planning can never be 100% complete

¡ Coverage models can never be 100% complete

¡ Infinite verification cycles are not possible

¡ …and so bugs will be missed

¡ Verification should employ strategies that increase the chances of finding all bugs

¡ Designers should employ strategies that minimize bug risks


TWO DISCIPLINES…With A Lot In Common


DESIGN: Logic Design, Behavioral modelling, RTL coding, Synthesis, DFT, Timing Analysis, Power Analysis, Implementation

VERIFICATION: 'ABC’, Testbench Design, Coverage, Test Planning, UVM, Software test development, Hardware Acceleration, Deep Formal

IN COMMON: Specification, Architecture, Micro-Architecture design, System Architecture, Benchmarking, Simulation, Waveform Analysis, Software understanding, Verification techniques, Assertions, Formal (for assertions), Scripting/building workflows, Data Analysis/Data Science


WHEN IS A BUG A BUG?…Only When You See It?

¡ Observable

¡ An error is eventually seen (preferably detected by verification)

¡ Lockups and denial of service – easy to detect – a reset might work around them

¡ Data corruption – if silent/undetected, consequences might be severe

¡ How ‘rare’ is it?

¡ What takes days and weeks to detect in the verification env might manifest as a debilitating failure rate in silicon

¡ Non-Observable

¡ Might be ‘spotted’ by chance, or found with formal?

¡ Might be an error in coding that is masked or unreachable

¡ Fix it or leave it? Weigh up the risks!

¡ Vulnerabilities

¡ Reachable by malicious code, a risk for security!

¡ Non-operational functions

¡ Debug or event counters – software developers impacted

¡ Safety-critical and Reliability

¡ DRAM and logic sensitivity to SEUs – e.g. ECC functions for SECDED – error rates in the range 10^-10 to 10^-17 errors/bit-h

¡ Performance and Power

¡ Non-functional – but entitled performance is lost through coding error, or too much power is consumed by the device

¡ Clocking and Reset

¡ Asynchronous events can lead to meta-stability


WHEN ARE BUGS FOUND?…The Sooner The Cheaper!

¡ OPINION: most bugs should be flushed out in the early stages (say with 30% of the work); these are the easy finds.

¡ Verification work (consumption of resources and human effort to find, debug and fix) is disproportionately high for the remaining 10% of bugs

¡ But these are often the critical ones

¡ Focus workflow and methodology improvements on this later stage for the biggest ROI


COMMON ROOT CAUSES


[Figure: root-cause diagram with two branches, Creating and Changing. Labels: Specifications, Interfaces, Typos, Copy-Paste, Missing Code, Verification, Performance Tuning, Bug Fixing, Refactoring]

¡ Bugs occur while creating code

¡ Spec errors/ambiguities

¡ Interface misunderstandings/spec

¡ Typos and Copy-Paste errors

¡ Incorrect verification env assumptions

¡ Missing code

¡ Or while changing code (code churn)

¡ Adding features

¡ Fixing bugs

¡ Performance tuning

¡ Or as a consequence of COMPLEXITY


A WORD ON COMPLEXITY


¡ It’s complicated because…

¡ The architecture specification is complex to meet function and performance targets

¡ The architecture-implementation (or micro-architecture) uses a catalog of complex ‘clever-tricks’ to meet performance targets (the Art of Design)

¡ It’s complicated due to…

¡ Behavior is no longer fully understood.

¡ Design partitioning is sub-optimal

¡ Code style is more implied-gates and less behavioral

¡ Comments are missing or worse – incorrect

¡ Code health has deteriorated – accumulation of technical (code) debt

¡ Is it measurable?

¡ Engineering experience and gut feel!

¡ LOCs

¡ McCabe Cyclomatic Complexity

¡ Code indentation complexity

¡ Other metrics from Verilog compilers and linting tools

¡ E.g. #registers, #wires, gated clocks, redundant code, logic depth

¡ A search will reveal a limited number of tools to measure RTL complexity – most use McCabe

¡ If I can measure it, I can visualize it and then decide how to act upon it.

¡ E.g. Refactor code, or intensify verification
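As a rough illustration of two of the metrics listed above, the sketch below estimates a McCabe-style decision-point count and an indentation-based complexity figure for a fragment of RTL text. It is a minimal sketch under stated assumptions: the keyword pattern, the 4-space indent width and the example RTL are all hypothetical, and a real tool would parse the language properly rather than pattern-match it.

```python
import re
from statistics import mean

# Assumption: these tokens approximate decision points in Verilog/SystemVerilog.
# This is a text-level estimate, not a true control-flow-graph analysis.
DECISION_PATTERN = re.compile(r"\b(?:if|for|while|case|casez|casex)\b|\?|&&|\|\|")


def strip_comments(src: str) -> str:
    """Remove /* */ block comments and // line comments."""
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", src)


def approx_cyclomatic(src: str) -> int:
    """McCabe-style estimate: 1 + number of decision points found."""
    matches = DECISION_PATTERN.finditer(strip_comments(src))
    return 1 + sum(1 for _ in matches)


def indentation_complexity(src: str, indent_width: int = 4) -> dict:
    """Use leading whitespace depth as a proxy for nesting complexity."""
    depths = []
    for line in strip_comments(src).splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expanded = line.expandtabs(indent_width)
        leading = len(expanded) - len(expanded.lstrip(" "))
        depths.append(leading // indent_width)
    if not depths:
        return {"lines": 0, "max_depth": 0, "mean_depth": 0.0}
    return {"lines": len(depths), "max_depth": max(depths), "mean_depth": mean(depths)}


# Hypothetical RTL fragment, used only to exercise the functions.
EXAMPLE_RTL = """\
always @(posedge clk) begin
    if (rst) begin
        count <= 0;
    end else if (en && !hold) begin
        count <= (count == MAX) ? 0 : count + 1;
    end
end
"""

if __name__ == "__main__":
    print("approx cyclomatic:", approx_cyclomatic(EXAMPLE_RTL))
    print("indentation:", indentation_complexity(EXAMPLE_RTL))
```

Tracked per file over time, numbers like these could feed the "visualize it, then act" step: a rising trend might prompt a refactor or extra verification focus.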


ESTABLISHED BUG AVOIDANCE

¡ Coding rules and static linting

¡ Design reviewing/ code scrubs

¡ Designer assertions

¡ Formal correct-by-construction

¡ If all else fails….

¡ Implement Feature toggle bits


SOFTWARE DEVELOPMENT IDEAS

¡ In the last decade we have seen the emergence of DevOps for the software development world (rooted in Lean and then Agile)

¡ Most mainstream software platforms are developed and operated using the DevOps model.

¡ Enables 10s, 100s, 1000s of software deployments per day, while achieving stability, reliability, high availability and security.

¡ What DevOps principles might apply to hardware development?


Automate Testing: Continuous Integration

Trunk based development

Code Refactoring (build in >20%)

Integrate Performance Testing

Test-Driven Development

Pair Programming (automate with Gerrit)

Blameless Post-Mortems (Retrospectives)

Swarm on Defects

Telemetry: Continuous analysis of metrics


HOW CAN DATA AND ANALYTICS HELP?…assuming I am managing my data!


COVERAGE ANALYTICS

¡ Not wishing to state the obvious…

¡ Well established practices of tracking all available coverage metrics to achieve 100% or as close as possible with analysis

¡ Remembering…

Covered != Verified
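A small sketch of why covered does not mean verified. The model, the deliberate bug and the two tests are hypothetical and exist only to make the point: both tests hit every branch of the model, but only the one with a checker can fail.

```python
def saturating_add(a: int, b: int, width: int = 8) -> int:
    """Model of a saturating adder. Contains a deliberate bug for illustration."""
    limit = (1 << width) - 1
    total = a + b
    if total > limit:
        return limit - 1  # BUG (deliberate): should return limit
    return total


# Two stimulus pairs are enough to execute both branches of the model.
STIMULUS = [(1, 2), (200, 100)]


def stimulus_only_test() -> bool:
    """Exercises both branches but checks nothing: full coverage, no verification."""
    for a, b in STIMULUS:
        saturating_add(a, b)
    return True  # always "passes"


def checked_test() -> bool:
    """Same stimulus, now compared against an independent reference."""
    for a, b in STIMULUS:
        expected = min(a + b, 255)
        if saturating_add(a, b) != expected:
            return False
    return True


if __name__ == "__main__":
    print("stimulus only passes:", stimulus_only_test())
    print("checked test passes:", checked_test())
```

The coverage report would look identical for both tests; only the second one tells you anything about correctness.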


BUG ANALYTICS

¡ Collecting and tracking bug data from your bugs database is a great way to visualize where things are at

¡ We eventually expect a plateau

¡ But the plateau might just be a very shallow curve

¡ And there may be many false summits en-route

¡ Time to review and change something?
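The plateau and false-summit ideas above can be sketched in a few lines. This is a minimal sketch, not a recommended sign-off rule: the weekly counts are invented (in keeping with the hypothetical data in this talk), and the window length and threshold are assumptions you would tune to your own project.

```python
from itertools import accumulate


def cumulative(weekly_bugs):
    """Running total of bugs found, i.e. the curve you would plot."""
    return list(accumulate(weekly_bugs))


def plateau_weeks(weekly_bugs, window=3, max_new_bugs=1):
    """Indices of weeks where the trailing `window` weeks together
    found no more than `max_new_bugs` new bugs."""
    flagged = []
    for week in range(window - 1, len(weekly_bugs)):
        recent = weekly_bugs[week - window + 1 : week + 1]
        if sum(recent) <= max_new_bugs:
            flagged.append(week)
    return flagged


def false_summits(weekly_bugs, window=3, max_new_bugs=1):
    """Plateau weeks that were followed by a later burst of bugs."""
    flagged = plateau_weeks(weekly_bugs, window, max_new_bugs)
    return [
        week
        for week in flagged
        if any(count > max_new_bugs for count in weekly_bugs[week + 1 :])
    ]


# Hypothetical weekly bug counts; not real project data.
WEEKLY_BUGS = [5, 8, 6, 3, 1, 0, 0, 4, 2, 1, 0, 0, 0]

if __name__ == "__main__":
    print("cumulative:", cumulative(WEEKLY_BUGS))
    print("plateau weeks:", plateau_weeks(WEEKLY_BUGS))
    print("false summits:", false_summits(WEEKLY_BUGS))
```

A false summit can only be identified in hindsight, which is the reason to review what changed (new testbench, new stimulus, new platform) whenever the curve flattens.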


CORRELATING DATA

¡ Better insights gained from correlating bug data with other data such as the verification effort (machine/human hours).

¡ “I’m running good cycles, but bugs are no longer being found”

¡ And/or commit data…

¡ “not only are no bugs being found, the design and verification codebases are stable”

¡ From that you judge when to stop!

¡ Or migrate to the next platform e.g. Emulation, FPGA?

¡ And what if a late bug is then found?

¡ How does that impact my sign off verification target?
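One way to picture the correlation described above is a simple rule that looks at effort, bug rate and commit activity together. This is a sketch under assumptions: the record fields, the thresholds and the data are all hypothetical, and a real sign-off decision would weigh far more than three numbers.

```python
from dataclasses import dataclass


@dataclass
class WeekRecord:
    machine_hours: float  # compute consumed by verification that week
    bugs_found: int       # new bugs logged that week
    commits: int          # commits to design and verification codebases


def bugs_per_khour(rec: WeekRecord) -> float:
    """Bugs found per 1000 machine hours."""
    if not rec.machine_hours:
        return 0.0
    return 1000.0 * rec.bugs_found / rec.machine_hours


def ready_to_stop(history, window=3, min_hours=5000.0, max_rate=0.2, max_commits=5):
    """True when each of the last `window` weeks ran enough cycles,
    found few bugs per hour, and saw a stable codebase.
    All thresholds are illustrative assumptions."""
    if len(history) < window:
        return False
    return all(
        rec.machine_hours >= min_hours
        and bugs_per_khour(rec) <= max_rate
        and rec.commits <= max_commits
        for rec in history[-window:]
    )


# Hypothetical project history; not real data.
HISTORY = [
    WeekRecord(4000, 12, 60),
    WeekRecord(8000, 6, 35),
    WeekRecord(10000, 1, 4),
    WeekRecord(10000, 0, 3),
    WeekRecord(10000, 1, 2),
]

if __name__ == "__main__":
    print("ready to stop:", ready_to_stop(HISTORY))
    late_bug = HISTORY + [WeekRecord(10000, 3, 9)]
    print("after a late bug:", ready_to_stop(late_bug))
```

In this sketch a late bug week simply resets the window, which is one possible answer to the sign-off question; whether that is the right answer for your target is a judgment call.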


BUG PREDICTION…How Great Would That Be?

¡ It might be possible, and it may be a worthy endeavor to try

¡ It has been done before (see DVClub 2011 – Greg Smith*)

¡ Searching finds several papers for doing this for software

¡ Some use Machine Learning techniques

¡ So long as you have good datasets and have collected relevant design metrics for bugs

¡ Experiment with different training approaches e.g. Decision Trees, Naïve Bayes, Artificial Neural Networks (ANNs) to find the best prediction models

¡ Be aware of social factors and differences between teams when looking at historical datasets
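To make the machine-learning idea concrete, here is a minimal sketch of one of the approaches named above, Naïve Bayes, in its Gaussian form and written from scratch so it needs only the standard library. The features (complexity, recent commits, distinct authors), the labels and every data point are hypothetical; with real data you would also hold out a test set and compare against the other model types.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Return {label: (prior, means, variances)} estimated from training data."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6  # small floor avoids divide-by-zero
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def log_posterior(model, row, label):
    """Log prior plus the sum of per-feature Gaussian log likelihoods."""
    prior, means, variances = model[label]
    score = math.log(prior)
    for x, m, v in zip(row, means, variances):
        score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
    return score


def predict(model, row):
    """Pick the label with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model, row, label))


# Hypothetical per-module metrics: (complexity, commits in last 90 days, authors).
TRAIN_X = [
    (45, 30, 6), (38, 22, 5), (50, 41, 7), (41, 27, 4),
    (8, 3, 1), (12, 5, 2), (6, 2, 1), (10, 4, 2),
]
# Hypothetical labels: 1 = a bug was later found in the module, 0 = none found.
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]

MODEL = fit_gaussian_nb(TRAIN_X, TRAIN_Y)

if __name__ == "__main__":
    print("busy, complex module:", predict(MODEL, (44, 28, 5)))
    print("small, quiet module:", predict(MODEL, (9, 3, 1)))
```

The toy data is cleanly separable, which real bug data will not be; the point is only the shape of the workflow: collect metrics per module, label with bug history, train, then rank unseen modules by risk.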

* https://www.testandverification.com/DVClub/24_Jan_2011/Greg_Smith.pdf


CODEBASE ANALYTICS…Exploiting Version Control Data

¡ Another book recommendation

¡ Another idea from the software world

¡ Ideas on how to extract insights from GIT

¡ Hotspots indicate complex code with a high commit rate – what’s going on there? Complexity tracking over development time. Refactoring indicators.

¡ Correlate this with Defects

¡ Unexpected couplings between modules that frequently get committed together

¡ Social aspects of code development – how many different editors, code that is now ‘abandoned’

¡ Architecture and Project Management insights
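The hotspot and coupling ideas above can be approximated from plain version-control history. The sketch below assumes the log text was produced with something like `git log --name-only --pretty=format:'--%h'`, so each commit starts with a line beginning `--` followed by the files it touched. The file names, complexity scores and sample log are hypothetical, and this is not how any particular commercial tool computes its results.

```python
from collections import Counter
from itertools import combinations

# Assumption: commit header lines start with this marker (see lead-in).
COMMIT_MARKER = "--"


def parse_commits(log_text):
    """Split log text into a list of per-commit file lists."""
    commits, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(COMMIT_MARKER):
            current = []
            commits.append(current)
        elif line and current is not None:
            current.append(line)
    return commits


def churn(commits):
    """Number of commits that touched each file."""
    return Counter(path for files in commits for path in set(files))


def hotspots(commits, complexity, top=3):
    """Rank files by commit count multiplied by a complexity score."""
    counts = churn(commits)
    scored = {path: counts[path] * complexity.get(path, 0) for path in counts}
    return sorted(scored, key=lambda path: (-scored[path], path))[:top]


def coupling(commits, min_shared=2):
    """File pairs committed together at least `min_shared` times."""
    pairs = Counter()
    for files in commits:
        pairs.update(combinations(sorted(set(files)), 2))
    return {pair: n for pair, n in pairs.items() if n >= min_shared}


# Hypothetical log excerpt and complexity scores, for illustration only.
SAMPLE_LOG = """\
--a1
rtl/lsu.v
rtl/cache_ctrl.v

--b2
rtl/lsu.v
tb/lsu_env.sv

--c3
rtl/lsu.v
rtl/cache_ctrl.v

--d4
rtl/decode.v
"""
COMPLEXITY = {
    "rtl/lsu.v": 40,
    "rtl/cache_ctrl.v": 25,
    "rtl/decode.v": 10,
    "tb/lsu_env.sv": 5,
}

if __name__ == "__main__":
    commits = parse_commits(SAMPLE_LOG)
    print("hotspots:", hotspots(commits, COMPLEXITY))
    print("coupled pairs:", coupling(commits))
```

Joining the hotspot ranking with bug-database records per file would give the defect correlation mentioned above.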


Screenshot of Hotspot visualization taken from codescene: https://codescene.io/projects/171/jobs/15343/results/code/hotspots/system-map

(Permission kindly granted by Adam Tornhill)


THANK-YOU

¡ Hardware and software developers both develop code

¡ It’s a different mindset – but there are lessons from software that can be reused for hardware such as CI, Pair-programming/Gerrit, Refactoring, Complexity analysis, Bug Prediction, GIT analytics. Take some time to look at DevOps!

¡ Critical hardware bugs can be very costly (lith-masks, packaging, end-products), but so can software bugs in modern business-critical, high-availability and high-security platforms

¡ Successful teams will use data analytics to gain insights and apply improvements to reduce cost, shorten schedule and improve quality (fewer bugs!).
