Evaluation of an Automated Pavement Distress Identification and Quantification Application
Jerome Daleiden, Nima Kargah-Ostadi (Fugro); Abdenour Nazef (Florida DOT)
Road Profile User's Group (RPUG) Conference, November 3, 2016
San Diego, CA
This presentation will address:
• Background on distress survey methods
• Framework for evaluating existing methodologies
• Evaluation results for FDOT rigid pavement surveys
• Context-sensitive gap analysis
• Recommendations
Pavement Distress Survey Methods
• Manual (windshield or walking)
  • Safety concerns
  • Agreement issues among raters
• Semi-Automated (manual review of images)
  • No safety concerns
  • Agreement issues among raters
• Automated (detection and classification software)
  • Set reasonable accuracy goals and a QA procedure
  • Still requires post-survey QC
Existing FDOT Rigid Pavement Survey Protocol
• Transverse Cracking (count), Light-Moderate-Severe
• Longitudinal Cracking (count), Light-Moderate-Severe
• Spalling (linear feet), Moderate-Severe
• Corner Cracking (count), Light-Moderate-Severe
• Patching (sq. yards), Fair-Poor
• Shattered Slabs (count), Moderate-Severe
• Surface Deterioration (sq. feet), Moderate-Severe
• Pumping (percentage range: Code 1 to 4), Light-Moderate-Severe
• Joint Condition, partially sealed or not sealed
• Multiple Cracked Slabs (count)
Distress Identification Workshop
• Increase consistency among raters
• Notes (clarifications) on the existing protocol
• 80% agreement among raters (3) for total transverse cracking
• 63% agreement for total longitudinal cracking
• 65% agreement for spalling
• More agreement on the total amount of each distress type than on distress severity levels
Evaluation Framework
1) Select 12 representative rigid test sections and review the distress protocol
2) Image benchmarking and validation
3-1) Manual windshield survey by FDOT raters (disparity, agreement, reproducibility, consensus)
3-2) Available automated application(s)
3-3) Manual survey of the collected images by FDOT raters (disparity, agreement, reproducibility, reference crack map)
4-1) Comparison of cumulative distress: accuracy, precision, repeatability, reproducibility, efficiency
4-2) Verification of identified distress (true positives, false positives, false negatives): validity, sensitivity, classification
5) Gap analysis
6) Design recommendations: software, hardware, protocols
Image Quality Validation
• Image Properties: resolution, exposure, dynamic range, white balance
• Image Issues: alignment of control lines, image streaks
• Image Feature Capturing: optical distortion, signal‐to‐noise ratio
• Hardware: distance measuring accuracy, latitude‐longitude accuracy, platform stability
• Environmental Effects: lighting conditions, temperature, humidity, wind, vehicle speed
• AASHTO PP68 image requirements depend on the crack detection software used (a minimal quality-check sketch follows)
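A minimal sketch of such automated quality checks, assuming 8-bit grayscale pavement images as NumPy arrays; the thresholds below are illustrative placeholders, not values taken from AASHTO PP68 or from this evaluation:

```python
# Minimal sketch of automated image quality checks; all cutoffs are assumed.
import numpy as np

def check_image_quality(img: np.ndarray,
                        min_width_px: int = 2048,
                        max_clip_fraction: float = 0.01) -> dict:
    """Return simple pass/fail flags for resolution, exposure, and a noise proxy."""
    h, w = img.shape
    # Exposure/dynamic range: fraction of pixels clipped at either extreme.
    clipped = float(np.mean((img <= 0) | (img >= 255)))
    # Crude signal-to-noise proxy: global contrast over pixel-to-pixel noise.
    noise = np.std(np.diff(img.astype(float), axis=1))
    snr = float(np.std(img.astype(float)) / max(noise, 1e-6))
    return {
        "resolution_ok": w >= min_width_px,
        "exposure_ok": clipped < max_clip_fraction,
        "snr": round(snr, 2),
    }
```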
Optimum Software Settings
1) Image Pre-Processing: Applying Filters
2) Detection (find optimum parameters)
3) Classification (and grouping)
4) Rating Distress Type and Severity
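The four steps can be illustrated with a short OpenCV sketch. The actual application's filters and settings were not disclosed, so the filter choice, thresholds, minimum size, orientation rule, and severity bins below are all hypothetical:

```python
# Hypothetical four-step pipeline; every parameter value is an assumption.
import cv2
import numpy as np

def survey_image(gray: np.ndarray) -> list:
    # 1) Pre-processing: smooth surface texture before detection.
    filtered = cv2.GaussianBlur(gray, (5, 5), 0)
    # 2) Detection: cracks are thin, dark features against the slab surface.
    binary = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 35, 10)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    cracks = []
    for i in range(1, n):                    # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < 50:                        # assumed minimum crack size (px)
            continue
        # 3) Classification: orientation from the bounding box, assuming the
        # travel direction runs down the image (transverse cracks are wide).
        if w > 2 * h:
            kind = "transverse"
        elif h > 2 * w:
            kind = "longitudinal"
        else:
            kind = "diagonal"
        # 4) Rating: severity from mean width in pixels (assumed bins).
        width = area / max(w, h)
        severity = "light" if width < 3 else "moderate" if width < 6 else "severe"
        cracks.append({"type": kind, "severity": severity})
    return cracks
```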
Automated Survey Steps
[Figure: example images for each step: Detection → Classification → Rating]
Automated Rating
• Transverse Joints
• Longitudinal Joints
  • Issues with skewed joints and transition areas
• Transverse Cracks
• Longitudinal Cracks
• Cracks per slab
  • Grouping and counting cracks (a per-slab counting sketch follows below)
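As referenced above, a minimal sketch of the per-slab grouping step, assuming transverse joint stations are already known and each crack is reduced to the station of its midpoint (all inputs hypothetical):

```python
# Assign each crack to the slab between two transverse joints and count them.
import bisect

def cracks_per_slab(joint_stations, crack_midpoints):
    counts = {}
    for station in crack_midpoints:
        slab = bisect.bisect_right(joint_stations, station)  # slab index
        counts[slab] = counts.get(slab, 0) + 1
    return counts

# Joints at 0, 15, and 30 ft: two cracks land in slab 1, one in slab 2.
print(cracks_per_slab([0.0, 15.0, 30.0], [5.2, 9.8, 22.4]))  # {1: 2, 2: 1}
```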
Evaluation Framework
(Recap of the evaluation framework flowchart shown earlier.)
Success Metrics
• Effectiveness: Accuracy (relative to reference values)
• Efficiency: Speed
• Reliability
  • Precision: variation among different sections
  • Reproducibility: variation among raters
  • Repeatability: variation among runs
(A computation sketch for these metrics follows.)
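A small sketch of how the accuracy and precision statistics on the next two charts could be computed, assuming one cumulative rating per test section per method and the consensus survey as the reference; all input values are hypothetical:

```python
# Accuracy = 100 - absolute bias; precision = spread of error among sections.
import numpy as np

def accuracy_and_precision(ratings, reference):
    norm_error = 100.0 * (np.asarray(ratings) - reference) / reference  # %
    bias = norm_error.mean()            # average normalized error
    accuracy = 100.0 - abs(bias)        # as charted: 100 - absolute bias
    precision = norm_error.std(ddof=1)  # variation of error among sections
    return accuracy, precision

reference = np.array([40.0, 55.0, 32.0, 61.0])  # consensus section totals
automated = np.array([37.0, 58.0, 30.0, 66.0])  # one method's section totals
print(accuracy_and_precision(automated, reference))
```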
Accuracy
Bias (average normalized error) compared to a reference rating.
[Chart: Accuracy (%) = 100 − Absolute Bias, by distress type (transverse cracking, longitudinal cracking, spalling, corner cracks, shattered slabs), for the manual field survey, semi-automated rating, and automated rating (length). Spalling bias was over 100%.]
Precision
Variation of error among sections.
[Chart: standard deviation of normalized error (%) among sections, by distress type, for the manual field survey, semi-automated rating, and automated rating (length). Spalling variation was above 100%.]
Reliability
Variation of error among raters/runs.
[Chart: Agreement Among Raters/Runs (%) = AVG(100 − COV (%)), by distress type, for the manual field survey, semi-automated rating, and automated rating.]
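A sketch of the agreement statistic charted above, assuming a sections-by-raters (or runs) array of ratings; the example counts are hypothetical:

```python
# Per-section coefficient of variation across raters/runs, then the average
# of (100 - COV) over sections, matching the chart's y-axis definition.
import numpy as np

def agreement(ratings):
    """ratings[i, j] = rating of section i by rater (or run) j."""
    r = np.asarray(ratings, dtype=float)
    cov = 100.0 * r.std(axis=1, ddof=1) / r.mean(axis=1)  # COV (%) per section
    return float(np.mean(100.0 - cov))

print(round(agreement([[12, 14, 13], [30, 27, 33]]), 1))  # hypothetical counts
```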
“Outlier” Ratings
A measure of rating reproducibility/repeatability.
[Chart: percentage of ratings more than one standard deviation away from the average, by distress type, for the manual field survey, semi-automated rating, and automated rating.]
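And a matching sketch of the outlier measure, using the same assumed sections-by-raters array:

```python
# Share of individual ratings falling more than one standard deviation
# from their section's average, as charted above.
import numpy as np

def outlier_percentage(ratings):
    r = np.asarray(ratings, dtype=float)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, ddof=1, keepdims=True)
    return float(100.0 * np.mean(np.abs(r - mean) > std))
```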
Efficiency
[Chart: survey speed (mph) for each of the 12 test sections, for the FDOT semi-automated, Fugro semi-automated, and automated surveys.]
Verification of Detected Distress
[Chart: results on 24 random image frames; for each frame, reference values (ft), true positives (ft), correctly rated true positives (ft), false positives (ft), and false negatives (missed cracks) (ft).]
Verification of Detected Distress
Statistics over the 24 random image frames:

                      Length (ft)                              Count (rating result)
Statistic  Ground    True      False     False     Reference  Correctly  Incorrectly
           Truth     Positive  Positive  Negative             Rated      Rated
AVG        25.06     18.68      2.76      8.35      3.70       2.42       6.42
STD        12.49     12.38      5.72     11.25      3.34       1.41       3.12
MIN         6.58      0.00      0.00      0.00      0.00       1.00       2.00
MAX        48.97     43.38     20.20     44.74     12.25       7.00      14.00

Statistic  Distress Validity   Distress Sensitivity   Distress Classification
           (or Precision)      (or Recall)            Performance
AVG        78%                 84%                    86%
STD        21%                 14%                    28%
MIN        19%                 51%                     0%
MAX        100%                100%                   100%

Average results on 24 random image frames.
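The three percentage metrics are computed per frame and then averaged over the 24 frames, so they need not equal ratios formed from the averaged lengths in the table above. A minimal sketch using the definitions restated in the summary below; the example inputs are hypothetical lengths in feet for a single frame:

```python
# Frame-level verification metrics for detected crack length.
def verification_metrics(tp, fp, fn, correctly_rated_tp):
    validity = tp / (tp + fp) if (tp + fp) else 0.0       # a.k.a. precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0    # a.k.a. recall
    classification = correctly_rated_tp / tp if tp else 0.0
    return {"validity": validity, "sensitivity": sensitivity,
            "classification": classification}

# 18 ft detected correctly, 3 ft falsely detected, 4 ft missed, and 16 of the
# 18 true-positive feet assigned the right distress type and severity.
print(verification_metrics(18.0, 3.0, 4.0, 16.0))
# {'validity': 0.857..., 'sensitivity': 0.818..., 'classification': 0.888...}
```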
Key Observations
• Semi-automated results show higher accuracy and precision, but lower efficiency and less agreement among raters, than the manual surveys
• The automated survey was more successful at identifying transverse cracks than longitudinal cracks
• 78% of the detected distress was actually present in the reference survey
• The automated method detected 84% of the reference survey distresses
• 86% of the detected distress was correctly classified by the automated algorithm
Gap Analysis

• Human random errors
  • High variation of rating results among test sections → N/A
• Human systematic errors
  • High bias (average error) and high variation of rating results among multiple raters → Review and/or revise distress protocols
• Software systematic errors
  • High bias in longitudinal cracking amount (high number of false positives) and high variation of error among multiple test sections → Joint detection plugin (a sketch follows this list) and a plugin for separating stripes
  • Issue with crack counts → Improve crack grouping
  • Not rating corner cracks → Corner crack plugin
  • Not rating shattered slabs → Shattered slab plugin
  • Issue with crack width determination and severity rating → Do not use filters moving forward, so that crack width can be measured
• Hardware limitations
  • Distresses such as spalling or patching cannot be detected without 3D data → Evaluate 3D data
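As referenced in the list above, one recommended solution is a joint detection plugin. A hedged sketch of one possible approach, assuming transverse joints appear as dark rows at roughly regular spacing in a downward-facing grayscale image; the darkness cutoff and minimum spacing are assumptions, and skewed joints (flagged earlier as an issue) would need extra handling:

```python
# Possible transverse joint detector: look for minima in row-wise intensity.
import numpy as np

def find_transverse_joints(gray, min_gap_px=200):
    profile = gray.astype(float).mean(axis=1)        # mean intensity per row
    cutoff = profile.mean() - 2.0 * profile.std()    # assumed darkness cutoff
    dark_rows = np.flatnonzero(profile < cutoff)
    joints, last = [], -min_gap_px
    for r in dark_rows:                              # merge adjacent dark rows
        if r - last >= min_gap_px:
            joints.append(int(r))
        last = int(r)
    return joints
```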
Design Recommendations
1. Develop a plugin for transverse and longitudinal joint detection
2. Develop a plugin for lane marking detection
3. Develop a plugin to improve crack grouping and counts per slab
4. Develop a plugin to classify and rate corner cracks
5. Develop a plugin to classify and rate shattered slabs
Summary
• Framework for evaluating different distress survey methods
• On the cumulative amount of distress:
  • Accuracy (bias)
  • Precision (variation among sections)
  • Reproducibility/repeatability (variation among raters/runs)
  • Efficiency (speed)
• Distress-by-distress verification:
  • Validity = true positives / (true positives + false positives)
  • Sensitivity = true positives / (true positives + false negatives)
  • Classification performance = correctly classified / true positives
• Gap analysis
• Design recommendations