Evaluation of an Automated Pavement Distress Identification and Quantification Application
Jerome Daleiden, Nima Kargah-Ostadi (Fugro); Abdenour Nazef (Florida DOT)
Road Profile User's Group (RPUG) Conference, November 3, 2016
San Diego, CA
This presentation will address:
• Background on distress survey methods
• Framework for evaluating existing methodologies
• Evaluation results for FDOT rigid pavement surveys
• Context-sensitive gap analysis
• Recommendations
Pavement Distress Survey Methods
• Manual (windshield or walking)
  • Safety concerns
  • Agreement issues among raters
• Semi-Automated (manual review of images)
  • No safety concerns
  • Agreement issues among raters
• Automated (detection and classification software)
  • Set reasonable accuracy goals and a QA procedure
  • Still requires post-survey QC
Existing FDOT Rigid Pavement Survey Protocol
• Transverse Cracking (count), Light-Moderate-Severe
• Longitudinal Cracking (count), Light-Moderate-Severe
• Spalling (linear feet), Moderate-Severe
• Corner Cracking (count), Light-Moderate-Severe
• Patching (sq. yards), Fair-Poor
• Shattered Slabs (count), Moderate-Severe
• Surface Deterioration (sq. feet), Moderate-Severe
• Pumping (percentage range: Code 1 to 4), Light-Moderate-Severe
• Joint Condition, partially sealed or not sealed
• Multiple Cracked Slabs (count)
Distress Identification Workshop
• Increase consistency among raters
• Notes (clarifications) on the existing protocol
• 80% agreement among raters (3) for total transverse cracking
• 63% agreement for total longitudinal cracking
• 65% agreement for spalling
• More agreement on the total amount of each distress type than on distress severity levels
Evaluation Framework
1) Select 12 representative rigid test sections and review the distress protocol
2) Image benchmarking and validation
3-1) Manual windshield survey by FDOT raters (disparity, agreement, reproducibility, consensus)
3-2) Available automated application(s)
3-3) Manual survey of the collected images by FDOT raters (disparity, agreement, reproducibility, reference crack map)
4-1) Comparison of cumulative distress: accuracy, precision, repeatability, reproducibility, efficiency
4-2) Verification of identified distress (true positives, false positives, false negatives): validity, sensitivity, classification
5) Gap analysis
6) Design recommendations: software, hardware, protocols
Image Quality Validation
• Image Properties: resolution, exposure, dynamic range, white balance
• Image Issues: alignment of control lines, image streaks
• Image Feature Capturing: optical distortion, signal‐to‐noise ratio
• Hardware: distance measuring accuracy, latitude‐longitude accuracy, platform stability
• Environmental Effects: lighting conditions, temperature, humidity, wind, vehicle speed
• AASHTO PP68 image requirements depend on the crack detection software used (a minimal quality-check sketch follows)
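A minimal sketch of such automated quality checks, assuming 8-bit grayscale pavement images as NumPy arrays; the thresholds below are illustrative placeholders, not values taken from AASHTO PP68 or from this evaluation:

```python
# Minimal sketch of automated image quality checks; all cutoffs are assumed.
import numpy as np

def check_image_quality(img: np.ndarray,
                        min_width_px: int = 2048,
                        max_clip_fraction: float = 0.01) -> dict:
    """Return simple pass/fail flags for resolution, exposure, and a noise proxy."""
    h, w = img.shape
    # Exposure/dynamic range: fraction of pixels clipped at either extreme.
    clipped = float(np.mean((img <= 0) | (img >= 255)))
    # Crude signal-to-noise proxy: global contrast over pixel-to-pixel noise.
    noise = np.std(np.diff(img.astype(float), axis=1))
    snr = float(np.std(img.astype(float)) / max(noise, 1e-6))
    return {
        "resolution_ok": w >= min_width_px,
        "exposure_ok": clipped < max_clip_fraction,
        "snr": round(snr, 2),
    }
```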
Optimum Software Settings
1) Image Pre-Processing: Applying Filters
2) Detection (find optimum parameters)
3) Classification (and grouping)
4) Rating Distress Type and Severity
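The four steps can be illustrated with a short OpenCV sketch. The actual application's filters and settings were not disclosed, so the filter choice, thresholds, minimum size, orientation rule, and severity bins below are all hypothetical:

```python
# Hypothetical four-step pipeline; every parameter value is an assumption.
import cv2
import numpy as np

def survey_image(gray: np.ndarray) -> list:
    # 1) Pre-processing: smooth surface texture before detection.
    filtered = cv2.GaussianBlur(gray, (5, 5), 0)
    # 2) Detection: cracks are thin, dark features against the slab surface.
    binary = cv2.adaptiveThreshold(filtered, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY_INV, 35, 10)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    cracks = []
    for i in range(1, n):                    # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < 50:                        # assumed minimum crack size (px)
            continue
        # 3) Classification: orientation from the bounding box, assuming the
        # travel direction runs down the image (transverse cracks are wide).
        if w > 2 * h:
            kind = "transverse"
        elif h > 2 * w:
            kind = "longitudinal"
        else:
            kind = "diagonal"
        # 4) Rating: severity from mean width in pixels (assumed bins).
        width = area / max(w, h)
        severity = "light" if width < 3 else "moderate" if width < 6 else "severe"
        cracks.append({"type": kind, "severity": severity})
    return cracks
```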
Automated Survey Steps
[Figure: example images for each step: Detection → Classification → Rating]
Automated Rating
• Transverse Joints
• Longitudinal Joints
  • Issues with skewed joints and transition areas
• Transverse Cracks
• Longitudinal Cracks
• Cracks per slab
  • Grouping and counting cracks (a per-slab counting sketch follows below)
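As referenced above, a minimal sketch of the per-slab grouping step, assuming transverse joint stations are already known and each crack is reduced to the station of its midpoint (all inputs hypothetical):

```python
# Assign each crack to the slab between two transverse joints and count them.
import bisect

def cracks_per_slab(joint_stations, crack_midpoints):
    counts = {}
    for station in crack_midpoints:
        slab = bisect.bisect_right(joint_stations, station)  # slab index
        counts[slab] = counts.get(slab, 0) + 1
    return counts

# Joints at 0, 15, and 30 ft: two cracks land in slab 1, one in slab 2.
print(cracks_per_slab([0.0, 15.0, 30.0], [5.2, 9.8, 22.4]))  # {1: 2, 2: 1}
```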
Evaluation Framework
(Recap of the evaluation framework flowchart shown earlier.)
Success Metrics
• Effectiveness: Accuracy (relative to reference values)
• Efficiency: Speed
• Reliability
  • Precision: variation among different sections
  • Reproducibility: variation among raters
  • Repeatability: variation among runs
(A computation sketch for these metrics follows.)
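A small sketch of how the accuracy and precision statistics on the next two charts could be computed, assuming one cumulative rating per test section per method and the consensus survey as the reference; all input values are hypothetical:

```python
# Accuracy = 100 - absolute bias; precision = spread of error among sections.
import numpy as np

def accuracy_and_precision(ratings, reference):
    norm_error = 100.0 * (np.asarray(ratings) - reference) / reference  # %
    bias = norm_error.mean()            # average normalized error
    accuracy = 100.0 - abs(bias)        # as charted: 100 - absolute bias
    precision = norm_error.std(ddof=1)  # variation of error among sections
    return accuracy, precision

reference = np.array([40.0, 55.0, 32.0, 61.0])  # consensus section totals
automated = np.array([37.0, 58.0, 30.0, 66.0])  # one method's section totals
print(accuracy_and_precision(automated, reference))
```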
Accuracy
Bias (average normalized error) compared to a reference rating.
[Chart: Accuracy (%) = 100 − Absolute Bias, by distress type (transverse cracking, longitudinal cracking, spalling, corner cracks, shattered slabs), for the manual field survey, semi-automated rating, and automated rating (length). Spalling bias was over 100%.]
Precision
Variation of error among sections.
[Chart: standard deviation of normalized error (%) among sections, by distress type, for the manual field survey, semi-automated rating, and automated rating (length). Spalling variation was above 100%.]
Reliability
Variation of error among raters/runs.
[Chart: Agreement Among Raters/Runs (%) = AVG(100 − COV (%)), by distress type, for the manual field survey, semi-automated rating, and automated rating.]
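A sketch of the agreement statistic charted above, assuming a sections-by-raters (or runs) array of ratings; the example counts are hypothetical:

```python
# Per-section coefficient of variation across raters/runs, then the average
# of (100 - COV) over sections, matching the chart's y-axis definition.
import numpy as np

def agreement(ratings):
    """ratings[i, j] = rating of section i by rater (or run) j."""
    r = np.asarray(ratings, dtype=float)
    cov = 100.0 * r.std(axis=1, ddof=1) / r.mean(axis=1)  # COV (%) per section
    return float(np.mean(100.0 - cov))

print(round(agreement([[12, 14, 13], [30, 27, 33]]), 1))  # hypothetical counts
```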
“Outlier” Ratings
A measure of rating reproducibility/repeatability.
[Chart: percentage of ratings more than one standard deviation away from the average, by distress type, for the manual field survey, semi-automated rating, and automated rating.]
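And a matching sketch of the outlier measure, using the same assumed sections-by-raters array:

```python
# Share of individual ratings falling more than one standard deviation
# from their section's average, as charted above.
import numpy as np

def outlier_percentage(ratings):
    r = np.asarray(ratings, dtype=float)
    mean = r.mean(axis=1, keepdims=True)
    std = r.std(axis=1, ddof=1, keepdims=True)
    return float(100.0 * np.mean(np.abs(r - mean) > std))
```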
Efficiency
[Chart: survey speed (mph) for each of the 12 test sections, for the FDOT semi-automated, Fugro semi-automated, and automated surveys.]
Verification of Detected Distress
[Chart: results on 24 random image frames; for each frame, reference values (ft), true positives (ft), correctly rated true positives (ft), false positives (ft), and false negatives (missed cracks) (ft).]
Verification of Detected Distress
Statistics over the 24 random image frames:

                      Length (ft)                              Count (rating result)
Statistic  Ground    True      False     False     Reference  Correctly  Incorrectly
           Truth     Positive  Positive  Negative             Rated      Rated
AVG        25.06     18.68      2.76      8.35      3.70       2.42       6.42
STD        12.49     12.38      5.72     11.25      3.34       1.41       3.12
MIN         6.58      0.00      0.00      0.00      0.00       1.00       2.00
MAX        48.97     43.38     20.20     44.74     12.25       7.00      14.00

Statistic  Distress Validity   Distress Sensitivity   Distress Classification
           (or Precision)      (or Recall)            Performance
AVG        78%                 84%                    86%
STD        21%                 14%                    28%
MIN        19%                 51%                     0%
MAX        100%                100%                   100%

Average results on 24 random image frames.
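The three percentage metrics are computed per frame and then averaged over the 24 frames, so they need not equal ratios formed from the averaged lengths in the table above. A minimal sketch using the definitions restated in the summary below; the example inputs are hypothetical lengths in feet for a single frame:

```python
# Frame-level verification metrics for detected crack length.
def verification_metrics(tp, fp, fn, correctly_rated_tp):
    validity = tp / (tp + fp) if (tp + fp) else 0.0       # a.k.a. precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0    # a.k.a. recall
    classification = correctly_rated_tp / tp if tp else 0.0
    return {"validity": validity, "sensitivity": sensitivity,
            "classification": classification}

# 18 ft detected correctly, 3 ft falsely detected, 4 ft missed, and 16 of the
# 18 true-positive feet assigned the right distress type and severity.
print(verification_metrics(18.0, 3.0, 4.0, 16.0))
# {'validity': 0.857..., 'sensitivity': 0.818..., 'classification': 0.888...}
```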
Key Observations
• Semi-automated results show higher accuracy and precision, but lower efficiency and less agreement among raters, than the manual surveys
• The automated survey was more successful at identifying transverse cracks than longitudinal cracks
• 78% of the detected distress was actually present in the reference survey
• The automated method detected 84% of the reference survey distresses
• 86% of the detected distress was correctly classified by the automated algorithm
Gap Analysis

• Human random errors
  • High variation of rating results among test sections → N/A
• Human systematic errors
  • High bias (average error) and high variation of rating results among multiple raters → Review and/or revise distress protocols
• Software systematic errors
  • High bias in longitudinal cracking amount (high number of false positives) and high variation of error among multiple test sections → Joint detection plugin (a sketch follows this list) and a plugin for separating stripes
  • Issue with crack counts → Improve crack grouping
  • Not rating corner cracks → Corner crack plugin
  • Not rating shattered slabs → Shattered slab plugin
  • Issue with crack width determination and severity rating → Do not use filters moving forward, so that crack width can be measured
• Hardware limitations
  • Distresses such as spalling or patching cannot be detected without 3D data → Evaluate 3D data
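As referenced in the list above, one recommended solution is a joint detection plugin. A hedged sketch of one possible approach, assuming transverse joints appear as dark rows at roughly regular spacing in a downward-facing grayscale image; the darkness cutoff and minimum spacing are assumptions, and skewed joints (flagged earlier as an issue) would need extra handling:

```python
# Possible transverse joint detector: look for minima in row-wise intensity.
import numpy as np

def find_transverse_joints(gray, min_gap_px=200):
    profile = gray.astype(float).mean(axis=1)        # mean intensity per row
    cutoff = profile.mean() - 2.0 * profile.std()    # assumed darkness cutoff
    dark_rows = np.flatnonzero(profile < cutoff)
    joints, last = [], -min_gap_px
    for r in dark_rows:                              # merge adjacent dark rows
        if r - last >= min_gap_px:
            joints.append(int(r))
        last = int(r)
    return joints
```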
Design Recommendations
1. Develop a plugin for transverse and longitudinal joint detection
2. Develop a plugin for lane marking detection
3. Develop a plugin to improve crack grouping and counts per slab
4. Develop a plugin to classify and rate corner cracks
5. Develop a plugin to classify and rate shattered slabs
Summary
• Framework for evaluating different distress survey methods
• On the cumulative amount of distress:
  • Accuracy (bias)
  • Precision (variation among sections)
  • Reproducibility/repeatability (variation among raters/runs)
  • Efficiency (speed)
• Distress-by-distress verification:
  • Validity = true positives / (true positives + false positives)
  • Sensitivity = true positives / (true positives + false negatives)
  • Classification performance = correctly classified / true positives
• Gap analysis
• Design recommendations