make your gis work for you, first step: data quality - esri
Make Your GIS Work for You,
First Step: Data Quality
Presented by:
Scott Sumners, GIS Manager, City of Brentwood
James McCord, GIS Analyst, City of Brentwood
Gerardo Boquin, GISP, CH2M
Brentwood GIS
1995 2016
• CAD to GIS
• Aerial Photography
• Topographic base maps
2006
GIS Dept.
(Staff and Interns)
2012
• Water Billing
• Asset Management / VUEWorks
• Schema Changes
• Added new layers
• Added subtypes and domains
New Imagery: 2009, 2015
Topics
• Why is Data Quality important?
• Brentwood GIS Case Study
• Introduction to Data Reviewer
• Initial Findings
• Data Quality Assurance Plan (Methodology)
• Running Data Reviewer (Lessons Learned)
• Geometric Networks (Lessons Learned)
• Fixing the data
• The results
• The unperceived benefits
Why is Data Quality Important?
• Data needs to be:
- Accessible
- Reliable
- Spatially accurate
- Descriptive
• “As a result, water, wastewater, and stormwater utilities
are now heavily focusing on quality assurance (QA) and
quality control (QC) to ensure that their GIS data truly
meets their needs.”
ESRI White Paper: GIS Data Quality Best Practices for Water, Wastewater and Stormwater Utilities, July 2011
Why is Data Quality Important?
• Summary of tools described in paper:
- Geodatabases
- A good data model
- Versioned Environment (multiple editors)
- Geometric Networks
- ModelBuilder / Python
- ArcGIS Data Reviewer
- Production Mapping
• Workflow Manager
ESRI White Paper: GIS Data Quality Best Practices for Water, Wastewater and Stormwater Utilities, July 2011
Why is Data Quality Important?
• Positional Accuracy
• Topological Logic
• Geometric Data Considerations
• Projections and Coordinate Systems
• Attribute and Data Structure
https://www.linkedin.com/pulse/gis-gigo-garbage-out-30-checks-data-errors-nathan-heazlewood
Why is Data Quality Important?
• It is very expensive to have dirty, bad, or incomplete data
- Data is not reliable
- Tools don’t work correctly
- Creating custom tools is very time-consuming
- Time is lost troubleshooting data errors
Brentwood GIS Case Study
• GIS Workflow
- Multiple editors
- Typical QA/QC consisted of visual checks
- No issues = no problems
• Until…
- CH2M ran a courtesy check on sewer GIS data using
ArcGIS Data Reviewer
- Overall GIS health was good.
- Data Reviewer pointed out areas where business needs were not 100% met.
Shocking introduction to Data Reviewer in 2015
Consultant calls to say:
“I ran Data Reviewer on your data and discovered that
YOUR data has lots of ISSUES”
Initial Findings
• Issues found (52,218), sewer dataset only:
- Non-compliant domain values
- Duplicate geometries
- Multipart geometries
- Duplicate IDs
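Two of the issue types above, duplicate and multipart geometries, can be illustrated outside Data Reviewer. This is a minimal sketch, assuming each feature is a plain dict with an `oid` and a `parts` list of coordinate lists; that structure and the function names are hypothetical, not the Data Reviewer data model.

```python
from collections import defaultdict

def geometry_key(parts, precision=3):
    """Build a hashable key from rounded coordinates so matching shapes compare equal."""
    return tuple(
        tuple((round(x, precision), round(y, precision)) for x, y in part)
        for part in parts
    )

def find_duplicate_geometries(features, precision=3):
    """Return groups of ObjectIDs whose geometries have identical rounded coordinates."""
    groups = defaultdict(list)
    for feat in features:
        groups[geometry_key(feat["parts"], precision)].append(feat["oid"])
    return [oids for oids in groups.values() if len(oids) > 1]

def find_multipart(features):
    """Return ObjectIDs of features made of more than one part."""
    return [feat["oid"] for feat in features if len(feat["parts"]) > 1]

# Hypothetical sample: features 1 and 2 are stacked, feature 3 is multipart.
features = [
    {"oid": 1, "parts": [[(0.0, 0.0), (10.0, 0.0)]]},
    {"oid": 2, "parts": [[(0.0, 0.0), (10.0, 0.0)]]},
    {"oid": 3, "parts": [[(0.0, 5.0), (4.0, 5.0)], [(6.0, 5.0), (9.0, 5.0)]]},
]
duplicates = find_duplicate_geometries(features)
multipart = find_multipart(features)
```

Note that this sketch treats a line digitized in the opposite direction as a different geometry; a production check would likely need to handle that case.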
Initial Findings
• Geometric Network
  Issue             Solution
  Missing Assets    Add Taps and Tees
  Flow              Trim/Extend Pipe
  Connectivity      Snap features
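The "snap features" fix for connectivity can be pictured as moving each pipe endpoint onto the nearest junction within a tolerance. This is a minimal sketch in plain Python, assuming pipes are lists of (x, y) tuples and junctions (for example manholes) are points; the function names and the 0.5 map-unit tolerance are illustrative assumptions, not values from the project.

```python
import math

def snap_endpoint(point, junctions, tolerance):
    """Return the nearest junction within tolerance, or the original point if none."""
    best, best_dist = None, tolerance
    for junction in junctions:
        d = math.dist(point, junction)
        if d <= best_dist:
            best, best_dist = junction, d
    return best if best is not None else point

def snap_pipe(pipe, junctions, tolerance=0.5):
    """Snap the first and last vertex of a pipe; interior vertices are left untouched."""
    start = snap_endpoint(pipe[0], junctions, tolerance)
    end = snap_endpoint(pipe[-1], junctions, tolerance)
    return [start] + list(pipe[1:-1]) + [end]

# Hypothetical sample: both pipe ends sit a few tenths of a unit off the manholes.
manholes = [(0.0, 0.0), (100.0, 0.0)]
pipe = [(0.2, 0.1), (50.0, 0.0), (99.9, -0.3)]
snapped = snap_pipe(pipe, manholes)
```

Endpoints farther than the tolerance from every junction are returned unchanged, so they can be reviewed by hand rather than moved silently.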
Initial Findings
• 1 - Missing assets
• 2 - Pipes had wrong flow direction
• 3 - Pipes connected to wrong asset
[Figure: disconnected assets, with callouts 1, 2, and 3 marking the issues listed above]
Data Quality Assurance (Methodology)
• Established a Data Quality Plan
- Consisted of 4 preconfigured batch files addressing:
1. Invalid geometries
2. Duplicate IDs
3. Duplicate geometries
4. Domain and subtype validation
- Geometric Network
1. Checking all features
2. Customized for critical features
Assets
1. Flow Monitors
2. Grease_Interceptors
3. GrinderPumps
4. LiftStation
5. Manholes
6. SewerAirReleaseValve
7. SewerControlValve
8. SewerFitting
9. SewerGravityPipe
10. SewerLateralPipe
11. SewerPressurizedPipe
12. SewerService
13. SewerServiceValve
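Domain and subtype validation, the fourth batch file in the plan, checks that each attribute value is allowed for the feature's subtype. This is a minimal sketch, assuming rows are plain dicts; the subtype codes, the `Material` field, and the domain values are hypothetical and do not come from the Brentwood schema.

```python
# Hypothetical subtype codes and coded-value domains for a gravity pipe layer.
SUBTYPES = {1: "Main", 2: "Collector"}
DOMAINS_BY_SUBTYPE = {
    1: {"Material": {"PVC", "DIP"}},
    2: {"Material": {"PVC", "VCP"}},
}

def validate_row(row):
    """Return a list of (field, problem) tuples for one attribute row."""
    subtype = row.get("SubtypeCode")
    if subtype not in SUBTYPES:
        return [("SubtypeCode", "unknown subtype")]
    problems = []
    for field, allowed in DOMAINS_BY_SUBTYPE[subtype].items():
        if row.get(field) not in allowed:
            problems.append((field, "value not in domain"))
    return problems

rows = [
    {"OBJECTID": 1, "SubtypeCode": 1, "Material": "PVC"},
    {"OBJECTID": 2, "SubtypeCode": 2, "Material": "DIP"},  # DIP not allowed for subtype 2
    {"OBJECTID": 3, "SubtypeCode": 9, "Material": "PVC"},  # subtype 9 is not defined
]
# Keep only the rows that have at least one problem.
report = {}
for row in rows:
    problems = validate_row(row)
    if problems:
        report[row["OBJECTID"]] = problems
```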
Data Quality Assurance (Methodology)
• CH2M provided hands-on Data Reviewer training
• City of Brentwood performed the data cleanup itself
• City of Brentwood submitted a copy of the Sewer and Water
feature datasets every 3 - 4 weeks for review
• CH2M reviewed all datasets consistently and compiled
information for presentation
Running Data Reviewer (Lessons Learned)
1. Always check for invalid geometries before
performing other checks.
2. Create batch files containing custom checks
3. Create multiple batch files and group them logically
4. Create a separate database to store QA/QC results
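Lesson 4 can be illustrated with a small results store that is separate from the GIS data. This is a sketch only, using SQLite from the Python standard library as a stand-in; the table layout, column names, and run labels are assumptions, and Data Reviewer keeps its own results in a Reviewer workspace rather than in this format.

```python
import sqlite3

def open_results_db(path=":memory:"):
    """Open (or create) a standalone database that holds only QA/QC results."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS qa_results (
               run_label TEXT, check_group TEXT, feature_class TEXT,
               object_id INTEGER, message TEXT)"""
    )
    return conn

def record(conn, run_label, check_group, feature_class, object_id, message):
    """Store one error found by a check."""
    conn.execute(
        "INSERT INTO qa_results VALUES (?, ?, ?, ?, ?)",
        (run_label, check_group, feature_class, object_id, message),
    )

def summary(conn, run_label):
    """Count errors per check group for one review round."""
    cur = conn.execute(
        "SELECT check_group, COUNT(*) FROM qa_results "
        "WHERE run_label = ? GROUP BY check_group ORDER BY check_group",
        (run_label,),
    )
    return cur.fetchall()

# Hypothetical results from one review round.
conn = open_results_db()
record(conn, "R1", "geometry", "SewerGravityPipe", 101, "invalid geometry")
record(conn, "R1", "duplicate_id", "Manholes", 7, "duplicate ID")
record(conn, "R1", "duplicate_id", "Manholes", 8, "duplicate ID")
conn.commit()
round_summary = summary(conn, "R1")
```

Keeping results per round in one place makes it straightforward to compare error counts between submissions.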
Running Data Reviewer (Lessons Learned)
• More on Invalid Geometries:
- In general, invalid geometries are the worst errors to fix.
- They can cause tools to crash or produce incomplete results.
https://blogs.esri.com/esri/arcgis/2012/03/28/invalid-geometry-check-explained/
Does this sound familiar?
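To make "invalid geometry" concrete, here is a simplified sketch that flags a few common problems in a polyline: empty geometry, parts with too few vertices, non-finite coordinates, and zero length. It assumes a geometry is a list of parts, each a list of (x, y) tuples, and covers only a subset of what the Data Reviewer invalid geometry check or the ArcGIS Check Geometry tool looks for.

```python
import math

def geometry_problems(parts):
    """Return a list of problem descriptions for a polyline given as a list of parts."""
    if not parts:
        return ["null or empty geometry"]
    problems = []
    for i, part in enumerate(parts):
        if len(part) < 2:
            problems.append(f"part {i}: fewer than two vertices")
            continue
        if any(not (math.isfinite(x) and math.isfinite(y)) for x, y in part):
            problems.append(f"part {i}: non-finite coordinate")
            continue
        # Sum the segment lengths; a zero total means all vertices coincide.
        length = sum(math.dist(a, b) for a, b in zip(part, part[1:]))
        if length == 0:
            problems.append(f"part {i}: zero length")
    return problems

ok_line = [[(0.0, 0.0), (5.0, 0.0)]]
collapsed_line = [[(2.0, 2.0), (2.0, 2.0)]]
ok_result = geometry_problems(ok_line)
collapsed_result = geometry_problems(collapsed_line)
```

Running a check like this first means later checks do not have to guard against features that have no usable shape.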
Geometric Networks (Lessons Learned)
• Geometric networks do not support features with M or Z values
- This might require a data model change
• At 10.2, geometric networks didn’t accept versioned features
- Created database replicas to perform geometric network
fixes
• 10.3 accepts versioned features
http://resources.arcgis.com/en/help/main/10.2/index.html#/in_ArcCatalog/002r00000009000000/
http://desktop.arcgis.com/en/arcmap/10.3/manage-data/geometric-networks/geometric-networks-and-versioned-geodatabases.htm
Fixing the data
• Systematically mining the data by ObjectID
• Organizing the data
Fixing the data
Results
[Chart: GIS Data vs. Domain/Subtype Errors]
Round   Number of Features   Validation Errors
Base    29965                71718
R1      30120                61216
R2      30887                21
R3      30873                20
• Schema changes
- Removed unnecessary fields
- Added new domain values to fill in the Nulls
• Used Data Reviewer
- Identified non-compliant values and fixed them
- Sorted fields in ascending order (typed values were on top)
• Tackled one attribute problem at a time
- Resolved invalid values first
- Addressed Null values second
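The "invalid values first, Nulls second" approach can be sketched as a simple triage of one attribute column. The field values and the domain below are hypothetical; the idea is only to count each non-compliant value so that the most frequent ones are fixed first, with Nulls handled afterwards.

```python
from collections import Counter

def triage(values, domain):
    """Split a column into a count of each invalid value and a count of Nulls."""
    invalid = Counter(v for v in values if v is not None and v not in domain)
    nulls = sum(1 for v in values if v is None)
    return invalid, nulls

# Hypothetical Material column with typed-in values and Nulls.
values = ["PVC", "pvc", "P.V.C.", None, "DIP", "pvc", None]
invalid, nulls = triage(values, {"PVC", "DIP", "VCP"})

# Work list: most frequent invalid values first, Nulls last.
work_list = [value for value, _ in invalid.most_common()]
if nulls:
    work_list.append("<Null>")
```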
Results
[Chart: GIS Data vs. Duplicate ID Errors]
Round   Number of Features   Duplicate IDs
Base    29965                1489
R1      30120                1010
R2      30887                1083
R3      30873                927
• Created a Python script to assign new IDs automatically
• Unique ID Check helped identify Backflow Control Valves
that didn’t have an ID assigned to them in the backflow control
database
• Both the Backflow Control Engineer and the GIS Analyst were
aware of the issue. The GIS Analyst followed up with the Engineer
to assign IDs.
• Brentwood gets audited once a year by the state, and not
having unique IDs that correlated to the backflow control
tables could have been a reason to fail the audit.
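The presenters' script is not shown in the slides. As a rough sketch of what automatic ID assignment could look like, the function below keeps the first occurrence of each ID and gives later duplicates and blank values the next unused ID. The `FacilityID` field name, the `MH-` prefix, and the five-digit format are assumptions for illustration; in ArcGIS the same loop would typically run over rows from an `arcpy.da.UpdateCursor`.

```python
def assign_new_ids(rows, id_field="FacilityID", prefix="MH-"):
    """Give duplicate or missing IDs a new unused ID; return the list of changes."""
    used = {row[id_field] for row in rows if row[id_field]}
    seen = set()
    next_num = 1
    changes = []
    for row in rows:
        current = row[id_field]
        if current and current not in seen:
            seen.add(current)  # first occurrence keeps its ID
            continue
        # Advance the counter until it produces an ID nobody is using.
        while f"{prefix}{next_num:05d}" in used:
            next_num += 1
        new_id = f"{prefix}{next_num:05d}"
        used.add(new_id)
        changes.append((row["OBJECTID"], current, new_id))
        row[id_field] = new_id
    return changes

# Hypothetical sample: row 2 duplicates row 1, row 3 has no ID.
rows = [
    {"OBJECTID": 1, "FacilityID": "MH-00001"},
    {"OBJECTID": 2, "FacilityID": "MH-00001"},
    {"OBJECTID": 3, "FacilityID": None},
    {"OBJECTID": 4, "FacilityID": "MH-00002"},
]
changes = assign_new_ids(rows)
```

Returning the list of changes gives a record of old and new IDs that can be shared with the owners of related tables before anything is committed.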
Results
[Chart: GIS Data vs. Geometric Errors]
Round   Number of Features   Geometric Errors
Base    29965                463
R1      30120                458
R2      30887                142
R3      30873                23
The unperceived benefits
1. Integrated systems work correctly
a) Data that is part of a business process needs to be
properly maintained
2. Robust datasets
a) Source of truth
b) Better reporting and access to more spatial analysis
capabilities
3. QA/QC Industry Best Practices
4. Faster tool development
a) Standardized data = less time spent developing workflows to
troubleshoot analysis
Conclusions
• Data is dynamic and maintaining Data Quality is a continuous
process
• It's expensive to have dirty or incomplete data
• ArcGIS Data Reviewer is easily customizable to:
- Check data (batch or single check)
- Produce data health reports
• A combination of visual and computerized checks is the best
formula to maintain Data Quality
Thank you for your time.
• Scott Sumners
• James McCord
• Gerardo Boquin