Data 101: Fundamentals of Data in GIS
September 2012 GIS ToT Webinar
![Page 1: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/1.jpg)
Data 101
Fundamentals of data in a GIS
![Page 2: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/2.jpg)
Overview
Role of data
Data structures and schemas
Metadata
Linking data
Issues of confidentiality
![Page 3: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/3.jpg)
Review
![Page 4: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/4.jpg)
90 percent rule
90% Data Preparation
10% Mapping
90% of the cost, time, and effort will be devoted to data preparation
![Page 5: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/5.jpg)
90% Rule
Data Preparation
• Collecting
• Cleaning
• Validating
• Formatting
• Linking with other data
Mapping
• Map design
• Categorization decisions
• Production
![Page 6: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/6.jpg)
GIS analysis is only as strong as the data used.
![Page 7: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/7.jpg)
Strategies for strong data
Accuracy
Timeliness
Properly structured
Properly documented
![Page 8: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/8.jpg)
Data accuracy
Data should accurately reflect reality
In GIS there are two types of accuracy to be concerned with:
Spatial accuracy
Items located correctly
Attribute accuracy
Attributes are correct and properly linked to geography
![Page 9: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/9.jpg)
Spatial accuracy
Hotel Suryaa
Real Location
![Page 10: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/10.jpg)
Spatial Accuracy and Scale
Hotel Suryaa
![Page 11: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/11.jpg)
Attribute Accuracy
Is the data associated with the location accurate?
Is it linked to the right geographic entity?
![Page 12: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/12.jpg)
Attribute Accuracy
![Page 13: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/13.jpg)
Timeliness
Is the data for the time period of interest?
Boundaries change
New features created
Features change
![Page 14: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/14.jpg)
Data Structure
Proper data structure is necessary in order to effectively use data
Software must know how to read the data, and query it.
The structure of the data is also known as the data schema
![Page 15: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/15.jpg)
Data Schema
For most programs, data will need to be stored in a row and column format
GIS programs expect well-formed data in the following schema:
One record per geographic unit
Geographic units don’t repeat in records
Variables are stored in columns
No blank cells unless data is missing
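
The schema rules above can be checked mechanically before a table is loaded into a GIS. What follows is a minimal sketch in plain Python, assuming the table is held as a list of dictionaries; the function name `check_schema`, the `district` column, and the sample rows are hypothetical and only illustrate the idea.

```python
from collections import Counter

def check_schema(rows, unit_field):
    """Return a list of human-readable schema problems found in rows."""
    problems = []
    # Rule: geographic units must not repeat across records.
    counts = Counter(row.get(unit_field) for row in rows)
    for unit, n in counts.items():
        if unit in (None, ""):
            problems.append(f"{n} record(s) have no {unit_field}")
        elif n > 1:
            problems.append(f"{unit_field} {unit!r} appears {n} times")
    # Rule: blank cells should only occur where data is genuinely missing,
    # so each one is reported for a person to review.
    for i, row in enumerate(rows):
        for field, value in row.items():
            if field != unit_field and value in (None, ""):
                problems.append(f"row {i}: blank cell in {field!r}")
    return problems

# Hypothetical sample table with one duplicate unit and one blank cell.
rows = [
    {"district": "North", "population": 24015},
    {"district": "West", "population": 31154},
    {"district": "West", "population": ""},
]
for problem in check_schema(rows, "district"):
    print(problem)
```

A blank cell is reported rather than rejected, since the rule allows blanks where data is truly missing.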
![Page 16: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/16.jpg)
Data Schema
| Population | China | India | United States | Indonesia |
| --- | --- | --- | --- | --- |
| Total | 1339724852 | 1210193422 | 312417000 | 237556363 |
| Percent of World’s Population | 19.23% | 17.37% | 4.48% | 3.41% |
| Population Density | 140/km² | 368/km² | 32/km² | 121/km² |

Poor data schema
• Columns are geographic units
• Variables are rows
![Page 17: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/17.jpg)
Blank cells
Duplicate district names
![Page 18: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/18.jpg)
Proper Data Schema
One record per geographic unit
Columns are variables
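
The population table with countries as columns can be restructured into this form by transposing it. The sketch below uses plain Python and the figures from that slide; the function name `to_proper_schema` and the `Country` column name are assumptions made for illustration.

```python
# Poor schema as given on the slide: variables are rows, countries are columns.
header = ["Population", "China", "India", "United States", "Indonesia"]
poor = [
    ["Total", 1339724852, 1210193422, 312417000, 237556363],
    ["Percent of World's Population", "19.23%", "17.37%", "4.48%", "3.41%"],
    ["Population Density", "140/km2", "368/km2", "32/km2", "121/km2"],
]

def to_proper_schema(header, variable_rows, unit_name="Country"):
    """Transpose so each geographic unit is one record and variables are columns."""
    units = header[1:]
    records = []
    for col, unit in enumerate(units, start=1):
        record = {unit_name: unit}
        for row in variable_rows:
            # row[0] is the variable name; row[col] is its value for this unit.
            record[row[0]] = row[col]
        records.append(record)
    return records

proper = to_proper_schema(header, poor)
for record in proper:
    print(record)
```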
![Page 19: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/19.jpg)
Metadata
Data about data
Provides information on:
Source of data
Who created it
When it was created
Coordinate system and datum
Usage and sharing restrictions
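
As a rough illustration, the elements listed above can be held in a small record and checked for completeness. This is a sketch only: the class `Metadata`, its field names, and the example values are hypothetical and do not follow any formal metadata standard.

```python
from dataclasses import dataclass, fields

@dataclass
class Metadata:
    source: str = ""
    creator: str = ""
    created: str = ""            # for example, a date string such as 2012-09-01
    coordinate_system: str = ""
    datum: str = ""
    restrictions: str = ""       # usage and sharing restrictions

def missing_elements(meta):
    """Names of metadata elements that were left empty."""
    return [f.name for f in fields(meta) if not getattr(meta, f.name).strip()]

# Hypothetical example values, for illustration only.
meta = Metadata(
    source="District boundary file",
    creator="Example Mapping Unit",
    created="2012-09-01",
    coordinate_system="Geographic (latitude/longitude)",
    datum="WGS 84",
)
print(missing_elements(meta))
```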
![Page 20: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/20.jpg)
Metadata
Metadata is especially important with spatial data because of issues of:
Spatial accuracy
Coordinate systems and datums
Confidentiality
Timeliness
![Page 21: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/21.jpg)
Metadata formats
International standard
ISO 19115
Mandatory elements
Schema for metadata
Countries may have their own national standards that are compatible with the ISO standard but provide extra elements
![Page 22: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/22.jpg)
Metadata Example
![Page 23: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/23.jpg)
Data Types
Text
Numeric
Coordinates
Programs assign variables to be a specific type which can affect the way the program handles data
![Page 24: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/24.jpg)
Data Types
Text
Arithmetic cannot be performed on values in text fields
Numeric
Arithmetic permitted
May require user to declare number of decimal places before entering data
This can be important when storing coordinates
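
A short Python sketch of both points, assuming values arrive as text and that a field keeps only two decimal places. The figure of roughly 111 km per degree of latitude is an approximation used here only to show the size of the rounding error.

```python
# Population stored as text: "+" joins the strings instead of adding them.
male_text, female_text = "14409", "9606"
print(male_text + female_text)            # text concatenation
print(int(male_text) + int(female_text))  # arithmetic after conversion

# A numeric field declared with too few decimal places rounds coordinates.
lat, lon = 28.67171, 77.21211
lat_2dp, lon_2dp = round(lat, 2), round(lon, 2)

# One degree of latitude is roughly 111 km, so this approximates the
# north-south error introduced by keeping only two decimal places.
error_m = abs(lat - lat_2dp) * 111_000
print(lat_2dp, lon_2dp, round(error_m))
```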
![Page 25: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/25.jpg)
Linking data
Key field
The field that contains information common between tables
Tables are linked using the key field
Tables cannot be linked if their key fields are of two different types
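
The type restriction can be seen in a few lines of Python. The two lookup tables below are hypothetical: one stores a district code as a number and the other stores it as text.

```python
# Hypothetical tables: one stores the district code as a number, the other as text.
population = {100: 24015, 200: 31154, 300: 62442}
area = {"100": 243, "200": 310, "300": 602}

# A lookup with mismatched key types finds nothing, because 100 != "100".
matched_raw = [code for code in population if code in area]

# Converting both keys to one type (text here) lets the records line up.
area_by_text = {str(code): value for code, value in area.items()}
matched_fixed = [code for code in population if str(code) in area_by_text]

print(matched_raw, matched_fixed)
```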
![Page 26: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/26.jpg)
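| District | Population | Male Pop | Female Pop |
| --- | --- | --- | --- |
| North | 24015 | 14409 | 9606 |
| West | 31154 | 16202 | 14952 |
| South | 62442 | 29972 | 32470 |

| District | Area (sq km) |
| --- | --- |
| North | 243 |
| West | 310 |
| South | 602 |

District is the key field

| District | Population | Male Pop | Female Pop | Area (sq km) |
| --- | --- | --- | --- | --- |
| North | 24015 | 14409 | 9606 | 243 |
| West | 31154 | 16202 | 14952 | 310 |
| South | 62442 | 29972 | 32470 | 602 |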
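
The link shown on this slide can be sketched in plain Python as an inner join on the key field. The function name `link` is an assumption; GIS and database packages provide their own join tools.

```python
population = [
    {"District": "North", "Population": 24015, "Male Pop": 14409, "Female Pop": 9606},
    {"District": "West", "Population": 31154, "Male Pop": 16202, "Female Pop": 14952},
    {"District": "South", "Population": 62442, "Male Pop": 29972, "Female Pop": 32470},
]
area = [
    {"District": "North", "Area (sq km)": 243},
    {"District": "West", "Area (sq km)": 310},
    {"District": "South", "Area (sq km)": 602},
]

def link(left, right, key):
    """Inner join: keep left records whose key value also appears in right."""
    lookup = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked

linked = link(population, area, "District")
for record in linked:
    print(record)
```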
![Page 27: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/27.jpg)
Linking data
Linking using text fields can be problematic
Variations in spelling
![Page 28: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/28.jpg)
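| District | Population | Male Pop | Female Pop |
| --- | --- | --- | --- |
| North Kinley | 24015 | 14409 | 9606 |
| West | 31154 | 16202 | 14952 |
| South | 62442 | 29972 | 32470 |

| District | Area (sq km) |
| --- | --- |
| N. Kinley | 243 |
| West | 310 |
| South | 602 |

The two tables have different spellings for the district North Kinley, so that record is dropped from the linked table

| District | Population | Male Pop | Female Pop | Area (sq km) |
| --- | --- | --- | --- | --- |
| West | 31154 | 16202 | 14952 | 310 |
| South | 62442 | 29972 | 32470 | 602 |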
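
Before linking on a text field, it helps to list the key values that appear in only one table. A minimal sketch, assuming the district names from this slide; the function name `unmatched_keys` is hypothetical.

```python
population_districts = ["North Kinley", "West", "South"]
area_districts = ["N. Kinley", "West", "South"]

def unmatched_keys(left_keys, right_keys):
    """Return key values that appear in only one of the two tables."""
    left, right = set(left_keys), set(right_keys)
    return sorted(left - right), sorted(right - left)

only_in_population, only_in_area = unmatched_keys(population_districts, area_districts)
print("Only in population table:", only_in_population)
print("Only in area table:", only_in_area)
```

Any value reported here would silently drop out of the linked table unless the spelling is reconciled first.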
![Page 29: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/29.jpg)
Linking data
Linking using numeric fields is often more reliable and less vulnerable to variations and other issues
Countries often use numeric codes for administrative units to get around problems with spelling variations
If standardized national codes exist, it is a good idea to include them in the data
The national bureau of statistics or census office often manages such codes
![Page 30: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/30.jpg)
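| District | Dist code | Population | Male Pop | Female Pop |
| --- | --- | --- | --- | --- |
| North Kinley | 100 | 24015 | 14409 | 9606 |
| West | 200 | 31154 | 16202 | 14952 |
| South | 300 | 62442 | 29972 | 32470 |

| District | Dist code | Area (sq km) |
| --- | --- | --- |
| N. Kinley | 100 | 243 |
| West | 200 | 310 |
| South | 300 | 602 |

Dist code is the key field

| District | Dist code | Population | Male Pop | Female Pop | Area (sq km) |
| --- | --- | --- | --- | --- | --- |
| North Kinley | 100 | 24015 | 14409 | 9606 | 243 |
| West | 200 | 31154 | 16202 | 14952 | 310 |
| South | 300 | 62442 | 29972 | 32470 | 602 |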
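
With a numeric code as the key, the spelling difference no longer matters. The sketch below links on `Dist code` and then derives population density, a hypothetical example of a variable that needs both tables.

```python
population = [
    {"District": "North Kinley", "Dist code": 100, "Population": 24015},
    {"District": "West", "Dist code": 200, "Population": 31154},
    {"District": "South", "Dist code": 300, "Population": 62442},
]
area = [
    {"District": "N. Kinley", "Dist code": 100, "Area (sq km)": 243},
    {"District": "West", "Dist code": 200, "Area (sq km)": 310},
    {"District": "South", "Dist code": 300, "Area (sq km)": 602},
]

# Look up area by code, so the differing district spellings are never compared.
area_by_code = {row["Dist code"]: row["Area (sq km)"] for row in area}
linked = [
    {**row, "Area (sq km)": area_by_code[row["Dist code"]]}
    for row in population
    if row["Dist code"] in area_by_code
]

# Density is computed here only to show a use for the linked record.
for row in linked:
    row["Density (per sq km)"] = round(row["Population"] / row["Area (sq km)"], 1)
    print(row)
```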
![Page 31: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/31.jpg)
Advantage of numeric codes
Can manage hierarchy effectively
North District code: 100

| District | Province | Code |
| --- | --- | --- |
| North | Coast | 101 |
| North | Mountain | 103 |
| North | Savanna | 105 |

Map labels: Savanna, Mountain, Coast
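
A hierarchy encoded this way lets the parent unit be derived from the code itself. The sketch assumes, based only on the example above, that the last two digits identify the province and the leading digit identifies the district; real national coding schemes differ, and `parent_district` is a hypothetical helper.

```python
def parent_district(province_code):
    """District code for a province, assuming the last two digits are the province."""
    return (province_code // 100) * 100

provinces = {"Coast": 101, "Mountain": 103, "Savanna": 105}

# Select every province that belongs to district 100 without a lookup table.
in_north = [name for name, code in provinces.items() if parent_district(code) == 100]
print(in_north)
```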
![Page 32: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/32.jpg)
Linking data key points
Key fields must be of the same type
Text fields can be problematic due to spelling variations
Numeric fields are often a more reliable key field
Unique geography codes, if available in a country, are often the best option for making linkages
![Page 33: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/33.jpg)
Data and confidentiality issues
Important issue when working with spatial data
Discuss issues of confidentiality and spatial tools
Present strategies for protecting confidentiality
![Page 34: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/34.jpg)
Confidentiality
Protecting identity of individuals
Requirement
Informed consent agreements
Ethical research
![Page 35: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/35.jpg)
Overt disclosure
The act of explicitly making data available that breaches confidentiality commitments.
![Page 36: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/36.jpg)
Deductive Disclosure
45-year-old female
Has 5 children
Works for General Electric in Delhi
28.67171, 77.21211
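
Each detail on this slide narrows the set of people who could match. The Python sketch below counts how many records remain as facts are applied one at a time. All records are synthetic and invented for illustration, and the function name `narrowing` is hypothetical.

```python
# Synthetic records, invented for illustration; none describe real people.
records = [
    {"age": 45, "sex": "F", "children": 5, "employer": "General Electric", "city": "Delhi"},
    {"age": 45, "sex": "F", "children": 5, "employer": "Other", "city": "Delhi"},
    {"age": 45, "sex": "F", "children": 2, "employer": "Other", "city": "Delhi"},
    {"age": 45, "sex": "F", "children": 0, "employer": "General Electric", "city": "Mumbai"},
    {"age": 45, "sex": "M", "children": 5, "employer": "General Electric", "city": "Delhi"},
    {"age": 30, "sex": "F", "children": 1, "employer": "Other", "city": "Delhi"},
]

def narrowing(records, known_facts):
    """Count how many records remain as each known fact is applied in turn."""
    remaining = list(records)
    counts = []
    for field, value in known_facts:
        remaining = [r for r in remaining if r[field] == value]
        counts.append(len(remaining))
    return counts

facts = [("age", 45), ("sex", "F"), ("children", 5),
         ("employer", "General Electric"), ("city", "Delhi")]
print(narrowing(records, facts))
```

Once the count reaches one, the individual is identifiable even though no name was released; a precise coordinate typically gets there in a single step.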
![Page 37: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/37.jpg)
Spatial Data
Overt disclosure
Makes deductive disclosure easier
![Page 38: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/38.jpg)
Geoprivacy
“[an] individual’s right to prevent disclosure of the location of one’s home, workplace, daily activities or trips.”
Protection of Geoprivacy and Accuracy of Spatial Information: How Effective Are Geographical Masks?
Kwan, Casas, Schmitz
Cartographica, vol. 39, no. 2
![Page 39: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/39.jpg)
Four Principles
Protection of Confidentiality
Social-Spatial Linkage
Data Sharing
Data Preservation
Confidentiality and spatially explicit data: Concerns and challenges
VanWey, Rindfuss, Gutmann, Entwisle, Balk
PNAS, vol. 102, no. 43
![Page 40: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/40.jpg)
1. Protection of Confidentiality
Fundamental to ethical research
Information that might lead to physical, emotional, financial or other harm
Protection of information that discloses identity
![Page 41: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/41.jpg)
2. Social-Spatial Linkage
All human activity takes place on earth
Understanding that adds context and perspective
Key to advancement of science
Essential for understanding the diffusion of behaviors
![Page 42: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/42.jpg)
3. Data Sharing
Essential on both scientific and financial grounds
Provide access to data for other researchers
Condition of funders
![Page 43: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/43.jpg)
4. Data Preservation
Data available in the future
How long should data be deemed “sensitive”?
When, if ever, can it be released?
![Page 44: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/44.jpg)
Strategies
![Page 45: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/45.jpg)
Random Perturbations
Random shifting of point locations
Pros: Easy (relatively) to do
Cons: Lose original location, introduces error
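
One way to sketch random perturbation in Python is to move each point a random distance in a random direction, up to a chosen maximum. The function name `perturb`, the 250-unit limit, and the sample points are assumptions, and coordinates are assumed to be projected (for example, in metres) rather than latitude and longitude.

```python
import math
import random

def perturb(points, max_distance, seed=None):
    """Shift each (x, y) point by a random offset of at most max_distance."""
    rng = random.Random(seed)
    shifted = []
    for x, y in points:
        angle = rng.uniform(0, 2 * math.pi)
        # The square root spreads offsets evenly over the disc instead of
        # clustering them near the original location.
        distance = max_distance * math.sqrt(rng.random())
        shifted.append((x + distance * math.cos(angle),
                        y + distance * math.sin(angle)))
    return shifted

# Hypothetical projected coordinates in metres.
points = [(1000.0, 2000.0), (1500.0, 2500.0), (3000.0, 1200.0)]
masked = perturb(points, max_distance=250.0, seed=42)
for original, moved in zip(points, masked):
    print(original, "->", moved)
```

The seed is fixed here only so the example is repeatable; in practice a fixed seed that became known would make the shifts reproducible.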
![Page 46: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/46.jpg)
Affine Transformation
Change scale
Rotate
Shift a set distance
Combination
Pros: Easy to do
Cons: Easy to undo, can impact some types of analysis
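
The "easy to undo" drawback is visible in code: anyone who learns the parameters can invert the transformation exactly. The sketch below assumes projected coordinates and rotation about the origin; the function names and parameter values are hypothetical.

```python
import math

def affine(points, scale=1.0, angle_deg=0.0, dx=0.0, dy=0.0):
    """Scale and rotate each point about the origin, then shift it by (dx, dy)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        (scale * (x * cos_a - y * sin_a) + dx,
         scale * (x * sin_a + y * cos_a) + dy)
        for x, y in points
    ]

def undo_affine(points, scale=1.0, angle_deg=0.0, dx=0.0, dy=0.0):
    """Invert affine(): remove the shift, then rotate back and rescale."""
    a = math.radians(-angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        (((x - dx) * cos_a - (y - dy) * sin_a) / scale,
         ((x - dx) * sin_a + (y - dy) * cos_a) / scale)
        for x, y in points
    ]

# Hypothetical points and parameters.
points = [(1000.0, 2000.0), (1500.0, 2500.0), (3000.0, 1200.0)]
params = dict(scale=2.0, angle_deg=30.0, dx=500.0, dy=-200.0)
masked = affine(points, **params)
restored = undo_affine(masked, **params)
print(masked)
print(restored)
```

Because the scale changes, distances between masked points no longer match the originals, which is one way this method can affect analysis.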
![Page 47: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/47.jpg)
Aggregate
Point locations are aggregated to higher unit of analysis
Pros: Easy to do
Cons: Requires sufficient data points; finer variations in the data will be lost
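
A minimal Python sketch of aggregation, using square grid cells as a stand-in for whatever higher unit (district, province) is available. The `min_count` threshold, which withholds cells with too few points, is an added assumption reflecting the need for sufficient data points; the function name and sample points are hypothetical.

```python
from collections import Counter

def aggregate(points, cell_size, min_count=1):
    """Count points per square grid cell, dropping cells below min_count."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Hypothetical projected coordinates in metres.
points = [(120.0, 80.0), (450.0, 300.0), (480.0, 20.0), (1500.0, 900.0), (990.0, 10.0)]
cells = aggregate(points, cell_size=1000.0, min_count=2)
print(cells)
```

Only counts per cell are released, so the positions of individual points within a cell cannot be recovered from the output.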
![Page 48: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/48.jpg)
Despatialize
Remove Coordinate System
Use Euclidean space
Pros: Simple, keeps relative position and placement
Cons: Loses contextual data
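
One simple reading of despatializing is to re-express points relative to an arbitrary local origin and discard the original coordinate system. The sketch below assumes projected coordinates in metres and uses the minimum x and y as the origin; the function name `despatialize` and the sample points are hypothetical.

```python
import math

def despatialize(points):
    """Re-express points relative to their own minimum x and y."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    return [(x - min_x, y - min_y) for x, y in points]

# Hypothetical projected coordinates in metres.
points = [(432100.0, 3172500.0), (432400.0, 3172900.0), (433000.0, 3172500.0)]
local = despatialize(points)
print(local)

# Distances between points are unchanged, so relative placement is kept.
before = math.hypot(points[1][0] - points[0][0], points[1][1] - points[0][1])
after = math.hypot(local[1][0] - local[0][0], local[1][1] - local[0][1])
print(before, after)
```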
![Page 49: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/49.jpg)
Nothing
Do not collect or release data
Cold room or on-site analysis only
Pros: Maintains all of the original spatial data
Cons: Complicated, limits data sharing, limits social-spatial link
![Page 50: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/50.jpg)
![Page 51: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/51.jpg)
“Ignoring is unacceptable”
Confidentiality can get lost in the excitement about GIS
Those who collect data must think about the confidentiality issues
Data users must also think about how their analysis may increase the risk of deductive disclosure.
![Page 52: Data 101: Fundamentals of Data in GIS](https://reader036.vdocuments.us/reader036/viewer/2022081413/5459589ab1af9f37608b561e/html5/thumbnails/52.jpg)
Key points
Confidentiality issues arise when spatial context is included in data.
It’s important to protect confidentiality. People have an expectation that their identities are protected.
There are strategies that can preserve confidentiality, but there is no “one-size-fits-all” solution.