Peter Fox
Xinformatics
Week 10, April 2, 2012
Worked example
Contents
• Review of last class, reading
• Information quality, uncertainty (again) and bias (again) in a project setting
• Your projects?
• Last few classes
The semantics of data and information quality, uncertainty and bias and applications in atmospheric science
Peter Fox1, Stephan Zednik1, Gregory Leptoukh2, Chris Lynnes2, Jianfu Pan3
1. Tetherless World Constellation, Rensselaer Polytechnic Institute
2. NASA Goddard Space Flight Center, Greenbelt, MD, United States
3. Adnet Systems, Inc.
Definitions (Atmospheric)
• Quality – is in the eyes of the beholder – worst case scenario… or a good challenge
• Uncertainty – has aspects of accuracy (how accurately the real-world situation is assessed; it also includes bias) and precision (down to how many digits)
• Bias – has two aspects:
  – Systematic error resulting in the distortion of measurement data caused by prejudice or faulty measurement technique
  – A vested interest, or strongly held paradigm or condition, that may skew the results of sampling, measuring, or reporting the findings of a quality assessment:
    • Psychological: for example, when data providers audit their own data, they usually have a bias to overstate its quality
    • Sampling: sampling procedures that result in a sample that is not truly representative of the population sampled (Larry English)
Acronyms
ACCESS Advancing Collaborative Connections for Earth System Science
ACE Aerosol-Cloud-Ecosystems
AGU American Geophysical Union
AIST Advanced Information Systems Technology
AOD Aerosol Optical Depth
AVHRR Advanced Very High Resolution Radiometer
GACM Global Atmospheric Composition Mission
GeoCAPE Geostationary Coastal and Air Pollution Events
GEWEX Global Energy and Water Cycle Experiment
GOES Geostationary Operational Environmental Satellite
GOME-2 Global Ozone Monitoring Experiment-2
JPSS Joint Polar Satellite System
LST Local Solar Time
MDSA Multi-sensor Data Synergy Advisor
MISR Multiangle Imaging SpectroRadiometer
MODIS Moderate Resolution Imaging Spectroradiometer
NPP National Polar-Orbiting Operational Environmental Satellite System Preparatory Project
Acronyms (cont.)
OMI Ozone Monitoring Instrument
OWL Web Ontology Language
PML Proof Markup Language
QA4EO Quality Assurance for Earth Observation
REST Representational State Transfer
TRL Technology Readiness Level
UTC Coordinated Universal Time
WADL Web Application Description Language
XML eXtensible Markup Language
XSL eXtensible Stylesheet Language
XSLT XSL Transformation
Data quality needs: fitness for purpose
• Measuring climate change:
  – Model validation: gridded contiguous data with uncertainties
  – Long-term time series: bias assessment is a must, especially for sensor degradation, orbit and spatial sampling change
• Studying phenomena using multi-sensor data:
  – Cross-sensor bias is needed
• Realizing societal benefits through applications:
  – Near-real-time for transport/event monitoring – in some cases, coverage and timeliness might be more important than accuracy
  – Pollution monitoring (e.g., air quality exceedance levels) – accuracy
• Educational (users are generally not well-versed in the intricacies of quality; just taking all the data as usable can impair educational lessons) – only the best products
Where are we with respect to this data challenge?
“The user cannot find the data;
if he can find it, he cannot access it;
if he can access it, he doesn't know how good they are;
if he finds them good, he cannot merge them with other data.”
– The Users View of IT, NAS, 1989
Level 2 data
• Swath for MISR, orbit 192 (2001)
Level 3 data
MODIS vs. MERIS
• Same parameter, same space & time – different results. Why?
• A threshold used in MERIS processing effectively excludes high aerosol values.
• Note: MERIS was designed primarily as an ocean-color instrument, so aerosols are “obstacles”, not signal.
Why so difficult?
• Quality is perceived differently by data providers and data recipients.
• There are many different qualitative and quantitative aspects of quality.
• Methodologies for dealing with data quality are just emerging.
• Almost nothing exists for remote sensing data quality.
  – Even the most comprehensive review, Batini’s book, demonstrates that there are no preferred methodologies for solving many data quality issues.
• Little funding was allocated in the past to data quality, as the priority was to build an instrument, launch a rocket, collect and process data, and publish a paper using just one set of data.
• Each science team handled quality differently.
Spatial and temporal sampling – how to quantify it to make it useful for modelers?
• Completeness: the MODIS dark-target algorithm does not work for deserts.
• Representativeness: monthly aggregation is not enough for MISR, and even for MODIS.
• Spatial sampling patterns are different for MODIS Aqua and MISR Terra: “pulsating” areas over ocean are oriented differently due to different orbital directions during day-time measurement (a source of cognitive bias).
[Maps: MODIS Aqua AOD, July 2009 vs. MISR Terra AOD, July 2009]
Quality Control vs. Quality Assessment
• Quality Control (QC) flags in the data (assigned by the algorithm) reflect the “happiness” of the retrieval algorithm, e.g., all the necessary channels indeed had data, there were not too many clouds, the algorithm converged to a solution, etc.
• Quality assessment is done by analyzing the data “after the fact” through validation, intercomparison with other measurements, self-consistency, etc. It is presented as bias and uncertainty, and it is reported rather inconsistently, scattered across papers and validation reports. (See the sketch below.)
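The distinction can be made concrete with a schematic sketch (Python/numpy; all arrays are invented): QC is a per-pixel flag applied when reading the data, while quality assessment compares the product against independent measurements after the fact and summarizes the result as bias and uncertainty.

    import numpy as np

    retrieved = np.array([0.21, 0.35, 0.90, 0.15])  # hypothetical retrievals
    qc_flag = np.array([3, 3, 1, 2])                # per-pixel QC (3 = best)
    reference = np.array([0.20, 0.30, 0.60, 0.14])  # e.g., ground truth

    # Quality control: the algorithm's own per-pixel "happiness", used as a filter.
    usable = retrieved[qc_flag >= 2]

    # Quality assessment: after-the-fact comparison with independent data,
    # summarized as bias and uncertainty for the product as a whole.
    bias = np.mean(retrieved - reference)
    uncertainty = np.std(retrieved - reference)
    print(usable, bias, uncertainty)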
Challenges in dealing with Data Quality
• Q: Why now? What has changed?
• A: With the recent revolutionary progress in data systems, dealing with data from many different sensors has finally become a reality. Only now is a systematic approach to remote sensing quality on the table.
• NASA is beefing up efforts on data quality.
• ESA is seriously addressing these issues.
• QA4EO: an international effort to bring communities together on data quality.
• Many information and computer science research questions.
Intercomparison of data from multiple sensors
Data from multiple sources to be used together:
• Current sensors/missions: MODIS, MISR, GOES, OMI
• Future decadal missions: ACE, NPP, JPSS, GeoCAPE
• European and other countries’ satellites
• Models
Harmonization needs:
• It is not sufficient just to have the data from different sensors and their provenances in one place.
• Before comparing and fusing data, things need to be harmonized:
  – Metadata: terminology, standard fields, units, scale
  – Data: format, grid, spatial and temporal resolution, wavelength, etc.
  – Provenance: source, assumptions, algorithm, processing steps
  – Quality: bias, uncertainty, fitness-for-purpose, validation
Danger of easy data access without proper assessment of the joint data usage: it is easy to use data incorrectly.
Anomaly Example: South Pacific
• The MODIS Level 3 data-day definition leads to an artifact in correlation…
• …which is caused by an overpass time difference.
Different kinds of reported data quality
• Pixel-level quality: algorithmic guess at the usability of a data point
  – Granule-level quality: statistical roll-up of pixel-level quality
• Product-level quality: how closely the data represent the actual geophysical state
• Record-level quality: how consistent and reliable the data record is across generations of measurements
Different quality types are often erroneously assumed to have the same meaning. Ensuring data quality at these different levels requires different focus and action.
Sensitivity of the Aerosol–Chl Relationship to the Data-Day Definition
• Correlation coefficients: MODIS AOT at 550 nm vs. SeaWiFS Chl
• Difference between correlations A and B, where A is MODIS AOT on a Local Solar Time (LST) data day vs. SeaWiFS Chl, and B is MODIS AOT on a UTC data day vs. SeaWiFS Chl
• Artifact: the difference between using the LST-based and the calendar (UTC-based) data day (illustrated in the sketch below)

Sensitivity Study: Effect of the Data-Day Definition on the Correlation of Ocean Color Data with Aerosol Data
• Correlation between MODIS Aqua AOD (Ocean group product) and MODIS Aqua AOD (Atmosphere group product); pixel-count distribution
• Only half of the data-day artifact is present, because the Ocean group uses the better data-day definition!
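To see where the artifact comes from, here is a minimal sketch (Python; the longitudes and times are hypothetical) using the common approximation local solar time ≈ UTC + longitude/15 hours: a UTC-based data day splits a single local afternoon across two calendar days near the dateline.

    from datetime import datetime, timedelta

    def utc_from_lst(lst, lon_deg):
        """Approximate UTC from local solar time via LST ~ UTC + lon/15 hours."""
        return lst - timedelta(hours=lon_deg / 15.0)

    # Two neighboring afternoon pixels on either side of the dateline, both
    # observed at ~13:26 local solar time on July 1 (hypothetical values):
    lst = datetime(2009, 7, 1, 13, 26)
    for lon in (179.0, -179.0):
        utc = utc_from_lst(lst, lon)
        print(f"lon={lon:+5.0f}  LST day: {lst.date()}  UTC day: {utc.date()}")

    # lon=+179 -> UTC day 2009-07-01; lon=-179 -> UTC day 2009-07-02.
    # A UTC-based data day assigns these two neighboring pixels to different
    # days, while an LST-based data day keeps the local afternoon together.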
Factors contributing to uncertainty and bias in Level 2
• Physical: instrument, retrieval algorithm, aerosol spatial and temporal variability, …
• Input: ancillary data used by the retrieval algorithm
• Classification: erroneous flagging of the data
• Simulation: the geophysical model used for the retrieval
• Sampling: the averaging within the retrieval footprint
(Borrowed from the SST study on error budget)
What is Level 3 quality?
It is not defined in Earth science…
• If Level 2 errors are known, the corresponding Level 3 error can, in principle, be computed.
• Processing from L2 → L3 daily → L3 monthly may reduce random noise but can also exacerbate systematic bias and introduce additional sampling bias.
• However, at best, standard deviations (mostly reflecting variability within a grid box), and sometimes pixel counts and quality histograms, are provided.
• Natural variability is convolved with sensor/retrieval uncertainty and bias – we need to understand their relative contributions to differences between data.
• This does not address sampling bias.
Why can’t we just apply L2 quality to L3?
Aggregation to L3 introduces new issues wherever aerosols co-vary with observing or environmental conditions – sampling bias (see the gridding sketch below):
• Spatial: sampling polar areas more than equatorial ones
• Temporal: sampling only one time of day (not obvious when looking at L3 maps)
• Vertical: not sensitive to a certain part of the atmosphere, thus emphasizing other parts
• Contextual: bright-surface or clear-sky bias
• Pixel quality: filtering or weighting by quality may mask out areas with specific features
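The gridding sketch mentioned above (Python/numpy; the array layout and the quality encoding are assumptions, not any product's actual format) shows how the pixel-quality screen changes both the cell means and the sampling (pixel counts) of the resulting L3 grid.

    import numpy as np

    def grid_l2_to_l3(lat, lon, val, qual, min_quality=2, res=1.0):
        """Aggregate L2 pixels to a res-degree grid: per-cell mean and count."""
        nlat, nlon = int(180 / res), int(360 / res)
        keep = qual >= min_quality                 # pixel-level quality screen
        i = np.clip(((lat[keep] + 90.0) / res).astype(int), 0, nlat - 1)
        j = np.clip(((lon[keep] + 180.0) / res).astype(int), 0, nlon - 1)
        cell = i * nlon + j                        # flattened grid-cell index
        count = np.bincount(cell, minlength=nlat * nlon)
        total = np.bincount(cell, weights=val[keep], minlength=nlat * nlon)
        mean = np.where(count > 0, total / np.maximum(count, 1), np.nan)
        return mean.reshape(nlat, nlon), count.reshape(nlat, nlon)

    # Raising min_quality removes pixels unevenly in space: cells dominated by
    # low-quality retrievals (bright surfaces, broken clouds) lose coverage,
    # which is exactly the pixel-quality sampling bias listed above.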
Addressing Level 3 data “quality”
• Terminology: quality, uncertainty, bias, error budget, etc.
• Quality aspects (examples):
  – Completeness:
    • Spatial (MODIS covers more than MISR)
    • Temporal (the Terra mission has been in space longer than Aqua)
    • Observing condition (MODIS cannot measure over sun glint while MISR can)
  – Consistency:
    • Spatial (e.g., not changing over the sea–land boundary)
    • Temporal (e.g., trends, discontinuities and anomalies)
    • Observing condition (e.g., variations in retrieved measurements due to the viewing conditions, such as viewing geometry or cloud fraction)
  – Representativeness:
    • Neither pixel count nor standard deviation fully expresses the representativeness of the grid-cell value
• Data Quality Glossary development: http://tw.rpi.edu/web/project/MDSA/Glossary
More terminology
• ‘Even a slight difference in terminology can lead to significant differences between data from different sensors. It gives an IMPRESSION of data being of bad quality while in fact they measure different things. For example, the MODIS and MISR definitions of the aerosol “fine mode” are different, so the direct comparison of fine modes from MODIS and MISR does not always give good correlation.’
• Ralph Kahn, MISR Aerosol Lead
Three projects with a data quality flavor
• Multi-sensor Data Synergy Advisor (**) – product level
• Data Quality Screening Service – pixel level
• AeroStat – record level
Multi-Sensor Data Synergy Advisor (MDSA)
• Goal: provide science users with clear, cogent information on salient differences between data candidates for fusion, merging and intercomparison
  – Enable scientifically and statistically valid conclusions
• Develop MDSA on current missions:
  – NASA – Terra, Aqua, (maybe Aura)
• Define implications for future missions
How does MDSA work?
MDSA is a service designed to characterize the differences between two datasets and advise a user (human or machine) on the advisability of combining them. It:
• is provided through the Giovanni online analysis tool
• describes parameters and products
• documents the steps leading to the final data product
• enables better interpretation and utilization of parameter difference and correlation visualizations
• provides clear and cogent information on salient differences between data candidates for intercomparison and fusion
• provides information on data quality
• provides advice on available options for further data processing and analysis
Giovanni: Earth Science Data Visualization & Analysis Tool
• Developed and hosted by NASA/Goddard Space Flight Center (GSFC)
• Online multi-sensor and model data analysis and visualization tool
• Supports dozens of visualization types
• Generates dataset comparisons
• ~1500 parameters
• Used by modelers, researchers, policy makers, students, teachers, etc.
Giovanni Allows Scientists to Concentrate on the Science
Web-based tools like Giovanni allow scientists to compress the time needed for pre-science preliminary tasks: data discovery, access, manipulation, visualization, and basic statistical analysis.
[Figure: timeline contrasting two workflows.
The Old Way – months (Jan–Oct) of pre-science exploration: find data; retrieve high-volume data; learn formats and develop readers; extract parameters; perform spatial and other subsetting; identify quality and other flags and constraints; perform filtering/masking; develop analysis and visualization; accept/discard/get more data (satellite, model, ground-based); then the initial analysis: use the best data for the final analysis, derive conclusions, write and submit the paper.
The Giovanni Way – minutes of web-based services (Giovanni, Mirador): read data, extract parameter, filter quality, subset spatially, reproject, reformat, explore, analyze, visualize – then DO SCIENCE.
Scientists have more time to do science!]
Data Usage Workflow
[Diagram: processing steps – Subset/Constrain, Filtering, Re-projection, Reformatting, Integration – driven by planning inputs: Intended Use, Quality Assessment Requirements, Precision Requirements, Integration Planning]
Challenge
• Giovanni streamlines data processing, performing required actions on behalf of the user
  – but automation amplifies the potential for users to generate and use results they do not fully understand
• The assessment stage is integral for the user to understand the fitness-for-use of the result
  – but Giovanni does not assist in assessment
• We are challenged to instrument the system to help users understand results
Advising users on product quality
We need to be able to advise MDSA users on:
• Which product is better? Or, better formulated: which product has better quality over certain areas?
• How to harmonize quality across products
• How sampling bias affects product quality:
  – Spatial: sampling polar areas more than equatorial ones
  – Temporal: sampling only one time of day
  – Vertical: not sensitive to a certain part of the atmosphere, thus emphasizing other parts
  – Pixel quality: filtering by quality may mask out areas with specific features
  – Clear sky: e.g., measuring humidity only where there are no clouds may lead to a dry bias
  – Surface-type related
Research approach
• Systematizing quality aspects
  – Working through the literature
  – Identifying aspects of quality and their dependence on measurement and environmental conditions
  – Developing data quality ontologies
  – Understanding and collecting internal and external provenance
• Developing rulesets that allow pieces of knowledge to be inferred, extracted and assembled
• Presenting the data quality knowledge with good visuals, statements and references
Data Quality Ontology Development (Quality flag)
Working together with Chris Lynnes’s DQSS project, we started from the pixel-level quality view.
Data Quality Ontology Development (Bias)
http://cmapspublic3.ihmc.us:80/servlet/SBReadResourceServlet?rid=1286316097170_183793435_22228&partName=htmltext
Modeling quality (Uncertainty)
Link to other cmap presentations of quality ontology:
http://cmapspublic3.ihmc.us:80/servlet/SBReadResourceServlet?rid=1299017667444_1897825847_19570&partName=htmltext
MDSA Aerosol Data Ontology Example
Ontology of Aerosol Data made with cmap ontology editor
RuleSet Development
[DiffNEQCT:
  (?s rdf:type gio:RequestedService),
  (?s gio:input ?a), (?a rdf:type gio:DataSelection),
  (?s gio:input ?b), (?b rdf:type gio:DataSelection),
  (?a gio:sourceDataset ?a.ds), (?b gio:sourceDataset ?b.ds),
  (?a.ds gio:fromDeployment ?a.dply), (?b.ds gio:fromDeployment ?b.dply),
  (?a.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?b.dply rdf:type gio:SunSynchronousOrbitalDeployment),
  (?a.dply gio:hasNominalEquatorialCrossingTime ?a.neqct),
  (?b.dply gio:hasNominalEquatorialCrossingTime ?b.neqct),
  notEqual(?a.neqct, ?b.neqct)
  ->
  (?s gio:issueAdvisory giodata:DifferentNEQCTAdvisory)]
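For readers more at home in SPARQL than in Jena rule syntax, the same check can be phrased as an ASK query; below is a rough sketch using Python's rdflib, with a hypothetical gio: namespace URI and input file. The deployed advisor runs the Jena rule above, not this query.

    from rdflib import Graph

    ASK_DIFFERENT_NEQCT = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX gio: <http://example.org/gio#>
    ASK {
      ?s rdf:type gio:RequestedService ;
         gio:input ?a , ?b .
      ?a gio:sourceDataset/gio:fromDeployment ?adply .
      ?b gio:sourceDataset/gio:fromDeployment ?bdply .
      ?adply rdf:type gio:SunSynchronousOrbitalDeployment ;
             gio:hasNominalEquatorialCrossingTime ?aneqct .
      ?bdply rdf:type gio:SunSynchronousOrbitalDeployment ;
             gio:hasNominalEquatorialCrossingTime ?bneqct .
      FILTER (?aneqct != ?bneqct)
    }
    """

    g = Graph()
    g.parse("service_plan.ttl", format="turtle")  # hypothetical advisor KB dump
    if g.query(ASK_DIFFERENT_NEQCT).askAnswer:
        print("Advisory: input datasets have different nominal "
              "equatorial crossing times (DifferentNEQCTAdvisory)")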
Thus: the Multi-Sensor Data Synergy Advisor
• Assemble a semantic knowledge base:
  – Giovanni service selections
  – Data source provenance (external provenance – low detail)
  – Giovanni planned operations (what the service intends to do)
• Analyze the service plan:
  – Are we integrating/comparing/synthesizing?
    • Are similar dimensions in the data sources semantically comparable? (semantic diff)
    • How comparable? (semantic distance)
  – What data usage caveats exist for the data sources?
• Advise regarding general fitness-for-use and data-usage caveats
Provenance Distance Computation
Provenance: the origin or source from which something comes, intention for use, who/what it was generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery – documented in detail sufficient to allow reproducibility.
Based on provenance “distance”, we tell users how different data products are.
Issues:
• Computing the similarity of two provenance traces is non-trivial (see the toy sketch below).
• Factors in provenance carry different weights in how comparable the results of processing are.
• Factors in provenance are interdependent in how they affect the final results of processing.
• The similarity of external (pre-Giovanni) provenance needs to be characterized.
• The number of dimensions/factors that affect comparability quickly becomes overwhelming.
• Not all of these dimensions are independent – most of them are correlated with each other.
• Numerical studies comparing datasets can be used, when available and where applicable, in Giovanni analysis.
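As a toy illustration only (not the project's algorithm): a weighted, factor-by-factor comparison turns two provenance traces into a crude distance. All factor names, weights, and trace values below are hypothetical, and the interdependence of factors noted above is deliberately ignored.

    # Toy weighted provenance "distance": 0.0 = identical on the weighted
    # factors, 1.0 = fully different. Weights are hand-assigned guesses at
    # how strongly each factor affects comparability (all hypothetical).
    WEIGHTS = {
        "instrument": 0.30, "retrieval_algorithm": 0.30,
        "overpass_time": 0.20, "spatial_resolution": 0.10, "grid": 0.10,
    }

    def provenance_distance(trace_a, trace_b):
        mismatch = sum(w for f, w in WEIGHTS.items()
                       if trace_a.get(f) != trace_b.get(f))
        return mismatch / sum(WEIGHTS.values())

    modis = {"instrument": "MODIS", "retrieval_algorithm": "dark target",
             "overpass_time": "13:30", "spatial_resolution": "1 deg",
             "grid": "equal-angle"}
    misr = {"instrument": "MISR", "retrieval_algorithm": "MISR standard",
            "overpass_time": "10:30", "spatial_resolution": "1 deg",
            "grid": "equal-angle"}
    print(provenance_distance(modis, misr))  # 0.8 -> advise strong caution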
Assisting in Assessment
[Diagram: the data usage workflow from before – Intended Use, Quality Assessment Requirements, Precision Requirements, Integration Planning feeding Subset/Constrain, Filtering, Re-projection, Reformatting, Integration – now augmented with the MDSA Advisory Report, Provenance & Lineage Visualization, and a Multi-Domain Knowledgebase]
Advisor Knowledge Base
• Advisor rules test for potential anomalies and create associations between service metadata and anomaly metadata in the Advisor KB
Semantic Advisor Architecture
[Architecture diagram (RPI)… complexity]
Presenting data quality to users
We split quality (viewed here broadly) into two categories:
• Global or product-level quality information, e.g., consistency, completeness, etc., that can be presented in tabular form.
• Regional/seasonal. This is where we've tried various approaches:
  – maps with outlined regions, one map per sensor/parameter/season
  – scatter plots with error estimates, one per combination of Aeronet station, parameter, and season, with different colors representing different wavelengths, etc.
Advisor Presentation Requirements
• Present metadata that can affect the fitness for use of the result
• In comparing or integrating data sources:
  – Make obvious which properties are comparable
  – Highlight differences (that affect comparability) where present
• Present descriptive text (and if possible visuals) for any data usage caveats highlighted by the expert ruleset
• The presentation must be understandable by Earth scientists!! Oh, you laugh…
Advisory Report
• Tabular representation of the semantic equivalence of comparable data source and processing properties
• Advises of and describes potential data anomalies/bias
Advisory Report (Dimension Comparison Detail)
Advisory Report (Expert Advisories Detail)
Quality Comparison Table for Level-3 AOD (Global example)

Quality aspect: Completeness
• Total time range:
  – MODIS: Terra 2/2/2000–present; Aqua 7/2/2002–present
  – MISR: Terra 2/2/2000–present
• Local revisit time:
  – MODIS: Terra 10:30 AM; Aqua 1:30 PM
  – MISR: Terra 10:30 AM
• Revisit time:
  – MODIS: global coverage of the entire earth in 1 day; coverage overlap near the poles
  – MISR: global coverage of the entire earth in 9 days; coverage in 2 days in the polar regions
• Swath width:
  – MODIS: 2330 km
  – MISR: 380 km
• Spectral AOD:
  – MODIS: AOD over ocean for 7 wavelengths (466, 553, 660, 860, 1240, 1640, 2120 nm); AOD over land for 4 wavelengths (466, 553, 660, 2120 nm)
  – MISR: AOD over land and ocean for 4 wavelengths (446, 558, 672, and 866 nm)
• AOD uncertainty or expected error (EE):
  – MODIS: ±0.03 ± 5% (over ocean; QAC ≥ 1); ±0.05 ± 20% (over land; QAC = 3)
  – MISR: 63% fall within 0.05 or 20% of Aeronet AOD; 40% are within 0.03 or 10%
• Successful retrievals:
  – MODIS: 15% of the time
  – MISR: 15% of the time (slightly more, because of retrieval over the glint region as well)
Data Quality Screening Service (DQSS) for Remote Sensing Data
The DQSS filters out bad pixels for the user.
• Default user scenario:
  – Search for data
  – Select the science team’s recommendation for quality screening (filtering)
  – Download screened data
• More advanced scenario:
  – Search for data
  – Select custom quality screening parameters
  – Download screened data
The quality of data can vary considerably
Version 5 Level 2 standard retrieval statistics:

AIRS parameter              Best (%)   Good (%)   Do Not Use (%)
Total precipitable water       38         38            24
Carbon monoxide                64          7            29
Surface temperature             5         44            51
Percent of biased data in MODIS aerosols over land increases as the confidence flag decreases
* Compliant data are within ±(0.05 + 0.2·AOD_Aeronet) of Aeronet – see the sketch below.
Statistics from Hyer, E., J. Reid, and J. Zhang, 2010: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech. Discuss., 3, 4091–4167.
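A small sketch (Python/numpy; the matchup values are invented) of how such a compliance fraction can be computed against the expected-error envelope above.

    import numpy as np

    def compliant_fraction(aod_sat, aod_aeronet, add=0.05, rel=0.2):
        """Fraction of matchups within +/-(add + rel * AOD_Aeronet)."""
        envelope = add + rel * aod_aeronet        # expected error per matchup
        return np.mean(np.abs(aod_sat - aod_aeronet) <= envelope)

    aeronet = np.array([0.10, 0.30, 0.80, 1.50])  # hypothetical matchups
    modis = np.array([0.12, 0.28, 1.10, 1.60])
    print(compliant_fraction(modis, aeronet))     # 0.75 for these values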
The effect of bad-quality data is often not negligible
[Maps: total column precipitable water (kg/m²) at quality levels Best, Good, and Do Not Use; Hurricane Ike, 9/10/2008]
DQSS replaces bad-quality pixels with fill values (sketched below):
• Mask based on user criteria (quality level < 2)
• Good-quality data pixels retained
• The output file has the same format and structure as the input file (except for extra mask and original_data fields)
• The original data array (total column precipitable water) is preserved
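A minimal sketch of this screening step (Python/numpy; the field names, fill value, and quality encoding are illustrative, not the actual DQSS implementation).

    import numpy as np

    FILL_VALUE = -9999.0

    def screen(data, quality, max_quality_level=1):
        """Keep pixels with quality <= max_quality_level; fill the rest."""
        mask = quality <= max_quality_level       # user criterion: level < 2
        screened = np.where(mask, data, FILL_VALUE)
        # Same shape/structure as the input, plus extra mask and
        # original_data fields, mirroring the DQSS output description.
        return {"data": screened, "mask": mask, "original_data": data}

    water = np.array([[55.2, 60.1], [48.7, 52.3]])  # hypothetical TPW, kg/m^2
    qual = np.array([[0, 2], [1, 3]])               # 0 = Best ... 3 = Do Not Use
    print(screen(water, qual)["data"])              # bad pixels -> -9999.0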
AeroStat?
• Different papers provide different views on whether MODIS and MISR measure aerosols well.
• Peer-reviewed papers usually run well behind the latest version of the data.
• It is difficult to verify the results of a published paper and resolve controversies between different groups, because it is difficult to reproduce the results – they might have dealt with different data or used different quality controls or flags.
• It is important to have an online shareable environment where data processing and analysis can be done in a transparent way by any user of the environment and can be shared among all the members of the aerosol community.
Monthly AOD standard deviation
Areas with high AOD standard deviation might point to areas of high uncertainty. The next slide shows these areas on a map of mean AOD.
AeroStat: Online Platform for the Statistical Intercomparison of Aerosols
[Workflow diagram: explore & visualize Level 3 → compare Level 3 → (Level 3 too aggregated? switch to high-resolution Level 2) → explore & visualize Level 2 → correct Level 2 → compare Level 2 before and after → merge Level 2 into a new Level 3] (EGU 2011)
Types of Bias Correction

Relative (cross-sensor), linear, climatological
• Spatial basis: region; temporal basis: season
• Pros: not influenced by data in other regions; good sampling
• Cons: difficult to validate

Relative (cross-sensor), non-linear, climatological
• Spatial basis: global; temporal basis: full data record
• Pros: complete sampling
• Cons: difficult to validate

Anchored, parameterized, linear (see the sketch after this table)
• Spatial basis: near Aeronet stations; temporal basis: full data record
• Pros: can be validated
• Cons: limited areal sampling

Anchored, parameterized, non-linear
• Spatial basis: near Aeronet stations; temporal basis: full data record
• Pros: can be validated
• Cons: limited insight into the correction
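To illustrate the “anchored parameterized linear” row (a sketch on synthetic numbers, not any group's operational correction): fit the satellite AOD against Aeronet at station matchups, then invert the fit to correct the satellite record.

    import numpy as np

    def fit_anchored_linear(aod_sat, aod_aeronet):
        """Least-squares fit: aod_sat ~ slope * aod_aeronet + offset."""
        slope, offset = np.polyfit(aod_aeronet, aod_sat, deg=1)
        return slope, offset

    def apply_correction(aod_sat, slope, offset):
        """Invert the fit to estimate an Aeronet-anchored AOD."""
        return (aod_sat - offset) / slope

    aeronet = np.array([0.05, 0.2, 0.5, 1.0, 1.8])   # synthetic matchups
    sat = 1.3 * aeronet + 0.02                        # synthetic biased sensor
    s, o = fit_anchored_linear(sat, aeronet)
    print(np.round(apply_correction(sat, s, o), 3))   # recovers aeronet values
    # Pro (per the table): validated against Aeronet; con: the stations
    # sample a limited area, so the fit may not transfer everywhere.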
Data Quality Issues
• Validation of aerosol data shows that not all data pixels labeled as “bad” are actually bad when looked at from a bias perspective.
• But many pixels are biased differently, for various reasons.
(From Levy et al., 2009)
Quality & Bias assessment using FreeMind
from the Aerosol Parameter Ontology
FreeMind allows capturing various relations between various aspects of aerosol measurements, algorithms, conditions, validation, etc. The “traditional” worksheets do not support complex multi-dimensional nature of the task
Reference: Hyer, E. J., Reid, J. S., and Zhang, J., 2011: An over-land aerosol optical depth data set for data assimilation by filtering, correction, and aggregation of MODIS Collection 5 optical depth retrievals, Atmos. Meas. Tech., 4, 379–408, doi:10.5194/amt-4-379-2011.
(General) Statement: Collection 5 MODIS AOD at 550 nm during Aug–Oct over Central South America highly over-estimates at large AOD and, in the non-burning season, under-estimates at small AOD, as compared to Aeronet; good comparisons are found at moderate AOD.
Region & season characteristics: the central region of Brazil is a mix of forest, cerrado, and pasture, and is known to have low AOD most of the year except during the biomass burning season.
(Example): Scatter plot of MODIS AOD at 550 nm vs. Aeronet, from Hyer et al. (2011). (Description/Caption): shows severe over-estimation of MODIS Collection 5 AOD (dark-target algorithm) at large AOD at 550 nm during Aug–Oct 2005–2008 over Brazil. (Constraints): only the best quality of MODIS data (Quality = 3) used; data with scattering angle > 170° excluded. (Symbols): red lines define the region of Expected Error (EE); green is the fitted slope. Results: tolerance = 62% within EE; RMSE = 0.212; r² = 0.81; slope = 1.00. For low AOD (< 0.2), slope = 0.3; for high AOD (> 1.4), slope = 1.54.
(Dominating factors leading to aerosol-estimate bias):
1. A large positive bias in the AOD estimate during the biomass burning season may be due to wrong assignment of aerosol absorbing characteristics. (Specific explanation): a constant single-scattering albedo of ~0.91 is assigned for all seasons, while the true value is closer to ~0.92–0.93. [Notes or exceptions: biomass burning regions in Southern Africa do not show as large a positive bias as in this case; it may be due to different optical characteristics or single-scattering albedo of smoke particles; Aeronet observations of SSA confirm this.]
2. Low AOD is common in the non-burning season. In low-AOD cases, biases are highly dependent on lower boundary conditions. In general a negative bias is found, due to uncertainty in the surface reflectance characterization, which dominates when the signal from atmospheric aerosol is low.
[Scatter plot: MODIS AOD vs. Aeronet AOD (0–2), Central South America; stations: Mato Grosso, Santa Cruz, Alta Floresta]
AeroStat Ontology
Completeness: Observing Conditions for MODIS AOD at 550 nm Over Ocean

US Atlantic Ocean – dominated by fine-mode aerosols (smoke & sulfate)
• 72% of retrievals within expected error; average Aeronet AOD 0.15; over-estimated (by 7%)*
Indian Ocean – dominated by fine-mode aerosols (smoke & sulfate)
• 64% within expected error; average Aeronet AOD 0.16; over-estimated (by 7%)*
Asian Pacific oceans – dominated by fine aerosol, not dust
• 56% within expected error; average Aeronet AOD 0.21; over-estimated (by 13%)
“Saharan” ocean – outflow regions in the Atlantic dominated by dust in spring
• 56% within expected error; average Aeronet AOD 0.31; random bias (1%)*
Mediterranean – dominated by fine aerosol
• 57% within expected error; average Aeronet AOD 0.23; under-estimated (by 6%)*

* Remer, L. A., et al., 2005: The MODIS Aerosol Algorithm, Products and Validation. Journal of the Atmospheric Sciences, Special Section, 62, 947–973.
Completeness: Observing Conditions for MODIS AOD at 550 nm Over Land

Yanting, China – agriculture site (central China)
• 45% of retrievals within expected error; slope = 1.04, offset = −0.063, r² = 0.83 w.r.t. Chinese ground-based sun photometer; slightly over-estimated
Fukung, China – semi-desert site (north-west China)
• 7% within expected error; slope = 1.65, offset = 0.074, r² = 0.58; over-estimated (more than 100% at large AOD values)
Beijing – urban site, industrial pollution
• 35% within expected error; slope = 0.38, offset = 0.086, r² = 0.46; severely under-estimated (more than 100% at large AOD values)

* Li, Z., et al., 2007: Validation and understanding of Moderate Resolution Imaging Spectroradiometer aerosol products (C5) using ground-based measurements from the handheld Sun photometer network in China, JGR, 112, D22S07, doi:10.1029/2007JD008479.
Summary
• Quality is very hard to characterize; different groups will focus on different and inconsistent measures of quality.
  – Modern ontology representations to the rescue!
• Products with known quality (whether good or bad) are more valuable than products with unknown quality.
  – Known quality helps you correctly assess fitness-for-use.
• Harmonization of data quality is even more difficult than characterizing the quality of a single data product.
Summary
• The Advisory Report is not a replacement for proper analysis planning
  – But it benefits all user types by summarizing general fitness-for-use, integrability, and data-usage caveat information
  – Science user feedback has been very positive
• Provenance trace dumps are difficult to read, especially for non-software engineers
  – Science user feedback: “Too much information in the provenance lineage; I need a simplified abstraction/view”
• Transparency → translucency
  – Make the important stuff stand out
• Uncertainty and bias are evident at many levels!!
Future Work
• Advisor suggestions to correct for potential anomalies
• Views/abstractions of provenance based on specific user-group requirements
• Continued iteration on visualization tools based on user requirements
• Present a comparability index / research techniques to quantify comparability
Your task when reviewing this…
• Pull out all the informatics principles used…

Reading for this week
• None

What is next
• Week 11 – TBD
• Week 12 – guest lectures and the written part of the final project due
• Week 13 – project presentations