Posted on 19-Dec-2015
[email protected] http://Eclime.blogspot.com
RF Corsello Research Foundation
Making Climate Change Data Easier to Find and Use
Michael Corsello, Seshu Vaddey
Climate Change is a Paradigm Shift
In how we think of climate dynamics: Non-Stationarity
In how we work as planners, engineers, biologists, hydrologists, etc.
Goal: Maximize the value of climate data in your organization
• Aid in vulnerability analyses
• Support planning processes
• Support decision-making frameworks
Otherwise
We are using old analytical techniques, designed for an old paradigm, and applying them to a new paradigm of problems
Example
You get new Climate Change data
What’s the first thing you do?
Try to put it into Excel
Take a closer look at Climate Change data
UW CIG CBCCSP: 2 emission scenarios, 10 GCMs, 3 downscaling methods
From an available total of: 6 emission scenarios, 23 GCMs, multiple approaches
Take a closer look at Climate Change data
Total size of data produced: ~32 TB
• Individual hydrologic projection (297 sites): ~1.3 GB (0.004 % of total)
• Hydrology (297 sites, all projections): ~18.5 GB (0.06 % of total)
• Temp & precip data (2 of 21 parameters):
  • Monthly grids (all HD projections): ~65 GB (0.20 % of total)
  • Daily grids (all HD projections): ~2.4 TB (7.5 % of total)
Parameters (21):
• Daily total precipitation
• Daily average temperature
• Daily maximum temperature
• Daily minimum temperature
• Outgoing longwave radiation
• Incoming shortwave radiation
• Relative humidity
• Vapor pressure deficit
• Daily evapotranspiration
• Daily runoff
• Daily baseflow
• Soil moisture, layer 1
• Soil moisture, layer 2
• Soil moisture, layer 3
• Snow water equivalent
• Snow depth
• Potential evapotranspiration 1
• Potential evapotranspiration 2
• Potential evapotranspiration 3
• Potential evapotranspiration 4 (alfalfa)
• Potential evapotranspiration 5
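The “% of total” figures are each subset's size divided by the ~32 TB total. A quick check, assuming decimal units (1 TB = 1,000 GB) and the rounded sizes from the slide:

```python
# Sizes in GB as listed on the slide (rounded).
# Assumption: decimal units, 1 TB = 1000 GB.
TOTAL_GB = 32 * 1000

subsets_gb = {
    "Individual hydrologic projection (297 sites)": 1.3,
    "Hydrology (297 sites, all projections)": 18.5,
    "Temp & precip, monthly grids": 65,
    "Temp & precip, daily grids": 2400,  # ~2.4 TB
}

def percent_of_total(size_gb: float, total_gb: float = TOTAL_GB) -> float:
    """Return size_gb as a percentage of total_gb."""
    return 100.0 * size_gb / total_gb

for name, size in subsets_gb.items():
    print(f"{name}: {percent_of_total(size):.3f} %")
```

After rounding, the results (0.004 %, 0.06 %, 0.20 %, 7.5 %) agree with the slide. Even the daily grids for just two parameters are a small fraction of the full data set, which is why “put it into Excel” does not scale.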
Working with Climate Change data
The challenge:
• Volume of data swamps cyber infrastructure
• Steep learning curves to use new tools
• Tools are always changing
Enter the Web and Cloud computing
Software as a Service
Platform as a Service
Infrastructure as a Service
[Diagram: Software / Platform / Infrastructure layers; devices: smartphones, cameras, tablets, laptops, desktops]
Enterprise Data Management
Move away from data living on our computers
[Diagram: Collect Data → Load Data → Read Data → Analyze Data → Post Results]
Enterprise Data Management
The data and tools / applications now reside on servers (Cloud)
The data is now more crucial than ever
We all “share” common sets of data “through” the cloud
[Diagram: Collect Data → Load Data → shared Data Repository (cloud) → Read Data → Analyze Data → Post Results]
MANAGEMENT OF DATA IS PARAMOUNT
Summary
The need for a paradigm shift in how we work
This new paradigm must provide ease of use and value to the organization (return on investment)
CRF is working towards this goal; we need users across different domains to work with us
Questions?
Blog: http://Eclime.blogspot.com
Breakout Discussion Session Wednesday at 10am
CRF Developed Solution
Develop a series of database structures based upon “real-world things” (like flows)
Tables (PK = primary key, FK = foreign key):
FlowSets: Id (PK), TemporalIntervalId (FK3), SiteId (FK4), FlowTypeId (FK1), RunSetId (FK2), Name, Description, StartDate
Alterations: Id (PK), Name, Description
AnnualFlows: Id (PK, FK1), Ordinal (PK), Flow
TemporalInterval: Id (PK), Name, Description, TimeWindow
Models: Id (PK), Name, Description
FlowTypes: Id (PK), FlowClassId (FK1), Name, Description
Approaches: Id (PK), MSAId (FK1), ModelId (FK2), Name, Description
MonthlyFlows: Id (PK, FK1), Ordinal (PK), Flow
HourlyFlows: Id (PK, FK1), Ordinal (PK), Flow
CenturyFlows: Id (PK, FK1), Ordinal (PK), Flow
IrregularFlows: Id (PK, FK1), StartDate (PK), EndDate, Flow
DailyFlows: Id (PK, FK1), Ordinal (PK), Flow
RunSets: Id (PK), RunId (FK2), YearOfRecordId (FK4), ApproachId (FK1), RunTypeId (FK3), Name, Description
Methods: Id (PK), Name, Description
MethodSourceAlterations: Id (PK), ModelSetId, MethodId (FK2), AlterationId (FK1), Name, Description
FlowClasses: Id (PK), Name, Description, FlowType
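One way to read the FlowSets / DailyFlows pair: FlowSets holds one row of metadata per series, and each DailyFlows row holds one value keyed by the series Id plus an Ordinal. The sketch below assumes Ordinal is a zero-based day offset from FlowSets.StartDate (the slide does not say so), uses SQLite only for illustration, and omits most other columns and foreign-key targets:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE FlowSets (
    Id          INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    Description TEXT,
    StartDate   TEXT NOT NULL          -- ISO-8601 date of the first value
);
-- One row per value; (Id, Ordinal) is the composite primary key.
CREATE TABLE DailyFlows (
    Id      INTEGER NOT NULL REFERENCES FlowSets(Id),
    Ordinal INTEGER NOT NULL,          -- assumption: 0-based day offset from StartDate
    Flow    REAL,
    PRIMARY KEY (Id, Ordinal)
);
""")

def load_daily_series(name: str, start: date, flows: list[float]) -> int:
    """Insert one FlowSets header plus its daily values; return the new FlowSets.Id."""
    cur = conn.execute(
        "INSERT INTO FlowSets (Name, StartDate) VALUES (?, ?)", (name, start.isoformat())
    )
    set_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO DailyFlows (Id, Ordinal, Flow) VALUES (?, ?, ?)",
        [(set_id, i, q) for i, q in enumerate(flows)],
    )
    return set_id

def read_daily_series(set_id: int) -> list[tuple[date, float]]:
    """Rebuild (date, flow) pairs from the header's StartDate and each Ordinal."""
    (start_iso,) = conn.execute(
        "SELECT StartDate FROM FlowSets WHERE Id = ?", (set_id,)
    ).fetchone()
    start = date.fromisoformat(start_iso)
    rows = conn.execute(
        "SELECT Ordinal, Flow FROM DailyFlows WHERE Id = ? ORDER BY Ordinal", (set_id,)
    )
    return [(start + timedelta(days=ordinal), flow) for ordinal, flow in rows]

# Hypothetical example series (placeholder values, not real data)
sid = load_daily_series("example site", date(2040, 1, 1), [10.0, 12.5, 11.0])
print(read_daily_series(sid))
```

Under the same reading, MonthlyFlows, HourlyFlows, and CenturyFlows follow the identical (Id, Ordinal) pattern with TemporalInterval describing the step, while IrregularFlows keys each value by StartDate and carries an EndDate instead of an ordinal.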
CRF Developed Solution
Tables (PK = primary key, FK = foreign key):
Projects: Id (PK), Label, OfficialLabel, ProjectTypeId (FK1), PrimaryPOCId
ProjectTypes: Id (PK), Label, Description
ProjectHierarchy: ParentId (PK, FK1), ChildId (PK, FK2)
ProjectTeams: Id (PK), ProjectId (FK1), OrgId, Label
ProjectMembers: Id (PK), PersonId, ProjectId (FK1), Label
TeamMembers: TeamId (PK, FK1), MemberId (PK, FK2), Label
Studies: Id (PK), ProjectId (FK1), Label, Description
Sites: Id (PK), Label, Description
SiteNotes: Id (PK), SiteId (FK1), Label, Description, EntryDate
SamplingActivities: Id (PK), Label
SamplingEvents: Id (PK), SamplingActivityId (FK1), Label
ActivityTypes: Id (PK), Label
SampleTypes: Id (PK), Label
SamplingActivityTypes: Id (PK), SamplingActivityId (FK1), SampleTypeId (FK3), Label
SamplingEventActivityTypes: SamplingEventId (PK, FK1), SamplingActivityTypeId (FK2), Label
SamplingActivitySites: Id (PK, FK2), SamplingActivityId (FK1), SiteId (FK2)
SamplingEventSites: SamplingEventId (PK, FK2), SamplingActivitySiteId (PK, FK1), Label
StudySiteUsage: StudyId (PK, FK2), SiteId (PK, FK1)
[Diagram groupings: Projects, Sites, Studies, Water Quality]
Organize these structures into separate databases for each “domain aspect”, rather than a single monolithic database.
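In SQLite, for example, each domain aspect could live in its own database file and be attached to one connection when a query needs to span domains. A minimal sketch, using in-memory databases in place of hypothetical files such as sites.db and studies.db, and placing StudySiteUsage in the studies database (an assumption; the slide does not say which domain owns it). SQLite does not enforce foreign keys across attached databases, so the link to Sites is checked only by the join:

```python
import sqlite3

# One connection; each domain aspect is a separate attached database.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sites")    # hypothetical: 'sites.db' on disk
conn.execute("ATTACH DATABASE ':memory:' AS studies")  # hypothetical: 'studies.db' on disk

conn.executescript("""
CREATE TABLE sites.Sites (Id INTEGER PRIMARY KEY, Label TEXT, Description TEXT);
CREATE TABLE studies.Studies (Id INTEGER PRIMARY KEY, ProjectId INTEGER, Label TEXT, Description TEXT);
-- SiteId points into the other database, so it cannot be a declared foreign key.
CREATE TABLE studies.StudySiteUsage (
    StudyId INTEGER REFERENCES Studies(Id),
    SiteId  INTEGER,
    PRIMARY KEY (StudyId, SiteId)
);
INSERT INTO sites.Sites (Id, Label) VALUES (1, 'Site A'), (2, 'Site B');
INSERT INTO studies.Studies (Id, Label) VALUES (10, 'Study X');
INSERT INTO studies.StudySiteUsage VALUES (10, 2);
""")

def sites_for_study(study_id: int) -> list[str]:
    """Cross-database join: which sites does a study use?"""
    rows = conn.execute(
        """SELECT s.Label
           FROM studies.StudySiteUsage AS u
           JOIN sites.Sites AS s ON s.Id = u.SiteId
           WHERE u.StudyId = ?
           ORDER BY s.Label""",
        (study_id,),
    )
    return [label for (label,) in rows]

print(sites_for_study(10))  # ['Site B']
```

The trade-off is that each domain database can be versioned, secured, and hosted independently, at the cost of enforcing cross-domain links in application code.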
CRF Developed Solution
Tables (PK = primary key, FK = foreign key):
Address: Id LONGBINARY (PK), AddressData LONGTEXT, CountryId LONGBINARY (FK1)
Countries: Id LONGBINARY (PK), CountryName TEXT(100), Iso2Alpha TEXT(50), Iso3Alpha TEXT(50), IanaDomain TEXT(50), UnVehicle TEXT(50), IocOlympic TEXT(50), UnIsoNumeric TEXT(50), ItuCalling TEXT(50)
StatesAndProvinces: Id LONGBINARY (PK), StateProvinceName TEXT(100), CountryId LONGBINARY (FK1)
PhoneNumber: Id LONGBINARY (PK), CountryId LONGBINARY (FK1), PhoneNumber TEXT(30), PhoneType LONGBINARY (FK2)
PhoneTypeCodes: Id LONGBINARY (PK), PhoneType TEXT(50)
ContactInfoAddress: Id LONGBINARY (PK), ContactInfoId LONGBINARY (FK1), AddressId LONGBINARY (FK2)
ContactInfo: Id LONGBINARY (PK), Contactname TEXT(100)
ContactInfoPhone: Id LONGBINARY (PK), ContactInfoId LONGBINARY (FK2), PhoneId LONGBINARY (FK1)
EmailAddress: Id LONGBINARY (PK), DomainName TEXT(100), TopLevelDomain TEXT(50), EmailName TEXT(100), EmailType LONGBINARY (FK1)
EmailTypeCodes: Id LONGBINARY (PK), EmailType TEXT(50)
UrlAddress: Id LONGBINARY (PK), DomainName TEXT(100), TopLevelDomain TEXT(50), UrlPath TEXT(600), Port LONG, Protocol TEXT(50), UrlType LONGBINARY (FK1)
UrlTypeCodes: Id LONGBINARY (PK), UrlType TEXT(50)
ContactInfoEmail: Id LONGBINARY (PK), ContactInfoId LONGBINARY (FK2), EmailId LONGBINARY (FK1)
ContactInfoUrl: Id LONGBINARY (PK), ContactInfoId LONGBINARY (FK1), UrlId LONGBINARY (FK2)
Person: Id LONGBINARY (PK), FirstName TEXT(50), LastName TEXT(75), LaborType LONGBINARY (FK1)
LaborType: Id LONGBINARY (PK), LaborType TEXT(75)
Person_Contact: Id LONGBINARY (PK), PersonId LONGBINARY (FK2), ContactInfoId LONGBINARY (FK1)
User: Id LONGBINARY (PK), Name VARCHAR(100), Password TEXT(100)
Roles: Id LONGBINARY (PK), Name TEXT(75), Description CHAR(10)
UserXPerson: Id LONGBINARY (PK, FK1), UserId LONGBINARY (FK2)
Permissions: Id LONGBINARY (PK), Name TEXT(100), Description TEXT(1000), Module LONGBINARY
UserXRoles: Id LONGBINARY (PK), RoleId LONGBINARY (FK2), UserId LONGBINARY (FK1)
RolesXPermissions: Id LONGBINARY (PK), UserRoleId LONGBINARY (FK1), PermissionId LONGBINARY (FK2)
Cloud-Based Data Warehouse
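The User, Roles, Permissions, UserXRoles, and RolesXPermissions tables form a role-based access control pattern: a user holds roles, and roles grant permissions. A minimal permission check in SQLite, assuming RolesXPermissions.UserRoleId refers to Roles.Id (the column name could also mean UserXRoles.Id) and using integer ids instead of LONGBINARY for readability; the role and permission names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User"      (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Roles       (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Permissions (Id INTEGER PRIMARY KEY, Name TEXT, Module INTEGER);
CREATE TABLE UserXRoles  (Id INTEGER PRIMARY KEY,
                          UserId INTEGER REFERENCES "User"(Id),
                          RoleId INTEGER REFERENCES Roles(Id));
-- Assumption: UserRoleId points at Roles.Id.
CREATE TABLE RolesXPermissions (Id INTEGER PRIMARY KEY,
                                UserRoleId   INTEGER REFERENCES Roles(Id),
                                PermissionId INTEGER REFERENCES Permissions(Id));

-- Hypothetical sample data: analyst1 is a Reader; Editors can also write.
INSERT INTO "User" VALUES (1, 'analyst1');
INSERT INTO Roles VALUES (1, 'Reader'), (2, 'Editor');
INSERT INTO Permissions VALUES (1, 'ReadFlows', NULL), (2, 'WriteFlows', NULL);
INSERT INTO UserXRoles VALUES (1, 1, 1);
INSERT INTO RolesXPermissions VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

def has_permission(user_id: int, permission_name: str) -> bool:
    """True if any of the user's roles grants the named permission."""
    row = conn.execute(
        """SELECT 1
           FROM UserXRoles AS ur
           JOIN RolesXPermissions AS rp ON rp.UserRoleId = ur.RoleId
           JOIN Permissions AS p ON p.Id = rp.PermissionId
           WHERE ur.UserId = ? AND p.Name = ?
           LIMIT 1""",
        (user_id, permission_name),
    ).fetchone()
    return row is not None

print(has_permission(1, "ReadFlows"))   # True
print(has_permission(1, "WriteFlows"))  # False
```

Granting permissions to roles rather than to individual users means adding a new analyst is a single UserXRoles row.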
Metadata Examples
• An important form of metadata is “chain of custody” (provenance), which describes the process by which data originates:
  • What processing methods were used?
  • What was the source data?
  • Who did the work?
• Another important form of metadata is descriptive:
  • When was the sensor last calibrated?
  • What was the nominal error as defined by the manufacturer?
  • What is the temporal nature of the data (does it “expire”)?
  • What about licensing information?
• Metadata can often be “linked” rather than “stored”
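The two kinds of metadata can be kept as small records attached to a data set, with bulky or shared items (such as a license) stored as links. A minimal sketch; every class and field name here is hypothetical, and the URLs are placeholders:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Provenance:
    """Chain of custody: where the data came from and how it was produced."""
    source_data: list[str]         # identifiers or URLs of the input data sets
    processing_methods: list[str]  # e.g. downscaling or bias-correction steps
    performed_by: str              # person or organization that did the work

@dataclass
class DescriptiveMetadata:
    """Descriptive facts about the data's quality and terms of use."""
    sensor_last_calibrated: Optional[date] = None
    nominal_error: Optional[float] = None  # as stated by the manufacturer
    expires_on: Optional[date] = None      # None means the data does not "expire"
    license_url: Optional[str] = None      # linked rather than stored

    def is_expired(self, today: date) -> bool:
        return self.expires_on is not None and today > self.expires_on

@dataclass
class DataSetRecord:
    name: str
    provenance: Provenance
    descriptive: DescriptiveMetadata = field(default_factory=DescriptiveMetadata)

# Hypothetical record with placeholder values
rec = DataSetRecord(
    name="example projection",
    provenance=Provenance(["https://example.org/gcm-output"],
                          ["statistical downscaling"], "example lab"),
    descriptive=DescriptiveMetadata(expires_on=date(2020, 12, 31),
                                    license_url="https://example.org/license"),
)
print(rec.descriptive.is_expired(date(2021, 1, 1)))  # True
```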
[Diagram: Define Efforts → Define Data Standards → Initiate Efforts → Collect Data → Process and Load Data → Data Repository → Read and Analyze Data]
The Real Challenge with Climate Change?
We want the ONE true answer to climate change; the rest of the data is meaningless
Because the paradigm we work with is deterministic, we have a hard time dealing with uncertainty
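One alternative to a single deterministic answer is to summarize the whole ensemble, for example by reporting the median and spread across GCM and downscaling combinations instead of picking one run. A minimal sketch; the member names and values are hypothetical placeholders, not results from the CBCCSP data:

```python
from statistics import median, quantiles

def ensemble_summary(projections: dict[str, float]) -> dict[str, float]:
    """Summarize one projected quantity across ensemble members instead of picking one run."""
    values = sorted(projections.values())
    if len(values) < 2:
        raise ValueError("need at least two ensemble members")
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    return {
        "min": values[0],
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": values[-1],
    }

# Hypothetical change in annual streamflow (%) for five ensemble members
members = {"gcm_a": -12.0, "gcm_b": -5.0, "gcm_c": 2.0, "gcm_d": -8.0, "gcm_e": 4.0}
summary = ensemble_summary(members)
print(summary)
```

Reporting a range like this keeps the disagreement between models visible to planners, which is the information a single “true answer” throws away.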
Cloud Computing Basics
• Move computing from device-oriented to resource-oriented: “Give me enough computing resources to get an answer; I don’t care where”
• Software as a Service: software is delivered as an online service (e.g. Salesforce.com, Mint.com, Office 365)
• Platform as a Service: a software platform (e.g. SharePoint, Drupal) is provided as a service, and your agency customizes the platform to your needs
• Infrastructure as a Service: you rent “virtual machines” (basically “virtual” computers), set them up as you see fit, and add or remove machines on demand
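The “add or remove machines on demand” idea reduces to simple arithmetic once you estimate how much data one virtual machine can process per hour. A back-of-the-envelope sketch; the function name, the per-VM throughput, and the deadline are hypothetical inputs, and it assumes the work splits perfectly across machines:

```python
import math

def machines_needed(total_gb: float, gb_per_machine_hour: float, deadline_hours: float) -> int:
    """Smallest number of identical VMs that can process total_gb before the deadline.

    Assumes the workload parallelizes perfectly across machines.
    """
    if gb_per_machine_hour <= 0 or deadline_hours <= 0:
        raise ValueError("throughput and deadline must be positive")
    capacity_per_machine = gb_per_machine_hour * deadline_hours
    return max(1, math.ceil(total_gb / capacity_per_machine))

# ~2.4 TB of daily grids, assuming a hypothetical 50 GB/hour per VM and an 8-hour window
print(machines_needed(2400, 50, 8))  # 6
```

Under IaaS, those six machines exist only for the duration of the job and are released afterwards, rather than being bought for the peak load.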
Data Models
[Diagram: a Web UI over five modules (Asset, Mission, User, Security, Config). Each module has a Data Access Layer, Security Layer, Logic Layer, and Configuration Layer, and each is backed by its own database (Asset, Mission, User, Security, Configuration).]
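Each module in the diagram stacks the same layers over its own database. A toy sketch of that layering for a single module, assuming the Web UI calls only the logic layer and every data request passes through the security layer; the class and method names, and the dict standing in for the database, are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class DataAccessLayer:
    """Talks to the module's own database (a dict stands in for it here)."""
    store: dict[int, str] = field(default_factory=dict)

    def get(self, key: int) -> str:
        return self.store[key]

@dataclass
class SecurityLayer:
    """Checks permissions before any data is read."""
    data: DataAccessLayer
    allowed_users: set[str]

    def get(self, user: str, key: int) -> str:
        if user not in self.allowed_users:
            raise PermissionError(f"{user} may not read this module's data")
        return self.data.get(key)

@dataclass
class LogicLayer:
    """Module-specific rules; the Web UI calls only this layer."""
    security: SecurityLayer
    config: dict[str, str]  # stands in for the Configuration Layer

    def describe_asset(self, user: str, asset_id: int) -> str:
        label = self.security.get(user, asset_id)
        return f"{self.config.get('prefix', '')}{label}"

asset_module = LogicLayer(
    security=SecurityLayer(DataAccessLayer({1: "Gauge 12"}), allowed_users={"analyst1"}),
    config={"prefix": "Asset: "},
)
print(asset_module.describe_asset("analyst1", 1))  # Asset: Gauge 12
```

Because each module owns its full stack, a module's database can be moved or replaced without touching the others.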
Workflows
More data to manage as we create more data:
• All of our “final” data
• Much of our “working” data
[Diagram: Analyst → Data Access Application (Initiate Data Extraction, Read Data from the Data Repository) → Extracted Data Set (Exported as File) → Input To Analytic Model / Analytic Application (Setup Model Run, Initiate Run) → Write Result to the Data Repository → Discover Result → Generate Report]
Workflows
Management translates to:
• Ease of access to data
• Analysis / modeling with data
• Results & reporting
• Storing results for future use
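Storing results for future use is what lets the next analyst discover a result instead of regenerating it. A minimal in-memory sketch, assuming each stored result records the data sets it was derived from (a link, in the spirit of provenance metadata); the Repository class, its methods, and the sample values are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Repository:
    """Toy shared repository: named data sets plus links to their inputs."""
    items: dict[str, object] = field(default_factory=dict)
    derived_from: dict[str, list[str]] = field(default_factory=dict)

    def read(self, name: str) -> object:
        return self.items[name]

    def write_result(self, name: str, value: object, inputs: list[str]) -> None:
        """Store a result together with the names of the data sets it came from."""
        missing = [i for i in inputs if i not in self.items]
        if missing:
            raise KeyError(f"unknown inputs: {missing}")
        self.items[name] = value
        self.derived_from[name] = list(inputs)

    def discover(self, input_name: str) -> list[str]:
        """Find stored results that were derived from a given data set."""
        return sorted(r for r, ins in self.derived_from.items() if input_name in ins)

# Hypothetical placeholder values, not real data
repo = Repository(items={"daily_flows_site_a": [10.0, 12.5, 11.0]})
flows = repo.read("daily_flows_site_a")                                   # extract
mean_flow = sum(flows) / len(flows)                                       # analyze
repo.write_result("mean_flow_site_a", mean_flow, ["daily_flows_site_a"])  # write result
print(repo.discover("daily_flows_site_a"))  # ['mean_flow_site_a']
```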
CRF Developed Solution
Developed Web and Desktop Tools to Access the Database(s)