DESCRIPTION
Accessibility is a complex property of information, a composite of several indicators that measure how easily one can get to and work with information from another source. These indicators scale differently with the size and scope of the information: as datasets get bigger, some indicators remain constant, some scale linearly, and others scale faster than linearly. At the University of Wisconsin Carbon Modeling group, we are building a very large, spatially accessible U.S. national climate dataset for ecosystem process modeling, which is raising unique challenges and lessons for accessibility in multi-terabyte-scale datasets. This paper describes the problem, the information accessibility challenges, and the lessons learned while putting together its solution.
TRANSCRIPT
Accessibility Considerations for Very Large Datasets
Puneet Kishor
University of Wisconsin-Madison and Creative Commons
Monday, October 25, 2010
Acknowledgments to CODATA for inviting me, Creative Commons for funding my trip, the University of Wisconsin-Madison for paying my salary, and most importantly, the US Federal Government for making all the data available to anyone, anywhere, without any preconditions
Research context: ecosystem process modeling of very large terrestrial ecosystems
Information by numbers
7 daily variables: tmax, tmin, tave, prcp, srad, vpd, dayl
1 km² cell (1 km × 1 km): tmax, tmin, tave, prcp, srad, vpd, dayl
13.25 million cells (4587 × 2889)
8401 days
[Figure label: 8400]
111.32 billion septets (tmax, tmin, tave, prcp, srad, vpd, dayl)
725.78 raw gigabytes
10 times as much in a database
84 GB of NetCDF files in tar-gzipped archives
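The numbers on the preceding slides can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not the author's calculation. It assumes one byte per stored value and a gigabyte of 2^30 bytes; neither is stated in the slides, but together they happen to reproduce the 725.78 figure. It also assumes the 84 GB archives hold the same data as the raw estimate.

```python
# Back-of-the-envelope data volume from the figures on the slides.
COLS, ROWS = 4587, 2889       # grid dimensions (from the slides)
DAYS = 8401                   # daily time steps (from the slides)
VARIABLES = 7                 # seven daily variables per cell
BYTES_PER_VALUE = 1           # ASSUMPTION: not stated in the slides
GIB = 2 ** 30                 # ASSUMPTION: "gigabyte" read as 2^30 bytes

cells = COLS * ROWS                       # number of 1 km x 1 km cells
septets = cells * DAYS                    # one 7-value record per cell per day
raw_bytes = septets * VARIABLES * BYTES_PER_VALUE
raw_gib = raw_bytes / GIB

db_gib = raw_gib * 10                     # "10 times as much in a database"
archive_ratio = 84 / raw_gib              # ASSUMPTION: archives hold the same data

print(f"cells:            {cells:,}")
print(f"septets:          {septets / 1e9:.2f} billion")
print(f"raw size:         {raw_gib:.2f} GiB")
print(f"database (x10):   {db_gib / 1024:.2f} TiB")
print(f"archive / raw:    {archive_ratio:.3f}")
```

Under these assumptions the database form lands in the multi-terabyte range, which matches the scale named in the description.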
4° square chunks
[Figure: chunks numbered 1 to 14]
“½” incomplete documentation
0 ways to query the data
1. Acquire NetCDF file of lat/lon values for each cell from the weather data 1 km² estimates
2. Dump lat/lon values to CSV with Panoply
3. Import into ArcMap as XY data
4. Export as shapefile
5. Assign WGS84 datum to shapefile in ArcCatalog
6. Reproject to Lambert Spherical (“US National Atlas Equal Area”)
7. Separate by 2x2 degree tile using the "tile_num" attribute (so the grid will match the NetCDF met files), using a definition query in ArcMap and exporting to individual shapefiles (256 tiles) as "mask"
8. Open Lambert points in QGIS and make a 1 km grid (shapefile) for each 2x2 tile
9. Assign projection to output (EPSG:2163)
10. Add each new grid shapefile (one at a time) to ArcMap with the 2x2 grid as a separate layer
11. Select by location (select from grid x that intersect mask x)
12. Export selected features of grid x (now numbered sequentially by record in a way that matches the met NetCDF “ncells”)
13. Clean up: delete extra fields from QGIS (ID, MAXX, MINX, MAXY, MINY); add ncell_id (FID+1), block_id, block_name
“10” times the work to unpack the data
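To give a feel for what steps 5, 6, and 13 of the workflow above do, here is a minimal sketch in plain Python. It is not the ArcMap/QGIS procedure itself. It assumes the standard EPSG:2163 definition (Lambert azimuthal equal-area on a sphere of radius 6,370,997 m, centred at 45° N, 100° W) and treats WGS84 latitudes as spherical latitudes, which is a simplification. The function names `to_laea` and `with_cell_ids` are hypothetical.

```python
import math

# EPSG:2163 ("US National Atlas Equal Area") parameters, as assumed above.
R = 6370997.0                  # sphere radius in metres
LAT0 = math.radians(45.0)      # latitude of the projection centre
LON0 = math.radians(-100.0)    # longitude of the projection centre


def to_laea(lat_deg, lon_deg):
    """Project a lat/lon pair (degrees) to x, y in metres (spherical LAEA)."""
    lat = math.radians(lat_deg)
    dlon = math.radians(lon_deg) - LON0
    denom = (1.0
             + math.sin(LAT0) * math.sin(lat)
             + math.cos(LAT0) * math.cos(lat) * math.cos(dlon))
    k = math.sqrt(2.0 / denom)
    x = R * k * math.cos(lat) * math.sin(dlon)
    y = R * k * (math.cos(LAT0) * math.sin(lat)
                 - math.sin(LAT0) * math.cos(lat) * math.cos(dlon))
    return x, y


def with_cell_ids(latlon_points):
    """Project points in record order and number them ncell_id = FID + 1."""
    records = []
    for fid, (lat, lon) in enumerate(latlon_points):
        x, y = to_laea(lat, lon)
        records.append({"ncell_id": fid + 1, "x": x, "y": y})
    return records


if __name__ == "__main__":
    # Two made-up points: the projection centre and one near Madison, WI.
    for rec in with_cell_ids([(45.0, -100.0), (43.07, -89.40)]):
        print(rec)
```

The point of the sketch is that the numbering only matches the NetCDF “ncells” if the record order is preserved all the way through, which is why the manual workflow is so fragile.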
Many kinds of queries
f<variable> <location> <point in time>
avg(srad) at x,y on Dec 2, 2001
tmin for area on May 19, 1992
tmax at x,y on May 19, 1992
f<variable> <point location> <duration of time>
tave at x,y during the first quarter of 1983
sum(vpd) at x,y during the last week of Mar, 2003
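The query shapes listed above (a variable at a location at a point in time, over an area, or over a duration) can be sketched against a toy in-memory store. This is only an illustration of the query patterns, not the storage design from the talk: the nested-dictionary layout, the function names, and every value in the toy data are invented for the example.

```python
from datetime import date, timedelta
from statistics import mean


def make_toy_store():
    """Build {(x, y): {day: {variable: value}}} with invented values."""
    store = {}
    for x, y in [(0, 0), (1, 0)]:
        store[(x, y)] = {}
        for n in range(1, 8):                      # Jan 1-7, 1983
            day = date(1983, 1, n)
            store[(x, y)][day] = {"tave": n + x, "vpd": 10 * n}
    return store


def at_point_in_time(store, variable, cell, day):
    """A variable at one cell on one day, e.g. tmax at x,y on a date."""
    return store[cell][day][variable]


def over_area(store, variable, cells, day, aggregate=mean):
    """A variable aggregated over several cells on one day."""
    return aggregate(store[c][day][variable] for c in cells)


def over_duration(store, variable, cell, start, end, aggregate=mean):
    """A variable at one cell aggregated over start..end inclusive."""
    n_days = (end - start).days + 1
    days = (start + timedelta(days=n) for n in range(n_days))
    return aggregate(store[cell][d][variable] for d in days)


if __name__ == "__main__":
    s = make_toy_store()
    first, last = date(1983, 1, 1), date(1983, 1, 7)
    print(at_point_in_time(s, "tave", (0, 0), date(1983, 1, 3)))
    print(over_area(s, "tave", [(0, 0), (1, 0)], first))
    print(over_duration(s, "vpd", (0, 0), first, last, aggregate=sum))
```

Passing the aggregate as a function keeps avg, sum, min, and max queries on one code path. At the real data volume the same interface would need an indexed store rather than dictionaries.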
accessible |akˈsesəbəl| adjective
1 (of a place) able to be reached or entered: the town is accessible by bus | the building has been made accessible to disabled people.
• (of an object, service, or facility) able to be easily obtained or used: making learning opportunities more accessible to adults.
• easily understood: his Latin grammar is lucid and accessible.
• able to be reached or entered by people in wheelchairs: it provides specialized features such as nonslip floors and accessible entrances.
2 (of a person, typically one in a position of authority or importance) friendly and easy to talk to; approachable: he is more accessible than most tycoons.
Accessible information is easy to: find, determine what one can do with it, acquire, and use
Factors that affect accessibility: law; technology; culture; semantics; and economics
Law makes sharing permissible; technology makes it possible; culture makes it acceptable; semantics make it understandable; and economics makes it affordable
It is permissible, acceptable, and affordable to access public sector information, but not necessarily possible or understandable
Goals of the new storage: make the information technologically and semantically accessible
Allow access by providing a user interface, an application programming interface, and documentation