xray at SciPy 2015
[Figure: DataArray (an array with dimensions latitude, longitude and time) and Dataset (several DataArrays, such as elevation and land_cover, sharing those dimensions)]
>>> ds
<xray.Dataset>
Dimensions: (time: 10, latitude: 8, longitude: 8)
Coordinates:
* time (time) datetime64 2015-01-01 2015-01-02 2015-01-03 2015-01-04 ...
* latitude (latitude) float64 50.0 47.5 45.0 42.5 40.0 37.5 35.0 32.5
* longitude (longitude) float64 -105.0 -102.5 -100.0 -97.5 -95.0 -92.5 ...
elevation (longitude, latitude) int64 201 231 582 239 1848 1004 1004 ...
land_cover (longitude, latitude) object 'forest' 'urban' 'farmland'...
Data variables:
temperature (time, longitude, latitude) float64 13.7 8.031 18.36 24.95 ...
pressure (time, longitude, latitude) float64 1.374 1.142 1.388 0.9992 ...
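The figure's data model can be sketched in a few lines of plain Python and NumPy. `LabeledArray` and `LabeledDataset` below are hypothetical stand-ins, not xray classes. The first is an array plus one name per axis. The second is a dict of such arrays that refuses any variable whose dimension sizes conflict with those already present, which is what lets variables like `elevation` and `temperature` share `longitude` and `latitude`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)  # eq=False avoids elementwise array comparison in __eq__
class LabeledArray:
    """Hypothetical DataArray stand-in: an ndarray plus one name per axis."""
    dims: tuple
    data: np.ndarray

    def __post_init__(self):
        if len(self.dims) != self.data.ndim:
            raise ValueError("need exactly one dimension name per axis")

    @property
    def sizes(self):
        return dict(zip(self.dims, self.data.shape))


class LabeledDataset(dict):
    """Hypothetical Dataset stand-in: named variables with consistent dimension sizes."""

    def __setitem__(self, name, var):
        # reject a variable whose size along a shared dimension disagrees
        for dim, size in var.sizes.items():
            for other in self.values():
                if other.sizes.get(dim, size) != size:
                    raise ValueError(f"conflicting size for dimension {dim!r}")
        super().__setitem__(name, var)

    @property
    def dims(self):
        # union of all variables' dimensions and their sizes
        out = {}
        for var in self.values():
            out.update(var.sizes)
        return out


ds = LabeledDataset()
ds["temperature"] = LabeledArray(("time", "longitude", "latitude"), np.zeros((10, 8, 8)))
ds["elevation"] = LabeledArray(("longitude", "latitude"), np.zeros((8, 8), dtype=int))
```

Here `ds.dims` reports `time: 10, longitude: 8, latitude: 8`, mirroring the `Dimensions` line of the repr; adding a variable with a 3-step `time` axis would raise `ValueError`.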
# numpy style
ds.temperature[0, :, :]
# pandas style
ds.temperature.loc[:, -90, 50]
# with dimension names
ds.sel(time='2015-01-01')
ds.sel(longitude=-90, latitude=50, method='nearest')
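Label-based selection boils down to turning a coordinate label into an integer position, then indexing NumPy-style. A minimal NumPy sketch, assuming a `(time, longitude, latitude)` array of made-up values; the latitude labels come from the repr above, while the longitude grid and the helper names `exact_index` and `nearest_index` are hypothetical:

```python
import numpy as np

# latitude labels from the repr above; this longitude grid is hypothetical
latitude = np.array([50.0, 47.5, 45.0, 42.5, 40.0, 37.5, 35.0, 32.5])
longitude = np.arange(-105.0, -85.0, 2.5)  # -105.0, -102.5, ..., -87.5
# made-up data with dimensions (time, longitude, latitude)
temperature = np.random.default_rng(0).normal(15, 5, size=(10, 8, 8))


def exact_index(coord, label):
    """Position of `label` in `coord`; KeyError if absent (like .loc or .sel)."""
    hits = np.flatnonzero(coord == label)
    if hits.size == 0:
        raise KeyError(label)
    return int(hits[0])


def nearest_index(coord, label):
    """Position of the coordinate value closest to `label` (like method='nearest')."""
    return int(np.argmin(np.abs(coord - label)))


# label-based counterpart of ds.temperature.loc[:, -90, 50]
series = temperature[:, exact_index(longitude, -90.0), exact_index(latitude, 50.0)]
# nearest-neighbour lookup for labels that are not on the grid
point = temperature[:, nearest_index(longitude, -89.0), nearest_index(latitude, 41.0)]
```

Exact lookup fails loudly on a missing label, while the nearest variant always returns some grid point, which is why it has to be requested explicitly.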
# math
(10 + ds) ** 0.5
ds.temperature + ds.pressure
np.sin(ds.temperature)
# aggregation
ds.mean(dim='time')
ds.max(dim=['latitude', 'longitude'])
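Both the math and the aggregation examples address axes by name rather than position. A minimal NumPy sketch of that idea, using hypothetical helpers `expand_to`, `add_by_name` and `mean_over` (not xray functions): operands are transposed and padded with length-1 axes to the union of their dimension names before ordinary NumPy broadcasting, and reductions look up the axis for each name. Putting the first operand's dimensions first is just one simple ordering choice.

```python
import numpy as np


def expand_to(data, dims, target_dims):
    """Reorder and pad `data` (axes named by `dims`) to line up with `target_dims`."""
    # move the existing axes into the order they take in target_dims
    order = [dims.index(d) for d in target_dims if d in dims]
    data = np.transpose(data, order)
    # insert a length-1 axis wherever this array lacks a target dimension
    sizes = iter(data.shape)
    shape = [next(sizes) if d in dims else 1 for d in target_dims]
    return data.reshape(shape)


def add_by_name(a, a_dims, b, b_dims):
    """Add two arrays, broadcasting over the union of their dimension names."""
    out_dims = list(a_dims) + [d for d in b_dims if d not in a_dims]
    total = expand_to(a, a_dims, out_dims) + expand_to(b, b_dims, out_dims)
    return total, out_dims


def mean_over(data, dims, names):
    """Average over the named dimensions, in the spirit of ds.mean(dim=...)."""
    axes = tuple(dims.index(n) for n in names)
    return data.mean(axis=axes), [d for d in dims if d not in names]


time_vals = np.array([0.0, 10.0, 20.0])
space_vals = np.array([1.0, 2.0, 3.0, 4.0])
result, result_dims = add_by_name(time_vals, ("time",), space_vals, ("space",))
```

Adding a `time` array to a `space` array gives a 3 x 4 result with dimensions `['time', 'space']`. Because matching is by name, an operand stored as `(space, time)` still lines up correctly with a `time`-only array.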
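[Figure: broadcasting by dimension name. An array over time plus an array over space gives an array over time and space. Result has the union of all dimension names.]
[Figure: alignment on coordinate labels. Two arrays indexed by year (2000 to 2008) are added. Result has the intersection of coordinate labels.]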
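Alignment on labels can be imitated with pandas, which xray builds on. pandas arithmetic on two Series uses the union of their indexes and fills the gaps with NaN, so this sketch requests an inner join explicitly to keep only the shared years, matching the intersection rule above. The two series are made-up data.

```python
import pandas as pd

# made-up annual series with partially overlapping years
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=range(2000, 2006))
b = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=range(2003, 2009))

# plain pandas addition: outer join, NaN for years missing from either side
outer = a + b

# inner join: keep only the years both series share, then add
a_aligned, b_aligned = a.align(b, join="inner")
total = a_aligned + b_aligned
```

`total` covers only 2003 to 2005, while `outer` spans 2000 to 2008 with NaN wherever one side has no value.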
# average by calendar month
ds.groupby('time.month').mean('time')
# resample to every 10 days
ds.resample('10D', dim='time', how='max')
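For a single time series, the same two operations have direct pandas counterparts; modern pandas spells the reduction as a method call rather than a `how=` argument. This sketch uses made-up daily data (the values are simply 0 to 89) so the results are easy to check by hand:

```python
import numpy as np
import pandas as pd

# made-up daily series standing in for one grid cell of ds.temperature
time = pd.date_range("2015-01-01", periods=90, freq="D")
temperature = pd.Series(np.arange(90.0), index=time)

# counterpart of ds.groupby('time.month').mean('time'): group on each timestamp's month
monthly_mean = temperature.groupby(temperature.index.month).mean()

# counterpart of resampling to 10-day bins and taking the maximum in each bin
ten_day_max = temperature.resample("10D").max()
```

January (values 0 to 30) averages 15.0, and the first 10-day bin holds values 0 to 9, so its maximum is 9.0.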
# xray -> numpy
ds.temperature.values
# xray -> pandas
ds.to_dataframe()
# pandas -> xray
xray.Dataset.from_dataframe(df)
>>> ds = xray.open_mfdataset('/Users/shoyer/data/era-interim/*.nc')
>>> ds
<xray.Dataset>
Dimensions: (latitude: 256, longitude: 512, time: 52596)
Coordinates:
* latitude (latitude) float32 89.4628 88.767 88.067 87.3661 86.6648 ...
* longitude (longitude) float32 0.0 0.703125 1.40625 2.10938 2.8125 ...
* time (time) datetime64[ns] 1979-01-01 1979-01-01T06:00:00 ...
Data variables:
t2m (time, latitude, longitude) float64 240.6 240.6 240.6 ...
>>> ds.nbytes * (2 ** -30)
51.363675981760025
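The 51.36 figure can be checked by hand, assuming `nbytes` adds up element count times item size over the data variable and all three coordinates; multiplying by 2 ** -30 converts bytes to GiB. Sizes and dtypes are taken from the repr above:

```python
# element counts from the Dimensions line
n_time, n_lat, n_lon = 52596, 256, 512

# bytes per element from the dtypes in the repr
t2m_bytes = n_time * n_lat * n_lon * 8              # t2m: float64
coord_bytes = n_lat * 4 + n_lon * 4 + n_time * 8    # float32, float32, datetime64[ns]

t2m_gib = t2m_bytes * 2 ** -30
total_gib = (t2m_bytes + coord_bytes) * 2 ** -30
```

The `t2m` grid alone is exactly 52596 / 1024 = 51.36328125 GiB; the coordinates contribute the remaining 0.0004 GiB or so, which reproduces the printed value.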
ds_by_season = ds.groupby('time.season').mean('time')
t2m_range = abs(ds_by_season.sel(season='JJA')
- ds_by_season.sel(season='DJF')).t2m
%time result = t2m_range.load()
CPU times: user 2min 1s, sys: 49.5 s, total: 2min 51s
Wall time: 38.6 s
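The `'time.season'` virtual variable groups timestamps into the meteorological seasons DJF, MAM, JJA and SON. For one location, the seasonal-range calculation above reduces to a pandas groupby on a month-to-season mapping; the `SEASONS` table and `seasonal_range` helper are written out here for illustration only, and the slide's version applies the same idea to every grid cell at once.

```python
import numpy as np
import pandas as pd

# month number -> meteorological season label
SEASONS = {12: "DJF", 1: "DJF", 2: "DJF",
           3: "MAM", 4: "MAM", 5: "MAM",
           6: "JJA", 7: "JJA", 8: "JJA",
           9: "SON", 10: "SON", 11: "SON"}


def seasonal_range(series):
    """|mean(JJA) - mean(DJF)| for a time-indexed Series."""
    season = series.index.month.map(SEASONS)
    by_season = series.groupby(season).mean()
    return abs(by_season["JJA"] - by_season["DJF"])


# made-up daily data: 30 in summer, 10 in winter, 20 otherwise
days = pd.date_range("2015-01-01", "2015-12-31", freq="D")
values = np.where(days.month.isin([6, 7, 8]), 30.0,
                  np.where(days.month.isin([12, 1, 2]), 10.0, 20.0))
t2m_point = pd.Series(values, index=days)
```

With these made-up values, `seasonal_range(t2m_point)` is 20.0.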
More details: continuum.io/blog/xray-dask
pandas: indexing, factorize
NumPy: arrays
netCDF4, h5py, SciPy: IO
dask.array: out-of-core arrays
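The list above credits pandas with `factorize`. One way factorize can drive a grouped reduction such as `groupby(...).mean()` is to turn the group keys into integer codes and let `np.bincount` accumulate per code. This is a sketch of the idea, not xray's actual groupby code; `groupby_mean` is a hypothetical helper, and it assumes there are no missing keys (factorize codes those as -1, which `bincount` rejects).

```python
import numpy as np
import pandas as pd


def groupby_mean(keys, values):
    """Mean of `values` per distinct key, computed via integer codes."""
    # codes[i] is the integer id of keys[i]; uniques lists each distinct key once
    codes, uniques = pd.factorize(np.asarray(keys))
    values = np.asarray(values, dtype=float)
    sums = np.bincount(codes, weights=values, minlength=len(uniques))
    counts = np.bincount(codes, minlength=len(uniques))
    return dict(zip(uniques, sums / counts))


means = groupby_mean(["DJF", "JJA", "DJF", "MAM"], [1.0, 2.0, 3.0, 4.0])
```

Each group costs one pass over the codes rather than a separate boolean mask per key, which is the appeal of factorizing first.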