xray at SciPy 2015

Upload: stephan-hoyer

Posted on 21-Aug-2015

TRANSCRIPT

Page 1: xray at SciPy 2015
Page 2: xray at SciPy 2015
Page 3: xray at SciPy 2015

[Figure labels: latitude, longitude, time]

Page 4: xray at SciPy 2015


Page 5: xray at SciPy 2015


Page 6: xray at SciPy 2015

DataArray

Dataset

○ DataArray

Page 7: xray at SciPy 2015

[Figure labels: time, longitude, latitude, land_cover, elevation]

Page 8: xray at SciPy 2015

>>> ds

<xray.Dataset>

Dimensions: (time: 10, latitude: 8, longitude: 8)

Coordinates:

* time (time) datetime64 2015-01-01 2015-01-02 2015-01-03 2015-01-04 ...

* latitude (latitude) float64 50.0 47.5 45.0 42.5 40.0 37.5 35.0 32.5

* longitude (longitude) float64 -105.0 -102.5 -100.0 -97.5 -95.0 -92.5 ...

elevation (longitude, latitude) int64 201 231 582 239 1848 1004 1004 ...

land_cover (longitude, latitude) object 'forest' 'urban' 'farmland'...

Data variables:

temperature (time, longitude, latitude) float64 13.7 8.031 18.36 24.95 ...

pressure (time, longitude, latitude) float64 1.374 1.142 1.388 0.9992 ...

Page 9: xray at SciPy 2015

# numpy style

ds.temperature[0, :, :]

# pandas style

ds.temperature.loc[:, -90, 50]

# with dimension names

ds.sel(time='2015-01-01')

ds.sel(longitude=-90, latitude=50, method='nearest')
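The method='nearest' lookup above can be illustrated with a minimal NumPy sketch. The helper name nearest_index is hypothetical and this is not xray's implementation (the deck credits pandas for indexing); it only shows the idea of turning a requested label into the position of the closest coordinate value, using the latitude values from the Dataset printed on Page 8.

```python
import numpy as np

# Latitude coordinate values as printed in the Dataset repr on Page 8.
latitude = np.array([50.0, 47.5, 45.0, 42.5, 40.0, 37.5, 35.0, 32.5])


def nearest_index(labels, target):
    """Return the position of the label closest to target (hypothetical helper).

    On an exact tie, argmin returns the first of the tied positions.
    """
    return int(np.abs(labels - target).argmin())


# 44.0 is not a coordinate label, so an exact lookup would fail;
# a nearest-neighbor lookup resolves it to the closest label instead.
i = nearest_index(latitude, 44.0)
closest_label = latitude[i]
```

The resulting integer position can then be used for ordinary NumPy-style indexing along that dimension.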

Page 10: xray at SciPy 2015

# math

(10 + ds) ** 0.5

ds.temperature + ds.pressure

np.sin(ds.temperature)

# aggregation

ds.mean(dim='time')

ds.max(dim=['latitude', 'longitude'])

Page 11: xray at SciPy 2015

[Figure: time + space = (time, space)]

Result has the union of all dimension names
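The union rule for dimension names can be sketched with plain NumPy. The function add_by_dims and the sample arrays are hypothetical, and this is an illustration of the idea rather than xray's code: each operand is reshaped so that its axes line up with the ordered union of names, and NumPy broadcasting does the rest.

```python
import numpy as np


def add_by_dims(a, a_dims, b, b_dims):
    """Add two arrays whose axes are identified by name, not position."""
    a_dims, b_dims = list(a_dims), list(b_dims)
    # Ordered union of names: a's dimensions first, then new ones from b.
    out_dims = a_dims + [d for d in b_dims if d not in a_dims]

    def expand(arr, dims):
        # Reorder existing axes to match the output order ...
        present = [d for d in out_dims if d in dims]
        arr = np.transpose(arr, [dims.index(d) for d in present])
        # ... then insert length-1 axes for dimensions this array lacks.
        shape = [arr.shape[present.index(d)] if d in present else 1
                 for d in out_dims]
        return arr.reshape(shape)

    return expand(a, a_dims) + expand(b, b_dims), tuple(out_dims)


by_time = np.arange(3.0)         # one value per time step
by_space = np.arange(4.0) * 10   # one value per location
total, dims = add_by_dims(by_time, ["time"], by_space, ["space"])
```

Here total has one axis per name in dims, so neither operand needs a manual reshape before the addition.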

Page 12: xray at SciPy 2015

[Figure: adding two arrays along a year axis labeled 2000 through 2008]

Result has the intersection of coordinate labels
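The intersection rule for coordinate labels can be sketched in plain Python. The function add_aligned and the sample values are hypothetical; dictionaries keyed by year stand in for labeled arrays, and only years present in both operands survive the addition.

```python
def add_aligned(a, b):
    """Add two label -> value mappings, keeping only labels found in both."""
    shared = sorted(a.keys() & b.keys())
    return {label: a[label] + b[label] for label in shared}


# Made-up values; the two series cover different, overlapping years.
first = {2000: 1.0, 2001: 2.0, 2002: 3.0, 2003: 4.0}
second = {2002: 10.0, 2003: 20.0, 2004: 30.0}

result = add_aligned(first, second)
```

Years that appear in only one operand are dropped instead of being filled with a placeholder value.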

Page 13: xray at SciPy 2015

# average by calendar month

ds.groupby('time.month').mean('time')

# resample to every 10 days

ds.resample('10D', dim='time', how='max')
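What the calendar-month average computes can be sketched with the standard library alone. The function mean_by_month and the sample data are hypothetical, and this is not how xray implements groupby; it only shows the split, apply, combine idea: values are bucketed by the month of their timestamp (regardless of year), then each bucket is averaged.

```python
from collections import defaultdict
from datetime import date, timedelta


def mean_by_month(times, values):
    """Average values grouped by the calendar month of their timestamp."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.month].append(v)   # split: one bucket per month number
    # apply + combine: one mean per bucket, ordered by month
    return {month: sum(vs) / len(vs) for month, vs in sorted(groups.items())}


start = date(2015, 1, 30)
times = [start + timedelta(days=i) for i in range(4)]  # Jan 30 to Feb 2
values = [1.0, 3.0, 10.0, 20.0]                        # made-up readings

monthly = mean_by_month(times, values)
```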

Page 14: xray at SciPy 2015

# xray -> numpy

ds.temperature.values

# xray -> pandas

ds.to_dataframe()

# pandas -> xray

xray.Dataset.from_dataframe(df)

Page 15: xray at SciPy 2015

>>> ds = xray.open_mfdataset('/Users/shoyer/data/era-interim/*.nc')

>>> ds

<xray.Dataset>

Dimensions: (latitude: 256, longitude: 512, time: 52596)

Coordinates:

* latitude (latitude) float32 89.4628 88.767 88.067 87.3661 86.6648 ...

* longitude (longitude) float32 0.0 0.703125 1.40625 2.10938 2.8125 ...

* time (time) datetime64[ns] 1979-01-01 1979-01-01T06:00:00 ...

Data variables:

t2m (time, latitude, longitude) float64 240.6 240.6 240.6 ...

>>> ds.nbytes * (2 ** -30)

51.363675981760025

Page 16: xray at SciPy 2015

ds_by_season = ds.groupby('time.season').mean('time')

t2m_range = abs(ds_by_season.sel(season='JJA')

- ds_by_season.sel(season='DJF')).t2m

%time result = t2m_range.load()

CPU times: user 2min 1s, sys: 49.5 s, total: 2min 51s

Wall time: 38.6 s
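The dataset above is larger than typical memory, so a reduction over time has to be evaluated piece by piece. The following is a minimal sketch of that idea in NumPy, under the assumption that the data arrives as a sequence of blocks along the first axis. The function chunked_mean is hypothetical and is not dask's implementation; it only shows that a mean can be accumulated without holding every block at once.

```python
import numpy as np


def chunked_mean(chunks):
    """Mean over the first axis, computed one block at a time."""
    total = None
    count = 0
    for chunk in chunks:
        block_sum = chunk.sum(axis=0)          # reduce this block only
        total = block_sum if total is None else total + block_sum
        count += chunk.shape[0]                # rows seen so far
    return total / count


data = np.arange(24.0).reshape(6, 4)           # stand-in for a large array
# A generator yields two rows at a time, so only one block is live at once.
blocks = (data[i:i + 2] for i in range(0, 6, 2))

result = chunked_mean(blocks)
```

The running sum and row count are the only state carried between blocks, which is why the approach scales to inputs that do not fit in memory.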

Page 17: xray at SciPy 2015

More details: continuum.io/blog/xray-dask

Page 18: xray at SciPy 2015
Page 19: xray at SciPy 2015

pandas: indexing, factorize

NumPy: arrays

netCDF4, h5py, SciPy: IO

dask.array: out of core arrays