metadata for energy and working with python - bath · metadata for energy and working with python...

Metadata for Energy and working with Python

Jack Kelly, Imperial College, [email protected]

Outline

1. What is energy disaggregation?

2. My dataset: UK-DALE

3. Open-source Python tool: NILMTK

4. Metadata

What is energy disaggregation?

UK-DALE: UK Disaggregated Appliance-Level Energy data

● 5 homes
● Duration: 39 days to 3.5 years of data per house
● Records whole-house apparent power and appliance power demand once every 6 seconds
● CSV with YAML metadata
● HDF5 version
● Records whole-house voltage and current at 16 kHz for 3 homes (stored as hourly FLAC files)
● 6 terabytes
● Recording hardware: off-the-shelf and DIY
● Available on UKERC EDC (with DOIs) and FTP
● Paper in NPG's Scientific Data journal
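To get a feel for the two sampling rates listed above, the arithmetic can be sketched in Python. The 6-second period and the 16 kHz rate come from the slide; the function names are only illustrative, and the counts are per channel and ignore any gaps in the recordings.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def samples_per_day(sample_period_s):
    """Samples one channel produces per day at a fixed sample period."""
    return SECONDS_PER_DAY // sample_period_s


def samples_per_hour(sample_rate_hz):
    """Samples one channel produces per hour at a fixed sample rate."""
    return sample_rate_hz * SECONDS_PER_HOUR


print(samples_per_day(6))        # 14400 readings per day at one every 6 seconds
print(samples_per_hour(16_000))  # 57600000 samples in each hourly file, per channel
```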

Open-source Python disaggregation toolkit: NILMTK

● nilmtk.github.io
● (NILM = Non-Intrusive Load Monitoring)
● Imports 11 datasets to standard format using HDF5 and detailed metadata
● Data cleaning, summary statistics, plotting etc.
● 6 NILM algorithms (1 hosted by algo author)
● NILM metrics
● Can process more data than can fit into RAM
● Documentation and code examples
● Issue queue (501); mailing list; Twitter; wiki
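The idea behind processing more data than fits into RAM can be sketched without NILMTK at all: read the data in fixed-size chunks and keep only running totals. The sketch below uses only the standard library. The function names, the `power` column and the sample readings are hypothetical, and this is not NILMTK's actual implementation.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows, so only one chunk is held in memory."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def mean_power(csv_file, chunk_size=2):
    """Compute the mean of the 'power' column one chunk at a time."""
    reader = csv.DictReader(csv_file)
    total, count = 0.0, 0
    for chunk in iter_chunks(reader, chunk_size):
        total += sum(float(row["power"]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical readings; a real file would be far larger than this.
demo = io.StringIO("timestamp,power\n0,100\n6,110\n12,90\n18,100\n24,100\n")
print(mean_power(demo))  # 100.0
```

Only statistics that can be accumulated incrementally (sums, counts, minima, maxima) fit this pattern directly, which is part of why out-of-core code tends to become complex.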

NILMTK: What worked well?

● 11 dataset importers: lots of contributed code
● Standalone data conversion scripts are simple to write
● People definitely tinker with NILMTK and cite it
● Not clear how many people use it for productive work!
● 6 NILM algorithms (although this required a lot of effort)
● Lots of discussions on the issue queue
● Lazy loading & out-of-core is powerful (but complex)
● Built on existing packages (Pandas etc.)
● TESTS!
● Continuous integration (Travis CI)

NILMTK: What didn't work well?

● Out-of-core is perhaps not worth the effort: it makes the code much more complex. Use Blaze instead?

● Too “monolithic”. Should interact with existing tools more naturally. Perhaps split NILMTK into separate tools?

● Documentation needs to be extensive and up to date. This is hard work!

● Issue queue is hard work. Lots of “I don't know how to use Python; please hold my hand”.

● “Need to move to stand still”: keeping up to date with new versions of dependencies (Pandas etc.)

NILMTK: Conclusions

● KISS (especially if you want 3rd party contributions)
● Focus on high-quality documentation for users: will save you time in the long run
● Make full use of automatic package managers so users can install your code & dependencies easily

Datasets are complex

● Many meters
● Each connected to some number of appliances
● Arbitrary wiring between meters
● Different meter models
● Specify details of appliances
● Preprocessing applied to data
● etc.
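A minimal sketch of the kind of structure these points imply: meters that can be wired below other meters, and appliances attached to one or more meters. The class names, the `upstream` field and the example meters are assumptions made for illustration, not the schema's actual field names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Meter:
    instance: int
    device_model: str
    upstream: Optional[int] = None  # instance of the meter this one is wired below


@dataclass
class Appliance:
    type: str
    instance: int
    meters: List[int] = field(default_factory=list)


def path_to_mains(meters: Dict[int, Meter], instance: int) -> List[int]:
    """Follow the wiring from one meter up to the top-level meter."""
    path: List[int] = []
    seen = set()
    current: Optional[int] = instance
    while current is not None:
        if current in seen:
            raise ValueError("wiring contains a cycle")
        seen.add(current)
        path.append(current)
        current = meters[current].upstream
    return path


# Hypothetical house: whole-house meter -> circuit meter -> plug meter.
meters = {
    1: Meter(1, "whole-house meter"),
    2: Meter(2, "circuit meter", upstream=1),
    10: Meter(10, "plug meter", upstream=2),
}
kettle = Appliance("kettle", 1, meters=[10])
print(path_to_mains(meters, kettle.meters[0]))  # [10, 2, 1]
```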

NILM Metadata schema

● YAML (YAML Ain't Markup Language)
● Controlled vocab, object inheritance
● Schema has 2 components:

1. Schema for describing individual datasets: dataset, buildings, meters, appliances, measurements

2. Common metadata about appliance types: categories, priors, models of appliances, countries
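How a controlled vocabulary and object inheritance can work together is sketched below in plain Python. The `parent` key, the property names and all values are invented for illustration; the real schema's keys and contents may differ.

```python
# Hypothetical central metadata: each entry may name a parent to inherit from.
APPLIANCE_TYPES = {
    "cold appliance": {"category": "cold", "on_power_threshold": 20},
    "fridge": {"parent": "cold appliance"},
    "freezer": {"parent": "cold appliance", "on_power_threshold": 50},
}


def resolve(type_name, types=APPLIANCE_TYPES):
    """Return the metadata for type_name with inherited properties filled in."""
    if type_name not in types:
        # Controlled vocabulary: unknown names are rejected rather than guessed.
        raise KeyError(f"{type_name!r} is not in the controlled vocabulary")
    entry = dict(types[type_name])
    parent = entry.pop("parent", None)
    if parent is None:
        return entry
    merged = resolve(parent, types)
    merged.update(entry)  # the child's own values override inherited ones
    return merged


print(resolve("fridge"))   # inherits everything from "cold appliance"
print(resolve("freezer"))  # overrides on_power_threshold
```

With this arrangement a dataset only has to state an appliance's type; shared properties live once in the central metadata.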

dataset.yaml

meter_devices.yaml

# building1.yaml

appliances:
- type: washer dryer
  instance: 1
  meters: [10, 20]
  components:
  - type: motor
    meters: [10]
  - type: electric heating element
    meters: [20]
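Once loaded, the nested appliance above is just a list of dictionaries, so it can be walked recursively. In this sketch the data is written out as a Python literal (what a YAML loader would be expected to return) to avoid depending on a YAML library; the helper name `meters_by_part` is hypothetical.

```python
# The appliances list from building1.yaml, as a YAML loader would return it.
appliances = [
    {
        "type": "washer dryer",
        "instance": 1,
        "meters": [10, 20],
        "components": [
            {"type": "motor", "meters": [10]},
            {"type": "electric heating element", "meters": [20]},
        ],
    }
]


def meters_by_part(appliance, prefix=""):
    """Map each appliance or component name to its meters, recursing into components."""
    name = prefix + appliance["type"]
    result = {name: appliance.get("meters", [])}
    for component in appliance.get("components", []):
        result.update(meters_by_part(component, prefix=name + " / "))
    return result


for appliance in appliances:
    for name, meter_ids in meters_by_part(appliance).items():
        print(name, meter_ids)
```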

Example of an appliance containing other 'appliances'

Example of dataset metadata

Example of central metadata

NILM Metadata schema in use

● Used by NILMTK
● At least 11 datasets use NILM Metadata
● github.com/NILMTK/NILM_Metadata
● Lots of contributions to controlled vocabulary

NILM Metadata: future work?

● Make it as easy as possible to use: simple profile?
● e.g. detailed appliance metadata not very useful?
● Reskin as RDF and OWL?!
● Defaults
● Wizard
● Validator
● Micro generation
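As a rough idea of what the "Defaults" and "Validator" items could look like, here is a minimal sketch. The required fields, the default values and the function name are assumptions chosen to match the washer dryer example, not part of the existing schema or any existing tool.

```python
import copy

# Hypothetical rules for one appliance entry.
REQUIRED = ("type", "instance", "meters")
DEFAULTS = {"components": []}


def validate_appliance(appliance, known_types):
    """Return (appliance_with_defaults, list_of_problems)."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in appliance]
    if "type" in appliance and appliance["type"] not in known_types:
        problems.append(f"unknown type: {appliance['type']}")
    if "meters" in appliance and not all(isinstance(m, int) for m in appliance["meters"]):
        problems.append("meters must be a list of integers")
    # Fill in defaults without mutating the input or sharing the default objects.
    completed = {**copy.deepcopy(DEFAULTS), **appliance}
    return completed, problems


completed, problems = validate_appliance(
    {"type": "kettle", "instance": 1, "meters": [3]}, known_types={"kettle"}
)
print(completed["components"], problems)  # [] []
```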
