Metadata for Energy and working with Python
Jack Kelly, Imperial College, [email protected]
Outline
1. What is energy disaggregation?
2. My dataset: UK-DALE
3. Open-source Python tool: NILMTK
4. Metadata
What is energy disaggregation?
UK-DALE: UK Disaggregated Appliance-Level Energy data
● 5 homes
● Duration: 39 days to 3.5 years of data per house
● Once every 6 seconds, records whole-house apparent power and appliance power demand
● CSV with YAML metadata
● HDF5 version
● Records whole-house voltage and current at 16 kHz for 3 homes (stored as hourly FLAC files)
● 6 terabytes
● Recording hardware: off-the-shelf and DIY
● Available on UKERC EDC (with DOIs) and FTP
● Paper in NPG's Scientific Data journal
Open-source Python disaggregation toolkit: NILMTK
● nilmtk.github.io
● (NILM = Non-Intrusive Load Monitoring)
● Imports 11 datasets to standard format using HDF5 and detailed metadata
● Data cleaning, summary statistics, plotting etc.
● 6 NILM algorithms (1 hosted by algo author)
● NILM metrics
● Can process more data than can fit into RAM
● Documentation and code examples
● Issue queue (501); mailing list; Twitter; wiki
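To illustrate what "more data than can fit into RAM" means in practice, here is a minimal sketch of chunked (out-of-core style) processing using only the standard library. It is not NILMTK's API: the names `chunks`, `energy_kwh` and `SAMPLE_PERIOD_S` are hypothetical, and the only figure taken from the talk is the 6-second sample period.

```python
from itertools import islice
from typing import Iterable, Iterator, List

SAMPLE_PERIOD_S = 6  # sample period quoted for UK-DALE in this talk


def chunks(samples: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` samples."""
    it = iter(samples)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def energy_kwh(power_watts: Iterable[float], chunk_size: int = 10_000) -> float:
    """Integrate power (W) to energy (kWh), holding one chunk in memory at a time."""
    joules = 0.0
    for chunk in chunks(power_watts, chunk_size):
        joules += sum(chunk) * SAMPLE_PERIOD_S
    return joules / 3.6e6  # 1 kWh = 3.6 MJ


# One hour of a constant 1000 W load, produced lazily by a generator.
one_hour = (1000.0 for _ in range(3600 // SAMPLE_PERIOD_S))
print(energy_kwh(one_hour, chunk_size=100))
```

Only one chunk is ever materialised, so the input can be arbitrarily long. Real pipelines also have to handle gaps, overlapping chunk boundaries and per-chunk state, which is where the complexity mentioned in this talk comes from.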
NILMTK: What worked well?
● 11 dataset importers: lots of contributed code
● Standalone data conversion scripts are simple to write
● People definitely tinker with NILMTK and cite it
● Not clear how many people use it for productive work!
● 6 NILM algorithms (although this required a lot of effort)
● Lots of discussions on the issue queue
● Lazy loading & out-of-core is powerful (but complex)
● Built on existing packages (Pandas etc.)
● TESTS!
● Continuous integration (Travis CI)
NILMTK: What didn't work well?
● Out-of-core is perhaps not worth the effort: makes the code much more complex. Instead use Blaze?
● Too “monolithic”. Should interact with existing tools more naturally. Perhaps separate NILMTK into separate tools?
● Documentation needs to be extensive but up-to-date. This is hard work!
● Issue queue is hard work. Lots of “I don't know how to use Python; please hold my hand”.
● “Need to move to stand still”: keeping up to date with new versions of dependencies (Pandas etc.)
NILMTK: Conclusions
● KISS (especially if you want third-party contributions)
● Focus on high-quality documentation for users: will save you time in the long run
● Make full use of automatic package managers so users can install your code & dependencies easily
Datasets are complex
● Many meters
● Each connected to some number of appliances
● Arbitrary wiring between meters
● Different meter models
● Specify details of appliances
● Preprocessing applied to data
● etc.
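As a rough illustration of "arbitrary wiring between meters", the wiring can be thought of as a tree in which each meter records the meter directly upstream of it. The sketch below is an assumption for illustration only: the `UPSTREAM` mapping, the meter ids and the function names are hypothetical and are not taken from any dataset or from the NILM Metadata schema.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical wiring: meter id -> id of the meter directly upstream.
# None marks a whole-house meter.
UPSTREAM: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 1, 4: 3, 5: 3}


def children_of(upstream: Dict[int, Optional[int]]) -> Dict[int, List[int]]:
    """Invert the upstream mapping into parent -> sorted list of children."""
    children = defaultdict(list)
    for meter, parent in upstream.items():
        if parent is not None:
            children[parent].append(meter)
    return {parent: sorted(kids) for parent, kids in children.items()}


def downstream(meter: int, upstream: Dict[int, Optional[int]]) -> List[int]:
    """Return every meter wired, directly or indirectly, below `meter`."""
    children = children_of(upstream)
    found: List[int] = []
    stack = list(children.get(meter, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return sorted(found)


print(downstream(1, UPSTREAM))
print(downstream(3, UPSTREAM))
```

Knowing what sits downstream of a meter matters when summing appliance readings, since a sub-meter's demand is already included in the reading of every meter above it.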
NILM Metadata schema
● YAML (YAML Ain't Markup Language)
● Controlled vocab, object inheritance
● Schema has 2 components:
    ○ Schema for describing individual datasets: dataset, buildings, meters, appliances, measurements
    ○ Common metadata about appliance types: categories, priors, models of appliances, countries
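The "object inheritance" idea can be sketched as follows: an appliance type names a parent type and inherits its properties, overriding only what differs. This is a minimal sketch under stated assumptions. The `parent` key, the type names, the property names and all values below are hypothetical illustrations, not entries copied from the real schema.

```python
from typing import Any, Dict

# Hypothetical central metadata: each appliance type may name a parent type.
APPLIANCE_TYPES: Dict[str, Dict[str, Any]] = {
    "cold appliance": {"category": "cold", "on_power_threshold": 10},
    "fridge": {"parent": "cold appliance"},
    "fridge freezer": {"parent": "fridge", "on_power_threshold": 20},
}


def resolve(type_name: str, types: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge a type with its ancestors; child values override parent values.

    Raises KeyError if the name is not in the vocabulary.
    """
    spec = types[type_name]
    parent = spec.get("parent")
    merged = resolve(parent, types) if parent else {}
    merged.update({key: value for key, value in spec.items() if key != "parent"})
    return merged


print(resolve("fridge freezer", APPLIANCE_TYPES))
```

Because lookups go through the dictionary of known types, a misspelled type name fails immediately, which is the practical benefit of a controlled vocabulary.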
dataset.yaml
meter_devices.yaml
# building1.yaml
appliances:
- type: washer dryer
  instance: 1
  meters: [10, 20]
  components:
  - type: motor
    meters: [10]
  - type: electric heating element
    meters: [20]
Example of an appliance containing other 'appliances'
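A nested appliance like the washer dryer above can be walked recursively. In the sketch below the `building` dict mirrors the structure that parsing the YAML example would plausibly give; the list layout and the `flatten` helper are assumptions for illustration and are not part of NILMTK or NILM Metadata.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Assumed in-memory form of the building1.yaml example on the slide.
building: Dict[str, Any] = {
    "appliances": [
        {
            "type": "washer dryer",
            "instance": 1,
            "meters": [10, 20],
            "components": [
                {"type": "motor", "meters": [10]},
                {"type": "electric heating element", "meters": [20]},
            ],
        }
    ]
}


def flatten(appliance: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, List[int]]]:
    """Yield (depth, type, meters) for an appliance and all nested components."""
    yield depth, appliance["type"], appliance.get("meters", [])
    for component in appliance.get("components", []):
        yield from flatten(component, depth + 1)


for appliance in building["appliances"]:
    for depth, type_name, meters in flatten(appliance):
        print("  " * depth + f"{type_name}: meters {meters}")
```

The recursion handles any nesting depth, so a component could itself contain further components without changing the code.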
Example of dataset metadata
Example of central metadata
NILM Metadata schema in use
● Used by NILMTK
● At least 11 datasets use NILM Metadata
● github.com/NILMTK/NILM_Metadata
● Lots of contributions to controlled vocabulary
NILM Metadata: future work?
● Make it as easy as possible to use: simple profile?
● e.g. detailed appliance metadata not very useful?
● Reskin as RDF and OWL?!
● Defaults
● Wizard
● Validator
● Microgeneration
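To make the "Validator" item concrete, here is one possible shape such a tool could take. It is purely a sketch of the idea: the vocabulary set, the rule that a component's meters must also appear on its parent appliance, and the function name `validate` are all assumptions, not features of NILM Metadata.

```python
from typing import Any, Dict, List, Set

# Hypothetical controlled vocabulary of appliance types.
VOCAB: Set[str] = {"washer dryer", "motor", "electric heating element"}


def validate(appliance: Dict[str, Any], vocab: Set[str], path: str = "appliance") -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    type_name = appliance.get("type")
    if type_name is None:
        errors.append(f"{path}: missing 'type'")
    elif type_name not in vocab:
        errors.append(f"{path}: unknown type {type_name!r}")
    meters = appliance.get("meters", [])
    if not meters:
        errors.append(f"{path}: no meters listed")
    for index, component in enumerate(appliance.get("components", [])):
        child_path = f"{path}.components[{index}]"
        # Assumed rule: a component may only use meters of its parent.
        extra = set(component.get("meters", [])) - set(meters)
        if extra:
            errors.append(f"{child_path}: meters {sorted(extra)} not on parent")
        errors.extend(validate(component, vocab, child_path))
    return errors


bad = {
    "type": "washr dryer",  # deliberate typo
    "meters": [10],
    "components": [{"type": "motor", "meters": [11]}],
}
for problem in validate(bad, VOCAB):
    print(problem)
```

Returning a list of messages, instead of raising on the first problem, lets a dataset author fix every issue in one pass.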