Data analysis and regular decomposition Background on EU-proposal by Hannu Reittu, VTT (Self-organizing networks, BA 1144) ULTRA “ULTimate Regularity – Applications of laws on large structures”


Page 1: ULTRA - VTT Technical Research Centre of Finland · • Graph based approach • Regular decomposition + • Other graph compression techniques • Associations with pattern recognition

Data analysis and regular decomposition
Background on EU-proposal

by Hannu Reittu, VTT (Self-organizing networks, BA 1144)

ULTRA: “ULTimate Regularity – Applications of laws on large structures”

Page 2

Starting point: Szemerédi’s Regularity Lemma (SRL)

Random bipartite graph: draw each link independently with probability p

Very simple object!
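A minimal sketch of this object, assuming the graph is stored as a 0/1 biadjacency matrix (rows are nodes on one side, columns are nodes on the other). The function names and the sizes 200 and 300 are illustrative and not taken from the slides.

```python
import random

def random_bipartite_graph(n_left, n_right, p, seed=None):
    """Return a 0/1 biadjacency matrix; each entry is 1 independently with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(n_right)]
            for _ in range(n_left)]

def link_density(adj):
    """Fraction of the possible left-right links that are present."""
    possible = sum(len(row) for row in adj)
    return sum(map(sum, adj)) / possible

adj = random_bipartite_graph(200, 300, p=0.3, seed=1)
print(link_density(adj))  # close to p for a graph of this size
```

The whole graph is described by three numbers (the two side sizes and p), which is what makes it such a simple object.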

Page 3

SRL:

Any (large enough) graph can be decomposed into a bounded number of subgraphs, such that almost all pairs of subgraphs are almost like random bipartite graphs
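For reference, the usual textbook formulation behind this statement can be written as follows. The notation (ε, d, V_i, m, M, N) is the standard one and does not appear on the slides.

```latex
% Link density between disjoint node sets X and Y,
% where e(X, Y) is the number of links between them:
d(X, Y) = \frac{e(X, Y)}{|X|\,|Y|}

% A pair (A, B) is \varepsilon-regular if
\bigl| d(X, Y) - d(A, B) \bigr| \le \varepsilon
\quad \text{for all } X \subseteq A,\; Y \subseteq B
\text{ with } |X| \ge \varepsilon |A|,\; |Y| \ge \varepsilon |B|.

% Regularity lemma: for every \varepsilon > 0 and every integer m
% there exist M and N such that every graph on n \ge N nodes
% has a partition V_0, V_1, \dots, V_k of its node set with
m \le k \le M, \qquad |V_0| \le \varepsilon n, \qquad |V_1| = \dots = |V_k|,
% where all but at most \varepsilon k^2 of the pairs (V_i, V_j),
% 1 \le i < j \le k, are \varepsilon-regular.
```

"Bounded number" on the slide corresponds to M, which depends on ε and m but not on the size of the graph.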

Page 4

SRL is central for graph theory (and not only there). Is it also relevant for large matrices in data analysis?

Does it indicate a ’clustering’ that reveals the structure of a large matrix? Computable! Even at extremely large scales (from a sample)

• We have started such a program:
  • A p2p network
  • A real matrix (synthetic)
• Working algorithms: ’regular decomposition’
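The slides do not spell out the algorithm, so the following is only a sketch of one plausible reading: treat the groups as blocks with constant link density and repeatedly move each node to the group under which its observed links are most likely. The function names, the likelihood-based update and the planted test graph are all assumptions made for illustration, not the author's implementation.

```python
import numpy as np

def block_densities(adj, labels, k, eps=1e-6):
    """Link density between every pair of groups, clipped away from 0 and 1."""
    onehot = np.eye(k)[labels]                  # n x k membership matrix
    links = onehot.T @ adj @ onehot             # link counts between groups
    sizes = onehot.sum(axis=0)
    # Number of node pairs per block; on the diagonal this counts ordered
    # pairs including self-pairs, a small approximation for large groups.
    pairs = np.outer(sizes, sizes)
    return np.clip(links / np.maximum(pairs, 1.0), eps, 1 - eps)

def regular_decomposition_sketch(adj, k, n_iter=20, seed=0, init_labels=None):
    """Greedy likelihood-based grouping of the nodes of a symmetric 0/1 matrix."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    if init_labels is None:
        labels = rng.integers(0, k, size=n)     # random starting partition
    else:
        labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        dens = block_densities(adj, labels, k)
        onehot = np.eye(k)[labels]
        to_group = adj @ onehot                 # links from each node into each group
        sizes = onehot.sum(axis=0)
        # Log-likelihood of node i's links if i belonged to group a.
        loglik = (to_group @ np.log(dens).T
                  + (sizes - to_group) @ np.log(1 - dens).T)
        new_labels = loglik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                               # partition is stable
        labels = new_labels
    return labels

def planted_graph(sizes, p_in, p_out, seed=0):
    """Synthetic test graph: dense inside the planted groups, sparse between them."""
    rng = np.random.default_rng(seed)
    truth = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.where(truth[:, None] == truth[None, :], p_in, p_out)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float), truth

adj, truth = planted_graph([30, 30], p_in=0.9, p_out=0.1, seed=3)
labels = regular_decomposition_sketch(adj, k=2)
print(np.bincount(labels, minlength=2))         # group sizes found
```

A greedy update of this kind can stop in a local optimum, so in practice one would restart from several random partitions and keep the best one. Because each node is assigned using only the block densities, the same step could in principle be applied to nodes outside the sample used to estimate those densities.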


Page 5

Example: segmentation of households based on electric smart meter readings (per ½ hour)

Page 6

Columns: half hours of the week
Rows: different households
Elements: average (over several months) consumption of power

Regular decomposition of rows into 10 groups (1, 2, …, 10)
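A minimal sketch of how such a matrix could be assembled, assuming the readings arrive as (household index, half-hour index, consumption) triples with the half-hour index counted from the start of a week. The input layout and the function name are hypothetical; the slides do not describe the data format.

```python
import numpy as np

SLOTS_PER_WEEK = 7 * 48  # 336 half hours in a week

def weekly_profile_matrix(readings, n_households):
    """Average consumption per household (row) and half hour of the week (column)."""
    totals = np.zeros((n_households, SLOTS_PER_WEEK))
    counts = np.zeros((n_households, SLOTS_PER_WEEK))
    for household, half_hour, value in readings:
        slot = half_hour % SLOTS_PER_WEEK      # fold all weeks onto one week
        totals[household, slot] += value
        counts[household, slot] += 1
    # Leave NaN where a household has no reading for a slot.
    return np.divide(totals, counts,
                     out=np.full_like(totals, np.nan), where=counts > 0)

readings = [(0, 0, 1.0), (0, 336, 3.0), (1, 5, 2.0)]
matrix = weekly_profile_matrix(readings, n_households=2)
print(matrix.shape)
```

Averaging over several months, as on the slide, corresponds to folding many weeks of readings into the same 336 columns.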

[Figure: plot with curves labelled by group number; the labels 1, 2, 4, 5, 6, 7, 8, 9 and 10 appear in the extracted text]

Status:
1 An employee
2 Self employed with employees
3 Self employed with no employees
4 Unemployed actively seeking work
5 Unemployed not actively seeking work
6 Retired
7 Carer: Looking after relative family

[Figure: P_status plotted against status (1 to 7); vertical axis from 0.0 to 0.7]

Meaningful segmentation:
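One way to check whether a segmentation is meaningful in this sense is to look at how the employment status is distributed inside a single group. The sketch below assumes that the plotted quantity is the share of each status among the households of one group; that reading, the function name and the toy data are assumptions, not taken from the slides.

```python
from collections import Counter

def status_distribution(statuses, groups, group):
    """Share of each status code among the households assigned to `group`."""
    in_group = [s for s, g in zip(statuses, groups) if g == group]
    if not in_group:
        return {}
    counts = Counter(in_group)
    return {status: counts[status] / len(in_group) for status in sorted(counts)}

# Toy data: one status code and one group label per household.
statuses = [1, 1, 6, 6, 6, 4]
groups = [1, 1, 2, 2, 2, 1]
print(status_distribution(statuses, groups, group=2))
```

A group whose distribution is concentrated on a few status codes, compared with the distribution over all households, suggests that the grouping found from consumption data alone reflects a real difference between households.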

Page 7

ULTRA?

Budget of the call: 36 M Eur
ULTRA: 3-4 M Eur / 3 years

Page 8

Big Data analytics

• Graph-based approach
• Regular decomposition +
• Other graph compression techniques
• Associations with pattern recognition and machine learning
• Companies play a central role: problems, business-grade implementations of algorithms, end users of methodology
• Work with real data creating real applications and business opportunities

Some research tasks:
– Segmentation based on data
– Quantitative division between bulk data and borderline cases
– Temporal aspects of data: detecting and predicting the changes
– Simple models from data -> planning and generating possible future scenarios
– Implementing algorithms on parallel computation platforms