algorithms data mining and parallelism a l d · hugo silva, instituto de telecomunicações...

29
UPV - EHU Algorithms Data Mining and Parallelism 1 Algorithms Data Mining and Parallelism A L D A P A www.sc.ehu.es/aldapa Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department University of the Basque Country (UPV/EHU) Javier Muguerza

Upload: others

Post on 29-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

1

AlgorithmsData Mining

and Parallelism A L D A P A

www.sc.ehu.es/aldapa

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Javier Muguerza

Page 2: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

ALDAPA - Algorithms, Data Mining and Parallelism

12 researchers 6 lecturers (PhD)

Olatz Arbelaitz Agustin Arruabarrena Ibai Gurrutxaga José Ignacio Martín Javier Muguerza Jesús M. Pérez

1 researcher (PhD) Jose María Martínez

4 PhD students Joseba Alberdi Igor Ibarguren Aizea Lojo Iñigo Perona

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 3: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Arises in 2001 from the High Performance Solutions team (1990)

Research activity:

machine learning, data mining supervised and unsupervised learning, prediction,

optimisation

solving real problems

efficient solutions prallelism and high performance computing

ALDAPA - Algorithms, Data Mining and Parallelism

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 4: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Latest colaborations S21sec (computer security) Ikerlan (object movement modelling) Tecnalia (prediction – robotics) IK4 – Vicomtech – Bidasoa Turismo (web user

modelling - tourism)

Nano-bio Spectroscopy Group (HPC, GP-GPU)

LIPCNE - egokituz (user modelling – web mining and social networks, BCI)[integration process]

ALDAPA - Algorithms, Data Mining and Parallelism

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 5: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Current research projects

UPV/EHU – Research group All research lines

Basque Government - Saiotek Web mining, Social web mining, Adaptive

systems

Spanish government Web mining for users with special needs

ALDAPA - Algorithms, Data Mining and Parallelism

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 6: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Previous lines

>> NEUFODI: ANN for detection and diagnosis >> OCR – FORM-LESS: automatic character recognition >> Optimization: VRPTW, PET >> MALBEC: malware behaviour modelling >> CAMIKER: object movement modelling

Current lines

>> CTC – HARITZA: comprehensibility, class imbalance >> MODELACCESS: web user modelling, social web mining >> Physiological CS: BCI, ECG

----------------------------------------------------------- >> HPC – Parallelism: material physics, gp-gpu

ALDAPA - Algorithms, Data Mining and Parallelism

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 7: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Techniques: Artificial Neural Networks (MLP) + Parallelism

NEUFODI: ANN for detection and diagnosis

Iberdrola – electric power transportation

flood of alarms

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 8: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Typewritten text (250K) High variability and poor

quality error 2% → 3 months High accuraccy

Techniques: Supervised Classification (k-NN,…) Hierarchical systems Unsupervised learning Parallelism

OCR – FORM-LESS: automatic character recognition

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 9: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

DonostiaBilbao

Vitoria Pamplona

Delimited time Lots of constraints Route Minimizing costs

Techniques: Simulated Annealing Genetic Algorithms Greedy – Search

Optimization: Vehicle Routing Problem with TW

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 10: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

MCTOPTW Real time Adaptive system Maximizing User

experience

Techniques: Iterative local search Time dependent algorithms

Optimization: Personalized Electronic Tourist guide

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 11: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Computer security Intrusion detection systems (Unsupervised anomaly

detection) Malware Classification (Clustering) --> MALBEC

MALBEC: malware behaviour modelling

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 12: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

12

MALBEC platform

MALBEC: malware behaviour modelling

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 13: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

CAMIKER: object movement modelling

Techniques: Clustering Time-series

Movement modelling

Anomaly detection

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 14: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

CAMIKER: object movement modelling

Techniques: Clustering Time-series

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 15: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Previous lines

>> NEUFODI: ANN for detection and diagnosis >> OCR – FORM-LESS: automatic character recognition >> Optimization: VRPTW, PET >> MALBEC: malware behaviour modelling >> CAMIKER: object movement modelling

Current lines

>> CTC – HARITZA: comprehensibility, class imbalance >> MODELACCESS: web user modelling, social web mining >> Physiological CS: BCI, ECG

----------------------------------------------------------- >> HPC – Parallelism: material physics, gp-gpu

ALDAPA - Algorithms, Data Mining and Parallelism

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 16: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Need of machine learning techniques that Produce a comprehensible models Able to deal with class imbalance problems

Supervised techniques: Classification trees (DT) Rule induction, ... Consolidated trees: CT (own proposal)

Benefit of combining

multiple subsamples but

without loss of explaining

capacity (single tree)

CT

CTC – HARITZA: comprehensibility, class imbalance

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 17: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

17

Haritza Platform – GUI

CTC – HARITZA: comprehensibility, class imbalance

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 18: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

18

Haritza Platform – Scripts

CTC – HARITZA: comprehensibility, class imbalance

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 19: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

19

Objective in any context: To adapt web pages to the need of the users

Adaptation becomes especially critical when the users have special needs

Contexts: Tourism website – Bidasoa Turismo Discapnet - ONCE

MODELACCESS: web user modelling

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 20: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

20

Web mining: the application of data mining techniques to the Web data Usage mining Content mining Structure mining

Phases: Data acquisition and preprocessing. Pattern discovery and analysis: Machine

Learning. Exploitation.

MODELACCESS: web user modelling

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 21: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

21

.

bidasoa turismo

MODELACCESS: web user modelling

Personalization of web navigation

Knowledge about interest of users

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 22: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

22

.MODELACCESS: web user modelling

Techniques: Clustering Frequent

Pattern Mining Topic

Modelling

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 23: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

23

.

discapnet

MODELACCESS: web user modelling

Problem detection (user, web)

Web personalization

Techniques: Supervised and unsupervised algorithms

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 24: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

24

.

guremintza – GUREAK

social network adapted for users with special needs

expansion stage

social network mining

MODELACCESS: social web mining

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 25: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

25

.MODELACCESS: social web mining

guremintza – GUREAK

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 26: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Brain Computer Interface From electroencephalography signal (EEG) to

command devices

Physiological Computing Systems: BCI, ECG

From https://www.etsu.edu/cas/bcilab/

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 27: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Hugo Silva, Instituto de Telecomunicações

Physiological Computing Systems: ECG

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 28: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

28

High Performance Computing (HPC)Collaboration with Nano-Bio Spectroscopy group (physicist)

Reach petaflop computing with a scientific code analysis of performance and scalabilitySimulate photosynthesis of the light in

chlorophyll (OCTOPUS code: http://www.tddft.org/programs/octopus/)

MachinesVargas, Mare Nostrum, Juguene…

Multi-level parallelism (MPI,

OpenMP, GP-GPU)

Scientific libraries: BLAS, LAPACK, FFT…

HPC – Parallelism: material physics, gp-gpu

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)

Page 29: Algorithms Data Mining and Parallelism A L D · Hugo Silva, Instituto de Telecomunicações Physiological Computing Systems: ECG Laboratory of Human-Computer Interaction for Special

UPV - EHU

AlgorithmsData Mining

and Parallelism

Eskerrik askoMuchas gracias

Thanks for your attention

http://www.sc.ehu.es/aldapa

ALDAPA - Algorithms, Data Mining and Parallelism

Laboratory of Human-Computer Interaction for Special Needs- ALDAPA Computer and Architecture and Technology Department

University of the Basque Country (UPV/EHU)