

Data Science Initiatives at (ICMC) USP

André C. P. L. F. de Carvalho

Universidade de São Paulo



Topics

Introduction

São Carlos

Data Science ICMC-USP

Responsible Data Science

Previous partnerships with UGPN

Conclusion


São Carlos

235 km from São Paulo

220,000 inhabitants

Brazilian Capital of Technology

Highest PhD per capita in Latin America

Two public universities and two Embrapa research centres

Startup hub


Average temp.: Summer: 24 °C

Winter: 16 °C


ICMC

Institute of Mathematics and Computer Sciences, USP São Carlos campus, São Paulo, Brazil

Four departments:

Computer Sciences

Mathematics

Applied Mathematics

Statistics


Data Science Network

EMAp-FGV

ICMC-USP

Center for Data Science, NYU

University of London

INRIA, France

Big Data Research Center, Chinese University of Hong Kong


Data Science and Engineering Consortium

George Mason University, USA

Oregon State University, USA

Universidad Carlos III, Madrid, Spain

Universidad de Santiago de Compostela

Universidad Nacional Autónoma de México, México

INFOTEC-CONACYT, México

Universidad Católica San Pablo, Arequipa, Peru

Universidade de São Paulo, São Carlos, Brazil

Universidade do Porto, Portugal


Data Science at ICMC-USP

Undergraduate level:

Minor in Data Science for 6 majors

Applied Mathematics

Computer Science

Computer Engineering

Information Science

Statistics

Pure Mathematics

Data Science BSc in final approval stage


Data Science at ICMC-USP

Graduate level

Professional MSc in Data Science

MBA in Data Science

Business Intelligence and Analytics (with PBS, University of Porto)

Several PhD researchers in Data Science

In the last 6 years, 3 of the best PhD theses in Brazil (Ministry of Education) were in Data Science

All from ICMC-USP


Data Science at ICMC-USP

Research

Researchers from Applied Mathematics, Computer Science and Statistics

USP NAP Research Center in Machine Learning

Center on Mathematics, Statistics and Computer Sciences for Industry (CeMEAI)

1 of 17 Excellence Centers funded by FAPESP

Data Science is 1 of its 3 main areas

Members: 32 PIs, 73 Associated Researchers, 32 Postdocs and 226 PhDs


Books on Data Science


Responsible Data Science


CeMEAI

CEPID CeMEAI

Center for Mathematical Sciences Applied to Industry

Knowledge transfer to industries

CTA, UFSCar, UNICAMP, UNIFESP, USP

11-year project, started in 2013

Budget of US$ 15 million

Projects, Workshops with Companies, Professional MSc, Hackathons


Collaboration with companies


Agribusiness

Finance

Health

Education

Industry

Environment

Energy

Technology


Government of the State of São Paulo

Secretaria da Fazenda (State Department of Finance)


CeMEAI Researchers

Map labels: S. J. Rio Preto, Ribeirão Preto, São Carlos, Bauru, Rosana, Botucatu, Itapeva, Campinas, S. J. dos Campos, São Paulo, Buri, Pres. Prudente


HPC - Cluster SGI-ICE-X


• 104 blades
  • 20 cores - Intel Xeon E5-2680v2
  • 128 GB RAM

• 1 blade Xeon Phi
  • 20 + 60 cores
  • 128 GB RAM

• Storage 175 TB

• 40 blades
  • 28 cores - Intel Xeon E5-2680v4
  • 128 GB RAM

• 6 GPU nodes
  • GPU Nvidia P100

• 4 fat nodes
  • 16 cores E5-2680v2
  • 512 GB RAM

• Data server 0.5 PB
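As a rough tally of the figures above, the sketch below sums cores and RAM across the three CPU partitions. It assumes the core and RAM numbers on the slide are per node, which the slide does not state. The Xeon Phi blade and the GPU nodes are left out because their figures are not directly comparable or not given.

```python
# Rough capacity tally for the CPU partitions listed on the slide.
# Assumption: core and RAM figures are per node (not stated on the slide).
partitions = [
    # (name, nodes, cores_per_node, ram_gb_per_node)
    ("Xeon E5-2680v2 blades", 104, 20, 128),
    ("Xeon E5-2680v4 blades", 40, 28, 128),
    ("fat nodes (E5-2680v2)", 4, 16, 512),
]


def totals(parts):
    """Return (total cores, total RAM in GB) over all partitions."""
    cores = sum(nodes * c for _name, nodes, c, _ram in parts)
    ram_gb = sum(nodes * ram for _name, nodes, _c, ram in parts)
    return cores, ram_gb


total_cores, total_ram_gb = totals(partitions)
print(f"{total_cores} cores, {total_ram_gb} GB RAM")
```

Under the per-node assumption this comes to 3264 cores and 20480 GB of RAM for these three partitions.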


Data Science areas at ICMC-USP


Complex systems

Data stream mining

AutoML

Machine learning

Data mining

Robotics

Bayesian Inference

Functional data Modelling

Statistical Quality Control

Classification and Categorical Data Analysis

Survival Analysis

Time-varying big data visualization

Time series Data Science

Item Response Theory

Regression Models


AutoML tools


CreateML Apple

Amazon Rekognition


Pajé AutoML

End-to-end AutoML

Main focus

Data pre-processing

Explainable ML

Post-processing

Pipeline

Easily expandable
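To make the idea of an end-to-end pipeline search concrete, here is a minimal generic sketch. It is not the Pajé API: every function name, the toy data and the holdout evaluation are assumptions chosen for illustration. It tries each combination of a pre-processing step and a learner, scores it on held-out data, and keeps the best one. New steps can be added by extending the two dictionaries.

```python
# Generic illustration of pipeline search; NOT the Pajé API (all names hypothetical).
import random
from itertools import product
from statistics import mean, pstdev


def identity(train, test):
    """Pre-processing step that leaves the data unchanged."""
    return train, test


def standardize(train, test):
    """Scale each feature using statistics from the training rows only."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

    return scale(train), scale(test)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(X, y):
    """Learner: predict the class whose mean vector is closest."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    cents = {lab: [mean(c) for c in zip(*rows)] for lab, rows in groups.items()}
    return lambda row: min(cents, key=lambda lab: sq_dist(row, cents[lab]))


def one_nn(X, y):
    """Learner: predict the label of the single nearest training row."""
    return lambda row: y[min(range(len(X)), key=lambda j: sq_dist(row, X[j]))]


def search(X, y, preprocessors, learners, seed=0):
    """Score every (pre-processor, learner) pair on a 70/30 holdout split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    tr, te = idx[:cut], idx[cut:]
    results = []
    for (pname, prep), (lname, learn) in product(preprocessors.items(), learners.items()):
        Xtr, Xte = prep([X[i] for i in tr], [X[i] for i in te])
        model = learn(Xtr, [y[i] for i in tr])
        acc = mean(1.0 if model(r) == y[i] else 0.0 for r, i in zip(Xte, te))
        results.append((acc, pname, lname))
    return max(results), results


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1000)] for _ in range(30)]
X += [[rng.gauss(5, 1), rng.gauss(0, 1000)] for _ in range(30)]
y = [0] * 30 + [1] * 30

best, results = search(
    X, y,
    {"identity": identity, "standardize": standardize},
    {"nearest_centroid": nearest_centroid, "one_nn": one_nn},
)
print(best)
```

A real AutoML system would use cross-validation and a smarter search strategy than exhaustive enumeration; the holdout split and the two tiny learners here only keep the sketch short.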


Pajé Time Arrow


Ricardo Sovat. Uma Abordagem Híbrida Baseada em Casos e Redes Neurais. Uma aplicação: escolha e configuração de modelos de redes neurais. [A hybrid approach based on cases and neural networks. An application: choosing and configuring neural network models.]

Claudia Regina Milaré. Extração de Conhecimento de Redes Neurais Artificiais. [Knowledge extraction from artificial neural networks.]

Bruno F. de Souza. Meta-aprendizagem aplicada à classificação de dados de expressão gênica. [Meta-learning applied to the classification of gene expression data.]

Rafael G. Mantovani. Use of meta-learning for hyperparameter tuning of classification problems.

Luis Paulo Garcia. Noise detection in classification problems.

Davi P. Santos. Seleção e controle do viés de aprendizado ativo. [Selection and control of active learning bias.]

André L. D. Rossi. Meta-aprendizado aplicado a fluxos contínuos de dados. [Meta-learning applied to data streams.]

Rodrigo C. Barros. Automatic design of decision tree induction algorithms.

Mariá C. V. Nascimento. Meta-heurísticas para o problema de agrupamento de dados em grafo. [Metaheuristics for the graph data clustering problem.]

Estéfane G. Lacerda. Model Selection of RBF Networks via Genetic Algorithms.


Pajé Team

Faculty: André C P L F de Carvalho, USP; André Rossi, UNESP; Bruno Campos Pimentel, UFAL; Bruno Feres de Souza, UFMA; Jefferson Oliva, UTFPR; Jorge Kanda, UFAM; Luis Paulo Faina Garcia, UNB; Rafael Mantovani, UTFPR

Collaborator: Carlos Soares, UP

Postdoctoral researchers: Kelly da Silva, Intel; Tiago Botari, FAPESP

Senior technician: Davi Pereira dos Santos, FAPESP

MSc students: Eric Rocha, CNPq; Tamires Brito, CNPq

PhD students: Adriano Rivoli da Silva, UTFPR; Douglas Castilho, IFPC; Edésio Alcobaça, FAPESP; Gean Trindade, CNPq; Jonas Kasmanas, FAPESP; Moisés Rocha, FAPESP; Saulo Mastelini, FAPESP; Tiago Cunha, FCT; Victor Barella, FAPESP; Victor Padilha, FAPESP

Undergraduate research: Felipe Siqueira, CNPq; Samuel Tomaz Bastos, CNPq; Matheus Sanchez, PRP-USP; Luan Icaro Pinto Arcanjo, PRP-USP; Rodrigo Martins Pires, PRP-USP; Thiago Musico, PRP-USP


Responsible Data Science

Accountability

Reproducibility

Privacy

Transparency

Explainable AI (XAI)

Fairness

Fair Information Practices


Questions?
