handling large amounts of data - mejeriteknisk selskab

30
Handling large amounts of data Applications in applied fluorescence spectroscopy Rasmus Bro Dept. Food Science University of Copenhagen

Upload: others

Post on 19-Nov-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Handling large amounts of data - Mejeriteknisk Selskab

Handling large amounts of data

Applications in applied

fluorescencespectroscopy

Rasmus Bro

Dept. Food ScienceUniversity of Copenhagen

Page 2: Handling large amounts of data - Mejeriteknisk Selskab

What we work with

Dioxin,Environment,Dose-response

Food quality, gastronomyRaw material influence,Production optimization, end point detection

Metabolomics,ProteomicsSystems biology,Cancer,Diabetes,Pharma…

Page 3: Handling large amounts of data - Mejeriteknisk Selskab

Data

FluorescenceHigh resolution NMRMass spectrometry Near-infraredRaman spectroscopyUltrasoundHyperspectral imagingChromatography…

Page 4: Handling large amounts of data - Mejeriteknisk Selskab

We often hide behind statistics

“Birth weight tended to be correlated with D vitamin (r = -0.1, p = 0.06)”

Page 5: Handling large amounts of data - Mejeriteknisk Selskab

Birth Weight

Pla

sm

a D

vit

am

in [

nm

olL

-1]

We often hide behind statistics

“Birth weight tended to be correlated with D vitamin (r = -0.1, p = 0.06)”

Page 6: Handling large amounts of data - Mejeriteknisk Selskab

Need new tools that

Enable the scientist to be critical

Connects the competences(technician, operator, biologist, engineer, doctor, …)

Page 7: Handling large amounts of data - Mejeriteknisk Selskab

Data from advanced sensors can not be analyzed with traditional data analysis

Sensors – multivariate

Page 8: Handling large amounts of data - Mejeriteknisk Selskab

Human pattern recognitionuses all available data

Page 9: Handling large amounts of data - Mejeriteknisk Selskab

0

20

40

60

80

Buttock-Knee Length - 300mmN

ort

h A

me

ric

a

La

tin

Am

eri

ca

(In

dia

n)

La

tin

Am

eri

ca

(E

uro

pe

an

No

rth

ern

Eu

rop

e

Ce

ntr

al

Eu

rop

e

Ea

ste

rn E

uro

pe

So

uth

-Ea

ste

rn E

uro

pe

Fra

nc

e

Ibe

ria

n P

en

ins

ula

No

rth

Afr

ica

We

st

Afr

ica

So

uth

-ea

ste

rn A

fric

a

Ne

ar

Ea

st

No

rth

In

dia

So

uth

In

dia

No

rth

As

ia

So

uth

Ch

ina

So

uth

-ea

st

As

ia

Au

str

ali

a

Ko

rea

/Ja

pa

n

Using a single variable is:Wrong, incorrect, suboptimal, oldfashioned, ..!

CaucasianAfricanAsian

Korea overlaps with Caucasian

Page 10: Handling large amounts of data - Mejeriteknisk Selskab

0

50

100

150Shoulder Breadth (biacromial) - 300mm

0

20

40

60

80

Buttock-Knee Length - 300mm

Caucas.AfricanAsian

No

rth

Am

eri

ca

La

tin

Am

eri

ca

(In

dia

n)

La

tin

Am

eri

ca

(E

uro

pe

an

No

rth

ern

Eu

rop

e

Ce

ntr

al

Eu

rop

e

Ea

ste

rn E

uro

pe

So

uth

-Ea

ste

rn E

uro

pe

Fra

nc

e

Ibe

ria

n P

en

ins

ula

No

rth

Afr

ica

We

st

Afr

ica

So

uth

-ea

ste

rn A

fric

a

Ne

ar

Ea

st

No

rth

In

dia

So

uth

In

dia

No

rth

As

ia

So

uth

Ch

ina

So

uth

-ea

st

As

ia

Au

str

ali

a

Ko

rea

/Ja

pa

n

Using a single variable is:Wrong, incorrect, suboptimal, oldfashioned, ..!

Page 11: Handling large amounts of data - Mejeriteknisk Selskab

380 390 400 410 420 430 440 450 460

330

340

350

360

370

380

, North Africa

, West Africa

, South-east Africa

North America

Latin America (European)

Northern Europe

Central Europe

Eastern Europe

South-Eastern Europe

France

Iberian Peninsula

, Australia

Latin America (Indian)

Near East

, North India

,South India

North Asia

, South China

South-east Asia

Korea/Japan

Shoulder Breadth

Bu

tto

ck-K

nee

Len

gth

Simply plot the two versus each other

Co-variation = new information that is notavailable in the individual variables

Lee S.,Bro R., Regional Differences in World Human Body Dimensions: The Multi-Way Analysis Approach, Theoretical Issues in Ergonomics Science, 2007

Page 12: Handling large amounts of data - Mejeriteknisk Selskab

Data analysis

7 variables

49 o

bje

cts

Austral

Austria

Barbado

Belgium

Brit Gu

Bulgari

Canada

Chile

Costa R

Cyprus

Czechos

Denmark

El Salv

Finland

France

. . .

West Ge

Yugosla

0.0195 0.8600 0.0010 0.0210 0.0985 0.8560 1.3160

0.0375 0.6950 0.0840 1.7200 0.0985 0.5460 0.6700

0.0604 3.0000 0.5480 7.1210 0.0911 0.0240 0.2000

0.0354 0.8190 0.3010 5.2570 0.0967 0.5360 1.1960

0.0671 3.9000 0.0030 0.1920 0.0740 0.0270 0.2350

0.0451 0.7400 0.0720 1.3800 0.0850 0.4560 0.3650

0.0273 0.9000 0.0020 0.2570 0.0975 0.6450 1.9470

0.1279 1.7000 0.0110 1.1640 0.0801 0.2570 0.3790

0.0789 2.6000 0.0240 0.9480 0.0794 0.3260 0.3570

0.0299 1.4000 0.0620 1.0420 0.0605 0.0780 0.4670

0.0310 0.6200 0.1080 1.8210 0.0975 0.3980 0.6800

0.0237 0.8300 0.1070 1.4340 0.0985 0.5700 1.0570

0.0763 5.4000 0.1270 1.4970 0.0394 0.0890 0.2190

0.0210 1.6000 0.0130 1.5120 0.0985 0.5290 0.7940

0.0274 1.0140 0.0830 1.2880 0.0964 0.6670 0.9430

. . . . . . . . . . . . . . . . . . . . .

0.0338 0.7980 0.2170 3.6310 0.0985 0.5280 0.9270

0.1000 1.6370 0.0730 1.2150 0.0770 0.5240 0.2650

Gunst & Mason (1980) Regression analysis and its applications: A data-oriented approach, NY, Marcel Dekker, p. 358

Page 13: Handling large amounts of data - Mejeriteknisk Selskab

7 variables

49 o

bje

cts

Austral

Austria

Barbado

Belgium

Brit Gu

Bulgari

Canada

Chile

Costa R

Cyprus

Czechos

Denmark

El Salv

Finland

France

. . .

West Ge

Yugosla

0.0195 0.8600 0.0010 0.0210 0.0985 0.8560 1.3160

0.0375 0.6950 0.0840 1.7200 0.0985 0.5460 0.6700

0.0604 3.0000 0.5480 7.1210 0.0911 0.0240 0.2000

0.0354 0.8190 0.3010 5.2570 0.0967 0.5360 1.1960

0.0671 3.9000 0.0030 0.1920 0.0740 0.0270 0.2350

0.0451 0.7400 0.0720 1.3800 0.0850 0.4560 0.3650

0.0273 0.9000 0.0020 0.2570 0.0975 0.6450 1.9470

0.1279 1.7000 0.0110 1.1640 0.0801 0.2570 0.3790

0.0789 2.6000 0.0240 0.9480 0.0794 0.3260 0.3570

0.0299 1.4000 0.0620 1.0420 0.0605 0.0780 0.4670

0.0310 0.6200 0.1080 1.8210 0.0975 0.3980 0.6800

0.0237 0.8300 0.1070 1.4340 0.0985 0.5700 1.0570

0.0763 5.4000 0.1270 1.4970 0.0394 0.0890 0.2190

0.0210 1.6000 0.0130 1.5120 0.0985 0.5290 0.7940

0.0274 1.0140 0.0830 1.2880 0.0964 0.6670 0.9430

. . . . . . . . . . . . . . . . . . . . .

0.0338 0.7980 0.2170 3.6310 0.0985 0.5280 0.9270

0.1000 1.6370 0.0730 1.2150 0.0770 0.5240 0.2650

Incredibly simple questions

What European country is most

similar to Japan?

What country is most bizarre?

Data analysis

Page 14: Handling large amounts of data - Mejeriteknisk Selskab

14

Principal component analysis

Page 15: Handling large amounts of data - Mejeriteknisk Selskab

7 variables

49 o

bje

cts

Austral

Austria

Barbado

Belgium

Brit Gu

Bulgari

Canada

Chile

Costa R

Cyprus

Czechos

Denmark

El Salv

Finland

France

. . .

West Ge

Yugosla

0.0195 0.8600 0.0010 0.0210 0.0985 0.8560 1.3160

0.0375 0.6950 0.0840 1.7200 0.0985 0.5460 0.6700

0.0604 3.0000 0.5480 7.1210 0.0911 0.0240 0.2000

0.0354 0.8190 0.3010 5.2570 0.0967 0.5360 1.1960

0.0671 3.9000 0.0030 0.1920 0.0740 0.0270 0.2350

0.0451 0.7400 0.0720 1.3800 0.0850 0.4560 0.3650

0.0273 0.9000 0.0020 0.2570 0.0975 0.6450 1.9470

0.1279 1.7000 0.0110 1.1640 0.0801 0.2570 0.3790

0.0789 2.6000 0.0240 0.9480 0.0794 0.3260 0.3570

0.0299 1.4000 0.0620 1.0420 0.0605 0.0780 0.4670

0.0310 0.6200 0.1080 1.8210 0.0975 0.3980 0.6800

0.0237 0.8300 0.1070 1.4340 0.0985 0.5700 1.0570

0.0763 5.4000 0.1270 1.4970 0.0394 0.0890 0.2190

0.0210 1.6000 0.0130 1.5120 0.0985 0.5290 0.7940

0.0274 1.0140 0.0830 1.2880 0.0964 0.6670 0.9430

. . . . . . . . . . . . . . . . . . . . .

0.0338 0.7980 0.2170 3.6310 0.0985 0.5280 0.9270

0.1000 1.6370 0.0730 1.2150 0.0770 0.5240 0.2650

Gunst & Mason (1980) Regression analysis and its applications: A data-oriented approach, NY, Marcel Dekker, p. 358

Page 16: Handling large amounts of data - Mejeriteknisk Selskab

0

0Austral

AustriaBarbadoBelgium

Brit Gu

BulgariCanada

Chile

Costa R

Cyprus Czechos

Denmark

El Salv

FinlandFrance

Guatema

Hong Ko

HungaryIceland

India

IrelandItaly

Jamaica

Japan Luxembo

Malaya

Malta

MauritiMexico

Netherl

New Zea

Nicarag

Norway

Panama Poland

Portuga

Puerto

Romania

Singapo

Spain

Sweden SwitzerTaiwan

Trinida

Un King

USA

USSR

West Ge

Yugosla

T1 47%

T2

2

7%

Principal Component Analysis

Outliers are easily spotted

Page 17: Handling large amounts of data - Mejeriteknisk Selskab

0

0

Austral

Austria

Barbado

Belgium

Brit Gu

Bulgari

Canada Chile Costa R

Cyprus

CzechosDenmark

El Salv Finland

France

Guatema

Hungary

IcelandIndia

Ireland

Italy Jamaica

Japan

Luxembo

Malaya

Mauriti

Mexico

Netherl

New ZeaNicarag

Norway

Panama

Poland Portuga

Puerto

RomaniaSpain Sweden

Switzer

Taiwan

Trinida Un King

USA USSR

West Ge

Yugosla

T1 46%

T2

2

7%

Principal Component Analysis

The exploratory aspect

Not two (Rich/poor), but three basic phenomena: Wealth, island, health

Page 18: Handling large amounts of data - Mejeriteknisk Selskab

Principal Component Analysis

Using additional information

-6 -4 -2 0 2 4 6-2

-1

0

1

2

3

4

5

6

Score 1

Sc

ore

2

Score plot - Singapore and Hong Kong removed

Eastern Europe

Rich

Poor

Page 19: Handling large amounts of data - Mejeriteknisk Selskab

• handle many variables • detect outliers• find new patterns• do true fingerprinting• generate new hypotheses

Page 20: Handling large amounts of data - Mejeriteknisk Selskab

1 2 3 4

60

70

80

Time (days)

Pre

dic

ted

am

mo

nia

co

nc.

(%)

At-line QA

On-line NIR

On-line NIR versus at-line QA measurements

Page 21: Handling large amounts of data - Mejeriteknisk Selskab

0 4 8 12

48

56

64

Time (days)

Pre

dic

ted

am

mo

nia

co

nc.

(%)

Automated

control

by on-line NIR

feedback

(step change tracking)

On-line NIR versus at-line QA measurements

Zachariassen et al, Use of NIR spectroscopy and

chemometrics for on-line process monitoring of

ammonia in Low Methoxylated Amidated pectin

production Chemometrics and Intelligent Laboratory

Systems 76(2005)149-161

Page 22: Handling large amounts of data - Mejeriteknisk Selskab

Understanding oxidation of butter and cheese

• Oxidation from light causes rancid taste of butter• Important for packaging and shelf-storage• Riboflavin thought to be important

Page 23: Handling large amounts of data - Mejeriteknisk Selskab

Butter at:Different lightWith/without O2

Different storage time

• Sensory analysis (quality)• Fluorescence EEM

Page 24: Handling large amounts of data - Mejeriteknisk Selskab

riboflavin

Hematoporph.

Chlorophyll a

Protoporf.

Chlorophyll b

riboflavin

Hematoporph.

Chlorophyll a

Protoporf.

Chlorophyll b

Typical data

Page 25: Handling large amounts of data - Mejeriteknisk Selskab

0 5 0 1 0 0 1 5 0 2 0 0 2 5 00

0 .0 5

0 . 1

0 .1 5

0 . 2

0 .2 5

600 620 635 650 665 680 695 705

nm

0 5 1 0 1 5 2 0 2 5 3 0 3 50

0 . 0 5

0 . 1

0 . 1 5

0 . 2

0 . 2 5

0 . 3

0 . 3 5

0 . 4

0 . 4 5

350 365 380 395 410 425 440 455 nm

Excitation Emission

Riboflavin

HematoPProtoP

Cloro B

Cloro A

Chlorin X

Understanding the chemistry

Page 26: Handling large amounts of data - Mejeriteknisk Selskab

Relation between rancid taste and concentrations

Rancid taste

Page 27: Handling large amounts of data - Mejeriteknisk Selskab

Relation between rancid taste and concentrations

Rancid taste

S S

S

S S

S

Protoporphyrin

Unknown

Riboflavin

Page 28: Handling large amounts of data - Mejeriteknisk Selskab

Oxidation:

Not just riboflavin

‘New’ ones seem more important

Affects optimal packaging

0

0,2

0,4

0,6

0,8

1

1,2

1,4

1,6

1,8

400 500 600 700

Ab

s.

Riboflavin

Chlorophyl B

UV

S S

S

S S

S

Protoporphyrin

Unknown

Riboflavin

Page 29: Handling large amounts of data - Mejeriteknisk Selskab

Examples

• Predicting consumer liking• Detecting adulteration• Understanding protein structure• Optimizing flavor of cheese• Quality control of raw material

Page 30: Handling large amounts of data - Mejeriteknisk Selskab

www.models.life.ku.dkM-files Papers CoursesData sets And many other things