27.03.2017

Lecture 5

Dr. Daniel Zünd

Danielle Griego

Artem Chirkin

Creative Data Mining

uncover and evaluate

Lecture 5 | 27.03.2017


What we’ll cover today

– Time series data

– Time series data visualisation

– Time series analysis

– Python example


Time Series

Data

Time series data is a collection of data points indexed or labeled in time order.

Example data:

– The first column is the timestamp in seconds, counted from January 1, 1970 00:00:00 UTC.

– The second column is the heart rate of the participant at that time.

Example (heart rate):

1460457325,  69.97
1460457337,  73.13
1460457345,  79.52
1460457349,  82.82
1460457353,  86.35
1460457357,  89.87
1460457361,  93.15
1460457369,  99.72
1460457373, 103.12
1460457377, 106.45
1460457381, 109.83
1460457385, 113.18
1460457389, 116.38
1460457393, 118.53


Time Series

Visualisation

Time series data is normally plotted using line graphs.

A convenient way to do this in Python is matplotlib. It provides many different ways to plot any kind of data and many different types of graphs.

Link: matplotlib.org


Time Series

Visualisation

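import matplotlib.pyplot as plt

# First five samples of the heart-rate example
timestamp = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]

# Line graph: time on the x axis, heart rate on the y axis
plt.plot(timestamp, hr)
plt.show()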

Link: matplotlib.org
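By default, plt.plot joins consecutive samples with straight segments, so the height of the line between two measurements is a linear interpolation and not a measured value. The following sketch computes that height by hand. The helper name value_on_line is hypothetical and not part of matplotlib.

```python
from bisect import bisect_right

timestamp = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]

def value_on_line(t, xs, ys):
    """Return the height of the piecewise-linear line through (xs, ys) at time t."""
    if not xs[0] <= t <= xs[-1]:
        raise ValueError("t lies outside the measured range")
    # Index of the segment [xs[i], xs[i + 1]] that contains t.
    i = min(bisect_right(xs, t) - 1, len(xs) - 2)
    fraction = (t - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + fraction * (ys[i + 1] - ys[i])

# Halfway between the second and the third sample.
print(value_on_line(1460457341, timestamp, hr))
```

Passing marker='o' to plt.plot, as done in the later slides, makes the actual measurements visible on top of the connecting line.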


Time Series

Curve Fitting

The goal is to find a function that fits the data best.

This can be used to:

– Predict and forecast.

– Describe the behavior of the measured value over time.

– …

Links: matplotlib.org & scikit-learn.org
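Both uses can be illustrated with a straight line fitted to the first five heart-rate samples. This is a minimal sketch in plain Python, assuming a least-squares fit. The helper fit_line is hypothetical, and a straight line is only one possible choice of function.

```python
# First five samples of the heart-rate example (timestamp in seconds, bpm).
times = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# Work with seconds elapsed since the first sample to keep the numbers small.
elapsed = [t - times[0] for t in times]
intercept, slope = fit_line(elapsed, hr)

# Describe: average change in heart rate per second over the fitted window.
print(f"trend: {slope:.2f} bpm per second")

# Forecast: the next sample in the example data is (1460457357, 89.87).
next_elapsed = 1460457357 - times[0]
forecast = intercept + slope * next_elapsed
print(f"forecast: {forecast:.2f} bpm, measured: 89.87 bpm")
```

Comparing the forecast with the measured value gives a first impression of how far such a simple model can be trusted outside the fitted range.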


Time Series

Curve Fitting

The example on the right tries to find the parameters α and β that minimize the error for the following function:

y = α + βx

These functions may have any desired complexity

and do not necessarily have to be linear.

Links: matplotlib.org & scikit-learn.org
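As an illustration of a non-linear function, the sketch below fits y = a·exp(b·x) to the first five heart-rate samples. The model is chosen purely for demonstration and is an assumption, not something the data suggests. Taking logarithms turns the model into a straight line, so the same least-squares formulas apply. Note that this minimizes the squared error of ln(y) rather than of y itself. The helper fit_line is hypothetical.

```python
import math

# First five samples of the heart-rate example (timestamp in seconds, bpm).
times = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]
elapsed = [t - times[0] for t in times]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

# ln(y) = ln(a) + b * x is a straight line in x.
log_intercept, b = fit_line(elapsed, [math.log(y) for y in hr])
a = math.exp(log_intercept)

def model(x):
    """Evaluate the fitted exponential at x seconds after the first sample."""
    return a * math.exp(b * x)

for x, y in zip(elapsed, hr):
    print(f"t={x:2d}s  measured={y:6.2f}  model={model(x):6.2f}")
```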


Time Series

import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

times = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]

# scikit-learn expects the features as a 2-D array of shape (n_samples, n_features)
timestamp = np.array(times).reshape(-1, 1)
hr = np.array(hr)

# Fit y = α + βx to the samples
regr = linear_model.LinearRegression()
regr.fit(timestamp, hr)

# Measured points and the fitted line
plt.scatter(timestamp, hr)
plt.plot(timestamp, regr.predict(timestamp))
plt.show()

Links: matplotlib.org & scikit-learn.org


Time Series

Curve Fitting

Source: www.dtreg.com


Time Series

Weather Forecast

Source: www.meteoschweiz.admin.ch


Time Series

matplotlib

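import matplotlib.pyplot as plt
import csv

source = "./beautiful-ugly.csv"

with open(source, 'r') as f:
    js = csv.reader(f)
    # Header row, without the first two columns
    time = next(js)[2:]
    # First data row; csv.reader yields strings, so convert to numbers
    answers = [float(a) for a in next(js)[2:]]

fig = plt.figure()
plt.plot(time, answers)
plt.show()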
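The reading step can be tried without the data file. The sketch below feeds csv.reader an in-memory stand-in. Its layout (two leading columns followed by one column per location) and its values are hypothetical, so the real beautiful-ugly.csv may look different.

```python
import csv
import io

# Hypothetical stand-in for beautiful-ugly.csv: two leading columns,
# then one column per location. The real file layout may differ.
text = """id,participant,A,B,C
1,p01,3,4,2
2,p02,5,1,4
"""

with io.StringIO(text) as f:
    reader = csv.reader(f)
    header = next(reader)        # first row: column names
    locations = header[2:]       # drop the two leading columns
    first_row = next(reader)     # second row: first participant
    answers = [float(v) for v in first_row[2:]]

print(locations)
print(answers)
```

Each call to next() consumes one row, and the slice [2:] keeps everything from the third column onwards.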


Time Series

matplotlib

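import matplotlib.pyplot as plt
import csv

source = "./beautiful-ugly.csv"

with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    # csv.reader yields strings, so convert the answers to numbers
    answers = [float(a) for a in next(js)[2:]]

fig = plt.figure()
# Mark every data point and draw a thicker line
plt.plot(time, answers, marker='o', linewidth=3)
# Add 5% padding around the data and show a grid
plt.margins(0.05)
plt.grid(True)
plt.show()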


Time Series

matplotlib

import matplotlib.pyplot as plt
import csv

source = "./beautiful-ugly.csv"

with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    # csv.reader yields strings, so convert the answers to numbers
    answers = [float(a) for a in next(js)[2:]]

fig = plt.figure()
plt.plot(time, answers, marker='o', linewidth=3)
plt.margins(0.05)
plt.grid(True)
# Title and axis labels
plt.title("Survey Answers", fontsize=25)
plt.xlabel("Location", fontsize=20)
plt.ylabel("Answer", fontsize=20)
plt.show()


Time Series

matplotlib

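import matplotlib.pyplot as plt
import csv

source = "./beautiful-ugly.csv"
legend = []
fig = plt.figure()

with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    # One line per row; the second column is used as the legend label
    for row in js:
        answers = [float(a) for a in row[2:]]
        legend.append(row[1])
        plt.plot(time, answers, marker='o')

plt.margins(0.05)
# Anchor the legend's upper-left corner at the upper-right corner of the axes
plt.legend(legend, bbox_to_anchor=(1, 1), loc='upper left')
plt.grid(True)
plt.show()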


Time Series

matplotlib

import matplotlib.pyplot as plt
import csv
import numpy as np

source = "./beautiful-ugly.csv"
fig = plt.figure()
answers = []

with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    for row in js:
        # In Python 3, map() is lazy, so build a list of floats explicitly
        answers.append([float(a) for a in row[2:]])

# 2-D array: one row per CSV row, one column per location
answers = np.array(answers)
# boxplot draws one box per column
plt.boxplot(answers)
plt.margins(0.05)
plt.title("Survey Answers", fontsize=25)
plt.xlabel("Location", fontsize=20)
plt.ylabel("Answer", fontsize=20)
plt.show()
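The box in a box plot spans the first to the third quartile, with a line at the median. The sketch below computes these values per column using only the standard library. The answers are hypothetical, and method="inclusive" is chosen on the assumption that it corresponds to the linear interpolation NumPy uses by default, so small differences to the plotted boxes are possible.

```python
from statistics import quantiles

# Hypothetical answers: one row per participant, one column per location.
answers = [
    [1, 4, 3],
    [2, 5, 3],
    [3, 6, 4],
    [4, 7, 5],
    [5, 8, 5],
]

# zip(*answers) transposes the rows, giving one tuple per column (location).
for index, column in enumerate(zip(*answers), start=1):
    q1, median, q3 = quantiles(column, n=4, method="inclusive")
    print(f"location {index}: min={min(column)} q1={q1} "
          f"median={median} q3={q3} max={max(column)}")
```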


Time Series

pandas

import pandas as pd
import matplotlib.pyplot as plt

# Read the whole CSV file into a DataFrame
dataFrame = pd.read_csv("./beautiful-ugly.csv")

# One box per numeric column
dataFrame.plot.box()
plt.margins(0.05)
plt.title("Survey Answers", fontsize=25)
plt.xlabel("Location", fontsize=20)
plt.ylabel("Answer", fontsize=20)
plt.show()


Homework

Explore how the heart rate and the noise change over the course of the experiment by visualising the values in different ways.

Additionally, write a very short report interpreting your visualisations by comparing them to the path the participants walked.

Hand-in: 10.04.2017 to [email protected]


Thank you!
