27.03.2017
Lecture 5
Dr. Daniel Zünd
Danielle Griego
Artem Chirkin
Creative Data Mining
uncover and evaluate
Lecture 5 | 27.03.2017
What we’ll cover today
– Time series data
– Time series data visualisation
– Time series analysis
– Python example
Time Series
Data
Time series data is a collection of data points indexed or labeled in time order.
Example data:
– The first column is the timestamp in seconds elapsed since January 1, 1970 00:00:00 UTC (Unix time).
– The second column is the heart rate of the participant at that time.
Example (heart rate):
1460457325, 69.97
1460457337, 73.13
1460457345, 79.52
1460457349, 82.82
1460457353, 86.35
1460457357, 89.87
1460457361, 93.15
1460457369, 99.72
1460457373, 103.12
1460457377, 106.45
1460457381, 109.83
1460457385, 113.18
1460457389, 116.38
1460457393, 118.53
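To make the timestamp column easier to read, the values can be converted to calendar dates. The following is a minimal sketch using only the Python standard library and the first three rows of the example; the variable names (`samples`, `readable`, `elapsed`) are chosen here for illustration.

```python
from datetime import datetime, timezone

# First rows of the example data: (Unix timestamp in seconds, heart rate).
samples = [
    (1460457325, 69.97),
    (1460457337, 73.13),
    (1460457345, 79.52),
]

# Convert each timestamp to a timezone-aware UTC datetime.
readable = [(datetime.fromtimestamp(ts, tz=timezone.utc), hr) for ts, hr in samples]

for moment, hr in readable:
    print(moment.isoformat(), hr)

# Seconds elapsed since the first sample; the spacing is not regular.
elapsed = [ts - samples[0][0] for ts, _ in samples]
print(elapsed)
```

The elapsed-seconds form is often handier for plotting and fitting, because the raw timestamps are very large numbers.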
Time Series
Visualisation
Time series data is normally plotted as a line graph.
A convenient way to do this in Python is matplotlib, which offers many types of graphs and many ways to plot almost any kind of data.
Link: matplotlib.org
Time Series
Visualisation
import matplotlib.pyplot as plt
timestamp = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]
plt.plot(timestamp, hr)
plt.show()
Link: matplotlib.org
Time Series
Curve Fitting
The goal is to find a function that fits the data best.
This can be used to:
– Predict and forecast.
– Describe the behavior of the measured value over time.
– …
Links: matplotlib.org & scikit-learn.org
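As an illustration of the "predict and forecast" use, the sketch below fits a straight line to the five heart-rate samples from the earlier slide with `numpy.polyfit` and extrapolates it. The helper `forecast` and the choice of 40 seconds are assumptions made for this example, and a straight line is only a rough model for heart rate.

```python
import numpy as np

times = np.array([1460457325, 1460457337, 1460457345, 1460457349, 1460457353])
hr = np.array([69.97, 73.13, 79.52, 82.82, 86.35])

# Work with seconds since the first sample to keep the numbers small.
t = times - times[0]

# Fit a straight line; np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(t, hr, deg=1)


def forecast(seconds_after_start):
    """Hypothetical helper: evaluate the fitted line at a given time offset."""
    return intercept + slope * seconds_after_start


# Extrapolate 40 seconds after the first sample (beyond the last measurement at 28 s).
print(forecast(40))
```

Extrapolating far beyond the measured range quickly becomes unreliable, so forecasts like this should be read with caution.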
Time Series
Curve Fitting
This example looks for the parameters α and β that minimize the error of the following function:
y = α + βx
Such functions may have any desired complexity and do not necessarily have to be linear.
Links: matplotlib.org & scikit-learn.org
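To show what "not necessarily linear" can mean in practice, the sketch below compares a straight line with a quadratic on the heart-rate samples from the earlier slide, using the sum of squared errors as the error measure. The error measure and the helper `sum_squared_error` are assumptions for this example; the slides do not say which error is minimized.

```python
import numpy as np

# Seconds since the first sample and the corresponding heart rates.
t = np.array([0, 12, 20, 24, 28])
hr = np.array([69.97, 73.13, 79.52, 82.82, 86.35])


def sum_squared_error(degree):
    """Hypothetical helper: fit a polynomial of the given degree, return its squared error."""
    coeffs = np.polyfit(t, hr, deg=degree)
    residuals = hr - np.polyval(coeffs, t)
    return float(np.sum(residuals ** 2))


sse_linear = sum_squared_error(1)      # y = α + βx
sse_quadratic = sum_squared_error(2)   # y = α + βx + γx²

print(sse_linear, sse_quadratic)
```

A more complex function never fits the given points worse, but that alone does not make it a better description of the underlying behavior.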
Time Series
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model
times = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]
# scikit-learn expects a 2D array of shape (n_samples, n_features)
timestamp = np.array(times).reshape(-1, 1)
hr = np.array(hr)
# fit y = α + βx to the data
regr = linear_model.LinearRegression()
regr.fit(timestamp, hr)
# measured points and the fitted line
plt.scatter(timestamp, hr)
plt.plot(timestamp, regr.predict(timestamp))
plt.show()
Links: matplotlib.org & scikit-learn.org
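The fitted parameters can be read back from the model. The sketch below assumes the same five samples and maps scikit-learn's `intercept_` and `coef_` attributes to α and β; shifting the timestamps so that the first sample is at 0 is a choice made here to keep α interpretable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

times = [1460457325, 1460457337, 1460457345, 1460457349, 1460457353]
hr = [69.97, 73.13, 79.52, 82.82, 86.35]

# Seconds since the first sample, as a column vector (n_samples, 1).
X = (np.array(times) - times[0]).reshape(-1, 1)
y = np.array(hr)

regr = LinearRegression().fit(X, y)

alpha = regr.intercept_   # α: fitted heart rate at the first sample
beta = regr.coef_[0]      # β: change in heart rate per second
r2 = regr.score(X, y)     # coefficient of determination on the training data

print(alpha, beta, r2)
```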
Time Series
Curve Fitting
Source: www.dtreg.com
Time Series
Weather Forecast
Source: www.meteoschweiz.admin.ch
Time Series
matplotlib
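import matplotlib.pyplot as plt
import csv
source = "./beautiful-ugly.csv"
with open(source, 'r') as f:
    js = csv.reader(f)
    # first row: header, the labels start in the third column
    time = next(js)[2:]
    # second row: answers, converted from text to numbers
    answers = [float(a) for a in next(js)[2:]]
fig = plt.figure()
plt.plot(time, answers)
plt.show()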
Time Series
matplotlib
import matplotlib.pyplot as plt
import csv
source = "./beautiful-ugly.csv"
with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    answers = [float(a) for a in next(js)[2:]]
fig = plt.figure()
# mark every data point and draw a thicker line
plt.plot(time, answers, marker='o', linewidth=3)
# add some padding around the data and show a grid
plt.margins(0.05)
plt.grid(True)
plt.show()
Time Series
matplotlib
import matplotlib.pyplot as plt
import csv
source = "./beautiful-ugly.csv"
with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    answers = [float(a) for a in next(js)[2:]]
fig = plt.figure()
plt.plot(time, answers, marker='o', linewidth=3)
plt.margins(0.05)
plt.grid(True)
# title and axis labels
plt.title("Survey Answers", fontsize=25)
plt.xlabel("Location", fontsize=20)
plt.ylabel("Answer", fontsize=20)
plt.show()
Time Series
matplotlib
import matplotlib.pyplot as plt
import csv
source = "./beautiful-ugly.csv"
legend = []
fig = plt.figure()
with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    # one line per remaining row; the second column is used as the legend label
    for row in js:
        answers = [float(a) for a in row[2:]]
        legend.append(row[1])
        plt.plot(time, answers, marker='o')
plt.margins(0.05)
# loc=2 is matplotlib's code for 'upper left'
plt.legend(legend, bbox_to_anchor=(1, 1), loc=2)
plt.grid(True)
…
Time Series
matplotlib
import matplotlib.pyplot as plt
import csv
import numpy as np
source = "./beautiful-ugly.csv"
fig = plt.figure()
answers = []
with open(source, 'r') as f:
    js = csv.reader(f)
    time = next(js)[2:]
    for row in js:
        # list(...) is needed in Python 3, where map returns an iterator
        answers.append(list(map(float, row[2:])))
# rows become samples, each column gets its own box
answers = np.array(answers)
plt.boxplot(answers)
plt.margins(0.05)
plt.title("Survey Answers", fontsize=25)
plt.xlabel("Location", fontsize=20)
plt.ylabel("Answer", fontsize=20)
plt.show()
Time Series
pandas
import pandas as pd
import matplotlib.pyplot as plt
dataFrame = pd.read_csv("./beautiful-ugly.csv")
dataFrame.plot.box()
plt.margins(0.05)
plt.title("Survey Answers", fontsize=25)
plt.xlabel("Location", fontsize=20)
plt.ylabel("Answer", fontsize=20)
plt.show()
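pandas is also convenient for timestamped data such as the heart-rate example from the beginning of the lecture. The sketch below is one possible approach, not part of the original slides: the column names `timestamp` and `heart_rate` and the window size of 3 are assumptions made for illustration.

```python
import pandas as pd

# Five samples from the heart-rate example (Unix seconds, beats per minute).
df = pd.DataFrame({
    "timestamp": [1460457325, 1460457337, 1460457345, 1460457349, 1460457353],
    "heart_rate": [69.97, 73.13, 79.52, 82.82, 86.35],
})

# Interpret the integers as seconds since the Unix epoch (UTC).
df["time"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)

# A Series indexed by time is the usual pandas representation of a time series.
series = df.set_index("time")["heart_rate"]

# Rolling mean over 3 consecutive samples to smooth out short-term variation.
smoothed = series.rolling(window=3).mean()

print(series.describe())
print(smoothed)
```

Calling `series.plot()` followed by `plt.show()` would draw the series with a readable time axis, in the same way as the box plot above.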
Homework
Explore how the heart rate and the noise change over the course of the experiment by visualising the values in different ways.
Additionally, write a very short report interpreting your visualisations by comparing them to the path the participants walked.
Hand-in: 10.04.2017 to [email protected]
Thank you!