coms21202: an introduction to doing things with data...coms21202: an introduction to doing things...

30
COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen [email protected] University of Bristol, Department of Computer Science Bristol BS8 1UB, UK March 9, 2020 Rui Ponte Costa & Dima Damen [email protected] COMS21202: Intro

Upload: others

Post on 24-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

COMS21202: An Introduction toDoing Things with Data

[modified from Dima Damen lecture notes]

Rui Ponte Costa & Dima [email protected]

University of Bristol, Department of Computer ScienceBristol BS8 1UB, UK

March 9, 2020

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 2: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

What is Data?

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 3: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

What is Data?

I Data: Symbols, Patterns and SignalsI Numeric (measurements, finances, ...)I Textual (emails, Web pages, medical records, ...)I Visual (images, video, graphics, animations)I Auditory (speech, audio)I Signals (GPS signals, neuronal activity, ...)I Many others...

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 4: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit

I This unit is about doing things with data... but notI storing, shuffling, searching (Data Structures and Algorithms)I sending (Networking)I compressing or encrypting (Crypto I and Crypto II)

I This unit is about:I extracting knowledge from dataI generating data and making predictionsI making decisions based on dataI ... often referred to as: Data Science

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 5: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 6: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 7: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 8: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 9: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit is an introduction to.....

sources:dmnews.com, infinitdatum.com, code-n.org

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 10: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

But it’s not about the data, but the science

Correlation does not imply causation! telegraph.co.uk

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 11: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

This Unit

Why is it important for Computer Science?I Fundamental to many application areas:

I Artificial Intelligence, Machine Learning, Deep LearningI Image Processing and Pattern RecognitionI Graphics, Animation and Virtual RealityI Computer Vision and RoboticsI Speech and Audio Processing.I With growing applications in: neuroscience, literature, agriculture, etc.

I Hence, preparation for application units in years 3 and 4.

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 12: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex1. A Fishy Problem

From: Pattern Classification by Duda, Hart and Stork

Data: images of fish

Aim: distinguish between sea bass and salmon

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 13: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex1. A Fishy Problem

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 14: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex1. A Fishy Problem

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 15: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex1. A Fishy Problem

Steps:1. Pre-processing [Unit - Part 1]» Rui Ponte Costa2. Feature Selection [Unit - Part 3]» Majid Mirmehdi3. Classification [Unit - Part 2]» Laurence Aitchison [unit director]

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 16: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Fishing for a SolutionE.g.:

1. Pre-processing e.g. Rotate and align, Segment fish from background2. Feature Selection e.g. measure length or brightness3. Classification e.g. find a threshold

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 17: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Fishing for a Solution

Multiple features could be selected, resulting in a multi-dimensional featurevector.

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 18: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex2. Speech Recognition

Data: analogue speech signals (time series numerical data)

Aim: convert audio into text

Steps:1. Pre-processing Digitisation2. Feature Selection Wave amplitude3. Inference Hidden Markov Models (Viterbi algorithm) [or Deep learning]

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 19: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex3. Spam FilterData: email texts (text data)Aim: determine whether the email is spamSteps:

1. Pre-processing Normalise words(e.g. vector encoding)2. Feature Selection Presence of words3. Classification Naive Bayes classifier

Select subset of words wi and determine P(wi |spam) and P(wi |¬spam)from frequencies in training data.For an email that contains w1,w2, ..,wn of the subset of words, assume

P(email |spam) = P(w1|spam)P(w2|spam)..P(wn|spam) (1)

and

P(email |¬spam) = P(w1|¬spam)P(w2|¬spam)..P(wn|¬spam) (2)

Email is spam if

P(email |spam) > P(email |¬spam) (3)

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 20: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex4. Autonomous Helicopter 1

1Stanford University [http://heli.stanford.edu/]Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 21: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex4. Autonomous Helicopter

Data: expert demonstrationAim: fly an autonomous helicopter

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 22: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex4. Autonomous Helicopter

Steps:1. Pre-processing align temporal sequences2. Feature Selection3. Model Building

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 23: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex4. Autonomous Helicopter

Steps:1. Pre-processing align temporal sequences2. Feature Selection control: acceleration, height, ...3. Model Building

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 24: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex4. Autonomous HelicopterSteps:

1. Pre-processing align temporal sequences2. Feature Selection control: acceleration, height, ...3. Model Building autonomous controller

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 25: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Ex4. A modern version (autonomous drone flying)

Skydio 2: https://youtu.be/imt2qZ7uw1s

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 26: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Unit Outlinehttps://uob-coms21202.github.io/COMS21202.github.io/

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 27: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Assessments

I CW1: One individual course work: report + code (40%)weeks 15-21 [submission in week 21]

I Discuss with others, but submissions are individualI Assessment for course work is marked in the form of a report - it’s

what you have understood about the data that mattersI CW2: Formative course work (i.e. not assessed)

I Exam (60%)

I Unit AveragesI 2018/2019 Avg: 66I 2016/2017 Avg: 60I 2015/2016 Avg: 56

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 28: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Labs

I Tuesdays 13:00 - 15:00 [by timetable]: Group 2I Thursday 09:00 - 11:00 [by timetable]: Group 1I Lab Environment [Jupyter + Python]

I Lab Work:I Do the labs in pairs

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 29: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Labs: Important!!

I Main source of 1:1 support will be from the TAs in the labs!I Labs are essential for the coursework!I Attendance will be taken.

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro

Page 30: COMS21202: An Introduction to Doing Things with Data...COMS21202: An Introduction to Doing Things with Data [modified from Dima Damen lecture notes] Rui Ponte Costa & Dima Damen rui.costa@bristol.ac.uk

Tasks

I Next Lab (Week 13): Introduction to Jupyter Notebook II Sheet on unit web page

I Next Problem Class (Thur 1-2): Data AcquisitionI Prepare your answers in advance [available online]

Rui Ponte Costa & Dima [email protected]

COMS21202: Intro