
Course Organisation and Project Presentation
Knowledge Discovery and Data Mining 2 (VU) (707.004)

Roman Kern

KTI, TU Graz

2014-03-05

Roman Kern (KTI, TU Graz) Course Organisation and Project Presentation 2014-03-05 1 / 35


Overall Goal

Bring the theoretical knowledge acquired in KDDM1 into practical application,

... or, what it is like to be a data scientist?


Lecturer

Name: Roman Kern
Office: IWT & Know-Center, Inffeldgasse 13, 6th Floor, Room 072
Office hours: By appointment
Phone: +43-316/873-30860
E-Mail: [email protected]


Lecturer

Name: Denis Helic
Office: IWT, Inffeldgasse 13, 5th Floor, Room 070
Office hours: Tuesday from 12 to 13
Phone: +43-316/873-30610
E-Mail: [email protected]


Language

Lectures in English

Communication in German/English

If in German: please use the informal form (Du)!

Student presentations in English


Outline

1 Welcome and Introduction

2 Course Organization

3 Motivation

4 Projects


Welcome and Introduction

Teaching @ KTI

[Figure: overview of the courses taught at KTI, arranged under the headings Structured Data, Unstructured Data, Data Analysis and Applications. Courses shown: Introduction to Knowledge Technologies; Databases; Semantic Technologies; Web Science and Web Technology; Multimedia Information Systems I and II; Knowledge Discovery and Data Mining I and II; Network Science; Evaluation Methodology (User studies); Sensors & User Models (Sensor data); Science 2.0 (Science and Social Media). Further annotations in the figure: Relational data, Ontologies, Web systems, Web data, Networks and analysis, Theory and basics, Applications, Visualizations. Each course is marked with B or M.]

+ Projects, Bachelor Thesis, Master Projects, Master Thesis, PhD Thesis


Course Organization

Course Organisation
When & What


Course Organization

Course Calendar

The course will take place

... Wednesday, 12:15

... in HS Modul

Note: Please keep an eye on the calendar


Course Organization

Course Calendar

05.03.2014: Course organization & Project Presentation

12.03.2014: Bayesian Inference - LDA

19.03.2014: Information Retrieval and Applications

26.03.2014: Preprocessing Platform & Practical Applications


Course Organization

Course Calendar

02.04.2014: Presentation: Data Set & Scope

09.04.2014: Sampling Methods

07.05.2014: Presentation: Approach & Algorithms

14.05.2014: Non-Parametric Methods


Course Organization

Course Calendar

21.05.2014: Pattern Mining

04.06.2014: Ensemble Methods

11.06.2014: Advanced Clustering & Classification Algorithms

18.06.2014: Presentation: Project Reports


Course Organization

Course Logistics

Course website: http://kti.tugraz.at/staff/rkern/courses/kddm2

Slides will be made available on the course website

Description of the practical projects and access to data sets


Course Organization

Grading

There is no written exam

Therefore grading is based on the practical projects:

... soundness of the approach

... the outcome of the projects

... the presentation of the results


Course Organization

Lecture

Basic structure of a typical lecture

Open questions (e.g. from the projects, past lectures)

Break of 5 minutes in the middle

Discussions at the end of the lecture


Motivation

Motivation
Why should one be interested in KDDM2?


Motivation

Demand

Job as data scientist

“Data Scientist: The Sexiest Job of the 21st Century”

http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

“Start with the fact that there are no university programs offering degrees in data science”


Motivation

Definition

What is a data scientist?

Data scientists are inquisitive: exploring, asking questions, doing “what if” analysis, questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.

www-01.ibm.com/software/data/infosphere/data-scientist/


Motivation

Technologies

Play with cool technologies

... in a hands-on approach

Discussion & feedback

Reports from the field


Projects

Projects
Practical part of the course


Projects

Overview Projects

There are three practical projects

... from various stages of the KDD process

Work individually (one student per project)

... or in groups of two people

⇒ with a bigger scope

The focus is more on the approach, rather than the final results

... but the results should be assessed (evaluated)


Projects

Overview Projects

Important: Please report your group and project before the first presentation

... by sending an e-mail to [email protected]

Please add [KDDM2] to the mail subject

Students without a project assignment will be deregistered


Projects

Project Reporting

There are three presentations in which you present your project

First presentation: talk about the data-set (approx. 3 slides/project)

Second presentation: talk about the planned approach (approx. 3 slides/project)

Final presentation: overview of the project and the (evaluation) results (approx. 6 slides/project)


Projects

Project Reporting

The language of the presentation is English (slides, talk)

For teams of 2 people, the number of slides doubles

Prepare your presentation in advance


Projects

Project Discussion

If there are open questions in the projects

... use the Q&A section at the beginning and end of each lecture

(There is currently no discussion forum planned for the course)


Projects

Practical Aspects

Free to choose any programming language

Free (to an extent) in the choice of data-set

The code is yours (free to share it via an open-source license)


Projects

Project linked to KDD process


Projects

Project Overview

Text Mining (preprocessing)

Machine Learning (data mining)

Information Retrieval (selection)


Projects

Project #1

Text Mining

Transform semi-structured data into (more) structured data

Task: Identify different parts of a message (e.g. greeting)

... and identify parts that contain sentences (in contrast to tables, ASCII art, ...)

Data set: E-Mail messages

... Apache mailing list archive, Enron

Advanced: Predict sender/receiver
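One possible starting point for the task above is a simple line-level heuristic. The sketch below is only an illustration and not a prescribed approach: the function names, the greeting word list and the 0.6 threshold are assumptions chosen for the example, and a real solution would likely need learned features and proper evaluation.

```python
import re


def looks_like_sentence(line: str, min_alpha_ratio: float = 0.6) -> bool:
    """Heuristic (assumed): prose lines consist mostly of letters
    and contain at least three word-like tokens."""
    stripped = line.strip()
    if not stripped:
        return False
    non_space = [c for c in stripped if not c.isspace()]
    alpha_ratio = sum(c.isalpha() for c in non_space) / len(non_space)
    words = re.findall(r"[A-Za-z]{2,}", stripped)
    return alpha_ratio >= min_alpha_ratio and len(words) >= 3


def is_greeting(line: str) -> bool:
    """Heuristic (assumed): the line starts with a common greeting word."""
    return re.match(r"^\s*(hi|hello|dear|hey)\b", line, re.IGNORECASE) is not None


def label_lines(lines):
    """Assign one of 'greeting', 'sentence' or 'other' to every line."""
    labelled = []
    for line in lines:
        if is_greeting(line):
            label = "greeting"
        elif looks_like_sentence(line):
            label = "sentence"
        else:
            label = "other"
        labelled.append((label, line))
    return labelled


# Hypothetical message, not taken from the Apache or Enron archives.
message = [
    "Hi John,",
    "I have attached the quarterly numbers below.",
    "| Q1 | 120 | 98 |",
    "+------+------+",
]
labels = [label for label, _ in label_lines(message)]
```

Rules like these are quick to write but brittle, which makes them a useful baseline to compare a more principled approach against.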


Projects

Project #2

Machine Learning

Automatically assign tags (keywords) to items

Task (Option A): Supervised (e.g. classification)

Task (Option B): Unsupervised (e.g. clustering and cluster labelling)

Data set: Various tagging data sets

... e.g. Stack Exchange, Last.fm

Advanced: Compare option A with option B
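To compare option A with option B, both need to be scored against the tags that users actually assigned. A minimal sketch of one common choice, set-based precision, recall and F1 per item, is given below; the function name and the example tag lists are assumptions for illustration, and other measures may suit a given data set better.

```python
def precision_recall_f1(predicted, actual):
    """Compare a predicted tag set with the reference tag set of one item."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical prediction for one item with three reference tags.
p, r, f1 = precision_recall_f1(
    predicted=["macros", "lyx", "latex"],
    actual=["macros", "lyx", "preamble"],
)
```

Averaging these values over all items in a held-out set gives one number per option that can be reported in the final presentation.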


Projects

Project #2 - Sample of the data - Stack Exchange

<row Id="1"

PostTypeId="1"

AcceptedAnswerId="17"

CreationDate="2010-07-26T19:14:18.907"

Score="6"

ViewCount="1070"

Body="&lt;p&gt;I’m using LyX all the time and over the last 2 years I’ve accumulated some very handy macros for my lecture notes. As it is today, every time I start a new document, I copy and paste the macros from one of my other documents. Is it possible, somehow, to automatically load macros for all files?&lt;/p&gt;&#xA;&#xA;&lt;p&gt;(I asked this on SO a while back and got no answers. I hope in this new home I’ll get a response).&lt;/p&gt;&#xA;"

OwnerUserId="5"

LastEditorUserId="510"

LastEditDate="2011-12-26T09:43:27.920"

LastActivityDate="2012-03-07T12:35:05.427"

Title="Automatically define macros in LyX documents"

Tags="&lt;macros&gt;&lt;lyx&gt;&lt;preamble&gt;"

AnswerCount="2"

FavoriteCount="2" />
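The tags in this sample are stored as one escaped string in the Tags attribute. The following sketch shows one way to read a row and split that string into individual tags using only the Python standard library; the helper name `parse_row` is an assumption, and the row is shortened to the attributes needed here.

```python
import re
import xml.etree.ElementTree as ET

# Shortened copy of the sample row (only Id, Title and Tags are kept).
row_xml = (
    '<row Id="1" '
    'Title="Automatically define macros in LyX documents" '
    'Tags="&lt;macros&gt;&lt;lyx&gt;&lt;preamble&gt;" />'
)


def parse_row(xml_text):
    """Return (title, tags) for a single <row .../> element."""
    row = ET.fromstring(xml_text)
    # The XML parser unescapes &lt; and &gt;, so the attribute value
    # arrives as "<macros><lyx><preamble>".
    raw_tags = row.attrib.get("Tags", "")
    tags = re.findall(r"<([^<>]+)>", raw_tags)
    return row.attrib.get("Title", ""), tags


title, tags = parse_row(row_xml)
```

For a full dump, which contains many rows, a streaming parser such as `xml.etree.ElementTree.iterparse` would avoid loading the whole file into memory.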


Projects

Project #2 - Sample of the data - Last.fm

{
"artist": "Casual",
"timestamp": "2011-08-02 20:13:25.674526",
"similars": [["TRABACN128F425B784", 0.87173699999999998], ["TRIAINV12903CB4943", 0.751301], ... ],
"tags": [["Bay Area", "100"], ["hieroglyiphics", "100"], ["classic", "50"], ["Hip-Hop", "50"], ["stream", "50"], ["OG", "50"], ["1979-2006: A Hip-Hop Odyssey - 800 Tracks In A 48 Minute Mix", "50"], ["heiroglyphics", "50"], ["oaksterdamn", "50"], ["heard on Pandora", "0"]],
"track_id": "TRAAAAW128F429D538",
"title": "I Didn’t Mean To"
}
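In this sample each tag comes with a weight that is stored as a string. A minimal sketch of loading such a record and keeping only tags with a positive weight is shown below; the record is an abbreviated copy of the sample, and the function name and the `min_weight` parameter are assumptions for illustration.

```python
import json

# Abbreviated copy of the sample record (fewer tags, no "similars" list).
record_json = """
{
  "artist": "Casual",
  "title": "I Didn't Mean To",
  "track_id": "TRAAAAW128F429D538",
  "tags": [["Bay Area", "100"], ["classic", "50"],
           ["Hip-Hop", "50"], ["heard on Pandora", "0"]]
}
"""


def weighted_tags(record, min_weight=1):
    """Return (tag, weight) pairs, converting the string weights to int
    and dropping tags below min_weight."""
    return [
        (name, int(weight))
        for name, weight in record["tags"]
        if int(weight) >= min_weight
    ]


record = json.loads(record_json)
tags = weighted_tags(record)
```

Whether zero-weight tags should be dropped, and whether near-duplicate spellings should be merged, are preprocessing decisions worth stating in the data-set presentation.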


Projects

Project #3

Information Retrieval

A user is writing a blog entry; while the user is writing, a list of recommendations is offered

Task: Create a list of Wikipedia articles that match a fragment of text

Data set: Wikipedia (free to choose a language), Europeana

Advanced: Identify words/phrases in the text that directly have a Wikipedia page (Wikification, link prediction)
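A simple baseline for matching a text fragment against articles is TF-IDF weighting with cosine similarity. The sketch below uses only the Python standard library and three made-up one-sentence "articles"; all function names and texts are assumptions for illustration, and for a real Wikipedia dump a search engine library would be the more realistic choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """articles: dict mapping title -> text. Returns (idf, vectors)."""
    term_counts = {title: Counter(tokenize(text)) for title, text in articles.items()}
    n_docs = len(articles)
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        title: {t: tf * idf[t] for t, tf in counts.items()}
        for title, counts in term_counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(fragment, idf, vectors, k=3):
    """Return up to k article titles, ranked by similarity to the fragment."""
    counts = Counter(tokenize(fragment))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(query, vec), title) for title, vec in vectors.items()]
    scored = [(score, title) for score, title in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Hypothetical mini-collection standing in for Wikipedia articles.
articles = {
    "Graz": "Graz is the capital of Styria and the second largest city in Austria",
    "Data mining": "Data mining is the process of discovering patterns in large data sets",
    "Information retrieval": "Information retrieval is the task of finding documents relevant to a query",
}
idf, vectors = build_index(articles)
result = recommend("discovering patterns in my data", idf, vectors)
```

Because stop words such as "in" are not removed here, unrelated articles can still receive a small score; handling that is part of the preprocessing choices the project should justify.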


Projects

Project Plan


Projects

The End
Next: Bayesian Inference
