Course Organisation and Project Presentation
Knowledge Discovery and Data Mining 2 (VU) (707.004)
Roman Kern
KTI, TU Graz
2014-03-05
Roman Kern (KTI, TU Graz) Course Organisation and Project Presentation 2014-03-05 1 / 35
Overall Goal
Bring the theoretical knowledge acquired in KDDM1 into practical application
... or: what is it like to be a data scientist?
Lecturer
Name: Roman Kern
Office: IWT & Know-Center, Inffeldgasse 13, 6th Floor, Room 072
Office hours: By appointment
Phone: +43-316/873-30860
E-Mail: [email protected]
Lecturer
Name: Denis Helic
Office: IWT, Inffeldgasse 13, 5th Floor, Room 070
Office hours: Tuesday, 12:00 to 13:00
Phone: +43-316/873-30610
E-Mail: [email protected]
Language
Lectures in English
Communication in German/English
If in German: please use the informal “Du”!
Student presentations in English
Outline
1 Welcome and Introduction
2 Course Organization
3 Motivation
4 Projects
Welcome and Introduction
Teaching @ KTI
[Figure: courses taught at KTI, grouped into Structured Data, Unstructured Data and Data Analysis, with Bachelor (B) and Master (M) markers: Introduction to Knowledge Technologies; Databases; Semantic Technologies; Web Science and Web Technology; Multimedia Information Systems I and II; Knowledge Discovery and Data Mining I and II; Network Science. Topic annotations: relational data, ontologies, web systems, web data, networks and analysis, theory and basics, applications, visualizations. Further areas: Applications; Evaluation Methodology (user studies); Sensors & User Models (sensor data); Science 2.0 (science and social media).]
+ Projects, Bachelor Thesis, Master Projects, Master Thesis, PhD Thesis
Course Organisation
When & What
Course Calendar
The course will take place
... on Wednesdays at 12:15
... in HS Modul
Note: Please keep an eye on the calendar
Course Calendar
05.03.2014: Course organization & Project Presentation
12.03.2014: Bayesian Inference - LDA
19.03.2014: Information Retrieval and Applications
26.03.2014: Preprocessing Platform & Practical Applications
Course Calendar
02.04.2014: Presentation: Data Set & Scope
09.04.2014: Sampling Methods
07.05.2014: Presentation: Approach & Algorithms
14.05.2014: Non-Parametric Methods
Course Calendar
21.05.2014: Pattern Mining
04.06.2014: Ensemble Methods
11.06.2014: Advanced Clustering & Classification Algorithms
18.06.2014: Presentation: Project Reports
Course Logistics
Course website: http://kti.tugraz.at/staff/rkern/courses/kddm2
Slides will be made available on the course website
Description of the practical projects and access to data sets
Grading
There is no written exam
Therefore grading is based on the practical projects:
... the soundness of the approach
... the outcome of the projects
... the presentation of the results
Lecture
Basic structure of a typical lecture
Open questions at the beginning (e.g. from the projects or past lectures)
Break of 5 minutes in the middle
Discussions at the end of the lecture
Motivation
Why should one be interested in KDDM2?
Demand
Job as data scientist
“Data Scientist: The Sexiest Job of the 21st Century”
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
“Start with the fact that there are no university programs offering degrees in data science”
Definition
What is a data scientist?
Data scientists are inquisitive: exploring, asking questions, doing “what if” analysis, questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.
www-01.ibm.com/software/data/infosphere/data-scientist/
Technologies
Play with cool technologies
... in a hands-on approach
Discussion & feedback
Reports from the field
Projects
Practical part of the course
Overview Projects
There are three practical projects
... from various stages of the KDD process
Projects can be done by single students
... or by groups of two people
⇒ with a bigger scope
The focus is on the approach rather than on the final results
... but the results should still be assessed (evaluated)
Overview Projects
Important: Please report your group and chosen project before the first presentation
... by sending an e-mail to [email protected]
Please add [KDDM2] to the subject line
Students without a project assignment will be deregistered
Project Reporting
There are three presentations of your project
First presentation: talk about the data set (approx. 3 slides/project)
Second presentation: talk about the planned approach (approx. 3 slides/project)
Final presentation: overview of the project and the (evaluation) results (approx. 6 slides/project)
Project Reporting
The language of the presentation is English (slides, talk)
For teams of 2 people, the number of slides doubles
Prepare your presentation in advance
Project Discussion
If there are open questions about the projects
... use the Q&A section at the beginning and end of each lecture
(There is currently no discussion forum planned for the course)
Practical Aspects
Free to choose any programming language
Free (to an extent) in the choice of data set
The code is yours (free to share it via an open-source license)
Projects linked to the KDD process
Project Overview
Text Mining (preprocessing)
Machine Learning (data mining)
Information Retrieval (selection)
Project #1
Text Mining
Transform semi-structured data into (more) structured data
Task: Identify different parts of a message (e.g. greeting)
... and identify parts that contain sentences (in contrast to tables, ASCII art, ...)
Data set: E-Mail messages
... Apache mailing list archive, Enron
Advanced: Predict sender/receiver
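As a possible starting point for Project #1, labelling each line of a message can be sketched with a few hand-written heuristics. This is a minimal sketch, not the expected solution: the greeting pattern, the "-- " signature delimiter, the 0.8 letter-ratio threshold, the label names and the example message are all hypothetical choices for illustration. A real approach would typically learn such decisions from annotated messages (e.g. from the Enron or Apache archives) and evaluate against them.

```python
import re

# Hypothetical rules for illustration only; real e-mails need many more cases.
GREETING = re.compile(r"^(hi|hello|dear|hey)\b", re.IGNORECASE)
QUOTE = re.compile(r"^\s*>")    # quoted text from an earlier message
SIGNATURE_DELIMITER = "-- "     # conventional signature separator line


def letter_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters."""
    chars = [c for c in text if not c.isspace()]
    return sum(c.isalpha() for c in chars) / len(chars) if chars else 0.0


def label_lines(message: str) -> list[tuple[str, str]]:
    """Assign a coarse label to every line of a plain-text e-mail body."""
    labelled = []
    in_signature = False
    for line in message.splitlines():
        if line == SIGNATURE_DELIMITER:
            in_signature = True  # everything from here on is the signature
        stripped = line.strip()
        if in_signature:
            label = "signature"
        elif not stripped:
            label = "empty"
        elif QUOTE.match(line):
            label = "quote"
        elif GREETING.match(stripped):
            label = "greeting"
        elif len(stripped.split()) >= 3 and letter_ratio(stripped) >= 0.8:
            label = "sentence"  # mostly letters and several words
        else:
            label = "other"  # e.g. tables, ASCII art, code fragments
        labelled.append((label, line))
    return labelled


# Hypothetical example message (built with join to keep the trailing space in "-- ").
MESSAGE = "\n".join([
    "Hi Roman,",
    "",
    "the build fails on my machine since the last commit.",
    "> Did you update the dependencies?",
    "+--------+-------+",
    "| module | time  |",
    "+--------+-------+",
    "-- ",
    "Jane",
])

for label, line in label_lines(MESSAGE):
    print(f"{label:<10} {line}")
```

The table row "| module | time  |" falls below the letter-ratio threshold and is therefore labelled "other"; such fixed thresholds are exactly what a learned classifier would replace.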
Project #2
Machine Learning
Automatically assign tags (keywords) to items
Task (Option A): Supervised (e.g. classification)
Task (Option B): Unsupervised (e.g. clustering and cluster labelling)
Data set: Various tagging data sets
... e.g. Stack Exchange, Last.fm
Advanced: Compare option A with option B
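For Option A, one simple supervised baseline is k-nearest-neighbour tag propagation: an untagged item receives the most frequent tags among its most similar tagged items. The sketch below assumes items are short texts (e.g. question titles) and compares them with Jaccard similarity on word sets; the toy training items and the parameters `k` and `max_tags` are hypothetical. It is one possible baseline, not the method the course prescribes.

```python
from collections import Counter


def words(text: str) -> set[str]:
    """Lower-cased word set with surrounding punctuation removed."""
    return {w.strip(".,;:?!()").lower() for w in text.split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_tags(text: str, training: list[tuple[str, list[str]]],
                 k: int = 2, max_tags: int = 3) -> list[str]:
    """Vote with the tags of the k training items most similar to `text`."""
    query = words(text)
    neighbours = sorted(training, key=lambda item: jaccard(query, words(item[0])),
                        reverse=True)[:k]
    votes = Counter(tag for _, tags in neighbours for tag in tags)
    return [tag for tag, _ in votes.most_common(max_tags)]


# Hypothetical toy training data in the spirit of the Stack Exchange sample.
TRAINING = [
    ("How to load macros automatically in LyX", ["lyx", "macros"]),
    ("Define a new macro in the LyX preamble", ["lyx", "preamble"]),
    ("Compile a bibliography with BibTeX", ["bibtex", "bibliographies"]),
]

print(suggest_tags("Automatically load LyX macros for all documents", TRAINING))
# ['lyx', 'macros', 'preamble']
```

For Option B, the same word sets could instead be clustered and each cluster labelled with its most frequent words; comparing both options on held-out tagged items corresponds to the advanced task.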
Project #2 - Sample of the data - Stack Exchange
<row Id="1"
PostTypeId="1"
AcceptedAnswerId="17"
CreationDate="2010-07-26T19:14:18.907"
Score="6"
ViewCount="1070"
Body="<p>I’m using LyX all the time and over the last 2 years I’ve accumulated some very handy macros for my lecture notes. As it is today, every time I start a new document, I copy and paste the macros from one of my other documents. Is it possible, somehow, to automatically load macros for all files?</p>

<p>(I asked this on SO a while back and got no answers. I hope in this new home I’ll get a response).</p>
"
OwnerUserId="5"
LastEditorUserId="510"
LastEditDate="2011-12-26T09:43:27.920"
LastActivityDate="2012-03-07T12:35:05.427"
Title="Automatically define macros in LyX documents"
Tags="<macros><lyx><preamble>"
AnswerCount="2"
FavoriteCount="2" />
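The Tags attribute packs all tags of a post into a single string. A minimal sketch of extracting them with Python's standard library follows. It assumes that the angle brackets inside attribute values are escaped (`&lt;`, `&gt;`), as well-formed XML requires, whereas the sample above shows them unescaped; the row is trimmed to a few attributes.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed version of the row above; "<" and ">" inside attribute values are
# escaped so that the string is well-formed XML.
ROW = (
    '<row Id="1" PostTypeId="1" Score="6" '
    'Title="Automatically define macros in LyX documents" '
    'Tags="&lt;macros&gt;&lt;lyx&gt;&lt;preamble&gt;" />'
)

# Matches one tag inside the unescaped string "<macros><lyx><preamble>".
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def parse_post(row_xml: str) -> dict:
    """Return id, title and tag list of a single <row> element."""
    row = ET.fromstring(row_xml)
    return {
        "id": int(row.get("Id")),
        "title": row.get("Title", ""),
        "tags": TAG_PATTERN.findall(row.get("Tags", "")),
    }


post = parse_post(ROW)
print(post["tags"])  # ['macros', 'lyx', 'preamble']
```

For a complete dump file, `ET.iterparse` can stream the rows instead of loading the whole file into memory.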
Project #2 - Sample of the data - Last.fm
{
"artist": "Casual",
"timestamp": "2011-08-02 20:13:25.674526",
"similars": [["TRABACN128F425B784", 0.87173699999999998], ["TRIAINV12903CB4943", 0.751301], ...],
"tags": [["Bay Area", "100"], ["hieroglyiphics", "100"], ["classic", "50"], ["Hip-Hop", "50"], ["stream", "50"], ["OG", "50"], ["1979-2006: A Hip-Hop Odyssey - 800 Tracks In A 48 Minute Mix", "50"], ["heiroglyphics", "50"], ["oaksterdamn", "50"], ["heard on Pandora", "0"]],
"track_id": "TRAAAAW128F429D538",
"title": "I Didn’t Mean To"
}
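In this format the tag weights are stored as strings, and each similar track is a [track id, similarity] pair. The sketch below loads a trimmed copy of the record with Python's `json` module and keeps only tags at or above a weight threshold, e.g. to build a label set for Project #2; the threshold of 50 and the helper names are hypothetical.

```python
import json

# Trimmed copy of the record above: fewer tags, only the two listed similar tracks.
RECORD = """
{
  "artist": "Casual",
  "similars": [["TRABACN128F425B784", 0.87173699999999998], ["TRIAINV12903CB4943", 0.751301]],
  "tags": [["Bay Area", "100"], ["classic", "50"], ["Hip-Hop", "50"], ["heard on Pandora", "0"]],
  "track_id": "TRAAAAW128F429D538"
}
"""


def tags_above(record: dict, min_weight: int = 50) -> list[str]:
    """Return the tag names whose weight (stored as a string) is at least min_weight."""
    return [name for name, weight in record["tags"] if int(weight) >= min_weight]


def similar_track_ids(record: dict) -> list[str]:
    """Return the ids of the similar tracks, most similar first."""
    ranked = sorted(record["similars"], key=lambda pair: pair[1], reverse=True)
    return [track_id for track_id, _ in ranked]


track = json.loads(RECORD)
print(tags_above(track))         # ['Bay Area', 'classic', 'Hip-Hop']
print(similar_track_ids(track))  # ['TRABACN128F425B784', 'TRIAINV12903CB4943']
```

Note that the raw tags contain spelling variants (e.g. "hieroglyiphics" and "heiroglyphics"), so some normalisation before training is likely needed.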
Project #3
Information Retrieval
A user is writing a blog entry; while they write, a list of recommendations is offered
Task: Create a list of Wikipedia articles that match a fragment of text
Data set: Wikipedia (free to choose a language), Europeana
Advanced: Identify words/phrases in the text that have a corresponding Wikipedia page (Wikification, link prediction)
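A common baseline for this kind of recommendation is to rank articles by the cosine similarity between the TF-IDF vectors of the text fragment and of each article. The sketch below uses three hypothetical one-sentence stand-ins for Wikipedia articles and a naive tokenizer without stemming or stop-word removal; a system over a full Wikipedia dump would rather use an inverted index (e.g. via a search engine library such as Apache Lucene) than score every article.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles (title -> text).
ARTICLES = {
    "Graz": "Graz is the capital of Styria and the second largest city in Austria",
    "Data mining": "Data mining is the process of discovering patterns in large data sets",
    "Styria": "Styria is a state of Austria with its capital Graz",
}


def tokenize(text: str) -> list[str]:
    """Lower-cased alphabetic tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles: dict[str, str]):
    """Compute idf values and a tf-idf vector (as a dict) for every article."""
    term_counts = {title: Counter(tokenize(text)) for title, text in articles.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    idf = {term: math.log(len(articles) / df) for term, df in doc_freq.items()}
    vectors = {
        title: {term: tf * idf[term] for term, tf in counts.items()}
        for title, counts in term_counts.items()
    }
    return idf, vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(fragment: str, idf, vectors, top_n: int = 3) -> list[str]:
    """Titles of the best-matching articles with a score above zero."""
    # Terms that never occur in the collection are ignored.
    counts = Counter(t for t in tokenize(fragment) if t in idf)
    query = {term: tf * idf[term] for term, tf in counts.items()}
    scores = {title: cosine(query, vec) for title, vec in vectors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [title for title in ranked[:top_n] if scores[title] > 0]


idf, vectors = build_index(ARTICLES)
print(recommend("Patterns found by mining large data sets", idf, vectors))  # ['Data mining']
```

Terms that occur in every article (here "of") get an idf of zero and therefore do not influence the ranking.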
Project Plan
The End
Next: Bayesian Inference