knime as a teaching tool in higher education · university of leicesterdata data analysis for...

KNIME as a Teaching Tool in Higher Education Giuseppe Di Fatta Associate Professor [email protected] The First KNIME User Day UK London, June 25, 2013 June 23, 2013


Page 1

KNIME as a Teaching Tool in Higher Education

Giuseppe Di Fatta Associate Professor [email protected]

The First KNIME User Day UK London, June 25, 2013

June 23, 2013

Page 2

G. Di Fatta

Data Science

• Data Science is an interdisciplinary subject that draws know-how and skills from a broad range of academic subject areas:

– Computer Science

– Statistics

– Specific application domains


• More specifically in CS:

– Advanced Computing paradigms, such as Cloud Computing, Parallel and Distributed Computing

– Data Mining and Knowledge Discovery

– Information retrieval and WWW

– Visualisation

And, most importantly, a data scientist should have an attitude of curiosity and discovery.

Page 3

Press Coverage


There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills.

Page 4

“A New Breed”: the “data scientist” is a high-ranking professional with the training and curiosity to make discoveries in the world of big data.

Page 5

There are 103,298 data scientists registered with Kaggle.

Page 6

A data scientist is “a hybrid computer scientist/software engineer/statistician”: hence the demand for, and offer of, specific HE programmes.

Page 7

Data Science Degree Offer

• Degrees related to Data Science are on the rise.

• Mainly at postgraduate taught (PGT) level.

• Offer in the UK (source: http://datascience101.wordpress.com/2012/04/09/colleges-with-data-science-degrees/):

HEI | MSc
University of Reading | Advanced Computer Science
University of Bristol | Advanced Computing
Bournemouth University | Applied Data Analytics
University of St Andrews | Applied Statistics and Datamining
Royal Holloway, University of London | Big Data
Sheffield Hallam University | Big Data Analytics
University of Essex | Big Data and Text Analytics
University of Surrey | Business Analytics
Warwick Business School | Business Analytics & Consulting
University of Southampton | Business Analytics and Management Sciences
Birmingham City University | Business Intelligence
University of Westminster | Business Intelligence and Analytics
Brunel University | Business Intelligence and Social Media
De Montfort University | Business Intelligence Systems and Data Mining
University College London | Computational Statistics and Machine Learning
Robert Gordon University | Computing: Information Engineering
University of Leicester | Data Analysis for Business Intelligence
University of Warwick | Data Analytics
The University of Manchester | Data and Knowledge Management - MSc ACS
University of East London | Data Mining and Knowledge Management
University of Dundee | Data Science
Imperial College London | Data Science & Management
University of Greenwich | Data Warehousing and Data Mining
Sheffield Hallam University | Database Professional
The University of Edinburgh | Informatics
The University of Manchester | Information Management
University of East Anglia | Knowledge Discovery and Datamining
University of Kent | Management Science (Business Analytics)
Swansea University | Modelling, Uncertainty and Data
University of Liverpool | MRes Advanced Science
Sheffield Hallam University | Web and Cloud Computing
University College London | Web Science and Big Data Analytics

Page 8

KNIME at Reading

• UG level: BSc Computer Science

– P2 module Java Programming

• Eclipse/KNIME as a metaprogramming example of Java Reflection

– P3 module Data Mining Algorithms

• Coursework based on KNIME

– Popular 1-year industrial placement

• Masters level

– MSc Advanced Computer Science (NEW)

– MRes Systems Engineering (Masters by Research)

– Opportunity for industrial projects and/or a short industrial placement

• PhD in Computer Science


Page 9

MRes Systems Engineering

• Masters by Research (MRes)

– The MRes degree focuses on:

• a year-long research project (150 credits) and

• three taught modules (30 credits) that are relevant to the research project.

– The project can be on any topic within the diverse research interests of the School's academic staff, with opportunities to carry out part of the project at an industrial or academic partner.


Page 10

MSc Advanced Computer Science

MSc ACS: Master of Science in Advanced Computer Science

– 1-year taught postgraduate degree, intended for students who have already studied CS or a closely related subject as their first degree.

– Teaching and learning methodologies: block-based lectures, assisted practical activities, seminars, self-directed research, student projects, student presentations.

– Content is based on the research strengths of the School of Systems Engineering

Forging the next generation of computer scientists/data scientists

For students who want to pursue a career in academia by continuing on to a PhD programme, or in industrial R&D

For students seeking employment in the IT industry or pursuing their own initiatives: the programme includes modules on Entrepreneurship and on Social, Legal and Ethical Aspects

http://www.reading.ac.uk/sse/pg-taught/sse-mscadvancedcomputerscience.aspx

Page 11

Thames Valley

Reading is at the heart of the Thames Valley, which is often referred to as the ‘Silicon Valley of Europe’ (http://www.thamesvalley.co.uk).

– Home to 10 of the world’s top 50 global organisations and 13 of the world’s top 30 billion-dollar brands.

– A wide range of business sectors, with strong expertise in technology and science.

ICT companies involved in our UG placement programme: [company logos not reproduced]

Page 12

MSc ACS - Specifications

Start date: October 2013

Awarding Institution: University of Reading

Teaching Institution: University of Reading

QAA subject benchmarking group(s): Computing

Faculty: Faculty of Science

Programme length: 1 year

Programme Director: Dr. Giuseppe Di Fatta ([email protected])

Programme Advisor: Dr. Hong Wei ([email protected])

Expected Accreditation: British Computer Society (BCS), the Chartered Institute for IT (*)

(*) A request for accreditation has been submitted to BCS: the application can only be completed at the end of the first intake and will be retrospectively valid.

Study modes:

• Full-time: 12 months

• Part-time: 24 months

• Flexible modular: up to 5 years

• Fee calculated on a pro rata basis per credit (+10% admin charge)

Page 13

MSc ACS - Modules

Terms: Autumn, Spring and Summer (the MSc Project is listed under the Summer Term).

Code | Module Title | Credits | Level | Compulsory (C)/Optional (O)
SEMPR12 | MSc Project | 60 | 7 | C
SE4BC12 | Brain-Computer Interface | 10 | 7 | C
SE4TD12 | Data Analytics and Mining | 10 | 7 | C
SE4RS11 | Research Studies | 10 | 7 | C
SE4VR12 | Virtual Reality | 10 | 7 | C
SEMIP12 | Image Processing | 10 | 7 | O
SE4AS12 | Advanced Research Studies | 10 | 7 | O
MMM038 | Practice of Entrepreneurship | 20 | 7 | O
SE3SL11 | Social, Legal and Ethical Aspects in Engineering | 10 | 6 | O
SE4BD13 | Big Data Analytics | 10 | 7 | C
SE4CC12 | Cloud Computing | 10 | 7 | C
SE4GP12 | GPU Computing | 10 | 7 | C
SE4SI11 | Swarm Intelligence & Artificial Life | 10 | 7 | C
SE4VI11 | Visual Intelligence | 10 | 7 | C
SE4MD12 | Manipulator Dynamics & Haptics | 10 | 7 | O
SE4MI12 | Medical Image and Signal Processing | 10 | 7 | O
SE4NN12 | Advanced Neural Networks | 10 | 7 | O

Page 14

Advanced Computing Paradigms

Parallel and Distributed Computing

• GPU Computing – Desktop Supercomputing

• Cloud Computing – IaaS, PaaS, SaaS

– Big Data processing (MapReduce, Hadoop)

– Large-scale Systems Architectures
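To make the MapReduce model concrete, here is a minimal single-process Python sketch of the classic word-count job. It only simulates the map, shuffle and reduce phases in memory; the function names are hypothetical and this is not the Hadoop API, which runs the same phases as distributed tasks across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in one input split.
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big data analytics"]))
```

Because each call to `map_phase` and each call to `reduce_phase` depends only on its own input, a framework can run them in parallel on different machines; only the shuffle requires moving data between nodes.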


Page 15

Data Analytics and Mining

4th scientific paradigm: data-intensive/data-driven scientific discovery

• Data-centric Computing – Predictive/Advanced Analytics, Data Mining

• Data Workflows – Data Access, Transformation, Analytics and Mining

– Visualisation and Exploitation


Page 16

Big Data Analytics

• Big Data Analytics principles, techniques and challenges – Volume, Velocity, Variety and Veracity

– Parallel data mining techniques and tools

– Analysis of fast streaming real time data

– Recommender systems

– Unstructured data analysis

• Module “Data Analytics and Mining”: http://www.reading.ac.uk/modules/document.aspx?modP=SEMDM13&modYR=1314

• Module “Big Data Analytics”: http://www.reading.ac.uk/modules/document.aspx?modP=SEMBD13&modYR=1314

• Module “Cloud Computing”: http://www.reading.ac.uk/modules/document.aspx?modP=SEMCC13&modYR=1314

• Module “GPU Computing”: http://www.reading.ac.uk/modules/document.aspx?modP=SEMGP13&modYR=1314


Page 17

Student Activities with KNIME

• Some examples of activities carried out by students as part of their studies:

– Developing new KNIME nodes for specific data mining algorithms

• Module coursework

• Final Year Project

– Developing a new KNIME node for a company during the industrial placement year (covered by an NDA)

– Personal extracurricular activities (e.g., Elance jobs)

Page 18

K-Means Clustering

• Partitional clustering approach

– Clusters are disjoint subsets of the input data

– Each cluster is associated with a centroid (centre of mass)

– Iterative greedy approach

• New KNIME nodes implemented by students (PhD, MSc, BSc):

– Brute-force K-Means with rich initialisation options

– Optimised (faster) algorithm that produces the same result as brute-force K-Means from the same initialisation

– KD-Tree K-Means

– BSP-Tree K-Means

– Spherical K-Means (for text data)
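The brute-force variant listed above corresponds to the standard Lloyd iteration: assign every point to its nearest centroid, then move each centroid to the centre of mass of its members, until no assignment changes. The following is a minimal pure-Python sketch, assuming random initialisation from the input points; the KNIME nodes' initialisation options are not reproduced, and all names are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def kmeans(data: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Brute-force (Lloyd) K-Means: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(data, k)  # k distinct input points as seeds
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: compare every point with every centroid (brute force).
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in data]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the centre of mass of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_point(members)
    return centroids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, 2))
```

The assignment step costs O(n·k) distance evaluations per iteration. Tree-based variants such as KD-Tree K-Means aim to reach the same assignments while pruning centroids that cannot be the nearest one for a whole region of points.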

Page 19

Multidimensional Scaling (MDS)

Multidimensional Scaling (MDS) algorithms produce a lower-dimensional representation of high-dimensional data such that the between-object distances are preserved as much as possible.

– Often used for visualisation of high-dimensional data

• MDS implementations in KNIME:

– “PCA” is the classical linear approach (aka classical MDS, Torgerson scaling)

– KNIME node “MDS” is based on Sammon mapping (a non-linear approach)

• New KNIME node implemented by a UG student:

– Landmark MDS (LMDS), based on the Nyström approximation

– In classical MDS, computing the eigenvectors for m objects costs O(m^3)

– LMDS uses a subset of q points, called ‘landmarks’ (q ≪ m), so the eigendecomposition is performed only on a q × q matrix
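For reference, classical MDS and one common formulation of Landmark MDS can be written as follows. The notation is ours and the student node's implementation may differ; k is the target dimensionality, and the top k eigenvalues are assumed to be positive.

```latex
% Classical (Torgerson) MDS for m objects with pairwise distances d_{ij}
\begin{aligned}
\Delta &= \big[d_{ij}^{2}\big] \in \mathbb{R}^{m \times m}, \qquad
J = I_m - \tfrac{1}{m}\,\mathbf{1}\mathbf{1}^{\top}, \qquad
B = -\tfrac{1}{2}\, J \Delta J, \\
B &= V \Lambda V^{\top}, \qquad
X = V_k \Lambda_k^{1/2} \in \mathbb{R}^{m \times k}
\quad \text{(eigendecomposition of an } m \times m \text{ matrix: } O(m^{3})\text{)}, \\[4pt]
% Landmark MDS: classical MDS on the q x q landmark matrix \Delta_q only, O(q^3),
% giving eigenpairs (\lambda_i, \mathbf{v}_i), i = 1..k; every other point a is then placed by
L_k^{\#} &= \begin{bmatrix} \mathbf{v}_1^{\top}/\sqrt{\lambda_1} \\ \vdots \\ \mathbf{v}_k^{\top}/\sqrt{\lambda_k} \end{bmatrix}, \qquad
\mathbf{x}_a = -\tfrac{1}{2}\, L_k^{\#}\big(\boldsymbol{\delta}_a - \boldsymbol{\delta}_{\mu}\big),
\end{aligned}
```

Here δ_a holds the squared distances from point a to the q landmarks and δ_μ is the mean column of Δ_q. Applied to a landmark itself, the placement formula returns the same coordinates as classical MDS on the landmarks, so the landmark embedding is extended consistently to the remaining points.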

Page 20

Self-Organising Maps (SOM)

Self-Organising Maps (SOM) are a particular type of artificial neural network. Unsupervised training produces a low-dimensional (typically 2D, a map) representation of the input data. Training aims to preserve the topological properties of the input space.

• New KNIME node implemented by a UG student (Final Year Project):

– SOM with U-Matrix representation
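The training procedure described above can be sketched in pure Python as follows: a rectangular map with a Gaussian neighbourhood, a linearly decaying learning rate and radius, and a U-Matrix computed as the mean distance between each node's weight vector and those of its 4-connected grid neighbours. The schedules, parameter names and grid topology are our assumptions, not the design of the student's node.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Grid = List[List[Vector]]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_som(data: List[Vector], rows: int, cols: int, epochs: int = 20,
              lr0: float = 0.5, sigma0: float = 2.0, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weight (codebook) vector for every node of the rows x cols map.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit samples in a shuffled order
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly towards 0
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks towards 0.5
            # Best-matching unit (BMU): the node whose weight vector is closest to x.
            br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                         key=lambda rc: distance(weights[rc[0]][rc[1]], x))
            # Pull every node towards x, weighted by a Gaussian of its grid distance to the BMU.
            for r in range(rows):
                for c in range(cols):
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for d in range(dim):
                        w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


def u_matrix(weights: Grid) -> List[List[float]]:
    """Mean distance from each node's weights to those of its 4-connected neighbours."""
    rows, cols = len(weights), len(weights[0])
    umat = []
    for r in range(rows):
        row = []
        for c in range(cols):
            neighbours = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            row.append(sum(distance(weights[r][c], weights[nr][nc])
                           for nr, nc in neighbours) / len(neighbours))
        umat.append(row)
    return umat


if __name__ == "__main__":
    samples = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]] * 5
    for line in u_matrix(train_som(samples, 4, 5, epochs=10)):
        print(" ".join(f"{v:.2f}" for v in line))
```

In a U-Matrix display, high values mark large jumps between neighbouring codebook vectors, i.e. borders between clusters, while low-valued regions correspond to groups of similar inputs. The sketch assumes a map with at least two nodes.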

Page 21

Segmented Linear Regression

New KNIME node for Segmented Linear Regression: a breakpoint separating two linear segments can be used to quantify an abrupt change in a response function.

Application: degradation of tyre grip performance during F1 races.
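A simple way to fit such a model, sketched below under our own assumptions, is an exhaustive search over candidate breakpoints: fit an ordinary least-squares line on each side of the split and keep the split with the smallest total squared error. This variant does not force the two segments to meet at the breakpoint, and the function names are hypothetical; the KNIME node may use a different estimator.

```python
from typing import Sequence, Tuple

Line = Tuple[float, float, float]  # (slope, intercept, sum of squared errors)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Line:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def segmented_regression(xs: Sequence[float], ys: Sequence[float],
                         min_points: int = 3) -> Tuple[int, Line, Line]:
    """Return (split index, left line, right line) minimising the total SSE.

    Assumes xs is sorted ascending with distinct values; the right segment
    starts at xs[split index].
    """
    if min_points < 2 or len(xs) < 2 * min_points:
        raise ValueError("need min_points >= 2 and at least 2 * min_points observations")
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        left = fit_line(xs[:i], ys[:i])
        right = fit_line(xs[i:], ys[i:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, i, left, right)
    _, i, left, right = best
    return i, left, right


if __name__ == "__main__":
    x = list(range(10))
    y = [v if v < 5 else 20 - 2 * v for v in x]
    print(segmented_regression(x, y))
```

For tyre data, x could be the lap number and y a grip measurement; the returned split then marks where the degradation rate changes, and the two slopes quantify the rate before and after.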

Page 22

Freelance Work in Data Science – Example

• A freelance job carried out by a PhD student with KNIME – work offered on

– Input: a set of aerial photographs of a site, taken in 36 different spectral bands.

– Task: image segmentation to identify different environment and building areas.

– Solution: feature generation (PCA) + clustering using KNIME


PCA + x-means clustering
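The feature-generation step can be written out as standard PCA on per-pixel band vectors. The notation is ours, and the number of retained components p is an assumption, since the slide does not state it. The projected vectors z_i are then clustered; x-means extends K-Means by also selecting the number of clusters, scoring candidate centroid splits with the Bayesian Information Criterion.

```latex
% Pixel i (i = 1..n) as the vector of its 36 spectral band values
\begin{aligned}
\mathbf{x}_i &\in \mathbb{R}^{36}, \qquad
\boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i, \qquad
C = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i-\boldsymbol{\mu})(\mathbf{x}_i-\boldsymbol{\mu})^{\top}, \\
C\,\mathbf{w}_j &= \lambda_j\,\mathbf{w}_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{36}, \\
\mathbf{z}_i &= W_p^{\top}(\mathbf{x}_i-\boldsymbol{\mu}), \qquad
W_p = [\,\mathbf{w}_1 \;\cdots\; \mathbf{w}_p\,], \quad p < 36.
\end{aligned}
```

Keeping only the leading components concentrates most of the variance of the 36 correlated bands into a few features, which makes the subsequent distance-based clustering cheaper and less sensitive to noise.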

Page 23

Conclusions

• A Data Scientist is “a hybrid computer scientist/software engineer/statistician”.

– In general, university degrees need to complement frontal lectures with more practical activities.

– In particular, Data Science requires lots of hands-on activities with real-world data and case studies.

• Training in Data Science is a real investment.

• KNIME provides a user-friendly interface and has proved to be an excellent tool for introducing Data Analytics concepts and Data Mining algorithms.

– It supports the integration of practical activities into university courses.

Page 24

Questions?

The University of Reading is ranked among the top 1% of the world's universities, according to the Times Higher Education (THE) World University Rankings 2012.