stanford university :// · serving data, exploratory data analytics, advanced topics on big data...

26
Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org

Upload: others

Post on 30-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Thanks to Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

http://www.mmds.org

Page 2: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

What is the purpose of big data systems?

To support analysis and knowledge discovery from very

large amounts of data

Page 3: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each
Page 4: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Data contains value and knowledge

Page 5: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

But to extract the knowledge data needs to be

Stored emphasis on this class

Managed emphasis on this class

Analyzed emphasis on this class

Visualized

Data Analytics ≈ Data Mining ≈ Big Data ≈ Predictive Analytics ≈ Data Science

Page 6: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each
Page 7: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Given lots of data Discover patterns and models that are:

Valid: hold on new data with some certainty

Useful: should be possible to act on the item

Unexpected: non-obvious to the system

Understandable: humans should be able to interpret the pattern

Page 8: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Scalability

Streaming

Context

Quality

Usage

Page 9: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

This class stresses more on

Storage Systems

Distributed Computing Platforms

Algorithms, Scalability

Automation for handling large data

Machine

Learning

Visualization

Database Systems

Data Mining

Page 10: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

We will learn to process different types of data: Data is high dimensional

Data is a graph

Data is infinite/never-ending

Data is labeled We will learn to use different models of

computation: Distributed (MapReduce)

Streams and online algorithms

Single machine in-memory

Page 11: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Hands-on experience working with systems and tools for storing and processing big data:

MapReduce/Hadoop

Hive/BigQuery

Apache Spark

OpenRefine

Page 12: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

How do you want that data?

Page 13: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each
Page 14: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Website http://www.eecs.yorku.ca/~papaggel/courses/eecs4415/

Piazza Q&A website: Available from the website

http://piazza.com/yorku.ca/fall2018/eecs4415

You need to register with your yorku.ca email

Please participate and help each other!

e-mail for personal issues: [email protected]

Page 15: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Course Prerequisites EECS-3421: Introduction to Database Systems

EECS-3101: Design and Analysis of Algorithms

General prerequisites No single topic in the course is too hard by itself But we will cover and touch upon many topics

and this is what makes the course hard Good background in: Database Systems

Algorithms

Programming: You should be able to write non-trivial programs (in Python)

Page 16: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Component I Data-driven Organizations, Data Ingestions, Data Quality, Data Lakes, Data Cleaning

Component IIComputing Platforms, Storage Systems, Distributed Processing Systems (for general-purpose batch data, structured data, graph data, streaming data), Data processing methods (Aggregation, grouping, filtering)

Component IIIServing data, Exploratory Data Analytics, Advanced Topics on Big Data Mining

Page 17: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Work Weight Comment

Weakly Readings 10% 1% each

3 Assignments 30% 10% each

Team Project(team project + source +

report)

30%

proposal: 15%

Project milestone: 25%

Class presentation: 20%

Final report: 40%

Final Exam 30%Final exam grade must be

> 40%

Page 18: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

You need to:identify a problemfind datadesign a big data architectureprepare data for analysis process datauncover insightscommunicate critical findingscreate a data-driven solution

+ team-work

Page 19: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Need for data collectionNeed for data storageNeed for data analysisNeed for data visualization (optionally)

Collection Storage Analysis Visualization

…but, more of an iterative process than a sequence

Page 20: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

www.kaggle.com

Page 21: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Text Data Multivariate DataNetwork Data

Page 22: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Big Data Systems

Visualization Tools

Distributed Systems

Data Mining &

ML

Exploratory Data Analysis

Databases

Page 23: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Current interest in Data ScienceYou are interested in the general area of data science

Interest in Big Data TechnologiesYou are interested in big data systems and engineering

Interest in Big Data AnalyticsYou are interested in finding interesting patterns and insights in large amounts of data

Page 24: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Data Analytics

+ tools for data analytics

Page 25: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Item Comment

Classes Tue @ 19:00-22:00

Classroom SC 216 (Stong College)

Credits 3

Websitehttp://www.eecs.yorku.ca/~papaggel/c

ourses/eecs4415/

Office hourDrop anytime by my office (LAS3050)

or by appointment

Page 26: Stanford University :// · Serving data, Exploratory Data Analytics, Advanced Topics on Big Data Mining. Work Weight Comment Weakly Readings 10% 1% each 3 Assignments 30% 10% each

Welcome!Contact:

Manos Papagelis, LAS 3050

[email protected]/~papaggel/