python & spark msc. johannes härtel ptt18/19 …softlang/pttcourse/...(c) 2018, softlang team,...

61
(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes Härtel Msc. Marcel Heinz

Upload: others

Post on 25-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Python & Spark PTT18/19Prof. Dr. Ralf Lämmel

Msc. Johannes HärtelMsc. Marcel Heinz

Page 2: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

The ‘Big Picture’

[Aggarwal15]

Page 3: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Plenty of Building Blocks are involved in this ‘Big

Picture’

Page 4: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Back to the ‘Big Picture’

[Aggarwal15]

Page 5: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Foundations

Page 6: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Technologies and APIsThere are several technologies and APIs related to data-analysis in Python but the most convenient one is Pandas.

The following tutorial is inspired by the Book ‘Python for data Analysis’ [McKinney12].

Page 7: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is contained in this CSV?Some imports and configuration needed to read and print a CSV with Pandas.

CSV File

Python

Jack Nicholson

(angry)

Page 8: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is contained in this CSV?Reading and printing CSV data with Pandas.

Page 9: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What are the first 5 ratings in this CSV?Selecting a range of rows returns another Dataframe.

Page 10: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the title a rating refers to?Selecting one column returns a Series (╯°□°)╯︵ ┻━┻

Page 11: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the gender and the genre of a rating?Selecting columns by passing a list returns a Dataframe ┬──┬◡ノ(° -°ノ)

Page 12: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What are ratings of female persons?First we need a condition for filtering. Such condition can be stated as a Series of booleans.

Page 13: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What are ratings of female persons?We can use this condition as a selection mechanism for rows.

Page 14: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the amount of female and male ratings?Let’s try this!

Page 15: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the amount of female and male ratings?But we can also use dedicated Pandas functionality to create a Series that is indexed by the the distinct values.

Page 16: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the amount of female and male ratings?… and we can make python plot this.

Page 17: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the average rating given by a user?First we need to group the ratings of users. The following shows how to get all ratings of one user.

Page 18: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the average rating given by a user?After grouping we can select the rating column and take the mean for each group.

Page 19: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the average rating given by a user?We can also create a summarization in terms of a boxplot.

Page 20: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is a gender’s average rating of a film?A pivot table species rows and columns and aggregates the values using a passed function.

Page 21: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What are the top female rated films?i) We filter out films below a rating count of 250 to concentrate on the important candidates. ii) We increase the max rows since this is serious data! iii) We sort by column ‘F’ containing the average female ratings.

Page 22: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What are the top female rated films?

Page 23: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the film with the biggest disagreement in female and male rating?We add a new column to the ‘film_mean_ratings’ Dataframe assigned to the difference between the female and male column.

Page 24: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the film with the biggest disagreement in female and male rating?

Page 25: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the movies with the most disagreement among all viewers?The standard deviation can be used to describe such disagreement in ratings.

Page 26: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

What is the movie with the most disagreement among all viewers?

Page 27: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Back to the ‘Big Picture’

[Aggarwal15]

Page 28: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Data

Page 29: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Data Integration (JSON)JSON data can be loaded from a file and accessed comparable to dictionaries.

JSONFile

Python

cf. [web_json]

Page 30: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Data Integration (SQL)An sqlite package provides, for instance, an in-memory database.

cf. [web_sql]

Page 31: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Data Integration (CSV)Some CSV data needs to be combined before being processed.

cf. [McKinney12]

Page 32: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Data Integration (CSV)Comparable to joining tables in SQL, Pandas can merge different Dataframes.

cf. [McKinney12]

Page 33: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Some Class Doing

Nothing

SomeClassDoingNothing

Feature Extraction (Java)The ‘right’ features need to be extracted from artifacts for further processing.

[AntoniolCCD00]

some class doing

nothing

Page 34: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Feature Extraction (Java)The ‘javalang’ package provides a parser for Java written in Python that can be installed from git.

[web_jl]

Page 35: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Feature Extraction (Java)The Java abstract syntax tree can be created from a file using ‘javalang’.

Java

Page 36: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Java

SomeClassDoingNothing

Feature Extraction (Java)Intuitively, the most relevant feature in this artifact is the classname.

Page 37: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Feature Extraction (Java)Camel-case is split and strings are made lower-case.

SomeClassDoingNothing

Some Class Doing

Nothing

some class doing

nothing

Page 38: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Back to the ‘Big Picture’

[Aggarwal15]

Page 39: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Analytical Processing

Page 40: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

ClassificationSupport vector machines are provided by the ‘scikit-learn’ package as a supervised machine learning technique doing classification.

cf. [scikit_cls]

[Aggarwal15]

Page 41: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

ClassificationSupport vector machines in Python Spark.

[spark]

Page 42: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

ClusteringThe ‘scipy’ package provides hierarchical clustering as a unsupervised machine learning technique used to group this two-dimensional data.

cf. [web_cluster]

Page 43: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

ClusteringHierarchical clustering outputs a linkage array that can be depicted as a dendrogram.

cf. [web_cluster]

Page 44: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

ClusteringK-means clustering in Python Spark.

[spark]

Page 45: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Back to the ‘Big Picture’

[Aggarwal15]

Page 46: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Output

Page 47: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Plot Types (Boxplot)Gives a summary of distribution of numeric variables.

Package:● Matplotlib● Seaborn

cf. [seaborn]

Page 48: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Plot Types (Line chart)Depicts the evolution of one or many columns.

Package:● Matplotlib

Page 49: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Plot Types (Bar chart)Depicts the ranking present in one column.

Package:● Matplotlib

Page 50: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Plot Types (Scatter plot)Depicts the correlation of two columns.

Package:● Matplotlib● Seaborn

Page 51: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Plot Types (Pie plot)Depicts the part-whole relation.

cf. [py_pie]

Package:● Matplotlib

Page 52: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Scaling and AxisThe table shows metrics on, e.g., the contributed code of Developers (column ‘DCon_PE_d’). While a few developers share very high contribution values most developer’s contributions is very low for one project.

Page 53: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Scaling and AxisAxis can have different scales to correctly depict the data.

Page 54: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Scaling and AxisSetting the axis on log does not work due to the 0 entries.

Page 55: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Scaling and AxisHowever, symlog works as it starts to scale linear under a given threshold.

Page 56: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

SubplotsSupplots can be used to group multiple plots that optionally share axis.

Page 57: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

SubplotsSome sample of subplots showing the relation between API usage and lines of code for individual APIs.

Page 58: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

SubplotsSome other sample of different kinds of subplots sharing axis.

Page 59: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

Back to the ‘Big Picture’

[Aggarwal15]

Page 60: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau

References● [Aggarwal15] Aggarwal, Charu C. “Data mining: the textbook”, Springer, 2015.● [McKinney12] Wes, McKinney. "Python for data analysis.", 2012.● [AntoniolCCD00] Antoniol, Giuliano, et al. "Information retrieval models for recovering traceability links between code and

documentation." icsm. IEEE, 2000.● [Haslwanter16] Haslwanter, Thomas. "An Introduction to Statistics with Python.", Springer, 2016.● [web_json] https://developer.rhino3d.com/guides/rhinopython/python-xml-json/● [web_sql] https://www.pythoncentral.io/introduction-to-sqlite-in-python/● [webGG] https://python-graph-gallery.com/● [web_jl] https://github.com/c2nes/javalang● [pandas_interpolate] https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html● [scikit_cls] http://scikit-learn.org/stable/modules/svm.html● [web_cluster] https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/● [NL_reuters] https://github.com/fergiemcdowall/reuters-21578-json.git● [seborn] https://seaborn.pydata.org/● [py_pie] https://pythonspot.com/matplotlib-pie-chart/● [spark] https://spark.apache.org/docs/latest/● [spark_bp]

https://umbertogriffo.gitbooks.io/apache-spark-best-practices-and-tuning/content/avoiding_shuffle_less_stage,_more_fast.html

Page 61: Python & Spark Msc. Johannes Härtel PTT18/19 …softlang/pttcourse/...(C) 2018, SoftLang Team, University of Koblenz-Landau Python & Spark PTT18/19 Prof. Dr. Ralf Lämmel Msc. Johannes

(C) 2018, SoftLang Team, University of Koblenz-Landau