Coursera Data Science Discussion

Earl F Glynn

Principal Programmer/Analyst UMKC Center for Health Insights

[email protected]

19 April 2014



Outline

• Coursera

• Johns Hopkins University Data Science Specialization series

• First three classes in series

– Data Scientist’s Toolbox

– R Programming

– Getting and Cleaning Data


Online Classes

• www.coursera.org
• Wide variety of classes from many universities
• Many technical topics
• Class materials online and free
• Learn on your own schedule
• Interact with others worldwide via class forums
• Do as much or as little as you want
• Video lectures, quizzes, peer assessment assignments, programming assignments, exams
• Can receive Statement of Accomplishment (PDF)
• Free or paid “signature track”

Johns Hopkins University Data Science Specialization

• www.coursera.org/specialization/jhudatascience/

• Nine classes in series taught by Professors Brian Caffo, Jeff Leek and Roger Peng

• Each class four weeks long

• Three classes start each month

• All nine classes to run concurrently by June

• All use R programming language

• Free or paid “signature track”

• Signature track adds a Capstone project


Course Dependencies

https://d396qusza40orc.cloudfront.net/rprog/doc/JHDSS_CourseDependencies.pdf

Data Scientist’s Toolbox

• Short video about each class in series
• Install R, R packages [Mac or PC]
• Install RStudio, an IDE for R
• Command line interface
• Install Git software
• Establish GitHub account
• Work with software repositories
• Basic markdown (.md files)

• 2 dozen videos
• 3 quizzes
• 1 peer assessment project (GitHub submission)

Git / GitHub

Source: www.eqqon.com/index.php/Collaborative_Github_Workflow

GitHub Repository for Series


R Programming

• Data Types
• Reading/Writing Data
• Control Structures
• Functions
• Scoping Rules
• Subsetting a data.frame
• Vectorized operations, including “apply” functions (apply, lapply, tapply, mapply)
• Debugging and R Profiler

• ~40 videos
• 4 quizzes
• 3 programming assignments
• 1 peer assessment assignment
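As a rough illustration of the “apply” functions named in the topic list above, here is a minimal base R sketch. The matrix, scores, and group labels are made up for this example and do not come from the course materials.

```r
# Made-up data, for illustration only (not from the course).
m      <- matrix(1:6, nrow = 2)        # 2 x 3 matrix, filled column by column
scores <- c(90, 80, 70, 60)
group  <- c("a", "b", "a", "b")

# apply: run a function over the rows (MARGIN = 1) of a matrix
row_sums <- apply(m, 1, sum)           # 9 12

# lapply: run a function over each element and return a list
squares <- lapply(1:3, function(x) x^2)

# tapply: summarize a vector within the groups defined by a factor
group_avg <- tapply(scores, group, mean)   # a = 80, b = 70

# mapply: call a function with several arguments taken in parallel,
# here rep(1, 3), rep(2, 2), rep(3, 1)
repeated <- mapply(rep, 1:3, 3:1)

print(row_sums)
print(group_avg)
```

Each of these replaces an explicit loop; which one fits depends on whether the input is a matrix, a list or vector, a grouped vector, or several vectors walked in parallel.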


Programming Assignment 1 (week 2)


Getting and Cleaning Data

• Raw Data, Processed Data, “Tidy Data”
• Downloading Files
• Reading data: Excel, XML, JSON, MySQL, HDF5, HTML, APIs, fixed-width fields, images, …
• data.table Package
• Subsetting, Sorting, Summarizing
• Reshaping, Merging, Editing
• Regular Expressions
• Dates
• Data Sources

• 2 dozen videos
• 4 quizzes
• 1 peer assessment project
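To give a flavor of the merging, sorting, summarizing, and regular-expression topics listed above, here is a minimal base R sketch. The two data frames and the column names are hypothetical, invented for this example, and the class itself may use other tools such as the data.table package.

```r
# Hypothetical example data (not from the course).
patients <- data.frame(id = c(1, 2, 3), age = c(34, 51, 47))
visits   <- data.frame(id = c(1, 1, 2, 3), cost = c(100, 50, 75, 20))

# Merging: join the two tables on the shared "id" column
merged <- merge(patients, visits, by = "id")

# Sorting: order the rows by visit cost, lowest first
merged <- merged[order(merged$cost), ]

# Summarizing: total cost per patient
total_by_patient <- aggregate(cost ~ id, data = merged, FUN = sum)

# Regular expressions: strip non-letters from messy column names
clean_names <- tolower(gsub("[^A-Za-z]", "", c("Patient ID", "Visit.Cost")))

print(total_by_patient)
print(clean_names)
```

The result has one row per patient and one column per variable, which is the general shape the “tidy data” bullet refers to.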


Markdown File for GitHub Submission


Six more to go!
