hacking data visualisations

Post on 05-Jun-2015

Category:

Technology

DESCRIPTION

A quick look at how/why we process data visualization, a brief history of visualizations, and an intro to R.

TRANSCRIPT

Hacking Data Visualisations

MELINDA SECKINGTON
@mseckington

Why?

https://www.flickr.com/photos/laurenmanning/6632168961/

https://www.flickr.com/photos/jamjar/5491205608

“I feel that everyday, all of us now are being blasted by information design. It's being poured into our eyes through the Web, and we're all visualizers now; we're all demanding a visual aspect to our information. There's something almost quite magical about visual information. It's effortless, it literally pours in. And if you're navigating a dense information jungle, coming across a beautiful graphic or a lovely data visualization, it's a relief, it's like coming across a clearing in the jungle.”

DAVID MCCANDLESS - THE BEAUTY OF DATA VISUALIZATION

TOR NORRETRANDERS - THE BANDWIDTH OF OUR SENSES

A brief history of data visualisations

Theatrum Orbis Terrarum May 20, 1570

The first modern atlas, collected by Abraham Ortelius. This was a first attempt to gather all maps that were known to man at the time and bind them together.

https://www.flickr.com/photos/smailtronic/2361594300

Bills of Mortality

From 1603, London parish clerks collected health-related population data in order to monitor plague deaths, publishing the London Bills of Mortality on a weekly basis. John Graunt amalgamated 50 years of information from the bills, producing the first known tables of public health data.

BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN

1644: First known graph of statistical data

MICHAEL VAN LANGREN - ESTIMATES OF DISTANCE IN LONGITUDE BETWEEN TOLEDO AND ROME

1786: First bar chart, William Playfair

Exports and imports of Scotland to and from different parts for one Year from Christmas 1780 to Christmas 1781

Street map of cholera deaths in Soho, 1854, John Snow

Snow's 'ghost map' shows deaths from cholera around Broad Street between 19 August and 30 September 1854. Snow simplified the street layout, highlighting the 13 water pumps serving the area and representing each death as a black bar. His map demonstrates how cholera was spreading, not by a 'miasma' rising from the Thames, but in water contaminated by human waste.

BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN

Diagram of the Causes of Mortality in the Army in the East, 1858, Florence Nightingale

In her seminal 'rose diagram', Nightingale demonstrated that far more soldiers died from preventable epidemic diseases (blue) than from wounds inflicted on the battlefield (red) or other causes (black) during the Crimean War (1853-56).

BEAUTIFUL SCIENCE AT THE BRITISH LIBRARY - THE GUARDIAN

How?

https://www.flickr.com/photos/jdhancock/8031897271

https://www.flickr.com/photos/laurenmanning/5658951917/

A quick intro to R

What is R?

R is a free programming language and environment for statistical computing and graphics.

Created by statisticians for statisticians.

Comes with a lot of facilities for data manipulation, calculation, data analysis and graphical display.

Highly and easily extensible.

> data()
# list all datasets available

> data(movies)
# load the movies dataset into a variable called movies
# (data() returns only the dataset's name, so movies <- data(movies) would store the string "movies";
#  the dataset now ships in the ggplot2movies package, so run library(ggplot2movies) first)

> dim(movies)
[1] 58788    24

> names(movies)
 [1] "title"       "year"        "length"      "budget"      "rating"      "votes"
 [7] "r1"          "r2"          "r3"          "r4"          "r5"          "r6"
[13] "r7"          "r8"          "r9"          "r10"         "mpaa"        "Action"
[19] "Animation"   "Comedy"      "Drama"       "Documentary" "Romance"     "Short"
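One detail worth checking when loading a dataset: data() loads the object into the workspace as a side effect and returns only the dataset's name. A minimal sketch of that behaviour, using mtcars, a dataset bundled with base R. It stands in for movies here only because it needs no extra packages; it is not part of the original slides.

```r
# data() loads the dataset into the workspace as a side effect;
# its return value is just the dataset name as a character vector.
loaded <- data(mtcars)

print(loaded)         # the name of the dataset, not the data itself
print(class(mtcars))  # the data frame is available under its own name
print(dim(mtcars))    # rows and columns, as dim(movies) does in the slides
```

Assigning the result of data() to a variable therefore stores a string, which is why the dataset is used under its own name afterwards.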

> movies[7079,]
                    title year length   budget rating votes
7079 Bourne Identity, The 2002    119 75000000    7.3 29871
      r1  r2  r3  r4  r5   r6   r7   r8   r9 r10  mpaa
     4.5 4.5 4.5 4.5 4.5 14.5 24.5 34.5 14.5 4.5 PG-13
     Action Animation Comedy Drama Documentary Romance Short
          1         0      0     1           0       0     0
# returns 1 row => all the data for 1 movie

> movies[1:10,]
. . .
# returns rows 1 to 10

> movies[,1]
. . .
# returns 1 column => titles of all movies

> movies$title
. . .
# same as movies[,1]
# returns the column with the label 'title'

> movies[,1:10]
. . .
# returns columns 1 to 10
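The row and column selections above can be tried on any data frame. A minimal sketch on a tiny hand-built one; the films data frame and its values are hypothetical, made up for illustration, and the last line adds selection by a logical condition, which the slides do not cover.

```r
# A small data frame built by hand (hypothetical values, for illustration only).
films <- data.frame(
  title  = c("Alpha", "Beta", "Gamma"),
  year   = c(1999, 2005, 2012),
  rating = c(6.1, 7.4, 8.0),
  stringsAsFactors = FALSE
)

one_row    <- films[2, ]     # row 2, all columns (still a data frame)
some_rows  <- films[1:2, ]   # rows 1 to 2
by_number  <- films[, 1]     # column 1 as a plain vector
by_name    <- films$title    # the same column, selected by label
good_films <- films[films$rating > 7, ]  # rows chosen by a logical condition

print(identical(by_number, by_name))
print(good_films$title)
```

Selecting by label is usually the safer habit, since it keeps working if the column order changes.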

> hist(movies$year)

[Figure: Histogram of movies$year. x-axis: movies$year (1900 to 2000); y-axis: Frequency (0 to 8000)]

> hist(movies$rating)

[Figure: Histogram of movies$rating. x-axis: movies$rating (2 to 10); y-axis: Frequency (0 to 8000)]
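hist() also returns the binning it computed, which helps when checking what a histogram is actually counting. A minimal sketch, assuming a short made-up vector of ratings in place of movies$rating; the breaks argument sets the bin edges and plot = FALSE skips the drawing.

```r
# Hypothetical ratings, made up for illustration.
ratings <- c(2.5, 3.1, 4.8, 5.0, 5.5, 6.2, 6.9, 7.3, 7.7, 8.4, 9.1)

# breaks sets the bin edges; plot = FALSE returns the bin counts without drawing.
h <- hist(ratings, breaks = seq(2, 10, by = 2), plot = FALSE)

print(h$breaks)  # bin edges
print(h$counts)  # number of ratings falling in each bin
```

Dropping plot = FALSE draws the same bins as a chart.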

> library(ggplot2)

> qplot(rating,
        data = movies,
        geom = "histogram")

> qplot(rating,
        data = movies,
        geom = "histogram",
        binwidth = 1)

> m = ggplot(movies, aes(rating))

> m + geom_histogram()

> m + geom_histogram(
      aes(fill = ..count..))

> m + geom_histogram(
      colour = "darkgreen",
      fill = "white",
      binwidth = 0.5)

> x = m + geom_histogram(
      binwidth = 0.5)
> x + facet_grid(Action ~ Comedy)

FUTURELEARN STATS

> fl = read.csv(
      "futurelearn_dataset.csv",
      header = TRUE)

> source_table = table(fl$age)
> pie(source_table)

> pie(source_table,
      radius = 0.6,
      col = rainbow(8))
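table() is what turns the raw column into counts that pie() can draw. A minimal sketch of that step, assuming a short made-up vector of age bands in place of fl$age (the real CSV is not reproduced here); prop.table() and the slice_labels variable are additions for illustration.

```r
# Hypothetical age bands, standing in for fl$age.
age <- c("18-25", "26-35", "26-35", "36-45", "18-25", "26-35", "46-55", "36-45")

age_table <- table(age)                             # counts per age band
age_share <- round(100 * prop.table(age_table), 1)  # share of each band in percent

print(age_table)
print(age_share)

# Labels combining band and percentage, usable as pie(age_table, labels = slice_labels).
slice_labels <- paste0(names(age_table), " (", as.vector(age_share), "%)")
print(slice_labels)
```

Printing the shares next to the chart is a cheap sanity check, because slice sizes in a pie chart are hard to compare by eye.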

> library(twitteR)

> setup_twitter_oauth(
      "API key", "API secret", "Access token", "Access secret")

> tweets <- searchTwitter('futurelearn', n=100)

> library("tm")

> tweet_text <- sapply(tweets, function(x) x$getText())
> tweet_corpus <- Corpus(VectorSource(tweet_text))

> tweet_corpus <- tm_map(tweet_corpus,
                         content_transformer(tolower))
> tweet_corpus <- tm_map(tweet_corpus, removePunctuation)
> tweet_corpus <- tm_map(tweet_corpus,
                         function(x) removeWords(x, stopwords()))

> library(wordcloud)

> wordcloud(tweet_corpus)
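The word cloud sizes each word by how often it appears in the cleaned text. A rough sketch of that cleaning and counting in base R, without tm, wordcloud or the Twitter API; the tweet texts and the short stop word list are made up for illustration, and this is a stand-in for the tm pipeline above, not a reproduction of it.

```r
# Hypothetical tweet texts, made up for illustration.
tweet_text <- c(
  "Loving the new FutureLearn course!",
  "FutureLearn course on data: great so far.",
  "Great data, great course"
)

clean <- tolower(tweet_text)              # lower-case everything
clean <- gsub("[[:punct:]]", "", clean)   # strip punctuation
words <- unlist(strsplit(clean, "\\s+"))  # split into single words

# A tiny hand-written stop word list (an assumption; tm's stopwords() is far longer).
stop_words <- c("the", "on", "so", "far", "new")
words <- words[!words %in% stop_words]

# Frequency of each remaining word, most frequent first.
word_freq <- sort(table(words), decreasing = TRUE)
print(word_freq)
```

Inspecting the frequency table first makes it easier to spot words that should have been filtered out before they dominate the cloud.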

What next?

https://www.flickr.com/photos/jamjar/5491205608

Recap

Data visualisations are awesome

R is awesome

Any questions?