  • Data Science: Data Visualization Boot Camp Relationship Scatter Plot

    Chuck Cartledge, PhD

    24 January 2020

    1/24

  • 2/24

    Type Sample data Hands on Q & A Conclusion References Files

    Table of contents (1 of 1)

    1 Type (Uses, General considerations)

    2 Sample data

    3 Hands on

    4 Q & A

    5 Conclusion

    6 References

    7 Files

  • 3/24

    A definition

    “Sometimes referred to as dot or symbol graph. Point graphs are a family of graphs that display quantitative information by means of points represented by symbols such as dots, circles, or squares (called plot or plotting symbols). When the points are connected by lines, the graph is generally referred to as a line graph.”

    R. L. Harris [2]

  • 4/24

    R supplied data set (1 of 2)

    Included in the R package ggplot2.

    “This dataset contains a subset of the fuel economy data that the EPA makes available on . It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.”

    H. Wickham [3]

    library(ggplot2)

    ?mpg

    head(mpg)

    Resulting in:

  • 5/24

    R supplied data set (2 of 2)

    # A tibble: 6 x 11

      manufacturer model displ year cyl trans      drv cty hwy fl class
    1 audi         a4    1.8   1999 4   auto(l5)   f   18  29  p  comp
    2 audi         a4    1.8   1999 4   manual(m5) f   21  29  p  comp
    3 audi         a4    2     2008 4   manual(m6) f   20  31  p  comp
    4 audi         a4    2     2008 4   auto(av)   f   21  30  p  comp
    5 audi         a4    2.8   1999 6   auto(l5)   f   16  26  p  comp
    6 audi         a4    2.8   1999 6   manual(m5) f   18  26  p  comp

  • 6/24

    More recent mileage data

    Downloaded from: https://www.fueleconomy.gov/feg/download.shtml

    Described at: https://www.fueleconomy.gov/feg/ws/index.shtml#vehicle

    We will:

    1 Extract csv data from a zip file (39,865 rows)

    2 Select certain makes (attempt to replicate the sample data)

    3 Display different data for selected makes/models
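The three steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the presenter's code: the archive and CSV file names (`vehicles.csv.zip`, `vehicles.csv`), the column names (`make`, `displ`, `city08`), and the helper names `read_zipped_csv` and `select_makes` are all assumptions that should be checked against the downloaded file and the field descriptions linked above.

```r
# Sketch only: file names, column names, and helper names are assumptions.

# Step 1: read a CSV file directly out of a zip archive (base R only).
read_zipped_csv <- function(zip_path, csv_name) {
  # read.csv() opens and closes the unz() connection itself.
  read.csv(unz(zip_path, csv_name), stringsAsFactors = FALSE)
}

# Step 2: keep only the rows whose make is in the requested set.
select_makes <- function(vehicles, makes) {
  vehicles[vehicles$make %in% makes, , drop = FALSE]
}

# Tiny in-memory stand-in so the sketch runs without a download.
demo <- data.frame(
  make   = c("Audi", "Audi", "Ford", "Honda"),
  displ  = c(1.8, 2.0, 4.6, 1.6),
  city08 = c(18, 20, 13, 28),
  stringsAsFactors = FALSE
)

# Step 3: display the selected makes.
demo_select <- select_makes(demo, c("Audi", "Honda"))
print(demo_select)

# With the real download (hypothetical local path):
# vehicles        <- read_zipped_csv("vehicles.csv.zip", "vehicles.csv")
# vehicles_select <- select_makes(vehicles, c("Audi", "Honda"))
```

Reading through `unz()` avoids unpacking the archive to disk, which is convenient when only one file inside it is needed.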

  • 7/24

    The first codes (1 of 3)

  • 8/24

    The first codes (2 of 3)

    rm(list=ls())

    library(ggplot2)

    data(mpg, package="ggplot2")

    mpg_select

  • 9/24

    The first codes (3 of 3)

    g + geom_point(aes(col=manufacturer))

    g + geom_jitter(aes(col=manufacturer))

    g + geom_jitter(aes(col=manufacturer)) +

    geom_smooth(aes(col=manufacturer), method="lm", se=FALSE)
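The transcript cuts off after `mpg_select`, so the selection and the definition of `g` are not shown. The block below is one plausible, self-contained version, offered as a sketch rather than the presenter's code: the four manufacturers and the choice of `displ` against `cty` are assumptions. Plots are assigned to variables rather than printed; call `print()` on any of them to draw it.

```r
library(ggplot2)

data(mpg, package = "ggplot2")

# Assumption: the slide's selection is not shown; these makes are arbitrary.
mpg_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]

# Assumption: engine displacement on x, city mileage on y.
g <- ggplot(mpg_select, aes(x = displ, y = cty)) +
  labs(title = "Scatter chart",
       x = "Engine displacement",
       y = "City miles per gallon",
       color = "Manufacturer")

# Plain points: identical (x, y) pairs are drawn on top of each other.
p_point <- g + geom_point(aes(col = manufacturer))

# Jittered points: a little random noise separates coincident points.
p_jitter <- g + geom_jitter(aes(col = manufacturer))

# Jittered points plus one least-squares line per manufacturer, no band.
p_smooth <- p_jitter +
  geom_smooth(aes(col = manufacturer), method = "lm", se = FALSE)

# print(p_smooth)  # draws the plot
```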

  • 10/24

    The second codes (1 of 4)

  • 11/24

    The second codes (2 of 4)

    rm(list=ls())

    library(ggplot2)

    saveFileName

  • 12/24

    The second codes (3 of 4)

    labs(subtitle="mpg: Displacement vs City Mileage",

    title="Scatter chart",

    x="Engine displacement",

    y="City miles per gallon",

    color="Manufacturer")

    g + geom_point()

    g + geom_point(aes(col=make))

    g + geom_jitter(aes(col=make))

    g + geom_jitter(aes(col=make)) +

    geom_smooth(aes(col=make), method="lm", se=FALSE)

    g + geom_jitter(aes(col=make)) +

    geom_smooth(aes(col=make), method="lm", se=TRUE)
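The last two plots differ only in `se`. With `method="lm"`, `geom_smooth()` fits a least-squares line per colour group, and `se=TRUE` adds a confidence band around it (95% by default). The sketch below computes the same kind of line and band by hand for a single group. It uses R's built-in `mtcars` data (`mpg` against `disp`) instead of the fuel economy download, purely so it runs without any extra files; that substitution is an assumption of the example.

```r
# Least-squares fit of mileage on displacement, using built-in data.
fit <- lm(mpg ~ disp, data = mtcars)

# Evaluate the fit on a small grid spanning the observed x range.
grid <- data.frame(
  disp = seq(min(mtcars$disp), max(mtcars$disp), length.out = 5)
)

# interval = "confidence" returns the fitted value (fit) together with the
# lower (lwr) and upper (upr) bounds of the confidence band.
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

result <- cbind(grid, band)
print(result)
```

The band is narrowest near the middle of the data and widens toward the ends of the x range, which is the shape drawn when `se=TRUE`.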

  • 13/24

    The second codes (4 of 4)

  • 14/24

    The third codes (1 of 3)

    Idea and data from [1].

  • 15/24

    The third codes (2 of 3)

    rm(list=ls())

    library(ggplot2)

    data

  • 16/24

    The third codes (3 of 3)

    labs(title="Assessed value vs. square footage",

    subtitle="Selected zip codes in the Seattle, Washington area",

    x="Square footage",

    y="Assessed value"

    )

    g + geom_point()

    g + geom_jitter()

    g + stat_binhex(colour="white") +

    scale_fill_gradient(low="white", high="black")

    g + geom_point(alpha = 0.1) + geom_density2d(colour="white")
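The data loading step is truncated in the transcript, so the sketch below substitutes synthetic data to show the overplotting remedies used on this slide: transparency, hexagonal binning, and density contours. The data frame `houses` and its columns `SqFt` and `Value` are hypothetical stand-ins, not the data from [1]. Drawing the hexbin plot requires the `hexbin` package to be installed, and the contour layer relies on `MASS`; the plots are only constructed here, not printed.

```r
library(ggplot2)

# Synthetic stand-in for the house data (hypothetical columns and values).
set.seed(1)
n <- 5000
houses <- data.frame(SqFt = rlnorm(n, meanlog = log(2000), sdlog = 0.35))
houses$Value <- 200 * houses$SqFt * rlnorm(n, sdlog = 0.3)

g <- ggplot(houses, aes(x = SqFt, y = Value)) +
  labs(title = "Assessed value vs. square footage",
       x = "Square footage",
       y = "Assessed value")

# Transparency: dense regions show up as darker areas.
p_alpha <- g + geom_point(alpha = 0.1)

# Hexagonal binning: colour encodes the number of points per hexagon.
p_hex <- g + stat_binhex(colour = "white") +
  scale_fill_gradient(low = "white", high = "black")

# Faint points overlaid with 2D density contours.
p_contour <- g + geom_point(alpha = 0.1) + geom_density2d(colour = "white")

# print(p_hex)  # draws the plot (needs the hexbin package)
```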

  • 17/24

    The fourth codes (1 of 3)

    Idea and data from [1].

  • 18/24

    The fourth codes (2 of 3)

    rm(list=ls())

    library(ggplot2)

    data

  • 19/24

    The fourth codes (3 of 3)

    subtitle="Selected zip codes in the Seattle, Washington area",

    x="Square footage",

    y="Assessed value"

    )

    g + stat_binhex(colour="white") +

    scale_fill_gradient(low="white", high="blue") +

    facet_wrap("ZipCode")
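`facet_wrap("ZipCode")` splits the plot into one panel per distinct value of the `ZipCode` column, all sharing the same axes by default. A minimal sketch on synthetic data follows; the zip codes, the per-zip price rates, and the `SqFt` and `Value` columns are hypothetical, and only the `ZipCode` column name comes from the slide. As before, drawing the plot needs the `hexbin` package.

```r
library(ggplot2)

# Synthetic data: three hypothetical zip codes with different price levels.
set.seed(2)
zips <- c("98001", "98002", "98003")
houses <- data.frame(
  ZipCode = rep(zips, each = 1000),
  SqFt    = rlnorm(3000, meanlog = log(2000), sdlog = 0.35),
  stringsAsFactors = FALSE
)
rate <- c("98001" = 150, "98002" = 250, "98003" = 400)  # value per square foot
houses$Value <- unname(rate[houses$ZipCode]) * houses$SqFt *
  rlnorm(3000, sdlog = 0.3)

# One hexbin panel per zip code.
p_facet <- ggplot(houses, aes(x = SqFt, y = Value)) +
  stat_binhex(colour = "white") +
  scale_fill_gradient(low = "white", high = "blue") +
  facet_wrap("ZipCode")

# print(p_facet)  # draws the plot (needs the hexbin package)
```

Because the panels share axes, differences in price level between zip codes show up as vertical shifts of the point cloud from panel to panel.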

  • 20/24

    Hands-on exercises

    1 The legend is off to the right in the margin. Move the legend into the upper right area of the plot.

    2 Move the title to the center of the plot.
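One possible approach to both exercises uses `theme()`. This is a sketch, not the presenter's solution: the example plot (coloured by `drv` to keep the legend small) is an assumption, and the version check is there because ggplot2 3.5.0 introduced `legend.position = "inside"` with `legend.position.inside`, while older versions take the coordinates directly in `legend.position`.

```r
library(ggplot2)

# Assumed example plot; any plot with a legend and a title would do.
p <- ggplot(mpg, aes(x = displ, y = cty, colour = drv)) +
  geom_jitter() +
  labs(title = "Scatter chart")

# Exercise 1: place the legend inside the panel, anchored at the top right.
# Coordinates are fractions of the plot area: c(1, 1) is the top-right corner.
inside_legend <- if (utils::packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside",
        legend.position.inside = c(1, 1),
        legend.justification = c(1, 1))
} else {
  theme(legend.position = c(1, 1),
        legend.justification = c(1, 1))
}

# Exercise 2: centre the title (hjust = 0 is left, 0.5 centre, 1 right).
centred_title <- theme(plot.title = element_text(hjust = 0.5))

p_done <- p + inside_legend + centred_title

# print(p_done)  # draws the plot
```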

  • 21/24

    Q & A time.

    Q: Why was Stonehenge abandoned?

    A: It wasn’t IBM compatible.

  • 22/24

    What have we covered?

    Scatter plots are:

    Very quick and easy to put together

    Used to show two, possibly related, data sets

    Easily overwhelmed by too many data points

    Reliant on the human ability to detect patterns or relationships in the plotted data

    Usually the first plot for exploratory data analysis

    Next: Bubble charts (getting 3D out of 2D)

  • 23/24

    References (1 of 1)

    [1] Peter Bruce and Andrew Bruce, Practical Statistics for Data Scientists, O’Reilly Media, Inc., 2017.

    [2] Robert L. Harris, Information Graphics: A Comprehensive Illustrated Reference, Oxford University Press, 2000.

    [3] Hadley Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York, 2009.

  • 24/24

    Files of interest

    1 Code snippet to create images in this presentation

    2 Extract Federal fuel data
