  • Data Science: Data Visualization Boot Camp

    Composition

    Stacked Area Plot

    Chuck Cartledge, PhD

    25 January 2020

    1/26

  • 2/26

    Type Sample data Hands on Q & A Conclusion References Files

    Table of contents (1 of 1)

    1 Type
      Uses
      General considerations

    2 Sample data

    3 Hands on

    4 Q & A

    5 Conclusion

    6 References

    7 Files

  • 3/26

    A definition

    “In stacked area graphs – probably the most widely used variation of area graphs – each data series is added to the one below it. Only the bottom data series touches the horizontal axis. The top of the upper data series represents the total of all the data series plotted. Stacked graphs are sometimes referred to as multi-strata, stratum, strata, divided area, layer, subdivided area, or subdivided surface graphs. . . . each data series is shown as a percent of the whole”

    R. L. Harris [1]
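
The stacking rule in the definition can be checked with a few lines of base R. This is only a sketch: the counts are made up, and `seriesA`, `seriesB`, and `seriesC` are hypothetical names, not data from this presentation. Each layer's upper edge is the running sum of the series beneath it, so the top edge equals the total, and dividing by that total gives the percent-of-whole variant.

```r
# Made-up counts for three hypothetical series at four x positions.
counts <- rbind(
  seriesA = c(10, 20, 30, 40),
  seriesB = c( 5,  5, 10, 10),
  seriesC = c(15, 25, 10, 50)
)

# Upper edge of each layer: running sum down the series at every x position.
upperEdge <- apply(counts, 2, cumsum)

# The top of the upper series is the total of all series plotted.
total <- colSums(counts)
stopifnot(all(upperEdge["seriesC", ] == total))

# Percent-of-whole variant: every x position is rescaled to sum to 100.
percent <- sweep(counts, 2, total, "/") * 100
print(round(percent, 1))
```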

  • 4/26

    Realistic census financial data

    The Current Population Survey is a joint effort between the Bureau of Labor Statistics and the Census Bureau.

    Family Income: FINC-02 is the survey data in tabular format based on:

    Age of Reference Person,

    Total Money Income,

    Type of Family,

    Race and Hispanic Origin of Reference Person.

    Data downloaded from https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-finc/finc-02.html

    A download and extraction script is included (see Files frame).

  • 5/26

    Same image.

    Data downloaded from https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-finc/finc-02.html

    A download and extraction script is included (see Files frame).

  • 6/26

    Behind the scenes.

    Description: FINC-02. Age of Reference Person, by Total Money Income, Type of Family, Race and Hispanic Origin of Reference Person.

    Downloaded from: https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-finc/finc-02.html

    We will:

    1 Download selected “Primary Families” data

    2 Extract income range and count from the data files

    3 Consolidate the extracted data into one R data frame

    4 Save the extracted data to a local file

    5 Build stacked area plots from the consolidated data

    Everything except building the displays has been done.
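
Steps 3 and 4 can be sketched in base R as follows. This is not the script shipped with the presentation: the per-file pieces, the column names (`ageBand`, `income`, `count`), and the values are made up for illustration, and a temporary file stands in for the real save file.

```r
# Hypothetical per-file extracts; the real ones come from the FINC-02 files.
extracted <- list(
  data.frame(ageBand = "25 to 34", income = c(25000, 50000), count = c(120, 300)),
  data.frame(ageBand = "35 to 44", income = c(25000, 50000), count = c(90, 340))
)

# Step 3: consolidate the pieces into one data frame.
consolidated <- do.call(rbind, extracted)

# Step 4: save to a local file (a temporary file here) and read it back.
localFile <- tempfile(fileext = ".rds")
saveRDS(consolidated, localFile)
restored <- readRDS(localFile)

# The round trip should return the same data.
stopifnot(isTRUE(all.equal(consolidated, restored)))
print(restored)
```

Saving the consolidated data once means the plotting code can start from the local file instead of repeating the download and extraction.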


  • 7/26

    Realistic data

    The city of Virginia Beach makes most of their Police Incident reports available for download.

    “This dataset includes information about incidents where the police department responds to an offense and a report of crime is generated. This dataset excludes incidents assigned to 14 of the 152 Incident Based Reporting Codes (IBR). The specific IBR codes excluded are: 'runaway', 'death investigation', 'death, accidental', 'death, drowning', 'death, suicide', 'death, auto fatality', 'attempted suicide', 'officer involved shooting, death', 'officer involved shooting, no death', 'missing person', 'lost property', 'habitual offender', 'other non-reportable offenses', and 'SVU information only'.”

    https://data.vbgov.com/Public-Safety/Police-Incident-Reports/iqkq-gr5p

  • 8/26

    Same image.

    https://data.vbgov.com/Public-Safety/Police-Incident-Reports/iqkq-gr5p

  • 9/26

    The first codes. (1 of 4)

  • 10/26

    The first codes. (2 of 4)

    rm(list=ls())

    library(ggplot2)

    saveFileName

  • 11/26

    The first codes. (3 of 4)

                       " respondents",
                       collapse=" "),
         x="Reported income",
         y="Count",
         fill="Age bands",
         caption=paste0("Using data from:https://www.census.gov/data/",
                        "tables/time-series/demo/income-poverty/",
                        "cps-finc/finc-02.html")
         ) +
        theme(plot.title=element_text(hjust = 0.5)) +
        theme(plot.title=element_text(colour = "blue")) +
        theme(plot.subtitle=element_text(hjust = 0.5)) +
        theme(plot.subtitle=element_text(colour = "black")) +
        theme(plot.caption=element_text(hjust = 0.0)) +
        theme(plot.caption=element_text(colour = "red")) +
        theme(legend.title.align=0.5) +
        theme(panel.border = element_rect(colour = "black", fill=NA)) +
        scale_x_continuous(breaks=myBreaks,

  • 12/26

    The first codes. (4 of 4)

                           labels=gsub(" ", "", format(myBreaks,
                                                       big.mark=","))
                           )

    g + geom_area(position="fill")

    yAxisTickMarks
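
The axis labels in the `scale_x_continuous()` call above are built in two steps: `format()` adds a thousands separator, and `gsub()` strips the padding spaces that `format()` inserts to give every label the same width. A minimal sketch, assuming a hypothetical `exampleBreaks` vector (the slides compute `myBreaks` elsewhere); `scientific=FALSE` is an addition here, not part of the slide code, to keep round values from being shown in scientific notation.

```r
# Hypothetical break positions; the slides compute myBreaks elsewhere.
exampleBreaks <- c(0, 25000, 50000, 100000)

# format() inserts the thousands separator but pads to a common width.
# scientific=FALSE (an addition) keeps values in fixed notation.
padded <- format(exampleBreaks, big.mark=",", scientific=FALSE)

# gsub() removes the padding spaces, as in the scale_x_continuous() call.
axisLabels <- gsub(" ", "", padded)
print(axisLabels)
```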

  • 13/26

    The second codes. (1 of 9)

  • 14/26

    The second codes. (2 of 9)

    rm(list=ls())

    library(ggplot2)

    saveFileName

  • 15/26

    The second codes. (3 of 9)

         x="Months",
         y="Percentage",
         fill="Incident Type",
         caption=paste0("Using data from:https://data.vbgov.com/",
                        "Public-Safety/Police-Incident-Reports/",
                        "iqkq-gr5p")
         ) +
        theme(plot.title=element_text(hjust = 0.5)) +
        theme(plot.title=element_text(colour = "blue")) +
        theme(plot.subtitle=element_text(hjust = 0.5)) +
        theme(plot.subtitle=element_text(colour = "black")) +
        theme(plot.caption=element_text(hjust = 0.0)) +
        theme(plot.caption=element_text(colour = "red")) +
        theme(legend.title.align=0.5) +
        theme(panel.border = element_rect(colour = "black", fill=NA)) +
        scale_x_continuous(breaks=myBreaks,
                           labels=months[myBreaks]
                           )

  • 16/26

    The second codes. (4 of 9)

    g + geom_area(position="fill")

    dataCondensed
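
`position="fill"` rescales the stacked layers so that they sum to 1 at every x position, which is why the y axis of such a plot runs from 0 to 1 rather than showing raw counts. The base R sketch below reproduces that arithmetic on made-up monthly counts; the data frame `toy` and its columns are hypothetical and are not part of the Virginia Beach data.

```r
# Made-up monthly counts for two hypothetical incident types.
toy <- data.frame(
  month = rep(1:3, times = 2),
  type  = rep(c("typeA", "typeB"), each = 3),
  count = c(30, 10, 50, 10, 30, 50)
)

# Total per month, repeated on every row of that month.
monthTotal <- ave(toy$count, toy$month, FUN = sum)

# Share of the monthly total: what the filled layers display.
toy$share <- toy$count / monthTotal

# Shares within a month always sum to 1.
stopifnot(all(abs(tapply(toy$share, toy$month, sum) - 1) < 1e-12))
print(toy)
```

To label the axis in percent, the shares can be multiplied by 100 or, assuming the scales package is installed, `scale_y_continuous(labels = scales::percent)` can be added to the plot.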

  • 17/26

    The second codes. (5 of 9)

    dataCondensed$count[i]

    }

    else

    {

    dataCondensed$counts[i]

  • 18/26

    The second codes. (6 of 9)

    changes

  • 19/26

    The second codes. (7 of 9)

         y="Count",
         fill="General\nincident type",
         caption=paste0("Using data from:https://data.vbgov.com/",
                        "Public-Safety/Police-Incident-Reports/",
                        "iqkq-gr5p")
         ) +
        theme(plot.title=element_text(hjust = 0.5)) +
        theme(plot.title=element_text(colour = "blue")) +
        theme(plot.subtitle=element_text(hjust = 0.5)) +
        theme(plot.subtitle=element_text(colour = "black")) +
        theme(plot.caption=element_text(hjust = 0.0)) +
        theme(plot.caption=element_text(colour = "red")) +
        theme(legend.title.align=0.5) +
        theme(panel.border = element_rect(colour = "black", fill=NA)) +
        scale_x_continuous(breaks=myBreaks,
                           labels=months[myBreaks]
                           )

  • 20/26

    The second codes. (8 of 9)

    yAxisTickMarks

  • 21/26

    The second codes. (9 of 9)

    {

    print(paste0(" ",codesAndDescription$Offense.Description[j]))

    }

    }

    }

  • 22/26

    Hands-on exercises

    1 Your supervisor has a strange color sense, and wants you to change the grid lines on the last image from the first set to red.

    2 Your supervisor is VERY old school, and does not appreciate the advances that Leonardo Pisano Bigollo Fibonacci brought to mathematics. Change the y-axis labeling to Roman numerals.
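
A possible starting point for the second exercise: the `utils` package, which R attaches by default, provides `as.roman()`. The sketch below only converts a few made-up tick values (`exampleTicks` is a hypothetical name); wiring the result into the plot's y-axis labels is left as the exercise.

```r
# Made-up tick values; the real ones depend on the plot's y axis.
exampleTicks <- c(4, 9, 14, 2020)

# as.roman() converts whole numbers to Roman numerals;
# as.character() turns the result into plain label strings.
romanLabels <- as.character(as.roman(exampleTicks))
print(romanLabels)
```

Values with no Roman representation, such as 0, come back as `NA`, so an axis that starts at zero needs separate handling.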

  • 23/26

    Q & A time.

    Q: How many marketing people does it take to change a light bulb?

    A: I’ll have to get back to you on that.

  • 24/26

    What have we covered?

    Stacked area plots:

    Are most frequently used to show trends and relationships,

    Are used to identify specific information, to add emphasis to it, or to show parts of a whole,

    Are good for showing the relative proportions of the components to the whole.

    Next: Pie charts

  • 25/26

    References (1 of 1)

    [1] Robert L. Harris, Information Graphics: A Comprehensive Illustrated Reference, Oxford University Press, 2000.

  • 26/26

    Files of interest

    1 Code snippet to create images in this presentation

    ## First codes

    rm(list=ls())

    library(ggplot2)

    saveFileName