exploratory graphs

23
Explora tory Graphs Explora tory Graphs Roger D. Peng, Associate Professor of Biostatistics  Johns Hopkins Bloomberg School of Public Healt h

Upload: niti860

Post on 15-Oct-2015

75 views

Category:

Documents


0 download

TRANSCRIPT

  • Exploratory GraphsExploratory GraphsRoger D. Peng, Associate Professor of BiostatisticsJohns Hopkins Bloomberg School of Public Health

  • Why do we use graphs in data analysis?Why do we use graphs in data analysis?To understand data propertiesTo find patterns in dataTo suggest modeling strategiesTo "debug" analysesTo communicate results

    2/23

  • Exploratory graphsExploratory graphsTo understand data propertiesTo find patterns in dataTo suggest modeling strategiesTo "debug" analysesTo communicate results

    3/23

  • Characteristics of exploratory graphsCharacteristics of exploratory graphsThey are made quicklyA large number are madeThe goal is for personal understandingAxes/legends are generally cleaned up (later)Color/size are primarily used for information

    4/23

  • Air Pollution in the United StatesAir Pollution in the United StatesThe U.S. Environmental Protection Agency (EPA) sets national ambient air quality standards foroutdoor air pollution

    For fine particle pollution (PM2.5), the "annual mean, averaged over 3 years" cannot exceed .

    Data on daily PM2.5 are available from the U.S. EPA web site

    Question: Are there any counties in the U.S. that exceed that national standard for fine particlepollution?

    U.S. National Ambient Air Quality Standards-

    12 g/m3

    EPA Air Quality System-

    5/23

  • DataDataAnnual average PM2.5 averaged over the period 2008 through 2010

    Do any counties exceed the standard of ?

    pollution

  • Simple Summaries of DataSimple Summaries of DataOne dimension

    Five-number summaryBoxplotsHistogramsDensity plotBarplot

    7/23

  • Five Number SummaryFive Number Summarysummary(pollution$pm25)

    ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.38 8.55 10.00 9.84 11.40 18.40

    8/23

  • BoxplotBoxplotboxplot(pollution$pm25, col = "blue")

    9/23

  • HistogramHistogramhist(pollution$pm25, col = "green")

    10/23

  • HistogramHistogramhist(pollution$pm25, col = "green")rug(pollution$pm25)

    11/23

  • HistogramHistogramhist(pollution$pm25, col = "green", breaks = 100)rug(pollution$pm25)

    12/23

  • Overlaying FeaturesOverlaying Featuresboxplot(pollution$pm25, col = "blue")abline(h = 12)

    13/23

  • Overlaying FeaturesOverlaying Featureshist(pollution$pm25, col = "green")abline(v = 12, lwd = 2)abline(v = median(pollution$pm25), col = "magenta", lwd = 4)

    14/23

  • BarplotBarplotbarplot(table(pollution$region), col = "wheat", main = "Number of Counties in Each Region")

    15/23

  • Simple Summaries of DataSimple Summaries of DataTwo dimensions

    dimensions

    Multiple/overlayed 1-D plots (Lattice/ggplot2)ScatterplotsSmooth scatterplots

    > 2

    Overlayed/multiple 2-D plots; coplotsUse color, size, shape to add dimensionsSpinning plotsActual 3-D plots (not that useful)

    16/23

  • Multiple BoxplotsMultiple Boxplotsboxplot(pm25 ~ region, data = pollution, col = "red")

    17/23

  • Multiple HistogramsMultiple Histogramspar(mfrow = c(2, 1), mar = c(4, 4, 2, 1))hist(subset(pollution, region == "east")$pm25, col = "green")hist(subset(pollution, region == "west")$pm25, col = "green")

    18/23

  • ScatterplotScatterplotwith(pollution, plot(latitude, pm25))abline(h = 12, lwd = 2, lty = 2)

    19/23

  • Scatterplot - Using ColorScatterplot - Using Colorwith(pollution, plot(latitude, pm25, col = region))abline(h = 12, lwd = 2, lty = 2)

    20/23

  • Multiple ScatterplotsMultiple Scatterplotspar(mfrow = c(1, 2), mar = c(5, 4, 2, 1))with(subset(pollution, region == "west"), plot(latitude, pm25, main = "West"))with(subset(pollution, region == "east"), plot(latitude, pm25, main = "East"))

    21/23

  • SummarySummaryExploratory plots are "quick and dirty"

    Let you summarize the data (usually graphically) and highlight any broad features

    Explore basic questions and hypotheses (and perhaps rule them out)

    Suggest modeling strategies for the "next step"

    22/23

  • Further resourcesFurther resourcesR Graph GalleryR Bloggers

    23/23