data visualization guide



Climate Science Investigations – BEACON Data Visualization Guide

B. Burress – Chabot Space & Science Center

November 2011

Climate Science Investigations

Data Visualization Guide for the Berkeley Atmospheric

CO2 Observation Network (BEACON)

A Classroom Guide developed by

Chabot Space & Science Center

For the University of California Berkeley BEACON Project

November 2011

Contents

Introduction
  Purpose
  Sensors
  Visualization
  Greenhouse Gas Overview
  What can be done with the BEACON sensor data?
  Sources of Carbon Dioxide in the San Francisco Bay Area
Using Excel To Crunch Numbers and Make Graphs
  Importing to Excel
  Examining the Data
  Selecting What You Want
  Mins, Maxes, Averages
  If There Are Unrealistic Numbers In Your Data
  A Simple Graph
  Two-Data Graph
  X Versus Y
  Excel Tips and Tricks Summary
Different Ways to Graph Data Sets
Using Google Earth to locate BEACON sensors and identify potential CO2 sources and sinks
  Introduction
  Pinpointing a Sensor Node
    Exercise: Pin a Sensor Node to a Map
  BEACON Sensor Node Site Profile
Visualizing Data With Tableau Software
  About Tableau
    Tableau for Teaching
    Tableau Public
  Basics
    Installing
    Starting Up
    Preparing Data for Import
    Creating a New Workbook
    Importing Data
    Dimensions and Measures
    Columns and Rows
    Marks
    Measures
    Georeferencing and Map Plotting
    Time
    General Graphs
    Show Me
    Filters
  Recipes
    Recipe 1: Carbon Fingerpainting
    Recipe 2: Blowing Bubbles
    Recipe 3: Coloring Within the Lines

Introduction

Purpose

"BEACON" (BErkeley Atmospheric

Carbon dioxide Observation

Network) is a project of the

University of California Berkeley to

develop and deploy an array of

miniature sensors that continually

monitor and wirelessly transmit

greenhouse gas (GHG)

concentrations to a central data

server.

With a network of closely spaced sensor nodes measuring GHG

concentrations at frequent intervals, GHG levels over a small geographic

region can be monitored continually and in great detail.

Sensors

The sensors and their supporting

electronics and wireless transmitters are

packaged in protective containers that

can be set up practically anywhere with

available electrical power and wireless

Internet access. These sensor "nodes"

are deployed in a grid, covering a

selected geographic area at a desired

spacing.

Each sensor node reports the date

and time, the atmospheric relative

humidity and temperature, and GHG

(primarily CO2) concentration as often

as once each second.

Visualization

Scientific measurements generally

come down to large batches of numbers, and searching for meaningful

relationships between what the numbers represent is often made easier

through visualizations: color-coded text, graphs, and maps, just to mention

some traditional methods—but the ways to visualize changing numbers and

relationships are only limited by the imagination. Shapes, colors, motion,

even sound are also potential modes of conveying numerical information.

Greenhouse Gas Overview

Heat-trapping gases in Earth’s atmosphere, also called "greenhouse gases"
(GHGs) for how they collect and contain solar energy as heat, like the inside
of a greenhouse, keep our planet warmer than it would be without them.
Without these gases (carbon dioxide, methane, ozone, water vapor, and
others), Earth’s surface would be on average about 60 degrees Fahrenheit
(roughly 33 degrees Celsius) colder than it is, and our planet would be a
frozen ball. We can thank the presence of these heat-trapping gases for
maintaining a livable environment.

Human industrial activity over the last century and a half has added large

amounts of GHGs to the atmosphere, mainly carbon dioxide from the

burning of fossil fuels like coal, oil, and natural gas. This has increased the

atmosphere’s ability to trap and store heat. As a result, the global average

temperature has been steadily rising. It’s a bit like having too many

blankets on your bed: the added insulation traps more heat and can make

you uncomfortably hot.

This change in global climate has had wide ranging effects around the

planet, including rising sea levels as glaciers and land-based ice sheets melt,

shifts in local climates that force plant and animal species to migrate or

become extinct, the spread of insect-borne disease into new areas, increases

in severe weather events such as droughts, hurricanes, flooding, and

tornadoes, and a change in the chemistry of the oceans that has serious

impacts on sea life.

Human response to controlling our own GHG emissions has been slow,

inconsistent, and fraught with challenges. Coming up with "clean"

alternatives to meet our transportation, energy generation, manufacturing

and agricultural needs is not simple. While scientists and engineers tackle

the problems of achieving cleaner, "greener" ways to supply our society with

the energy we need, the status quo of over a century of fossil fuel burning has

continued as the cheapest means.

How can we know if our efforts achieve the desired results: a slowing or

reduction in average GHG levels?

We can sample the atmosphere at different locations around the Earth, as

we have been for decades. Because air circulation constantly mixes

atmospheric gases, sampling the air in a few places around the Earth can tell

us how the overall, or average, gas concentrations change—but this doesn’t

tell us anything about the actual sources of gas emissions, just the global

result of their output.

To understand the levels of emission from specific sources, and how changes

in their output actually match up with efforts to reduce emissions or

government mandates to do so, GHG concentrations must be monitored with

adequate resolution, both in space and time, to detect local changes.

What can be done with the BEACON sensor data?

The BEACON GHG sensor nodes were created with a very specific

investigation in mind: to continually monitor atmospheric carbon dioxide

concentrations at specific locations across a small geographic region.

Combined with measurements of related atmospheric conditions, geography,

surrounding urban environment, time of day, time of week, and season of

the year, the landscape of raw numbers produced by the sensor grid

becomes a field of exploration to investigate. As with any exploration, start

by asking questions….

How do CO2 levels vary from place to place?

How do CO2 levels vary at different times of the day, week, or year, and why?

What are the sources of CO2 within the sensor grid?

What factors may cause variations in CO2 levels, from place to place or time to time?

Are there correlations in CO2 levels with factors like weather, time of day, traffic, weekday versus weekend?

Sources of Carbon Dioxide in the San Francisco Bay Area

Sources of human-generated CO2 in the San Francisco Bay Area include

cars, factories, oil refineries, and possibly landfills. Another potential source

that is indirectly related to human activity is fire: forest fires, structure

fires, even residential chimney output.

What about CO2 "sinks," the places and processes that remove CO2 from the

atmosphere? How might sinks affect the CO2 levels measured by the

BEACON sensor nodes? Forests are CO2 sinks; during photosynthesis, CO2 is

converted into oxygen and plant sugars, and so is effectively removed from

the atmosphere. How large an effect on local CO2 levels might forests have?

How would the effectiveness of a forest CO2 sink change with time of day

and time of year? How might we search for signs of the effect in the BEACON

sensor node data?

Is the ocean or the bay a CO2 sink?

What else might affect local CO2 concentrations? Spare the Air days? Wind

patterns? Other weather conditions, like temperature and humidity?

All of these—the sources, the sinks, the patterns of air flow—are what you

will be hunting for as you explore the BEACON sensor grid data. I wonder

what we’ll find….

Using Excel To Crunch Numbers and Make Graphs

Raw numerical data may be viewed, browsed, and "crunched" in a number

of ways. We are going to handle the basic number management and

crunching using Microsoft Excel. Familiarity with using Excel or a similar

spreadsheet program is an advantage, but this section of the guide offers a

basic introduction and how-to manual.

Importing to Excel

First, download a set of data from the data access website. For the purposes

of this practice session, select and download exactly one day’s worth of

data—say, from midnight on one day all the way around to midnight again.

Import the selected data to Excel by selecting Open and browsing for the

raw data file you have downloaded. Excel should load your columns of data

into individual columns in the spreadsheet.
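
If you would rather use a script than a spreadsheet, the same import step can be sketched in Python. This is only a minimal sketch: it assumes the downloaded file is comma-separated text with a header row, and the header names used here are hypothetical stand-ins (the real download may be laid out differently). The two data rows are copied from the sample data in this guide.

```python
import csv
import io

# Stand-in for a downloaded file. The comma-separated layout and the
# header names are assumptions, not the documented BEACON export format.
SAMPLE = """Time,Relative Humidity (%),Temperature (C),Final-CO2 (ppm)
5/20/2011 17:30,57.82,18.96,488.52
5/20/2011 17:32,56.3,19.29,492.1
"""


def load_rows(text_file):
    """Read every data row into a dict keyed by its column header."""
    return list(csv.DictReader(text_file))


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    # Values arrive as text, so convert before doing any arithmetic.
    print(row["Time"], float(row["Final-CO2 (ppm)"]))
```

To read a real file instead of the built-in sample, pass an opened file, for example `open("mydata.csv", newline="")` where `mydata.csv` is whatever name you saved the download under.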

Examining the Data

Take a look at what you have imported into Excel. Become familiar with the

format of the data, the time range covered, the time interval, and what the

actual measurements look like, at a glance.

Here is a sample portion of the data you might be looking at:

Time             Relative      Temperature  Raw-CO2  Calibrated-CO2  Slope-CO2  Final-CO2
                 Humidity (%)  (°C)         (ppm)    (ppm)           (ppm)      (ppm)
5/20/2011 17:30  57.82         18.96        402      596.0074933     488.52     488.52
5/20/2011 17:32  56.3          19.29        405      599.5199625     492.1      492.1
5/20/2011 17:34  55.49         19.43        400      593.6658471     486.31     486.31
5/20/2011 17:36  55.8          19.43        406      600.6907856     493.4      493.4
5/20/2011 17:38  55.02         19.84        401      594.8366702     487.62     487.62
5/20/2011 17:40  53.93         20.18        407      601.8616087     494.71     494.71
5/20/2011 17:42  53            20.47        406      600.6907856     493.6      493.6
5/20/2011 17:44  52.06         20.76        407      601.8616087     494.84     494.84
5/20/2011 17:46  51.11         20.97        402      596.0074933     489.05     489.05
5/20/2011 17:48  50.72         21.23        396      588.9825547     482.09     482.09
5/20/2011 17:50  50.49         21.18        397      590.1533778     483.33     483.33
5/20/2011 17:52  49.94         21.4         403      597.1783164     490.42     490.42
5/20/2011 17:54  49.81         21.48        406      600.6907856     494        494
5/20/2011 17:56  49.51         21.56        400      593.6658471     487.04     487.04

Your file will contain more than a dozen or so rows. Depending on the

frequency of the data and the range you select, of course, there may be

thousands of rows.

There may also be data parameters that you are not interested in—such as

the CO2 columns labeled "raw," "calibrated," and "slope." In this example,

these are the raw sensor measurements and intermediate numbers

calculated in a process of data calibration that leads to the "Final CO2" value,

which is the data we are interested in. (These intermediate numbers are a

bit like the steps you write down in a mathematical problem where your

teacher asks you to show your work as well as the final result.)

Ask yourself if you can detect any relationships or dependencies between the

data parameters. When one parameter (say, relative humidity) goes up or

down, is there a corresponding behavior in another parameter—say,

temperature, or time, or CO2 concentration?

Selecting What You Want

While you can certainly work with the huge table of raw data that you

acquire, you will find it handy and sometimes necessary to copy out only the

data values and ranges that you are interested in and paste them into a

blank spreadsheet, leaving the raw data file untouched and unchanged, and

also eliminating columns or rows that you will never use (and so are just in

your way!).

In Excel, you can select specific ranges (columns or rows) by holding down

the CONTROL key and dragging the mouse across the cells that you want to

select. If you keep the CONTROL key pressed, you can select additional cells

elsewhere in the table for copying.

For example, let’s say I only want the first 5 rows of data in the table above,

and only the DATE/TIME, TEMPERATURE, and FINAL CO2.

First select the first five cells in each of those columns by holding down

the CONTROL key and dragging with the mouse, one column at a time:

[Table: the same sample data shown earlier, with the header cell and first five cells of the Time, Temperature, and Final-CO2 columns selected]

Notice that I selected the label header cell at the top of each column as well

as the first 5 cells; you don’t have to do this, but it’s a good idea to keep

them for later reference.

Next, COPY the cells.

Open a new, blank spreadsheet, click in a cell where you want to place

the data, and PASTE.

This is what you should get:

Time             Temperature (°C)  Final-CO2 (ppm)
5/20/2011 17:30  18.96             488.52
5/20/2011 17:32  19.29             492.1
5/20/2011 17:34  19.43             486.31
5/20/2011 17:36  19.43             493.4
5/20/2011 17:38  19.84             487.62
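
The copy-and-paste selection above can also be sketched in code. The following is a rough Python illustration, not part of the Excel procedure: the helper name `select` and the way the table is stored as a list of rows are my own assumptions, while the numbers are the first six rows of the sample data. As in the spreadsheet version, the original table is left untouched.

```python
HEADER = ["Time", "Relative Humidity (%)", "Temperature (C)", "Raw-CO2 (ppm)",
          "Calibrated-CO2 (ppm)", "Slope-CO2 (ppm)", "Final-CO2 (ppm)"]

# First six rows of the sample data in this guide.
TABLE = [
    ["5/20/2011 17:30", 57.82, 18.96, 402, 596.0074933, 488.52, 488.52],
    ["5/20/2011 17:32", 56.3, 19.29, 405, 599.5199625, 492.1, 492.1],
    ["5/20/2011 17:34", 55.49, 19.43, 400, 593.6658471, 486.31, 486.31],
    ["5/20/2011 17:36", 55.8, 19.43, 406, 600.6907856, 493.4, 493.4],
    ["5/20/2011 17:38", 55.02, 19.84, 401, 594.8366702, 487.62, 487.62],
    ["5/20/2011 17:40", 53.93, 20.18, 407, 601.8616087, 494.71, 494.71],
]


def select(header, table, wanted, n_rows):
    """Copy the first n_rows rows, keeping only the wanted columns."""
    positions = [header.index(name) for name in wanted]
    copied = [[row[i] for i in positions] for row in table[:n_rows]]
    return list(wanted), copied


wanted = ["Time", "Temperature (C)", "Final-CO2 (ppm)"]
new_header, subset = select(HEADER, TABLE, wanted, 5)
for row in subset:
    print(row)
```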

Mins, Maxes, Averages

Let’s get some practice with Excel mathematical functions.

What are the minimum, maximum, and average relative humidity on the day

you have selected?

To get the minimum in the range, use the formula "MIN()". In an empty

cell—preferably at the bottom of the column of numbers you are working

on—type:

=MIN(cell range)

Note: always start a function with an "=" sign to tell Excel you’re typing a

mathematical function for it to compute.

"Cell range" is typed as the first and last cells in the range separated by a

colon. For example, if the numbers we want to specify start in column B,

row 2 (cell B2) and end at cell B500, then the MIN() function should be

typed like this:

=MIN(B2:B500)

Once you have typed in the formula and hit ENTER, the formula will be

replaced with the number it has calculated.

Next, compute the maximum value in the range, and the average of the

range. The formulae are (assuming the same range of numbers):

=MAX(B2:B500)

=AVERAGE(B2:B500)

Now that you really know what you’re doing, go ahead and calculate the

minimum, maximum, and average for temperature and CO2 concentration.
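
For comparison, here is how the same three calculations might look in Python. This is just a sketch for readers curious about scripting; the list holds the relative humidity values from the first five rows of the sample data, and the cell ranges in the comments are illustrative.

```python
from statistics import mean

# Relative humidity (%) from the first five rows of the sample data.
humidity = [57.82, 56.3, 55.49, 55.8, 55.02]

lowest = min(humidity)    # plays the role of =MIN(B2:B6)
highest = max(humidity)   # plays the role of =MAX(B2:B6)
average = mean(humidity)  # plays the role of =AVERAGE(B2:B6)

print("min:", lowest, "max:", highest, "average:", round(average, 2))
```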

Summarize all your findings in this table.

Measurement            Minimum   Maximum   Average
Relative Humidity (%)  _______   _______   _______
Temperature (°C)       _______   _______   _______
CO2 (ppm)              _______   _______   _______

Any surprises? Anything look strange? Do all of the numbers look realistic? I

ask this because you may have detected a number that is not realistic. For

example, you might have found a minimum of something like "-999" in the

CO2 concentration data, and of course a negative number for this is

meaningless.

If There Are Unrealistic Numbers In Your Data

If you did find a non-realistic number using the Excel functions, first scan

through the actual data to find where the offending number is found.

You have just been introduced to one of the ways that scientists, or

computer programmers, signify a data point for which no data exists. It

may be that the CO2 sensor wasn’t working at the time the data point was

sampled, and so there is no measurement for CO2, even though the other

sensors (humidity, temperature) were functioning and reporting data.

You might ask, why not just enter a "0" in cases where no data was taken?

The answer to that is that 0 isn’t a good choice as a "stand-in" data point (or

space filler) because 0 may be a realistic value for that data parameter.

After all, it’s possible to measure 0 parts per million for CO2 concentration,

however unlikely. Likewise, it’s possible to measure 0% relative humidity

and 0 degrees Celsius temperature. Entering a zero to fill in the spot for

missing data could be misleading; how would you know it’s a "no data" filler

and not just an unusual data point?

A wildly unrealistic number, like -999, is used because it is clearly

unrealistic—not only for CO2 concentration, but for relative humidity and

temperature. -999 is not a possible real value for any of those parameters.

Now that you may have detected that there are non-real numbers in your

data that are simply fillers, how do we deal with them in terms of calculating

the average value? Even though -999 is physically unrealistic as a CO2

concentration, it’s still a real number, and when you calculate the average of

the range, Excel includes it in the average. How do you calculate the

average of a range while at the same time telling Excel to ignore the

unrealistic values?

Answer: Use the AVERAGEIF function.

This function averages the numbers in the given range if they meet a

specified condition. In this case, to ignore the unrealistic value of -999 and

average only the real data values, the function might average the numbers

that are greater than -999, assuming that only -999 has been chosen to

stand as the no-data filler.

To be safe, if possible we should set the qualifier at the boundary separating the

realistic from the non-realistic values. For CO2 concentration, any value

greater than or equal to 0 is realistic, and anything less than 0 is not.

The function, AVERAGEIF(RANGE,"QUALIFIER"), becomes:

=AVERAGEIF(B2:B500,">=0")

Note that the range and qualifier are separated by a comma, and the

qualifier is in double-quotes. The qualifier in this case, >=, is how you type

"greater than or equal to" in Excel. So, this formula calculates the average

of all the numbers in the range B2 through B500 that are greater than or

equal to 0.
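
To see how much a single filler value can distort an average, here is a small Python sketch of the same idea. The helper `average_if` is my own illustration of conditional averaging, not Excel's implementation. The CO2 values come from the sample data, and the one -999 entry is a hypothetical gap inserted for demonstration.

```python
NO_DATA = -999  # the no-data filler described in this guide


def average_if(values, keep):
    """Average only the values for which keep(value) is true."""
    kept = [v for v in values if keep(v)]
    if not kept:
        raise ValueError("no values satisfy the condition")
    return sum(kept) / len(kept)


# Final-CO2 (ppm) from the sample data, with one hypothetical filler
# inserted to stand for a moment when the sensor reported nothing.
co2 = [488.52, 492.1, NO_DATA, 486.31, 493.4]

naive = sum(co2) / len(co2)                 # filler included by mistake
clean = average_if(co2, lambda v: v >= 0)   # same test as ">=0"

print("with filler:", round(naive, 2))
print("filler ignored:", round(clean, 2))
```

With the filler included, the average drops far below any value the sensor actually reported, which is exactly the kind of result that should make you suspicious.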

Question: What qualifiers would you choose to compute an AVERAGEIF for

relative humidity and temperature (Celsius), assuming that -999 again is the

no-data filler value?

A Simple Graph

Looking at and crunching numbers is a lot of fun, but if you’re a visually-

oriented person like me, those numbers start to look much friendlier when

they are graphed. Graphing numbers is the first step to visualizing

relationships between data points and parameters. Let’s get started….

A quick way to make a graph with your data is simply to select the data with

the mouse, choose Insert > Chart, and finally choose the type of chart you

want. You might choose a bar graph, a line graph, a scatter graph, or

another, depending on how you want to visualize the data. I recommend

that you start with a scatter graph and see where that takes you.

Here’s the less quick, but more selective way to make a simple graph. For this
example, assume that you have data in two columns of your spreadsheet: the
date/time in column A and the temperature in column B, as shown here:

     A                B
1    Date/Time        Temperature(°C)
2    5/20/2011 17:30  16.74
3    5/20/2011 17:32  16.46
4    5/20/2011 17:34  16.54
5    5/20/2011 17:36  16.56
6    5/20/2011 17:38  16.63
7    5/20/2011 17:40  16.5
8    5/20/2011 17:42  16.31
9    5/20/2011 17:44  16.21
10   5/20/2011 17:46  16.24

What to do:

Choose Insert, Chart, Scatter Graph; a blank chart should appear.

Right-click somewhere in the blank chart, and in the menu that appears,
choose Select Data.

A dialog box will appear:

For starters, enter the range of the data you want to graph in the
"Chart data range" box. If you don’t know the formula that Excel expects
here, then simply click on the button to the right of that field.

A box titled "Select data source" will appear, and at this point all you
need to do is click and drag the mouse to select the data you want to graph
(in this case, temperature from cell B2 to B10 in the example table above).
After selecting the cells, click the button to the right of the Select Data
Source field again. Notice that as you drag across the data, Excel fills in
the data range formula for you.

You should now have a quick and simple graph of the selected data. But

we want to graph the temperature versus date/time, so we need to define

the X-axis series:

Return to the Select Data Source dialog box; you should now see the

data series you entered in the left-hand list (under Legend Entries). It

will probably have a default name, like Series1.

Select that series, then click Edit. The "Edit Series" dialog box will

appear.

The Series Y Values formula should already be filled in, since you

already defined the data series.

Here, you can give the data series a name, if you like, something

more interesting than Series1. Type a name for the series into the

Series Name field, such as "Temperature."

Now, define the X-axis values. Click the data range selection button to

the right of the empty Series X Values field, then click and drag the

mouse to select the appropriate cells—in our example, this would be

Date/Time, or cells A2 through A10. Click the selection button again.

Now your graph should be in good shape.

If you’re having trouble reading the date and time because those strings

are overlapping and blurred together, you can change the format of the

X-axis labels by right-clicking on the X-axis, choosing Format Axis, and

selecting Alignment. In the dialog box you can play with label directions

and custom angles—play around with that until you have what you like.
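
Excel recognizes the date/time cells as time values for you. In a script, the X-axis values first have to be parsed from text. The sketch below is a rough Python illustration of that step; the format string is my assumption about how the date/time text is written, based on the sample rows in this guide.

```python
from datetime import datetime

# Assumed layout of the date/time text, e.g. "5/20/2011 17:30".
TIME_FORMAT = "%m/%d/%Y %H:%M"

labels = ["5/20/2011 17:30", "5/20/2011 17:32", "5/20/2011 17:34"]

# Turn each text label into a real date/time value.
times = [datetime.strptime(label, TIME_FORMAT) for label in labels]

# Minutes elapsed since the first sample: a compact, non-overlapping
# alternative to printing the full date and time on the X-axis.
minutes = [(t - times[0]).total_seconds() / 60 for t in times]

for label, elapsed in zip(labels, minutes):
    print(label, "->", elapsed, "minutes")
```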

Two-Data Graph

To compare two different data series you could simply produce two separate

one-data-series graphs and compare them side by side. But Excel also lets you

graph two different series of data with common X-values on the same graph,

each with its own Y-axis.

Here’s how to set that up in Excel. In this example, we’ll graph both the

temperature and relative humidity measured over the same date/time

range.

     A                B                C
1    Date/Time        Temperature(°C)  Relative Humidity(%)
2    5/20/2011 17:30  16.74            66.58
3    5/20/2011 17:32  16.46            66.16
4    5/20/2011 17:34  16.54            67.06
5    5/20/2011 17:36  16.56            66.43
6    5/20/2011 17:38  16.63            66.43
7    5/20/2011 17:40  16.5             66.64
8    5/20/2011 17:42  16.31            67.71
9    5/20/2011 17:44  16.21            70
10   5/20/2011 17:46  16.24            68.61

What to do:

Start by making the simple one-data graph of temperature versus

date/time as you did in the previous example.

Once that is done, right-click somewhere in the graph and choose

Select Data. The Select Data Source box will appear, and you should

see the Temperature data series that you created before.

Choose Add. The Edit Series dialog will appear; repeat the procedure

you did before to define the name, X-values, and Y-values for the data

series that you are adding to the graph. In this example, you will

select cells C2 through C10, relative humidity, for the Y-axis values,

and again select A2 through A10 for the X-axis values.

When you’re done, click OK.

You will see the data series that you just added on the list under Legend

Entries. Click OK.

Now, you will find the second data series plotted on the same graph as

the first. But both series are plotted on the same Y-axis scale, with the

temperatures in Celsius and the relative humidity in percent, placing the

temperature far down near the bottom of the scale and the humidity up

high.

To put the second data series on its own axis:

Right-click on the graphed line for the second data series (relative

humidity), then select Format Data Series from the menu that pops

up.

Under Series Options, select Secondary Axis, then Close.

Now the graph should be a bit more readable, with separate Y-axis scales

for each data series on the left and right sides of the graph.
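
Why does a second axis help so much? A quick calculation with the example numbers makes the point. This Python sketch (my own illustration, using the nine temperature and humidity values from the table above) estimates what fraction of a shared Y-axis each series would occupy if the axis ran only from the smallest to the largest value plotted.

```python
# Values from the example table (rows 2 through 10).
temperature = [16.74, 16.46, 16.54, 16.56, 16.63, 16.5, 16.31, 16.21, 16.24]
humidity = [66.58, 66.16, 67.06, 66.43, 66.43, 66.64, 67.71, 70, 68.61]


def span(values):
    """Distance between the largest and smallest value."""
    return max(values) - min(values)


# One axis that must cover both series at once.
shared_span = span(temperature + humidity)

temp_share = span(temperature) / shared_span
humidity_share = span(humidity) / shared_span

print("temperature uses", round(100 * temp_share, 1), "% of the shared axis")
print("humidity uses", round(100 * humidity_share, 1), "% of the shared axis")
```

Both series end up squeezed into a thin band of the shared scale, so their wiggles are nearly invisible. Giving each series its own axis lets each one fill the height of the graph.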

If you want to adjust the Y-axis ranges for either series:

Right-click on that Y-axis, select Format Axis, and under Axis Options,

select "Fixed" for the Minimum and Maximum values, and type in what

values you want them to have.

Finally, under Chart Tools, Layout, you can select and change the

Chart Title and Axis Titles, giving those appropriate names.

When you’re done, you might end up with something like this:

[Figure: "Temperature and Relative Humidity". X-axis: Date and Time. Y-axes: Temperature (Celsius) and Relative Humidity (%). Series: Temperature, Relative Humidity.]

How much time does the graph cover? Do you see any relationships between
the behavior of temperature and humidity?

X Versus Y

So far you’ve been plotting data series (temperature, relative humidity)
versus date/time to see how those data values change with time. But you
can, of course, plot any data series against any other to see relationships
between the two, if any. Let’s practice doing this by plotting relative
humidity (on the Y-axis) versus temperature (on the X-axis).

First, see if you can do this yourself! If you need help, follow the same
instructions for the One-Data graph that you did before, but choose relative
humidity for the Y-axis and temperature for the X-axis.

After creating appropriate chart and axis labels, you might end up with
something like this:

[Figure: "Relative Humidity versus Temperature". X-axis: Temperature (Celsius). Y-axis: Relative Humidity.]

What does this graph tell us, if anything? Is there a relationship between the

behavior of relative humidity and temperature in this data? There might be—

at least, it appears that higher humidity occurs at lower temperatures.

There may not be enough data in this sample to suspect a relationship—so

maybe what we need is to plot more data, or from several different times,

and see how the picture takes shape….
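If you are comfortable with a little programming, one way to put a number on "is there a relationship?" is the Pearson correlation coefficient. The following is only a minimal Python sketch: it uses the five Botanical Gardens readings from the sample table later in this guide, the function name `pearson_r` is made up for illustration, and five points are far too few to draw any conclusion.

```python
import math

# Five temperature (Celsius) / relative humidity (%) pairs copied from the
# Botanical Gardens rows of the sample table later in this guide.
temperature = [16.74, 16.46, 16.54, 16.56, 16.63]
humidity = [66.58, 66.16, 67.06, 66.43, 66.43]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and the two sums of squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, humidity)
# Near -1: humidity falls as temperature rises; near +1: both rise together;
# near 0: no clear straight-line relationship.
print(f"Pearson r = {r:.2f}")
```

Running the same function on a full day of data, or on several different days, is one way to follow the suggestion above and see how the picture takes shape.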

Excel Tips and Tricks Summary

To calculate the average of a range of numbers in Excel, choose an

empty cell and type “=AVERAGE(range)”. For example, to calculate

the average of the numbers in column B, rows 5 through 100, type

“=AVERAGE(b5:b100)”.

“MAX()” and “MIN()” are two useful functions, to calculate the

maximum and minimum of a range of numbers. For example,

“=MAX(b5:b100)”.

To jump to the top or bottom of a column of numbers, hold down the

CONTROL key and press the up or down arrow.

To quickly select a large column of numbers, click in the topmost cell

of the number range and press “SHIFT-CONTROL-DOWN ARROW”.

To select two (or more) columns of numbers (such as to make a

graph), select the first column normally by clicking and dragging from

top to bottom, then while keeping the first column selected, hold down

the CONTROL key and select the second column by clicking and dragging.

[Figure: X-Y scatter graph “Relative Humidity versus Temperature” from the X Versus Y exercise, with Relative Humidity on the Y-axis and Temperature (Celsius) on the X-axis]

To create a graph, after having selected one or more ranges of

numbers, select INSERT, then choose the type of chart you want to

make.
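For comparison, the AVERAGE, MAX, and MIN functions from the tips above have direct counterparts in most programming languages. Here is a small Python sketch; the list `values` is just a stand-in for a range of cells such as B5 through B9, filled with five humidity readings from this guide's sample table.

```python
# Stand-in for the numbers in a range of cells such as B5:B9
values = [66.58, 66.16, 67.06, 66.43, 66.43]

average = sum(values) / len(values)   # like =AVERAGE(B5:B9)
largest = max(values)                 # like =MAX(B5:B9)
smallest = min(values)                # like =MIN(B5:B9)

print(average, largest, smallest)
```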

Different Ways to Graph Data Sets

Just to shake things up a bit, let’s make a graph in a different style than the

conventional X-Y scatter graph.

Here’s the assignment: Graph one day (24 hours) of CO2 concentration data

for two or more sensor sites using a radial graph.

Here’s how to do it:

Prepare a data table for each sensor site you are graphing, containing

columns for date/time and final CO2 data, including headers for each

column. For the purposes of making a single graph of the data from

multiple data collection sites, it will be easier to copy each table of

data to the same blank spreadsheet (for example, side by side or one

below the other).

For each data table, copy and paste one row of data for each hour of

the day, so that you have 24 rows of data in each table. Here are a

few of the top rows for one such table:

Berkeley Botanical Garden

Time               Final-CO2 (ppm)
5/20/2011 17:30    483.1882
5/20/2011 18:30    478.1499
5/20/2011 19:30    483.1882
…                  …

For this example I have chosen to create data tables from 5 different sensor

node sites, so at this point I have five tables, each containing 24 rows (not

including the column headers). Also, to keep the example simple, I merely

copied one data point from each hour of the day—17:30, 18:30, 19:30,

etc.—and ignored the data from the rest of each hour. One could also take

the average of the data for each hour and use that instead…
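The hourly-average alternative mentioned above could be sketched in Python roughly as follows. This is an illustration under assumptions, not part of the BEACON tools: the function name `hourly_averages` is made up, readings are grouped by clock hour (17:00 to 17:59, and so on), and the sample values are the Botanical Gardens readings quoted in this guide.

```python
from collections import defaultdict
from datetime import datetime

# (date/time, Final-CO2 in ppm) readings quoted in this guide for the
# Botanical Gardens site: five from the 17:00 hour and one from 18:30.
readings = [
    ("5/20/2011 17:30", 483.1882306),
    ("5/20/2011 17:32", 479.1575977),
    ("5/20/2011 17:34", 479.1575977),
    ("5/20/2011 17:36", 479.1575977),
    ("5/20/2011 17:38", 481.1729141),
    ("5/20/2011 18:30", 478.1499),
]


def hourly_averages(rows):
    """Group readings by clock hour and return {start_of_hour: mean value}."""
    buckets = defaultdict(list)
    for stamp, value in rows:
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M")
        hour_start = when.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


averages = hourly_averages(readings)
for hour, mean in averages.items():
    print(hour.strftime("%m/%d/%Y %H:00"), round(mean, 2))
```

Each hour then contributes one averaged value to the table instead of a single hand-picked reading.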

To create the radial graph (or “radar” graph), begin the same way you

created the one-data X-Y scatter graph: select the date/time and final

CO2 columns from the table of data for the first sensor node site.

Insert a Chart of type “radar”—choose the style that shows both

plotted lines and data points.

This should create a radial graph: a wheel with 24 “spokes”. See the final

graph below. The date/time “axis” is the perimeter of the circle, starting

from the top position and going around the circle clockwise, like the time on

an analog clock dial. In fact, since we have selected one data point per hour

for 24 hours, the radial graph should read like a 24-hour clock dial!

The CO2 data is plotted around the date/time circle, the “vertical” scale

being the radial spokes of the wheel, the center of the wheel being one limit

of the range and the perimeter of the circle being the other limit.
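To make the geometry of the radial graph concrete, here is a minimal Python sketch of where one data point lands. It assumes the layout described above (first point at the top, later points clockwise, 24 points per turn); the function name `radar_point` and the 460 ppm value are made up for illustration, and Excel's own drawing code may differ in its details.

```python
import math


def radar_point(hour_index, value, axis_min, axis_max, points_per_turn=24):
    """Return (x, y) for one reading on a radial graph.

    hour_index 0 sits at the top; later points go clockwise.
    axis_min maps to the center, axis_max to the perimeter (radius 1).
    """
    angle = 2 * math.pi * hour_index / points_per_turn  # measured clockwise from the top
    radius = (value - axis_min) / (axis_max - axis_min)
    return radius * math.sin(angle), radius * math.cos(angle)


# Radial scale of 400 to 520 ppm, as in this guide's example graph.
x0, y0 = radar_point(0, 483.1882, 400, 520)  # first hour: straight up from the center
x6, y6 = radar_point(6, 460.0, 400, 520)     # made-up value six hours later: points right
print(round(x0, 3), round(y0, 3), round(x6, 3), round(y6, 3))
```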

To add the series of data from a second table, follow the same steps

as when you added a second series of data to the X-Y scatter graph:

right-click on the graph and choose SELECT DATA. The dialog box that

appears will not only let you add another data series, but you can edit

the one you already created to give it a name—the name of the sensor

node site.

Once you have added the second series, that one too will be plotted on

your graph, around the circle.

Add each additional data series in the same manner.

Here’s the graph I created from data from 5 different sensor node sites:

Where does the data time series begin—where on the graph, and at

what date/time? Where and when does it end?

What is the range of the radial scale—the scale that the CO2 is plotted

to?

What does this graph tell you about the CO2 concentrations at different

times of the day at different sites?

The same data can, of course, be graphed in an X-Y scatter graph. That

would look something like this:

[Figure: the radial graph described above, titled “CO2 Concentration”, showing the five sensor sites (Berkeley Botanical, Downtown Berkeley, MSRI-SSL, Cory Hall, VLSB), with hourly date/times from 5/20/11 17:30 to 5/21/11 16:30 around the perimeter and a radial scale of 400 to 520]

Why choose the radial graph instead of the scatter graph? Does one of them

make it easier to see certain relationships in the data than the other?

Maybe, maybe not—that may depend on what relationship you’re looking

for. Or, it may come down to a personal preference on the style of data

visualization.

I will mention two reasons I might choose the radial graph for this particular

set of data.

1. The circular form is suggestive of a repeating cycle, as opposed to the

ongoing linear time shown by the X-Y scatter graph. If the period of

time you are graphing is a natural cycle, like a day or a year, then the

plotted data ends up in the same part of the cycle that it began.

2. Another thing that this radial graph does is to emphasize the larger

values of CO2 concentration: the trace for the Downtown Berkeley set,

which has the highest values of the bunch, is the biggest, while that

for the VLSB site—the lowest values—is smallest.
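The second point can be checked with a little arithmetic: the area enclosed by a trace grows with the square of its distance from the center, so larger values look disproportionately big. Here is a hedged Python sketch using two made-up sites with constant readings; the function name `radar_area` is hypothetical, and the result depends on where the radial scale starts (400 ppm here, as in the example graph).

```python
import math


def radar_area(values, axis_min):
    """Area enclosed by a radial-graph trace with equally spaced spokes (shoelace formula)."""
    n = len(values)
    # Convert each value to an (x, y) point; the radius is the amount above axis_min
    points = [
        ((v - axis_min) * math.sin(2 * math.pi * i / n),
         (v - axis_min) * math.cos(2 * math.pi * i / n))
        for i, v in enumerate(values)
    ]
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )
    return abs(twice_area) / 2


# Two made-up sites with constant readings of 500 and 450 ppm over 24 hours,
# drawn on a radial scale that starts at 400 ppm.
high_site = radar_area([500.0] * 24, axis_min=400.0)
low_site = radar_area([450.0] * 24, axis_min=400.0)
area_ratio = high_site / low_site
print(area_ratio)  # compare with the radius ratio of 100 / 50 = 2
```

In this made-up case the higher trace sits only twice as far from the center, yet it encloses about four times the area.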

[Figure: X-Y scatter graph of the same data for the five sites (Botanical Garden, Downtown Berkeley, MSRI-SSL, Cory Hall, VLSB), with a Y-axis scale of 400 to 520 and date/time from 5/20/11 14:24 to 5/21/11 19:12 on the X-axis]

There are other types of graphs you might have fun trying out on your data.

Some won’t make sense to use, but others may allow you to show aspects of

the data that are not easily revealed by “conventional” graphs. Try some!

Using Google Earth to locate BEACON sensors and identify potential CO2

sources and sinks

Introduction

The data produced by the greenhouse gas sensor network is

“geo-referenced”—meaning, it comes along with the geographic coordinates

where the sensor is located. As with any geo-referenced data, it can be

represented, in one form or another, on a map.

For pinpointing sensor node locations and exploring the surrounding

environment, my tool of choice is Google Earth. If you haven’t used Google

Earth, don’t worry; it’s fairly easy to use. It’s also free to download at

www.google.com/earth.

I won’t go through how to use Google Earth in great detail—you can learn a

lot just by playing around with it. Here are a few things for you to try that

will move you on your way to becoming a Google Earth Guru:

Double-click on the Earth globe—anywhere you like. What happens?

Enter a place you want to find in the Search pane on the left. You can

enter a street address, the name of a famous place, or even the

latitude and longitude coordinates of a place. Then click the search

button (magnifying glass). You will get a list of places Google Earth

thinks might be what you’re looking for. Double click the best choice.

The Layers pane contains sets of geo-referenced data that can be

turned on and turned off as needed; spend some time exploring what

is here, and what happens when you turn them on. Some data layers

are “real-time”, like Weather and Traffic.

If you want to save a location you have found, right-click on it and

select Save To My Places. That place will be moved into the Places

pane under My Places. As you build a library of favorite places, you

can organize them by creating folders to put them in, by category or

whatever criterion you desire.

Play with the navigation controls at the upper right corner of the

screen. With them you can rotate, move, and zoom your view. The

only way to learn how to use them is to just jump in and start using

them….

Pinpointing a Sensor Node

Now let’s do something practical using Google Earth. Let us say you have

downloaded the CO2 data from one of the BEACON sensor nodes and want to

place it on the map, and find out what’s going on in its vicinity.

Exercise: Pin a Sensor Node to a Map

Given that the geographic coordinates of this sensor are 37.832133° N

latitude and 122.255525° W longitude, let’s go hunting.

In the Search pane, under Fly To, enter the coordinates in the search

box: 37.832133 N 122.255525 W and click the search button.

Double-click on the found location.

Since you entered an unambiguous geographic coordinate, that’s all that

should show up in the list of found places (whereas if you enter

something ambiguous, like “Wal-Mart”, you’ll get a long list of all

suspected Wal-Mart stores).
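Coordinates show up in two styles in this guide: with hemisphere letters, as in the search box above, and as signed numbers, as in the sample data table later on (where west longitudes are negative). A small Python sketch of the conversion follows; the function name `to_signed_degrees` is made up, and it assumes exactly the "value letter value letter" layout used above.

```python
def to_signed_degrees(text):
    """Convert a string like '37.832133 N 122.255525 W' to (latitude, longitude).

    South latitudes and west longitudes come out negative.
    """
    lat_value, lat_hemi, lon_value, lon_hemi = text.split()
    latitude = float(lat_value) * (-1 if lat_hemi.upper() == "S" else 1)
    longitude = float(lon_value) * (-1 if lon_hemi.upper() == "W" else 1)
    return latitude, longitude


latitude, longitude = to_signed_degrees("37.832133 N 122.255525 W")
print(latitude, longitude)
```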

What happened? Where are we?

If you zoom in far enough—and assuming you entered the coordinates

correctly—you should find yourself looking at a sports field somewhere in

Oakland. Look around—what else can you identify? Buildings? Tennis

courts? Swimming pools? Parking lots? Streets?

To help identify where we are, let’s turn on a data layer that might be able

to tell us:

In the Layers pane, check the box for Places. Wait a moment while

Google Earth finds things and adds them to the map.

You should see some icons appear. You can hover the mouse over an

icon to view its label, or click on the icon to call up an information

bubble. See if you can identify where this sports field is located.

Now that you know where the sensor node is located, let’s stick a thumbtack

in the map so that we can find it easily the next time we need to:

Click on the “thumbtack” button found in the top menu icon bar. This

brings up a thumbtack and sticks it on the map, and also a properties

window that sets that thumbtack’s properties: location, name, size,

appearance, and others.

While the properties window is open, a box around the thumbtack will

flash; this means the thumbtack is being edited. You can drag the

thumbtack to any spot you desire while you are editing it—so go ahead

and drag it where you want it. Or, you can just enter the coordinates

in the Latitude and Longitude boxes in the properties window. Either

method is fine—but if you already know the sensor node’s exact

geographic coordinates, it’s probably easier just to type them into the

properties window.

Type a name for your thumbtack in the Name field—for this practice,

call it something like “BEACON Sensor Node X”.

If you like, you can change the icon from the yellow thumbtack to

something else. To do this, click on the button to the right of the Name

field and select the icon you want to use.

When you’re done editing the thumbtack, click OK.
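If you have many sensor nodes to pin, the clicking above can get tedious. Google Earth can also open placemarks stored in KML files, so one option is to generate the file with a short program. The Python sketch below is only an illustration: the function name `placemark_kml` is made up, it builds a single placemark, and it prints the text rather than saving it (you would save the output in a file ending in .kml and open that file in Google Earth).

```python
from xml.sax.saxutils import escape


def placemark_kml(name, latitude, longitude):
    """Return the text of a small KML document containing one placemark."""
    # KML lists coordinates in the order longitude,latitude,altitude
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        "    <Point>\n"
        f"      <coordinates>{longitude},{latitude},0</coordinates>\n"
        "    </Point>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )


# Name and coordinates from the exercise above (west longitude is negative).
kml_text = placemark_kml("BEACON Sensor Node X", 37.832133, -122.255525)
print(kml_text)
```

Note the order of the numbers: KML puts longitude first, the opposite of the latitude-then-longitude order typed into the search box.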

Now your thumbtack marker is permanent (until you delete it or edit its

properties). Notice that the marker appears automatically under My Places

in the Places pane. From here, you can move it into another folder you

create, if you like—like a folder called “BEACON Sensor Nodes,” or

something like that.

Explore the node’s surroundings a bit. Zoom in and zoom out, drag the map

around, tilt, rotate—whatever; just explore!

If there’s something you want to investigate up close, you might be able to

use Street View. Most city streets in the United States can be viewed with

Street View. Let’s try it.

If you’re looking at the view above BEACON Sensor Node X as shown

in the picture above, let’s say we want to see what the surroundings

look like from street level—like the street in the lower right corner.

(Which street is that?) All you have to do is click and drag the little

“person” icon that’s part of the navigation controls in the upper right

corner and drop it onto the spot you want to view from. If there is a

Street View image available, you’ll be dropped right into a surrounding

panoramic picture taken from that spot. Try it!

(Hint: While you’re dragging the little person icon around the map, blue

lines will appear over all streets where Street View imagery exists.)

BEACON Sensor Node Site Profile

Google Earth is a great tool for exploring the area around each BEACON

sensor node, in search of possible GHG sources and sinks. Not only can you

see photographic imagery that reveals streets, residential areas, tree

coverage, industrial areas, and nearby bodies of water, the data layers can

provide specific information about features in the area.

Another piece of information Google Earth can provide us regarding a sensor

node site is the altitude of the location. In the main Google Earth image

view, look to the bottom to find the coordinates (latitude and longitude) and

the altitude of the location the mouse cursor is pointing at.

You will explore the regions surrounding sensor node sites in more detail in

one of the investigation exercises.

Visualizing Data With Tableau Software

About Tableau

Tableau data visualization software is a commercial product. It is quite easy

to use to manipulate data and create compelling visualizations. The Tableau

home website is at www.tableausoftware.com. You may take a look at

Tableau by downloading a free 15-day trial version, or using the free on-line

Tableau Public version.

Tableau for Teaching

Though Tableau is commercial software, and requires a purchased license for

most uses, there is a program called Tableau for Teaching in which a limited

term free license may be obtained for use in public secondary schools,

colleges, and universities. For more information on eligibility, terms of use,

and how to qualify for a free TfT classroom license, go to

www.tableausoftware.com/academic.

Advantages and Limitations

The TfT license provides the full-function Tableau desktop software for a

teacher and classroom of students for a single teaching term (quarter or

semester). Its use is limited to in-class teaching instruction and student

project work.

Tableau Public

Tableau Public is a web-based version of the Tableau visualization software.

To get started, go to www.tableausoftware.com/products/public.

Advantages and Limitations

Tableau Public is free to everyone, may be used for an indefinite period, and

has most of the features of the desktop version. Workbooks are stored on a

Tableau web server and not your computer.

The amount of data that may be imported into a single Tableau Public

workbook is limited to 100,000 rows, and the total size of the workbooks

that may be stored on the web server is limited to 50 megabytes. This

should be ample space for most student projects.
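Before importing a large download, it can be worth checking its row count against that limit. Here is a minimal Python sketch; the 100,000 figure is the one quoted in this guide and may have changed since, the function name `count_data_rows` is made up, and the tiny in-memory sample stands in for a real downloaded file.

```python
import csv
import io

ROW_LIMIT = 100_000  # per-workbook row limit quoted in this guide; may have changed


def count_data_rows(csv_file, has_header=True):
    """Count the data rows in an open CSV file, skipping the header row if present."""
    rows = sum(1 for _ in csv.reader(csv_file))
    return max(rows - 1, 0) if has_header else rows


# In-memory stand-in for a downloaded file. With a real file you would pass
# open("your_file.csv", newline="") instead (the file name is hypothetical).
sample = io.StringIO(
    "Time,Final-CO2 (ppm)\n"
    "5/20/2011 17:30,483.1882\n"
    "5/20/2011 18:30,478.1499\n"
)
n_rows = count_data_rows(sample)
print(n_rows, "rows;", "fits" if n_rows <= ROW_LIMIT else "too large")
```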

Basics

Installing

All installation directions can be found on the Tableau website at the links

given earlier, whether you choose the Desktop or Public versions. This

training guide will focus on the free Tableau Public web-based version,

though the desktop version functions very similarly.

Starting Up

Below is a screenshot of the Tableau Public startup page—what you see

when you click on the icon created when you installed Tableau Public. This

is where you create new workbooks or open workbooks you have already

made. The graphical icon labeled “TestBook” in this screenshot is an

example of an existing workbook that I created on my account. Below the

Open Data and workbooks icons section are helpful links to training,

templates, examples and tutorials.

This training will lead you through some of the basic actions and functions

you need to import, work with, and produce visualizations of data. We will

not attempt to cover every menu command and feature Tableau contains;

Tableau’s Help guide is very useful for answering questions we will not

cover.

Preparing Data for Import

You may import data from different sources, such as spreadsheets and

databases. The Desktop version of Tableau can handle many data sources,

while the Public web-based version can only import from “flat” data sources,

like simple spreadsheets and formatted text files. We’ll be working with flat

data files: downloaded csv (“comma-separated values”) files or Excel

spreadsheets that you have put together.

After downloading the raw data, assemble the selected data you want to

work with into a single Excel spreadsheet. The easiest way to do this is to

copy and paste from the source file to a blank spreadsheet. Make sure that

your data contains columns for latitude and longitude, and that you copy

the coordinates for each data collection site into the proper files before

assembling your master file. The master file can contain data from some or

all of the data source sites. When you are done, the final file should contain

continuous columns of data from all desired sites.
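Copy and paste works fine for a few sites. If you have many, the same assembly could be scripted. The Python sketch below is one possible approach, not the guide's required method: the names `SITE_COORDINATES` and `build_master_rows` are made up, the values come from this guide's sample table, and the result is printed as CSV text rather than saved.

```python
import csv
import io

# Site coordinates as listed in this guide's sample table.
SITE_COORDINATES = {
    "Botanical Gardens": (37.87515, -122.23861),
    "Cory Hall": (37.87523, -122.25755),
}


def build_master_rows(site_rows):
    """Combine per-site readings into one list of rows with Name, Latitude, Longitude."""
    master = []
    for site, rows in site_rows.items():
        latitude, longitude = SITE_COORDINATES[site]
        for row in rows:
            master.append({"Name": site, **row, "Latitude": latitude, "Longitude": longitude})
    return master


# One reading per site, taken from the sample table; real files hold many more.
site_rows = {
    "Botanical Gardens": [{"Time": "5/20/2011 17:30", "Final-CO2 (ppm)": 483.1882306}],
    "Cory Hall": [{"Time": "5/20/2011 17:30", "Final-CO2 (ppm)": 463.9747733}],
}
master = build_master_rows(site_rows)

# Write the master table as CSV text (for a real file, use open(..., "w", newline="")).
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(master[0].keys()))
writer.writeheader()
writer.writerows(master)
print(output.getvalue())
```

Every row ends up carrying its own site name and coordinates, which is what lets a mapping tool place each reading.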

The table below is a sample compiled data spreadsheet, with five data points

from each of two different collection sites. Your table, depending on the

scope of your project, may contain thousands of rows from dozens of

collection sites!

Name               Time             Relative Humidity (%)  Temperature (°C)  Final-CO2 (ppm)  Latitude  Longitude
Botanical Gardens  5/20/2011 17:30  66.58                  16.74             483.1882306      37.87515  -122.23861
Botanical Gardens  5/20/2011 17:32  66.16                  16.46             479.1575977      37.87515  -122.23861
Botanical Gardens  5/20/2011 17:34  67.06                  16.54             479.1575977      37.87515  -122.23861
Botanical Gardens  5/20/2011 17:36  66.43                  16.56             479.1575977      37.87515  -122.23861
Botanical Gardens  5/20/2011 17:38  66.43                  16.63             481.1729141      37.87515  -122.23861
Cory Hall          5/20/2011 17:30  54.99                  19.48             463.9747733      37.87523  -122.25755
Cory Hall          5/20/2011 17:32  54.58                  19.66             458.7626394      37.87523  -122.25755
Cory Hall          5/20/2011 17:34  54.37                  19.64             459.8050662      37.87523  -122.25755
Cory Hall          5/20/2011 17:36  54.24                  19.62             458.7626394      37.87523  -122.25755
Cory Hall          5/20/2011 17:38  54.21                  19.58             462.9323465      37.87523  -122.25755
…                  …                …                      …                 …                …         …

Creating a New Workbook

To create a new workbook from the main start page (shown earlier), either

click on the Open Data button, or select from the menu bar “File > New.”

You may also create a new workbook, or open an existing one, from the File

menu of any workbook.

Below is a screenshot of a Tableau workbook. This is what it looks like

“empty,” before you import any data or create any visualizations. Refer to

this picture when going through the overview below.

Importing Data

Once your Excel spreadsheet of assembled data is finished and saved to

your computer, you can import it into your empty workbook by choosing:

Data > Connect To Data > Microsoft Excel > OK

An Excel Workbook Connection dialog window will appear.

Step 1: Browse for your Excel data file and Open it

Step 2: Select Single Table, then the Sheet number of your data within

the Excel file

Step 3: Choose Yes or No depending on whether the first row of data

in your spreadsheet contains the data field header names

Click OK

Your data should proceed to load. When it is

done, you will see items appear in the

Dimensions and Measures windows on the left

side of your workbook.

Dimensions and Measures

When the data is loaded into your workbook,

Tableau assigns the data fields as either

Dimensions or Measures. In the language of

conventional data graphing, Dimensions are

independent variables and Measures are

dependent variables.

So, Measures are the values that are functions

of one or more Dimensions.

Tableau usually assigns fields containing time,

date, and text as Dimensions and fields

containing numeric values as Measures.
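The rule of thumb above (numbers become Measures, everything else becomes a Dimension) can be written out in a few lines of Python. This is only a sketch of the rule as this guide describes it, with a made-up function name `guess_role`; it is not Tableau's actual logic.

```python
def guess_role(values):
    """Guess 'Measure' for an all-numeric column and 'Dimension' for anything else.

    This mirrors the rule of thumb described in this guide, not Tableau's real code.
    """
    for value in values:
        try:
            float(value)
        except ValueError:
            return "Dimension"
    return "Measure"


# A few columns from the sample table, as text the way a spreadsheet export holds them.
columns = {
    "Name": ["Botanical Gardens", "Cory Hall"],
    "Time": ["5/20/2011 17:30", "5/20/2011 17:32"],
    "Temperature": ["16.74", "19.66"],
    "Latitude": ["37.87515", "37.87523"],
}
roles = {name: guess_role(values) for name, values in columns.items()}
print(roles)
```

Under this rule Latitude counts as a Measure simply because it is numeric, even though it describes a place rather than a measurement.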

You can reassign data fields as you need to.

For example, in the data I have loaded, the

latitude and longitude fields have been assigned

to Measures, but I would rather treat the

location coordinates as independent variables,

and reassign them as Dimensions.

To do this, I can either right-click on each of

them and select Convert to Dimension, or I can

simply click and drag them from the Measures

window to the Dimensions window. When I’m

done, both Latitude and Longitude fields will

appear in the Dimensions window.

Columns and Rows

Now, take a look at the elements of the workbook labeled Columns and

Rows, and the table-like graphic with the “Drop field here” labels. These are

where you place your data when you want to create a graph.

Data placed in Columns will be assigned to the horizontal axis of the graph,

and data placed in Rows to the vertical axis. Don’t confuse the

Columns and Rows in the Tableau graphing area with the columns and rows

of data in your original spreadsheet.

To assign any data field from the Dimensions and Measures windows to the

visualization, you can drag and drop the field into the desired location—

either the bars labeled Columns and Rows above, or directly into one of the

“Drop field here” spaces in the graphic.

Also, you can let Tableau help you decide by simply double clicking on a data

field; Tableau will then place that field in the visualization where it thinks it

should go. This is not always where you might want it, but since moving

data around is as easy as dragging and dropping, it’s simple to make

changes and experiment with many different arrangements. You can learn a

lot simply by throwing data fields around the page! Try it….

Exercise: Quick Map Plot

I previously assigned my Latitude and Longitude fields to the Dimensions

window, making them independent variables.

Double-click on the Latitude Dimension

Double-click on the Longitude Dimension

The screenshot below shows what happened when I did it; check whether you

got the same result.

My results: The Longitude Dimension was placed in the Columns field, the

Latitude in the Rows field, and a map with five blue dots appeared, along

with a menu for Map Options.

Tableau recognized the nature of the Latitude and Longitude fields and

automatically assigned them to the axes to make map-sense: with

longitude along the horizontal axis and latitude in the vertical. It also

automatically produced a geographical plot of the locations.

See what happens if you drag the Latitude to Columns and the Longitude to

Rows (or, click the Swap button along the top menu bar). The map goes

away and is replaced by a more conventional looking graph. So, the

geographical display only appears automatically if the coordinate fields are in

the appropriate axes.

Marks

The Marks window controls the nature of the plot symbols in the graph (in

our present example, the map).

You can control the color and the size of the plot points by clicking in the

Color field and dragging the bar below the Size field, respectively. You can

change the transparency or the color of the border or even add a halo to the

plot symbols using the dropdown menu next to Color. And you can set the

type of graph (line, point, bar, pie, etc.) using the dropdown at the top of

the Marks window.

But where Tableau really becomes interesting is when you manipulate plot

color and size and labels by assigning other data fields to those qualities.

Let’s do an exercise to see what I mean.

Exercise: Data-Defined Plot Symbols

In the Measures window—the dependent variables that I loaded from my

master spreadsheet—there are Final CO2 concentration, Relative Humidity,

and Temperature. To express these values on the geographical map that

was created in the previous exercise, we can assign one or more of them not

to one of the plot axes, but to a plot symbol quality—like color or size, or

even as a text label.

So, let’s make some assignments. I’ve decided that I want temperature to

be represented by the color of the circles, and the size of the circles to

represent the relative humidity.

Drag and drop the Relative Humidity field into the Size bubble of the

Marks window

Drag and drop the Temperature field into the Color bubble

What happened? Tune into the next section, Measures, to find out.

Measures

In our last episode, we dragged Relative Humidity and Temperature into the

Size and Color bubbles in the Marks window. This is what happened:

While we don’t see much, if any, difference in the sizes of the plot symbols,

the colors do show some variation: different shades of green (or gray, if you

are reading a black and white print).

Also, two new windows showed up under the Marks window: one labeled

“SUM(Relative Humidity),” with a stack of partial circles and associated

numbers, and the other “SUM(Temperature)” with a numbered color scale.

I’ll mention here that you can change the range of sizes and colors by

clicking the dropdown menu at the top of each of these new windows, but

first I’d like to focus on a matter of grave importance.

If you point the mouse at one of the plot symbols on the map and hover

there, an information window appears that reveals all of the data parameters

of that point—latitude, longitude, relative humidity, and temperature. Take

a look at the values of the last two—the dependent variables.

Knowing what the value of any single measurement of either of those

parameters should be, do the reported values for these two make sense?

The key to the puzzle is in the “Measure” of the data being viewed—not to

be confused with the term Measure used earlier to mean dependent variable.

Since the plot point for each geographic location contains all the data points

for each parameter you loaded, we have to define how those ranges of data

points are handled before being plotted.

Tableau’s default Measure for a range of multiple values is to Sum them—

add them all up. That’s why those data fields, now in the Color and Size

bubbles, show the function

SUM(parameter).
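To see why a sum gives such odd-looking numbers, here is a small Python sketch using the five temperature readings per site from this guide's sample table. The variable names are made up for illustration; the arithmetic is the point.

```python
# Temperature readings (Celsius) for two sites, from this guide's sample table.
temperatures = {
    "Botanical Gardens": [16.74, 16.46, 16.54, 16.56, 16.63],
    "Cory Hall": [19.48, 19.66, 19.64, 19.62, 19.58],
}

# What a SUM aggregation reports versus what an average reports, per site
site_sums = {site: sum(values) for site, values in temperatures.items()}
site_means = {site: sum(values) / len(values) for site, values in temperatures.items()}

for site in temperatures:
    print(f"{site}: SUM = {site_sums[site]:.2f}, AVG = {site_means[site]:.2f}")
```

With only five readings the sum is already above 80 degrees Celsius; with thousands of readings it grows into numbers that cannot be a real temperature, while the average stays in a believable range.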

You can choose a different Measure

(data handling function) by clicking

the dropdown menu at the right end

of each of the MEASURE(parameter)

icons. Click the dropdown arrow,

then click Measure.

You will get a menu of the available

functions—Sum, Average, Median,

Minimum, Maximum, Deviations,

Counts, etc. When you select one,

that function will be applied to

the data.

Exercise: Change a Data Handling Function

As an exercise, let’s change the function for both Temperature and Relative

Humidity to something other than SUM. For now, I’m going to choose

AVERAGE.

Click the dropdown menu icon at the right side of the SUM(Relative

Humidity) data icon

Click Measure on the menu that appears

Click Average from the next menu

Do the same thing for the SUM(Temperature) data icon

Now, back to the appearance of the plot we got. First, I’ll mention that in

the Map Options menu to the right of the workbook, I chose to select Streets

and Highways as well as Place Names, so these items have been added to

my base map—in case you were wondering where that information came

from. There will be more discussion of Map Options later.

Exercise: Customize Data Point Appearance

Let’s customize the color scale of the temperature data a bit:

Click the dropdown at the top of the Temperature color scale

window and select Edit Colors. Here’s what I get:

I can change the color of the existing palette by clicking the green square

and choosing a different color. I can also choose a different color palette by

clicking the dropdown menu under Palette.

Click the Palette dropdown menu

Choose the color palette labeled “Red-Blue Diverging”

You may notice that in the color scale preview, Tableau has assigned the red

end to the lower temperature values and the blue end to the higher

temperatures. This isn’t as intuitive as I’d like, so:

Check the box labeled “Reversed”

Click OK. What do we get? If you’re following along, you will see the

change, but I’m not going to waste space with another screenshot until

we’ve dealt with the plot symbol sizes, which we’ll do right now:

Click the dropdown menu at the top of the Relative Humidity symbol

sizes window and click Edit Sizes. Here’s what we get:

First, you can select how the sizes of the plot symbols vary through the top

dropdown: automatically, by range, or from zero. The option that gives the

greatest control over the range of sizes is “by range.”

Select “By range”

Now, the “Mark size range” slider has two controls, one to set the smallest

symbol size and the other the largest.

Drag the Smallest and Largest size sliders around until you are satisfied

with the size range

Click Apply when

you want to see the

change
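
To get a feel for what the two slider controls are doing, here is a small Python sketch of one plausible way to scale symbols "by range": the lowest data value gets the smallest size, the highest gets the largest, and everything else falls in between in proportion. This is an assumption for illustration only; I don't know the exact formula Tableau uses. The function name, the humidity values, and the size limits are all hypothetical.

```python
def mark_size(value, vmin, vmax, smallest, largest):
    """Linearly map a value in [vmin, vmax] to a size in [smallest, largest]."""
    if vmax == vmin:
        # All values identical: fall back to a mid-range size.
        return (smallest + largest) / 2
    fraction = (value - vmin) / (vmax - vmin)
    return smallest + fraction * (largest - smallest)


# Hypothetical average relative humidity (%) at three sites.
humidity = [40.0, 55.0, 70.0]

# Hypothetical slider settings: smallest symbol 10 units, largest 50 units.
sizes = [mark_size(h, min(humidity), max(humidity), 10.0, 50.0) for h in humidity]
print(sizes)  # [10.0, 30.0, 50.0]
```

Moving the Smallest and Largest controls farther apart corresponds to widening the gap between the last two arguments, which makes differences in the data easier to see.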

What I settled on is

shown in the picture to

the right.

Notice first that the change I made to the Measure data handling function is

shown: AVG(parameter). So, what the map is showing for these two dependent

variables is the average value of the data—and of course, without further

manipulation, these would be the averages of all the data points in the entire

time series.

So now, at a glance, we see the locations of the five sites I have chosen, the

average temperature measured at each site (color), and the average relative

humidity (size). If the data is accurate, what this tells us is the sites

farthest to the east (in the Berkeley Hills) had the highest average humidity

and lowest average temperatures, and the downtown Berkeley and UCB

main campus sites had higher average temperature and lower average

humidity.
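
Behind the scenes, what the map is showing amounts to "group the rows by site, then average each group." Here is a minimal Python sketch of that idea. The site names and numbers are invented for illustration and are not real BEACON measurements.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (site name, temperature in degrees C, relative humidity in %).
rows = [
    ("Hills", 12.0, 80.0),
    ("Hills", 14.0, 76.0),
    ("Downtown", 17.0, 60.0),
    ("Downtown", 19.0, 64.0),
]

# Collect every reading under the site it came from.
by_site = defaultdict(list)
for site, temp, rh in rows:
    by_site[site].append((temp, rh))

# One (average temperature, average humidity) pair per site.
site_averages = {
    site: (mean(t for t, _ in vals), mean(h for _, h in vals))
    for site, vals in by_site.items()
}

for site, (avg_temp, avg_rh) in site_averages.items():
    print(f"{site}: AVG(Temperature)={avg_temp}, AVG(Relative Humidity)={avg_rh}")
```

In this made-up example the "Hills" site comes out cooler and more humid than "Downtown," the same kind of pattern described above.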

Not bad, for starters. We can do a lot more. Keep in mind that the

techniques covered in this example can be applied to all other forms of

graphs.

Exercise: Add a Third Dependent Variable to the Map

As a final exercise for this section, let’s add one more thing to the mix.

Drag and drop the AVG(Relative Humidity) data field from Size to

Label in the Marks window

Drag the Final-CO2 data from the Measures window to the Size bubble

in Marks and change its data handling function from the default SUM

to AVERAGE

Now the meaning of the plot symbol sizes has changed; they no longer

represent different levels of average humidity, but average CO2 concentration.

Humidity, instead, is expressed through numerical labels. We can now look

for geographic dependencies on all three measured dependent variables.

Exercise: Text Labels

The “Label” Mark bubble is also useful for displaying text labels. As a quick

extension of the previous exercise, remove the AVG(Relative Humidity) from

the Label bubble (either by right-clicking it and selecting “Remove” or

dragging it back to the Measures window) and drag in the “Name” item from

the Dimensions window and drop it into the Label bubble. My result:

Georeferencing and Map Plotting

You’re already well into plotting georeferenced data on maps; the preceding

exercises dealt with many of the basics.

The Map Options menu lets you change the color scheme of the base map

with three options: normal, gray, and dark.

You can also add to the base map any of the checkbox items in the list.

Mostly, these items—borders, labels, and street maps—help with the visual

location of data sites, but don’t show the stunning detail of Google Earth.

Used in concert, however, Google Earth and Tableau can do a lot.

Another useful addition to your base map can be found in the Data Layer

dropdown menu, from which you can choose from a number of

demographic data maps.

Exercise: Add a Demographic Data Layer to Your Map

Choose Population from the Data Layer dropdown under Map Options

In the expanded window that appears, choose “Block Group” from the

“By” menu

You may also change the color palette through the “Using” menu

Population density is now shown by block groupings.

Time

Let’s throw in another independent variable, or Dimension: Time.

Time data in our uploaded data set contains date and time, down to hours

and minutes.

As with the other data fields, we can drag and drop Time anywhere we like,

even though some arrangements will not make sense. Tableau dares you to

be adventurous.

Exercise: Add Time to the Visualization

With the data still being shown on the map as in the last exercise:

Drag the Time field from the Dimensions window and drop it into Rows

What happens?

What should have happened: a column labeled Year of Time appeared

alongside the map, with the year of the data set inside the column. Not very

useful yet? Let’s keep going….

Hover your mouse over the header “Year of Time”; you should see a

“+” symbol appear

Click the “+”

What happened?

A second column, labeled

“Quarter of Time,” should

have appeared, the column

displaying the Quarter of the

year (Q1, Q2, Q3, or Q4) that

the data was taken in. The

Quarter may not be very

useful to our visualization, but

let’s keep going….

Hover the mouse over

Quarter of Time and

click its “+”; a “Month

of Time‖ column

appears

You can keep expanding the

time by clicking the “+” symbols; Month expands into Day, Day into Hour,

Hour into Minute, as long as those time divisions are present in the data set.

You may also collapse the breakout by clicking on the “-” symbol of an

expanded time division bubble.

When you reach a level at which the data in your set spans more

than one time division, the data will be divided into multiple rows, one row

for each value of the smallest time division displayed.

If you find that you don’t want to display every one of the available time

divisions on your graph, there’s an easy way to show only the ones you’re

interested in. Here’s how:

First, collapse the time bubbles back to years by clicking the “-”

symbol in the YEAR(Time) bubble

Right-click on the YEAR(Time) bubble and select from the pop-up

menu which time division you want displayed—choices are Year,

Quarter, Month, Day, and

More. More contains a sub-

menu with Hour, Minute, and

Second. For this exercise,

select Day.

You will end up with a single time

bubble, DAY(Time), and a single

time column in your plot: Day of

Time. You may change this single

column to a different division in

the same way (right-click it), or

you may expand to HOUR by

clicking “+” as before.

See the picture for an example

result. What this portion of the

example shows is the mapped data

for hours 17:00 and 18:00 on the

20th day of the month.

Recall that our data Measures were

set to AVERAGE of temperature, humidity, and CO2 levels, so what each map

plot in this example shows us is the average of these for each hour.

If you expand further, to MINUTE(Time), each row will show a map plot of

the average data values for each minute.
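
The idea of expanding Time down to hours or minutes can be sketched in Python as "chop each timestamp down to the chosen time division, then average everything that lands in the same bin." This is only an illustration with made-up CO2 values; the helper name average_by is hypothetical, and the sketch assumes the fully expanded Year-to-Hour (or Year-to-Minute) breakout described above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (timestamp, CO2 in ppm) samples; not real BEACON data.
samples = [
    (datetime(2011, 10, 20, 17, 5), 410.0),
    (datetime(2011, 10, 20, 17, 35), 420.0),
    (datetime(2011, 10, 20, 18, 10), 430.0),
]


def average_by(samples, fmt):
    """Group samples by the timestamp formatted with fmt, then average each group."""
    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts.strftime(fmt)].append(value)
    return {key: mean(vals) for key, vals in bins.items()}


hourly = average_by(samples, "%Y-%m-%d %H:00")     # one bin per hour
by_minute = average_by(samples, "%Y-%m-%d %H:%M")  # one bin per minute

print(hourly)     # {'2011-10-20 17:00': 415.0, '2011-10-20 18:00': 430.0}
print(by_minute)  # three bins, one per sample
```

The finer the time division, the fewer samples fall into each bin, so each "average" gets closer to a single raw reading.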

Use the scrollbar to the right to scroll through all the map plots in the data

range.

General Graphs

And now for something completely different.

We’ve had some practice with plotting georeferenced data on maps. If you

recall, what got us started on map plotting was when we double clicked the

Latitude and Longitude Dimensions and saw Tableau place them in the Rows

and Columns of our plot and automatically treat those Dimensions as

geographic locations, creating a map.

But if you also recall, when we deliberately dragged the Latitude to Columns

and Longitude to Rows, Tableau did not create a geographic map, but a

simple graph. That, in general, is what happens when you drag data into

your plot—the automatic creation of a geographical map was a special case.

If you start with a blank workbook and start clicking on data values, you can

see where Tableau decides to put them, or

you can drag the data where you like:

Columns, Rows, Label, Color, and Size.

Tableau will [attempt to] plot your data as

you choose.

Show Me

You can change the style of your

visualization and how the data is

represented manually, as we’ve been

doing. You can also change an existing

visualization to another style quickly by

using the Show Me button.

Show Me contains a library of pre-defined

data visualizations. When you click Show

Me, a menu will appear showing the library, some of which may be grayed

out. Tableau lets you choose visualizations it thinks will make sense based

on the data you’ve already placed in your current visualization.

Here’s the fun part: try some! See what you get. Some may not work out

perfectly, but you may stumble upon one that really pops—and maybe

shows you relationships or patterns in your data that you didn’t see before.

Hovering over one of the library icons brings up a quick description of the

visualization type.

Exercise: A Quick Plot Style Change Using Show Me

Choose the “Line (Discrete)” icon (first column, third row of the library)

This is what I got:

Notice how Tableau arranged the data: Date/Time went into Columns, all

three measured parameters went into Rows, and Latitude and Longitude

were thrown into a box called Level of Detail.

Each measured parameter is graphed versus time in its own row, and there

is a data curve for each of the five locations in my data set. Tableau also

placed the hours scale at the top and the minutes scale on the bottom.

The Level of Detail window allows you to separate the data in a series by

another data parameter; here, that parameter is each Latitude-Longitude pair.

This has produced a separate curve for each data site location.

Continuing to play around with the data:

Drag the Longitude field from Level of Detail and drop it into Color in the

Marks window

What did this do? Each data curve still represents one of the five data

collection sites, but they are now color coded by Longitude.

There are many other Show Me’s to try out, and many nuances of settings and

functions in Tableau that you might learn given enough time. But for now,

before we use what we’ve learned to proceed with some investigations,

there’s one last tool you should be familiar with: Filters.

Filters

You can place a Filter on any data parameter and specify exactly how you

want that data to be filtered—what values or ranges of values in a data

series to include in the visualization.

Exercise: Apply a Data Filter

First, let’s set up our earlier visualization showing the data plotted on the

map. You can start that from scratch if you like, or rearrange the

visualization from the last exercise to get there:

Drag Longitude to Columns

Drag Latitude to Rows

Drag Time to Rows (make sure to collapse time into a single bubble

before dragging—or find out what happens when you drag DAY(Time)

and leave HOUR(Time) behind)

Drag AVG(Relative Humidity) to Label

Drag AVG(Temperature) to Color

Drag AVG(CO2) to Size

You should now be back at the earlier map

we visualized in the exercise from the Time

section, showing plotted maps for every hour

of the day that you can scroll through from

top to bottom.

Now, let’s say that we’re only interested in

seeing the data taken at, say, a certain hour

of the day or with values within a certain

range.

Let’s apply a filter to HOUR(Time) so that we

only see plots for the hour of 12 (noon in the

24 hour time format used in our data):

Right-click on the HOUR(Time) bubble

Select Filter from the pop-up menu

In the General

tab of the window that

pops up, click “None”

to uncheck all the

hours

Check the box

next to 12

Click OK

You should now be

looking at maps

showing your data for

only the hour of 12 for

each day in the data

set, as in the picture

below.

Take this one step

further by adding a

Filter for CO2. Let’s only plot CO2 data with values of at least 460 ppm:

Right-click on AVG(CO2)

Select Filter

From the window that pops up, click “At Least” and enter “460” in the

text field at the lower (left) end of the slide scale (or drag the slider to

460)

Click OK

I don’t know what you got, but on my map, some of the data points

vanished! Those, presumably, are the ones whose average values were

less than 460 ppm.
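
The two filters in this exercise work at different stages, and a short Python sketch may make that clearer: the hour filter throws away individual rows, while the CO2 filter is applied to the averages computed from the rows that remain. The site names and readings are made up for illustration, and the sketch simplifies the exercise to a single day.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (site, timestamp, CO2 in ppm) rows; not real BEACON data.
rows = [
    ("A", datetime(2011, 10, 20, 12, 0), 470.0),
    ("A", datetime(2011, 10, 20, 12, 30), 480.0),
    ("A", datetime(2011, 10, 20, 13, 0), 400.0),
    ("B", datetime(2011, 10, 20, 12, 0), 440.0),
    ("B", datetime(2011, 10, 20, 12, 30), 470.0),
]

# Filter 1: keep only rows taken during the hour of 12.
noon_rows = [r for r in rows if r[1].hour == 12]

# Average CO2 per site, using only the rows that survived filter 1.
co2_by_site = defaultdict(list)
for site, _, co2 in noon_rows:
    co2_by_site[site].append(co2)
avg_co2 = {site: mean(vals) for site, vals in co2_by_site.items()}

# Filter 2: keep only sites whose AVERAGE is at least 460 ppm.
kept = {site: avg for site, avg in avg_co2.items() if avg >= 460}

print(avg_co2)  # {'A': 475.0, 'B': 455.0}
print(kept)     # {'A': 475.0}
```

Site B vanishes even though one of its individual readings (470 ppm) was above the threshold, because the filter looks at the average, not at each reading.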

Recipes

Just to give you some ideas, here are a few quick visualization “recipes” I

cooked up. In case you’re reading a black and white print, be warned, these

look much better in color....

Recipe 1: Carbon Fingerpainting

Drag Time to Columns

Drag AVG(CO2) to Rows

Drag AVG(Temperature) to Color

Drag AVG(Relative Humidity) to Size

Edit Color scale with blue for cooler and red for warmer temperature

Edit Size scale to allow ease of seeing differences across the humidity

range

Edit Time axis limits to show only one full day of data

Look for trends in CO2 levels and how they correlate with weather conditions

and time of day.

Keep in mind that the plotted data is for all data collection sites, so trends

and other behaviors may be regional effects, or arise from changes at

specific sites whose locations are not visualized, and so contribute

anonymously.

Recipe 2: Blowing Bubbles

Drag Time to Columns

Right-click Time and select DAY(Time)

Drag AVG(Relative Humidity) to Rows

Drag AVG(Temperature) to Color

Edit Colors with blue showing coolest and red showing warmest

temperatures

Drag AVG(CO2) to Size

Edit Size for easy viewing of CO2 differences

Keep in mind that the data plotted is from all collection sites, averaged over

an entire day.

Recipe 3: Coloring Within the Lines

Drag Time to Columns

Drag AVG(CO2) to Rows

Drag AVG(Temperature) to Color

Edit Color scale with blue as cooler and red as warmer temperature

Edit vertical scale (right-click on it and select Edit) and set scale range to

400-500 ppm

Look for a correlation between CO2 levels (data curve) and temperature

(color of data curve).

There is an obvious temperature correlation with time of day, as might be

expected, shown by the repeating color patterns: cold at night, warm during the day.

The measured CO2 levels also have a daily repeating pattern, as well as a

longer term trend.

The question may be asked, are the changing CO2 levels dependent on

temperature, or maybe on one or more other factors with a daily cycle?
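
One way to start putting a number on that question is a correlation coefficient. The Python sketch below computes the Pearson correlation between two columns; the values are invented so the arithmetic is easy to follow, and the pearson helper is my own illustration, not something built into Tableau or Excel. Keep in mind that a strong correlation only says the two quantities rise and fall together; it cannot tell you whether temperature drives CO2 or whether both follow some other daily cycle.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical hourly averages; not real BEACON data.
temperature = [10.0, 12.0, 14.0, 16.0]  # degrees C
co2 = [430.0, 420.0, 410.0, 400.0]      # ppm

r = pearson(temperature, co2)
print(round(r, 6))  # -1.0: in this made-up data, CO2 falls as temperature rises
```

A value near +1 or -1 means the two quantities track each other closely, and a value near 0 means there is little linear relationship.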