The 2016 Watson Analytics Global Competition
Examining the Relationship between the U.S. Economy and Temperature Change
Tera Black
Christopher Hutwelker
Jeffrey Peck
Dr. Michael Gendron
Faculty Sponsor
Central Connecticut State University
New Britain, Connecticut
April 6, 2016
U.S. Economy & Temperature Change 2
Contents
Table of Figures
Introduction
Literature Review
Methodology
  Hypothesis
  Data Sources and Cleanup
  IBM Data Set Scores
Results
  Predictions
  Social Media
  Temperature Anomalies
  Gross Domestic Product Growth
  Industry Sectors Breakdown
Limitations
Future Research
Discussion
References
Data Sources
Table of Figures
Figure 1 Predictive Tool
Figure 2 Social Media Dashboard Pane
Figure 3 Temperature Anomalies Dashboard Pane
Figure 4 GDP Growth Dashboard Pane
Figure 5 Top Industry Predictor Dashboard Pane
Figure 6 Related Industries Dashboard Pane
Figure 7 Related Industries Dashboard Pane
Figure 8 Percentage of Agricultural Gross Output by the Total Private Industries Gross Output
Introduction
The World Economic Forum recently said, “Climate change is the most severe global
economic risk of 2016” (Hulac, 2016). To determine the effects climate change has had on the
U.S. economy over the past decade, this study uses Watson’s technology to examine the
relationship between climate change and the economy. Prior research on this topic is mostly
limited to predictions of the future and rarely analyzes past years to determine whether a
relationship exists. Numerous studies use the relationship between one specific indicator of
climate change and the economy to make predictions for the future. However, few studies examine
the effect an indicator of climate change has had in past years as a basis for future predictions.
To determine if there is a relationship between the U.S. economy and climate change, we
use two measurements. The U.S. economy is measured using the annual Gross Domestic
Product (GDP) metric. The climate change indicator used is the annual temperature anomaly,
since temperature is “one of the most obvious signals of climate change” (National Oceanic and
Atmospheric Administration, 2016). The purpose of this study is to determine if a relationship
exists between anomalous temperature change and the U.S. GDP.
The specific goal of this study is to analyze the effect temperature anomalies have on the
gross output of the agricultural industry. Starting in 1960, the relationship between the
agricultural industry’s gross output as a percentage of total gross output of private industries and
the temperature anomalies will be determined. With the ability to use Watson, the potential
benefits of determining the relationship are significant. Being able to predict the agricultural
gross output using temperature anomalies stands to improve the decision making of
policymakers. Providing agricultural workers an improved means of assessing and forecasting
future years could strengthen the industry and improve the economy. If it holds, this relationship
may also be applicable in other countries, providing them the same economy-strengthening tools.
Literature Review
Burke et al. (2015) analyzed the global non-linear effect of
temperature on economic production. Economic productivity is best defined as “the efficiency
with which societies transform labor, capital, energy, and other natural resources into new goods
or services.” Further, this study states that as the effects of climate change increase, future
economic productivity will decrease. This study examines how a specific indicator affects the
global economy without considering potential impacts of other indicators.
The University of Maryland produced a report in 2007 on the economic impact of climate
change and the potential cost of inaction against climate change. Included in the report were the
costs and benefits for each region of the United States (e.g. New England, Midwest, Southeast,
etc.). This report also addresses the differences between researchers with regard to how climate
change affects the economy. Most researchers believe climate change will negatively affect the
economy, yet other researchers believe climate change will improve the economy. The
improvement is related to new industries that will be created as a result (e.g. green energy
resources, among others). The report by the University of Maryland dismissed this theory by
stating “although there may be temporary benefits from a changing climate, the costs of climate
change rapidly exceed benefits and place major strain on public sector budget, personal income
and job security” (University of Maryland, 2007). The report concludes by stating that each area
of the U.S. will be affected differently, but that the cumulative effect on the U.S. economy will
be negative. While the University of Maryland used data for previous years to find their results,
the correlation over the past thirty years was not determined.
Methodology
Within its constraints, the project centers on the use of IBM Watson. This tool allows for
easy analysis, visualization, and prediction across one or many data sets. Alongside the Watson
analytics tool, other applications such as Microsoft Excel were used to prepare data for the
upload process within IBM. Together these tools provide the means of testing our hypothesis.
Hypothesis
Gradual and rapid climate changes are currently affecting the world around us.
This research will evaluate whether climate change has a direct relationship with Gross Domestic
Product in the United States and whether those effects can be predicted using IBM Watson.
Data Sources and Cleanup
The constructed data set contains a section of climate information consisting of world
averages and annual minimum, maximum, and average temperatures, with the associated
anomalies, for the date range of 1960 to 2014. Additionally, the average annual precipitation and
the Palmer Drought Severity Index were exported for analytical use. This data is obtainable from
the National Oceanic and Atmospheric Administration (NOAA) ‘Climate at a Glance’ section. All
of the GDP-related data was exported from The World Bank website, which is limited to the date
range of 1960 to present; this is the rationale for the date range selected for the weather data.
The World Weather Data was obtained through the climate data set on the National Aeronautics
and Space Administration (NASA) website. NASA records the surface temperatures for the world
and displays the changes between 1880 and present day in Celsius. For consumer use, this
information can be downloaded in pre-averaged CSV files, which were converted to Fahrenheit to
be consistent with the rest of the weather data used within the project.
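The conversion step can be sketched as follows. This is a minimal illustration, not the authors' actual spreadsheet procedure: it assumes the NASA values are changes (anomalies) rather than absolute readings, in which case only the 9/5 scale factor applies and the 32-degree offset must not be added. The function names and sample values are hypothetical.

```python
def anomaly_c_to_f(anomaly_c: float) -> float:
    """Convert a temperature anomaly (a difference) from Celsius to Fahrenheit degrees."""
    # An anomaly is a difference between two temperatures, so only the
    # 9/5 scale factor applies; the +32 offset cancels out.
    return anomaly_c * 9.0 / 5.0


def absolute_c_to_f(temp_c: float) -> float:
    """Convert an absolute temperature from Celsius to Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


# Hypothetical anomaly values in Celsius, for illustration only.
sample_anomalies_c = [-0.25, 0.0, 0.5]
sample_anomalies_f = [anomaly_c_to_f(a) for a in sample_anomalies_c]
print(sample_anomalies_f)
```

Applying the absolute-temperature formula to an anomaly would shift every value by 32 degrees, so it is worth confirming which kind of value a downloaded column holds before converting it.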
Another source of data used was the Bureau of Economic Analysis, which is an agency of
the United States Department of Commerce that provides economic statistics regarding GDP of
the United States. The interactive data location offers many data sets that can be downloaded, but
the specific data set utilized is the GDP-by-Industry data. This data set only contains content
from 1997 to 2014, so any visualizations that use it are limited to the matching date range of
the weather and GDP data. Since all of the data throughout the project is recorded by year,
little conversion was necessary other than matching monetary leading zeros.
The last source of data came from Watson’s Social Media tool, which allows for an
automatic search of keywords over a variety of social media platforms. This presented the
perfect opportunity to search both of the topics within this project: Gross Domestic Product and
climate change. Once all of the associated terms have been decided upon and entered within the
application, Watson does the legwork and outputs a comprehensive data set, which can be used
in the other Watson tools.
Some formatting/cleanup was needed to provide data in a useable manner for Watson.
The weather data was exported in sections (average, min, max and drought) and compiled into
one data source. The World Bank data was downloaded with all countries formatted in rows.
This information was transposed into columns and reduced to data for the United States only.
Any fields within this data set that were not recorded or not being used were initially left
blank. Blanks affect some exclusionary conditions of Watson and also affect the quality score, so
they were filled with zeros and excluded wherever applicable in the visualizations. The two
independently cleaned data sets were then merged into one spreadsheet
that could be uploaded to the IBM Watson web interface.
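The transpose, filter, and merge steps described above can be sketched in a few lines. This is an illustrative sketch only: the column name "Country Name", the field names, and every value are hypothetical, and the authors performed the actual cleanup in a spreadsheet rather than in code.

```python
import csv
import io

# Hypothetical World Bank-style export: one row per country, one column per year.
# Values are made up for illustration only.
WORLD_BANK_CSV = """Country Name,1960,1961,1962
Canada,3.1,2.9,
United States,2.5,,6.1
"""

# Hypothetical climate table keyed by year (illustrative values only).
CLIMATE_BY_YEAR = {1960: {"anomaly_f": -0.2}, 1961: {"anomaly_f": 0.1}, 1962: {"anomaly_f": 0.0}}


def country_series(csv_text, country):
    """Return {year: value or None} for one country, turning year columns into rows."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Country Name"] == country:
            return {
                int(col): (float(val) if val else None)  # keep blanks as None, not 0
                for col, val in row.items()
                if col.isdigit()
            }
    raise KeyError(country)


def merge_by_year(gdp, climate):
    """Join the two sources on year, keeping only years present in both."""
    return [
        {"year": year, "gdp_growth": gdp[year], **climate[year]}
        for year in sorted(gdp.keys() & climate.keys())
    ]


merged = merge_by_year(country_series(WORLD_BANK_CSV, "United States"), CLIMATE_BY_YEAR)
for record in merged:
    print(record)
```

The study filled blanks with zeros and excluded them in the visualizations; this sketch instead keeps missing values as `None` so that a missing year cannot be mistaken for a real zero in later calculations.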
IBM Data Set Scores
During the cleanup process, one of the main measures used to assess the quality of the data
was the quality score metric within the Watson web application. Per the IBM Watson Analytics
documentation (IBM, 2015), making sure there are minimal blank rows, removing summary data,
eliminating column headings, etc. allowed for a higher rating. All of the data sets used in this
project received a score of 80 or above, with a rating of “High Quality”.
Results
Watson’s ability to analyze data, make predictions, and draw on external social
media-based sources makes it a logical application not only to seek trends but also to build
usable dashboards. The application allows visualizations of multiple data sources to be combined
cohesively in the dashboard area. The solution presented in this project is a dashboard
that consists of seven panes, with related visualizations included within each.
Predictions
Figure 1 Predictive Tool
One of the most innovative tools that IBM Watson offers is its predictive tool. It
allows the selection of variables within a data set and creates a spiral graph that incorporates
relevant predictors. The prediction can be narrowed by adding or removing fields, allowing for
combinations that are more predictive or easier to understand.
Within Figure 1, the initial targets selected within the data are all GDP industry totals, specific
agriculture fields, the associated percentages, and temperature anomalies. This gives a
quick glance at related fields within the data sets, such as agriculture, forestry, fishing and
hunting gross outputs. Together these are an excellent predictor of total GDP in private
industries at 91.3%. While this does not signify that temperature anomalies have a direct causal
relationship with agriculture-based GDP, it indicates that there is valid reason to
continue research.
Social Media
Figure 2 Social Media Dashboard Pane
Watson’s Social Media tool was used to select a focus for this study, as it allowed us to analyze
the popularity of the topic “climate change” by individual countries. As shown in Figure 2, the
relationship between popularity of climate change and GDP results is evident. The U.S. is the
leader in discussing climate change in relation to GDP. The next four countries with mentions
include England, Canada, Australia and England. This led us to focus on the U.S. and how
climate change has affected that economy over the past years.
To determine the potential influence of temperature in relation to GDP growth, the annual
average temperature anomaly is compared to the annual GDP growth. The average temperature
anomaly is defined by the National Oceanic and Atmospheric Administration (NOAA) as “a
departure from a reference value or long-term average.” Additionally, a positive anomaly
indicates that the annual temperature was higher than the reference value, while a negative
anomaly indicates that the temperature was lower (National Oceanic and Atmospheric
Administration, 2016).
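The NOAA definition can be illustrated with a short sketch that subtracts a reference-period mean from each annual value. The temperatures and the 1960 to 1962 reference period below are hypothetical choices made for the example; they are not the base period NOAA uses or the values used in this study.

```python
from statistics import mean

# Hypothetical annual mean temperatures in degrees Fahrenheit (illustration only).
annual_temp_f = {1960: 51.0, 1961: 52.0, 1962: 53.0, 1963: 54.5}


def anomalies(temps, base_start, base_end):
    """Return {year: temperature minus the mean over the reference period}."""
    reference = mean(t for y, t in temps.items() if base_start <= y <= base_end)
    return {year: temp - reference for year, temp in temps.items()}


# Reference period 1960-1962 is an arbitrary choice for this example.
result = anomalies(annual_temp_f, 1960, 1962)
for year, value in result.items():
    label = "warmer" if value > 0 else "cooler" if value < 0 else "equal"
    print(year, round(value, 2), label)
```

Because the anomaly depends on the chosen reference period, anomalies taken from different sources are only comparable when they share the same baseline.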
Temperature Anomalies
Figure 3 Temperature Anomalies Dashboard Pane
A clear pattern emerges when visualizing the minimum and maximum temperature
anomalies by decade, as shown in Figure 3. From the 1960s to the present, the anomalies are
increasing, resulting in wider fluctuations in temperature each year. The significance of the
1980s cannot be overlooked. The temperature anomalies became a positive value, which
coincides with the start of global warming. The changing temperature anomalies provide an
opportunity to compare how one specific indicator of climate change has affected the economy.
Gross Domestic Product Growth
Gross Domestic Product (GDP) is the “standard measure of the value of final goods and
services produced by a country during a period minus the value of imports” (Organization for
Economic Co-operation and Development, 2016). Additionally, “GDP is one of the most
comprehensive and closely watched economic statistics” (Bureau of Economic Analysis, 2015).
Using GDP and temperature anomalies allows us to compare the effect that temperature
fluctuations have had on the U.S. economy.
Figure 4 GDP Growth Dashboard Pane
A pattern manifests when comparing the yearly GDP growth percentage with the
temperature anomaly average, as seen in Figure 4. Several outliers due to external factors appear,
such as the oil crisis from 1979 to 1982 and the Great Recession from 2008 to 2009. The first
observation is that GDP growth is highest when the change in the temperature anomaly from the
previous year is minor. The average change in temperature anomaly is 0.45 degrees Fahrenheit,
while the average change in GDP is 3.04 percent. The highest GDP growth was in 1984, when the
temperature anomaly was 0.04 degrees Fahrenheit. Additionally, 2012 featured the highest anomaly
and resulted in 2.211 percent GDP growth, below the average change of 3.04 percent.
The relationship between the annual temperature anomaly and the U.S.
economy is not clearly definable in this visual.
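One way to test the first observation numerically, rather than visually, is to pair each year's GDP growth with the absolute change in the anomaly from the previous year and compute a correlation coefficient. The sketch below is an assumption about how such a check could be done outside Watson; the rows are made-up values, not the study's data, and a correlation would not by itself establish causation.

```python
from math import sqrt

# Hypothetical (year, anomaly in deg F, GDP growth in %) rows, for illustration only.
rows = [
    (2000, 0.1, 4.0),
    (2001, 0.6, 1.0),
    (2002, 0.7, 3.5),
    (2003, 0.2, 2.0),
    (2004, 0.3, 3.8),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Absolute change in anomaly versus the previous year, paired with that year's growth.
abs_changes = [abs(cur[1] - prev[1]) for prev, cur in zip(rows, rows[1:])]
growth = [cur[2] for cur in rows[1:]]

r = pearson(abs_changes, growth)
print(f"mean |change| = {sum(abs_changes) / len(abs_changes):.2f} deg F, r = {r:.2f}")
```

A negative coefficient on real data would be consistent with the observation that growth is higher when the anomaly changes little, though known outlier years such as recessions would need to be handled separately.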
Industry Sectors Breakdown
Figure 5 breaks down the U.S. economy into three industry sectors to find how varying
annual temperatures affect each sector. The U.S. Census Bureau defines three categories of
private industry sectors: goods producing (manufacturing), services, and information
communication and technology industries. Temperature change affects each sector differently,
and examining the sectors separately allows further analysis of how the economy is affected.
Figure 5 Top Industry Predictor Dashboard Pane
At first, it does not appear that changing temperature anomalies from 1997 to 2014
significantly impact each sector. However, properly formatted data prior to
1997 was not available. Data going back to 1960 would allow a better analysis of the
impact on each individual sector. As seen in previous figures, the annual
temperature has both increased and decreased since 1960. Without additional data,
the relationship between GDP and temperature cannot be analyzed conclusively here.
Figure 6 Related Industries Dashboard Pane
The U.S. Census Bureau further delineates the U.S. economy into 13 distinct industries,
which make up the three industry sectors mentioned previously. Figure 6
allows for the selection of each of the 13 industries individually to analyze industry-wide GDP
and average temperature anomalies. After using Watson to compare how each industry is
influenced by temperature anomalies, the agricultural and recreation industries stood out. Both
industries show similar patterns, suggesting that temperature change affects them similarly, as
displayed in Figure 7.
Figure 7 Related Industries Dashboard Pane
The agriculture industry appears to have adapted in the U.S. to the effects of climate
change when compared to other industries. However, the National Climate Assessment report
states “increased innovation will be needed to ensure the rate of adaptation of agriculture and the
associated socioeconomic system can keep pace with climate change over the next 25 years”
(U.S. Global Change Research Programs, 2014). Similarly, the recreation industry appears to
adapt easily to changes in temperature and is likely to continue adapting to gradual climate change.
Figure 8 Percentage of Agricultural Gross Output by the Total Private Industries Gross Output
Looking more closely at the agricultural industry, the relationship between agriculture’s
gross output and temperature anomalies was compared. Gross output is a more comprehensive
metric than GDP and is defined as “a measure of an industry’s sales or receipts, which can
include sales to final users in the economy (GDP) or sales to other industries (intermediate
inputs)” (U.S. Department of Commerce, 2014). Starting in 1960, a pattern is established: the
gross output of the agricultural industry, as a percentage of the total private industries gross
output, is highest when the temperature anomaly is either negative or approximately 0.00. In
1997, the temperature anomaly remained positive, which coincided with the lowest gross output by
the agricultural industry. The pattern suggests how temperature change affects the agricultural
industry and provides valid evidence to expand the research on this relationship.
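The metric behind Figure 8 and the comparison described above can be sketched as follows. This is a hedged illustration with made-up numbers: the tolerance used to read "approximately 0.00" is an assumption, and the helper names are hypothetical rather than anything produced by Watson.

```python
from statistics import mean

# Hypothetical rows: (year, anomaly in deg F, agricultural gross output,
# total private-industry gross output). Values are made up for illustration.
records = [
    (1960, -0.4, 50.0, 1000.0),
    (1961, 0.0, 48.0, 1000.0),
    (1962, 0.8, 30.0, 1000.0),
    (1963, 1.2, 25.0, 1000.0),
]


def ag_share_pct(ag_output, total_output):
    """Agricultural gross output as a percentage of total private gross output."""
    return 100.0 * ag_output / total_output


def mean_share(rows, keep):
    """Average agricultural share over the rows whose anomaly satisfies `keep`."""
    return mean(ag_share_pct(ag, total) for _, anom, ag, total in rows if keep(anom))


# "Negative or approximately 0.00" is read here as anomaly <= 0.05;
# that tolerance is an assumption made for this example.
cool_or_neutral = mean_share(records, lambda a: a <= 0.05)
warm = mean_share(records, lambda a: a > 0.05)
print(f"anomaly <= ~0: {cool_or_neutral:.1f}%   anomaly > 0: {warm:.1f}%")
```

Comparing group averages in this way gives a simple numeric counterpart to the visual pattern, although it does not account for the long-run decline of agriculture's share that may occur for reasons unrelated to temperature.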
Limitations
When comparing GDP and climate change there are a series of limitations that should be
taken into consideration when analyzing this data. The majority of the data regarding GDP is
broken out by year and region of the world. This allows for forecasting over several decades, but
not all aspects of the GDP have been recorded since 1960 (when the data sets began). Due to the
time constraints of this Watson-based research project the decision was made to focus on the
annual trending for the U.S. only.
Climate data was obtained from the National Oceanic and Atmospheric Administration
(NOAA) website/FTP location. This data can be downloaded two ways: annually, averaged for
the whole U.S. or monthly, broken out by weather station. The monthly option contains all
weather recordings by location throughout the U.S. on a daily basis. These files are large and
would take a significant amount of time to compile into one data source. Within the constraints
of this project, there was not adequate time to process this data into a useable format.
Future Research
Considering the available data, there are a few directions future research can take. The
main areas would be either expanding to a worldwide scale or expanding the weather and region
data in the U.S. The NOAA resource has enough data to break out the U.S. by region and state, as
well as the ability to get more granular. This could expand on learning patterns within the main
agricultural areas in the U.S. Additionally, new or existing data sets could be structured in a more
comprehensive manner. Changing the data sets from the main points being laid out in columns to
a crosstab style would allow for further comparison between data points. This may be dependent
on future Watson development or additional time to format and clean up the data.
Discussion
The intention of this research is to evaluate whether climate change has a direct relationship
with Gross Domestic Product (GDP) in the U.S. and whether those effects can be predicted using
IBM Watson. The results of this study show that while there is a pattern or trend, a definitive
relationship cannot be established at this time. Due to the limitations discussed previously, along
with the visualized trends, it also cannot be said at this time that the relationship does not exist.
Based on the trends demonstrated during this research it is recommended that additional research
be conducted either by expanding to a worldwide scale or by evaluating the data on a more
granular level.
This study also examined the relationship between the agricultural industry and
temperature anomalies. The goal was to determine the feasibility of using temperature anomalies
as a metric to predict the agricultural industry’s gross output. We found there to be an inverse
relationship between the gross output and temperature anomalies. This may be due to a variety of
extenuating factors, but a relationship between gross output and temperature anomalies is
apparent. Further research would increase the confidence level in our
assumptions and could lead to the use of predictive analysis tools within the agricultural
industry. Potential benefits of such tools include prediction and improvement of yields,
reductions in risk and loss, possible avoidance of climate-related hazards, and overall better
management of agricultural activities.
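As a rough indication of what such a predictive tool might look like at its simplest, the sketch below fits an ordinary least-squares line relating the temperature anomaly to agriculture's share of gross output. This is not the method Watson uses, and the data points are hypothetical; a negative slope would correspond to the inverse relationship described above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


# Hypothetical anomalies (deg F) and agricultural share of gross output (%).
anomaly_f = [-0.5, 0.0, 0.5, 1.0]
ag_share = [5.0, 4.5, 4.0, 3.5]

intercept, slope = fit_line(anomaly_f, ag_share)
predicted = intercept + slope * 0.75  # predicted share for a hypothetical 0.75 deg F anomaly
print(f"slope = {slope:.2f}, predicted share = {predicted:.2f}%")
```

A single-variable line like this ignores the extenuating factors mentioned above, so it is best read as a starting point for the further research the study recommends.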
References
Bureau of Economic Analysis. (2015). Measuring the Economy. U.S. Department of Commerce.
Burke, M., Hsiang, S. M., & Miguel, E. (2015). Global non-linear effect of temperature on
economic production. Nature, 527(7577), 235-239.
Environmental Protection Agency. (2014). Climate Change Indicators in the United States. U.S.
EPA.
Hulac, B. (2016). Top Economic Risk of 2016 is Global Warming. ClimateWire.
IBM. (2015). Data Loading and Data Quality. Retrieved from IBM Watson Analytics:
https://community.watsonanalytics.com/introduction-to-data-loading-and-data-quality/
National Oceanic and Atmospheric Administration. (2016, March 11). Global Surface
Temperature Anomalies. Retrieved from National Centers for Environmental
Information: https://www.ncdc.noaa.gov/monitoring-references/faq/anomalies.php
National Oceanic and Atmospheric Administration. (2016, March 12). Global Temperature
Anomalies - Graphing Tool. Retrieved from NOAA Climate.gov:
https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool
Organization for Economic Co-operation and Development. (2016, March 12). Domestic
product. Retrieved from Organization for Economic Co-operation and Development:
https://data.oecd.org/gdp/gross-domestic-product-gdp.htm
U.S. Department of Commerce. (2014, April 22). Frequently Asked Questions. Retrieved from
Bureau of Economic Analysis: http://www.bea.gov/faq/index.cfm?faq_id=1034
U.S. Global Change Research Programs. (2014). National Climate Assessment. Washington,
D.C.: U.S. Global Change Research Programs.
University of Maryland. (2007). US Economic Impacts of Climate Change and the Cost of
Inaction. College Park, Maryland: Center for Integrative Environmental Research.
Data Sources
Bureau of Economic Analysis- Industry Data:
http://www.bea.gov/iTable/iTable.cfm?ReqID=51&step=1#reqid=51&step=51&isuri=1&5101=1&5114=a&5113=pgoodgo,pservgo,ictgo&5112=1&5111=1997&5102=15
National Centers for Environmental Information- Climate at a Glance:
http://www.ncdc.noaa.gov/cag/
The World Bank- World Development Indicators:
http://data.worldbank.org/topic/climate-change