A Data Resource for Cloud Cover Simulations

Graham Nicholas Sortino

Master of Science
School of Informatics
University of Edinburgh
2006




Abstract

This document describes the construction of a database to assist in climate analysis related to global warming. It was motivated by National Center for Atmospheric Research Atmospheric Physicist John Latham and University of Edinburgh Engineer Stephen Salter, who are attempting to develop a technique to mitigate temperature increases related to global warming. The database will assist them in determining optimal locations for increasing cloud reflectivity to redirect more of the sun's energy away from the Earth, although it may be applicable to other domains as well. Results include a preliminary analysis of optimal locations suggested by this work.



Acknowledgements

I would like to thank the following people and organizations for their innumerable contributions:

University of Edinburgh:

Dr. James Frew
Xibei Jia
Carwyn Edwards

International Satellite Cloud Climatology Project:

Dr. William B. Rossow
Dr. Yuanchong Zhang
Dr. Chris Brest
Ely N. Duenas

European Centre for Medium-Range Weather Forecasts:

Keith Fielding

British Atmospheric Data Center:

Dr. Shoaib Sufi
Brian Lawrence

Physical Oceanography Distributed Active Archive Center:

Ted Lungu

Langley Atmospheric Sciences Data Center:

Paul Carter

National Center for Atmospheric Research:

Dr. Natalie Mahowald

...And last but not least my advisor Peter Buneman and co-conspirators Stephen Salter & John Latham. Please accept my most sincere thanks for your leadership, guidance and patience.



Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Graham Nicholas Sortino)



To Sarah Beth:

Without your love and support none of this would have been possible.



Table of Contents

List of Figures

1 Introduction
1.1 Optimal Locations
1.2 Challenges
1.3 Novelty
1.4 Outputs
1.5 Outline

2 Background
2.1 The Earth Radiation Budget
2.2 Clouds
2.2.1 Properties
2.2.2 Classification
2.3 Latham & Salter
2.4 Conclusion

3 Description
3.1 Variables
3.1.1 Shortwave Radiation
3.1.2 Albedo & Droplet Concentration
3.1.3 Cloud Amounts
3.1.4 Spray Vessel Variables
3.1.5 Hypotheses
3.2 Datasets
3.2.1 International Satellite Cloud Climatology Project (ISCCP)
3.2.2 European Centre for Medium-Range Weather Forecasts (ECMWF)
3.3 Data Integration
3.4 Schema Design
3.5 Optimal Locations Algorithm
3.5.1 Pseudo Code
3.5.2 I/O Cost
3.5.3 Example Run Through

4 Analysis
4.1 Data Visualization
4.2 White Box Tests
4.3 Black Box Tests
4.3.1 Surface Air Temperature
4.3.2 Wind Speed and U & V Components
4.3.3 Mean Total Cloud Cover
4.4 Benchmarks
4.5 Initial Predictions

5 Conclusion
5.1 Open Question & Future Work

A User Guide
A.1 How to Access the Database & Run the Optimal Locations Algorithm

Bibliography


List of Figures

2.1 (Not to Scale) Depiction of the Earth Radiation Budget. All units are in watts per meter squared. Adapted from [38].
2.2 Determining Optical Depth [44].
2.3 It becomes more difficult to increase albedo the greater a cloud's initial optical thickness is. Graph is plotted from ISCCP Data [45].
2.4 The amount of each cloud type present in the atmosphere is not necessarily proportional to the size of its respective box. Figure adapted from [45].
2.5 From left to right, top to bottom: high (net-warmers), mid (neutral), and low level (net-coolers) clouds. Thin arrows represent shortwave radiation and thick arrows represent longwave. Images taken from [25].
2.6 Determination of CCN requirement [22].
2.7 Salter's proposed albedo spray vessel design (artwork by John MacNeill). Used with permission from [36].
3.1 Primary variables for calculating optimal locations.
3.2 Various levels of incoming shortwave radiation measured in watts per meter squared at different times of the year. Areas in dark red experience the most incoming shortwave radiation while those in dark blue experience the least. Image provided by the NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado, from their web site at http://www.cdc.noaa.gov/ [27].
3.3 Estimation of droplet concentration given a value for optical thickness and cloud droplet radius. First determine column droplet concentration and use that to estimate droplet concentration with a presumed height of 800 meters [12].
3.4 Amounts of low-level stratocumulus, mid and high level clouds, from left to right, top to bottom respectively. Light blue indicates the greatest concentrations and dark red indicates the least. Image and data provided by [18].
3.5 Wind direction and speed for May 10, 2006. Arrows indicate direction, with red arrows representing the strongest winds. This image was produced from the SeaWinds instrument aboard the QuikSCAT satellite, provided by NASA-PODAAC [29].
3.6 Determination of wind speed and direction [39].
3.7 Determination of cloud base height in meters [39].
3.8 ISCCP equal area grid (also known as a reduced Gaussian grid). There are 6596 cells, starting at cell 1 in the lower left and going towards the right and then up [45].
3.9 Typical ECMWF equal angle grid for their datasets. Used with permission from [42].
3.10 The lat and lon for different cells using an ISCCP equal area grid. For a given cell number, first determine its lat index. Lat is then calculated by multiplying the index by 2.5°. To obtain the lon, subtract the begin cell number from the cell id and multiply that by the lon interval. Table taken from ISCCP DX/D1/D2 Documentation [45].
3.11 Runtime cost of integrating the 3 datasets into the database. All units are in minutes unless otherwise specified.
3.12 Data is spread across the Primary and Supplementary tables, which have a 1-to-1 relationship. The lookup table is used for computing optimal locations and is discussed in the next section. Note: due to space limitations not all supplementary columns could be shown.
3.13 a) Indicates estimated column, tuple, and page sizes. b) Indicates estimated database size. Note: this does not take into account the cost of indexes or other DB2 management techniques, which may increase size.
4.1 The 12 tests performed for each variable/temporal pair.
4.2 Dataset comparisons.
4.3 Comparison of surface temperature in degrees Kelvin for NOAA-CDC data (left) and this climate database (right).
4.4 Comparison of U & V (top and bottom respectively) wind components for NOAA-CDC data on the left and this climate database on the right.
4.5 Comparison of mean wind speed for NOAA-CDC data (left) and this climate database (right).
4.6 Comparison of mean total cloud cover for NOAA-CDC data (left) and this climate database (right).
4.7 Benchmark test results.
4.8 A definition of optimal locations.
4.9 Quarterly optimal locations predictions for 2001. Note: uncolored cells represent missing or undefined data.
4.10 Full year optimal locations predictions for 2001. Note: uncolored cells represent missing or undefined data.
4.11 Quarter 1 (January - March) 2001: stratocumulus cloud concentration.
4.12 Full year 2001: on the left, high and mid level cloud amounts; on the right, areas with large concentrations of low level clouds and small amounts of mid/high clouds above.


Chapter 1

Introduction

Global climate depends heavily upon the ability of clouds to reflect incoming solar energy away from the Earth [24]. This has prompted much research into clouds as a potential deterrent to climate change. However, determining their viability is extremely difficult because many variables are involved in sophisticated interactions. In addition, any study of these processes must examine data that is of global scope. A mass analysis such as this suggests the use of database technology.

This document describes the construction of a database to assist in climate analysis related to global warming. The end result is what I believe to be the first multi-year climate resource capable of finding locations that exhibit desirable conditions on a global scale. It was motivated by National Center for Atmospheric Research Atmospheric Physicist John Latham and University of Edinburgh Engineer Stephen Salter, who are attempting to develop a technique to mitigate temperature increases related to global warming.

Their proposal calls for the use of ocean spray vessels to seed clouds with the tiny salt particles left over from evaporated sea water. This will increase cloud reflectivity and redirect more of the sun's energy back out towards space [36]. While theoretically possible [44], it is also quite controversial, and its feasibility requires answering some important questions, such as: where should their vessels be used? How much should they spray? How many will be needed to counteract anthropogenic (man-made) temperature increases?

The database will assist researchers such as Latham & Salter in answering their question of where these sea vessels should be used, although it may be applicable to additional domains as well.

Early results (section 4.4) suggest new locations such as the North Sea above Western Europe. In addition, they provide further evidence to support the claim made by [36, 22] that the areas off the west coasts of the Americas and Africa are also well suited.

1.1 Optimal Locations

Establishing what constitutes an optimal location for using spray vessels involves consulting a number of climate variables and is highly subjective because no agreed-upon definition exists. Thus it is not the goal of this project to specify exactly the types of conditions optimal locations exhibit. Instead it aims to provide researchers with the means to determine this based upon their own definitions.

This optimal locations query can be defined in terms of a classic computer science maximization problem in which there are a number of variables, each of which must be weighted, scored, and tallied. Coordinates with the highest total score must then be returned to the user. While the problem is simple to define, the solution is much more complex. Data which spans decades and is global in size must be fed into complex calculations. This, coupled with a significant number of variables to be considered, requires a different way of thinking.
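The weight-score-tally scheme just described can be sketched in miniature as follows. This is an illustrative sketch only, not the stored procedure actually built for this project: the variable names, weights, and values are hypothetical stand-ins for the real climate variables discussed in chapter 3.

```java
import java.util.List;

// Illustrative sketch of the optimal-locations query as a weighted
// maximization problem. Each cell carries a few (hypothetical) climate
// variables, each normalized to [0, 1]; the cell with the highest
// weighted score wins.
public class OptimalLocations {

    // One grid cell with hypothetical normalized climate variables.
    record Cell(int id, double lowCloudAmount, double shortwaveFlux, double windSpeed) {}

    // Score a cell as a weighted sum of its variables.
    static double score(Cell c, double wCloud, double wFlux, double wWind) {
        return wCloud * c.lowCloudAmount() + wFlux * c.shortwaveFlux() + wWind * c.windSpeed();
    }

    // Return the id of the cell with the highest total score.
    static int best(List<Cell> cells, double wCloud, double wFlux, double wWind) {
        Cell bestCell = cells.get(0);
        for (Cell c : cells) {
            if (score(c, wCloud, wFlux, wWind) > score(bestCell, wCloud, wFlux, wWind)) {
                bestCell = c;
            }
        }
        return bestCell.id();
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell(1, 0.9, 0.8, 0.5),  // plenty of low cloud and sun
                new Cell(2, 0.2, 0.9, 0.9),
                new Cell(3, 0.7, 0.3, 0.4));
        // Weight low-cloud amount most heavily; cell 1 scores highest.
        System.out.println(best(cells, 0.6, 0.3, 0.1));
    }
}
```

Because the user supplies the weights, different definitions of "optimal" simply become different weight vectors over the same data.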

First and foremost, CPU calculations are not as important as disk accesses (I/O). This is because the amount of data involved ranges in the hundreds of gigabytes and disk accesses are three orders of magnitude more expensive than CPU calculations. Secondly, very large datasets must be indexed and organized in order to minimize I/O search cost. Given these constraints, the logical choice is to utilize the relational model for data management, which excels at minimizing disk I/O.

1.2 Challenges

There are many challenges facing this project. Each involves applying a mature and well-studied technology, the relational model, to a new domain.

Extensive climate research must first be conducted in order to understand the relationships between the relevant variables. Once the domain of discourse is sufficiently understood, datasets encompassing the relevant variables are to be procured. In order to qualify as viable, a dataset is required to meet stringent qualifications, such as coverage of the necessary variables as well as suitable temporal (time span) and spatial resolutions.


After datasets are chosen they must be cleaned, transformed and integrated into a uniform format. A data parser must be robust, efficient, and able to handle input datasets in excess of one hundred gigabytes without failure. Finally, the schema is to be constructed, which allows optimal locations to be computed as efficiently as possible.
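Handling inputs of that size rules out loading a file into memory: the parser has to stream one record at a time and survive malformed lines rather than abort. The sketch below shows that shape only; the line-oriented record format (a cell id and a value per line) is a hypothetical stand-in for the actual ISCCP and ECMWF dataset layouts, not this project's real parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Minimal sketch of a streaming parser: memory use stays constant
// regardless of input size, and bad records are counted, not fatal.
public class StreamingParser {

    // Parse a line like "42 0.75" into {id, value}; return null for a
    // malformed line instead of failing the whole run.
    static double[] parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) return null;
        try {
            return new double[] { Double.parseDouble(parts[0]), Double.parseDouble(parts[1]) };
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Stream the input record by record, returning {goodCount, badCount}.
    static long[] parse(Reader input) {
        long good = 0, bad = 0;
        try (BufferedReader r = new BufferedReader(input)) {
            String line;
            while ((line = r.readLine()) != null) {
                if (parseLine(line) != null) good++; else bad++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] { good, bad };
    }

    public static void main(String[] args) {
        long[] counts = parse(new StringReader("1 0.5\n2 0.7\nbad line\n"));
        System.out.println(counts[0] + " good, " + counts[1] + " bad"); // prints "2 good, 1 bad"
    }
}
```

In a real run the valid records would be batched into database inserts instead of merely counted, but the streaming structure is the same.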

1.3 Novelty

A comprehensive literature review has strengthened my view that no attempt at answering this question has yet been made. There are a number of reasons why I believe this to be so.

Firstly, the required prerequisite knowledge in terms of both data management and climate sciences creates high barriers to entry. Furthermore, it is difficult to gauge the complexity of multi-disciplinary projects because all dimensions may not be known in advance.

Secondly, climate research typically looks at the relationship between one or two variables, such as [13, 37, 14], or at global climate models, such as [2, 35], which try to predict future conditions. Not much research has actually been put into looking for locations that exhibit conditions based upon several variables. This may be due to the fact that it is a relatively new question and full knowledge of the exact microphysical processes is still in dispute [11]. It may also be due in part to the fact that global satellite datasets have only been available for a relatively short time and much work has been put into increasing their reliability and accuracy as opposed to analysis.

1.4 Outputs

The primary outputs of this project are:

• A global climate database built in DB2 covering the relevant variables.

• An efficient & robust parser implemented in Java for integrating multiple heterogeneous datasets into a uniform format for insertion into the database.

• An efficient algorithm implemented as a DB2 stored procedure, which is executed on the database and used in determining optimal locations to increase cloud reflectivity.

(DB2 is a registered trademark of IBM.)


In addition, its utility could easily be extended with the inclusion of new algorithms or components. For example, it could also be used to determine the immediate after-effects of increasing cloud reflectivity based upon a number of variables already contained within. Or complex visualization tools could be built to assist in analysis. Thus the real value here is the database itself. Therefore a strong emphasis has been put into creating a resource that can provide a solid foundation for future work.

Since the output of this project is a tool to be used by others, results should not be expected to include detailed predictions of where optimal locations are located and what exactly the correct definition of optimal is. These questions will be left for the users to determine. Instead, results primarily show the tests conducted in order to build confidence in the correctness of the work produced. In other words, users of this resource can be assured that any results obtained are valid.

This being said, work was scheduled according to a timeframe which allowed some preliminary predictions to be made (section 4.4). However, it must be emphasized that they are just predictions, the legitimacy of which requires further analysis by climate experts.

1.5 Outline

The rest of this paper will proceed as follows.

Chapter 2 is an introduction to the climate sciences that form the basis of this project. It discusses the Earth Radiation Budget, which is responsible for controlling the Earth's climate, and the important role clouds play in regulating it. This is then tied into a deeper discussion of the research being conducted by Latham & Salter as well as the crucial role this dissertation project plays in their work.

Chapter 3 discusses the methodologies used in building the resource as well as justifications of any design choices. It begins by analyzing the variables used in determining optimal locations. Using these variables, some hypotheses are made to gauge where optimal locations may be found. Next, data sources and integration are carefully examined. It then concludes with a description of the database schema and the optimal locations algorithm.

Chapter 4 attempts to build confidence in this resource as a research-quality tool. First, a number of tests designed to determine its quality are examined. This is followed by an analysis of some preliminary optimal locations, which were predicted by this project.


Chapter 5 wraps up this dissertation with a short conclusion followed by a discussion of open questions and areas of future work.

Additionally, an appendix has been provided containing more detailed information concerning the work undertaken, as well as documentation for using the database to determine optimal locations.


Chapter 2

Background

Chapter 2, the content of which forms the backbone of this project, aims to give the reader a better understanding of the complex micro and macro processes involved in regulating the climate of the Earth. It starts on the macro level, looking at the large-scale picture of the Earth Radiation Budget and examining how it controls radiation levels, which influence global climate. Next, the intricate microphysical processes of clouds are analyzed, including the manner by which their reflectivity is controlled. This is then tied into Latham & Salter's work. The goal here is to gain a better understanding of how their research proposes to exploit these processes to alter cloud reflectivity and redirect more heat away from the Earth's surface. This also includes looking at some unknown components of their work, including the key question this dissertation project hopes to help them answer: "Where should one use their proposed technique to reflect the most energy, as efficiently as possible?"

2.1 The Earth Radiation Budget

The Earth Radiation Budget is a complex system which regulates the temperature of the Earth by controlling the amounts of radiation that enter and leave the atmosphere [25]. Some of the strongest influences on the system are incoming shortwave (solar) radiation and outgoing longwave radiation. Solar radiation is produced by the sun, absorbed by the Earth, and generally measured in the range of 0.4 to 5 µm wavelengths. Longwave radiation is emitted by the Earth and generally measured in the range of 5 to 200 µm wavelengths [45].

Another variable important to the regulation of the Earth Radiation Budget is the atmosphere itself, which helps control the amount of incoming shortwave and outgoing longwave radiation.

Figure 2.1: (Not to Scale) Depiction of the Earth Radiation Budget: 1370 Wm−2 of solar radiation reaches the outer atmosphere, 343 Wm−2 enters it as incoming shortwave, and 237 Wm−2 reaches the surface; 390 Wm−2 of outgoing longwave is released from the surface and 237 Wm−2 leaves the atmosphere. All units are in watts per meter squared. Adapted from [38].

Taken altogether, the atmosphere is what gives the Earth its generally mild climate. Without it the Earth would experience wild temperature variations between day and night, such as those which occur on the moon, where there is no atmosphere to trap solar energy [38].

The atmosphere controls radiation levels via an important property called albedo, which measures the amount of radiation reflected from a body on a scale from 0 to 100 percent: 0 indicates nothing is reflected and 100 means that all radiation is reflected. Different bodies have different albedos. For example, the albedo of water or dirt is generally low while that of clouds and ice is typically much higher [38].

Generally speaking, the Earth Radiation Budget is in equilibrium. That is, the mean annual incoming shortwave radiation flux is approximately equal to the mean annual outgoing longwave radiation flux. It is estimated that the average amount of solar radiation which reaches the outside of the Earth's atmosphere is 1370 Wm−2 (watts per meter squared), and about 1/4 of that, or 343 Wm−2, actually enters the upper portion of the Earth's atmosphere. Of this, approximately 30% is reflected back out to space by the Earth's global albedo. Thus 343 − (343 × 0.3), or about 237 Wm−2, actually reaches the Earth's surface.

The shortwave radiation which is not reflected is absorbed mostly by the Earth, but also, to a smaller extent, by the atmosphere. This absorbed radiation warms the Earth and atmosphere and will eventually be emitted as longwave radiation back out towards space [25]. Taken as a whole, approximately 390 Wm−2 of mean annual longwave radiation flux is released out towards space; however, only about 237 Wm−2 of that actually leaves the atmosphere. This is due to factors similar to those which reflect incoming shortwave radiation. The difference between the longwave radiation which is released from the surface and that which actually leaves the atmosphere helps give the Earth its relatively moderate climate [38]. This system is modelled in figure 2.1.

One factor which the above model has not taken into account is the role played by the substantial increase in anthropogenic (man-made) aerosols that have been released into the atmosphere during the industrial age. Taken altogether, it is estimated that they have contributed to an increase in longwave radiation at the surface of approximately 2.5 Wm−2. This is the basis for global warming and climate change [38].

Counterintuitively, the effect of anthropogenic (man-made) aerosols is not entirely negative, as they also help reduce the amount of shortwave radiation which reaches Earth and thus help to slightly counteract their warming effect. This is accomplished in two ways: directly, by scattering shortwave radiation in clear air, and indirectly, by increasing the reflectivity of clouds [38, 37]. The former will not be discussed much further; however, the physics involved in the latter are of consequence to the approach outlined by Latham [22] because, as will be seen, aerosols, whether natural or artificial, play a significant role in determining the albedo of clouds.

2.2 Clouds

The general introduction to the Earth Radiation Budget leaves out a very important determinant of the climate of Earth: clouds. Thus before continuing this discussion of radiation and reflectivity it is important to have a better understanding of clouds: their properties, functions, and why they are important.

2.2.1 Properties

The basic definition of a cloud is a collection of condensed drops of water, either in a liquid or solid (i.e. frozen) state, which form around tiny particles and are cooled at or below their dew point [30]. The tiny particles which clouds form around are called cloud condensation nuclei (CCN) and are commonly composed of dust, salt, soot, or another similar material. CCN are approximately 0.1 µm or more in diameter [22] and range in concentration from 10 to 5000 per cubic centimetre. There are generally fewer per cubic centimetre (in the range of 10 - 100) over sea and greater numbers over land (commonly ranging from 500 - 1000), especially in the more industrialized northern hemisphere [44]. In fact, in highly polluted areas cloud condensation nuclei levels can be as high as 5000 per cubic centimetre. This is because, as was alluded to previously, anthropogenic (man-made) aerosols also act as CCN [8].

Figure 2.2: Determining Optical Depth [44].

τ = 2πNr²h

where:
τ = optical depth (which is directly related to albedo)
N = droplet concentration in cm−3
r = average droplet radius in µm (typical radii are 10 µm for liquid drops and 30 µm for frozen drops)
h = height from cloud top to base, commonly measured in km (a typical depth is 1 km)

An important property of clouds is their albedo (or reflectivity), which is primarily influenced by the size and concentration of the droplets that they are composed of. Droplet radius and concentration are influenced by the number of available CCN. Thus CCN determines droplet concentration and radius, which determine albedo. In other words, clouds with tightly packed small droplets are denser, can reflect more radiation, and thus have higher albedos [38]. This was first theorized by Atmospheric Physicist Sean Twomey in his oft-cited paper "The Influence of Pollution on the Shortwave Albedo of Clouds" [44], published in 1977. Twomey developed a formula to calculate the optical depth of a cloud (see figure 2.2), which can then be used to determine albedo.

In his formula, optical depth τ, which is a measure of the amount of radiation prevented from passing through a column of the atmosphere [3], is calculated from the inputs: droplet concentration, average droplet radius, and the height from cloud top to base. Once optical depth has been found, albedo can easily be determined via a direct correlation (see figure 2.3). A more accurate approximation of albedo can be found using the properties single scattering albedo and the asymmetry factor, a formal discussion of which can be found in [44]; however, this project will use optical depth, as it is most appropriate for this situation.

Figure 2.3: Correlation of albedo and optical thickness (TAU): it becomes more difficult to increase albedo the greater a cloud's initial optical thickness is. Graph is plotted from ISCCP Data [45].

Notice that a proportional increase in optical thickness does not necessarily result in a proportional increase in albedo. For example, a very small increase in optical thickness of 0.03 (from 0.020 to 0.050) results in an increase in albedo of 7%. However, a much larger increase in optical thickness of over 260 (from 109.8 to 378.65) results in a proportionately much smaller increase of only 5.5%. Keeping in mind that CCN is a key determinant of optical thickness and albedo, this means that clouds with low amounts of CCN can have their albedos increased a substantial amount with minimal effort, while those with higher amounts require a much greater increase to raise their albedo. This fact is extremely important to Latham and Salter's work because, in order to increase albedo as efficiently as possible, they must target clouds with low initial CCN concentrations.

The key point here is that a cloud's albedo (reflectivity) is determined by the number and concentration of its cloud condensation nuclei, and those with lower levels of concentration require less effort to raise their albedo than those with higher levels.
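Twomey's formula from figure 2.2 can be evaluated directly once the units are made consistent. The sketch below applies it to the typical values quoted there (r = 10 µm for liquid drops, h = 1 km) together with a marine droplet concentration of 100 cm−3 drawn from the CCN ranges above; the unit conversions are my own addition, and the formula's monodisperse-droplet simplification is inherited from the figure.

```java
// Hedged sketch of the optical-depth estimate from figure 2.2:
// tau = 2 * pi * N * r^2 * h, with inputs in the figure's units
// (N in cm^-3, r in micrometres, h in km) converted to SI first.
public class OpticalDepth {

    // N: droplet concentration (cm^-3), r: average droplet radius (um),
    // h: height from cloud top to base (km).
    static double tau(double nPerCm3, double rMicrons, double hKm) {
        double nPerM3 = nPerCm3 * 1e6;  // cm^-3 -> m^-3
        double rM = rMicrons * 1e-6;    // um -> m
        double hM = hKm * 1e3;          // km -> m
        return 2.0 * Math.PI * nPerM3 * rM * rM * hM;
    }

    public static void main(String[] args) {
        // Typical liquid-cloud values: r = 10 um, h = 1 km,
        // with an assumed marine concentration of 100 cm^-3.
        System.out.printf("tau = %.1f%n", tau(100, 10, 1)); // prints "tau = 62.8"
    }
}
```

Doubling N doubles τ under this formula, but as figure 2.3 shows, the resulting gain in albedo shrinks as the starting optical thickness grows.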

2.2.2 Classification

Albedo is an important factor in the regulation of the Earth Radiation Budget because it helps determine how much shortwave radiation emitted from the sun is reflected back out towards space. Since clouds cover a large portion of the Earth's surface and their albedo taken as a whole is greater than or equal to the 30% global albedo, they are considered by some to be a potential deterrent to anthropogenic climate change. This last point is the essential idea behind Latham and Salter's proposal [22, 21, 36]: use clouds to reflect more shortwave (solar) radiation by raising their CCN concentrations.


Figure 2.4: ISCCP Cloud Classification. Clouds fall into three altitude bands by cloud top pressure (axis ranging from 50 to 1000 millibars) and are subdivided by cloud optical thickness (category boundaries at 1.3, 3.6, 9.4, 23, 60, and 379); within each band the types below are listed in order of increasing optical thickness.

High (6500 to 19000 meters): Cirrus, Cirrostratus, Deep Convection
Middle (3201 to 6500 meters): Altocumulus, Altostratus, Nimbostratus
Low (ground to 3200 meters): Cumulus, Stratocumulus, Stratus

The amount of each cloud type present in the atmosphere is not necessarily proportional to the size of its respective box. Figure adapted from [45].

However, while it is true that clouds overall exert a cooling effect, some do so more than others; furthermore, some, such as high altitude cirrus clouds, actually help warm the atmosphere [25]. Because of this it is worthwhile to briefly classify the different types of clouds into 3 main categories: high, middle, and low altitude. Each main category can then be subdivided by optical thickness. See figure 2.4 for a general classification.

Low level clouds such as stratocumulus are classified as those that have an altitude of between 0 and 3200 meters [18]. Overall they exert a cooling effect because they are very thick, which gives them a greater albedo. Marine low clouds, for example, exert an annual globally-averaged net cooling effect of -15 Wm⁻² [15]. In addition, because their tops are close to the ground their temperature is comparable to the surface as well; thus they do not trap very much outgoing longwave radiation [25].

High clouds such as cirrus are typically very thin and thus have low albedos. Therefore, they allow more solar radiation into the atmosphere. Additionally, because they are so high their temperature is very low compared to the Earth's surface, which means they work to trap outgoing longwave radiation and send it back towards the Earth. Thus their net effect is to warm the atmosphere [25].

The third type of clouds are middle level, and they exist at an altitude of between 3200 and 6500 meters [18]. They can be very thick; however, unlike low level clouds, whose tops have a temperature comparable to that of the Earth's surface, the tops of


Figure 2.5: From left to right, top to bottom: high (net-warmers), mid (neutral), and low level (net-coolers) clouds. Thin arrows represent shortwave radiation and thick arrows represent longwave. Images taken from [25].

mid level clouds are much higher and thus cooler. So even though their thickness gives them a high albedo, they also work to trap outgoing longwave radiation and send it back towards Earth. Therefore they are considered neutral [25].

In terms of cooling ability, the distinct characteristics each cloud type exhibits suggest that certain clouds are more desirable than others. Since low-level clouds are considered the best coolers, an attempt at increasing albedo should be targeted towards them. Furthermore, locations with mid and/or high level clouds above should be avoided because this could trap some of the reflected radiation. As will be discussed in the following section, this knowledge was used by Latham and Salter in their proposed technique to counteract temperature increases.

2.3 Latham & Salter

Given what is known about the Earth Radiation Budget and the integral role clouds play in its regulation, a logical question one may ask is: "How can we exploit the relationship between CCN and the blocking of incoming solar radiation to cancel out the effects of


M = N m H

M = CCN per unit area of the Earth's surface
H = the altitude over the Earth at which they will be sprayed
N = the desired droplet concentration produced by the increase in CCN
m = the mass of the CCN (measured in grams)

Figure 2.6: Determination of CCN requirement [22].

global temperature increases?" This is the question that was first addressed by National Center for Atmospheric Research Atmospheric Physicist John Latham in 1990 [21] and then expanded upon in his 2002 paper "Amelioration of Global Warming by Controlled Enhancement of the Albedo and Longevity of Low-Level Maritime Clouds" [22]. It was then tackled from an engineering perspective by University of Edinburgh Engineer Stephen Salter [36] in order to address technical challenges. This section looks at their theories and proposals.

In [22] Latham suggests seeding clouds with the salt derived from evaporated sea water because there is an abundant and readily available supply. He then tackles the practical questions of how much cloud condensation nuclei (CCN) would be required to combat global temperature increases and what the impediments to doing so are.

Using a target droplet concentration (N) of 400 cm⁻³, which is 2 to 8 times the typical levels in ocean air, an altitude of 0.5 km, and an ocean salt mass of 10⁻¹⁴ g, he estimates that globally 10⁹ kg of salt, or an increase of 10²⁶ droplets, are required (figure 2.6). However, a complication to his estimate is the fact that if artificially introduced salt CCN is not added to the lower atmosphere at levels that exceed naturally occurring amounts, it would lead to a decrease in albedo and a warming effect would occur. This is because cloud droplets form preferentially on certain types of CCN, and once a preference is established they will not form on others. This can be counteracted by knowing in advance the amount of CCN required to increase albedo and ensuring that each cloud is seeded with the correct amount.
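As a sanity check, Latham's totals can be reproduced from figure 2.6 with a few multiplications. The sketch below (Java, the project's implementation language) assumes a global ocean area of roughly 3.6 × 10¹⁴ m², a figure not given in the text:

```java
// Rough check of Latham's global CCN estimate: mass per unit area = N * m * H
// (figure 2.6), then scaled by an ASSUMED global ocean area.
public class CcnEstimate {
    // Total CCN mass: N (m^-3) * mass per CCN (kg) * spray altitude (m) * area (m^2).
    public static double totalMassKg(double nPerM3, double ccnMassKg,
                                     double hMeters, double areaM2) {
        return nPerM3 * ccnMassKg * hMeters * areaM2;
    }

    // Total added droplets in the sprayed column over the given area.
    public static double totalDroplets(double nPerM3, double hMeters, double areaM2) {
        return nPerM3 * hMeters * areaM2;
    }

    public static void main(String[] args) {
        double n = 400.0e6;        // 400 cm^-3 expressed per m^3
        double h = 500.0;          // 0.5 km spray altitude, in meters
        double m = 1.0e-17;        // 1e-14 g per salt CCN, in kg
        double oceanArea = 3.6e14; // assumed global ocean area in m^2

        // ~7.2e8 kg and ~7.2e25 droplets: consistent with Latham's order-of-
        // magnitude figures of 1e9 kg and 1e26 droplets.
        System.out.printf("mass ~ %.1e kg, droplets ~ %.1e%n",
                totalMassKg(n, m, h, oceanArea), totalDroplets(n, h, oceanArea));
    }
}
```

Only the orders of magnitude matter here; the assumed ocean area moves the totals by a factor of two at most.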

An interesting side effect of artificially introducing CCN into the atmosphere is that it may work to increase cloud lifetimes and decrease the amount of rain produced. This is because larger concentrations of CCN cause droplet radius to decrease, and clouds with smaller droplets take longer to form drizzle [22]. The main benefit of this is that albedo per cloud is being increased for a longer period of time, which leads to a greater


Figure 2.7: Salter's proposed albedo spray vessel design (artwork by John MacNeill). Used with permission from [36].

cooling effect.

Latham also discusses many technical, scientific, and ethical concerns regarding his

proposal, all of which require further study before its feasibility can properly be gauged. Chief among the scientific questions is the exact microphysics behind the regulation of cloud albedo. While most scientists are in agreement on the relationship between cloud condensation nuclei (CCN) and albedo, the exact mechanism is in dispute. For example, in [11] Atmospheric Scientist Q. Han argues that traditional methods of calculating albedo, such as the Twomey Equation (figure 2.2), may be incorrect because they assume that cloud water content stays the same as albedo is altered, and this is not necessarily true.

Ethically speaking, the biggest concern is the effect of raising CCN levels on a global scale. Determining this requires the use of global climate models such as [35, 2], which have only recently become feasible due to the large amount of computing resources required to run them.

The technological barrier of creating a machine capable of "seeding" low-level marine stratocumulus clouds is speculated upon but not fully addressed by Latham. It is undertaken in greater depth by University of Edinburgh Engineer Stephen Salter in his paper "Sea-Going Hardware for the Implementation of the Cloud Albedo Control Method for the Reduction of Global Warming" [36]. Salter, who in 1974 developed a method to convert the motion of waves into electricity, has considerable experience in


dealing with engineering challenges at sea. His design (see figure 2.7), the exact mechanics of which are not relevant to this project, works by sucking up sea water using specially designed vessels and spraying the remaining salt residue into the lower atmosphere.

2.4 Conclusion

Given what is known about the Earth Radiation Budget, it is easy to see how important clouds are in its regulation. Furthermore, because we have a theoretical mechanism to control cloud albedo, and we know that low-level clouds such as stratocumulus are the best net-coolers, we also know what to target and how. The important question, which remains to be answered, is: "Where are the optimal locations for increasing cloud albedo?" Some of the variables necessary to answer that question, such as albedo, shortwave radiation, CCN, and droplet concentration, have already been discussed. The rest will be examined in the following chapter.


Chapter 3

Description

The previous chapter covered a broad overview of the Earth Radiation Budget and why clouds are seen by some as a mechanism to exert a measure of control over it. It ended by examining a proposal for increasing cloud albedo and noting that in order to accomplish this one must know in what locations it should be done, because some are considered more ideal than others.

With this backdrop, Chapter 3 tackles the optimal locations query from a computer science perspective. The components of this challenge are analyzed chronologically. It begins by looking at the full set of variables used to determine optimal locations. Each is covered in sufficient depth to give the reader an understanding of how it is derived and the role it plays. Following this, the task of obtaining datasets covering the relevant variables and integrating them into the database is examined. This feeds into a description of the climate database schema and finally the optimal locations algorithm.

3.1 Variables

While analyzing the variable list in figure 3.1, the first item to take note of is that we are most interested in stratocumulus clouds. Stratocumulus clouds, which have long been studied by atmospheric scientists, exhibit a number of properties that make them particularly desirable for CCN seeding. They have a low altitude, are prevalent, are considered net-coolers, and have moderate to low initial albedos, which makes them ideal targets for increasing their reflectivity.

Secondly, each variable can be loosely broken down into two categories: derived and original. Original variables are the simplest to use and the most accurate. They come directly from their respective data source and do not require much additional



Variable Name                                      | Units                    | Data Source    | Type
---------------------------------------------------|--------------------------|----------------|---------
Mean Albedo for Stratocumulus Clouds               | Percentage from 0 to 100 | ISCCP-D1       | Derived
Percent Stratocumulus Clouds                       | Percentage from 0 to 100 | ISCCP-D1       | Original
Percent Clouds above Stratocumulus                 | Percentage from 0 to 100 | ISCCP-D1       | Original
Estimated Droplet Concentration for Stratocumulus  | cm⁻³                     | ISCCP-D1       | Derived
Incoming Shortwave Radiation at 680 Millibars      | Watts per meter squared  | ISCCP-FD (PRF) | Original
Surface Temperature                                | Kelvin                   | ECMWF          | Original
Boundary Layer Height                              | Meters                   | ECMWF          | Original
Direction the Wind is Blowing From                 | 0-360 Degrees            | ECMWF          | Derived
Cloud Base Height                                  | Meters                   | ECMWF          | Derived
Mean Wind Speed                                    | Meters per second        | ECMWF          | Derived

Figure 3.1: Primary variables for calculating optimal locations.

work in order to load them into the database. Their main limitation is that they do not cover all the variables we are interested in. Thus derived variables, which are calculated from originals, are also used. While not as accurate as originals, they do provide critical information necessary to determine optimal locations.

Finally, there are a number of other variables which are also relevant to this question, although they are not considered to be of primary importance. Therefore, they will not be used in calculating optimal locations. However, once a location is found they will need to be used as a check to determine whether it really does exhibit the types of conditions necessary for CCN seeding. Thus they are considered supplementary¹

and are also included (just not calculated) in the climate database.

Each variable will now be covered in more detail.

3.1.1 Shortwave Radiation

As discussed in the previous chapter, incoming shortwave radiation indicates the amount of energy that is directed from the sun towards a point on Earth. This is important because, in order to maximize the effectiveness of Latham and Salter's proposal, only locations with large amounts of incoming solar radiation should be targeted. Interestingly, the amount of radiation which is directed towards a fixed location changes quite dramatically at different times of the year. For example, in July the largest amount of incoming radiation is directed towards the North Pole, while in January the opposite is true, and if one averages values for an entire year the highest levels are at the equator. This suggests that in calculating optimal locations based upon the variable incoming

¹ More information concerning them can be found in section 3.4.


Figure 3.2: Various levels of incoming shortwave radiation, measured in watts per meter squared, at different times of the year (panels: July 2005; January 2005; January to December 2005). Areas in dark red experience the most incoming shortwave radiation while those in dark blue experience the least. Image provided by the NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado, from their Web site at http://www.cdc.noaa.gov/ [27].

shortwave radiation, one must take into account the varying amounts at different times of the year (see figure 3.2). This project is primarily interested in the amount of incoming shortwave radiation which is hitting the tops of stratocumulus clouds. This point is approximately at an atmospheric pressure level of 680 millibars, which corresponds to an altitude of about 3200 meters.

3.1.2 Albedo & Droplet Concentration

Stratocumulus cloud albedo, which is determined in part by droplet concentration, represents the amount of radiation, on a scale from 0 to 100 percent, that a body can reflect. Since it is known that it is easier to increase the albedo of a cloud with a low initial value, these two variables should be minimised.

Albedo is easily derived from optical thickness using the graph in figure 2.3. Determination of droplet concentration is much more difficult and cannot be calculated from satellite remote sensing techniques alone [12]. Therefore, the approach outlined in [12] is used to determine column droplet concentration from optical thickness and


N_c = τ / (2π r_e² (1 − b)(1 − 2b))

N = N_c / h

b = coefficient 0.193
r_e = droplet radius (µm)
τ = optical thickness (TAU)
N_c = column droplet concentration (10⁶/cm²)
N = droplet concentration (cm⁻³)
h = cloud height, assumed to be 800 meters

Figure 3.3: Estimation of droplet concentration given a value for optical thickness and cloud droplet radius. First determine column droplet concentration and use that to estimate droplet concentration with a presumed height of 800 meters [12].

cloud droplet radius, and from there droplet concentration is estimated using an assumed cloud height from base to top of 800 meters (see figure 3.3). Estimated column droplet concentration, while not an exact value, does provide a good approximation, to within 19%, and shows a strong correlation to regional droplet concentration datasets [12].
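The estimation procedure of figure 3.3 is compact enough to sketch in Java. The unit handling here is an assumption on my part (r_e supplied in micrometers and converted to centimeters so that N_c comes out in cm⁻², with h = 800 m = 8 × 10⁴ cm), since the figure lists units but not conversions:

```java
// Sketch of the droplet-concentration estimate of [12] (figure 3.3).
// Assumed unit conventions: tau dimensionless, re in micrometers,
// returned N_c in cm^-2 and N in cm^-3.
public class DropletConcentration {
    static final double B = 0.193;            // size-distribution coefficient from [12]
    static final double H_CM = 800.0 * 100.0; // assumed cloud depth: 800 m in cm

    // Column droplet concentration: N_c = tau / (2*pi*re^2*(1-b)*(1-2b)).
    public static double columnConcentration(double tau, double reMicrons) {
        double reCm = reMicrons * 1.0e-4;     // micrometers -> centimeters
        return tau / (2 * Math.PI * reCm * reCm * (1 - B) * (1 - 2 * B));
    }

    // Droplet concentration per unit volume: N = N_c / h.
    public static double concentration(double tau, double reMicrons) {
        return columnConcentration(tau, reMicrons) / H_CM;
    }

    public static void main(String[] args) {
        // Typical marine stratocumulus values: tau = 10, re = 10 micrometers,
        // giving roughly 40 cm^-3, a physically plausible marine value.
        System.out.printf("N ~ %.1f cm^-3%n", concentration(10.0, 10.0));
    }
}
```

Note that the result is a cloud-averaged estimate; the 19% accuracy figure quoted above applies to the column concentration, not the per-volume value.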

3.1.3 Cloud Amounts

In addition to albedo and droplet concentration, it is also necessary to consider the amounts of stratocumulus clouds in the atmosphere. Areas rich in stratocumulus clouds should be considered prime targets; however, the heat trapping caused by mid and high altitude clouds must be taken into account as well. Thus ideal locations are those with large concentrations of low-level stratocumulus clouds and smaller amounts of mid and high-level clouds (see figure 3.4).

3.1.4 Spray Vessel Variables

Sea wind speed, direction, surface temperature, cloud base height, and boundary layer thickness are important for answering the more technical question of whether or not the spraying vessels can be used in a given location. Generally speaking, the spraying vessels should be used in moderately windy conditions in order to provide enough momentum to lift the cloud condensation nuclei particles up into the clouds. The clouds should ideally have a base height which is below the top of the boundary layer in order


Figure 3.4: Amounts of low-level stratocumulus, mid, and high level clouds, from left to right, top to bottom respectively. Light blue indicates the greatest concentrations and dark red indicates the least. Image and data provided by [18].

to produce the strongest seeding. Wind direction is important as well because, in order to minimize disturbances, the vessels should not be used in locations where the seeded clouds could blow over land. See figure 3.5 for sample ocean wind data. With the exception of boundary layer height and surface temperature, these variables are all derived using standard meteorological equations. Mean wind speed and direction are determined from the U & V directional components of wind using the equations found in figure 3.6. Cloud base height is calculated from surface temperature and dew point using the equation in figure 3.7.
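The standard derivations cited here (figures 3.6 and 3.7) can be sketched in a few lines of Java. One substitution to note: Math.atan2 is used in place of a quadrant-adjusted tan⁻¹, which yields the same "direction the wind is blowing from" without manual case analysis:

```java
// Sketch of the derived spray-vessel variables: wind speed and direction
// (figure 3.6) and cloud base height (figure 3.7).
public class SprayVesselDerived {
    // M = (U^2 + V^2)^(1/2), in m/s.
    public static double windSpeed(double u, double v) {
        return Math.sqrt(u * u + v * v);
    }

    // Direction the wind is blowing FROM, in degrees [0, 360).
    // Equivalent to alpha = 90deg - tan^-1(V/U) + 180deg with quadrant
    // handling delegated to atan2 rather than done by hand.
    public static double windDirectionFrom(double u, double v) {
        double alpha = 270.0 - Math.toDegrees(Math.atan2(v, u));
        return (alpha % 360.0 + 360.0) % 360.0;
    }

    // H = 122 * (Tsurface - Tdew), temperatures in Celsius, H in meters.
    public static double cloudBaseHeight(double tSurfaceC, double tDewC) {
        return 122.0 * (tSurfaceC - tDewC);
    }

    public static void main(String[] args) {
        System.out.println(windSpeed(3, 4));            // 5.0
        System.out.println(windDirectionFrom(0, -10));  // 0.0: a wind from due north
        System.out.println(cloudBaseHeight(20, 12));    // 976.0 meters
    }
}
```

The cloud base formula is only a first approximation: it assumes the dew point spread shrinks linearly with altitude, which is the standard assumption behind the 122 m/°C coefficient.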

3.1.5 Hypotheses

Now that there is a better understanding of the full set of variables necessary to determine optimal locations, it is worthwhile to make some early predictions as to where they may be located.

Large amounts of incoming shortwave radiation and stratocumulus clouds are probably the most crucial factors in determining optimal locations. Following them, the most important variables are droplet concentration and albedo, as well as the technical variables: wind speed/direction, cloud base height, etc. These variables, while important, will probably not play as significant a role in determining optimal locations. This is because without high levels of shortwave radiation and stratocumulus clouds, increasing albedo would create almost no cooling effect.


Figure 3.5: Wind direction and speed for May 10, 2006. Arrows indicate direction, with red arrows representing the strongest winds. This image was produced from the SeaWinds instrument aboard the QuikSCAT satellite, provided by NASA-PODAAC [29].

M = (U² + V²)^(1/2)

α = 90° − (360°/C) tan⁻¹(V/U) + 180°

M = wind speed (m/s)
U = eastward wind component (m/s)
V = northward wind component (m/s)
α = direction the wind is blowing from (degrees)
C = 360° = 2π, the angular rotation in a full circle, used to convert between degrees and radians

Figure 3.6: Determination of wind speed and direction [39].

As shown in figure 3.4, the areas with the biggest concentrations of stratocumulus clouds are those off the west coasts of the Americas and Africa. As for shortwave radiation, figure 3.2 indicates that the greatest yearly concentrations are near the equator, which is approximately where the 3 stratocumulus cloud concentrations are located as well. Thus all 3 of these areas will most likely be considered optimal locations (at least when comparing yearly averaged data).

However, looking at smaller time increments will likely suggest other locations as well. This is due to the fact that solar radiation changes its focus depending on the time of the year. One would expect to see some additional locations in the southern hemisphere around January and similar trends in the northern hemisphere in July.


H = 122 (T_surface − T_dew)

H = height from surface to start of clouds (meters)
T_surface = temperature at the surface (Celsius)
T_dew = dew point temperature (Celsius)

Figure 3.7: Determination of cloud base height in meters [39].

3.2 Datasets

In general, climate variables are measured by two different methods: regional and/or satellite measurements. Examples of regional measurements include ground based, weather balloon, and aircraft measurements. The advantages of regional measurements are that the temporal resolution of the data can in some cases extend over hundreds of years and they are generally more cost effective; however, their spatial resolution is often limited to a small area. Additionally, combining different sets of regional measurements often proves difficult because instruments may be calibrated in different ways, with measurements recorded in different formats and at different times. Satellite measurements have only been used very recently but offer a standardized global dataset and massive amounts of data. A third, hybrid approach integrates both regional and satellite data. Because this project is looking at questions which are global in scope, only satellite and/or hybrid datasets are used. This section examines the issue of datasets and variables in much more detail.

Each original variable is measured by a satellite instrument; however, dealing with the raw output of these instruments is a task which requires extensive domain knowledge that is far beyond the scope of this project. Thankfully there are a number of groups which are dedicated to collecting, cleaning, integrating, and documenting the data into more user friendly products. A group may create datasets which are integrated from multiple satellites/instruments, or it may focus exclusively on one. For example, NASA's Physical Oceanography Distributed Active Archive Center (PODAAC) manages a dataset encompassing variables for the NASA QuikSCAT satellite. Conversely, the International Satellite Cloud Climatology Project (ISCCP) D1 dataset combines measurements from 21 different satellites.

In general, each group tends to focus on a specific subset of variables such as clouds, ocean properties, or radiation. Additionally, they typically use their own spatial and


temporal (time) resolutions. These complications make the choice of data sources extremely important, especially where it pertains to data source compatibility. Given this, a significant time investment was put into selecting data sources which ensure not only compatibility and adequate coverage of the relevant variables, but also temporal and spatial resolutions expansive enough to help answer Latham and Salter's questions.

The next few sub-sections discuss the selected datasets, the variables of interest to this project that they provide, as well as their temporal and spatial resolutions.

3.2.1 International Satellite Cloud Climatology Project (ISCCP)

The International Satellite Cloud Climatology Project (ISCCP) was founded in 1982 by the World Climate Research Program (WCRP) to look at the long-term global effects of clouds on climate [18]. They maintain a number of datasets which are integrated from various NASA and international satellites². Since ISCCP datasets are created from multiple satellite measurements, they provide longer-term temporal coverage that extends beyond the lifetime of a single satellite. Additionally, measurements are often taken several times a day and cover a larger portion of the Earth than a single satellite could.

ISCCP datasets typically use equal area grids³ with an area of approximately 280 km by 280 km for spatially indexing data. Equal area grids, while at first disorienting, offer the benefit of ensuring each cell has roughly the same area, whereas with equal angle grids⁴, such as those typically found on wall maps and road atlases, cell area varies from very small at the poles to proportionately much larger at the equator. An additional benefit of using equal area grids is economy of storage, because fewer data points must be saved to disk. Measurements are usually provided in 3-hour increments starting at 0 UTC⁵ (Coordinated Universal Time) and ending at 21 UTC, for a total of 8 temporal periods per day covering both sun and twilight. For a given time and date there are up to 6596 available cells in an equal area grid. Each cell includes information concerning the type of vegetation (water, ice, desert, etc.), topographic altitude, and the satellite which took the measurement, as well as a formula for calculating the bounding box of that cell. A picture of the ISCCP equal area grid can be found in figure 3.8.

For this project two ISCCP datasets were used: ISCCP-D1⁶ and ISCCP-FD (PRF).

² ISCCP additionally maintains some regional datasets such as FIRE (First ISCCP Regional Experiment); however, they are not relevant to this project and will not be discussed.
³ Also known as reduced Gaussian grids [17].
⁴ Also known as regular Gaussian grids [17].
⁵ 0 UTC represents 12am, 14 UTC represents 2pm, etc.
⁶ All D1 data were obtained from the NASA Langley Research Center Atmospheric Sciences Data Center [4].

Figure 3.8: ISCCP equal area grid (also known as a reduced Gaussian grid). There are 6596 cells, starting at cell 1 in the lower left and proceeding to the right and then up [45].

Both the D1 and FD (PRF) datasets use the equal area spatial indexing described above, with measurements taken at 3-hour intervals, and each has a temporal resolution from about 1984 to 2004, with more data to be added in the future.

Each dataset will now be described.

3.2.1.1 ISCCP-D1

There are 3 ISCCP D level datasets: DX, D1, and D2. Each represents a different stage in the data integration process. The DX dataset encompasses variables from each individual satellite at high resolution, D1 is spatially averaged with merged data from the different satellites, and D2 is temporally merged into mean values for a month. The more integrated the dataset, the easier it is to use; however, that comes at the cost of smaller coverage of variables and coarser spatial and temporal resolutions. For this project the D1 dataset was chosen for its moderate ease of use and coverage.

Each D1 cell has over 200 variables focused on, but not limited to, cloud properties. The variables it provides which are of primary concern to this project are cloud amounts, optical thickness/albedo (derived), and droplet concentration (derived) for stratocumulus clouds. There is also data for a number of other relevant variables such as temperature, pressure, and ice/snow cover, although this is not considered to be of


primary importance for calculating optimal locations. Each variable is measured in three different ways: by type of cloud, by pressure level⁷, and as the aggregate of all clouds in a particular cell.

3.2.1.2 ISCCP-FD (PRF)

The FD or Flux Datasets consist of five different data products: TOA, SRF, PRF, INP, and MPF, which measure upwelling longwave, upwelling shortwave, downwelling longwave, and downwelling shortwave radiation levels. With the exception of INP and MPF, each FD product has identical spatial and nearly identical temporal resolutions to the ISCCP-D1 dataset. For this project the FD-PRF (Radiative Flux Profiles) dataset was chosen because it provides the best coverage of variables as well as temporal and spatial resolutions nearly equal to ISCCP-D1.

The FD-PRF measures radiative fluxes at five different pressure levels, from the top of the atmosphere (0 millibars), which is defined to be a height of 100 km, to the surface of the Earth (1000 millibars). Thus for a given point in the atmosphere one can determine the amounts and types of radiation which are going towards and away from the surface of the Earth.

An important benefit of the PRF dataset is that it provides fluxes at 680 millibars (approx. 3200 meters), which is considered to be the maximum pressure level of stratocumulus clouds. This is in contrast to other FD datasets such as TOA (Top of Atmosphere) and SRF (Surface Radiative Fluxes), which only contain data for a single level. This last point is of importance to this project because we are interested in the amount of incoming shortwave radiation that is hitting low level clouds, not the amount that strikes the surface or the amount at the top of the atmosphere.

3.2.2 European Centre for Medium-Range Weather Forecasts (ECMWF)

The ECMWF, headquartered in Reading, UK, is a highly regarded independent organization supported by European member states. Its primary charter is to assist its clients in medium range weather forecasting. In addition, they also put together a number of global datasets and make them available for research [40]. In contrast to ISCCP, ECMWF datasets are typically distributed on equal angle 2.5° lat by 2.5° lon grids. Their global grids typically have 10368 cells, which corresponds to 72 latitude

7Variables measured by cloud type and pressures are determined using the ISCCP Cloud DefinitionChart in figure 2.4.


Figure 3.9: Typical ECMWF equal angle grid for their datasets. Used with permission from [42].

and 144 longitude cells (see figure 3.9).

For this project a single ECMWF dataset, the ERA-40, was chosen. It will now be

discussed.

3.2.2.1 ERA-40 Year Re-Analysis

One of the more popular ECMWF products is the ERA-40 Year Re-Analysis. The ERA-40 is an interpolated dataset consisting of both satellite and surface based measurements. It has a huge temporal resolution, spanning from 1957 to 2002, with tentative plans to add future data. Measurements are taken at 6 hour increments starting at 0 UTC, and in most cases missing/undefined data is interpolated instead of left out, as is the case with ISCCP [20]. Unlike most other organizations, whose datasets tend to focus on one area, the ERA-40 consists of a broad spectrum of variables ranging from soil temperature and wave height to ozone and cloud cover. The benefit of this is that one can find general data on many parameters. However, it does lack a bit in the detailed coverage demanded by some researchers. For example, the ISCCP-D1 dataset measures cloud variables for over 17 types of clouds, whereas the ERA-40 reports just a few parameters and only by altitude. That being said, the ERA-40 is a high quality climate product and was chosen specifically for its coverage of surface variables, its extensive temporal resolution, and its moderate compatibility with ISCCP datasets.

The variables provided by ERA-40 which are of primary relevance to this project


include: surface temperature, boundary layer height, direction the wind is blowing from (derived), cloud base height (derived), and mean wind speed (derived).

3.3 Data Integration

After choosing the appropriate data sources, the next task was to integrate the heterogeneous sources into a uniform relational format. Data integration follows a five step process:

1. Parse Datasets

2. Data Cleaning

3. Calculate Derived Data Points

4. Transformation into Uniform Schema

5. Build Sets of Tuples & Insert Into DB

This process was implemented in Java, and Java DataBase Connectivity (JDBC) drivers were used to load the integrated data into the database.
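The five steps might be skeletonized as below. Everything named here (the Observation class, the cloud_obs table and its columns, the toy derived-variable formula) is hypothetical, since the actual schema is described later; only the batched JDBC insertion pattern is the standard one:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the five-step integration pipeline; names and the
// derived-variable formula are illustrative, not the thesis's actual schema.
public class IntegrationPipeline {
    public static class Observation {
        public final int cell, hourUtc;
        public final double tau, reMicrons;
        public Observation(int cell, int hourUtc, double tau, double reMicrons) {
            this.cell = cell; this.hourUtc = hourUtc;
            this.tau = tau; this.reMicrons = reMicrons;
        }
    }

    // Steps 1-2: parse raw records and drop anomalies (here: negative/missing values).
    public static List<Observation> clean(List<Observation> raw) {
        List<Observation> ok = new ArrayList<>();
        for (Observation o : raw)
            if (o.tau >= 0 && o.reMicrons > 0) ok.add(o);
        return ok;
    }

    // Step 3: compute a derived variable (toy tau->albedo curve, NOT figure 2.3).
    public static double derivedAlbedo(Observation o) {
        return o.tau / (o.tau + 6.8);
    }

    // Steps 4-5: build tuples and batch-insert them via JDBC.
    public static void insertBatch(Connection conn, List<Observation> obs) throws SQLException {
        String sql = "INSERT INTO cloud_obs (cell, hour_utc, tau, albedo) VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Observation o : obs) {
                ps.setInt(1, o.cell);
                ps.setInt(2, o.hourUtc);
                ps.setDouble(3, o.tau);
                ps.setDouble(4, derivedAlbedo(o));
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        List<Observation> raw = Arrays.asList(
                new Observation(1, 0, 9.4, 10.0),
                new Observation(2, 0, -1.0, 10.0)); // anomalous, removed by clean()
        System.out.println(clean(raw).size()); // 1
    }
}
```

Batching the inserts, rather than issuing one statement per tuple, is the usual way to keep JDBC loads of this size tractable.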

Each dataset is stored in a unique binary format and accessed using its own Application Program Interface (API). Furthermore, because of the need for a standardized method of storing the datasets over multiple decades, each organization packs and unpacks its data using oftentimes-complex FORTRAN routines. FORTRAN provides a strong measure of reliability because one can be confident that a program written 30 years ago for an f77⁸ compiler will still work on that same compiler today or in another 30 years. Given the long time span of many datasets this is extremely important. However, it comes at the expense of the data being more difficult to manipulate than in a more modern language such as Java. Thus, in order to handle the raw data in a uniform manner, it must first be unpacked and converted into more robust Java objects.

Once the data is extracted from the original sources, it is cleaned to obtain only the desired variables and remove any potential anomalies. The cleaned data is then used to calculate all of the derived data points outlined in the variables section above.

⁸ A commonly used FORTRAN compiler from the 1977 standard.


Lat   Lon      No.    Begin  End   |  Lat   Lon      No.    Begin  End
Index Interval Cells  Cell   Cell  |  Index Interval Cells  Cell   Cell
 1    120.00     3       1      3  |   37     2.50   144    3299   3442
 2     40.00     9       4     12  |   38     2.50   144    3443   3586
 3     22.50    16      13     28  |   39     2.52   143    3587   3729
 4     16.36    22      29     50  |   40     2.54   142    3730   3871
 5     12.86    28      51     78  |   41     2.55   141    3872   4012
 6     10.59    34      79    112  |   42     2.57   140    4013   4152
 7      9.00    40     113    152  |   43     2.61   138    4153   4290
 8      7.83    46     153    198  |   44     2.65   136    4291   4426
 9      6.92    52     199    250  |   45     2.69   134    4427   4560
10      6.21    58     251    308  |   46     2.73   132    4561   4692
11      5.63    64     309    372  |   47     2.79   129    4693   4821
12      5.22    69     373    441  |   48     2.86   126    4822   4947
13      4.80    75     442    516  |   49     2.93   123    4948   5070
14      4.50    80     517    596  |   50     3.00   120    5071   5190
15      4.24    85     597    681  |   51     3.10   116    5191   5306
16      4.00    90     682    771  |   52     3.21   112    5307   5418
17      3.79    95     772    866  |   53     3.33   108    5419   5526
18      3.60   100     867    966  |   54     3.46   104    5527   5630
19      3.46   104     967   1070  |   55     3.60   100    5631   5730
20      3.33   108    1071   1178  |   56     3.79    95    5731   5825
21      3.21   112    1179   1290  |   57     4.00    90    5826   5915
22      3.10   116    1291   1406  |   58     4.24    85    5916   6000
23      3.00   120    1407   1526  |   59     4.50    80    6001   6080
24      2.93   123    1527   1649  |   60     4.80    75    6081   6155
25      2.86   126    1650   1775  |   61     5.22    69    6156   6224
26      2.79   129    1776   1904  |   62     5.63    64    6225   6288
27      2.73   132    1905   2036  |   63     6.21    58    6289   6346
28      2.69   134    2037   2170  |   64     6.92    52    6347   6398
29      2.65   136    2171   2306  |   65     7.83    46    6399   6444
30      2.61   138    2307   2444  |   66     9.00    40    6445   6484
31      2.57   140    2445   2584  |   67    10.59    34    6485   6518
32      2.55   141    2585   2725  |   68    12.86    28    6519   6546
33      2.54   142    2726   2867  |   69    16.36    22    6547   6568
34      2.52   143    2868   3010  |   70    22.50    16    6569   6584
35      2.50   144    3011   3154  |   71    40.00     9    6585   6593
36      2.50   144    3155   3298  |   72   120.00     3    6594   6596

Figure 3.10: This shows the lat and lon for different cells using an ISCCP equal area grid. For a given cell number first determine its lat index. Lat is then calculated by multiplying the index by 2.5◦. To obtain the lon, subtract the begin cell number from the cell id and multiply that by the lon interval. Table taken from ISCCP DX/D1/D2 Documentation [45].
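The caption's recipe can be sketched in Java. The grid rows below are only an excerpt of the 72-row table, and the southern-origin latitude convention (index 1 covering the 2.5◦ band starting at -90◦) is an assumption on my part; the ISCCP DX/D1/D2 documentation [45] is authoritative:

```java
public class EqualAreaGrid {
    // {latIndex, lonInterval (degrees), beginCell, endCell} -- excerpt only,
    // taken from figure 3.10; the full table has 72 rows.
    static final double[][] ROWS = {
        {1, 120.00, 1, 3},
        {2, 40.00, 4, 12},
        {3, 22.50, 13, 28},
        {36, 2.50, 3155, 3298},
        {72, 120.00, 6594, 6596},
    };

    /** Returns {latitude, longitude} of a cell's south-west corner. */
    public static double[] toLatLon(int cell) {
        for (double[] r : ROWS) {
            if (cell >= r[2] && cell <= r[3]) {
                double lat = (r[0] - 1) * 2.5 - 90.0; // 2.5-degree bands from the South Pole (assumed origin)
                double lon = (cell - r[2]) * r[1];    // offset within the band times the lon interval
                return new double[]{lat, lon};
            }
        }
        throw new IllegalArgumentException("cell not covered by this excerpt: " + cell);
    }
}
```

For instance, cell 5 falls in lat index 2 (cells 4-12), so its longitude is (5 - 4) x 40.00 = 40◦.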

After cleaning, the data is transformed and merged into a uniform format. The format chosen for this project is the ISCCP Equal Area Grid (figure 3.10) with temporal resolutions at 3-hour increments. Since the ECMWF datasets are stored in an Equal Angle coordinate system they must be converted to conform to the ISCCP grid. This was accomplished using the algorithm described in the paper "Use of Reduced Gaussian Grids in Spectral Models" [17]. This essentially involves merging the portions of ECMWF data points that overlap ISCCP cells using weighted averages. Finally the transformed data is built into tuples and inserted into the database using JDBC.
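The core of the weighted-average merge might be sketched as follows. The overlap weights (the fraction of each equal-angle point's area falling inside the target ISCCP cell) are assumed to be computed elsewhere; the spherical geometry involved is omitted here:

```java
public class Regridder {
    /**
     * Merge the equal-angle values that intersect one target ISCCP cell,
     * each contributing in proportion to its overlap with the cell.
     */
    public static double weightedAverage(double[] values, double[] overlapWeights) {
        double sum = 0.0, totalWeight = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * overlapWeights[i];
            totalWeight += overlapWeights[i];
        }
        return sum / totalWeight; // normalize so partial coverage is handled
    }
}
```

For example, a cell covered one quarter by a value of 1 and three quarters by a value of 3 merges to 2.5.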

The temporal resolution of the ISCCP-D1 and FD (PRF) datasets extends from 1983 to 2004, and ECMWF ERA-40 from 1957 to 2002. This yields a set of fully covered variables that spans from 1983 through 2002. On average, the time required to integrate a single day's worth of variables and insert them into the database for a given dataset is 8 minutes. To do the same for all 3 datasets requires 24 minutes, and a year


1 Dataset, 1 Day           8
Num Datasets               3
All Datasets, 1 Day        24
All Datasets, 1 Week       168 (2.8 hours)
All Datasets, 1 Month      730 (12 hours)
All Datasets, 1 Year       8760 (≈6 days)
All Datasets, 20 Years     175200 (≈120 days)

Figure 3.11: Runtime cost of integrating the 3 datasets into the database. All units are in minutes unless otherwise specified.

requires approximately 6 days. This is summarized in figure 3.11.

The high runtime cost illustrates an important point: given the relatively short duration of this dissertation project, time is a significant factor and must be managed appropriately. In order to accomplish this a set of data integration metrics were developed.

• The entire 5 step integration process must be fully automated for a single input.

• Metric 1 must be repeatable for at least 24 hours worth of input.

The first metric requires that integration on a small scale be accomplished with minimal intervention from a human. This allows work to be as automated as possible. The second metric is a scaled-up version of the first and states that the same small-scale requirements must still hold for large amounts of input and must be robust enough to avoid failure in reasonable situations⁹. Combined, they maximize uptime and ensure that, as much as possible, cost is delegated to computation and away from human intervention, which is far more expensive.

3.4 Schema Design

The fully integrated data is loaded into a DB2 Database Management System using the schema shown in figure 3.12. This section examines the schema, data management, and efficiency issues. Note that discussion of the "look up table" is postponed to the next section, as it is used only for the determination of optimal locations and does not have a significant impact on data management.

The schema, while not overly complex, will hold large amounts of data. Therefore, efficiency was deemed the most important design consideration. The primary table consists of columns representing the variables that are used in calculating optimal locations, hence the name "calculated." These calculated variables are the ones

9. Examples of unreasonable situations are power failures, network downtime, etc.


Database Schema - Entity Relationship Diagram (figure 3.12):

Primary (Calculated) Data: Position (Lat,Lon); Time (YYMMDDTT); Day/Twilight & Type of Terrain Code; Pct Stratocumulus Clouds; Cloud Base Height; Est. Droplet Con. for Stratocumulus; Surface Temperature; Mean Albedo for Stratocumulus Clouds; Pct. Clouds Above Stratocumulus; Incoming Shortwave Radiation at 680 Millibars; Mean Wind Speed; Direction the Wind is Blowing From; Boundary Layer Height.

Supplementary (Non-Calculated) Data: Position (Lat,Lon); Time (YYMMDDHH); Albedo, CCN, Height, Temp, Pressure, etc. for other cloud types; LW Radiation Fluxes; SW Radiation Fluxes; Aggregate Cloud Data.

Look Up Table: Variable; Min_Val; Max_Val; Satellite ID.

Figure 3.12: Data is spread across the Primary and Supplementary tables, which have a 1-to-1 "Has" relationship. The look up table is used for computing optimal locations and is discussed in the next section. Note: Due to space limitations not all supplementary columns could be shown.

discussed in the previous sections and are essential to determining optimal locations for CCN seeding. Additionally, there are many other supplementary variables provided by these datasets which, while not critical in calculating optimal locations, are nevertheless important; hence these are called "non-calculated." The non-calculated variables are stored in the supplementary table, and there exists a one-to-one relationship between the two tables.

The reason for splitting the data into two tables is simple. Minimizing the physical size of tuples in the primary table ensures that access times are likewise minimized. To put it another way, the smaller the tuples, the more can fit on a data page, thus providing faster I/O access. This is evident in an almost 6-times-faster response time for a primary table query when compared to the same query on the supplementary table. The supplementary table, which will not be queried as often, can then warehouse the bulk of the data. As an additional benefit, if one needs data from both tables, joining on the primary keys is a low-cost one-to-one operation.

In regards to indexing strategies, clustered B+ indexes of the form (Temporal, Spatial) were added to the primary key fields of both the supplementary and primary tables. Un-clustered B+ indexes were added to all columns in the primary table and important


a) Tuple & Page Sizes

                      Num Cols   Avg Size Per   Avg Size Per    Max Tuples Per Page
                      in Table   Col (Bytes)    Tuple (Bytes)   (assuming 4k pages)
Primary Table            13        3.7             48.1                83
Supplementary Table      79        4               316                 12
Total                    92        3.9             NA                  NA

b) DB Sizes

                        Primary Table               Supplementary Table         Total
                        Num Tuples    Size (MB)     Num Tuples    Size (MB)     Num Tuples    Size (MB)
Per Day                 52,768        2.54          52,768        16.67         105,536       19
Per Week                369,376       17.77         369,376       116.72        738,752       134
Per Month               1,604,147     77.16         1,604,147     506.91        3,208,294     584
Per Year                19,260,320    926.42        19,260,320    6,086.26      38,520,640    7,013
Total Time (20 Years)   385,206,400   18,528.40     385,206,400   121,725.22    770,412,800   140,254

Figure 3.13: a) Indicates estimated column, tuple, and page sizes. b) Indicates estimated database size. Note: This does not take into account the cost of indexes or other DB2 management techniques which may increase size.

columns in the supplementary table. While the relatively large number of indexes does increase the cost of searching for query execution plans, it pales in comparison to the cost of actually executing the queries, thus the trade-off is considered beneficial.

In order to minimize tuple size, the smallest possible data types were used to store variables. Examples include using, where possible, SMALLINTs (2 bytes) instead of INTEGERs (4 bytes) and REALs (4 bytes) instead of DOUBLEs (8 bytes) or DECIMALs (5 bytes). The biggest data type savings is acquired by storing all temporal information as INTEGERs of the form YYYYMMDDTT (Year, Month, Day, Time) instead of as TIMESTAMPs, which require 10 bytes.
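The integer time encoding can be sketched as follows; the helper names are illustrative only. The packed form still fits a signed 32-bit INTEGER for years up to 2147:

```java
public class TimeCodec {
    /** Pack (year, month, day, hour) into one YYYYMMDDTT integer. */
    public static int encode(int year, int month, int day, int hour) {
        return ((year * 100 + month) * 100 + day) * 100 + hour;
    }

    /** Unpack a YYYYMMDDTT integer back into {year, month, day, hour}. */
    public static int[] decode(int code) {
        return new int[]{code / 1000000, (code / 10000) % 100,
                         (code / 100) % 100, code % 100};
    }
}
```

A useful side effect of this encoding is that ordinary integer comparison orders timestamps chronologically, so the clustered (Temporal, Spatial) index works on it directly.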

While I/O savings are not as significant on smaller databases, they quickly become apparent as size increases. The tables in figure 3.13 illuminate the cost savings of splitting data into multiple tables.

The first item to notice from figure 3.13 is the significant tuple size difference between the Primary and Supplementary tables. As the database temporal period extends from days to years, the cost savings of dividing data into two tables becomes more and more evident. This last point is also reflected in the relatively small number of tuples per page for the supplementary table compared with that of the primary.


Second is the 140+ gigabyte size of the database for the full 20-year temporal period. The complexity of managing a database of this size, coupled with the time required to parse the datasets into the database, suggests that integrating all 20 years may be too arduous given the short time span of this dissertation project. This could be circumvented by using high performance computing facilities and/or parallel database management technology; however, both the monetary cost and the learning curve make these options impractical within the constraints of this dissertation project. Furthermore, branching into other areas such as high performance computing would stray too far from the stated goals of this project. Therefore the focus is on adding data for the most recent time periods only, and for a time span of years instead of decades.

3.5 Optimal Locations Algorithm

After the database encompassing the desired variables was constructed, the final task was to answer the question: how are optimal locations calculated? To determine this, an optimal locations scoring algorithm was implemented as a DB2 Stored Procedure to find places that exhibit the properties identified as most desirable. The next section discusses that algorithm.

Say, for example, a user is interested in finding the top 10 locations with the highest incoming shortwave radiation and the lowest albedo for the month of July 2001. Additionally the user states that they are most interested in shortwave radiation (ie. 100% importance) and only half as interested in albedo (ie. 50% importance). The algorithm searches all coordinates with temporal resolution of July 2001 (in other words, all 6596 of them). For each coordinate it calculates a score for the average albedo and incoming shortwave radiation, taking into account each variable's associated weight. It then returns results for the 10 coordinates with the highest total score. This is the basic approach to calculating optimal locations.

This type of question defines what the user considers ideal locations for CCN seeding and is an example of what is meant by the phrase "optimal locations." Furthermore, it is worthwhile to note that, to the best of this author's knowledge gained from extensive study of the problem, there was previously no method to answer this type of question.

The following section describes the algorithm in more detail and examines a sample run-through.


3.5.1 Pseudo Code

Input. A list of variables, their associated weights, and for each variable a bit set to 1 if the variable is to be maximized (ie. shortwave radiation) and 0 if it is to be minimized (ie. initial albedo). Note: Each weight is set on a scale from 0 to 1, ie. 0 to 100 percent.

Step 0. Before any queries are asked the database first needs to be pre-processed. This is accomplished by determining, for each non-key field in the primary (calculated) table, the variable name as well as the highest and lowest possible value for it. This yields a total of x tuples, where x is the number of non-key fields in the primary (calculated) table. These tuples are then stored in the lookup table and are used to set the range of each of the properties in a given query. Typically pre-processing is done only once, after changes to the underlying data.

Step 1. For Each Unique Coordinate

• Declare totalScore = 0 (ie. each coordinate has its own totalScore)

• For each variable:

– Determine the average of all values for that particular coordinate/variable (as val_avg)

– Compute a score for that particular variable using the formula:

(valavg−minvarmaxvar−minvar

)∗wt = scorecord/var

where:

∗ min_var = the minimum possible value for that variable (provided by the lookup table)

∗ max_var = the maximum possible value for that variable (also provided by the lookup table)

This assumes, of course, that the variable is to be maximized. If the variable is to be minimized, use the formula:

(1 − (val_avg − min_var) / (max_var − min_var)) × wt = score_coord/var

It is worthwhile to note that in either case the maximum possible score is 1.

– Take this score and add it to the totalScore for that coordinate


Return. Out of all the totalScores for each coordinate, return the top x, where x is some threshold function. For example: top 10, best 10%, score greater than 10, etc.
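The scoring step above can be sketched in memory as follows. This is a minimal illustration of the formulas, not the actual DB2 Stored Procedure; the method names are my own:

```java
public class OptimalLocations {
    /**
     * Score one coordinate: for each variable, normalize its average value
     * into [0, 1] using the lookup-table min/max, flip it if the variable is
     * to be minimized, weight it, and sum across variables.
     */
    public static double score(double[] avgs, double[] mins, double[] maxs,
                               double[] weights, boolean[] maximize) {
        double total = 0.0;
        for (int i = 0; i < avgs.length; i++) {
            double norm = (avgs[i] - mins[i]) / (maxs[i] - mins[i]);
            total += (maximize[i] ? norm : 1.0 - norm) * weights[i];
        }
        return total;
    }
}
```

Running it on the example data from section 3.5.3 (albedo weight 0.5, minimized; shortwave weight 1.0, maximized) reproduces the run-through: position (2,2) scores highest, followed by (1,1).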

3.5.2 I/O Cost

• Preprocessing = θ(n)

– Where n is the number of data pages in the primary table.

• Computation = θ(n∗ x)

– Where n is the number of data pages involved in the computation and x is the number of properties to be taken into account.

3.5.3 Example Run Through

• Given:

Primary Table:

Time   Position   Albedo   Shortwave
 1      (1,1)      0.3       100
 2      (1,1)      0.5       120
 3      (1,1)      0.3       110
 3      (1,2)      0.4       110
 2      (1,2)      0.6       100
 1      (1,2)      0.5        80
 1      (2,1)      0.7       120
 2      (2,1)      0.6       100
 3      (2,1)      0.5       110
 3      (2,2)      0.4       190
 2      (2,2)      0.5       220
 1      (2,2)      0.2       240

LookUp Table:

Property    Max    Min
Albedo      0.7    0.2
Shortwave   240     80

Computed Avg Values (by pos)*:

Position   Avg Albedo     Avg Shortwave
(1,1)      0.366666667    110
(1,2)      0.5             96
(2,1)      0.6            110
(2,2)      0.366666667    216

*Ordinarily these values would be computed during Step 1.

• Inputs:

– All points, all times, variable1 = albedo (weight = 0.5, to be minimized), variable2 = downwelling shortwave radiation (weight = 1.0, to be maximized), return the top 2 highest scoring. In other words, find the 2 cells with the lowest albedo and highest shortwave radiation, where the variable shortwave is twice as important as albedo.

• Calculations:

– Position (1,1): totalScore = 0.333 + 0.188 = 0.521

∗ Albedo = (1 − (0.366 − 0.2)/(0.7 − 0.2)) × 0.5 = 0.333

∗ Shortwave = ((110 − 80)/(240 − 80)) × 1 = 0.188

– Position (1,2): totalScore = 0.2 + 0.1 = 0.3

∗ Albedo = (1 − (0.5 − 0.2)/(0.7 − 0.2)) × 0.5 = 0.2

∗ Shortwave = ((96 − 80)/(240 − 80)) × 1 = 0.1

– Position (2,1): totalScore = 0.1 + 0.188 = 0.288

∗ Albedo = (1 − (0.6 − 0.2)/(0.7 − 0.2)) × 0.5 = 0.1

∗ Shortwave = ((110 − 80)/(240 − 80)) × 1 = 0.188

– Position (2,2): totalScore = 0.333 + 0.850 = 1.183

∗ Albedo = (1 − (0.366 − 0.2)/(0.7 − 0.2)) × 0.5 = 0.333

∗ Shortwave = ((216 − 80)/(240 − 80)) × 1 = 0.850

• Return Top 2:

– Position (2,2): totalScore = 1.183

– Position (1,1): totalScore = 0.521


Chapter 4

Analysis

The previous chapter discussed the construction of the 3 outputs of this project:

• Climate Database

• Data Parser

• Optimal Locations Algorithm

It also examined how these outputs align with the project goals. Chapter 4 takes this discussion a step further by analyzing each component in order to ensure it performs as claimed. The attempt here is to build confidence in this resource as a research-quality tool.

This is accomplished by performing a series of black and white box tests. First, the white box tests analyze the data parser to ensure that any input datasets are being correctly represented within the climate database. White box tests include only a single temporal period. To examine multiple temporal periods, a series of black box tests are performed which compare, for a given variable, the results produced by this climate database with those of a well-respected dataset. After these tests, some benchmark results for the optimal locations algorithm are examined. Finally, while not one of the stated goals of this project, it is also interesting to make a few early optimal locations predictions.

4.1 Data Visualization

In order to assist in analysis, a simple visualization tool was built using the Google Maps Web Service [10]. The main advantage of this is that developing the tool was



achievable within this project's time constraints; although, there are a few disadvantages. First, Google Maps employs a different system of latitude and longitude than most other visualization tools. Longitude intervals are spaced evenly; however, latitude is not (for an example see figure 4.4). The further one progresses from the equator, the greater the interval between latitudes. This may give the impression that the area for a given spatial coordinate is larger towards the North and South Poles; however, this is not the case, as each cell is the same size.

Second, latitude only extends from -80◦ to 80◦ instead of the typical -90◦ to 90◦. This doesn't pose too much of a problem because the harsh climates of these uncovered regions are not considered candidates for seeding anyway.

Third, the visualization tool used to display NOAA-CDC data uses contours, which tend to emphasize interesting shapes. The tool developed for this project simply determines the score for a cell and colors it accordingly. In some cases these differences may appear to show a discrepancy when in fact one does not exist. Therefore, when analyzing results it is best to avoid comparing intricate NOAA-CDC shapes that cannot be displayed by the tool developed for this project.

As a final note, uncolored regions indicate no data or undefined.

4.2 White Box Tests

The first tests were designed to make certain that any input datasets are being accurately represented in the database. In other words, the focus here is on ensuring that the parser is functioning properly. To accomplish this, three variables were chosen and each was tested at four random temporal periods, for a total of twelve tests. See figure 4.1 for a description of each test.

Each test was performed as follows: raw data from the input dataset was extracted for each variable/temporal pair. Next, the parser was run for each test and the database was queried to obtain the corresponding information. The input and the output (ie. the data which ended up in the database for a given variable/temporal pair) were compared by taking the absolute value of the input minus the output. Ideally the two should be equal, resulting in a perfect difference of 0. If not, their discrepancy indicates the amount of error.

For a given variable/temporal pair there are 6596 inputs and outputs to be compared. This of course corresponds to the spatial resolution of the climate database. The difference was computed for each of these and the average difference of all 6596


Test   Variable                     Temporal Period   Avg. Discrepancy
  1    TOTAL_NUM_PIXELS             1994-11-02-00     0.00000000
  2    TOTAL_NUM_PIXELS             2002-03-28-00     0.00000000
  3    TOTAL_NUM_PIXELS             2005-06-30-00     0.00000000
  4    TOTAL_NUM_PIXELS             1997-08-18-00     0.00000000
  5    SW_DW_FULL_SKY_AT_680_MB     2003-02-23-21     0.00001460
  6    SW_DW_FULL_SKY_AT_680_MB     1995-01-09-03     0.00001460
  7    SW_DW_FULL_SKY_AT_680_MB     1997-08-17-12     0.00001506
  8    SW_DW_FULL_SKY_AT_680_MB     1999-01-26-00     0.00001459
  9    MEAN_TC_FOR_CLOUD_TYPE_2     1994-11-02-00     0.00000923
 10    MEAN_TC_FOR_CLOUD_TYPE_2     2002-03-28-00     0.00000963
 11    MEAN_TC_FOR_CLOUD_TYPE_2     2005-06-30-00     0.00000919
 12    MEAN_TC_FOR_CLOUD_TYPE_2     1997-08-18-00     0.00000941

Figure 4.1: The 12 tests performed for each variable/temporal pair

results for a given variable/temporal pair was taken as a measure of compatibility, where again a perfect score is 0 and anything greater indicates a non-exact match.

The results show that the difference between input and output is no greater than 0.000015, with tests 1-4 reporting 100% correlation. This suggests a very small margin of error, given that the average range of each of these variables is approximately 500. The minor discrepancy in tests 5-12 is likely due to the imprecise floating point numbers used to store those variables. The fact that tests 1-4 show no discrepancy and were performed using the variable TOTAL_NUM_PIXELS, which is stored as a precise integer data type, further backs up this claim.

The discrepancy between input and output for tests 5-12, while imperfect, indicates a very high correlation and should build confidence in the accuracy of the data contained within the climate database.
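The comparison metric itself can be sketched in a few lines; this is an illustration of the procedure described above, not the actual test harness:

```java
public class DiscrepancyCheck {
    /**
     * Mean absolute difference between the raw input values and the values
     * read back from the database for one variable/temporal pair (6596
     * values per pair at this database's spatial resolution).
     */
    public static double meanAbsoluteDifference(double[] input, double[] output) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            sum += Math.abs(input[i] - output[i]);
        }
        return sum / input.length; // 0.0 means a perfect round trip
    }
}
```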

4.3 Black Box Tests

The next series of tests were designed to indicate whether or not the database actually describes conditions similar to those of a thoroughly tested and already established dataset. To accomplish this, datasets provided by NOAA's Climate Diagnostics Center¹ were used to compare a set of variables (figure 4.2) that have coverage in both the NOAA-CDC data and this climate database.

1. All CDC images in this section are provided by the NOAA-CIRES Climate Diagnostics Center, Boulder, Colorado, from their Web site at http://www.cdc.noaa.gov/


NOAA-CDC Variable: Surface Air Temperature
  NOAA-CDC Dataset: CDC Derived NCEP Reanalysis Products Surface Flux
  Matched with (Variable from Climate DB): TWO_METER_TEMP_KELVIN
  Units: kelvin; Begin: 2001-01-01; End: 2001-01-31

NOAA-CDC Variable: U-Wind
  NOAA-CDC Dataset: CDC Derived NCEP Reanalysis Products Surface Flux
  Matched with (Variable from Climate DB): TEN_METER_U_WIND_COMPONENT
  Units: m/s; Begin: 2001-10-01; End: 2001-12-31

NOAA-CDC Variable: V-Wind
  NOAA-CDC Dataset: CDC Derived NCEP Reanalysis Products Surface Flux
  Matched with (Variable from Climate DB): TEN_METER_V_WIND_COMPONENT
  Units: m/s; Begin: 2001-10-01; End: 2001-12-31

NOAA-CDC Variable: Wind Speed
  NOAA-CDC Dataset: CDC Derived NCEP Reanalysis Products Surface Level Data
  Matched with (Variable from Climate DB): AVG_WIND_(10m)_SPEED_METERS_PER_SEC
  Units: m/s; Begin: 2001-08-01; End: 2001-08-31

NOAA-CDC Variable: Mean Total Cloud Cover
  NOAA-CDC Dataset: NCEP Reanalysis Daily Averages Other Gaussian Grid
  Matched with (Variable from Climate DB): PCT_OF_CLOUDY_PIXELS
  Units: Pct; Begin: 2001-06-01; End: 2001-08-31

Figure 4.2: Dataset Comparisons

Before continuing, it is worthwhile to note that because the datasets themselves may have different spatial and temporal resolutions, small differences between this climate database and the NOAA-CDC data should be expected. Additionally, results could differ because variables were measured in different ways. For example, the comparison of surface wind speed versus 10 meter wind speed may reveal a small discrepancy because of the slight altitude difference each is reported at. However, any disagreement should be very small and not have any effect on the general trend. Given this, the goal here is not to account for those minor differences but instead to look at the overall picture and determine whether suitable correlation between the two exists. If it does, this should build confidence in the database's ability to accurately describe real conditions.

4.3.1 Surface Air Temperature

The first test compares surface temperature (figure 4.3). There is a high degree of similarity between the two results. Most notable are the cooler temperatures in the Andes as compared with the rest of South America, the extreme cold in northern Russia, Canada and the Arctic Circle, as well as the high temperatures over the outback region of Australia. Additionally, there is a strong correspondence between max and min temperatures.


4.3.2 Wind Speed and U & V Components

The similarities between the U & V components are even more pronounced (figure 4.4). The most notable features include the large belt of high U-component running horizontally across the southern oceans. In the V-component maps there is a divide between high and low values right at the equator that creates an interesting effect. Notice that the max and min ranges between these 2 datasets (for both variables) are nearly identical, with an average difference less than 0.4. The correlation between the U & V components translates very nicely into wind speed (figure 4.5). Both results show the strong winds off the coast of Africa/Middle East and part of a hurricane in the Gulf of Mexico. Given the two datasets' high similarity, any discrepancy in wind speed at an individual location is likely caused by differences in the method used to derive the variable.

4.3.3 Mean Total Cloud Cover

The final variable to be compared is mean total cloud cover (figure 4.6). Again there is a fairly high degree of correlation between these two results. Both accurately reflect regions of high and low cloud cover. The main difference between them is that the max cloud cover in the NOAA-CDC image is 83% while the climate database suggests 95%. This is most likely due to the model each dataset uses to define and determine this variable.

The results of these analyses show a significant degree of correspondence to the NOAA-CDC data for a given variable. This suggests that the climate database is capable of describing conditions at a level of quality approximately equal to that of a highly regarded dataset.

4.4 Benchmarks

In order to effectively gauge the runtime cost of the optimal locations algorithm, a series of benchmark tests were performed. Each was run on a Linux Fedora Core 5 machine with 2 PowerPC processors at 2.3 GHz each and 4 GB of RAM. The database is an IBM UDB v9.1 with a standard 16 MB buffer pool. Testing was done using the db2batch utility according to the protocol recommended in [1]. Each test was performed 6 times, with the first result thrown away as it incurs additional start-up costs such as initializing the buffer pool. The remaining 5 results are averaged and


Figure 4.3: Comparison of surface temperature in kelvin for NOAA-CDC data (left) and this climate database (right).


Figure 4.4: Comparison of U & V (top and bottom respectively) wind components for NOAA-CDC data on the left and this climate database on the right.


Figure 4.5: Comparison of mean wind speed for NOAA-CDC data (left) and this climate database (right).


Figure 4.6: Comparison of mean total cloud cover for NOAA-CDC data (left) and this climate database (right).


Single Variable Test Results (time in seconds)

Variable                                          2001-01-01 to   2001-01-01 to   2001-01-01 to
                                                  2001-03-31      2001-06-30      2001-12-31
MEAN_ALBDO_STRATOCUM_DAYONLY (REAL/4 Bytes)       18.8119956      38.3333944      92.9432256
D_N_L_W_C_SEE_TABLE_2PT5PT6 (SMALLINT/2 Bytes)    21.984266       77.3714766      91.0501244

[Chart: Query Execution Time in Seconds vs. Num Variables (1 to 10), for the period Jan '01 - Mar '01.]

Figure 4.7: Benchmark test results.

reported as the query execution time in seconds. After each test the buffer pool is flushed in order to minimize the potential effect of prior results.
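The protocol can be sketched as follows; the timed query itself is abstracted behind a supplier, since the real measurements go through db2batch rather than application code:

```java
import java.util.function.DoubleSupplier;

public class Benchmark {
    /**
     * Run a timed query `repetitions` times, discard the first (cold buffer
     * pool) result, and report the mean of the remaining runs, mirroring the
     * db2batch-style protocol described above.
     */
    public static double run(DoubleSupplier timedQuery, int repetitions) {
        timedQuery.getAsDouble();                  // warm-up run, discarded
        double sum = 0.0;
        for (int i = 1; i < repetitions; i++) {
            sum += timedQuery.getAsDouble();
        }
        return sum / (repetitions - 1);            // average of remaining runs
    }
}
```

Discarding the first run matters: as the text notes, it pays the one-time cost of populating the buffer pool and would otherwise skew the average upward.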

The first benchmark looked at two variables: stratocumulus cloud amount (REAL/4 bytes) and D_N_L_W_C² (SMALLINT/2 bytes) at 3 time periods, each twice as long as the previous. With the exception of the 6-month D_N_L_W_C test, results show slightly less than linear scale-up. When analyzing the results it is interesting to note that the REAL, which is twice as large as the SMALLINT, performed better in all but the last test. This was surprising, as one would logically suspect the smaller data type to perform best; although, this may have more to do with the CPU than the database.

The second benchmark compared performance as additional variables were successively added to the algorithm. Ten different tests were performed in all, starting with a single variable. Besides a small hiccup at 7, results show linear scale-up, where each additional variable costs approximately 20 seconds for the 3-month temporal period. This reflects the θ(n ∗ x) cost shown in section 3.5.2 and is to be expected given that the algorithm is essentially run once for each individual variable (see 3.5.1). It may be

2. This is a code representing the time of day the measurement was taken at as well as the land type.


Variable                                             Importance/Weight   Maximize or Minimize
Percentage of Stratocumulus Clouds                       100.00%           maximize
Percentage of Med & High Clouds                           50.00%           minimize
Stratocumulus Cloud Albedo                                30.00%           minimize
Estimated Droplet Concentration for Stratocumulus         30.00%           minimize
Incoming Shortwave Radiation at 680 Millibars            100.00%           maximize
Boundary Layer Height (meters)                            30.00%           maximize
Cloud Base Height (meters)                                60.00%           minimize

(Theoretical) Max Score = 4.00

Figure 4.8: A definition of optimal locations.

possible, although difficult, to reduce the cost to θ(n) by making the algorithm generate scores for all variables in a single pass. This is difficult because SQL does not deal well with an unknown number of columns at runtime [19].

Each monthly temporal period contains 1.6 million tuples and takes approximately 20 seconds to process. Thus the algorithm is comparing about 80 thousand tuples per second. In a production setting it would perform much faster by re-using previous results and utilizing a much larger buffer pool. However, these benchmarks do provide a glimpse of the type of performance one could expect.

4.5 Initial Predictions

While not the stated goal of this project, it is interesting to make some initial predictions as to where optimal locations may be found. The definition of optimal locations for CCN seeding is purposefully subjective. Thus, in making initial predictions, the definition used represents one opinion, which of course may be refuted by others.

In this case optimal locations will be defined as those with: high amounts of stratocumulus clouds, low amounts of med and high clouds above them, a low albedo and droplet concentration, a large amount of incoming radiation, and a cloud base height which is below the marine boundary layer (figure 4.8).

In this definition the variables stratocumulus cloud amount and shortwave radiation are given more weight than the others in order to reflect their importance. Conversely, albedo & droplet concentration are each given a low weight. This is because they are each primarily determined by optical thickness, and their combined weight of 60% gives them an appropriate importance.


The quarterly and full year results of this optimal locations query for the year 2001 can be found in figures 4.9 and 4.10 respectively. The first item to note is that, as hypothesized, optimal locations change throughout the year. This would be even more pronounced if data were analyzed over smaller time increments such as months or weeks instead of quarters. The primary driver of this shift is the dramatic change in incoming shortwave radiation over the year (figure 3.2). Shortwave radiation was given a strong weight, which provided it with greater control in determining optimal locations. Other variables also show this type of variation, although to a lesser extent. For example, figure 3.4 indicates that on average there is a high concentration of stratocumulus clouds off the west coast of South America from 1983 - 2005; however, in the first quarter of 2001 this is clearly not the case (figure 4.10). Thus, although stratocumulus cloud amount over longer terms is concentrated in these areas, over shorter terms it tends to fluctuate.

While one would expect the belts off the west coasts of the Americas and Africa to be potential candidates for CCN seeding due to their high concentration of stratocumulus clouds, additional locations have also been suggested. They include the region directly off the west coast of Australia and, in Quarter 2, the North Sea above Western Europe.

The suggestion of the west coast of Australia (especially in Quarter 1) is likely attributable to its high levels of incoming solar radiation and remarkably low concentrations of high clouds. It also has a slightly lower albedo than the west coast of South America. Additionally, in Quarter 2 concentrations of low-level stratocumulus clouds reach as high as 40%.

In Quarter 2, when incoming shortwave radiation shifts toward the northern hemisphere, the North Sea features prominently as an optimal location. This is also due to a moderate stratocumulus concentration and a low amount of high and middle clouds.

Second, while optimal locations do change throughout the year, some also stay constant, such as the belts off the west coasts of the Americas and Africa. The primary driver of this is the typically large concentration of stratocumulus clouds found there. These locations were initially cited by [22, 36] as potential locations for CCN seeding because they have a high amount of stratocumulus clouds, but this project identified other reasons as well. Their proximity to the equator gives them a generally high amount of incoming shortwave radiation throughout the year. In addition, the amount of mid- and high-level clouds at these locations is remarkably low (figure 4.11). However, the downside is that they already possess a relatively high albedo, which will make increasing it more difficult.

Finally, the maximum theoretical score a single cell can obtain for this optimal locations query is 4. The maximum score actually obtained is about 2.5. This is to be expected: to achieve the theoretical maximum a cell must perform perfectly for every variable, which, while theoretically possible, is highly unlikely.


Figure 4.9: Quarterly optimal locations predictions for 2001. Note: uncolored cells represent missing or undefined data.


Figure 4.10: Full year optimal locations predictions for 2001. Note: uncolored cells represent missing or undefined data.

Figure 4.11: Quarter 1 (January - March) 2001: Stratocumulus cloud concentration


Figure 4.12: Full year 2001: on the left, high and mid level cloud amounts; on the right, areas with large concentrations of low level clouds and small amounts of mid/high clouds above.


Chapter 5

Conclusion

This dissertation documented the implementation and analysis of A Data Resource for Cloud Cover Simulations, which will be used to perform a number of in silico experiments related to climate change. The outputs of this project represent what is believed to be the first attempt to answer some new and important questions related to global warming and climate change.

While not the stated goal of this project, an attempt was made to take an early look at optimal locations. Results showed previously suggested locations, such as those just off the west coasts of the Americas and Africa. Interestingly, they also identified some new areas, such as the North Sea above Western Europe and off the west coast of Australia, although these were only found to be optimal at certain times of the year. Further analysis with shorter temporal periods may identify more locations still.

Early results such as these suggest an exciting new perspective on the question of where Latham & Salter's ocean vessels should be used. In addition, the climate database may help answer further questions related to their research, such as: how much CCN should they spray, and how many vessels should be used?

In this spirit, before concluding, it is worthwhile to look at a few open questions and areas of future work.

5.1 Open Questions & Future Work

As with most research, a delicate balance must be struck between what is desirable and what is feasible, and this project is certainly no exception. Many challenges led to new questions and opportunities for future work, the most prominent being the use of this resource to perform a number of in silico experiments related to climate change. This section looks at some additional open questions and areas of future work.

One difficult challenge this project faced was that it quickly became evident that no sufficient global dataset for CCN and/or droplet concentration exists. This was confirmed in [12]. Thankfully, [12] also discussed a fairly accurate method of approximating droplet concentration, which was then incorporated into this project (figure 3.3). While suitable for this situation, it may be possible to improve its accuracy with better data. In using their formula, assumptions were made for two variables: droplet radius (assumed to be 10µm) and cloud depth, from base to top (assumed to be 800m). Obtaining measured values for these would increase the accuracy of the droplet concentration estimate. Droplet radius datasets are available with limited temporal resolution [11]. Unfortunately, cloud depth is more difficult to obtain. However, since the database already has data for cloud base height, and cloud top can be estimated using the hydrostatic equation, the difference would give cloud depth. Whether this approach is actually viable requires future study.
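The role the two assumed variables play can be sketched with the standard relation between cloud optical thickness, droplet radius, and cloud depth. This is a rough illustrative approximation under the stated assumptions (droplet radius 10µm, cloud depth 800m, extinction efficiency ≈ 2), and not necessarily the exact formula of [12]:

```python
import math

def droplet_concentration(tau, r_e=10e-6, depth=800.0, q_ext=2.0):
    """Approximate droplet number concentration (m^-3) from cloud optical
    thickness tau, effective radius r_e (m), cloud depth (m), and
    extinction efficiency q_ext (~2 for visible light).

    Based on the standard relation tau ~ q_ext * pi * r_e**2 * N * depth,
    solved for N. Not necessarily the exact formula used in the project.
    """
    return tau / (q_ext * math.pi * r_e ** 2 * depth)

# e.g. an optical thickness of 10 with the assumed radius and depth
n = droplet_concentration(10.0)   # droplets per m^3
n_cm3 = n * 1e-6                  # the same value per cm^3
```

Seen this way, better data for droplet radius [11] or cloud depth would feed directly into the estimate, since both appear in the denominator.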

Another challenge is that not much is known about the vertical structure of clouds. This is because satellites analyze the tops of clouds while ground based measurements study the bottoms, making it difficult to determine what happens in between. Vertical structure is starting to be understood thanks to newer satellites such as NASA's CALIPSO [26]. In addition to vertical structure, CALIPSO is producing some of the first global aerosol datasets, which will assist in answering some of the above questions regarding CCN and droplet concentration. The data produced by CALIPSO is just beginning to be analyzed by scientists and would be very interesting to use in the climate database.

In terms of computing power there is definitely a limit to what can be accomplished without using High Performance Computing technology. Investigating parallel databases in a shared-nothing architecture [32] should significantly reduce I/O costs and make more advanced analysis possible. For example, higher resolution data could be used: ISCCP provides some datasets at 30km resolution, which would be valuable to use if the proper computational facilities existed. In addition, much longer temporal periods could be investigated, such as comparing optimal locations of the 1990s with those of the 1980s.

As stated in the first chapter, there are a number of useful components that could easily be added to extend the utility of the database. For example, more complex visualization tools could be built to assist in analysis. Additional algorithms could be implemented to determine how much CCN seeding would produce what sort of albedo increase.

Initially, a large amount of research was put into incorporating spatial GIS technology into this project [32, 34, Santilli et al.]. The main advantage is that there would be no reason to require data to conform to a specific spatial indexing schema such as the ISCCP Equal Area Grid. This would make the integration of datasets with heterogeneous spatial resolutions much easier. The spatial DBMS technology proposed for this project was PostgreSQL's PostGIS extension [Santilli et al.]. PostGIS uses Generalized Search Tree (GiST) indexes, a slightly optimized version of the R-Tree index that is typical of most spatial databases. While the use of spatial technology was in theory a good idea, it suffered from a significant drawback: a spatial index can only determine whether the bounding boxes of two geometric objects intersect, so exact operations such as intersection, contains, overlaps, or union must still be evaluated on the candidate geometries themselves. Given the large size of the database this cost was deemed too expensive. It is possible, however, that other vendors have more advanced spatial technology, and future research here could be a huge benefit, as it would allow many new types of analysis to be performed.

In regards to the optimal locations algorithm, one potential area for future work is to extend the types of scoring it can perform. For example, it may be useful to seek locations with a wind speed as close to (but not greater than) 5 meters per second. It would also be useful to allow users to add conditionals to their searches: for example, only score coordinates that have a cloud amount greater than 20%. This would disqualify misleading coordinates that compensate for low scores on key variables with high scores on others.
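The two extensions suggested here, target-value scoring and conditional filters, could be sketched as follows. This is an illustrative design only, not part of the implemented procedure, and the function and variable names are hypothetical:

```python
def target_score(value, target, hard_max=None):
    """Score proximity to a target value: 1.0 at the target, falling off
    linearly with distance; values above hard_max are disqualified."""
    if hard_max is not None and value > hard_max:
        return 0.0
    return max(0.0, 1.0 - abs(value - target) / target)

def passes_conditions(values, conditions):
    """conditions: list of (name, minimum) pairs; a cell failing any
    condition is excluded before any scoring takes place."""
    return all(values.get(name, 0.0) >= minimum for name, minimum in conditions)

# Wind speed close to, but not greater than, 5 meters per second:
good = target_score(4.5, target=5.0, hard_max=5.0)   # scores highly
bad = target_score(5.5, target=5.0, hard_max=5.0)    # disqualified (0.0)

# Only score coordinates with cloud amount greater than 20%:
ok = passes_conditions({"cloud_amount": 0.35}, [("cloud_amount", 0.20)])
```

A conditional filter of this kind runs before the weighted sum, so a cell cannot sneak into the results by trading a failing key variable against strong scores elsewhere.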

Finally, this project purposefully avoided the subjective and difficult question of what exactly defines optimal locations for CCN seeding. This resource will give researchers the ability to start answering this question by comparing and testing various definitions.


Appendix A

User Guide

This document is intended to be a brief and by no means complete introduction to using the climate database and optimal locations algorithm. Further information can be found by following the links in this document or by contacting myself (g N Sortino at yahoo dot com) or my supervisor Peter Buneman.

The physical database and optimal locations algorithm reside on boswell.inf.ed.ac.uk. Access to this machine is restricted and can only be obtained by speaking to Peter Buneman and/or Informatics Computing Support (I cannot provide access).

All programs written for this project, including the parser and pre-processor, are stored in the PROJECT directory off of my home folder. Each includes documentation concerning its use, either in a separate readme file or embedded within the source code. If for some reason you cannot access something, please contact me with your request and I may be able to provide the required information. However, any questions related to the use of the FORTRAN routines or code libraries provided by anyone but myself should be directed to the relevant organization.

A.1 How to Access the Database & Run the Optimal Locations Algorithm

1. Assuming you are logged into boswell with access to the database, enter:

db2 "connect to graham"

2. To view the schema details you can enter:

db2 "describe table primary_table"


3. To run the pre-processor, find and execute the program SQLFuncs.java in the parsers directory of my PROJECT folder (this is not located on the boswell server) using the command:

java SQLFuncs

This will build the lookup table which is used by the optimal locations algorithm. Note: this step is not necessary unless you have changed the underlying data in the database.

4. To run the optimal locations algorithm use the following information:

• The procedure performs the following actions:

(a) Create a temp table using an email address or similar unique identifier

(b) Build the query

(c) Execute once for each variable

(d) For each query that returns a result insert it into the temp table

• Syntax:

optimalLocations_v3 (IN p_emailAddr VARCHAR(60), IN p_minLat SMALLINT, IN p_maxLat SMALLINT, IN p_minLon SMALLINT, IN p_maxLon SMALLINT, IN p_landType SMALLINT, IN p_startDate INTEGER, IN p_endDate INTEGER, IN p_hourStart SMALLINT, IN p_hourEnd SMALLINT, IN p_params VARCHAR(1500))

• Input definitions are as follows:

emailAddr - A unique identifier used to create the temp table (do not use @ or .).

minLat - A value from 0 to 180 corresponding to the smallest Lat coordinate to be investigated.

maxLat - A value from 0 to 180 corresponding to the largest Lat coordinate to be investigated.

minLon - A value from 0 to 360 corresponding to the smallest Lon coordinate to be investigated.

maxLon - A value from 0 to 360 corresponding to the largest Lon coordinate to be investigated.


landType - A code representing the type of data the user is interested in.

'1' = All - Day & Night
'2' = All - Day Only
'3' = All - Night Only
'4' = Water & Coasts - Day & Night
'5' = Water & Coasts - Day Only
'6' = Water & Coasts - Night Only
'7' = Water Only - Day & Night
'8' = Water Only - Day Only
'9' = Water Only - Night Only
'10' = Land & Coast - Day & Night
'11' = Land & Coast - Day Only
'12' = Land & Coast - Night Only
'13' = Land Only - Day & Night
'14' = Land Only - Day Only
'15' = Land Only - Night Only

startDate - The start of the date period in the format YYYYMMDD

endDate - The end of the date period in the format YYYYMMDD

hourStart - The start hour in the format HH - choices are 00, 03, 06, 09, 12, 15, 18, 21

hourEnd - The end hour in the format HH - choices are 00, 03, 06, 09, 12, 15, 18, 21

params - The names of the variables to be calculated, each followed by its importance on a scale from 0 to 1 (1 being most, 0 being least) and a code, either 0 or 1, representing whether the parameter should be minimized or maximized (0 = minimize, 1 = maximize). Each element should be separated by a single space. For example: 'Albedo 1 0 CCN 0.8 0 ShortWave .6 1' (no trailing whitespace allowed!).

• Example call:

db2 "CALL optimalLocations_v3 ('myEmail', 1, 180, 1, 360, 1, 20010101, 20010131, 00, 21, 'MEAN_ALBDO_STRATOCUM_DAYONLY 1 0 PCT_PIXLES_STRARTOCUMULUS 1 1 PCT_PIX_MED_AND_HIGH_CLOUDS .5 0 CLD_BASE_HEIGHT_IN_METERS .6 0 AVG_WIND_SPEED_METERS_PER_SEC 0 1')"

Note: be patient, this may take a few minutes depending upon the number of variables as well as the spatial and temporal periods.
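Because the params string is easy to get wrong (it must be whitespace-separated triples with no trailing space), its format can be checked with a small validation sketch. This is a hypothetical helper written for illustration, not part of the delivered code:

```python
def parse_params(params):
    """Split a params string of the form 'NAME weight code NAME weight
    code ...' into (name, weight, maximize) triples, rejecting malformed
    input such as trailing whitespace or a missing code."""
    tokens = params.split(" ")
    if len(tokens) % 3 != 0:
        raise ValueError("params must be NAME weight code triples "
                         "separated by single spaces, no trailing space")
    triples = []
    for i in range(0, len(tokens), 3):
        name, weight, code = tokens[i], float(tokens[i + 1]), tokens[i + 2]
        if code not in ("0", "1"):
            raise ValueError("code must be 0 (minimize) or 1 (maximize)")
        triples.append((name, weight, code == "1"))
    return triples

parse_params("Albedo 1 0 CCN 0.8 0 ShortWave .6 1")
# -> [('Albedo', 1.0, False), ('CCN', 0.8, False), ('ShortWave', 0.6, True)]
```

A trailing space makes the token count indivisible by three, so the check above catches exactly the mistake the procedure's note warns about.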

5. After executing the procedure, a temporary table is built that stores the results, which can be obtained by calling:

getScores (IN p_emailAddr VARCHAR(60), INOUT p_params VARCHAR(1500))

where p_emailAddr is the same key used to run the optimalLocations procedure, and p_params is a list of the variables you want to find the score for (from 1 up to the number of variables listed in the optimal locations procedure), each separated by a single white space. If you enter more than one variable, a combined score is computed for them; if you enter only one variable, it will return its average values. This allows you to analyze individual variables as well as multiple ones without re-running the optimal locations query.

Example Call:

db2 "CALL getScores ('myEmail', 'MEAN_ALBDO_STRATOCUM_DAYONLY PCT_PIXLES_STRARTOCUMULUS PCT_PIX_MED_AND_HIGH_CLOUDS CLD_BASE_HEIGHT_IN_METERS AVG_WIND_SPEED_METERS_PER_SEC')"

Further information regarding the use of DB2 can be found by consulting [19, 1]. In addition, I have created a short DB2 documentation file (SQL/handySQLCommands.txt) off of my PROJECT folder, which contains some commonly used commands. Finally, do not under any circumstances run the batch file createTable.db2, as it will delete the entire database!



Bibliography

[1] (2006). IBM DB2 Viper Release Candidate 1 for Linux, UNIX, and Windows information center. IBM.

[2] Athens, G. H. (2006). IT struggles with climate change. Computer World.

[3] Atmospheric Radiation Measurement (2006). http://www.arm.gov.

[4] Atmospheric Science Data Center (2006). http://eosweb.larc.nasa.gov/.

[5] Bohannon, P., Fan, W., Flaster, M., and Rastogi, R. (2005). A cost-based model and effective heuristic for repairing constraints by value modification. SIGMOD.

[6] Breon, F. M., Tanre, D., and Generoso, S. (2002). Aerosol effect on cloud droplet size monitored from satellite. Science, 295:834–838.

[7] Caron, J. (2004). NetCDF-Java User’s Manual. UNIDATA, 2.2 edition.

[8] Daum, P. and Liu, Y. (2002). Indirect warming effect from dispersion forcing.Nature, 419(6872):580–81.

[9] Galhardas, H., Florescu, D., Shasha, D., Simon, E., and Saita, C. (2001). AJAX: An extensible data cleaning tool. SIGMOD.

[10] Google Maps Web Service (2006). http://www.google.com/apis/maps/.

[11] Han, Q., Rossow, W. B., Chou, J., and Welch, R. M. (1998a). Global survey of the relationships of cloud albedo and liquid water path with droplet size using ISCCP. Journal of Climate, 11:1516–1528.

[12] Han, Q., Rossow, W. B., Chou, J., and Welch, R. M. (1998b). Global variation of column droplet concentration in low-level clouds. Geophysical Research Letters, 25(9):1419–1422.

[13] Han, Q., Rossow, W. B., and Lacis, A. A. (1994). Near-global survey of effective droplet radii in liquid water clouds using ISCCP data. Journal of Climate, 7:465–497.

[14] Han, Q., Zeng, W. R. J., and Welch, R. (2002). Three different behaviors of liquid water path of water clouds in aerosol-cloud interactions. Journal of the Atmospheric Sciences, 59(3):726–735.

[15] Hartmann, D. L., Ockert-Bell, M., and Michelsen, M. L. (1992). The effect of cloud type on earth's energy balance: Global analysis. Journal of Climate, 5:1281–1304.


[16] Haywood, J. and Boucher, O. (2000). Estimates of the direct and indirect radiative forcing due to tropospheric aerosols: A review. Review of Geophysics, 38(4):513–543.

[17] Hortal, M. and Simmons, A. (1991). Use of reduced Gaussian grids in spectral models. American Meteorological Society, 119:1057–1074.

[18] International Satellite Cloud Climatology Project (2006). http://isccp.giss.nasa.gov/index.html.

[19] Janmohamed, Z., Liu, C., Bradstock, D., Chong, R., Gao, M., McArthur, F., and Yip, P. (2004). DB2 SQL PL: Essential Guide for DB2 UDB on Linux, UNIX, Windows, i5/OS, and z/OS. IBM Press, second edition.

[20] Kallberg, P., Simmons, A., Uppala, S., and Fuentes, M. (2004). ERA-40 Project Report Series. European Centre for Medium-Range Weather Forecasts (ECMWF), Shinfield Park, Reading, RG2 9AX, England.

[21] Latham, J. (1990). Control of global warming? Nature, 347:339–340.

[22] Latham, J. (2002). Amelioration of global warming by controlled enhancement of the albedo and longevity of low-level maritime clouds. Atmospheric Science Letters, 3(2-4):52–58.

[23] Lehmkuhl, N. K. (1983). FORTRAN 77 - A Top-Down Approach. Macmillan Publishing Co., Inc.

[24] Matsui, T., H. Masunaga, R. A. P. S., Kreidenweis, S. M., Tao, W., Chin, M., and Kaufman, Y. J. (2005). Satellite-based assessment of marine low cloud variability associated with aerosol, atmospheric stability, and the diurnal cycle. http://scholar.google.com/url?sa=U&q=http://blue.atmos.colostate.edu/publications/pdf/R-298.pdf (journal unknown).

[25] NASA (1999). Clouds and the energy cycle. Online.

[26] NASA (2005). CALIPSO: Cloud-aerosol lidar and infrared pathfinder satellite observations. Online.

[27] NOAA-CIRES Climate Diagnostics Center (2006). http://www.cdc.noaa.gov/.

[28] NOAA Satellite and Information Service - National Environmental Satellite, Data, and Information Service (NESDIS) (2006). http://www.nesdis.noaa.gov/.

[29] Physical Oceanography DAAC (2006). http://podaac.jpl.nasa.gov/.

[30] Pilsbury, R. K. (1969). Clouds and Weather. B. T. Batsford LTD.

[PODAAC QuikSCAT Data Team] PODAAC QuikSCAT Data Team. SeaWinds on QuikSCAT Level 3 Daily, Gridded Ocean Wind Vectors (JPL SeaWinds Project). NASA - Physical Oceanography DAAC, http://podaac.jpl.nasa.gov:2031/DATASET_DOCS/qscat_L3.html.


[31] Pruppacher, H. R. and Klett, J. D. (1997). Microphysics of Clouds and Precipitation. Kluwer Academic Publishers, second edition.

[32] Ramakrishnan, R. and Gehrke, J. (2003). Database Management Systems. McGraw Hill, third edition.

[33] Raman, V. and Hellerstein, J. M. (2001). Potter's wheel: An interactive data cleaning system. VLDB Conference, 27.

[34] Rigaux, P., Scholl, M., and Voisard, A. (2002). Spatial Databases with Application to GIS. Morgan Kaufmann Publishers.

[35] Rotstayn, L., Ryan, D., and Penner, B. F. (2000). Precipitation changes in a GCM resulting from the indirect effects of anthropogenic aerosols. Geophys. Res. Lett., 27:3045–3048.

[36] Salter, S. (2007). Sea-going hardware for the implementation of the cloud albedo control method for the reduction of global warming. Submitted to: International Conference on Integrated Sustainable Energy Resources in Arid Regions, Abu Dhabi.

[Santilli et al.] Santilli, S., Hodgson, C., Ramsey, P., Lounsbury, J., and Blasby, D. PostGIS Manual. Refractions Research Inc., Victoria, British Columbia, Canada.

[37] Schwartz, S. (1996a). Cloud droplet nucleation and its connection to aerosol properties. Int'l Conf. Nucleation and Atmospheric Aerosols, pages 770–779.

[38] Schwartz, S. (1996b). The whitehouse effect - shortwave radiative forcing of climate by anthropogenic aerosols: an overview. Journal of Aerosol Science, 27(3):359–382.

[39] Stull, R. B. (2000). Meteorology for Scientists and Engineers. Brooks/Cole, second edition.

[40] The European Centre for Medium-Range Weather Forecasts (2006). http://www.ecmwf.int/.

[41] The Greenhouse Effect Detection Experiment (2006). Provided by: The British Atmospheric Data Center. http://badc.nerc.ac.uk/data/gedex/.

[42] The National Center for Atmospheric Research (2006).http://www.ncar.ucar.edu/.

[43] The National Center for Supercomputing Applications (NCSA) Hierarchical Data Format (HDF) (2006). http://hdf.ncsa.uiuc.edu/.

[44] Twomey, S. (1977). Influence of pollution on the short-wave albedo of clouds.Journal of Atmospheric Science, 34:1149–1152.

[45] Rossow, W. B., Walker, A., Beuschel, D., and Roiter, M. (1996). International Satellite Cloud Climatology Project (ISCCP) Documentation of New Cloud Datasets. ISCCP WMO/TD-No. 737, World Meteorological Organization. 115 pp.