statistical analysis for arcview

DESCRIPTION

Statistical analysis with ArcView: calculation models in statistical applications

TRANSCRIPT

  • 1

    Statistical Analysis for ArcView

    Reference Manual, June 1999

    Spatial Data Services

  • 2

    1. Licensing, page 3
       Liability; License Agreement; Copyright; Trademark Acknowledgment

    2. Over-View, page 4
       Who is SDS Spatial Data Services?; What is Gstats?; Re-Sale Opportunities; Functionality; ArcView Requirements; Platform; What is Gstats Pro?

    3. Using Gstats, page 5
       Loading Gstats; Unloading Gstats

    4. Null Values, page 6
       Setting a Null Value; Clearing a Null Value

    5. Charts, page 7
       Ternary Diagram, page 8; Histogram Chart, page 9; Scatter Chart, page 11; Probability Plot, page 12

    6. Data in Plan, page 13
       Spider Plot, page 14; Percentile Plot, page 15; Bi-Variate Plot, page 16; Point Labeling, page 17

    7. Processing & Reporting, page 18
       Correlation, page 19; Level to Background, page 20; Statistics, page 22; Regression, page 23; Concatenation, page 24; Percentile values, page 25; Cumulative values, page 26; Adding a unique record identifier, page 27

    List of Topics

  • 3

    Liability

    SDS Spatial Data Services accepts no liability arising from the use of Gstats software or use of Gstats documentation. SDS accepts no responsibility for technical errors or omissions associated with Gstats software.

    License Agreement

    Gstats is licensed for your use. The software is licensed for single use and may not be transferred to a 3rd party for any reason. As the licensee you are granted permission to make copies of the software for backup purposes only. The license can be discontinued by returning the original software to SDS and destroying any copies in your possession. SDS can discontinue your license at any time without reason. No refund is available for discontinued software.

    In no event will SDS be liable to you for any damages, including any lost profits, lost savings or other incidental or consequential damages arising out of the use of or inability to use this program, or for any claim by any third party.

    In accepting a license for GSTATS, you agree to the license terms described above and acknowledge that SDS will be in no way responsible for any damages resulting from use of the software by yourself or any 3rd party.

    Copyright

    SDS is the owner of the GStats software and all associated documentation. The software and documentation are protected by Copyright 1999. Neither may be reproduced without written permission by GSTATS.

    Trademark Acknowledgments

    GStats is a registered trade mark of SDS.

    ArcView is a registered trade mark of ESRI.

    Windows 95 and Windows NT are registered trademarks of Microsoft.

    Licensing

  • 4

    Who is SDS?

    SDS Spatial Data Services specializes in the development of GIS applications for the resources sector. SDS is licensed by ESRI to develop software for ArcView and to resell ESRI ArcView GIS software.

    What is Gstats?

    Gstats has been developed to assist in the analysis and presentation of point data using ESRI ArcView. An advanced version of Gstats is under development which provides additional gridding, contouring and visualization functionality.

    Contacting SDS.

    Address
    25 Richardson St, West Perth, Western Australia, 6005
    P.O. Box 943, West Perth, Western Australia 6872
    Telephone + 61 8 9486 7587
    Mobile 0412 509 356
    Fax + 61 8 9322 2994

    Internet
    E-mail: [email protected]
    http//www.sds.au.com

    Functionality

    Gstats is an ArcView Extension which enhances the existing statistical capability of ArcView. The software has been developed to provide basic statistical tools which are easy to understand and use. Many graphing, table and plan display functions are available.

    ArcView Requirements

    ArcView 3.0 or later is required to run Gstats. Gstats has been developed to work with standard ArcView, avoiding the cost of purchasing additional ArcView extensions such as 3D Analyst or Spatial Analyst. Additional functionality is available in the form of Gstats Pro, which requires the ArcView extension Spatial Analyst.

    Platform

    GStats is written entirely in ArcView Avenue and as such is platform independent. Gstats can run with ArcView on Windows 95/NT, UNIX or Apple Macintosh. For Windows 95 or NT, a Pentium processor with 32 MB of RAM is required.

    What is Gstats Pro?

    Gstats Pro provides additional functionality not available in Gstats. Many Gstats Pro tools require ArcView Spatial Analyst. Tools are available which allow for the gridding and contouring of point data and the extraction of information from surface data.

    Over-View

  • 5

    Installation

    1. Copy the file gstats.avx into your AVHOME\ext32 directory. This will often be C:\ESRI\AV_GIS30\ARCVIEW\EXT32.

    2. Copy the file gstats.bmp into your AVHOME\ext32 directory. This will often be C:\ESRI\AV_GIS30\ARCVIEW\EXT32.

    To Load Gstats

    Gstats is an ArcView extension which can easily be loaded into your existing ArcView interface.

    1. Select File | Extensions from the ArcView menu. The Extensions dialogue box will appear.

    2. Scroll down the list of extensions and select the Gstats check box.

    3. Select OK to have the Gstats extension available in the current project, or Make Default to have Gstats automatically load into all projects.

    As Gstats is being loaded, a dialog will be displayed.

    To Un-Load Gstats

    1. Select File | Extensions from the ArcView menu. The Extensions dialogue box will appear.

    2. Scroll down the list of extensions and clear the Gstats check box (which should currently be checked).

    3. Select OK.

    The Gstats extension will no longer be available in your current project.

    Using GStats

  • 6

    Description

    Null values are data values which are flagged to represent no data present. This is different from 0, which could be a valid number.

    When a Null value is set, all Gstats functions that perform analysis of data will exclude null values from the analysis.

    ArcView does not store no-data values. Instead, wherever a value has no data, 0 is stored.
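The effect of setting a Null value can be sketched outside ArcView. The following Python snippet is only an illustration, not Gstats code: the function name `valid_values` and the flag value -999 are assumptions chosen for the example. It shows that a flagged value is dropped before analysis while a genuine 0 is kept.

```python
from typing import Iterable, List, Optional


def valid_values(values: Iterable[float], null_value: Optional[float] = None) -> List[float]:
    """Return the values that take part in an analysis, skipping the null flag."""
    if null_value is None:
        # No Null value is set, so every value is analysed.
        return list(values)
    return [v for v in values if v != null_value]


# Hypothetical assay values where -999 flags "no data" and 0 is a real reading.
assays = [12.0, -999.0, 0.0, 7.5, -999.0]
kept = valid_values(assays, null_value=-999.0)
mean = sum(kept) / len(kept)
```

Choosing a flag that can never occur as a real measurement keeps valid zeros in the analysis.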

    Setting a Null Value

    1. From the initial GStats menu check the Null tool.

    2. In the following input, enter a number. The last selected null is provided by default. A non-numerical entry will not be accepted.

    The Null value will remain set whilst the Null check remains set.

    Clearing a Null Value

    1. From the GStats menu uncheck the Null tool.

    Null Values

  • 7

    Introduction

    ArcView provides many different chart types as standard.

    These include:

    - Bar Charts
    - Scatter Charts
    - Line Charts
    - Pie Charts

    Chart Querying and Selections

    Charts allow data to be interrogated like a theme in a view. Attributes of a chart point can be interactively displayed.

    A selection made on a table or theme controls which data will be displayed in a chart. From within a chart, chart elements can be removed, changing the original table or theme selection.

    Gstats will automatically link charts, tables and themes. This can cause delays in processing as multiple ArcView documents are updated in response to changes in the current document. If processing becomes slow, remove some of the links between linked tables.

    Many standard chart functions are available. Refer to the ESRI guide Using ArcView GIS for a full description of the creation of charts.

    Gstats Charts

    Gstats provides a number of additional chart functions not available in the standard release of ArcView 3.1.

    These include:

    - Ternary Diagrams
    - Probability Plots
    - Complex Scatter Plots
    - Theme and binned histograms (bar chart)

    Chart Limits

    Standard ArcView charting can display a maximum of around 100 points in a chart.

    The number of points which Gstats can process is limited only by memory. Gstats also allows standard ArcView charts of any size to be created.

    Charts

  • 8

    Description

    Provides a graphical summary of values from 3 different fields along separate axes.

    The axes form a triangle, with the sum of values for any given point equaling 100 percent.

    When a field representing a unique ID is selected, each point is linked to its source, whether a table or theme in a view.

    Selections in the resulting Ternary Diagram can be viewed interactively with the source data.
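How three field values become a point inside the triangle can be sketched as follows. This is a minimal illustration in Python, not the Gstats implementation: the function name `ternary_point`, the side length of 100 and the corner layout are all assumptions. The three values are first rescaled so that they sum to 100 percent and are then used as weights on the three corners.

```python
import math
from typing import Tuple


def ternary_point(a: float, b: float, c: float) -> Tuple[float, float]:
    """Map three non-negative values to x, y inside a triangle with side 100."""
    total = a + b + c
    if total <= 0:
        raise ValueError("the three values must sum to a positive number")
    # Rescale so that the three components sum to 100 percent.
    pb = 100.0 * b / total
    pc = 100.0 * c / total
    # Assumed corner layout: A at (0, 0), B at (100, 0), C at (50, 50 * sqrt(3)).
    x = pb + pc / 2.0
    y = pc * math.sqrt(3.0) / 2.0
    return x, y


x, y = ternary_point(20.0, 30.0, 50.0)
```

A record made up entirely of one component plots exactly on that component's corner.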

    Required Input

    - 3 different numeric fields, one for each axis.
    - An output point theme (you will be prompted for this upon selection of the create button).
    - An output polygon theme (you will be prompted for this upon selection of the create button).

    Optional

    - A field representing unique values, allowing data points to be linked with the source table.
    - Check to have selected or all records processed.
    - Check to create a report describing the processing.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A new view comprising the ternary frame and related points. Two new themes are created:

    - A point theme displaying the relationship between the input fields.
    - A polygon theme displaying the bounding triangle.

    Ternary Diagram

  • 9

    Description

    Displays a histogram of values for a selected field.

    Histograms can be used to visualize the distribution of data.

    For tables or themes with many records, data from a selected field can be binned.

    Two methods for binning data are possible: linear, where the data is placed into bins whose ranges are spread evenly across the data range, or percentile, where bins are created according to the percentile values calculated from the input percentile classes.

    When data is binned, the display cumulative option must be checked to display the binned data cumulatively. When Bin Data is not selected, the selected field will always be displayed cumulatively.

    See Percentile Values for important information on the results of calculating percentile intervals.
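The difference between the two binning methods can be illustrated with a short Python sketch. It is not Gstats code: the function names are hypothetical, and assigning percentile bins purely by rank is an assumption about how the percentile classes are formed.

```python
from bisect import bisect_left
from typing import List, Sequence


def linear_bin_counts(values: Sequence[float], n_bins: int) -> List[int]:
    """Count values in n_bins equal-width bins spanning the data range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; no bins can be defined")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the last edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def percentile_bin_counts(values: Sequence[float], breaks: Sequence[float]) -> List[int]:
    """Count records in bins bounded by percentile breaks such as [50, 90]."""
    n = len(values)
    ordered_breaks = sorted(breaks)
    counts = [0] * (len(ordered_breaks) + 1)
    for rank in range(1, n + 1):
        # Only the rank of a record matters here, not the size of its value.
        pct = 100.0 * rank / n
        counts[bisect_left(ordered_breaks, pct)] += 1
    return counts


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
linear = linear_bin_counts(data, 3)
by_percentile = percentile_bin_counts(data, [50, 90])
```

With one extreme value in the data, linear binning crowds almost every record into the first bin, whereas percentile binning spreads the records according to their rank.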

    Required Input

    - A field whose values will be used in the construction of the histogram.
    - If a classified histogram is to be created, a theme with a classified legend must be the current data source.

    Optional

    - Check to display the cumulative values for a field.
    - A chart title.
    - Check to bin data.
    - Check to select a bin method.
    - Entry of the number of bins for linear, or a list of white-space separated numbers for percentiles.
    - Check to have selected, or all records processed.

    Dependents

    If cumulative values are used, a field representing unique values for each record is required.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Histogram Chart

  • 10

    Output

    - A Bar Graph, linked to the input source.
    - A table of cumulative values if this option is checked.
    - A table of binned data values if Bin Data is checked.

  • 11

    Description

    Graphically displays the relationship between values of 2 fields.

    Values are scaled along each of the 2 axes, with a point symbol displayed at the intersection point.

    Calculation of regression is also included and is added to the scatter title.

    The chart is linked to the input data source (theme or table). Changes in one selection will be reflected in the other. Output cumulative or percentile tables can be linked to the original data source to provide interaction with the original source document.

    All charting functions associated with ArcView scatter charts are available.

    Required Input

    2 fields, one for each axis.

    Optional

    - Check to display cumulative values on the Y axis.
    - Check to display percentile values on the X axis.
    - Check to display the X or Y axis in log form.
    - Check to have selected, or all records processed.
    - A chart title.

    Dependents

    - If cumulative or percentile values are used, a field representing unique values for each record is required.
    - An output table name if cumulative or percentile values are displayed.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    If cumulative or percentile options are checked, a table to store each type of data will be created.

    Output

    - A Scatter Graph, linked to the input source.
    - A table of cumulative or percentile values if either of these options is checked.

    Scatter Chart

  • 12

    Description

    Graphically displays the distribution of data in a single field.

    Inflection points can be identified where there is a significant change in grade.

    The X axis displays the percentile value of each point whilst the Y axis displays the cumulative frequency of each point.

    All charting functions associated with ArcView scatter charts are available.

    Required Input

    - 1 numeric field.
    - A field whose value represents a unique record.

    Optional

    - A chart title.
    - Check to have selected, or all records processed.

    Dependents

    Nil.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    A table is created to store cumulative data. A table is created to store percentile data.

    Output

    - A log probability plot, linked to the input source.
    - A table of cumulative or percentile values if either of these options is checked.

    Log Probability Chart

  • 13

    Introduction

    Users of ArcView will be familiar with the ease with which point data can be displayed in plan. Data can be made to look very different given a different classification method, selection subset or color display.

    Gstats provides a number of tools which enhance ArcView's ability to display and plot data in plan.

    These include:

    - Percentile Plot
    - Spider Plot
    - Point Labelling
    - Bi-Variate Plot

    Data in Plan

  • 14

    Description

    A spider plot is an effective way to visualize multiple values for a single point.

    Up to 8 values, representing intervals of 45 degrees from 0 to 360, can be plotted.

    All selections not equal to None in the Plot Field list are used.

    For each value, a line of specified color is plotted, with its end point calculated according to the orientation and size of the line.

    The line size is defined by the proportion of the current value to the maximum value for the field being plotted.

    The plotting of lines can be further refined by selecting a cutoff above which values will be considered.

    Definition of an intended plot scale and maximum line size allows creation of lines suitable for plotting.
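The end point calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Gstats algorithm: the function and parameter names are hypothetical, map units are assumed to be metres, and direction 0 is assumed to point north with angles increasing clockwise.

```python
import math
from typing import Tuple


def spider_endpoint(x: float, y: float, value: float, field_max: float,
                    direction_index: int, scale: float, max_len_mm: float) -> Tuple[float, float]:
    """Return the end point of one spider line starting at the sample location."""
    if field_max <= 0:
        raise ValueError("the field maximum must be positive")
    # Line length on paper is proportional to value / maximum value of the field.
    length_mm = max_len_mm * value / field_max
    # Convert paper millimetres to map units (assumed to be metres) at the plot scale.
    length_map = length_mm / 1000.0 * scale
    # Each of the 8 directions is 45 degrees apart (assumed clockwise from north).
    bearing = math.radians(45.0 * direction_index)
    return x + length_map * math.sin(bearing), y + length_map * math.cos(bearing)


# A value at half the field maximum, plotted due east at 1:10,000 with 10 mm maximum length.
ex, ey = spider_endpoint(1000.0, 2000.0, 50.0, 100.0, 2, 10000.0, 10.0)
```

In this example the line is 5 mm long on paper, which is 50 map units at a scale of 1:10,000.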

    Required Input

    - A field and a color for a single orientation.
    - A scale.
    - A maximum line length in mm.

    Optional

    Check to have selected, or all records processed.

    Dependents

    NIL

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)
    - less than the specified cut-off for a nominated field

    Output

    For each orientation selected (up to 8), a graphics group is created which represents a layer of information which can be deleted, edited or removed from the current view.

    A new view is created which contains a legend for referral and plotting.

    Spider Plot

  • 15

    Description

    A very effective way to visualize data is to rank and display data by percentile.

    A percentile plot displays data in plan, where each symbol represents a different percentile range.

    In some applications, it is useful to examine the low or high percentiles to gain an understanding of outstanding values, distinct from the remainder of the population.

    See Percentile Values for important information on the results of calculating percentile intervals.

    Required Input

    - A check to display a percentile.
    - An associated percentile value (0 to 100), size and color.
    - Selection of a field to display percentile values for.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Records which are not in the current selection are excluded if the Use Selected box is ticked.

    Output

    A new theme is added to the current view, which displays the classes of percentiles nominated in the input dialog. If the Use Selected box was selected, points not in the original selection will not be displayed.

    Percentile Plot

  • 16

    Description

    Values of 2 fields are classified independently, producing the same number of classes in each case.

    These classes are combined in a legend classification with each class being displayed with the same symbol color and size.
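The idea of classifying two fields independently and then combining the classes can be sketched as follows. This Python example is only an illustration: the function names are hypothetical and equal-interval classification is assumed, whereas Gstats lets a classification method be chosen for each field.

```python
from typing import List, Sequence, Tuple


def equal_interval_class(value: float, lo: float, hi: float, n_classes: int) -> int:
    """Return the 0-based equal-interval class of value within lo..hi."""
    if hi == lo:
        raise ValueError("Data is all equal")
    idx = int((value - lo) / (hi - lo) * n_classes)
    # The maximum value sits on the upper edge, so keep it in the last class.
    return min(idx, n_classes - 1)


def bivariate_classes(xs: Sequence[float], ys: Sequence[float],
                      n_classes: int) -> List[Tuple[int, int]]:
    """Classify each field on its own, then pair the two class numbers per record."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    return [
        (equal_interval_class(x, x_lo, x_hi, n_classes),
         equal_interval_class(y, y_lo, y_hi, n_classes))
        for x, y in zip(xs, ys)
    ]


pairs = bivariate_classes([0.0, 5.0, 10.0], [10.0, 20.0, 30.0], 2)
```

With n classes per field there are up to n × n combined legend classes, one for each pair of class numbers.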

    Required Input

    - 2 fields.
    - A classification method for each field.
    - A display color ramp.
    - The number of classes to generate for each field.

    Optional

    NIL

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is equal to the current Null value.

    If all values within a field are equal, no classes can be defined. As a result, the message "Cannot create classes for field", "Data is all equal" will appear.

    If no data is valid, the message "No valid data in field", "Cannot create display" will appear.

    Output

    A new theme is added to the current view, whose legend displays multiple classes created from the combination of the 2 input fields.

    Bi-Variate Plot

  • 17

    Description

    Gstats allows easy labeling of numerical data relative to a point.

    Values from 8 fields can be plotted at 1 of 8 label placements.

    A cutoff for each field can be defined. Values equal to or above this value will be plotted.

    For each field, the color of text and angle of text can also be defined.

    Required Input

    - A field, and associated selection of cutoff, angle, size and color.
    - A scale.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)
    - less than the specified cut-off for a nominated field

    Output

    For each selected field, labels will be drawn on the current view. A graphics group is created for each field, which represents a layer of information which can be deleted, edited or removed from the current view.

    Point Labeling

  • 18

    Introduction

    Processing data involves applying a constant or formula to data in a logical manner to output new data. The new data may reveal information which was previously unseen.

    In standard ArcView, there are many ways to process data to generate new information.

    These include:

    - Summarizing data.
    - Calculating new data using a combination of numerics, constants and functions.
    - Classifying data into groupings based on the data distribution.
    - Ratioing values from 2 fields.

    Gstats provides additional functionality in the form of:

    - A leveling tool.
    - Calculation of correlation coefficients.
    - Calculation of regression.
    - Calculation of advanced statistics.

    In addition, 2 utility tools are provided which enable you to:

    - Concatenate values from many fields into a single field.
    - Add a unique record number to each record.

    The leveling tool in particular may require the pre-creation of a concatenated field to represent groups of data formerly defined in many fields.

    Assigning a unique value to each record allows many tools to link newly generated data to the input data source. This is the case for many of the charting functions.

    Processing & Reporting

  • 19

    Description

    A correlation coefficient is a statistic that describes how similar the distributions of two columns of data are.

    A correlation coefficient will be between 0 and 1.

    A 0 value indicates that no correlation exists, whilst 1 indicates exact correlation.

    Correlation coefficients can be viewed in a popup message or written to a table for permanent storage and usage.
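As an illustration of what such a coefficient measures, here is a minimal Python sketch of Pearson's product-moment correlation. Whether Gstats uses this particular statistic is an assumption, and the function name `pearson_r` is hypothetical. Pearson's r itself ranges from -1 to 1, so a value reported between 0 and 1 would correspond to its magnitude or its square; which of these Gstats reports is not stated here.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long columns."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("both columns need the same length and at least 2 records")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a column of constant values has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing together
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly opposed
```

Taking `abs(r_down)` gives a strength of association on the 0 to 1 scale described above.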

    Required Input

    At least 2 fields for which a correlation will be tested.

    Optional

    - Check to have selected, or all records processed.
    - Check to have records output to a table.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A popup report of correlation values or a table of information of the same data.

    Correlation Coefficient

  • 20

    Description

    A table of data can contain groupings of data which, unless standardized in some way, are not easily compared.

    The process of standardizing data is known as leveling, the result of which allows data to be uniformly compared.

    Leveling may be undertaken in a number of ways, depending on the data being leveled.

    When leveled by percentile, each group of information has its nominated percentile value calculated, with this value being used to ratio all data for the current group.

    In addition, the mean of the lower selected percentile value can be used as the level-by value.

    When leveled by value, a value within a group will be ratioed to the level value.

    Each field within the current table can be leveled in a different way.
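Leveling by percentile can be sketched as follows. This Python example is a simplified illustration, not the Gstats routine: the function names are hypothetical, the nearest-rank percentile definition is an assumption, and groups whose level value is 0 are simply skipped.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Percentile by the nearest-rank method (an assumed definition)."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[k - 1]


def level_by_percentile(records: List[Tuple[int, str, float]], pct: float) -> Dict[int, float]:
    """records hold (record id, group, value); returns record id -> leveled ratio."""
    groups = defaultdict(list)
    for _, group, value in records:
        groups[group].append(value)
    ratios = {}
    for rec_id, group, value in records:
        level_value = nearest_rank_percentile(groups[group], pct)
        if level_value == 0:
            continue  # a level value of 0 cannot be used as a divisor
        ratios[rec_id] = value / level_value
    return ratios


# Two hypothetical groups whose raw values differ by a factor of ten.
table = [(1, "A", 10.0), (2, "A", 20.0), (3, "A", 30.0), (4, "A", 40.0),
         (5, "B", 100.0), (6, "B", 200.0), (7, "B", 300.0), (8, "B", 400.0)]
leveled = level_by_percentile(table, 50.0)
```

After leveling, both groups share the same ratios (0.5, 1.0, 1.5, 2.0), so records can be compared across groups despite the different raw magnitudes.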

    Required Input

    - Selection of at least one field for which data will be leveled (use the ON/OFF buttons to select/de-select the current field).
    - Selection of a field which contains a unique value for each record.
    - Selection of a field which contains values which represent groupings of data.
    - Selection of a method of leveling for each selected field.

    Optional

    - Check to have selected, or all records processed.
    - Check to have processing reported to a file.
    - Check to use the mean of the calculated percentile value.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    When the percentile method of leveling is selected, a percentile value must be input.

    Processing

    In some cases, a group field may exist, containing a single value, which can be used to group the data during processing. Sometimes however, a group may be defined by values in more than 1 field. If this is the case, you may need to combine values from multiple fields into a single field prior to leveling your data (see Concatenate).

    Level to Background

  • 21

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Processing is undertaken according to collections of data derived from the grouping field.

    Output

    A table is created of the output levelled data, containing the fields; ,, _rat.

    If selected, a report is generated describing the levelling process: the groupings and the associated levelling method and values for each field.

    If a value of 0 results from the calculation of a percentile value or its mean, then the following entry will be placed in the output report file:

    "Not Processed. Group for returned a value of 0 for percentile ."

    If no valid records exist in the current grouping, the following is reported to the output file:

    "Not processed. Group

  • 22

    Description

    Many different types of statistics are useful for analyzing data.

    The following statistics are reported, or written to a table as required:

    - Sum
    - Count
    - Maximum
    - Minimum
    - Mean
    - Median
    - Midrange
    - Harmonic Mean
    - Quadratic Mean
    - Mode
    - Range
    - Variance
    - Standard Deviation
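Some of the less familiar statistics in this list follow standard textbook definitions, sketched here in Python. The function name `extra_statistics` is hypothetical, and the sketch says nothing about how Gstats itself computes them (for example, whether its variance is a sample or population variance).

```python
import math
from typing import Dict, Sequence


def extra_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Range, midrange, harmonic mean and quadratic mean of non-zero values."""
    n = len(values)
    lo, hi = min(values), max(values)
    return {
        "range": hi - lo,
        # Midrange: halfway between the minimum and the maximum.
        "midrange": (lo + hi) / 2.0,
        # Harmonic mean: reciprocal of the mean of reciprocals (undefined if any value is 0).
        "harmonic_mean": n / sum(1.0 / v for v in values),
        # Quadratic mean (root mean square): square root of the mean of squares.
        "quadratic_mean": math.sqrt(sum(v * v for v in values) / n),
    }


stats = extra_statistics([1.0, 2.0, 4.0])
```

For positive data the harmonic mean is never larger than the arithmetic mean, and the quadratic mean is never smaller.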

    Required Input

    - If statistics are to be reported, a field must be selected.
    - If statistics are to be written to a table, one or more fields must be selected.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A table or report of statistical information.

    Statistics

  • 23

    Description

    Least squares regression produces a line of best fit between 2 variables: one which is considered to be correct (the independent variable), and the other which contains error (the dependent variable). Regression minimizes the sum of the squared errors.
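A line of best fit by ordinary least squares can be computed as in the following Python sketch. It shows the standard textbook formulas and is not taken from Gstats; the function name `least_squares` is hypothetical.

```python
from typing import Sequence, Tuple


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("both fields need the same length and at least 2 records")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent field must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

Here the independent field is `xs` and the dependent field is `ys`; swapping them generally gives a different line.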

    Required Input

    - A dependent field.
    - An independent field.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A report detailing the regression value, intercept and slope

    Regression

  • 24

    Description

    Concatenation involves the combination of values from multiple fields into a single new field for each selected record.

    Concatenation of data can be useful to combine data to create a unique identifier.

    Some functions, such as the Gstats leveling tool, may require that you concatenate field values to create a grouping field.
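Building a grouping field by concatenation can be sketched as follows. This Python example is illustrative only: the function name, the field names and the underscore separator are assumptions, and records are represented as plain dictionaries rather than ArcView table rows.

```python
from typing import Dict, List, Sequence


def concatenate_fields(records: List[Dict[str, object]], fields: Sequence[str],
                       new_field: str, sep: str = "_") -> List[Dict[str, object]]:
    """Add new_field to every record by joining the values of the given fields."""
    for rec in records:
        rec[new_field] = sep.join(str(rec[name]) for name in fields)
    return records


# Hypothetical samples grouped by both project and rock type.
samples = [
    {"project": "P1", "rock": "GR", "au": 0.4},
    {"project": "P1", "rock": "BS", "au": 1.2},
    {"project": "P2", "rock": "GR", "au": 0.7},
]
concatenate_fields(samples, ["project", "rock"], "group")
groups = [rec["group"] for rec in samples]
```

Using a separator avoids accidental collisions, such as "P1" + "1A" and "P11" + "A" both producing "P11A".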

    Required Input

    - Selection of more than 1 field.
    - Definition of a new field name.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    The table containing the fields to be concatenated must be editable.

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A field is added to the existing table representing a combination of values from the input fields.

    Concatenate

  • 25

    Description

    A very effective way to visualize data is to rank and display data by percentile.

    Percentile data is used in many Gstats functions and written to tables on the fly. In each case a new percentile table is created.

    Creating a table of percentile values for multiple fields prior to processing can make the management of percentile data more effective.

    The number of records assigned to each percentile class will always be correct. The value range for each class is a weighted average. There is no way to precisely calculate the value range for each class. Discrepancies may exist between the percentile range and the number of samples in that range. In this case, the number of records for the calculated percentile is correct.

    Once a percentile table has been created, it can be joined to an existing table by a common field identifier.
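A rank-based percentile table keyed by a unique record identifier can be sketched as below. This is a Python illustration under assumptions, not the Gstats method: the function name is hypothetical, the percentile is taken as 100 × rank / count, and tied values are not given any special treatment.

```python
from typing import Dict


def percentile_table(values_by_id: Dict[int, float]) -> Dict[int, float]:
    """Map each record id to the percentile rank of its value."""
    # Sort the records by value; the position in this order is the rank.
    ordered = sorted(values_by_id.items(), key=lambda item: item[1])
    n = len(ordered)
    return {rec_id: 100.0 * rank / n
            for rank, (rec_id, _) in enumerate(ordered, start=1)}


# Hypothetical field values keyed by a unique record number.
field_values = {1: 5.0, 2: 1.0, 3: 3.0, 4: 9.0}
pct_by_id = percentile_table(field_values)
```

Because the result is keyed by record id, it can be joined back to the source table, and each class receives a share of records that depends only on rank.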

    Required Input

    A field defining a unique identifier, which can be used at a later date to join (relate) data to the output table.

    At least one field for which percentile values will be created.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A table is created containing an id field for each input field and an output percentile field called _pct.

    A percentile value in any given row has no relationship with a percentile value of another field in the same row.

    The output table can be used in the creation of scatter charts, log probability plots and any other chart you wish to create.

    Percentile Values

  • 26

    Description

    Many Gstats functions make use of cumulative data.

    Cumulative data allows different groups of information to be identified from a single dataset.

    Creating a table of cumulative values for multiple fields prior to processing can make the management of cumulative data more effective.

    Once a cumulative table has been created, it can be joined to an existing table by a common field identifier.

    Required Input

    A field defining a unique identifier, which can be used at a later date to join (relate) data to the output table.

    At least one field for which cumulative values will be created.

    Optional

    Check to have selected, or all records processed.

    Dependents

    If no records are selected in the current theme, the Use Selected box will not be available.

    Processing

    All valid records are processed. A record is invalid if it is:

    - equal to the current Null value
    - not part of the current selection (if Use Selected was checked)

    Output

    A table is created containing an id field for each input field and an output cumulative field called _cum.

    A cumulative value in any given row has no relationship with a cumulative value of another field in the same row.

    The output table can be used in the creation of scatter charts, log probability plots and any other chart you wish to create.

    Cumulative Values

  • 27

    Description

    Many Gstats options require the presence of a unique value for each record in a table. The unique value is most often used to relate newly created data to the original table. Use this tool to add a unique record number to each record.

    Required Input

    Selection of an editable theme or table.

    Optional

    NIL.

    Dependents

    NIL.

    Processing

    This tool adds a unique identifier for each record, beginning at 1 and ending with . If a field called recno already exists in the current table, this field can be updated.
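The numbering itself is straightforward, as this short Python sketch shows. It is an illustration rather than Gstats code: the function name is hypothetical and records are represented as dictionaries, with the field name recno taken from the description above.

```python
from typing import Dict, List


def add_record_numbers(records: List[Dict[str, object]],
                       field: str = "recno") -> List[Dict[str, object]]:
    """Write a sequential number, starting at 1, into the given field of each record."""
    for number, rec in enumerate(records, start=1):
        # An existing field of the same name is overwritten (updated).
        rec[field] = number
    return records


rows = [{"au": 0.4}, {"au": 1.2, "recno": 99}, {"au": 0.7}]
add_record_numbers(rows)
numbers = [rec["recno"] for rec in rows]
```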

    Output

    An updated or newly created field containing record numbers.

    Adding a unique record identifier