

Make the Map You Want with PROC GMAP and the Annotate Facility
Michael Eberhart, MPH, Philadelphia Department of Public Health

ABSTRACT
This paper describes how to use SAS/GIS® and PROC GMAP to create presentation-quality maps of geographic data. Topics discussed include using U.S. Census Bureau TIGER/Line® files for geocoding address data, using PROC GMAP and the annotate facility to display map datasets in a variety of formats, importing map files from other software products, and assigning geocoded cases to polygons based on spatial location. Additional topics include coordinate systems and map projections, PROC GMAP options that control appearance of maps, annotating polygon borders, creating and combining annotate datasets, using annotate macros (%maplabel), summarizing data and handling missing values, creating map output files using device options, loop processing with NULL datasets, replaying graphics with high resolution, and customizing map legends with legend options and annotating.

GETTING STARTED
Maps can be an effective method for presenting data that varies geographically. Maps provide a spatial picture of the data, and allow end-users to easily see clusters or areas of concentration. Spatial data refers to anything that can be referenced based on its physical location, such as census tracts, zipcodes, and street addresses. Geocoding is the process of adding spatial information to existing data based on this physical location. Address geocoding attempts to match a street address in a SAS dataset with spatial information in a SAS spatial database. If a match is found, the coordinates for the address location (x,y) are added to the observation. Additional information about the address location (e.g. census tract) can also be added to the address dataset. Data points can be displayed on a map discretely or aggregated to some geographic unit.

TIGER/LINE DATA
The U.S. Census website contains TIGER/Line® files for all counties in the United States. The term TIGER® refers to the Topologically Integrated Geographic Encoding and Referencing system used by the U.S. Census Bureau. Each TIGER/Line file set contains a series of data files that contain spatial information for geographic features such as roads, rail lines and rivers, as well as boundary lines for census tracts, census blocks and counties. The data include digital information such as location in latitude and longitude, the names and types of features, address ranges (from-to, left-right), and relationships between features (e.g. where rails cross streets, or census blocks are contained within census tracts). The steps to download and import TIGER/Line files are outlined in a previous paper (see REFERENCES).

GEOCODING ADDRESS DATA
Assigning a spatial location based on a standardized street address can be accomplished either through the GEOCODE option in SAS/GIS or by using the batch geocoder. For either method, a prepared SAS address dataset is required. The dataset must include a single address variable that holds the street number, direction, name and type (e.g. 500 S. Broad St). The dataset must also include a city variable and a state variable. A zipcode variable is recommended, but not required.

Import the dataset containing the addresses to be geocoded. Create a variable that contains all of the elements of the street address. In this example, four string variables are concatenated, and the COMPBL function compresses runs of blanks to a single blank. The city and state variables are then added to the dataset.

/*Get case data for geocoding*/
PROC IMPORT OUT=WORK.geocode
     DATAFILE="c:\nesug08\data\address.dbf"
     DBMS=DBF REPLACE;
     GETDELETED=NO;
RUN;

/*Create address variables for geocoding*/
data geocode;
  set geocode;
  address=compbl(strtno||strtdir||' '||strtname||strtave);
  city='Philadelphia';
  state='Pennsylvania';
run;
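As a hedged alternative to the concatenation above, and assuming SAS 9 or later, the CATX function can build the same address string. CATX trims leading and trailing blanks from each piece, joins the pieces with the delimiter given as its first argument, and skips pieces that are missing, so a record with no street direction does not pick up a doubled blank. The variable names are the ones used in the step above; the output dataset name geocode_catx and the length of 100 are assumptions made for illustration.

```sas
/* Sketch: build the street address with CATX instead of || and COMPBL */
data geocode_catx;                 /* hypothetical output dataset name */
  length address $ 100;            /* assumed length; size it to your data */
  set geocode;
  /* Join the four address parts with single blanks, skipping missing parts */
  address = catx(' ', strtno, strtdir, strtname, strtave);
  city  = 'Philadelphia';
  state = 'Pennsylvania';
run;
```

Either form gives the geocoder one address variable; the choice is mostly about how missing address parts should be handled.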

And Now, Presenting ...NESUG 2008


Batch geocoding is initiated using the %GCBATCH macro. The macro launches the geocoding facility and supplies a series of values that determine what gets geocoded, how addresses are matched, what spatial data is used for matching, and how the output file is generated. While processing the address data, the geocoding facility writes a message to the log indicating its progress. As addresses are matched, the coordinates of the address location are added to the address data set. Additional values contained in the spatial data, such as census tract or census block, can also be added to the address data set for matched cases. If an address cannot be matched to the spatial data but the address includes a zipcode, the X and Y coordinates of the zipcode centroid are added instead of the exact coordinates of the address. To enhance matching, the geocoding process converts address components to uppercase and attempts to standardize address components such as street direction and street type values. These standardized versions of the address are also added to the address data set. The variables M_ADDR, M_CITY, M_STATE, M_ZIP, and M_ZIP4 refer to the values that were actually matched during the geocoding process. The following values are accepted by the %GCBATCH macro:

• the name of the dataset to geocode (geod=)
• the names of the address variables (av=, cv=, sv=, zv=)
• the map name (mname=)
• the names of any additional values to add to the address data set (pv=)
• command to recreate the dataset (newdata=yes)

The code below will launch the geocoding facility using MAPDATA.TIGER.PHILA as the spatial reference map. It will geocode addresses in WORK.GEOCODE using ADDRESS as the street address variable (av), CITY as the city variable (cv), STATE as the state variable (sv), and ZIP1 as the zipcode variable (zv). It will recreate (newdata=yes) the address dataset and include TRACT as an additional variable (pv).

/*Batch geocode addresses*/
%gcbatch(geod=geocode,               /* Name of the addresses dataset */
         av=address,                 /* street address variable */
         cv=city,                    /* city variable */
         sv=state,                   /* state variable */
         zv=zip1,                    /* zipcode variable */
         pv=tract,                   /* variable added to address data */
         newdata=yes,                /* recreate dataset */
         mname=mapdata.tiger.phila); /* Name of the map entry */
dm 'af c=sashelp.gis.geocodeb.scl';  /* Invokes the batch geocoder */
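Once the batch geocoder has finished, it can help to check how many addresses were matched before mapping them. The sketch below assumes the geocoder has written the _STATUS_ variable (the annotate step later in this paper filters on _status_='found') and the M_ADDR and M_ZIP variables described above to WORK.GEOCODE; the other _STATUS_ values you see will depend on your SAS release.

```sas
/* Sketch: summarize geocoding outcomes by match status */
proc freq data=geocode;
  tables _status_ / missing;
run;

/* List a sample of records that were not matched to a street address */
proc print data=geocode (obs=20);
  where _status_ ne 'found';
  var address city state zip1 m_addr m_zip;
run;
```

Reviewing the unmatched records often points to address cleanup (misspelled street names, missing directions) that raises the match rate on a second pass.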

EXPORT MAP
In order to display maps created in SAS/GIS using SAS/GRAPH, a map file must be exported into a traditional SAS dataset that can be referenced by PROC GMAP. To do this, SAS provides an experimental sample program called the GIS Exporter. The program and documentation can be downloaded from:

http://support.sas.com/rnd/datavisualization/mapsonline/html/tools.html

Select the GIS2SAS Exporter – 13jul04 Update (Request Download) link and follow the steps to download. This download requires a username/password login. As stated in the program code, this is an experimental set of routines for use only with Version 8 or above. The files are provided by SAS “as is” without warranty or support of any kind. Once the files are downloaded, the GISEXPORT.CPT file must be taken out of transport format using PROC CIMPORT.

libname mapdata 'c:\nesug08\maps';
LIBNAME EXPORT 'C:\NESUG08\MAPS';
FILENAME CPTFILE 'C:\nesug08\gisexport_13jul04\gisexport_13jul04.cpt';
PROC CIMPORT FILE=CPTFILE LIB=EXPORT;
RUN;

Before invoking the GIS exporter, a series of macro variables must be assigned to define parameters of the export.

/* define the parameters for the export */
%let MAPLIB=mapdata;    /*libref of SAS/GIS map*/
%let MAPCAT=tiger;      /*catalog of SAS/GIS map*/
%let MAPNAME=phila;     /*name of SAS/GIS map*/
%let SPALIB=mapdata;    /*libref of spatial data*/
%let SPANAME=phila;     /*name of spatial data*/
%let GRALIB=mapdata;    /*libref of output map file*/
%let GRANAME=ptract;    /*name of output map file*/
%let ANNLIB=mapdata;    /*libref of output annotate dataset*/
%let ANNNAME=annotate;  /*name of output annotate dataset*/
%let GRAFLIB=SASUSER;   /*libref of graphics output*/
%let GRAFOUT=philamap;  /*name of graphics output*/
%let LAYER2=tract;      /*name of map layer to export*/
%let EXTRA=;            /*names of additional variables to maintain*/

/* Perform the export*/
dm 'af c=export.export.exportv81.scl; run;' graph1;

When the GIS Exporter completes, it generates a graph file using PROC GMAP. This graph uses the exported SAS dataset (in this case a map of census tract polygons) and a very complex annotate dataset generated by the exporter. The annotate dataset draws lines for the other layers in the original map. Creating a graph file without the annotate dataset will allow you to see the exported map of census tracts.

SAS/GRAPH – PROC GMAP
Traditional map datasets can be used to produce two-dimensional (choropleth) or three-dimensional (block, surface or prism) maps using the GMAP procedure. Choropleth maps are generally used to highlight spatial variation between areas of the map using colors and/or patterns. The code for a simple choropleth map of census tracts is shown below. The resulting map looks different than it did in SAS/GIS because it is not projected. The map dataset created by the GIS exporter uses latitude and longitude for its spatial values, and latitude and longitude are coordinates on a spherical surface. A map projection converts these values to planar coordinates so that the spatial data can be drawn on a flat surface with less distortion.

/*Presenting maps with PROC GMAP*/
title1 h=12 'Unprojected Map';
pattern1 value=empty color=black;
proc gmap data=mapdata.ptract map=mapdata.ptract;
  id tract;
  choro tract / levels=1 nolegend;
run;
quit;


MAP PROJECTION
Map projections allow a three-dimensional sphere to be displayed on a flat surface. All map projections contain some distortion, but using the right projection can help limit the amount of visual distortion. Map projections are not critical when displaying maps that cover only a small area. However, larger areas will have more distortion, and may even appear backwards if not projected. Projection is critical for any map that crosses the equator or meridian.

The GPROJECT procedure converts latitude and longitude into Cartesian coordinates, which allows you to project maps in such a way as to minimize distortion and maximize the display area. For most maps of the US, the default Albers projection should suffice. The GPROJECT procedure requires an input dataset (data=), an output dataset (out=) and an ID statement. The ID statement refers to a variable in the dataset that identifies a unit area, and each unit area is evaluated separately. The DEGREE option specifies that the latitude and longitude are in degrees (the default is radians), and the EASTLONG option specifies that longitude values increase to the east. The code below creates a projected output map dataset called PROJTRACT and creates a graph of the projected map using PROC GMAP.

/*Project map using default - Albers*/
proc gproject data=mapdata.ptract out=projtract degree eastlong;
  id segment;
run;

/*display projected map*/
title1 h=12 'Projected Map';
pattern1 value=empty color=black;
proc gmap data=projtract map=projtract;
  id tract;
  choro tract / levels=1 nolegend;
run;
quit;
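Whether DEGREE and EASTLONG are the right options depends on how the coordinates are stored, so a quick look at the range of X and Y in the unprojected map can be useful before running GPROJECT. The sketch below assumes the exported map dataset MAPDATA.PTRACT from the previous section. Values of X within roughly ±180 and Y within ±90 suggest degrees, while radians would fall within about ±3.15 and ±1.58. For a Philadelphia map stored with longitude increasing to the east, X should be near -75 and Y near 40.

```sas
/* Sketch: inspect the coordinate ranges of the unprojected map */
proc means data=mapdata.ptract n min max;
  var x y;
run;
```

If X turns out to be positive for a map of the United States, the longitudes are stored as increasing to the west and EASTLONG should be left off.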


PLOTTING GEOCODED DATA ON THE MAP
For the first GMAP example, the choropleth map will be used as background, or spatial reference, for the geocoded cases. Since GMAP does not draw the data points, they are placed on the map using the annotate facility. The annotate facility can be used to enhance graphics output in a variety of ways. An annotate dataset can be created to add additional text, highlight graph elements, enhance legend features, or anything else that will help end-users understand what the graph is meant to convey.

The annotate dataset in this example will be used to add the geocoded cases to the map using a color-coded symbol. Annotating is as simple as creating a SAS dataset with a few commands. In this case, the point locators (x,y) for the cases are already in the data, and several annotate variables are specified using the length statement: function, color, style, position and text. The values for other variables (xsys, ysys, hsys and when) are set in the data step. The value of ‘color’ is set based on the sex of the case. The annotate option (annotate=) of the PROC GMAP statement identifies the annotate dataset.

Annotate Variables:

XSYS and YSYS indicate which coordinate system to use. In this example, coordinate system 2 is used. Coordinate system 2 refers to the data area and the absolute value of x,y. For more information regarding coordinate systems refer to the SAS help tool.

when indicates when annotate graphics are drawn – ‘a’ indicates that annotate graphics are drawn after the procedure output and ‘b’ indicates that they are drawn before the procedure output.

position indicates where a text string will be drawn in relation to the x,y coordinates. A value of ‘E’ indicates the text should be centered one-half cell below the position of the x,y coordinate. The default is 5 (centered).

style applies to the ‘label’ function and indicates the text font.

function indicates the action to be taken. Options include MOVE, LABEL, BAR, and PIE (as well as several others).

color and size indicate the color and size of the text (or the BAR, PIE, etc.). The size is interpreted based on the function: for LABEL, size represents the height of the text.

/*annotate dataset */
data anno;
  set geocode;
  length function style color text $ 8;
  xsys='2'; ysys='2'; hsys='3'; when='A';
  function='label';
  style='special';
  text='L';
  if sex='F' then color='blue';
  if sex='M' then color='red';
  size=5;
  position='E';
  if _status_='found';
  segment=1;
run;

Because the annotate data must be projected using the same projection criteria as the map, the GPROJECT procedure is run on the annotate dataset as well. The variable segment=1 is created so that all of the elements are evaluated together, just as was done in the map projection.

proc gproject data=anno out=annop degree eastlong;
  id segment;
run;

/*projected map with annotated cases plotted*/
pattern1 value=empty;
proc gmap data=projtract map=projtract anno=annop;
  id tract;
  choro tract / coutline=gray nolegend;
run;
quit;
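A commonly used variation, offered here as a hedged sketch and not as the method used in this paper, is to stack the map and the annotate observations into one dataset, project them in a single GPROJECT call, and then split them apart again. Because GPROJECT may derive some projection parameters from the extent of its input data, projecting the two datasets together guarantees that both receive exactly the same transformation. The dataset names combined, combinedp, projtract2 and annop2 and the flag variable from_map are hypothetical. The KEEP= list assumes the annotate variables created in the ANNO step above.

```sas
/* Sketch: stack map and annotate observations, flagging the source */
data combined;
  set mapdata.ptract (in=inmap)
      anno (keep=x y segment function style color text
                 xsys ysys hsys when size position);
  from_map = inmap;              /* hypothetical flag: 1 = map, 0 = annotate */
run;

/* One projection applied to both sets of coordinates */
proc gproject data=combined out=combinedp degree eastlong;
  id segment;
run;

/* Split the projected observations back into map and annotate datasets */
data projtract2 (drop=from_map)
     annop2     (drop=from_map);
  set combinedp;
  if from_map then output projtract2;
  else output annop2;
run;
```

The resulting datasets can be used in place of PROJTRACT and ANNOP in the GMAP step, for example with data=projtract2 map=projtract2 anno=annop2.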

ADDING LEGEND AND NORTH ARROW
The above GMAP examples were executed using the NOLEGEND option. Legends, like any other graphics element, take up space in the output area and affect the size of the map. Since the geocoded addresses were placed on the map as annotations, they would not be considered legend elements. The legend elements are levels of the CHORO variable, in this example, tracts. However, a legend that describes the annotated addresses can be created using a combination of a legend statement and some additional annotating.

A north arrow is also a typical map element that is not an option in GMAP, but annotation can handle that as well (if you have an image file). Using the annotate function ‘image’ with style='fit' and imgpath='pathname', the image is placed to ‘fit’ in the space allotted. For this annotate dataset, the xsys and ysys coordinate system is set to ‘5’, which represents the procedure output area.

/*annotate for legend*/
data legend;
  length function style color text $ 8;
  retain xsys '5' ysys '5' when 'A';
  function='move'; x=70; y=18; output;
  function='label'; style='special'; size=2; position='5'; color='red'; text='L'; output;
  function='move'; x=75; y=18; output;
  function='label'; style='swissb'; size=.8; position='5'; color='black'; text='Male'; output;
  function='move'; x=70; y=14; output;
  function='label'; style='special'; size=2; position='5'; color='blue'; text='L'; output;
  function='move'; x=76; y=14; output;
  function='label'; style='swissb'; size=.8; position='5'; color='black'; text='Female'; output;
  function='move'; x=4; y=4; output;
  function='image'; style='fit'; imgpath='c:\nesug08\n_arrow.bmp'; x=10; y=16; output;
run;

/*combine cases and legend*/
data anno2;
  set annop legend;
run;

LEGEND - The LEGEND= option is included in the CHORO statement of PROC GMAP; however, most of the legend elements are added using the annotate facility. The legend is therefore drawn by GMAP with space left over for annotating the legend values for the geocoded addresses.

Definition of Legend Features:

across=1 defines the number of columns across for the legend entries.
down=3 defines the number of rows for the legend entries (these two options, across= and down=, allow for great flexibility in how a legend is displayed).
position=(bottom right inside) places the legend below the graph, justified right, inside of the graph output area.
cborder=black creates a border around the legend.
mode=share places the legend in the procedure output area and allows other graphic elements to share the same space (other options include RESERVE and PROTECT).
offset=(-2cm,) places the legend 2cm to the left of the default position for bottom/right/inside (the offset option allows you to place the legend anywhere you want).
value= sets the height, font and text of the legend value(s).

/*Map with titles and legend*/
title1 h=2 "Map With Legend and North Arrow";
pattern1 value=empty;
legend1 across=1
        down=3
        cborder=black
        position=(bottom right inside)
        mode=share
        label=none
        offset=(-2cm, .5cm)
        shape=bar(2,1)
        value=(h=.8 f=swissb justify=left 'Census Tract');

proc gmap data=projtract map=projtract anno=anno2;
  id tract;
  choro tract / levels=1 legend=legend1;
run;
quit;


IMPORT SHAPEFILES
If you want to present data at some other geographic unit, you will need a traditional map dataset drawn to that unit. Census tracts (and census blocks) as well as county boundaries are available with the TIGER/Line data, and SAS provides map datasets for zipcodes. But users often want to see data using boundaries that are specific to their needs. Other GIS software programs allow users to create geographic boundary layers. Layers saved as ESRI shapefiles can be imported into SAS using PROC MAPIMPORT. The two required arguments in PROC MAPIMPORT are the name of the output dataset (OUT=) and the complete pathname of the input shapefile (DATAFILE=). By default, SAS reads and converts all variables in the input data; however, the INCLUDE and EXCLUDE options can be used to limit the number of variables.

/*Import ESRI council district and voting ward map files*/
PROC MAPIMPORT OUT=sasuser.council
     DATAFILE='c:\nesug08\council.shp';
run;

PROC MAPIMPORT OUT=sasuser.wards
     DATAFILE='c:\nesug08\wardjoin4.shp';
run;

PROC MAPIMPORT OUT=sasuser.wardproj
     DATAFILE='c:\nesug08\tgr42101vot00.shp';
run;
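After an import it is worth confirming which variables arrived with the shapefile, since the later steps in this paper rely on specific names (for example dist_num in the council district map and vote00 in the voting ward map). The sketch below simply lists the variables and prints a few observations from one imported dataset. It assumes the import above ran without error, and that the coordinate variables are named X and Y, as the later steps in this paper expect.

```sas
/* Sketch: inspect the variables created by PROC MAPIMPORT */
proc contents data=sasuser.council varnum;
run;

/* Look at the first few boundary points and their ID values */
proc print data=sasuser.council (obs=10);
run;
```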

LABELLING POLYGONS
The %MAPLABEL macro creates an annotate dataset that can be used to label all or some of the polygons in the map. Prior to running any of the annotate macros provided by SAS, you must first run the %ANNOMAC macro. Upon completion, a message in the log indicates that annotate macros are now available. The %MAPLABEL macro has several arguments, and depending on how you want the polygons labeled, you may need to create a new variable. The output annotate dataset will place the label at the centroid of the polygon.

%MAPLABEL(map-dataset, attr-dataset, output-dataset, label-var, id-list, font=font_name, color=n, size=n, hsys=n);

map-dataset: name of the map to be annotated.
attr-dataset: name of the dataset containing the text to be shown on each ID value.
output-dataset: name of the annotate data set created by the macro.
label-var: name of the label variable to place on the map (the text for annotate).
id-list: ID vars that you would issue in PROC GMAP to create the map. These values need to be on both the map and the attribute data sets.
font: font name.
color: value for the color variable.
size: specifies a value for the size variable. Defaults to 2.
hsys: specifies a value for the hsys variable. Defaults to 3.

%annomac;
proc sort data=sasuser.council;
  by dist_num;
run;
%maplabel(sasuser.council,sasuser.council,anno1,dist_num,dist_num,
          font=swissb, color=red, size=3);

You can customize the label by creating one or more new variables, or by creating more than one %maplabel output file and combining the results. For example, to label the polygons with the text “Dist.” on one line and the district number below it, create two annotate datasets and use the annotate variable ‘position’ to place one label below the other. The position variable determines where text is placed in relation to the location defined by the x,y variables. Position ‘5’ is centered on the exact location. Position ‘8’ is centered one cell below the location.

/*Create prefix var in map dataset*/
data sasuser.council;
  set sasuser.council;
  prefix='Dist.';
run;

/*Sort and create 2 annotate datasets*/
proc sort data=sasuser.council;
  by dist_num;
run;
%maplabel(sasuser.council,sasuser.council,anno1,prefix,dist_num,
          font=swissb, color=red, size=3);
%maplabel(sasuser.council,sasuser.council,anno2,dist_num,dist_num,
          font=swissb, color=red, size=3);

/*Assign position = 5 - centered*/
data anno1;
  set anno1;
  position='5';
run;

/*Assign position = 8 - centered, one cell below*/
data anno2;
  set anno2;
  position='8';
run;

/*Combine annotate datasets*/
data anno3;
  set anno1 anno2;
run;
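The combined annotate dataset is put to use by naming it in the ANNO= option of PROC GMAP. The sketch below is one hedged way to do that, following the pattern of the earlier examples in which the map dataset also serves as the response dataset. It assumes the ANNO3 dataset created above. The DISCRETE and NOLEGEND options are a choice made for this illustration, and the fill patterns come from the device defaults.

```sas
/* Sketch: draw the council districts with the two-line labels */
title1 h=2 'Council Districts';
proc gmap data=sasuser.council map=sasuser.council anno=anno3;
  id dist_num;
  choro dist_num / discrete nolegend coutline=gray;
run;
quit;
```

Because the labels were computed from SASUSER.COUNCIL itself, they share the map's coordinate system and need no separate projection step in this form.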


ANNOTATE POLYGONS
To create an annotate dataset that draws polygon boundaries, first remove observations from the traditional map dataset that do not refer to the polygon boundaries using PROC GREMOVE. The GREMOVE procedure combines unit areas defined in a map data set into larger unit areas by removing shared borders between the original unit areas. Then use the data step to create an annotate dataset that has functions to start polygons and continue polygons as needed.

/*Sort dataset with 9 targeted police districts*/
proc sort data=sasuser.police9;
  by dist_num;
run;

/*Remove obs that do not apply to polygon borders*/
proc gremove data=sasuser.police9 out=borders;
  by dist_num;
  id district_;
run;

/* Create annotate dataset for polygon borders*/
data annob;
  set borders;
  by dist_num segment;
  retain size 3        /*line size*/
         color 'blue'  /*boundary color*/
         xsys '2'      /*Map coordinate system*/
         ysys '2'
         when 'a';     /*Annotate after the map is drawn*/
  /* For first point in each polygon set FUNCTION='POLY' */
  /* (padded to 8 characters so that 'POLYCONT' is not truncated) */
  if first.segment then function='POLY    ';
  /* All other points, FUNCTION='POLYCONT'*/
  else function='POLYCONT';
  /* Output only obs with non-missing (and non-zero) x and y values */
  if x and y then output;
run;


ASSIGN BY SPATIAL LOCATION
The batch geocode process allows you to assign additional variables to the geocoded cases, typically some polygon boundary that contains the geocoded point. However, the TIGER/Line data is limited to census-related boundaries. Assigning geocoded data to other geographic units is possible, but there are some limitations. One way to accomplish this task is to identify the minimum and maximum values of X and Y (in this case, latitude and longitude), and assign data points based on whether or not they fall within these values. Not all polygons are amenable to this solution, especially those with non-rectangular shape. In this example, council districts are oddly shaped, but voting wards, which fit into council districts, will work fine. Cases are assigned by spatial location to voting wards and then merged with a dataset that contains both ward and council district.

/*Assign cases to voting ward by spatial location (x,y)*/
/*Use projected voting ward map for assignment*/
proc summary data=sasuser.wardproj nway;
  class vote00;
  var x y;
  output out=assign min=xmin ymin max=xmax ymax n=npoints;
run;

/*Macro evaluates location of each case against polygon boundaries*/
%macro assign(d,v1,v2,v3,v4);
  data geocode;
    set geocode;
    if (&v1 le x le &v3) and (&v2 le y le &v4) then name2=&d;
  run;
%mend;

/*Call macro for each district, providing ward and 4 boundary values*/
/*Executed once for each unique value of voting ward*/
data _null_;
  set assign;
  by vote00;
  call execute('%assign('||vote00||','||xmin||','||ymin||','||xmax||','||ymax||')');
run;

proc sort data=geocode;
  by name2;
run;
proc sort data=sasuser.wards nodupkey out=wards;
  by name2;
run;
data newfile;
  merge geocode (in=in1) wards (in=in2);
  by name2;
  if in1;
run;
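The CALL EXECUTE loop above rewrites the GEOCODE dataset once for every ward, and when bounding boxes overlap, the last ward processed wins. As a hedged alternative sketch, not the approach used in this paper, the same bounding-box test can be written as a single PROC SQL join against the ASSIGN dataset. The output dataset name geocode_box and the variable name ward are hypothetical, and the second query assumes that address is enough to identify a case.

```sas
/* Sketch: bounding-box assignment in one pass with PROC SQL */
proc sql;
  create table geocode_box as            /* hypothetical output dataset */
  select g.*, a.vote00 as ward           /* hypothetical variable name */
  from geocode as g
       left join assign as a
       on  g.x between a.xmin and a.xmax
       and g.y between a.ymin and a.ymax;
quit;

/* Cases that fall inside more than one bounding box appear on several rows */
proc sql;
  select address, count(*) as n_boxes
  from geocode_box
  group by address
  having count(*) > 1;
quit;
```

Unlike the loop, the join keeps every matching ward, so the second query makes the ambiguous cases visible and leaves the choice of how to resolve them to the analyst.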

/*Summarize merged data*/
proc summary data=newfile nway;
  class dist_num;
  output out=a;
run;

/*Create dataset of valid district numbers*/
proc sort data=sasuser.council out=valid nodupkey;
  by dist_num;
run;

proc sort data=a;
  by dist_num;
run;
proc sort data=valid;
  by dist_num;
run;

/*Merge valid data with summarized data, set missing to zero*/
data cdist;
  merge valid a;
  by dist_num;
  if _freq_=. then _freq_=0;
run;

legend1 label=(f=swissb j=c 'Cases') across=1 down=4 frame
        position=(bottom right inside);
title h=2 f=swissb 'Cases by Council District';
proc gmap data=cdist map=sasuser.council;
  id dist_num;
  choro _freq_ / levels=4 legend=legend1;
run;
quit;

EXPORT HIGH RESOLUTION GRAPHS
SAS provides several formats for creating image files, and each format has its own default settings that affect how the image is sized. PROC GREPLAY can be used to increase the resolution of output graphs and create external files to store the images. As the defaults between devices vary considerably, it is wise to build graphs using the device driver desired for your final graph.

/*Clear work catalog and reset options*/
proc greplay igout=work.gseg nofs;
  delete _all_;
run;
goptions reset=all device=bmp;
legend1 label=(f=swissb j=c 'Cases') across=1 down=4 frame
        position=(bottom right inside);
title h=2 f=swissb 'Cases by Council District';
proc gmap data=cdist map=sasuser.council;
  id dist_num;
  choro _freq_ / levels=4 legend=legend1;
run;
quit;

The filename statement is used to create a fileref and an output file pathname. The fileref ‘maps’ is used in the gsfname option to direct the output of the procedure. The device is set to bmp, just as it was when the graphs were created. Graph options are used to improve the resolution of the output graph. Without the ‘xpixels’ and ‘ypixels’ options, xpixels=584 and ypixels=403 for a final resolution of 96 pixels per inch. Changing xpixels and ypixels to 3600 and 2400, respectively, increases the resolution by a factor of approximately 6 and creates an output file at approximately 600 pixels per inch. The ‘lfactor=6’ option increases the line thickness by the same factor so that the lines do not disappear when the resolution is increased.

GREPLAY Procedure: Statement options used in this procedure include:

IGOUT=WORK.GSEG – references the input graphics catalogue: this is where the procedure looks for the graphs to be ‘replayed’.
NOFS – suppresses the default catalogue window and executes the procedure in line mode.
TC=SASHELP.TEMPLT – identifies the template catalogue. SAS® provides several output templates in the SASHELP.TEMPLT catalogue.
TEMPLATE=WHOLE – specifies the template for the output. WHOLE indicates that all graphs in the TREPLAY statement be ‘replayed’ into one graph covering the ‘whole’ output area.
TREPLAY Statement – indicates which graphs from the input catalogue to ‘replay’.

/*Export map*/
filename maps 'e:\nesug08\map_poly_assign.bmp';
goptions xpixels=3600 ypixels=2400 device=bmp gsfname=maps lfactor=6;
proc greplay igout=work.gseg tc=sashelp.templt
             template=whole nofs;
  treplay 1:gmap;
run;
quit;
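The resolution figures quoted above can be checked with a little arithmetic. The sketch below uses only the numbers given in the text (default 584 by 403 pixels at 96 pixels per inch, replayed at 3600 by 2400 pixels). The variable names are hypothetical and the step writes its results to the log.

```sas
/* Sketch: work out the scale factor and effective pixels per inch */
data _null_;
  base_x = 584;  base_y = 403;  base_ppi = 96;   /* defaults quoted in the text */
  new_x  = 3600; new_y  = 2400;                  /* values set in GOPTIONS */
  width_in  = base_x / base_ppi;                 /* physical width in inches */
  height_in = base_y / base_ppi;                 /* physical height in inches */
  factor_x  = new_x / base_x;                    /* horizontal scale factor */
  factor_y  = new_y / base_y;                    /* vertical scale factor */
  ppi_x     = new_x / width_in;                  /* effective horizontal ppi */
  ppi_y     = new_y / height_in;                 /* effective vertical ppi */
  put width_in= 6.2 height_in= 6.2;
  put factor_x= 6.2 factor_y= 6.2;
  put ppi_x= 6.1 ppi_y= 6.1;
run;
```

The two scale factors come out slightly different (about 6.2 horizontally and 6.0 vertically), which is why the text describes the increase as approximate, and the effective resolution lands a little under 600 pixels per inch in both directions.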

MORE SAS/GRAPH DEVICES
Changing the device driver in the GOPTIONS statement to ACTIVEX and routing the output through ODS HTML will create the same output map with tool tips for the values of council district and frequency. As stated earlier, different devices have different defaults, and not all procedure options are available with all devices. When changing the device in the goptions statement, some fine-tuning may be required to make the map you want.

/*ACTIVEX GMAP*/
goptions reset=all device=activex;
filename odsout 'c:\nesug08';
ods html path=odsout
         file='council activex.html';
legend1 label=(f=swissb j=c 'Cases') across=1 down=4 frame
        position=(bottom right inside);
title h=2 f=swissb 'Cases by Council District';
proc gmap data=cdist map=sasuser.council;
  id dist_num;
  choro _freq_ / levels=4 legend=legend1;
run;
quit;

ods html close;
ods listing;


REFERENCES
Eberhart, M. “Geocoding and PROC GMAP - Tools for Presenting Spatial Data”, available online at http://www.nesug.org/proceedings/nesug07/hw/hw05.pdf
Odem, E. and Massengill D. “Cheap Geocoding: SAS/GIS® and Free TIGER® Data”, available online at http://support.sas.com/rnd/papers/sugi30/CheapGeocoding.pdf
SAS Mapsonline: http://support.sas.com/rnd/datavisualization/mapsonline/html
SAS Institute Inc., SAS 9.1.3 Help and Documentation, Cary, NC: SAS Institute Inc., 2000-2004.
US Census, TIGER®, TIGER/Line®, and TIGER®-Related Products, http://www.census.gov/geo/www/tiger/

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:

Michael Eberhart, MPH
Philadelphia Department of Public Health
1101 Market Street
8th Floor
Philadelphia, PA 19107
Work Phone: 215-685-4772
Email: [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
