by xiaohui liu...o postgres 9.3.1 - use the 32-bit version as additional computing power is not...
TRANSCRIPT
1
GeoDatabase Practice with PostGreSQL
by Xiaohui Liu
Objectives
• Create a new database schema in Postgres
• Use the Postgres shapefile loader plugin to import data in Esri's shapefile format
• Write and execute queries in Postgres
• Describe the spatial data types built into PostGIS
• Translate features in textual format into PostGIS geometries
• View PostGIS tables as layers in QGIS
• Use QGIS to conduct basic GIS operations
Activities
1. Please download and install Postgres, PostGIS and QGIS (all open-source desktop GIS packages):
o Postgres 9.3.1 - Use the 32-bit version as additional computing power is not necessary for the class (you may
use 64 bit for your project and production).
▪ When prompted to provide a Postgres user name and password, accept the default user name of
'postgres' and specify a password you will remember.
▪ After Postgres is installed, when prompted “want to launch the "Stack Builder," a separate package
that allows for the installation of add-ons to Postgres → answer Yes to this prompt and under the
category of Spatial Extensions, choose PostGIS 2.0.
o PostGIS - You can use a separate installer file if for some reason you do not use the Stackbulder, you can
use this installer. Follow the link → go to PostGIS Windows download → pg92 directory → postgis-pg92-
setup-2.0.3-2.exe
o QGIS - click the Download QGIS link under the Windows heading
Getting Started with PostGIS
• Download the data file PGIS_Data1.zip → unzip it to a folder of your choice
• You will use Postgres's graphical interface pgAdmin. You will import a shapefile, load data from a text file and see
how queries are performed in pgAdmin.
A. Create a new schema
1. Open pgAdmin (Start → PostgreSQL 9.2 → pgAdmin III
a. The application should open with an Object Browser pane displayed on the left side of the window
displaying a tree with Server Groups at the top.
b. Beneath Server Groups, you should see a Servers node and beneath that you should see a Postgres 9.2
server called localhost (which is accessed through port 5432).
2. Double-click on your localhost server and log in using the default user name of postgres.
3. Enter the password you defined earlier for the postgres account when you installed the software. You should now
see 4 nodes beneath the localhost server: Databases, Tablespaces, Group Roles and Login Roles.
4. Expand the Databases list. You should see two "starter" databases: postgres and template_postgis_20. The first
was created when you installed Postgres; the second when you installed PostGIS.
5. Click on template_postgis_20 to connect to the database → Click on the plus sign to view the objects associated
with the database. We will be using the schemas (for a database). When you begin a new project, you'll want to
create a new schema.
6. Expand the Schemas list → there should be only two schemas - public and topology. Create a new schema
before looking into the other existing schemas.
7. Right-click on Schemas → select New Schema → specify the name usa in the dialog box → set the Owner of
the schema to postgres → Click OK.
B. Load data from a shapefile
2
• PostGIS users should convert Esri shapefile format data to PostGIS tables. We will use plug-ins to do this conversion.
1. In pgAdmin → select Plugins → PostGIS Shapefile and DBF loader 2.0.
a. Click the Add File button → navigate to your data folder →Select States.shp → Click Open.
2. Confirm using the View Connection Details button that the PostGIS Connection parameters are set to:
Username: postgres
Password: should already be set to your password
Server Host: localhost (port 5432)
Database: template_postgis_20
3. In the Import List section, supply the following settings by clicking on the current values under each heading:
Schema: usa
Table: states
Geo Column: geom
SRID: 4269
(After setting the last value, click away from the row of specifications to avoid changing the value to its original
value.)
4. Look at the SRID (Spatial Reference IDentification) setting. This ID is set to 0 (spatial reference of the shapefile
is unknown) by default. For now, use the value 4269 (decimal degree/NAD83) for the ID.
5. Click Import → the Log Window area should report that the import process has been completed →Close the
import manager.
6. Expand the object list associated with the usa schema → Click on Tables → the newly imported states table must
be visible under the Properties tab → Right-click on the states table → select View Data → View Top 100
Rows.
a. Refresh the GUI if necessary by right-clicking on the schema or Tables node in the Object browser and
selecting Refresh (or hitting F5 on the keyboard).
b. Note the other options in this context menu.
c. Column headers include column names and data types. The gid column is an auto-incrementing integer
column added by the importer. The presence of [PK] indicates that gid is designated as the primary key.
d. The geom column shows values for only a couple of rows (geometries that too long are not displayed
though stored).
7. Follow the steps to import us_cities shapefile and rename it to cities.
C. Create a new table
• Loading data from a comma-delimited text file is common. Start with creating a new blank table.
1. Right-click on Tables under the usa schema → select New Table
a. Under the Properties tab, set the table's Name to census2010 and the Owner to postgres
b. Under the Columns tab → click Add →Under the Properties tab, set the column's Name to state, its
Data type to character varying and its Length to 50 → Click OK.
i. Add a column with the Name total and Data type of integer
ii. Add following columns of integer type. (Note: follow the order so that data from the text is
inserted to correct column: male, female, white, black, amind, asian, hawaiian, other, mixed )
2. After adding columns → click Constraints tab → Confirm that Primary Key is selected from the drop-down list
near the bottom of the dialog → Click Add.
a. In the New Primary Key dialog, set the Name to census2010_pk.
b. Under the Columns tab, select state from Columns drop-down list → click Add → Click OK to finish
adding the primary key.
3. Click OK on the New Table dialog to finish creating the table.
D. Load data using the COPY command
• Before completing data import, look at the data in a text editor and note its location.
1. Determine the full path name of the location of your census2010.csv file. Copy the exact path of your file
3
a. In pgAdmin → Click SQL → Open Query builder dialog → in the SQL Editor text box, enter the
following command,
b. COPY usa.census2010 FROM 'ENTER YOUR FILE PATH AS COPIED' WITH (FORMAT csv,
HEADER True, QUOTE ' " '); File path should look like: 'C:\temp\PGIS_Data1\census2010.csv'
2. Look at the options in the WITH clause. In case of tab-delimited data, the FORMAT setting would be
text. Because the data contains a header row, HEADER option is set to True. The QUOTE option was set to the
double-quote character to indicate that the input file encloses text strings with that character. A number of other
options are available on the documentation. Among the other options that you may need to set is DELIMITER,
which defaults to comma for csv files and tab for text files. If your file uses another delimiter, such as the pipe
character (|), you can indicate that using the DELIMITER option.
3. COPY command inserts values from the first column of the input file into the first column of the table, values
from the second column of the input file into the second column of the table, etc. The HEADER option simply
tells Postgres to skip the first line, not to read the column headers and intelligently match the columns of the input
file to the columns of the table. If your table happens to have more columns than the input file and/or the columns
are in a different order, you can deal with this by supplying a list of column names that matches the input file after
the table name. For example:
COPY usa.census2010 (state, total, male....) FROM ....
4. Click Execute query to execute the command. You should receive a successful implementation message (there
should be 52 rows).
a. If you encounter a "permission denied" error, the "postgres" database login doesn't have permission to
read the csv file where you downloaded it. Try copying it to a sub-directory belonging to the "Public"
user (e.g., 'C:\Users\Public\Public Documents') or to a location that has no permission restrictions (e.g.,
'C:\temp').
E. Write queries in pgAdmin
1. Click on the SQL button on the pgAdmin toolbar to open the Query GUI. You can write your own query or
build one the Graphical Query Builder.
2. In the SQL Editor box, write the query to identify the states where most of the population uses the term
'Soda' when referring to soft drinks:
3. Run the query by clicking the Execute query ("play") button on the toolbar. You should receive an error relation
"states" does not exist. This is because the states table is part of usa schema. You must specify the schema name
followed by table name: usa.states in the FROM syntax.
4. You can reset pgAdmin's search path to your schema (By default, pgAdmin searches only the public schema.).
a. Cut the query from SQL editor and paste it in a text pad.
b. Enter the following statement into the SQL Editor: SET search_path TO usa, public;
5. Click Execute query → this query will set usa schema in the public schema.
a. Write the following query: SHOW search_path; to find out what the search path is.
6. Rewrite the query to select states. You should get 17 states in the Output pane. pgAdmin converts all
table/column names to lower-case prior to execution by default. So, pay attention to table name as SQL is case
sensitive. To override this case conversion, you can put the table/column name in double quotes. SELECT name,
sub_region FROM "states" WHERE sub_region = 'Soda';
F. Query-Writing Practice Exercises
• Implement the following queries using the following tables: the 2008 population data, soft drink data and geometries
are in the states table; the 2010 data are in the census2010 table.
1. Select the states with a 2008 population over 10 million.
2. Select the state capitals.
3. Select the states whose names begin with the word "New".
4. Select the cities whose names contain the letter "z".
5. Sort the states by their 2008 population from high to low.
6. Sort the states first by soft drink name, then by state name.
4
7. Select the states with a 2008 population over 10 million and where the majority of population refers to soft drinks
as pop.
8. Select cities in the states of NY, NJ and PA (using the stateabb column).
9. For each state, compute the percentage of the 2010 population that is white. Give this output column an alias of
pctwhite. Besides pctwhite, include only the name of the state in the output. Note: the columns involved in this
calculation are defined as integers, which means the resulting value will be rounded to the nearest integer (0). To
avoid this rounding and obtain the desired percentages, add '::double precision' after the white column in the
calculation. This will convert the integer values to double precision values prior to the calculation. It is only
necessary to perform this conversion for one of the columns involved in the calculation.
10. Sum the 2008 state populations across the soft drink categories (i.e., What is the population of the 'soda' states? Of
the 'pop' states? Of the 'coke' states?)
11. Bring together data from the states and census2010 tables, outputting the name from the states table, total
population from the census2010 table and geom from the states table. Look into the syntax for INNER JOIN
12. Calculate the average 2010 male population across the soft drink categories.
G. Introduction to Spatial Select Queries
• Spatial data sets help answer geometric and topological questions.
1. Go to pgAdmin → Query → Execute the following query:
SELECT name, ST_Centroid(geom) AS centroid
FROM states
WHERE sub_region = 'Soda';
i. This query calls a function called ST_Centroid() that accepts inputs and returns outputs. The geom
column is used as an input to the function to get geometric centers of the shapes.
ii. The output is a point in the coordinate system of the input column, but expressed in hexadecimal
notation. To display the coordinate values in a readable form, convert it to a text using the function
ST_AsText().
2. Modify the query as follows, then execute:
SELECT name, ST_AsText(ST_Centroid(geom)) AS centroid
FROM states
WHERE sub_region = 'Soda';
3. Modify the query as follows, then execute:
SELECT name, ST_AsText(ST_Centroid(geom)) AS centroid, ST_Area(geom) AS area
FROM states
WHERE sub_region = 'Soda';
i. The values returned by ST_Area() are in the units of the input geometry (squared). As the coordinate
system is in latitude/longitude coordinates, the area is in square degrees. Use the ST_Transform()
function to reproject the geometry to a different spatial reference.
4. Modify the query as follows, then execute:
SELECT name, ST_AsText(ST_Centroid(geom)) AS centroid, ST_Area(ST_Transform(geom,2163))???? AS area
FROM states
WHERE sub_region = 'Soda';
i. The code 2163 is an equal-area projection in meters that is suitable for the continental US
ii. ST_Transform() re-projects input geometries in memory only; the input geometries stored in the table
remain in the same spatial reference they had before.
iii. "ST_" at the beginning of these functions stands for spatial-temporal.
H. PostGIS Geometry Types
5
• Though polygon, line and point are the three main geometry types, other geometry types are available in PostGIS
that allow storage of multipart shapes, 3-dimensional shapes and shapes that have a measure (or M value)
associated with its vertices.
• You will create a new schema to store points of interest in New York City. Instead of using Shapefile Loader to
both create and populate a table at the same time, you will do these steps separately.
I. Create a new empty spatial table
1. pgAdmin → Object browser → right-click on the Schemas node beneath the template_postgis_20 database →
select New Schema.
i. Set the Name →nyc_poi
ii. Owner →postgres → Click OK.
2. Expand the object listing associated with the nyc_poi schema → Right-click on Tables → select New Table
i. Properties → table's Name →pts and Owner →postgres.
ii. Columns → Add →New Column → Name to gid and the Data type to serial. (This data type is roughly
equivalent to the AutoNumber type in Access.) →Click OK.
iii. Create a column name; Data type →character varying and its Length →50.
iv. Last column will be to hold geometries. Though a column of type 'point' through the GUI can be added, a
number of settings should be made such as its spatial reference ID or SRID. These settings are all handled
by a PostGIS maintenance function called AddGeometryColumn().
3. To specify gid column is the table's primary key: New Table dialog → click Constraints →Confirm that
Primary Key is selected from the drop-down list near the bottom of the dialog → Click Add.
i. New Primary Key → set the Name to pts_pk.
ii. Columns tab →select gid from the Column drop-down list → Click Add → Click OK →Click OK
again to dismiss the New Table dialog and create the table.
4. Reset the search path of pgAdmin to nyc_poi by executing the following statement in the Query dialog:
SET search_path TO nyc_poi, public;
5. Add a geometry column called geom to the table by executing this statement:
SELECT AddGeometryColumn('nyc_poi','pts','geom',4269,'POINT',2);
i. SELECT is used in this case because SQL rules don't allow for invoking functions directly. Function
calls must be made in one of the statement types we've encountered so far (SELECT, INSERT,
UPDATE or DELETE). In this situation, a SELECT statement is the most appropriate.
ii. Function arguments are: the schema name, the table name, the name to be given to the geometry
column, its spatial reference ID, the type of geometry it will store and the dimension of the
coordinates it will hold.
iii. dimension argument: if storing X and Y coordinates, then the dimension value is 2. In case of 3
dimensions, it will be 3. It is also possible to store some type of measure (M value) with each point
(e.g., the time the point was recorded with a GPS unit). In that scenario, you would also assign a
dimension value of 3, but would differentiate that XYM type of point from an XYZ point by setting
the geometry type to POINTM instead of POINT. Finally, it is possible to store all four values (X, Y,
Z and M). In that situation, you would assign a dimension value of 4.
II. Add rows to the spatial table
You will use INSERT statements to insert rows. Copy and paste them rather than typing manually. Can you store it in a
SQL file to implement all insertions one time???
1. SQL statement to insert a row into the pts table.
INSERT INTO pts (name, geom)
VALUES ('Empire State Building', ST_GeomFromText('POINT(-73.985744 40.748549)',4269))
6
i. The ST_GeomFromText() function will convert a geometry supplied in text format to the
hexadecimal form that PostGIS geometries are stored in.
ii. The other argument “4269”is the spatial reference of the geometry. As the geom column using
AddGeometryColumn() was created with a constraint that values in that column must be in a
particular spatial reference (which we specified as 4269).
2. Execute the following statements to add a couple more rows to the table. You can write all the insert statements
simultaneously.
INSERT INTO pts (name, geom)
VALUES ('Statue of Liberty', ST_GeomFromText('POINT(-74.044508 40.689229)',4269));
INSERT INTO pts (name, geom)
VALUES ('World Trade Center', ST_GeomFromText('POINT(-74.013371 40.711549)',4269));
3. Another example of adding multiple rows using a single statement.
INSERT INTO pts (name, geom)
VALUES ('Radio City Music Hall', ST_GeomFromText('POINT(-73.97988 40.760171)',4269)),
('Madison Square Garden', ST_GeomFromText('POINT(-73.993544 40.750541)',4269));
4. pgAdmin → right-click on the pts table → select View Data →View All Rows to confirm that the INSERT
statements executed properly.
III. Create and populate a table of linestrings
1. Repeat the steps in Part I above to create a new table that will hold NYC line features.
o Give the table a name of lines.
o The table should have the same column definitions, but the geometry type should be set to LINESTRING
rather than POINT:
SELECT AddGeometryColumn('nyc_poi','lines','geom',4269,'LINESTRING',2)
o Search path is already set. So, don’t have to do it, but you can check.
2. Execute the following statement to insert 3 new rows into the lines table:
3.
INSERT INTO lines(name,geom)
VALUES
('Holland Tunnel', ST_GeomFromText('LINESTRING(-74.036486 40.730121,-74.03125 40.72882,-74.011123
40.725958)',4269)),
('Lincoln Tunnel', ST_GeomFromText('LINESTRING(-74.019921 40.767119,-74.002841 40.759773)',4269)),
('Brooklyn Bridge', ST_GeomFromText('LINESTRING(-73.99945 40.708231,-73.9937 40.703676)',4269));
i. The syntax for constructing linestrings is similar to that of points, but you supply more than one pair
of lat/lon to delineate the feature (with the pairs being connected sequentially by straight line
segments). The longitude (X) is followed by latitude (Y) by a space. Each lon/lat pair is separated by
comma.
ii. There may be curvy lines with too many lat/lon pairs in real world.
IV. Create and populate a table of polygons
1. Repeat the steps to create a polygon table, but give the table name of polys and set the geometry columns to
POLYGON.
2. Execute the following statement to add a row to your polys table:
INSERT INTO polys (name, geom)
VALUES ('Central Park',ST_GeomFromText('POLYGON((
-73.973057 40.764356,
The pink () means it may contain multiple
interior rings (multiple polygons inside)
7
-73.981898 40.768094,
-73.958209 40.800621,
-73.949282 40.796853,
-73.973057 40.764356))',4269));
i. The syntax is similar to the syntax for line strings, but
a. The first lon/lat pair should be the same as the last (to close the polygon).
1. The coordinate list is enclosed in an additional set of parentheses because polygons are composed
of potentially multiple rings and every polygon has a ring that defines its exterior.
2. In case of polygons with rings inside, the exterior ring is supplied first followed by the
interior rings. Each ring is enclosed in a set of parentheses and the rings are separated by
commas. For example, add Central Park again with the reservoir near its center removed.
3. Remove the original Central Park row: pgAdmin →right-click on the polys table → select Truncate to delete all
rows from the table.
4. Add Central Park (minus the reservoir) back into the table using this statement:
INSERT INTO polys (name, geom)
VALUES ('Central Park',ST_GeomFromText('POLYGON((
-73.973057 40.764356, -73.981898 40.768094, -73.958209 40.800621, -73.949282 40.796853, -73.973057
40.764356), (-73.966681 40.785221, -73.966058 40.787674, -73.965586 40.788064, -73.9649 40.788291, -
73.963913 40.788194, -73.963333 40.788291, -73.962539 40.788259, -73.962153 40.788389, -73.96181
40.788714, -73.961359 40.788909, -73.960887 40.788925, -73.959986 40.788649, -73.959492 40.788649, -
73.958913 40.78873, -73.958269 40.788974, -73.957797 40.788844, -73.957497 40.788568, -73.957497
40.788259, -73.957776 40.787739, -73.95784 40.787057, -73.957819 40.786569, -73.960801 40.782394, -
73.961145 40.78215, -73.961638 40.782036, -73.962518 40.782199, -73.963076 40.78267, -73.963677
40.783661, -73.965694 40.784457, -73.966681 40.785221))',4269));
V. 3- and 4-dimensional geometries
• You won’t create any 3-dimensional (XYZ and XYM) and 4-dimensional (XYZM) geometries, but check out the
syntax.
• Define a column that can store M values as part of the geometry, use the POINTM, LINESTRINGM and
POLYGONM data types. When specifying objects of these types, the M value should appear last. For example,
an M value of 9999 is attached to each coordinate in these features from nyc_poi schema:
POINTM(-73.985744 40.748549 9999)
LINESTRINGM(-74.019921 40.767119 9999, -74.002841 40.759773 9999)
POLYGON((-73.973057 40.764356 9999, -73.981898 40.768094 9999, -73.958209 40.800621 9999, -
73.949282 40.796853 9999, -73.973057 40.764356 9999)
• The most common usage of M coordinates is in linear referencing (e.g., to store the distance from the start of a
road, power line, pipeline, etc.). Find a good article about this topic of linear referencing.
• To define a column capable of storing Z values along with X and Y, use the "plain" POINT, LINESTRING and
POLYGON data types rather than their "M" counterparts. The syntax for specifying an XYZ coordinate is the
same as that for an XYM coordinate, but the "plain" data type indicates the third coordinate is for Z value. E.g.,
you can include sea level elevation in the coordinates for the Empire State Building (in feet): POINT(-73.985744
40.748549 190)
• If you want to store both Z and M values, again use the "plain" POINT, LINESTRING and POLYGON data
types. The Z value should be listed third and the M value last. For example:
POINT(-73.985744 40.748549 190 9999)
VI. Multipart geometries
8
• PostGIS provides support for features with multiple parts through the MULTIPOINT, MULTILINESTRING and
MULTIPOLYGON data types.
• E.g, state of Hawaii is composed of multiple disconnected islands. The syntax for specifying a
MULTIPOLYGON builds upon the rules for a regular POLYGON; the parts are separated by commas and an
additional set of parentheses is used to enclose the full coordinate list.
• The footprints of the World Trade Center Towers 1 and 2 (now fountains in the 9/11 Memorial) can be
represented as a single multipart polygon as follows:
MULTIPOLYGON(((-74.013751 40.711976, -74.01344 40.712439, -74.012834 40.712191, -74.013145
40.711732, -74.013751 40.711976)), ((-74.013622 40.710772, -74.013311 40.711236, -74.012699 40.710992, -
74.013021 40.710532, -74.013622 40.710772)))
• The Z values and M values are also supported for multipart geometries. The "MULTI" data types have "M"
counterparts too: MULTIPOINTM, MULTILINESTRINGM and MULTIPOLYGONM.)
VII. Mixing geometries
• The tables you have created use a Esri-centric design: with each table storing a single column of homogeneous
geometries (i.e., all points, all lines, or all polygons, but not a mix). PostGIS supports two design approaches:
• It is possible to store multiple geometry columns in a table. This capability could be used to store data in two
or more different spatial reference systems. Note: this should be done under limited circumstances given the
existence of the ST_Transform() function and the additional maintenance such a design would require.)
• It is possible to store multiple geometry types in a single column. This capability can simplify schema design
and certain types of queries, though there are cons to this approach, such as the fact that some third-party tools
can't deal with mixed-geometry tables.
• Implement this heterogeneous column approach to store all of our nyc_poi data in the same table.
1. Repeat the steps in Part I above to create a new table. Pay particular attention to these differences:
o Give the table a name of mixed.
o The table should have the same column definitions, with the exception that the geometry type should be
set to GEOMETRY rather than POINT.
2. Add the same features to this new table by executing the following statement:
INSERT INTO mixed (name, geom)
VALUES ('Empire State Building', ST_GeomFromText('POINT(-73.985744 40.748549)',4269)),
('Statue of Liberty', ST_GeomFromText('POINT(-74.044508 40.689229)',4269)),
('World Trade Center', ST_GeomFromText('POINT(-74.013371 40.711549)',4269)),
('Radio City Music Hall', ST_GeomFromText('POINT(-73.97988 40.760171)',4269)),
('Madison Square Garden', ST_GeomFromText('POINT(-73.993544 40.750541)',4269)),
('Holland Tunnel',ST_GeomFromText('LINESTRING(-74.036486 40.730121, -74.03125 40.72882, -74.011123
40.725958)',4269)),
('Lincoln Tunnel',ST_GeomFromText('LINESTRING( -74.019921 40.767119, -74.002841 40.759773)',4269)),
('Brooklyn Bridge',ST_GeomFromText('LINESTRING( -73.99945 40.708231, -73.9937 40.703676)',4269)),
('Central Park',ST_GeomFromText('POLYGON(( -73.973057 40.764356, -73.981898 40.768094, -73.958209
40.800621, -73.949282 40.796853, -73.973057 40.764356))',4269));
3. pgAdmin → Right-click on the mixed table → Select View Data → View All Rows to confirm that the INSERT
statement executed properly.
I. Viewing Data in QGIS
Quantum GIS (QGIS) is a free and open-source desktop GIS package analogous to Esri's ArcMap and supports viewing
PostGIS data and has strong cartographic capabilities. OpenJUMP is another desktop application often used in
combination with PostGIS though its strengths are in spatial querying and geoprocessing.)
I. Add PostGIS data to QGIS
1. Click on Start → All Programs → Quantum GIS Lisboa → Quantum GIS Desktop (1.8.0) to launch QGIS.
9
2. The Manage Layers toolbar provides access to a set of buttons used to add data to the project (the buttons with
plus signs). Moving from left to right:
o Add Vector Layer - for data that is supported by the OGR library, an open-source library that allows
working with vector data in many different formats including Esri shapefiles and geodatabases
o Add Raster Layer - for data that is supported by the GDAL library the raster counterpart to the OGR
library
o Add PostGIS Layers - as the name implies, for data stored in PostGIS tables
o Add SpatiaLite Layer - for data stored in a SpatiaLite [1] database (another free and open-source spatial
database extension similar to PostGIS, built to add spatial functionality to an RDBMS called SQLite [2])
o Add MSSQL Spatial Layer - for data stored in Microsoft SQL Spatial Server spatial databases
o Add WMS Layer - for data streamed via a Web Mapping Service (WMS)
o Add WFS Layer - for data streamed via a Web Feature Service (WFS)
3. Click the Add PostGIS Layers button → Add PostGIS layers →Click New button.
4. In the Create a New PostGIS connection dialog supply the following information.
o Name: Lesson3
o Service - (leave blank)
o Host: localhost
o Port: 5432
o Database: template_postgis_20
o SSL mode: disable
o Username: postgres
o Password: <the password you established earlier in the lesson>
5. Click Test Connection to make sure you have typed things correctly → Click OK.
6. Lesson3 name should appear in the pick list in the Connections section of Add PostGIS layers dialog.
10
7. Click Connect → you should see something like the following, includes nyc_poi in the Schema list.
8. Expand the list of layers to see all the tables you have created so far.
9. Highlight all 6 nyc_poi layer rows → click Add. Your data should display in the map display area and the layers
listed in the Layers pane on the left side of the window (just like ArcMap's Table of Contents).
I. Explore basic functions of QGIS
1. Zoom to full extent View →Zoom Full.
2. Reorganize the layers: point, line, polygon.
3. The pts, lines and polys layers are redundant with the three layers based on the mixed table, so turn off pts, lines
and polys.
4. Do you see any difference between the polys layer and the mixed layer for polygon for central park?
5. Rename the mixed table layers as mixed - lines, mixed - pts and mixed - polys.
i. See that each geometry type has a separate layer
6. Right-click on the mixed - lines layer and select Query.
i. In the SQL where clause
a. You see this expression: upper(geometrytype("geom")) IN ('LINESTRING,'MULTILINESTRING')
b. It was automatically created when we chose to add the lines from the mixed table. The geometrytype()
function returns the type of each geometry in the geom column (returning 'POINT', 'LINESTRING' or
'POLYGON', in this case). That value is then passed to the upper() function to ensure that it is all upper-
case. If the value matches one of the items in the IN clause (in this case, 'LINESTRING' and
'MULTILINESTRING'), the associated row is included in the layer. If the value does not match, then the
11
row is excluded.
The mixed - pts layer similarly is defined by a query that looks for geometries of type POINT and
MULTIPOINT; mixed - polys by a query that looks for POLYGON and MULTIPOLYGON.
Check the Query property of the pts, lines and polys layers. Do you see this expression?
7. You need to extract only the tunnels from the mixed-lines layer. Go to mixed - lines Query dialog. Use the GUI
buttons and/or type directly into the box
"name" LIKE '%Tunnel' AND upper(geometrytype("geom")) IN ('LINESTRING','MULTILINESTRING')
You should see that the Brooklyn Bridge is no longer displayed as part of the layer.
8. Play with other tools available on: Map Navigation toolbar, Attributes toolbar, Identify Features tool,
9. Layer Properties dialog, Style tab, Actions tab, Joins tab, Diagrams tab.
10. Right-click on the mixed - pts layer → Open Attribute Table → Enter Statue in the Look for text box → select
name from the in drop-down list → click Search. This is a quick way to select features. For more complicated
queries, a Search query builder dialog can be invoked by clicking the Advanced search button.
11. Click Advanced search → enter the following example expression: gid < 4 → Click OK.
12. Finally, select Settings → Project Properties
i. Explore the General tab, the Coordinate Reference System (CRS) tab.
13. Click on the Coordinate Reference System (CRS) tab →Check the Enable 'on the fly' CRS transformation
box.
14. The Coordinate reference system of the world list has two categories Geographic Coordinate Systems and
Projected Coordinate Systems. Re-project your data into the New York East State Plane system.
i. Note: QGIS doesn't include a State Plane sub-category in its Projected Coordinate System list. You
should, however, find all of the state plane projections in either the Lambert Conformal Conic or
Transverse Mercator sub-categories.
15. Find and expand the Transverse Mercator category, then scroll down, select NAD83 / New York East EPSG:
32115 and click OK.
16. View →Zoom Full to see all your data. The coordinate readout at the bottom of the application window now
reports values in the ballpark of 190000,210000.
17. Save your project by clicking the Save Project button → name it Lesson3 with file for project files .qgs.
J. Working with Views
You have done queries using MS Access that can be stored. Similar things can be done in Postgres and they are called
views (though it is not the same conceptually).
I. Create a view based on attribute criteria
1. In the pgAdmin Query dialog, execute the following query (which identifies the state capitals):
SELECT * FROM usa.cities WHERE capital = 1 ORDER BY stateabb;
2. After confirming that the query returns the correct rows, copy the SQL to the clipboard.
3. Go to pgAdmin window → usa schema → right-click on the Views node → select New View → Set the view's
Name to vw_capitals and Owner to postgres → Click on the Definition tab → paste the SQL statement held on
the clipboard → Click OK.
4. Select View → Refresh → Right-click on vw_capitals and select View Data →View All Rows. Any updates
made to the table will be displayed in the view.
II. Build a query based on a view
12
1. In the pgAdmin Query dialog, execute the following query (which identifies the relatively small capitals):
SELECT * FROM usa.vw_capitals WHERE popclass = 2;
III. Create a view based on a spatial function
Create views that re-project our states and cities data on the fly.
1. Execute the following query:
SELECT gid, name, pop2008, sub_region, ST_Transform(geom,2163) AS geom FROM usa.states;
Does not work here?? 2. Follow the procedure outlined above to create a new view based on this query. Assign a name of vw_states_2163
to this view.
3. Repeat this process to create an on-the-fly re-projection of the cities data called vw_cities_2163. Define the view
using the following query:
SELECT *, ST_Transform(geom,2163) AS geom_2163 FROM usa.cities;
IV. Display views in QGIS
You can add views as layers.
1. Follow the same steps you followed above to get to usa schema.
2. Expand the object list associated with the usa schema → You should see the original cities and states tables and
vw_capitals, vw_states_2163 and two versions of vw_cities_2163.
You see two versions of vw_cities_2163 because that view outputs all of the columns from the cities table
(including geom) plus a column of geometries re-projected into SRID 2163 (geom_2163).
Add these layers to the QGIS project and play around with the view (zoom to the extent of the different layers,
and turn the views on and off), you'll note that cities, states, vw_capitals and vw_cities_2163 (the one based on
the original geom column) align correctly with one another, but they do not align with the other vw_cities_2163
(the one based on the geom_2163 column) and vw_states_2163 due to projection difference. You need on-the-fly
re-projection.
3. Click Settings → Project Properties → Coordinate Reference System (CRS) tab → check the Enable 'on the
fly' CRS transformation box → Click OK.
K. Spatial Reference Considerations
I. Spatial Reference ID lookup
• Populating a geometry column with usable data requires specifying the spatial reference of the data. You can re-
project using the ST_Transform() function, but you had to use SRID (Spatial Reference ID). These IDs come
from the spatial reference IDs defined by the European Petroleum Survey Group (EPSG). How to find these Ids?
Using pgAdmin
SRID are stored in a Postgres table in the public schema called spatial_ref_sys.
1. In pgAdmin → spatial_ref_sys table → view top 100 rows → check the srid and srtext columns.
2. One way to find an SRID is to query the spatial_ref_sys table.
3. Open the query dialog and execute the following query:
SELECT srid, srtext FROM spatial_ref_sys
WHERE srtext LIKE '%Mississippi%';
This query shows the SRIDs of each Mississippi-specific spatial reference supported in PostGIS.
13
Using QGIS
Another way to find SRIDs is to look them up in QGIS.
1. QGIS → Settings > Project Properties → Under the Coordinate Reference System (CRS) tab
2. Expand the Projected Coordinate Systems → Universal Transverse Mercator (UTM) category.
3. Scroll down through the list and find NAD83 / UTM zone 18N.
IN QGIS 2.6, Settings→Options→CRS→CRS for new Layers→use default CRS displayed below→”Select”
button
4. On the right side of the dialog, you should see a column called Authority ID. Note that most of the Authority ID
values are prefixed with EPSG, which means those are the values you should use in PostGIS. In this case, you
would find that the desired SRID, for UTM NAD83 Zone 18N, is 26918.
Using the Prj2EPSG service
The website http://prj2epsg.org/ [1] provides an easy-to-use interface for finding EPSG IDs. You can upload a .prj file
(used by Esri to store projection metadata) and get back the matching EPSG ID. The site also makes it possible to enter
search terms.
II. Geometry metadata in PostGIS
The new version of the PostgreSQL has view, which used to be the geometry_columns table and is found in the Views list
of the Public schema. What is a table?
Pre-version 2.0 of PostGIS: The public schema contains a table called spatial_ref_sys that stores all of the spatial
references supported by PostGIS. Another important table in that schema is geometry_columns. This table stores the
metadata, the parent schema, the parent table, the geometry column’s name, the coordinate dimension, the SRID and the
geometry type (e.g., POINT, LINESTRING, etc.). Being able to conduct spatial analysis with PostGIS requires accurate
data in this table, so it’s best not to go mucking around in it unless you’re confident you know what you’re doing. You
used the AddGeometryColumn() function instead of adding the geometry column through the table definition GUI. An
important reason for adding the column in that manner is that it adds a record to the geometry_columns table, something
that would not happen if we had used the GUI.
III. Spherical measurements and the geography data type
• What is the difference between Cartesian (2D) measurement versus spherical (3D) measurement. For example,
the PostGIS function ST_Distance() can be used to calculate the distance between two geometries. For
geometries you used so far, ST_Distance() will calculate distances in Cartesian space. This is OK at a local or
regional scale (earth’s curvature is minimum), but over a continental or global scale a significant error would
result.
• To address this issue, you can use ST_Distance_Spheroid(): calculates the minimum great-circle distance
between two geometries.
• You can also use a data type called geography. Unlike the geometry data type, the geography data type is
meant for storing only latitude/longitude coordinates. In this data type, the measurement functions like
ST_Distance(), ST_Length() and ST_Area() will return measures calculated in 3D space rather than 2D space.
The disadvantage is that the geography data type is compatible with a significantly smaller subset of functions as
compared to the geometry type. Calculating spherical measures can also take longer than Cartesian measures
since the mathematics involved is more complex.
• This data type: geography can simplify data handling for projects covering a continental-to-global scale. For
projects covering a smaller portion of the earth’s surface, better to use geometry data type.
Project 3: Mapping the Class Roster • Map the hometowns of everyone in your class based on their zip code centroids using Postgres/PostGIS and
QGIS.
• The zip file holds a counties shapefile and a comma-separated values file called postal_codes.txt that stores U.S.
and Canadian postal codes along with the coordinates of their centroids.
14
• Here are the broad steps you should follow:
1. Import the counties shapefile into Postgres.
2. Create a table to store the postal code centroids.
3. Load the centroid data from the text file.
The format of the code in postal_codes text file (double quotes) causes syntax error when copying to
prj.postcode, so I first import it into excel, and then copy the data as csv file.
15
Here is the SQL statement used to copy postal_code.csv to postgres:
COPY prj.postcode FROM 'C:\Xiaohui\GeoDB\w8\PGIS_Data1\postal_codes.csv' WITH (FORMAT csv,
HEADER True);
4. Add a geometry column and use an UPDATE query to populate it with POINT geometries.
1) SELECT AddGeometryColumn('prj','postcode','geom',4269,'POINT',2);
16
2) use an UPDATE query to populate it with POINT geometries:
UPDATE prj.postcode
SET geom = ST_SetSRID(ST_MakePoint(postcode.long,postcode.lat),4269);
3) After populated the point geometry, the prj.postcode table is shown below:
17
5. Create a table to store the class roster.
6. 1) The format of the names in class_roster text file (double quotes) causes syntax error when copying to
prj_classroster, so I first import it into excel, and then copy the data as csv file. Load the class roster from
class_roster.csv file.
18
2) Here is the SQL statement used to copy class_roster.csv to postgres:
COPY prj.classroster(last,first,postalcode)
FROM 'C:\Xiaohui\GeoDB\w8\PGIS_Data1\class_roster.csv'
WITH (FORMAT csv,DELIMITER ',');
3) The updated prj.classroster is shown below:
19
7. Determine a way to associate the geometries in the postal codes table with the records in the roster table.
1) Associate the geometries
SELECT classroster.last,classroster.first,postcode.geom,postcode.code
FROM prj.postcode,prj.classroster
WHERE postcode.code = classroster.postalcode
20
2) Create a view to store the result from the above query
a. In the pgAdmin Query dialog, execute the following query (which identifies the state capitals):
SELECT classroster.last,classroster.first,postcode.geom,postcode.code
FROM prj.postcode,prj.classroster
WHERE postcode.code = classroster.postalcode
b. After confirming that the query returns the correct rows, copy the SQL to the clipboard.
c. Go to pgAdmin window → prj schema → right-click on the Views node → select New View → Set the
view's Name to vw_postal and Owner to postgres → Click on the Definition tab → paste the SQL
statement held on the clipboard → Click OK.
d. Select View → Refresh → Right-click on vw_postal and select View Data →View All Rows. Any updates
made to the table will be displayed in the view.
8. Use QGIS to create a map of the hometowns. Include state and county boundaries for some context.
21
Tips:
• The copy command can be used to load data from a text file into a table. Use the Postgres documentation [2] to
determine the correct usage for this command. Pay attention to the following options: FORMAT, HEADER,
DELIMITER and QUOTES. The copy command expects the columns of the "to" table to match and be in the
same order as the columns of the "from" file.
• You can assume that the centroid coordinates are in the NAD83/Geographic coordinate system.
• Assuming your postal codes table contains columns named lat and lon, you could use the following expression in
an UPDATE statement to create a textual representation of the POINT geometry:
'POINT(' || lon || ' ' || lat || ')'
Note that the double-pipe character string (||) is the concatenation operator in Postgres.
• Tables must have a primary key column defined in order to be added as a layer in QGIS. Also, it is important that
this primary key column be an integer.
• There are multiple ways to address step 7, some more well-designed than others. The ideal design would allow
you to make changes to the class roster table (e.g., add new students or change erroneous postal codes) and those
changes will automatically be reflected in QGIS (i.e., it should not be necessary to update a geometry column
when a new student is added). Coming up with this design will be worth 10% of the total project points.
• Draw the state and county boundaries in light hues so they don't distract from the main thing the map is meant to
convey.
Deliverables
1. A map for the final output. The map should have all the cartographic requirements.
2. Describe the process you implemented to get to the map
o Ease of making updates/insertions to the class roster (10 of 100 points)Thanks to PSU online education
for the information for the lab.
22
Homework 9
GHY 467/567 – Spring 2015
DUE: April 14, 2015
Email the answers to [email protected]
PostGIS Spatial Function Practice Exercises
1. Add Times Square to the nyc_poi pts table (located at 40.757685,-73.985727).
INSERT INTO nyc_poi.pts(name,geom) VALUES ('Times Square', ST_SetSRID(ST_MakePoint(40.757685,-
73.985727),4269))
2. Report the number of points in the geometries of states with a 2010 population over 10 million.
SELECT name, ST_NPoints(geom)
FROM usa.states INNER JOIN usa.census2010
ON states.name = census2010.state
WHERE census2010.total > 10000000;
3. Perform an on-the-fly re-projection of your nyc_poi pts data into the New York East NAD27 state plane
coordinate system.
In QGIS, settings→options→CRS→Default CRS for new projects→Select →use “new york” as the filter and find
the “New York East NAD27 state plane coordinate system” shown in the figure:
23
The SRID is 32015.
SELECT name, ST_Transform(geom,32015) FROM nyc_poi.pts;
4. Output the US cities coordinates in human-readable format, including the SRID.
SELECT name,ST_AsEWKT(geom)
FROM usa.cities
24
5. Select the names and centroids of states where the 2010 male population is greater than the female population.
SELECT name,ST_AsText(ST_Centroid(geom))
FROM usa.states INNER JOIN usa.census2010
ON states.name = census2010.state
WHERE male>female
6. List the states that contain a city named Springfield (based on points in the cities table).
SELECT states.name AS states,cities.name AS city
FROM usa.cities CROSS JOIN usa.states
WHERE ST_Contains(ST_SetSRID(states.geom,4269), cities.geom)
AND cities.name LIKE '%Springfield%'
25
7. List the cities that are found in ‘Soda’ states. Sort the cities first by state name, then city name.
SELECT states.name AS states,cities.name AS city
FROM usa.cities CROSS JOIN usa.states
WHERE ST_Contains(ST_SetSRID(states.geom,4269),cities.geom)
AND states.sub_region = 'Soda'
ORDER BY states.name,cities.name
8. Select features from the nyc_poi lines table that are within 1 mile of Madison Square Garden.
SELECT lines.name AS LineFeature
FROM nyc_poi.lines CROSS JOIN nyc_poi.pts
WHERE ST_DWithin(ST_Transform(lines.geom,4269),ST_Transform(pts.geom,4269),5280*2)
AND pts.name = 'Madison Square Garden'
9. Calculate the distance in kilometers (based on a sphere) between all of the state capitals.
SELECT cityA.name,cityB.name,ST_Distance_Sphere(cityA.geom,cityB.geom)/1000 AS dist_sphere
FROM usa.cities As cityA CROSS JOIN usa.cities AS cityB
26
WHERE cityA.capital = -1 AND cityB.capital = -1
10. Find the SRID of the Pennsylvania North NAD83 state plane feet coordinate system.
SELECT * FROM spatial_ref_sys
WHERE srtext LIKE '%Pennsylvania North%'
11. Using data in the 'usa' schema, select states containing cities that have a 'popclass' value of 4 or 5. Pretend that
the stateabb column in the cities table doesn't exist. Don't fret if your results include duplicate states. How to
handle duplicate results?
SELECT DISTINCT states.name AS states,cities.name AS city
FROM usa.cities CROSS JOIN usa.states
WHERE ST_Contains(ST_SetSRID(states.geom,4269),cities.geom)
AND cities.popclass = 4 OR cities.popclass = 5
27
Project 4: Jen and Barry's Site Selection in PostGIS
Suppose "Jen and Barry" are looking for the best place to open an ice cream business. Your task is to import the project
shapefiles into a PostGIS schema and then write a series of SQL statements that automate the site selection process.
Download the file PGIS_Data2.zip. Here are the selection criteria:
• Greater than 500 farms for milk production
• A labor pool of at least 25,000 individuals between the ages of 18 and 64 years
• A low crime index (less than or equal to 0.02)
• A population of less than 150 individuals per square mile
• Located near a university or college
• At least one recreation area within 10 miles
• Interstate within 20 miles
• You should get 9 candidate sites without using the interstate and recreation area criteria. You should have 4
candidate sites will all the criteria included.
o Create a new schema “prj2”, then import the project shapefiles into it.
28
o Browse through each table in schema prj2 and identity the level of query for each of the
selection criteria.
Query Level Selection Criteria
county-level • Greater than 500 farms for milk production
• A labor pool of at least 25,000 individuals between the
ages of 18 and 64 years
• A population of less than 150 individuals per square mile
city-level • A low crime index (less than or equal to 0.02)
• Located near a university or college
"near an interstate" • Interstate within 20 miles
"near a recreation area" • At least one recreation area within 10 miles
o 1) Implement query at county level.
SELECT name, geom
FROM prj2.counties
WHERE no_farms87 >500 AND age_18_64 > 25000
AND pop_sqmile <150
29
2) Create a view to store the result from the above query.
a. Copy the above query
b. c. Go to pgAdmin window → prj2 schema → right-click on the Views node → select New
View → Set the view's Name to vw_county and Owner to postgres → Click on the Definition tab
→ paste the SQL statement held on the clipboard → Click OK.
o 1) Implement query at city level from the saved view vw_county table and cities table.
SELECT vw_county.name AS county, cities.name AS city,cities.geom AS city_geom
FROM prj2.vw_county CROSS JOIN prj2.cities
WHERE ST_Contains(vw_county.geom,cities.geom)
AND cities.crime_inde <= 0.02 AND cities.university > 0
2) Copy the above query to create a view for the result: vw_city
o Implement query to find cities within 20 miles of interstate roads.
▪ Be sure to re-project the data into a spatial reference that is appropriate for distance
measurement. Use Pennsylvania State Plane North NAD83 (SRID = 2271) for re-projection.
SELECT DISTINCT vw_city.city AS city, vw_city.city_geom AS city_geom
FROM prj2.vw_city CROSS JOIN prj2.interstates
WHERE ST_DWithin(ST_Transform(vw_city.city_geom,2271),
ST_Transform(interstates.geom,2271), 5280*20)
30
▪ Copy the above query to create a view for the result: vw_interstate.
o 1) Implement a query to find cities have at least one recreation area within 10 miles.
SELECT DISTINCT vw_interstate.city AS city, vw_interstate.city_geom AS city_geom
FROM prj2.vw_interstate CROSS JOIN prj2.recareas
WHERE ST_DWithin(ST_Transform(vw_interstate.city_geom,
2271),ST_Transform(recareas.geom,2271), 5280*10)
▪ 2) As shown in the above screenshot, there are 4 cities returned for final result. Copy the
above query to create a view for the result: vw_final.
o Open QGIS→add the final candidate layer and the other original datasets.
31
o In QGIS interface, change the color schema and the label display of each layer to make sure all
the map content are clearly displayed.
o Project→New print composer→specify a name for the map composer as “Jen and Barry's Site
Selection”.
o In the map composer window, layout→add map, then the map shown in QGIS desktop will be
added.
o Insert map frame, legend, title, scale bar, north arrow and adjust the layout. The final map is
shown below:
32
B. Tips for completing Project 4
• Do not use a long query.
• The most logical workflow will be:
1. Write a query that selects the suitable counties based on the county-level criteria and save this as a view.
2. Write a query that selects cities in the suitable counties that also meet the city-level criteria and save this
as a view. This would meet the project requirements.
3. Select from cities in the view saved in #2 that meet the "near an interstate" criterion and save this as a
view.
4. Then select from cities in the view saved in #3 that meet the "near a recreation area" criterion.
• It should not be necessary to create any new tables to hold intermediate results.
• This is a scenario that calls for the use of cross joins.
• If you look at the .prj files for the project shapefiles, you'll see that they are in geographic coordinates, NAD27
(SRID = 4267). Be sure to re-project the data into a spatial reference that is appropriate for distance
measurement. Pennsylvania State Plane North (NAD27 or NAD83), units = feet, would be a logical choice for
that purpose.
C. Deliverables
1. Submit a write-up that summarizes your approach to this project. Most importantly, include the SQL code that
you developed. Don't forget to include the code built into your views, if you create any. Create a map in QGIS of
the candidate cities and all other data sets.
Thanks to PSU online education for the information for the lab.
33