geographic base files: applications in the … insight and judgment would be aided and guided by...
TRANSCRIPT
GEOGRAPHIC BASE FILES: Applications in the Integration and Extraction of Data from Diverse Sources
Patrick E. Mantey, Eric D. Carlson, IBM Research Laboratory, San Jose, Cal i forn ia
Abstract This paper addresses the development of integrated municipal data bases, with
consideration given to po l i t i ca l r ea l i t i e s and to the sources of data now avai lable
in munic ipal i t ies. F i r s t , the potential users and potential uses of a municipal
data base are discussed, and an information system which would serve these users
and uses is considered. Next, the "current" status of data bases in munic ipal i t ies
is reviewed and i t is concluded that there is a large quantity of data avai lable in
many munic ipa l i t ies, but that integrated data bases fer supporting an information
system are not usually a r ea l i t y . The problem of building an integrated data base
from the var ie ty of data sources presented by local agencies is then addressed. A
central ingredient for an integrated data base is the Geographic Base Fi le (GBF)
which permits the construction of extracted data f i l e s from these mult iple sources
to support information system appl icat ions. The concept of extract ion, for the
integrat ion of diverse source f i l e s via geographic references, is developed (and a
prototype implementation is described in the Appendix). Using the GBF, and source
data from various municipal functions, extracted data bases can be rapid ly bu i l t to
serve a var ie ty of applications of an information system in the decision-making
si tuat ions of munic ipal i t ies.
I . APPLICATIONS OF A MUNICIPAL DATA BASE
Municipal governments are, in essence, created to del iver services to a geographical
area. There is an unusual var ie ty , in comparison to pr ivate industry, of services
offered. Local government is often structured (or fractured) along functional l ines
into special d i s t r i c t s , as well as by geography. Such structur ing has precluded
concentration of power, but i t has increased the complexity of planning, resource
a l locat ion, or management.
Many of the problems in municipal government require decisions which are not
routine. Rather, they require the professional insight and judgment of human
decision makers who consider the speci f ic conditions of each problem. Idea l ly , th is
150
insight and judgment would be aided and guided by appropriate information derived
from a comprehensive data base. This is the objective of a municipal information
system: to f ac i l i t a t e effect ive analysis and solution of specific problems by
supporting human decision makers with data resources and analysis functions in
readily usable form. Because a municipality provides services to a geographical
area, much of the data relevant to decision making or problem solving in local
agencies w i l l have geographical at tr ibutes, and can be given spatial interpretation
via maps. A key attr ibute of a municipal information system is the capabi l i ty for
displaying information in the form of maps. Another requirement for such systems to
be effect ive is that they support ready use by decision makers who know very l i t t l e
about computers. The system must help the decision makers develop their precise
objectives or decision c r i te r ia for solving a problem. The solution process in such
decision making requires exploratory analysis, selection of relevant data, and
meaningful data presentation in an interactive environment, A system which provides
such capabi l i t ies, called GADS (Geo-data Analysis and Display System) has been
developed and evaluated in several applications, such as police manpower allocation
and analysis of urban development policies [ I -4 ] . The evaluations of GADS indicate
that interactive analysis and display systems have a great potential in the
operations, management, and planning of municipal i t ies.
As an example, consider a municipality which maintains a computerized property
information f i l e (via the tax assessor function). Such a f i l e would have data on
each parcel, possibly including:
address
owner
zoning
improvements
date constructed
type construction
size (area)
current use
area
centroid
assessed value
I f this data were accessed via an interactive information system, a decision maker
could readily obtain, for example, the address and assessed value of a l l residences
constructed between 1960 and 1962 and having f loor area between 1600 and 1800 square
feet, on lots with 6800 to 7200 square feet. I f the user of the system were a
real-estate appraiser, this information in tabular form might be of value in
determining i f a part icular home is f a i r l y appraised (Figure I ) . A histogram giving
the distr ibut ion of assessed value of these homes would provide additional insight
151
(Figure 2) and a map re la t ing the average value of such homes in the c i t y ' s planning
areas to the ci ty-wide average (Figure 3) would provide the appraiser with
information in a spat ial framework.
I f the user of the information system were an assessor concerned with determining
the neighborhoods which could be considered equivalent for purposes of
computer-aided appraisal, addit ional data would be required. I f recent sales
records are the basis for ca l ibrat ing the assessment model, i t may be found that the
assessor's data alone cannot be used to model var iat ions in se l l ing price of houses
f i t t i n g the description above. Showing on a map the mean and variance of the
se l l ing price of such houses by neighborhood w i l l o f fer the assessor a visual means
for examining the qua l i ty of the assessment model. The display may cause the
assessor to consider other factors to explain the var ia t ions; e.g. crime rate, level
of public f a c i l i t i e s and services (such as the influence of an adjacent regional
park [5 ] ) or the influence of other near-by land uses.
In making decisions related to resident ia l zonings, pert inent questions and
information displays would re late to adjacent land use, the ef fect that the proposed
development would have on the mix of housing stock avai lable in the community, or to
the ef fect these new residents would make on the per-capita park acerage in the
area. The school o f f i c i a l s (and in some areas the local zoning author i ty) need to
evaluate the impact such a development w i l l have on the exist ing school f a c i l i t i e s .
Each of these questions, and many others, are of in terest to d i f fe rent decision
makers involved in the area of community development.
An example of the use of a municipal data base in resource al locat ion could re late
to f inding the best approach to the reduction of burglaries in a resident ia l
neighborhood. One group may advocate better street l igh t ing . I f the queries
re lat ing to these burglaries show they are day-time crimes, another approach
probably offers greater promise. Certainly the level of police patrols would be
examined. I f the area is found to have a high percentage of two wage-earner
fami l ies , and the majori ty of burglaries are during the week, then some suspicion
w i l l f a l l on the chi ldren. I f a large number of burglaries occur on school days, i t
may appear that the school-age children are not the perpetrators. However, i f the
data base contains school information so that i t can be determined that a school
adjacent to the neighborhood has f l ex ib le scheduling and that school children are
free to roam about o f f school grounds, the source of the trouble may have been
iden t i f i ed .
As a las t example, consider the application for a building permit for an automobile
service stat ion. With the recent wave of service stat ion abandonments, many
munic ipal i t ies are reluctant to grant such permits. S imi la r ly , f inancial
ins t i tu t ions are probably wary about loaning money for such ventures. For a l l
152
concerned,
required.
stat ions,
of the automobile ownership and disposable income in the surrounding areas.
In these examples i t has been assumed that the information system has access to
comprehensive municipal data base, However, such data bases are a r a r i t y today.
the next section the current status of municipal data bases is discussed, and
Section I I I an approach toward the provision of integrated data is offered.
and for the public good, a careful analysis of such a proposal is
Factors of in terest would include location and number of exist ing
t r a f f i c access and t r a f f i c patterns at the proposed s i te , and an estimate
a
In
in
I I . CURRENT STATUS OF MUNICIPAL DATA BASES
The importance of a comprehensive integrated data base to support decision-making in
municipal government has been widely recognized. There have been several d i f fe rent
approaches taken to the development of comprehensive municipal data bases.
One approach, which was popular in the 1960's, was the development of a
comprehensive "data bank". This data bank was usually generated as a special
co l lect ion or census~ and often was carried out and funded as part of a
comprehensive transportat ion and land use planning study. As a part of such
studies, detai led f i e l d surveys of land use (at the parcel leve l ) were conducted,
and survey data also was gathered on employment, income d is t r i bu t ion , and commercial
a c t i v i t y (Figure 4). The data acquisi t ion consumed a major portion of the study
resources, and did not provide any information re la t ing to many municipal services
(e.g. public safety) . Although accurate data was often gathered in the development
of these data banks, the i r value was shor t - l ived because they were at best a
snap-shot of the state of a very dynamic system, and no means were provided for
updating and extending these data banks. At about the same times the computer was
becoming a tool in the operations of local governments. The appl icat ion of
computers by munic ipal i t ies beginning in the 1960's can be characterized as a
funct ion-by-funct ion approach, with data processing introduced into tasks involving
high-volume routine transactions. The computer is generally u t i l i zed in those
functions which have previously been computerized in pr ivate industry: payro l l ,
accounting, b i l l i n g , budget status report ing, personnel records, etc. Also, the
1960's saw wide-spread use of computers in the processes associated with elections
and in operations of law enforcement agencies. Usually these applications were
isolated from each other~ and no attempts were made to make th is information
avai lable for use by other municipal functions°
The real property assessment function of local government in the la te 1960's began
to recognize the potential of computer appl icat ions. The court decisions in various
locales requir ing property to be appraised at current market value placed a s ign i f i can t l y increased work load on assessment o f f i c i a l s . In numerous areas, the
assessor has turned to computer-aided appraisal to meet these demands. I f a model
153
is to be bu i l t and calibrated to re l iab ly estimate fair-market value of residential
properties, a comprehensive real property data base is a must. These data bases
were constructed, often by computerizing the data contained on the assessor's f i l e
cards which were maintained on each parcel. Some property data bases, besides the
usual references to assessor map, book and page and to situs address, also added
geographic data such as a "centroid". Computerized real property systems required
the f ie ld acquisition of very comprehensive data. A work sheet for f ie ld surveys by
appraisers in a California county is shown in Figure 5. Note the detailed land and
building attributes used in this system. There was a signif icant increase in the
amount of data used and required by the introduction of this computerized appraisal
system, but much of the data on this sheet would not change from year-to-year, and
the appraiser in the f ie ld would only need to correct those data items which had
changed.
An additional requirement for computer-aided appraisal is data relating to current
sel l ing price of residential properties. This data provides the calibration
information for the regression models, and is available, depending upon the state or
local laws, from the registrar of deeds, from the collector of transfer taxes, or
from t i t l e companies. Assessors in some locales obtain this data by questionnaires
which buyers are required by law to complete and return (Figure 6). These sources
and/or others also can be used to obtain financial data (e.g. mortgage terms) for
the property. Clearly this data base is an integral part of a municipal data base
for applications such as i l lust rated by the examples in Section I.
Another approach related to the development of municipal data bases is characterized
by the USAC projects [6], part icular ly those of Charlotte, North Carolina, and
Wichita Falls, Texas. These c i t ies were funded by the Federal USAC project to
build Integrated Municipal Information Systems, (IMIS). The concepts of IMIS are
[6]:
"(1) (2)
Integrated data processing systems should inter-re late municipal processes.
A fundamental analysis of municipal operations and ident i f icat ion of related
data processing components is a precondition to the effect ive use of computers.
(3) A systems approach is required throughout the development process.
(4) The automation of municipal operations must exploit the fu l l range of computer
technology.
(5) Automation of routine municipal processes is a fundamental condition to the
realization of an IMIS.
(6) An IMIS views the municipality as a basic building block for intergovernmental
information systems.
(7) Municipal information systems are by-products of computer-driven, operations-
based systems.
154
(8) Adequately designed data processing systems can be transferred from one
munic ipal i ty to another.
(9) The integrated approach to municipal systems development must proceed on the
basis of a plan within which incremental i ns ta l l a t i on may be achieved in
accordance with the p r i o r i t i e s and resources of any par t icu lar c i t y . "
The USAC ef fo r ts involve c i t y governments, and were the consequence of studies such
as the IBM/New Haven project [7] and the USC/Burbank project [8 ] . These groups
sought to develop a methodology, via a "systems approach", for the appl icat ion of
computers by munic ipal i t ies [6] but did not resul t in system implementation of
integrated municipal information systems.
The USAC approach wisely focused on operational sources to provide the current data
required for municipal decision-making. In the implementations, which are s t i l l in
progress, the c i t i es have concentrated on building up operational uses of computers,
and on implementing these applications on a central computer under an integrated
data-base management system. The value of computers to these operational functions
has been confirmed [9] but the applications of IMIS in the areas of management
decision making are s t i l l to be demonstrated. One of the d i f f i c u l t i e s that must be
overcome in providing a comprehensive municipal data base, (even in ci~ies with a
f u l l y integrated and operational IMIS constructed according to the USAC philosophy),
is that complete in tegrat ion, where a l l municipal functions use the same computer
and data base management system, is not a l i k e l y prospect with current local
governmental structure and with the l imi ted resources of local governments. For
example the data pert inent to decision making in a c i t y may be gathered by another
agency, such as the tax assessor (and conversely), and may reside on d i f fe ren t
computers, under d i f fe ren t data management schemes and in d i f fe ren t f i l e formats.
In addit ion, problems of data secur i ty , compat ib i l i ty of f i l e s , and high processing
costs may make complete integrat ion unrea l is t ic for many munic ipal i t ies.
Special data co l lect ions, such as the U.S. Census, and data avai lable from state
sources, must also be readi ly incorporated into a municipal data base. With census
data gathered according to blocks, block groups and census t racts , with assessors
property data coded according to assessor map, book and page, with public works data
in state-plane coordinates, and school data gathered by school attendance area, the
building and maintenance of a t r u l y integrated municipal data base presents a
formidable task.
I I I . APPROACHES PERMITTING DEVELOPMENT OF INTEGRATED FILES
A completely integrated data base would have a l l data re la t ing to any functions of a
munic ipal i ty residing on the same computer system, under the same data management
system and organized and indexed to f a c i l i t a t e corre lat ion. This ideal is not
155
attainable, given present organizational structures and computing capabil i t ies in
most municipalit ies. However, i f the computerized f i l es of various municipal
functions are "properly structured", i t w i l l be possible to achieve the same
benefits as i f there existed a completely integrated municipal data base. In
addition, such an approach wi l l not require re-implementation of current
applications, but rather leaves the application data base in the control of the
function responsible for i ts primary maintenance and use.
The approach taken is to make use of data, when data f i l es are "properly
structured", to develop ef fect ive ly the results as i f there existed a completely
integrated data base without requiring that complete integration take place. This
should not be construed as an argument against integration. I f integration is
p o l i t i c a l l y and technically possible, provides required data security, and is
economically at t ract ive, i t should be implemented. Even with an integrated data
base, there w i l l always be decisions which require di f ferent groupings of data than
those supported by the integrated data base. (There w i l l also remain, in practice,
data sources which cannot be integrated.) So, the problem of providing a
comprehensive data base from multiple data sources is unavoidable and is not
completely solved by an "integrated" data base.
The "proper structuring" required to make data integration possible can be
i l lus t ra ted by example. I f one is interested in information about burglaries, and
wishes to relate this information to neighborhood conditions, data sources could
include police dispatch f i l es , criminal justice arrest f i l e s , census data and tax
assessor f i l es . Suppose the police dispatch data is used for burglary incidence,
and that such data is available in terms of police beats, e.g. the number of
burglaries in each beat for each day. I f one wishes to use census data for
socio-economic information, and i f the census tracts and beats have few common
boundaries, no small area information is obtainable relating these data sources.
Al ternat ively, i f the police dispatch data is captured by the street address of the
ca l l , and i f a directory exists for the c i ty which w i l l permit ident i f icat ion of the
census tract for each street address, then burglaries and socio-economic data can be
related at the census tract level.
"Proper structuring" of data f i les only has meaning with respect to potential uses
of the data ( i .e . data f i l es are not an end in themselves). I f the objective is to
offer data to support decision making in a wide range of problem areas, then the
data f i l es must be as detailed as possible, within the constraints of economics,
privacy and security. The detailed data can then make possible the development of
the widest variety of data subsets and aggregations, and is more l i ke l y to permit
development of the required set of integrated data for a part icular decision-making
context. An additional requirement is the existence of data elements in each f i l e
which w i l l f ac i l i t a t e relating the data to that from dif ferent f i l es . (In this
156
paper, geographical references w i l l be singled out as data elements serving th is
function in municipal f i l e s . Common references to account numbers, project numbers
or personnel i den t i f i e r s are other examples of data elements permitt ing the
re la t ing of data from d i f fe ren t source f i l e s . ) A set of f i l e s w i l l be called
"properly structured" i f they contain information permitt ing the re la t ing of data
from d i f fe ren t source f i l e s so that integrated subsets of data at the appropriate
level of detai l can be developed to support the requirements of problem solvers.
Because municipal government is a service del ivery function, mutual references to
geography can often be used to re late data from the diverse f i l e s avai lable in
munic ipa l i t ies. A powerful f i l e in f a c i l i t a t i n g the re la t ing of data, based on
these common geographical references, is a Geographic Base Fi le (GBF).
Functional ly, the GBF contains data to support the re la t ing of data from other f i l e s
to geographical location and also the display of th is data on a map. The creation
of a GBF for a munic ipal i ty is a key requirement in the development of a municipal
data base from source f i l e s . Several d i f fe ren t approaches have been taken.
The simplest GBF is a f i l e sometimes cal led a Property Location Index (PLI) which
contains a l i s t of the val id addresses in the munic ipal i ty and an x,y coordinate for
each. This approach is the one used in Lane County, Oregon, and by the Assessor in
Santa Clara County , Cal i forn ia. To make th is more useful, a l i s t of public place
and st reet intersect ions and the i r x,y coordinates is appended. With such a GBF i t
is then possible to automatical ly convert addresses (in the police ca l l f i l e for
example) to x ,y coordinates. I f the GBF also contains the police beat, census
t rac t , and munic ipal i ty for each address, then i t is very simple computationally to
count the number of ca l ls in each beat. Evaluation of ca l ls by census t ract would
also permit consideration of socio-economic data with the crime data (of course,
police o f f i cers could also encode ca l ls by beat and census t rac t , but th is approach
is l i ab le to s ign i f i can t errors and seems to be a poor use of police manpower).
The most detai led GBF's contain d ig i t i zed land parcel boundaries, easement
locat ions, bui lding out l ines, u t i l i t y placements, and even topographic information,
along with s t reet address information on a l l parcels and names of a l l public lands
and bui ldings. This GBF is at the level of deta i l of surveyor's data, and is
suitable for engineering applications and detai led map bui lding. Ottawa, Canada's
National Capital Commission [ I I ] has pioneered in the development of th is kind of
GBF.
The most common GBF at the present time is the resu l t of work by the U.S. Census
Bureau in conjunction with the 1970 Census. Using the Metropolitan Map Series, a
massive feature label ing and d ig i t i za t i on was performed for 200 major metropolitan
areas in the United States. The resul t ing computerized maps were called the DIME
(Dual Independent Map Encoding) f i l e s [12]. Each entry (record) in the DIME f i l e
157
represents a l ine segment (a portion of a street segment, ra i l road, creek, c i t y
l i m i t , e tc . ) . Figure 7 shows a sample record and the map data from which the record
is derived. The segment has a "From" node and a "To" node, as well as a Left and
Right side. Thus, each entry describes i t s two ends and i t s two sides. The
description of the nodes includes x,y coordinates, produced by the Census Bureau's
map d ig i t i za t i on . The nodes are label led as f a l l i ng on a par t icu lar map of the
series, and given a sequential number within map and census t rac t . The other data
on each entry is feature ident i f#cat ion for the segment and i t s sides. Each feature
is ident i f ied by a pre f ix , name, su f f i x , type (e.g. , North Army Southwest Street) ;
only name and type are required (e.g., Coyote Creek), The description of a side
actua l ly describes the adjacent land. The census t rac t number, the block number,
and the place (c i ty ) code are included for each side. High and low address ranges
for the st reet segment sides are also given. The records are ordered by feature
name and low address (features without addresses
Administrative overlays (e.g. beats, census t racts)
of segments of this f i l e . Used in combination with
these computerized overlays f a c i l i t a t e development
any specif ied overlay map.
have no secondary ordering).
can be readi ly defined in terms
"point- in polygon rout ines",
of counts of events in areas of
As in the development of any large machine readable f i l e , high startup costs, data
errors, and poor standardization have hindered development of GBF's. But the key
problem in the development and use of a GBF is edi t ing (corrections and addi t ions) .*
Because of the startup cost, accuracy, and standardization problems, edit ing is a
key aspect of development. I t is pa r t i cu la r l y important to ve r i f y the topological
and coordinate accuracy of the f i l e . Even i f there were no developmental problems,
"geographic" changes, such as new streets or changing area boundaries, make f i l e
edi t ing essential to a useful GBF.
The Census Bureau and related ef for ts have produced programs for o f f - l i ne creation
and batch edi t ing of a DIME f i l e [12-13]. These programs require a d i g i t i ze r for
data entry and take large amounts of computer and c ler ica l time for edi t ing.
Although the procedures were used to create 200 GBF's, there has been l i t t l e
ed i t ing, and hence l i t t l e use, of these f i l e s , Some c i t i es (e.g., [14]) have
developed the i r own GBF's s imi lar to DIME. These ef for ts are also characterized by
the use of a d i g i t i z e r and batch computer programs for f i l e creat ion, and by
cumbersome f i l e edit ing procedures. There have been a few ef for ts to develop
on-l ine d ig i t i za t i on systems, (e.g., [15], and there are experimental systems which
could support on-l ine d ig i t i za t i on with visual feedback [16-17]. Yet, none of these
systems provide a l l of the capabi l i t ies required for e f fect ive GBF creation and
edi t ing.
*The Census Bureau uses the word "edi t ing" to mean topological ve r i f i ca t i on , and uses "update" for what we define as edi t ing in th is paper.
158
Conclusions drawn from the IBM study [ I0 ] regarding the requirements for in terac t ive
GBF edi t ing and maintenance were:
I . There must be a capabi l i ty for projecting hard copy maps and/or photographs onto
the display screen. I t must be possible to select a rb i t ra ry (contiguous)
sections of the maps, and to produce a range of scales.
2. The display system must be able to handle mul t ip le , non-rectangular geographic
coordinate systems.
3. The display system must be able to produce both text and l ines, with at least
three colors for l ines (in order to be able to dist inguish two maps).
4. The display system must enable selection of any addressable point on the screen,
whether or not anything is displayed at that point.
5. The creation and edi t ing functions must include: d ig i t i za t i on of base and
overlay maps; label ing of points, l ines, and polygons in the maps; moving and
delet ing points and l ines; display of any section of the maps, and of speci f ic
points, l ines and polygons; and checking for topological accuracy.
IV. DATA EXTRACTION
A. Philosophy and Operation
In the previous section, the combination of a GBF and properly structured source
f i l e s containing geographic references were iden t i f i ed as the basis for of fer ing a
problem solver an e f f ec t i ve l y integrated data base to support decision making.
Recent studies of in teract ive information systems applications in the solution of
unstructured problems [4,18,19] have ident i f ied the need for reduced subsets of data
for supporting the problem solving. Data reduction is required because:
a. the po ten t ia l l y useful data base w i l l be much larger than the data actual ly
used, b. the user w i l l want access to varying levels of deta i l in the data base,
c. the relevant subset of data w i l l vary during the problem-solving process,
d. some data (e.g, census and event data such as police ca l ls ) may not be
compatible at the detai l level of the data captured in the source f i l e s .
Extraction is a process by which an integrated subset of data is developed from the
source f i l e s relevant to a par t icu lar problem-solving appl icat ion. Extraction thus
provides the user with a capabi l i ty e f f ec t i ve l y indist inguishable from a f u l l y
integrated data base, without requir ing the development of such an integrated data
base at the deta i l level of the source f i l e s , i . e . i t provides a " v i r t u a l l y "
integrated data base.
The extract ion approach builds a data base subset from the source f i l e s according to
a pr io r i speci f icat ions for a par t icu lar appl icat ion. Total integrat ion of the
source f i l e s , and dynamic aggregation and subsetting of the data at the time the
159
data
a t t rac t ive in today's environment because:
items are required is of course an a l ternat ive approach. This approach is not
a. for any appl icat ion a l l the relevant source f i l e s would have to be on- l ine to
support conversational interaction,
b. protection of the source f i les would be more d i f f i cu l t ,
c. development of conversational information systems would require additional
standardized data structures and codes for the dynamic aggregation and
subsetting,
d. better conversational performance is possible when the problem solving accesses
a smaller data base.
Clearly the development of a fully-integrated, on-line data base from the source
f i les , solely for problem-solving applications, is not (currently) economical. Such
an approach would also require special procedures for keeping the duplicate records
current and consistent. With the extraction approach, the subset of data thought to
be relevant to the particular problem is developed and made accessable to the
problem-solving system in an extracted data base. The subset is an extract from the
available source f i les at the level of detail desired by the decision maker for
(that phase of) problem solving. This extracted data base may be thought of as a
set of tables. Each table contains values for a set of variables extracted from the
source f i les . For each variable there is one value in the table for each basic unit
(e.g. zone, account, employee) used for the problem solving. New variables can be
added direct ly to the extracted data base as an added column of the tables. An
example of an extracted data base is shown in Figure 8, for use in crime analysis.
The extracted data tables are formed from: source f i les containing lO years' data
on crimes, l a n d use, and population; a special purpose map of police
beat-building-blocks (basic zones); and an extraction specification for computing 20
crime categories and selecting population and number of houses by year. The result
is lO tables (one for each year) giving crime by category, population, and number of
houses for each basic zone.
The extraction approach leaves control of the operational source f i les in the hands
of the originating application. The extracted data bases are "snapshots" which are
current at the time of their development. The problem solver can re-invoke the
extraction process at any time to get a more current extracted data base. This
process decouples the data base used in problem solving from the operational f i l es ,
and assures the problem solver that the data base upon which he makes decisions is
under his control. This user control of the extracted data base, and the potential
performance advantages offered by access to the smaller extracted data set as
compared to access to the total set of data, make the extraction approach attractive
even in installations where an integrated data base exists (as with a complete IMIS
as in the USAC approach described in Section I I ) . Extraction is simplified with the
160
existence of an integrated data base, because there are then no d i f f i c u l t i e s with
f i l e formats and data conversion.
B. Extraction System Architecture
The architecture of a municipal information system designed using the data
extract ion philosophy would have three major sets of programs and data bases (e.g.
Figure 9). The f i r s t set would be the source data f i l e s and related programs for
data entry, update, and other routine processing. These f i l e s should be "properly
structured" as defined ea r l i e r in th is paper. The data base management for these
f i l e s may be an integrated system, such as IBM's IMS, or a more t rad i t iona l system
such as those provided by IBM's DOS. The second component includes accurate
reference f i l e s ( indices), such as the Geographic Base F i le , Programs for
maintaining these f i l e s , and programs for providing the data extract ion functions of
data matching, subsetting, and aggregation. This component is the key to
integrat ing the data base of source f i l e s . The GADS experience indicates that i t is
possible to develop general purpose programs for the data extract ion functions.
Essent ia l ly , these functions provide integrat ion through user-invoked processing,
rather than through the complicated data structures and accompanying processing
overhead often found in integrated data base systems. The data extract ion programs
are the interface between the municipal data base and the th i rd component of the
archi tecture, the extracted data bases and associated decision support system. The
GADS analysis and display functions are an example of a dec%ion support system for
non-programmer users. A data extract ion interface can provide mult ip le extracted
data bases for a single decision support system, or for mult ip le decision support
systems. For example there might be decision support systems for cash management,
budget preparation, urban planning, computer-assisted appraisal, crime analysis,
e tc . , a l l supported by a common extract ion interface. The data management
techniques for the extracted data bases should be ta i lo red for each decision support
system. However, the data access techniques may be the same as those provided for
the source data f i l e s .
The deta i ls of the data extract ion architecture and the implementation requirements
are beyond the scope of th is paper and there w i l l be i ns ta l l a t i on -spec i f i c comments.
(An extract ion implementation is b r i e f l y described in the Appendix). There i s ,
however, one general requirement for any data extract ion system. This requirement
pertains to the data aggregation functions of extract ion and can be described by
considering examples of the data sources encountered in municipal governments and
the kinds of extracted data to be developed from these sources. Consideration here
is l imi ted to data which can be related to points or areas. Data related to
networks, budget items, part numbers, e tc . , should be handled in an analogous
fashion.
t61
I , Compatible data
This is the easiest, and for tunate ly the most frequent s i tua t ion , i f data are
captured as specif ied in Section I I I . The data in source f i l e s which can be
ident i f ied with geographic points (x,y) can be d i rec t l y related. I f the extracted
data base is to be relevant to a study of slum dwell ings, fo r example, and i f health
cases, f i r e alarms and building code v io lat ions are a l l data sources which are
avai lable at the event leve l , ( i . e . by address) then an extracted data table showing
incidence of each of these events for specif ied address can be d i rec t l y developed,
Another frequently used extracted data base is the tabulation of such event data by
geographical area, in terms of a specif ied map. (The extracted data base in Figure
8 is an example of th i s ) . Extracted data bases in such cases are obtained by
matching coordinates of events to the corresponding map areas (via point-in-polygon
processing of the event coordinates against the map boundaries spec i f icat ion) .
Figure lO i l l u s t ra tes the extract ion and aggregation to relate property (assessment)
data and census data to support inqui r ies at compatible levels (e.g,) blocks or
block groups), and fur ther aggregation to support t rans i t planning models.
2. Non-compatible area data
I f data is avai lable by areas in the source f i l e s , and these areas are not
compatible ( i . e . one map is not a subset of the other), then the extract ion process
is more complicated. For a chosen set of variables from the source f i l e s , there is
a minimum level of aggregation at which an extracted data base is possible. For
example, school attendance areas and police beats (and therefore the associated
data) may only be compatible at the census t ract leve l , i , e . they may both be
(d i f fe ren t ) f iner par t i t ions of census t racts . The extract ion process should a le r t
the user to the non-compatibi l i ty and display for the user the minimum level of
aggregation necessary for compat ib i l i ty of the data sources of in teres t , in the form
of a map, and permit the user to specify fur ther aggregation from this map as
desired.
I f the user desires an extracted data base at a deta i l level f i ne r than is
compatible with the data sources given, the user must supply addit ional information.
For example, suppose the user is studying property values vs age d is t r ibu t ion of
inhabitants, with the age data on c i t izens avai lable from the census only at census
t ract levels of aggregation. Compatability exists at the census t rac t level . Any
f iner detai led extracted data base, at the c i t y block level for example, could only
be developed i f the user is w i l l i ng to make assumptions (such as homogenity of the
d is t r ibu t ion of population ages in the census t rac t ) .
V. SUMMARY AND CONCLUSIONS
The development of information for decision-making in munic ipal i t ies requires
integrat ion of data from the various operational f i l e s which are generated in local
162
government. Even when an integrated municipal data base does not ex is t , i t is
possible to develop integrated data from properly structured source f i l e s in
conjunction with a well-maintained Geographic Base F i le . The current sources of
information developed in munic ipa l i t ies , in par t icu lar the property data of the tax
assessor function and the operating f i l e s of various service del ivery functions,
provide a r ich source of information, augmented by special col lect ions such as the
U.S. Census.
Data Extraction is the process of developing integrated data subsets from diverse
source f i l e s to support in teract ive problem solving. Extraction provides the
interface to large data bases of source f i l e s and provides data descr ipt ion,
subsetting and aggregation functions. Our experience with GADS has shown that data
extract ion is useful when the user or problem character is t ics require access to
varying amounts, de ta i l , and selection of data, and conversational (rapid response)
interact ion with a decision support system. These character ist ics are l i k e l y to be
encountered when designing decision support systems for nonprogrammer, professional
users working on unstructured problems. The data extract ion interface matches the
functional and response time requirements of in teract ive decision support, can be
implemented on a var ie ty of computer system configurat ions, and can reduce the
operating costs of the decision support system.
Because data extract ion operations can produce mult ip le extracted data bases, with
d i f fe ren t structures~ a single data extract ion interface can support mul t ip le
decision support systems. In addit ion, ex ist ing decision support systems can be
supported and enhanced by data extract ion without major program revis ions.
163
APPENDIX: An Extraction System Implementation
A project in the IBM Research Division has developed an in teract ive Geo-data
Analysis and Display System (GADS as a vehicle for studying in teract ive problem
solving [ I - 4 ] . GADS supports nonprogrammer users solving unstructured problems
where the relevant data can be related to a geographic location. Examples of the
problems for which GADS has been used include: land use planning, police manpower
a l locat ion, school d i s t r i c t i n g , and commercial s i te locat ion. The need for data
extract ion was recognized during the f i r s t studies of the use of GADS. In
par t i cu la r , the need for data aggregated to a var ie ty of geographic levels (e.g. ,
block, police beat, census t rac t , neighborhood), and changing data needs expressed
by users indicated the inadequacies of the s ta t i c , special purpose data base and the
one-level, integrated data base approaches.
GADS data extract ion is configured essent ia l ly as shown in Figure 9. The extract ion
implemented in GADS is l imi ted to compatible event data. There is a requirement
that each record of each f i l e in the large data base contain a geographic code (such
as an address, x,y coordinates or block number) so that extracted data can be
related to points, l ines, and polygons on a map. A u t i l i t y program is provided to
transform geographic codes into x,y coordinates i f necessary for data extract ion or
display. A data base developed by extract ion is a table; an example is shown on
Figure 8. Adding another crime type, acres of commercial land use, or
re-aggregating by census t racts would take only a few minutes.
In the GADS implementation the large data base management system is a special
purpose one designed to handle f ixed format f i l e s with no hierarchies or repeating
groups. Simultaneous access to mult ip le f i l e s , and shared access to single f i l e s
are not supported. Mult iple f i l e extractions are handled by consecutive extractions
from the individual f i l e s . Sequential and d i rect access I /0 are provided.
Character, f ixed binary, packed decimal, and f loa t ing point (binary) data
representations can be used. The ent i re large data base is stored on disk, and
there is a u t i l i t y for loading f i l e s from tapes.
Figure I I gives examples of the data description and subsetting capab i l i t i es . The
data description implementation allows d i f fe rent formats to be used for the same
f i l e or the same formats to be used for d i f fe ren t f i l e s (Figure l l a ) . Seven data
types are allowed. The subsetting language includes constructs for : subsetting
based on any ari thmetic or logical combination of the items in a f i l e , creating of
new items, conditional subsetting or creation (IF, THEN, ELSE), and function ca l ls
(Figure l l b . ) . Results from subsetting can be displayed as l i s t s (Figure l l c ) or as
locations on a map (Figure l l d ) . Using the display capab i l i t i es , two dimensional
subsetting is possible. That is , the user can draw a polygon on the screen, and
select only those elements of a f i l e whose location is within that polygon. This
f a c i l i t y is much more user-oriented than algebraic specif icat ions for subsetting,
164
and other graphic subsetting operators would be useful (e.g., display a l l the crimes
of the same type as the one being pointed a t ) .
The aggregation operations in the implementation are rest r ic ted to forming the
extracted data base for the GADS analysis and display functions. This data base is
aggregated by areas of a map. I t is stored on disk, and is accessed by column name.
GADS is implemented in FORTRAN, but the data extract ion components were implemented
in PL/I because of i t s larger set of data types, and better functional s u i t a b i l i t y
for the extract ion tasks. The combination system runs on the IBM S/360 or S/370
series under the Time Sharing Option (TSO). The combination requires 220K bytes of
main storage. Separating extract ion reduces the main storage requirement to about
12OK. The user terminals may be IBM 2250s or storage tube display terminals. The
I/O time from the large data base, and the data rate to the terminal are the
l im i t i ng facotrs in extract ion response times ( i .e . selection and aggregation times
are negl ig ib le compared to I/O t imes). Although f i ve minutes may be required to
l i s t or display an ent i re f i l e , the user can see the results unfolding (e.g. the
selected items are l i s ted as they are selected). Thus users seem w i l l i ng to wait
during extract ion. Af ter a l l , the batch mode equivalent capab i l i t ies have response
times of days, and manual methods have response times of weeks or months.
165
REFERENCES
[ I ] P.E. Mantey, J. L. Bennett, E. D. Carlson, Information for Problem
Solving: The Development of an Interactive Geographic Information System.
IEEE Int. Conf. on Communication, Vol. I I . Seattle, Wash. June 1973.
[ 2] E. Jo Crist iani, R. J. Evey, R. E. Goldman, P. E. Mantey. "An Interactive
System for Aiding Evaluation of Local Government Policies," IEEE Transactions
on Systems, Man & Cybernetics, Vol. SMC-3, No. 2, March 1973, pp. 141-146.
[ 3] E.D. Carlson, J. L. Bennett, G. M. Giddings and P. E. Mantey. "The Design and Evaluation of an Interactive Geo-data Analysis and Display System,"
Proceedings of the IFIP Congress 74, International Federation for Informa-
tion Processing, Stockholm, August 1974. North Holland Publishing Company, Amsterdam, 1974.
[ 4] E.D. Carlson and J. A. Sutton, A Case Study of Non-programmer Interactive
Problem-Solving, IBM Research Report, RJ 1382, IBM Research Laboratory, San Jose, California, April 1974.
[ 5] T.R. Hammer, R. E. Coughlin, E. T. Horn IV, "The Effect of a Large Urban
Park on Real Estate Values," Journal of the American Insti tute of Planners, Vol. 40, No. 4, July 1974, pp. 274-277.
[ 6] "City Hall's Approaching Revolution in Service Delivery," Nation's Cities, January 1972.
[ 7] Concepts of an Urban Management Information System," a Report to the City
of New Haven, Connecticut, by Advanced Systems Development Division, IBM Corporation, Yorktown, January 1967.
[ 8] A Municipal Information and Decision System. University of Southern Cali- fornia, School of Public Administration, 1968.
[ 9] R.L. Stickrod and L. C. Martin. Data Processing: Analysis of Costs,
Benefits, and Resource Allocations. Lane County, Oregon, Management Report, February, 1973.
[ I0] G.M. Giddings and E. D. Carlson, An Interactive System for Creating,
Editing and Displaying a Geographic Base File. IBM Research Report, IBM Research Laboratory, San Jose, California, 1973.
[ I I ] D.C. Symons, A Parcel Geocoding System for Urban and Rural Information, Ottawa, Ontario, National Capital Commission, 1970.
[12] U.S. Bureau of the Census, Census Use Study, The DIME Geocoding System Report No. 4, Washington D.C., 1970.
166
[13]
[14]
[15]
[16]
[17]
[18]
[19]
U. S. Bureau of the Census: Census Use Study, The DIME Edjt in ~ S~stem
Washington D.C.~ 1970.
R. Ju l i , Geo-modeling: A Local Approach, Eugene, Oregon, Lane Council of
Governments, 1972.
R. D. Hogan, Remote #raphic Terminal and Urban Geographic Information
System Demonstation, Gaithersburg, Maryland, IBM Federal Systems Center,
1968.
R. D. Merr i l l , "Representation of Contours and Regions for Ef f ic ient
Computer Search," Communications of the ACM, Vol. 16, No. 2, February 1973,
pp. 69-82.
B. V. Saderholm, "Paper 'Keyboard' Runs Experimental IBM System," IBM
Research Division Press Release, Yorktown Heights, N. Y., March 8, 1973.
R, M. Cyert, H. A. Simon, and D. B. Throw. Observation of a Business
Decision. Journal of Business, 29, (1956), 237-248.
D. M. S. Peace and R. S. Easterby. The Evaluation of User Interaction with Computer-based Management Information Systems. Human Factors, 15, April
1973, pp. 163-177.
__~
AR.CF I
__A r)F)RE 9, °,
NUMBE R.
YEAR
__
~
CONST
SIZ
E
FLO
OR
.~_.S ~. P n
AR
EA
VA
LU
E
123-06-174
I00 ABBYWOOD
CT.
196.2
7145
12~--06--175
!02
A BBvwooD
CT:
!9~2
6~65
123-06-176
104 ABBYWOOD
CT.
1962
6500
123-06-!./-7
106 ASBV~Q~D
CT
: 1961
7!25
123--06-150
112 AB!NATE
LN
1960
7200
[P~-OA-! ~I
!!4 AS!~!^TE
t~-
!e60
71R~
123-06-1152
116
AS]NATE
LN.
o 0
I. 6~
6995
__123--06-!53
117
A.S
!NA
T~
iN:
--!q60
6q55
!850 -
31000
123-06-154
120 ABINATE
LN
. 1860
7095
1900
32000
!_~--06--!58---]q?
AB!NATE
L,~! ....
!<)6!
7!55
!750
-29000
123-06-160
III ~BINATE
[N.
1960
7005
1975
34000
~3-06
-]A1
11
3
~8
I~:A
TE
iN
, !qAO
6@95
!650
27
00
0
123-06-162
115 ABINATE
LN.
1960
7100
1750
30000
]P~--06--!63
!IY
~N
^,T
E
t~:
!060
7055
1780
30000
123-06-184
31
AFTON
CT.
1960
7020
1650
29000
1?~-oA-I~
59 ~TON
CT-
!960
7!65
1600
_27n(~0
123--06--186
35
A~Tf]N CT.
1961
7190
1775
29000
1675
2 8000
1650
29000
1600
27000
1650
25~
1800
35000
I ~
Nr~
~7
NN
N
1700
30000
Figu
re 1
. Ta
bula
r di
spla
y of
ass
essm
ent
data
OU
PS 1
00
. O.
~UIO
SC
~L(
R[D
R~M
RE
TUR
N
i .
i'.z
'"
if!: 11
1~11
•
Zll
iJl.
UA
RN
AM
[ U
NIT
U
S[M
AP
Z
ON
E
~P
[ S
;N
-1
g
OG
R
19
Z
OH
ES
YM
D
co
Fig
ure
2.
His
togr
am d
ispl
ay o
f ho
usin
g va
lues
ZO
NE
SY
MrB
l
SU
BM
AP
OU
[RL
AY
M
AP
6
ZO
NE
-UA
LU
ES
S
TA
TE
ME
NT
S
SS
MO
DE
EX
PA
ND
S
HR
INK
N
OR
MA
L
CL
EA
R
RE
TU
RN
ZON
£S
? FIND
ZO
NE
cO
Fig
ure
3a.
Map
dis
play
of
rela
tive
hous
ing
valu
es
MA
PS
+ ._
*÷.,
+
"+~'
F ~
l_-"
~_
"~
I
* ÷+.
~*+... ÷
*
%
- _
ZO
NE
-UA
LU
ES
S
TA
TE
ME
NT
S
~O
NE
SY
MB
I
SU
BM
AP
E
XP
AN
D
SH
RIN
K
NO
RM
AL
OV
ER
L#
Y
M#P
e
g
SSM
OD
E
CL
EA
R
RE
TU
RN
ZO
NE
1 t ? F
IND
Z
ON
E
C)
Figu
re 3
b.
Sim
plifi
ed d
ispl
ay o
f ho
usin
g va
lues
171
AVE $ IN
ACCESS $
ACCESS
AVLAND-S AVLAND-M AVLAND-C AVLAND-I
GROWl NXS GROH NXM
ODWU/A-S ODWU/A-M ODNU/A-T
EMP-MFG EHP-WHOL EMP-COMH EMP-TC&U EMP-GOVT EHP-TOTL
EHPDEN-C
HHOLD0-6 HHLD6-10 HHLD1015 HHLDZ5+
ISHOPCTR IBAY
OCDWUN-S OCDWUN-M OCDHUN-T
OCLAND-S OCLAND-M OCLAND-C OCLAND-I
PRODEN-S PRODEN-M
POPUL-S POPUL-M POPUL-T
POP/HH-S POP/HH-M POP/HH-T
RES-LAND
SLOPE
TSKTR
DISP SIN
S EI'IER FLOOD
AVERAGE IHCOtIE PER HOUSEHOLD
ACCESSABILITY TO DISPOSABLE INCOME
ACCESSABILtTY TO EHPLOYMENT
ACRES AVAILABLE FOR SINGLE FAMILY DEVELOPMENT ACRES AVAILABLE FOR MULTIPLE FAMILY DEVELOPMENT ACRES AVAILABLE FOR COMMERCIAL DEVELOPMENT ACRES AVAILABLE FOR I~DUSTRIAL DEVELOPMENT
GROWTH FACTOR IH SINGLE FAMILY GROWTH FACTOR IH MULTIPLE FAMILY
NO SINGLE FAMILY DWELLIt!G UNITS PER RESIDENTIAL NO MULTIPLE FAMILY DWELLING UNITS PER RESIDENTI TOTAL NO DUELLING UNITS PER ACRE OF RESIDENTIAL
NO. OF Et, IPLOYEES HORKING IN MANUFACTURING NO. OF EHPLOYEES WORKING IN WHOLESALE AND TRUCK NO. OF EHPLOYEES WORKING IN COMMERCIAL(RETAIL) NO. OF EMPLOYEES WORKING IN TRANS, COMMUN,AND U NO. OF EMPLOYEES WORKING IN GOVERNMENT TOTAL NO OF EtIPLOYEES
NO. COHHERCIAL EMPLOYEES PER ACRE OF COMMERCIAL
NO. OF HOUSEHOLDS WfTH IHCOME: 0- 6000 NO. OF HOUSEHOLDS HITH INCOME: 6000 -10000 NO. OF HOUSEHOLDS ~qlTH INCOME: lO000-15000 NO. OF HOUSEHOLDS I'IITH INCOME:IS000+
ZONES OF ANTICIPATED SHOPPING CENTERS ZONES TREATED AS BAYLANDS
NO. OF EXISTING SINGLE FAMILY DWELLING UNITS NO. OF MULTIPLE FAMILY DWELLING UNITS TOTAL NO. OF SINGLE AND MULTIPLE FAMILY DWELLIN
ACRES OCCUPIED BY SINGLE FAMILY DWELLINGS ACRES OCCUPIED BY MULTIPLE FAMILY DWELLINGS ACRES OCCUPIED BY COMMERCIAL DEVELOPMENT ACRES OCCUPIED BY INDUSTRIAL DEVELOPMENT
PROJECTED DEI;SITY FOR SINGLE FAMILY DEVELOPMENT PROJECTED DEHSITY FOR MULTIPLE-FAMILY DEVELOPME
TOTAL POPULATION IN SINGLE FAMILY D#IELLINGS TOTAL POPULATION IN MUt_TIPLE FAHtLY DWELLINGS TOTAL POPULATION
POPULATION PER HOUSEHOLD FOR SINGLE FAMILY DWEL POPULATION PER HOUSEHOLD FOR MULTIPLE FAMILY DW POPULATION PER HOUSEHOLD FOR ALL DWELLINGS
RESERVED LAND-NOT AVAILABLE FOR CURRENT DEVELOP
PERCENT OF LAND t i lTH LESS TH#N 10~ SLOPE
CHANGE FACTOR IH INTRAZONAL TRAVEL TIME
DISPOSABLE I I;COHE PER ItOUSEHOLD
SEWER SERVICE DISTRICT FLOOD CONTROL AREA
Figure 4. Land use/transportation planning data base [2]
App
'l. D
ate
App
rais
er N
o.
Site
Use
Cod
e
Zone
Tota
l Pr
op.
Val
.
Land
Val
ue
AB04
t
~ ~
l i
I
Imp. Va,
ue
AB06
1 i
i i
i i
New
Lot
Val
ue
AE02
[
I I
I I
I ~
: =
:I_
lee'
d, D
ate
A~_
_~T
I
l ~
I I
F4-1
- 4
Sal
es P
rice
Con
firm
ed
N 1 y
2
TEM
POR
ARY
VA
LUE
Bo~rd
Ac,~o
. ~
I ~
;;
Oth
er
I AF
03 |
N
~-
IMM
EDIA
TE A
REA
Mkt
, D
eman
d J
AGO
1 I
P i
A 2
G 3
Trans
.Tren
d ~
._ D
~ S
2 G
~
Res
. A
rea
~--
5[
"-'~
Y-
2
Sin
gle
Faro
. I
AG06
]
N'
y2
TOTA
L PR
OPE
RTY
Arc
h. A
ttr.
AH01
P
~ A
~ G
3
Beac
h F
Prop
. hn
prov
ed
AH0,
21
N'
Y;~
--Prop
. Lot U
tit,
l AH0
3 l Nr
'y~
Con
dom
iniu
m
N ~
y2
AO
~2E
~ V
entu
rm O
oun1
¥ A
ilII
I~¢
RE
SID
EN
TIA
L FA
CT
SH
EE
T A
PN
(A
A01
)
SHEE
T ._
__
_
OF ___
DIS
[RIC
T (A
C16
)_
NEI
GH
BOR
HO
OD
(A
C17
)
TRAC
T (A
C0
6)_
_
LOT
----
E
OC
K_
_
SITU
S (A
A10
) --
REC
OR
D D
ATA
LA
ND
ATTRILBUTES
__
~L
BUIL
DIN
G--D
ATA
~.---~
t.TAc
tua,}
_~
~
IREL
. I~
j__
N
• ~c
o,,~
so
Ft ,
0saa
.,e)
A,
~~
I~
H__
A~,,
i ~"
T~
0ie.,
A,0.
L "
~~
~.
0.
I-
I-
,+,,,
,r.
egat
a.
~'_
. ~
.0
~
.ooT
.ruS.
A,
,O I
.'~
~til.o
oms
__1~
.]~
$~ES
OATA
St
-Fron
tage
A,.
I "'Y
= ~,
.oom
a IA
C='i
i i
Alle
y Al
l3 I
"7~
co
nd,,
,o~
j~9
/ P,
A' G
3 Ut
,, O~
G A,,
.___ t
.' Y'
~o
r.=ns
hi~
t ~'°
/ P'
A' ~
C
urbs
& G
utte
rs
AI1
5 I
N t
Y~
Stg.
Spa
ce
I A J
ll
/ P~
A2
G3
__SH
EETS
CO
ST D
ATA
Qua
lity
Cla
ss
AC
-~
--
~
Z I
I $
Tota
l Li
ving
Are
a AC
O5 I
i
i i
--
1st Flr. A
rea
AK01 i
} i-
-i--
---
2ndF
lr, A
rea
AZ
02i
i i
!
2~d E
lf. E
~tor
~o
3 ! T
T--
,-
t1~
. 3r
d Fl
r. A
rea
AK
04
3~d
Fir,
Fac
tor
AK05
I
i i
i I"
I' l
Bsm
t. A
rea
AK__
~ !
! ,,,,
,i ---
Bsm
t. Fa
ctor
A
K_~
~
__
A
ddn.
Are
a ~
08
'
' '
Addn
. Fa
ctor
~
~-
'-
-
N I
yZ
Hea
ting
(Duc
ted)
N I
y2
Car
port
Area
AK
11 I
Car~
Fac
tor
~,~t
i~
i
Pch. A~'
ea
A~1
3
Pch.
Fac
tor
AK14
Sid
ewal
ks
~i1
6
St.
Ligh
ts
A117
N
~ y
2
Hor
ses
" i
Com
mon
Gre
en
At 1
8 t
N 1
y
Com
mon
Rec
. A
I19
I N
1 y
2
H.
& B.
Use
A
I20
I N
1 y
2
Sew
er
AI21
I
N 1
y2
Vie
w
AI2
~
N'Y
2
Vie
w O
ual.
AI2
3 I
P ~ A 2
G 3
Traf
fic F
low
A
t24
H ~
A 3
L 3
Wat
er F
ront
A
25
N 1
y2
Beac
h Fr
ont
~i2
6
1 N
1 y
2 I
Doc
king
Rig
hts
AI2
7 I
N 1
y2
A,!~
....
N' Y~
FIEL
D U
SE o
NLY
- D
O N
OT E
NC
OD
E
Base
,ot
A!~
I L
I I
Coo
ling
(Duc
ted)
Gar
.
Gar
, C
ony,
Cav
ort
Pat
io
Dec
king
Pool
Fenc
e
Gue
st H
ouse
Fire
plac
e
Bui
lt~In
s
Stru
. Fa
ilure
Ti le
Roo
f
A J1
2
A J1
3
A Jt
4
A~m
A J17
A
J18
~20
A J2
1
A J2
2
~25
AJ1
3 N
1 y2
N ~ y2
N 1 y;
N I y2
N ~ y2
N I
y2
N 1
y2
N 1
y~
N';
Y~
N I
y2
N'~
-.r,2
L'~
M2
H3
N t
y:Z
Fire
plac
e C
ost
.~
AK
I~
~st-
- J.A
Kml
! i
i I
Pat
io A
rea
IAK
17
Pat
io F
acto
r IA
K18
Pool
Are
a IA
K!9
Po
ol E
xtra
s ]A
K20
Mi s
c, C
ost
IAK
21
Mis
c, S
troct
. C
ost
I A
K22
TOPO
GR
APH
Y ii
Gr.do
IA
L0i
I .!
~ ~3
/ Ba
nk
AL0_
~ N'Y
___L
J IA
.B3
I N'
Y'
A N;
"Y ~ "
IL°'l
.
Slop
e
Oth
er
t~
Figu
re 5
. Fi
eld
wor
k sh
eet
for
appr
aisa
l da
ta
County of Santa Clara
173
Office of the County Assessor 201 County Admirfistrat~on Building
70 West Heddin9 Street San Jose, California 95114 299-3941 Area Code 408
California
2.
3.
4.
5.
Date Recorded
Recorder's Deed #
Property Description #
Our records indicate you purchased this property.~
What was the full price?
a. Amount of cash down payment:
b. Please enter details concerning any balance:
(I) Ist Deed of Trust $ at Duration of Loanyears
(2) 2nd Deed of Trust $ at Duration of Loan years
c. Was a trade involved? Yes No
d. Outstanding Improvement Bonds against property $
e. Did price include personal property? Yes No
If yes, please estimate value of such property $
f. If this is income property, please enter the monthly gross income
as if it were 100% occupied $
Remarks: Please enter on the reverse side or by attachment any infor- mation you feel may help us to make a fair appraisal of your property.
If you would like mail concerning this property sent to a different address from the one above, please indicate below.*
If there are questions regarding this questionnaire please contact the Assessment Standards Division at 299-3941.
% interest
% interest
See Reverse Side.
Signature of Owner
Telephone Number
Date
(~, 7047 REV, 11/71 .~
Address
City, State, Zip
F i g u r e 6. Assessor 's q u e s t i o n n a i r e f o r f i n a n c i a l da ta
Sour
ce M
ap
1 ®Bl
ock
1 21
~
Blo
ck 4
17
2O
O
Blo
ck 8
16
19
< -&
Blo
ck 2
1st S
treet
Blo
ck 5
2nd
Stre
et
7~
B
lock
3
6 D
B,o
ok7
13
11\
12
Blo
ck 9
1
2B1 lO
ck 1
0
Num
ber o
f Seg
men
ts
Map
le A
v.
5 El
m D
r, 3
Coy
ote
Cr,
6 1s
t St.
4 2n
d St
. 4
= 22
seg
men
ts (re
cord
s)
Con
tent
s of a
Sam
ple R
ecor
d C
onte
nts
Map
le A
v. N
.W.
2 15
3000
0 31
000
18
1530
100
3010
0 20
5 20
5 42
42
5 10
0 19
8 10
1 19
9
Lege
nd
Nam
e T
ype
S
u{f
ix
Fro
m n
od
e m
~m
ber,
x,
y, m
ap
nu
mb
er
To
no
de
nu
mb
er,
x,
y, m
ap
nu
mb
er
Le
ft t
ract
, ri
gh
t tr
act
, le
ft p
lace
, ri
gh
t pl
ace,
{ef
t b
lock
, ri
gh
t b~
ock
Le
ft L
ow a
ddre
ss,
hig
h a
ddre
ss
Rig
ht
low
add
ress
, h
igh
add
ress
Map
6
Cen
sus T
ract
205
Pl
ace 4
2
:3
4~
Figu
re 7
. 'D
IME
' ge
ogra
phic
bas
e fi
le s
truc
ture
~[
Yea
r 10
I Y
ear
2
Yea
r 1
Zon
e 1 2 3 N
i C
rimes
C
rimes
...
I
#1
#2
I
Hou
ses.
..
Pop
ulat
ion
3 15
50
14
6 17
6
53
203
4 4
41
192
26
9 15
47
--4
Figu
re 8
. E
xam
ple
of t
able
s in
ext
ract
ed d
ata
base
176
r l Data I Extraction I Functions/ Presentation I Aggregation t
I l Data Subsetting I Description I
Large Data Base Management
h I [°'''Ex"iracted / I 1 Data Base i:~X4 I Management
J i\\
F ' I ; ; ©-.-© Large Data Base
Figure 9. Interactive data extraction and problem solving system
177
Census- Compatible Property Data
T Property I nfo. System
Transportation Zones Data Base
Dynamically- Aggregated Property and Census Data (Compatible)
II
Added Data (Public Lands, etc.)
Figure 10. Extraction and aggregation relating property and census data
Z U
m--W
X.J
Wm.-$
Ol.-
b~
.Z
W
W2
>
..JW
JO
::) I--
0 Z
H
W Z
~t W
U
Ul
Z Z
Z O
O
O
~
-i H
~
I"-- I---
1"- ~
t ~
U
H
U
H
lUL
¢~
W
ell gL
1E
Z "~
O
Y
0
H
Z~
gL
E
Z 0 Z L~
2>
U~
I-- Z W
2> w
_J
gl
p..,~ H
Z
o'1
~9
Z W
Z H
O
Z ~J
o E
2~
E
o~
E
o x LU
o') LL
[RP
S
2
~'14
TCH
CH
1
3
1 rr
No
¢H
S
fL
AN
K1
CH
t
Sr~
rN~R
[ C
H
16
rTT
YP
E
CH
2
L~T
Y
CH
10
2"
IP
cz
s rL
RN
K2
C
H
1 C
Z 7
rt.~
NK
3
CH
2
TC
! 6
m
m
m
m
m
CR
[RT
[ A
ND
[D
ZT
D
AT
A
D[|
CR
ZP
TZ
ON
|
ZB
fl
[HP
LO
YE
E
FZ
L[
~Z
TH
X
,Y
HA
TC
H
KE
Y
rRO
fl
UN
XflR
TC
H
ST
R[[
T
HU
R|[
R
ST
RE
[T
NR
H[
9T
RE
[T
TY
P[
NR
fl£
CO
D(
CO
OR
DZ
NA
T£
CO
OR
DZ
NR
T[
SC
RR
TC
HP
RD
~
0
]LIP
UT
R
OD
E
GE
T
PR
G[
6 R
[ST
OR
[ R
(PL
RC
[ R
OD
[ sO
N
UZ
(~
TIT
L[~
~'"
C
H[C
K
SR
U£
P
AG
E
RE
TU
RN
R
(DR
RU
...
...
[RA
S[
DE
LE
TE
L
INE
IN
SE
RT
B
LA
NK
L
ZN
[ C
OP
Y
LIN
E
DO
WN
P
RZ
NT
Fig
ure
11a
. D
ata
de
scri
pti
on
fu
nct
ion
CR
ER
T£
A
ND
[D
IT
S[L
.[¢
T][
ON
S
L,P
X<
nt
OR
X
~2
ttia
il
TH
EN
X
~t
;
Y(=
m
OR
Y
)lll
iJJ
88
0
TN
[N
yn
J;
~L
EC
T
MN
I[R
[ Z
IP
~ 9
51
25
SC
RA
TC
HP
AD
INP
UT
H
OD
E
=ON
G
ET
PA
GE
1
5
RE
ST
OR
E
RE
PL
AC
E
MO
DE
V
IEW
T
ITL
E~
"-'-
- C
HE
CK
S
AV
E
PA
GE
1
5
RE
TU
RN
R
ED
RA
M
....
....
E
RA
S[
DE
LE
TE
L
%N
E
INS
ER
T
BL
AN
K
LIN
E
CO
PY
L
IN£
D
ON
N
PR
INT
Figu
re 1
1b.
Dat
a se
lect
ion
func
tion
NT
I81
8~
2
NT
~I@
EB
4
MT
8~
18
~4
fl
T0
11
80
4
MT
me
eS
g4
h
Tll
eli
4
MT
~ll
il4
HT
00
00
04
MT808084
MT8580B4
MT008082
MT080804
MT008802
MT080004
MT
@08084
MT~08004
MT000084
MT
g0
02
84
MT~08~84
MT000804
MTeeeeQ4
MT8808~4
MT~gO~4
MT000004
GR
AD
DR
ES
S
MT8%OB~4
MT
~0
00
04
M
Tg
g~
ge
4
HT
80
08
~4
MTee0004
MTeeeee¢
1818
8 0
18
45
1
16
08
8
16
31
I1
19
1
81
14
8
01
48
7
e 8
15
39
8
01
49
9
8 ~
1~
54
01
93
0
01
74
2
81
34
8
a1
55
1
01
48
2
81
62
3
01
5~
8
81
55
0
81
06
0
01
8B
8
81
82
4
81
47
9
~1
51
4
01
50
2
02
24
3
~1
65
6
0 e
~9
4~
~
o9
18
e~870
ee
91
%
81
67
4
E1
43
6
LIS
TIN
G:
FIL
E
= £M
P
US
ING
P
lP2
: N
O
GL
EN
EY
RI[
G
L£
N[Y
RI[
G
L£
NF
IEL
D
GL
EN
FI£
LD
G
LEN
N
GLE
NN
G
LE
NP
INE
GL
EN
PIN
[
GL
EN
PIN
E
GLE
NU
NR
G
L[N
UN
A
GLE
NU
NA
G
LEN
WO
OD
G
RA
CE
G
RA
CE
G
RA
CE
G
RA
CE
G
UA
DA
LA
JAR
A
HA
MIL
TO
N
HA
RM
IL
HA
ZE
LWO
OD
H
ER
VE
Y
HE
RU
EY
H
ICK
S
HIC
KS
H
ILL
SD
~L
E
HU
MM
ING
BIR
D
HU
MM
ING
BIR
D
HU
MM
ING
BIR
D
HU
MM
ING
BIR
D
HU
ST
ED
H
US
TE
D SE
LE
CT
ION
Fig
ure
1 lc
. Li
stin
g of
dat
a se
lect
ed
SA
N3
0$
E
S5
12
5
15
91
72
3
29
|3|4
A
U
SA
NJO
8£
8
51
28
1
59
88
39
2
97
9E
1
DR
9
AN
~O
B[
95
12
5
15
92
58
1
28
$7
80
D
R
SA
N3
08
E
95
12
5
15
92
37
?
28
58
47
A
U
8A
N3
0S
E
95
12
5
15
91
59
3
29
63
20
A
U
SA
NJO
SE
9
51
25
1
59
17
58
2
96
37
6
DR
S
AN
30
S[
95
12
5
e
D~
S
AN
3O
S[
95
12
5
g
DR
S
~N
30
9E
9
51
25
AU
S
AN
JOS
£
95
12
5
15
96
14
8
29
36
96
S
~N
JO
S[
85
12
5
15
96
62
8
29
28
22
A
U
SA
NJ
OS
E
95
12
5
15
96
14
#
29
36
98
9
RN
JOS
E
95
12
5
15
92
29
1
29
33
14
A
U
SA
NIO
SE
9
51
25
1
58
75
14
2
93
09
5
RU
9
AN
30
SE
9
51
25
1
58
77
19
2
94
19
2
AU
S
AN
3OS
E
95
12
5
15
87
51
4
28
30
85
A
U
SA
N3
0S
E
95
12
5
15
87
68
7
29
32
32
A
U
SA
N/O
SE
9
51
25
1
59
26
33
2
67
30
4
~Y
6
AN
IOS
E
95
12
5
15
91
10
9
29
24
71
N
Y
SA
NIO
SE
9
51
25
1
59
62
99
2
93
94
4
AU
S
AN
30
SE
9
51
25
1
59
60
56
2
92
68
1
LN
SA
N/O
SE
9
51
25
1
59
66
32
2
97
08
4
LN
S
AN
JO
SE
9
51
25
1
59
73
40
2
96
30
5
AU
S
AN
JO
SE
9
51
25
1
59
16
25
2
93
92
3
AU
S
AN
JO
SE
8
51
25
1
59
28
33
2
89
48
2
AU
5
AN
ZO
SE
9
51
25
DR
S
AN
JO
SE
9
51
25
1
58
91
86
2
89
58
7
DR
S
aN
/OS
£
95
12
5
I59
92
25
2
89
35
2
DR
9
AN
30
SE
9
51
25
1
59
92
25
2
89
35
2
DR
9
RN
JOS
E
95
12
5
15
99
22
5
28
93
52
A
U
SA
NIO
SE
9
51
25
1
59
13
35
2
85
41
2
~V
S
AN
IOS
E
95
12
5
15
94
50
1
28
76
5~
=
EM
P
RE
TU
RN
P
AG
E FO
RW
AR
D
~A
GE
5