geographic base files: applications in the … insight and judgment would be aided and guided by...

34
GEOGRAPHIC BASE FILES: Applications in the Integration and Extraction of Data from Diverse Sources Patrick E. Mantey, Eric D. Carlson, IBM Research Laboratory, San Jose, California Abstract This paper addresses the development of integrated municipal data bases, with consideration given to political realities and to the sources of data now available in municipalities. First, the potential users and potential uses of a municipal data base are discussed, and an information system which would serve these users and uses is considered. Next, the "current" status of data bases in municipalities is reviewed and it is concluded that there is a large quantity of data available in many municipalities, but that integrated data bases fer supporting an information system are not usually a reality. The problem of building an integrated data base from the variety of data sources presented by local agencies is then addressed. A central ingredient for an integrated data base is the Geographic Base File (GBF) which permits the construction of extracted data files from these multiple sources to support information system applications. The concept of extraction, for the integration of diverse source files via geographic references, is developed (and a prototype implementation is described in the Appendix). Using the GBF, and source data from various municipal functions, extracted data bases can be rapidly built to serve a variety of applications of an information system in the decision-making situations of municipalities. I. APPLICATIONSOF A MUNICIPAL DATA BASE Municipal governments are, in essence, created to deliver services to a geographical area. There is an unusual variety, in comparison to private industry, of services offered. Local government is often structured (or fractured) along functional lines into special districts, as well as by geography. Such structuring has precluded concentration of power, but it has increased the complexity of planning, resource allocation, or management. Many of the problems in municipal government require decisions which are not routine. Rather, they require the professional insight and judgment of human decision makers who consider the specific conditions of each problem. Ideally, this

Upload: vuongkien

Post on 29-Jun-2018

218 views

Category:

Documents


0 download

TRANSCRIPT

GEOGRAPHIC BASE FILES: Applications in the Integration and Extraction of Data from Diverse Sources

Patrick E. Mantey, Eric D. Carlson, IBM Research Laboratory, San Jose, Cal i forn ia

Abstract This paper addresses the development of integrated municipal data bases, with

consideration given to po l i t i ca l r ea l i t i e s and to the sources of data now avai lable

in munic ipal i t ies. F i r s t , the potential users and potential uses of a municipal

data base are discussed, and an information system which would serve these users

and uses is considered. Next, the "current" status of data bases in munic ipal i t ies

is reviewed and i t is concluded that there is a large quantity of data avai lable in

many munic ipa l i t ies, but that integrated data bases fer supporting an information

system are not usually a r ea l i t y . The problem of building an integrated data base

from the var ie ty of data sources presented by local agencies is then addressed. A

central ingredient for an integrated data base is the Geographic Base Fi le (GBF)

which permits the construction of extracted data f i l e s from these mult iple sources

to support information system appl icat ions. The concept of extract ion, for the

integrat ion of diverse source f i l e s via geographic references, is developed (and a

prototype implementation is described in the Appendix). Using the GBF, and source

data from various municipal functions, extracted data bases can be rapid ly bu i l t to

serve a var ie ty of applications of an information system in the decision-making

si tuat ions of munic ipal i t ies.

I . APPLICATIONS OF A MUNICIPAL DATA BASE

Municipal governments are, in essence, created to del iver services to a geographical

area. There is an unusual var ie ty , in comparison to pr ivate industry, of services

offered. Local government is often structured (or fractured) along functional l ines

into special d i s t r i c t s , as well as by geography. Such structur ing has precluded

concentration of power, but i t has increased the complexity of planning, resource

a l locat ion, or management.

Many of the problems in municipal government require decisions which are not

routine. Rather, they require the professional insight and judgment of human

decision makers who consider the speci f ic conditions of each problem. Idea l ly , th is

150

insight and judgment would be aided and guided by appropriate information derived

from a comprehensive data base. This is the objective of a municipal information

system: to f ac i l i t a t e effect ive analysis and solution of specific problems by

supporting human decision makers with data resources and analysis functions in

readily usable form. Because a municipality provides services to a geographical

area, much of the data relevant to decision making or problem solving in local

agencies w i l l have geographical at tr ibutes, and can be given spatial interpretation

via maps. A key attr ibute of a municipal information system is the capabi l i ty for

displaying information in the form of maps. Another requirement for such systems to

be effect ive is that they support ready use by decision makers who know very l i t t l e

about computers. The system must help the decision makers develop their precise

objectives or decision c r i te r ia for solving a problem. The solution process in such

decision making requires exploratory analysis, selection of relevant data, and

meaningful data presentation in an interactive environment, A system which provides

such capabi l i t ies, called GADS (Geo-data Analysis and Display System) has been

developed and evaluated in several applications, such as police manpower allocation

and analysis of urban development policies [ I -4 ] . The evaluations of GADS indicate

that interactive analysis and display systems have a great potential in the

operations, management, and planning of municipal i t ies.

As an example, consider a municipality which maintains a computerized property

information f i l e (via the tax assessor function). Such a f i l e would have data on

each parcel, possibly including:

address

owner

zoning

improvements

date constructed

type construction

size (area)

current use

area

centroid

assessed value

I f this data were accessed via an interactive information system, a decision maker

could readily obtain, for example, the address and assessed value of a l l residences

constructed between 1960 and 1962 and having f loor area between 1600 and 1800 square

feet, on lots with 6800 to 7200 square feet. I f the user of the system were a

real-estate appraiser, this information in tabular form might be of value in

determining i f a part icular home is f a i r l y appraised (Figure I ) . A histogram giving

the distr ibut ion of assessed value of these homes would provide additional insight

151

(Figure 2) and a map re la t ing the average value of such homes in the c i t y ' s planning

areas to the ci ty-wide average (Figure 3) would provide the appraiser with

information in a spat ial framework.

I f the user of the information system were an assessor concerned with determining

the neighborhoods which could be considered equivalent for purposes of

computer-aided appraisal, addit ional data would be required. I f recent sales

records are the basis for ca l ibrat ing the assessment model, i t may be found that the

assessor's data alone cannot be used to model var iat ions in se l l ing price of houses

f i t t i n g the description above. Showing on a map the mean and variance of the

se l l ing price of such houses by neighborhood w i l l o f fer the assessor a visual means

for examining the qua l i ty of the assessment model. The display may cause the

assessor to consider other factors to explain the var ia t ions; e.g. crime rate, level

of public f a c i l i t i e s and services (such as the influence of an adjacent regional

park [5 ] ) or the influence of other near-by land uses.

In making decisions related to resident ia l zonings, pert inent questions and

information displays would re late to adjacent land use, the ef fect that the proposed

development would have on the mix of housing stock avai lable in the community, or to

the ef fect these new residents would make on the per-capita park acerage in the

area. The school o f f i c i a l s (and in some areas the local zoning author i ty) need to

evaluate the impact such a development w i l l have on the exist ing school f a c i l i t i e s .

Each of these questions, and many others, are of in terest to d i f fe rent decision

makers involved in the area of community development.

An example of the use of a municipal data base in resource al locat ion could re late

to f inding the best approach to the reduction of burglaries in a resident ia l

neighborhood. One group may advocate better street l igh t ing . I f the queries

re lat ing to these burglaries show they are day-time crimes, another approach

probably offers greater promise. Certainly the level of police patrols would be

examined. I f the area is found to have a high percentage of two wage-earner

fami l ies , and the majori ty of burglaries are during the week, then some suspicion

w i l l f a l l on the chi ldren. I f a large number of burglaries occur on school days, i t

may appear that the school-age children are not the perpetrators. However, i f the

data base contains school information so that i t can be determined that a school

adjacent to the neighborhood has f l ex ib le scheduling and that school children are

free to roam about o f f school grounds, the source of the trouble may have been

iden t i f i ed .

As a las t example, consider the application for a building permit for an automobile

service stat ion. With the recent wave of service stat ion abandonments, many

munic ipal i t ies are reluctant to grant such permits. S imi la r ly , f inancial

ins t i tu t ions are probably wary about loaning money for such ventures. For a l l

152

concerned,

required.

stat ions,

of the automobile ownership and disposable income in the surrounding areas.

In these examples i t has been assumed that the information system has access to

comprehensive municipal data base, However, such data bases are a r a r i t y today.

the next section the current status of municipal data bases is discussed, and

Section I I I an approach toward the provision of integrated data is offered.

and for the public good, a careful analysis of such a proposal is

Factors of in terest would include location and number of exist ing

t r a f f i c access and t r a f f i c patterns at the proposed s i te , and an estimate

a

In

in

I I . CURRENT STATUS OF MUNICIPAL DATA BASES

The importance of a comprehensive integrated data base to support decision-making in

municipal government has been widely recognized. There have been several d i f fe rent

approaches taken to the development of comprehensive municipal data bases.

One approach, which was popular in the 1960's, was the development of a

comprehensive "data bank". This data bank was usually generated as a special

co l lect ion or census~ and often was carried out and funded as part of a

comprehensive transportat ion and land use planning study. As a part of such

studies, detai led f i e l d surveys of land use (at the parcel leve l ) were conducted,

and survey data also was gathered on employment, income d is t r i bu t ion , and commercial

a c t i v i t y (Figure 4). The data acquisi t ion consumed a major portion of the study

resources, and did not provide any information re la t ing to many municipal services

(e.g. public safety) . Although accurate data was often gathered in the development

of these data banks, the i r value was shor t - l ived because they were at best a

snap-shot of the state of a very dynamic system, and no means were provided for

updating and extending these data banks. At about the same times the computer was

becoming a tool in the operations of local governments. The appl icat ion of

computers by munic ipal i t ies beginning in the 1960's can be characterized as a

funct ion-by-funct ion approach, with data processing introduced into tasks involving

high-volume routine transactions. The computer is generally u t i l i zed in those

functions which have previously been computerized in pr ivate industry: payro l l ,

accounting, b i l l i n g , budget status report ing, personnel records, etc. Also, the

1960's saw wide-spread use of computers in the processes associated with elections

and in operations of law enforcement agencies. Usually these applications were

isolated from each other~ and no attempts were made to make th is information

avai lable for use by other municipal functions°

The real property assessment function of local government in the la te 1960's began

to recognize the potential of computer appl icat ions. The court decisions in various

locales requir ing property to be appraised at current market value placed a s ign i f i can t l y increased work load on assessment o f f i c i a l s . In numerous areas, the

assessor has turned to computer-aided appraisal to meet these demands. I f a model

153

is to be bu i l t and calibrated to re l iab ly estimate fair-market value of residential

properties, a comprehensive real property data base is a must. These data bases

were constructed, often by computerizing the data contained on the assessor's f i l e

cards which were maintained on each parcel. Some property data bases, besides the

usual references to assessor map, book and page and to situs address, also added

geographic data such as a "centroid". Computerized real property systems required

the f ie ld acquisition of very comprehensive data. A work sheet for f ie ld surveys by

appraisers in a California county is shown in Figure 5. Note the detailed land and

building attributes used in this system. There was a signif icant increase in the

amount of data used and required by the introduction of this computerized appraisal

system, but much of the data on this sheet would not change from year-to-year, and

the appraiser in the f ie ld would only need to correct those data items which had

changed.

An additional requirement for computer-aided appraisal is data relating to current

sel l ing price of residential properties. This data provides the calibration

information for the regression models, and is available, depending upon the state or

local laws, from the registrar of deeds, from the collector of transfer taxes, or

from t i t l e companies. Assessors in some locales obtain this data by questionnaires

which buyers are required by law to complete and return (Figure 6). These sources

and/or others also can be used to obtain financial data (e.g. mortgage terms) for

the property. Clearly this data base is an integral part of a municipal data base

for applications such as i l lust rated by the examples in Section I.

Another approach related to the development of municipal data bases is characterized

by the USAC projects [6], part icular ly those of Charlotte, North Carolina, and

Wichita Falls, Texas. These c i t ies were funded by the Federal USAC project to

build Integrated Municipal Information Systems, (IMIS). The concepts of IMIS are

[6]:

"(1) (2)

Integrated data processing systems should inter-re late municipal processes.

A fundamental analysis of municipal operations and ident i f icat ion of related

data processing components is a precondition to the effect ive use of computers.

(3) A systems approach is required throughout the development process.

(4) The automation of municipal operations must exploit the fu l l range of computer

technology.

(5) Automation of routine municipal processes is a fundamental condition to the

realization of an IMIS.

(6) An IMIS views the municipality as a basic building block for intergovernmental

information systems.

(7) Municipal information systems are by-products of computer-driven, operations-

based systems.

154

(8) Adequately designed data processing systems can be transferred from one

munic ipal i ty to another.

(9) The integrated approach to municipal systems development must proceed on the

basis of a plan within which incremental i ns ta l l a t i on may be achieved in

accordance with the p r i o r i t i e s and resources of any par t icu lar c i t y . "

The USAC ef fo r ts involve c i t y governments, and were the consequence of studies such

as the IBM/New Haven project [7] and the USC/Burbank project [8 ] . These groups

sought to develop a methodology, via a "systems approach", for the appl icat ion of

computers by munic ipal i t ies [6] but did not resul t in system implementation of

integrated municipal information systems.

The USAC approach wisely focused on operational sources to provide the current data

required for municipal decision-making. In the implementations, which are s t i l l in

progress, the c i t i es have concentrated on building up operational uses of computers,

and on implementing these applications on a central computer under an integrated

data-base management system. The value of computers to these operational functions

has been confirmed [9] but the applications of IMIS in the areas of management

decision making are s t i l l to be demonstrated. One of the d i f f i c u l t i e s that must be

overcome in providing a comprehensive municipal data base, (even in ci~ies with a

f u l l y integrated and operational IMIS constructed according to the USAC philosophy),

is that complete in tegrat ion, where a l l municipal functions use the same computer

and data base management system, is not a l i k e l y prospect with current local

governmental structure and with the l imi ted resources of local governments. For

example the data pert inent to decision making in a c i t y may be gathered by another

agency, such as the tax assessor (and conversely), and may reside on d i f fe ren t

computers, under d i f fe ren t data management schemes and in d i f fe ren t f i l e formats.

In addit ion, problems of data secur i ty , compat ib i l i ty of f i l e s , and high processing

costs may make complete integrat ion unrea l is t ic for many munic ipal i t ies.

Special data co l lect ions, such as the U.S. Census, and data avai lable from state

sources, must also be readi ly incorporated into a municipal data base. With census

data gathered according to blocks, block groups and census t racts , with assessors

property data coded according to assessor map, book and page, with public works data

in state-plane coordinates, and school data gathered by school attendance area, the

building and maintenance of a t r u l y integrated municipal data base presents a

formidable task.

I I I . APPROACHES PERMITTING DEVELOPMENT OF INTEGRATED FILES

A completely integrated data base would have a l l data re la t ing to any functions of a

munic ipal i ty residing on the same computer system, under the same data management

system and organized and indexed to f a c i l i t a t e corre lat ion. This ideal is not

155

attainable, given present organizational structures and computing capabil i t ies in

most municipalit ies. However, i f the computerized f i l es of various municipal

functions are "properly structured", i t w i l l be possible to achieve the same

benefits as i f there existed a completely integrated municipal data base. In

addition, such an approach wi l l not require re-implementation of current

applications, but rather leaves the application data base in the control of the

function responsible for i ts primary maintenance and use.

The approach taken is to make use of data, when data f i l es are "properly

structured", to develop ef fect ive ly the results as i f there existed a completely

integrated data base without requiring that complete integration take place. This

should not be construed as an argument against integration. I f integration is

p o l i t i c a l l y and technically possible, provides required data security, and is

economically at t ract ive, i t should be implemented. Even with an integrated data

base, there w i l l always be decisions which require di f ferent groupings of data than

those supported by the integrated data base. (There w i l l also remain, in practice,

data sources which cannot be integrated.) So, the problem of providing a

comprehensive data base from multiple data sources is unavoidable and is not

completely solved by an "integrated" data base.

The "proper structuring" required to make data integration possible can be

i l lus t ra ted by example. I f one is interested in information about burglaries, and

wishes to relate this information to neighborhood conditions, data sources could

include police dispatch f i l es , criminal justice arrest f i l e s , census data and tax

assessor f i l es . Suppose the police dispatch data is used for burglary incidence,

and that such data is available in terms of police beats, e.g. the number of

burglaries in each beat for each day. I f one wishes to use census data for

socio-economic information, and i f the census tracts and beats have few common

boundaries, no small area information is obtainable relating these data sources.

Al ternat ively, i f the police dispatch data is captured by the street address of the

ca l l , and i f a directory exists for the c i ty which w i l l permit ident i f icat ion of the

census tract for each street address, then burglaries and socio-economic data can be

related at the census tract level.

"Proper structuring" of data f i les only has meaning with respect to potential uses

of the data ( i .e . data f i l es are not an end in themselves). I f the objective is to

offer data to support decision making in a wide range of problem areas, then the

data f i l es must be as detailed as possible, within the constraints of economics,

privacy and security. The detailed data can then make possible the development of

the widest variety of data subsets and aggregations, and is more l i ke l y to permit

development of the required set of integrated data for a part icular decision-making

context. An additional requirement is the existence of data elements in each f i l e

which w i l l f ac i l i t a t e relating the data to that from dif ferent f i l es . (In this

156

paper, geographical references w i l l be singled out as data elements serving th is

function in municipal f i l e s . Common references to account numbers, project numbers

or personnel i den t i f i e r s are other examples of data elements permitt ing the

re la t ing of data from d i f fe ren t source f i l e s . ) A set of f i l e s w i l l be called

"properly structured" i f they contain information permitt ing the re la t ing of data

from d i f fe ren t source f i l e s so that integrated subsets of data at the appropriate

level of detai l can be developed to support the requirements of problem solvers.

Because municipal government is a service del ivery function, mutual references to

geography can often be used to re late data from the diverse f i l e s avai lable in

munic ipa l i t ies. A powerful f i l e in f a c i l i t a t i n g the re la t ing of data, based on

these common geographical references, is a Geographic Base Fi le (GBF).

Functional ly, the GBF contains data to support the re la t ing of data from other f i l e s

to geographical location and also the display of th is data on a map. The creation

of a GBF for a munic ipal i ty is a key requirement in the development of a municipal

data base from source f i l e s . Several d i f fe ren t approaches have been taken.

The simplest GBF is a f i l e sometimes cal led a Property Location Index (PLI) which

contains a l i s t of the val id addresses in the munic ipal i ty and an x,y coordinate for

each. This approach is the one used in Lane County, Oregon, and by the Assessor in

Santa Clara County , Cal i forn ia. To make th is more useful, a l i s t of public place

and st reet intersect ions and the i r x,y coordinates is appended. With such a GBF i t

is then possible to automatical ly convert addresses (in the police ca l l f i l e for

example) to x ,y coordinates. I f the GBF also contains the police beat, census

t rac t , and munic ipal i ty for each address, then i t is very simple computationally to

count the number of ca l ls in each beat. Evaluation of ca l ls by census t ract would

also permit consideration of socio-economic data with the crime data (of course,

police o f f i cers could also encode ca l ls by beat and census t rac t , but th is approach

is l i ab le to s ign i f i can t errors and seems to be a poor use of police manpower).

The most detai led GBF's contain d ig i t i zed land parcel boundaries, easement

locat ions, bui lding out l ines, u t i l i t y placements, and even topographic information,

along with s t reet address information on a l l parcels and names of a l l public lands

and bui ldings. This GBF is at the level of deta i l of surveyor's data, and is

suitable for engineering applications and detai led map bui lding. Ottawa, Canada's

National Capital Commission [ I I ] has pioneered in the development of th is kind of

GBF.

The most common GBF at the present time is the resu l t of work by the U.S. Census

Bureau in conjunction with the 1970 Census. Using the Metropolitan Map Series, a

massive feature label ing and d ig i t i za t i on was performed for 200 major metropolitan

areas in the United States. The resul t ing computerized maps were called the DIME

(Dual Independent Map Encoding) f i l e s [12]. Each entry (record) in the DIME f i l e

157

represents a l ine segment (a portion of a street segment, ra i l road, creek, c i t y

l i m i t , e tc . ) . Figure 7 shows a sample record and the map data from which the record

is derived. The segment has a "From" node and a "To" node, as well as a Left and

Right side. Thus, each entry describes i t s two ends and i t s two sides. The

description of the nodes includes x,y coordinates, produced by the Census Bureau's

map d ig i t i za t i on . The nodes are label led as f a l l i ng on a par t icu lar map of the

series, and given a sequential number within map and census t rac t . The other data

on each entry is feature ident i f#cat ion for the segment and i t s sides. Each feature

is ident i f ied by a pre f ix , name, su f f i x , type (e.g. , North Army Southwest Street) ;

only name and type are required (e.g., Coyote Creek), The description of a side

actua l ly describes the adjacent land. The census t rac t number, the block number,

and the place (c i ty ) code are included for each side. High and low address ranges

for the st reet segment sides are also given. The records are ordered by feature

name and low address (features without addresses

Administrative overlays (e.g. beats, census t racts)

of segments of this f i l e . Used in combination with

these computerized overlays f a c i l i t a t e development

any specif ied overlay map.

have no secondary ordering).

can be readi ly defined in terms

"point- in polygon rout ines",

of counts of events in areas of

As in the development of any large machine readable f i l e , high startup costs, data

errors, and poor standardization have hindered development of GBF's. But the key

problem in the development and use of a GBF is edi t ing (corrections and addi t ions) .*

Because of the startup cost, accuracy, and standardization problems, edit ing is a

key aspect of development. I t is pa r t i cu la r l y important to ve r i f y the topological

and coordinate accuracy of the f i l e . Even i f there were no developmental problems,

"geographic" changes, such as new streets or changing area boundaries, make f i l e

edi t ing essential to a useful GBF.

The Census Bureau and related ef for ts have produced programs for o f f - l i ne creation

and batch edi t ing of a DIME f i l e [12-13]. These programs require a d i g i t i ze r for

data entry and take large amounts of computer and c ler ica l time for edi t ing.

Although the procedures were used to create 200 GBF's, there has been l i t t l e

ed i t ing, and hence l i t t l e use, of these f i l e s , Some c i t i es (e.g., [14]) have

developed the i r own GBF's s imi lar to DIME. These ef for ts are also characterized by

the use of a d i g i t i z e r and batch computer programs for f i l e creat ion, and by

cumbersome f i l e edit ing procedures. There have been a few ef for ts to develop

on-l ine d ig i t i za t i on systems, (e.g., [15], and there are experimental systems which

could support on-l ine d ig i t i za t i on with visual feedback [16-17]. Yet, none of these

systems provide a l l of the capabi l i t ies required for e f fect ive GBF creation and

edi t ing.

*The Census Bureau uses the word "edi t ing" to mean topological ve r i f i ca t i on , and uses "update" for what we define as edi t ing in th is paper.

158

Conclusions drawn from the IBM study [ I0 ] regarding the requirements for in terac t ive

GBF edi t ing and maintenance were:

I . There must be a capabi l i ty for projecting hard copy maps and/or photographs onto

the display screen. I t must be possible to select a rb i t ra ry (contiguous)

sections of the maps, and to produce a range of scales.

2. The display system must be able to handle mul t ip le , non-rectangular geographic

coordinate systems.

3. The display system must be able to produce both text and l ines, with at least

three colors for l ines (in order to be able to dist inguish two maps).

4. The display system must enable selection of any addressable point on the screen,

whether or not anything is displayed at that point.

5. The creation and edi t ing functions must include: d ig i t i za t i on of base and

overlay maps; label ing of points, l ines, and polygons in the maps; moving and

delet ing points and l ines; display of any section of the maps, and of speci f ic

points, l ines and polygons; and checking for topological accuracy.

IV. DATA EXTRACTION

A. Philosophy and Operation

In the previous section, the combination of a GBF and properly structured source

f i l e s containing geographic references were iden t i f i ed as the basis for of fer ing a

problem solver an e f f ec t i ve l y integrated data base to support decision making.

Recent studies of in teract ive information systems applications in the solution of

unstructured problems [4,18,19] have ident i f ied the need for reduced subsets of data

for supporting the problem solving. Data reduction is required because:

a. the po ten t ia l l y useful data base w i l l be much larger than the data actual ly

used, b. the user w i l l want access to varying levels of deta i l in the data base,

c. the relevant subset of data w i l l vary during the problem-solving process,

d. some data (e.g, census and event data such as police ca l ls ) may not be

compatible at the detai l level of the data captured in the source f i l e s .

Extraction is a process by which an integrated subset of data is developed from the

source f i l e s relevant to a par t icu lar problem-solving appl icat ion. Extraction thus

provides the user with a capabi l i ty e f f ec t i ve l y indist inguishable from a f u l l y

integrated data base, without requir ing the development of such an integrated data

base at the deta i l level of the source f i l e s , i . e . i t provides a " v i r t u a l l y "

integrated data base.

The extract ion approach builds a data base subset from the source f i l e s according to

a pr io r i speci f icat ions for a par t icu lar appl icat ion. Total integrat ion of the

source f i l e s , and dynamic aggregation and subsetting of the data at the time the

159

data

a t t rac t ive in today's environment because:

items are required is of course an a l ternat ive approach. This approach is not

a. for any appl icat ion a l l the relevant source f i l e s would have to be on- l ine to

support conversational interaction,

b. protection of the source f i les would be more d i f f i cu l t ,

c. development of conversational information systems would require additional

standardized data structures and codes for the dynamic aggregation and

subsetting,

d. better conversational performance is possible when the problem solving accesses

a smaller data base.

Clearly the development of a fully-integrated, on-line data base from the source

f i les , solely for problem-solving applications, is not (currently) economical. Such

an approach would also require special procedures for keeping the duplicate records

current and consistent. With the extraction approach, the subset of data thought to

be relevant to the particular problem is developed and made accessable to the

problem-solving system in an extracted data base. The subset is an extract from the

available source f i les at the level of detail desired by the decision maker for

(that phase of) problem solving. This extracted data base may be thought of as a

set of tables. Each table contains values for a set of variables extracted from the

source f i les . For each variable there is one value in the table for each basic unit

(e.g. zone, account, employee) used for the problem solving. New variables can be

added direct ly to the extracted data base as an added column of the tables. An

example of an extracted data base is shown in Figure 8, for use in crime analysis.

The extracted data tables are formed from: source f i les containing lO years' data

on crimes, l a n d use, and population; a special purpose map of police

beat-building-blocks (basic zones); and an extraction specification for computing 20

crime categories and selecting population and number of houses by year. The result

is lO tables (one for each year) giving crime by category, population, and number of

houses for each basic zone.

The extraction approach leaves control of the operational source f i les in the hands

of the originating application. The extracted data bases are "snapshots" which are

current at the time of their development. The problem solver can re-invoke the

extraction process at any time to get a more current extracted data base. This

process decouples the data base used in problem solving from the operational f i l es ,

and assures the problem solver that the data base upon which he makes decisions is

under his control. This user control of the extracted data base, and the potential

performance advantages offered by access to the smaller extracted data set as

compared to access to the total set of data, make the extraction approach attractive

even in installations where an integrated data base exists (as with a complete IMIS

as in the USAC approach described in Section I I ) . Extraction is simplified with the

160

existence of an integrated data base, because there are then no d i f f i c u l t i e s with

f i l e formats and data conversion.

B. Extraction System Architecture

The architecture of a municipal information system designed using the data

extract ion philosophy would have three major sets of programs and data bases (e.g.

Figure 9). The f i r s t set would be the source data f i l e s and related programs for

data entry, update, and other routine processing. These f i l e s should be "properly

structured" as defined ea r l i e r in th is paper. The data base management for these

f i l e s may be an integrated system, such as IBM's IMS, or a more t rad i t iona l system

such as those provided by IBM's DOS. The second component includes accurate

reference f i l e s ( indices), such as the Geographic Base F i le , Programs for

maintaining these f i l e s , and programs for providing the data extract ion functions of

data matching, subsetting, and aggregation. This component is the key to

integrat ing the data base of source f i l e s . The GADS experience indicates that i t is

possible to develop general purpose programs for the data extract ion functions.

Essent ia l ly , these functions provide integrat ion through user-invoked processing,

rather than through the complicated data structures and accompanying processing

overhead often found in integrated data base systems. The data extract ion programs

are the interface between the municipal data base and the th i rd component of the

archi tecture, the extracted data bases and associated decision support system. The

GADS analysis and display functions are an example of a dec%ion support system for

non-programmer users. A data extract ion interface can provide mult ip le extracted

data bases for a single decision support system, or for mult ip le decision support

systems. For example there might be decision support systems for cash management,

budget preparation, urban planning, computer-assisted appraisal, crime analysis,

e tc . , a l l supported by a common extract ion interface. The data management

techniques for the extracted data bases should be ta i lo red for each decision support

system. However, the data access techniques may be the same as those provided for

the source data f i l e s .

The deta i ls of the data extract ion architecture and the implementation requirements

are beyond the scope of th is paper and there w i l l be i ns ta l l a t i on -spec i f i c comments.

(An extract ion implementation is b r i e f l y described in the Appendix). There i s ,

however, one general requirement for any data extract ion system. This requirement

pertains to the data aggregation functions of extract ion and can be described by

considering examples of the data sources encountered in municipal governments and

the kinds of extracted data to be developed from these sources. Consideration here

is l imi ted to data which can be related to points or areas. Data related to

networks, budget items, part numbers, e tc . , should be handled in an analogous

fashion.

t61

I , Compatible data

This is the easiest, and for tunate ly the most frequent s i tua t ion , i f data are

captured as specif ied in Section I I I . The data in source f i l e s which can be

ident i f ied with geographic points (x,y) can be d i rec t l y related. I f the extracted

data base is to be relevant to a study of slum dwell ings, fo r example, and i f health

cases, f i r e alarms and building code v io lat ions are a l l data sources which are

avai lable at the event leve l , ( i . e . by address) then an extracted data table showing

incidence of each of these events for specif ied address can be d i rec t l y developed,

Another frequently used extracted data base is the tabulation of such event data by

geographical area, in terms of a specif ied map. (The extracted data base in Figure

8 is an example of th i s ) . Extracted data bases in such cases are obtained by

matching coordinates of events to the corresponding map areas (via point-in-polygon

processing of the event coordinates against the map boundaries spec i f icat ion) .

Figure lO i l l u s t ra tes the extract ion and aggregation to relate property (assessment)

data and census data to support inqui r ies at compatible levels (e.g,) blocks or

block groups), and fur ther aggregation to support t rans i t planning models.

2. Non-compatible area data

I f data is avai lable by areas in the source f i l e s , and these areas are not

compatible ( i . e . one map is not a subset of the other), then the extract ion process

is more complicated. For a chosen set of variables from the source f i l e s , there is

a minimum level of aggregation at which an extracted data base is possible. For

example, school attendance areas and police beats (and therefore the associated

data) may only be compatible at the census t ract leve l , i , e . they may both be

(d i f fe ren t ) f iner par t i t ions of census t racts . The extract ion process should a le r t

the user to the non-compatibi l i ty and display for the user the minimum level of

aggregation necessary for compat ib i l i ty of the data sources of in teres t , in the form

of a map, and permit the user to specify fur ther aggregation from this map as

desired.

I f the user desires an extracted data base at a deta i l level f i ne r than is

compatible with the data sources given, the user must supply addit ional information.

For example, suppose the user is studying property values vs age d is t r ibu t ion of

inhabitants, with the age data on c i t izens avai lable from the census only at census

t ract levels of aggregation. Compatability exists at the census t rac t level . Any

f iner detai led extracted data base, at the c i t y block level for example, could only

be developed i f the user is w i l l i ng to make assumptions (such as homogenity of the

d is t r ibu t ion of population ages in the census t rac t ) .

V. SUMMARY AND CONCLUSIONS

The development of information for decision-making in munic ipal i t ies requires

integrat ion of data from the various operational f i l e s which are generated in local

162

government. Even when an integrated municipal data base does not ex is t , i t is

possible to develop integrated data from properly structured source f i l e s in

conjunction with a well-maintained Geographic Base F i le . The current sources of

information developed in munic ipa l i t ies , in par t icu lar the property data of the tax

assessor function and the operating f i l e s of various service del ivery functions,

provide a r ich source of information, augmented by special col lect ions such as the

U.S. Census.

Data Extraction is the process of developing integrated data subsets from diverse

source f i l e s to support in teract ive problem solving. Extraction provides the

interface to large data bases of source f i l e s and provides data descr ipt ion,

subsetting and aggregation functions. Our experience with GADS has shown that data

extract ion is useful when the user or problem character is t ics require access to

varying amounts, de ta i l , and selection of data, and conversational (rapid response)

interact ion with a decision support system. These character ist ics are l i k e l y to be

encountered when designing decision support systems for nonprogrammer, professional

users working on unstructured problems. The data extract ion interface matches the

functional and response time requirements of in teract ive decision support, can be

implemented on a var ie ty of computer system configurat ions, and can reduce the

operating costs of the decision support system.

Because data extract ion operations can produce mult ip le extracted data bases, with

d i f fe ren t structures~ a single data extract ion interface can support mul t ip le

decision support systems. In addit ion, ex ist ing decision support systems can be

supported and enhanced by data extract ion without major program revis ions.

163

APPENDIX: An Extraction System Implementation

A project in the IBM Research Division has developed an in teract ive Geo-data

Analysis and Display System (GADS as a vehicle for studying in teract ive problem

solving [ I - 4 ] . GADS supports nonprogrammer users solving unstructured problems

where the relevant data can be related to a geographic location. Examples of the

problems for which GADS has been used include: land use planning, police manpower

a l locat ion, school d i s t r i c t i n g , and commercial s i te locat ion. The need for data

extract ion was recognized during the f i r s t studies of the use of GADS. In

par t i cu la r , the need for data aggregated to a var ie ty of geographic levels (e.g. ,

block, police beat, census t rac t , neighborhood), and changing data needs expressed

by users indicated the inadequacies of the s ta t i c , special purpose data base and the

one-level, integrated data base approaches.

GADS data extract ion is configured essent ia l ly as shown in Figure 9. The extract ion

implemented in GADS is l imi ted to compatible event data. There is a requirement

that each record of each f i l e in the large data base contain a geographic code (such

as an address, x,y coordinates or block number) so that extracted data can be

related to points, l ines, and polygons on a map. A u t i l i t y program is provided to

transform geographic codes into x,y coordinates i f necessary for data extract ion or

display. A data base developed by extract ion is a table; an example is shown on

Figure 8. Adding another crime type, acres of commercial land use, or

re-aggregating by census t racts would take only a few minutes.

In the GADS implementation the large data base management system is a special

purpose one designed to handle f ixed format f i l e s with no hierarchies or repeating

groups. Simultaneous access to mult ip le f i l e s , and shared access to single f i l e s

are not supported. Mult iple f i l e extractions are handled by consecutive extractions

from the individual f i l e s . Sequential and d i rect access I /0 are provided.

Character, f ixed binary, packed decimal, and f loa t ing point (binary) data

representations can be used. The ent i re large data base is stored on disk, and

there is a u t i l i t y for loading f i l e s from tapes.

Figure I I gives examples of the data description and subsetting capab i l i t i es . The

data description implementation allows d i f fe rent formats to be used for the same

f i l e or the same formats to be used for d i f fe ren t f i l e s (Figure l l a ) . Seven data

types are allowed. The subsetting language includes constructs for : subsetting

based on any ari thmetic or logical combination of the items in a f i l e , creating of

new items, conditional subsetting or creation (IF, THEN, ELSE), and function ca l ls

(Figure l l b . ) . Results from subsetting can be displayed as l i s t s (Figure l l c ) or as

locations on a map (Figure l l d ) . Using the display capab i l i t i es , two dimensional

subsetting is possible. That is , the user can draw a polygon on the screen, and

select only those elements of a f i l e whose location is within that polygon. This

f a c i l i t y is much more user-oriented than algebraic specif icat ions for subsetting,

164

and other graphic subsetting operators would be useful (e.g., display a l l the crimes

of the same type as the one being pointed a t ) .

The aggregation operations in the implementation are rest r ic ted to forming the

extracted data base for the GADS analysis and display functions. This data base is

aggregated by areas of a map. I t is stored on disk, and is accessed by column name.

GADS is implemented in FORTRAN, but the data extract ion components were implemented

in PL/I because of i t s larger set of data types, and better functional s u i t a b i l i t y

for the extract ion tasks. The combination system runs on the IBM S/360 or S/370

series under the Time Sharing Option (TSO). The combination requires 220K bytes of

main storage. Separating extract ion reduces the main storage requirement to about

12OK. The user terminals may be IBM 2250s or storage tube display terminals. The

I/O time from the large data base, and the data rate to the terminal are the

l im i t i ng facotrs in extract ion response times ( i .e . selection and aggregation times

are negl ig ib le compared to I/O t imes). Although f i ve minutes may be required to

l i s t or display an ent i re f i l e , the user can see the results unfolding (e.g. the

selected items are l i s ted as they are selected). Thus users seem w i l l i ng to wait

during extract ion. Af ter a l l , the batch mode equivalent capab i l i t ies have response

times of days, and manual methods have response times of weeks or months.

165

REFERENCES

[ I ] P.E. Mantey, J. L. Bennett, E. D. Carlson, Information for Problem

Solving: The Development of an Interactive Geographic Information System.

IEEE Int. Conf. on Communication, Vol. I I . Seattle, Wash. June 1973.

[ 2] E. Jo Crist iani, R. J. Evey, R. E. Goldman, P. E. Mantey. "An Interactive

System for Aiding Evaluation of Local Government Policies," IEEE Transactions

on Systems, Man & Cybernetics, Vol. SMC-3, No. 2, March 1973, pp. 141-146.

[ 3] E.D. Carlson, J. L. Bennett, G. M. Giddings and P. E. Mantey. "The Design and Evaluation of an Interactive Geo-data Analysis and Display System,"

Proceedings of the IFIP Congress 74, International Federation for Informa-

tion Processing, Stockholm, August 1974. North Holland Publishing Company, Amsterdam, 1974.

[ 4] E.D. Carlson and J. A. Sutton, A Case Study of Non-programmer Interactive

Problem-Solving, IBM Research Report, RJ 1382, IBM Research Laboratory, San Jose, California, April 1974.

[ 5] T.R. Hammer, R. E. Coughlin, E. T. Horn IV, "The Effect of a Large Urban

Park on Real Estate Values," Journal of the American Insti tute of Planners, Vol. 40, No. 4, July 1974, pp. 274-277.

[ 6] "City Hall's Approaching Revolution in Service Delivery," Nation's Cities, January 1972.

[ 7] Concepts of an Urban Management Information System," a Report to the City

of New Haven, Connecticut, by Advanced Systems Development Division, IBM Corporation, Yorktown, January 1967.

[ 8] A Municipal Information and Decision System. University of Southern Cali- fornia, School of Public Administration, 1968.

[ 9] R.L. Stickrod and L. C. Martin. Data Processing: Analysis of Costs,

Benefits, and Resource Allocations. Lane County, Oregon, Management Report, February, 1973.

[ I0] G.M. Giddings and E. D. Carlson, An Interactive System for Creating,

Editing and Displaying a Geographic Base File. IBM Research Report, IBM Research Laboratory, San Jose, California, 1973.

[ I I ] D.C. Symons, A Parcel Geocoding System for Urban and Rural Information, Ottawa, Ontario, National Capital Commission, 1970.

[12] U.S. Bureau of the Census, Census Use Study, The DIME Geocoding System Report No. 4, Washington D.C., 1970.

166

[13]

[14]

[15]

[16]

[17]

[18]

[19]

U. S. Bureau of the Census: Census Use Study, The DIME Edjt in ~ S~stem

Washington D.C.~ 1970.

R. Ju l i , Geo-modeling: A Local Approach, Eugene, Oregon, Lane Council of

Governments, 1972.

R. D. Hogan, Remote #raphic Terminal and Urban Geographic Information

System Demonstation, Gaithersburg, Maryland, IBM Federal Systems Center,

1968.

R. D. Merr i l l , "Representation of Contours and Regions for Ef f ic ient

Computer Search," Communications of the ACM, Vol. 16, No. 2, February 1973,

pp. 69-82.

B. V. Saderholm, "Paper 'Keyboard' Runs Experimental IBM System," IBM

Research Division Press Release, Yorktown Heights, N. Y., March 8, 1973.

R, M. Cyert, H. A. Simon, and D. B. Throw. Observation of a Business

Decision. Journal of Business, 29, (1956), 237-248.

D. M. S. Peace and R. S. Easterby. The Evaluation of User Interaction with Computer-based Management Information Systems. Human Factors, 15, April

1973, pp. 163-177.

__~

AR.CF I

__A r)F)RE 9, °,

NUMBE R.

YEAR

__

~

CONST

SIZ

E

FLO

OR

.~_.S ~. P n

AR

EA

VA

LU

E

123-06-174

I00 ABBYWOOD

CT.

196.2

7145

12~--06--175

!02

A BBvwooD

CT:

!9~2

6~65

123-06-176

104 ABBYWOOD

CT.

1962

6500

123-06-!./-7

106 ASBV~Q~D

CT

: 1961

7!25

123--06-150

112 AB!NATE

LN

1960

7200

[P~-OA-! ~I

!!4 AS!~!^TE

t~-

!e60

71R~

123-06-1152

116

AS]NATE

LN.

o 0

I. 6~

6995

__123--06-!53

117

A.S

!NA

T~

iN:

--!q60

6q55

!850 -

31000

123-06-154

120 ABINATE

LN

. 1860

7095

1900

32000

!_~--06--!58---]q?

AB!NATE

L,~! ....

!<)6!

7!55

!750

-29000

123-06-160

III ~BINATE

[N.

1960

7005

1975

34000

~3-06

-]A1

11

3

~8

I~:A

TE

iN

, !qAO

6@95

!650

27

00

0

123-06-162

115 ABINATE

LN.

1960

7100

1750

30000

]P~--06--!63

!IY

~N

^,T

E

t~:

!060

7055

1780

30000

123-06-184

31

AFTON

CT.

1960

7020

1650

29000

1?~-oA-I~

59 ~TON

CT-

!960

7!65

1600

_27n(~0

123--06--186

35

A~Tf]N CT.

1961

7190

1775

29000

1675

2 8000

1650

29000

1600

27000

1650

25~

1800

35000

I ~

Nr~

~7

NN

N

1700

30000

Figu

re 1

. Ta

bula

r di

spla

y of

ass

essm

ent

data

OU

PS 1

00

. O.

~UIO

SC

~L(

R[D

R~M

RE

TUR

N

i .

i'.z

'"

if!: 11

1~11

Zll

iJl.

UA

RN

AM

[ U

NIT

U

S[M

AP

Z

ON

E

~P

[ S

;N

-1

g

OG

R

19

Z

OH

ES

YM

D

co

Fig

ure

2.

His

togr

am d

ispl

ay o

f ho

usin

g va

lues

ZO

NE

SY

MrB

l

SU

BM

AP

OU

[RL

AY

M

AP

6

ZO

NE

-UA

LU

ES

S

TA

TE

ME

NT

S

SS

MO

DE

EX

PA

ND

S

HR

INK

N

OR

MA

L

CL

EA

R

RE

TU

RN

ZON

£S

? FIND

ZO

NE

cO

Fig

ure

3a.

Map

dis

play

of

rela

tive

hous

ing

valu

es

MA

PS

+ ._

*÷.,

+

"+~'

F ~

l_-"

~_

"~

I

* ÷+.

~*+... ÷

*

%

- _

ZO

NE

-UA

LU

ES

S

TA

TE

ME

NT

S

~O

NE

SY

MB

I

SU

BM

AP

E

XP

AN

D

SH

RIN

K

NO

RM

AL

OV

ER

L#

Y

M#P

e

g

SSM

OD

E

CL

EA

R

RE

TU

RN

ZO

NE

1 t ? F

IND

Z

ON

E

C)

Figu

re 3

b.

Sim

plifi

ed d

ispl

ay o

f ho

usin

g va

lues

171

AVE $ IN

ACCESS $

ACCESS

AVLAND-S AVLAND-M AVLAND-C AVLAND-I

GROWl NXS GROH NXM

ODWU/A-S ODWU/A-M ODNU/A-T

EMP-MFG EHP-WHOL EMP-COMH EMP-TC&U EMP-GOVT EHP-TOTL

EHPDEN-C

HHOLD0-6 HHLD6-10 HHLD1015 HHLDZ5+

ISHOPCTR IBAY

OCDWUN-S OCDWUN-M OCDHUN-T

OCLAND-S OCLAND-M OCLAND-C OCLAND-I

PRODEN-S PRODEN-M

POPUL-S POPUL-M POPUL-T

POP/HH-S POP/HH-M POP/HH-T

RES-LAND

SLOPE

TSKTR

DISP SIN

S EI'IER FLOOD

AVERAGE IHCOtIE PER HOUSEHOLD

ACCESSABILITY TO DISPOSABLE INCOME

ACCESSABILtTY TO EHPLOYMENT

ACRES AVAILABLE FOR SINGLE FAMILY DEVELOPMENT ACRES AVAILABLE FOR MULTIPLE FAMILY DEVELOPMENT ACRES AVAILABLE FOR COMMERCIAL DEVELOPMENT ACRES AVAILABLE FOR I~DUSTRIAL DEVELOPMENT

GROWTH FACTOR IH SINGLE FAMILY GROWTH FACTOR IH MULTIPLE FAMILY

NO SINGLE FAMILY DWELLIt!G UNITS PER RESIDENTIAL NO MULTIPLE FAMILY DWELLING UNITS PER RESIDENTI TOTAL NO DUELLING UNITS PER ACRE OF RESIDENTIAL

NO. OF Et, IPLOYEES HORKING IN MANUFACTURING NO. OF EHPLOYEES WORKING IN WHOLESALE AND TRUCK NO. OF EHPLOYEES WORKING IN COMMERCIAL(RETAIL) NO. OF EMPLOYEES WORKING IN TRANS, COMMUN,AND U NO. OF EMPLOYEES WORKING IN GOVERNMENT TOTAL NO OF EtIPLOYEES

NO. COHHERCIAL EMPLOYEES PER ACRE OF COMMERCIAL

NO. OF HOUSEHOLDS WfTH IHCOME: 0- 6000 NO. OF HOUSEHOLDS HITH INCOME: 6000 -10000 NO. OF HOUSEHOLDS ~qlTH INCOME: lO000-15000 NO. OF HOUSEHOLDS I'IITH INCOME:IS000+

ZONES OF ANTICIPATED SHOPPING CENTERS ZONES TREATED AS BAYLANDS

NO. OF EXISTING SINGLE FAMILY DWELLING UNITS NO. OF MULTIPLE FAMILY DWELLING UNITS TOTAL NO. OF SINGLE AND MULTIPLE FAMILY DWELLIN

ACRES OCCUPIED BY SINGLE FAMILY DWELLINGS ACRES OCCUPIED BY MULTIPLE FAMILY DWELLINGS ACRES OCCUPIED BY COMMERCIAL DEVELOPMENT ACRES OCCUPIED BY INDUSTRIAL DEVELOPMENT

PROJECTED DEI;SITY FOR SINGLE FAMILY DEVELOPMENT PROJECTED DEHSITY FOR MULTIPLE-FAMILY DEVELOPME

TOTAL POPULATION IN SINGLE FAMILY D#IELLINGS TOTAL POPULATION IN MUt_TIPLE FAHtLY DWELLINGS TOTAL POPULATION

POPULATION PER HOUSEHOLD FOR SINGLE FAMILY DWEL POPULATION PER HOUSEHOLD FOR MULTIPLE FAMILY DW POPULATION PER HOUSEHOLD FOR ALL DWELLINGS

RESERVED LAND-NOT AVAILABLE FOR CURRENT DEVELOP

PERCENT OF LAND t i lTH LESS TH#N 10~ SLOPE

CHANGE FACTOR IH INTRAZONAL TRAVEL TIME

DISPOSABLE I I;COHE PER ItOUSEHOLD

SEWER SERVICE DISTRICT FLOOD CONTROL AREA

Figure 4. Land use/transportation planning data base [2]

App

'l. D

ate

App

rais

er N

o.

Site

Use

Cod

e

Zone

Tota

l Pr

op.

Val

.

Land

Val

ue

AB04

t

~ ~

l i

I

Imp. Va,

ue

AB06

1 i

i i

i i

New

Lot

Val

ue

AE02

[

I I

I I

I ~

: =

:I_

lee'

d, D

ate

A~_

_~T

I

l ~

I I

F4-1

- 4

Sal

es P

rice

Con

firm

ed

N 1 y

2

TEM

POR

ARY

VA

LUE

Bo~rd

Ac,~o

. ~

I ~

;;

Oth

er

I AF

03 |

N

~-

IMM

EDIA

TE A

REA

Mkt

, D

eman

d J

AGO

1 I

P i

A 2

G 3

Trans

.Tren

d ~

._ D

~ S

2 G

~

Res

. A

rea

~--

5[

"-'~

Y-

2

Sin

gle

Faro

. I

AG06

]

N'

y2

TOTA

L PR

OPE

RTY

Arc

h. A

ttr.

AH01

P

~ A

~ G

3

Beac

h F

Prop

. hn

prov

ed

AH0,

21

N'

Y;~

--Prop

. Lot U

tit,

l AH0

3 l Nr

'y~

Con

dom

iniu

m

N ~

y2

AO

~2E

~ V

entu

rm O

oun1

¥ A

ilII

I~¢

RE

SID

EN

TIA

L FA

CT

SH

EE

T A

PN

(A

A01

)

SHEE

T ._

__

_

OF ___

DIS

[RIC

T (A

C16

)_

NEI

GH

BOR

HO

OD

(A

C17

)

TRAC

T (A

C0

6)_

_

LOT

----

E

OC

K_

_

SITU

S (A

A10

) --

REC

OR

D D

ATA

LA

ND

ATTRILBUTES

__

~L

BUIL

DIN

G--D

ATA

~.---~

t.TAc

tua,}

_~

~

IREL

. I~

j__

N

• ~c

o,,~

so

Ft ,

0saa

.,e)

A,

~~

I~

H__

A~,,

i ~"

T~

0ie.,

A,0.

L "

~~

~.

0.

I-

I-

,+,,,

,r.

egat

a.

~'_

. ~

.0

~

.ooT

.ruS.

A,

,O I

.'~

~til.o

oms

__1~

.]~

$~ES

OATA

St

-Fron

tage

A,.

I "'Y

= ~,

.oom

a IA

C='i

i i

Alle

y Al

l3 I

"7~

co

nd,,

,o~

j~9

/ P,

A' G

3 Ut

,, O~

G A,,

.___ t

.' Y'

~o

r.=ns

hi~

t ~'°

/ P'

A' ~

C

urbs

& G

utte

rs

AI1

5 I

N t

Y~

Stg.

Spa

ce

I A J

ll

/ P~

A2

G3

__SH

EETS

CO

ST D

ATA

Qua

lity

Cla

ss

AC

-~

--

~

Z I

I $

Tota

l Li

ving

Are

a AC

O5 I

i

i i

--

1st Flr. A

rea

AK01 i

} i-

-i--

---

2ndF

lr, A

rea

AZ

02i

i i

!

2~d E

lf. E

~tor

~o

3 ! T

T--

,-

t1~

. 3r

d Fl

r. A

rea

AK

04

3~d

Fir,

Fac

tor

AK05

I

i i

i I"

I' l

Bsm

t. A

rea

AK__

~ !

! ,,,,

,i ---

Bsm

t. Fa

ctor

A

K_~

~

__

A

ddn.

Are

a ~

08

'

' '

Addn

. Fa

ctor

~

~-

'-

-

N I

yZ

Hea

ting

(Duc

ted)

N I

y2

Car

port

Area

AK

11 I

Car~

Fac

tor

~,~t

i~

i

Pch. A~'

ea

A~1

3

Pch.

Fac

tor

AK14

Sid

ewal

ks

~i1

6

St.

Ligh

ts

A117

N

~ y

2

Hor

ses

" i

Com

mon

Gre

en

At 1

8 t

N 1

y

Com

mon

Rec

. A

I19

I N

1 y

2

H.

& B.

Use

A

I20

I N

1 y

2

Sew

er

AI21

I

N 1

y2

Vie

w

AI2

~

N'Y

2

Vie

w O

ual.

AI2

3 I

P ~ A 2

G 3

Traf

fic F

low

A

t24

H ~

A 3

L 3

Wat

er F

ront

A

25

N 1

y2

Beac

h Fr

ont

~i2

6

1 N

1 y

2 I

Doc

king

Rig

hts

AI2

7 I

N 1

y2

A,!~

....

N' Y~

FIEL

D U

SE o

NLY

- D

O N

OT E

NC

OD

E

Base

,ot

A!~

I L

I I

Coo

ling

(Duc

ted)

Gar

.

Gar

, C

ony,

Cav

ort

Pat

io

Dec

king

Pool

Fenc

e

Gue

st H

ouse

Fire

plac

e

Bui

lt~In

s

Stru

. Fa

ilure

Ti le

Roo

f

A J1

2

A J1

3

A Jt

4

A~m

A J17

A

J18

~20

A J2

1

A J2

2

~25

AJ1

3 N

1 y2

N ~ y2

N 1 y;

N I y2

N ~ y2

N I

y2

N 1

y2

N 1

y~

N';

Y~

N I

y2

N'~

-.r,2

L'~

M2

H3

N t

y:Z

Fire

plac

e C

ost

.~

AK

I~

~st-

- J.A

Kml

! i

i I

Pat

io A

rea

IAK

17

Pat

io F

acto

r IA

K18

Pool

Are

a IA

K!9

Po

ol E

xtra

s ]A

K20

Mi s

c, C

ost

IAK

21

Mis

c, S

troct

. C

ost

I A

K22

TOPO

GR

APH

Y ii

Gr.do

IA

L0i

I .!

~ ~3

/ Ba

nk

AL0_

~ N'Y

___L

J IA

.B3

I N'

Y'

A N;

"Y ~ "

IL°'l

.

Slop

e

Oth

er

t~

Figu

re 5

. Fi

eld

wor

k sh

eet

for

appr

aisa

l da

ta

County of Santa Clara

173

Office of the County Assessor 201 County Admirfistrat~on Building

70 West Heddin9 Street San Jose, California 95114 299-3941 Area Code 408

California

2.

3.

4.

5.

Date Recorded

Recorder's Deed #

Property Description #

Our records indicate you purchased this property.~

What was the full price?

a. Amount of cash down payment:

b. Please enter details concerning any balance:

(I) Ist Deed of Trust $ at Duration of Loanyears

(2) 2nd Deed of Trust $ at Duration of Loan years

c. Was a trade involved? Yes No

d. Outstanding Improvement Bonds against property $

e. Did price include personal property? Yes No

If yes, please estimate value of such property $

f. If this is income property, please enter the monthly gross income

as if it were 100% occupied $

Remarks: Please enter on the reverse side or by attachment any infor- mation you feel may help us to make a fair appraisal of your property.

If you would like mail concerning this property sent to a different address from the one above, please indicate below.*

If there are questions regarding this questionnaire please contact the Assessment Standards Division at 299-3941.

% interest

% interest

See Reverse Side.

Signature of Owner

Telephone Number

Date

(~, 7047 REV, 11/71 .~

Address

City, State, Zip

F i g u r e 6. Assessor 's q u e s t i o n n a i r e f o r f i n a n c i a l da ta

Sour

ce M

ap

1 ®Bl

ock

1 21

~

Blo

ck 4

17

2O

O

Blo

ck 8

16

19

< -&

Blo

ck 2

1st S

treet

Blo

ck 5

2nd

Stre

et

7~

B

lock

3

6 D

B,o

ok7

13

11\

12

Blo

ck 9

1

2B1 lO

ck 1

0

Num

ber o

f Seg

men

ts

Map

le A

v.

5 El

m D

r, 3

Coy

ote

Cr,

6 1s

t St.

4 2n

d St

. 4

= 22

seg

men

ts (re

cord

s)

Con

tent

s of a

Sam

ple R

ecor

d C

onte

nts

Map

le A

v. N

.W.

2 15

3000

0 31

000

18

1530

100

3010

0 20

5 20

5 42

42

5 10

0 19

8 10

1 19

9

Lege

nd

Nam

e T

ype

S

u{f

ix

Fro

m n

od

e m

~m

ber,

x,

y, m

ap

nu

mb

er

To

no

de

nu

mb

er,

x,

y, m

ap

nu

mb

er

Le

ft t

ract

, ri

gh

t tr

act

, le

ft p

lace

, ri

gh

t pl

ace,

{ef

t b

lock

, ri

gh

t b~

ock

Le

ft L

ow a

ddre

ss,

hig

h a

ddre

ss

Rig

ht

low

add

ress

, h

igh

add

ress

Map

6

Cen

sus T

ract

205

Pl

ace 4

2

:3

4~

Figu

re 7

. 'D

IME

' ge

ogra

phic

bas

e fi

le s

truc

ture

~[

Yea

r 10

I Y

ear

2

Yea

r 1

Zon

e 1 2 3 N

i C

rimes

C

rimes

...

I

#1

#2

I

Hou

ses.

..

Pop

ulat

ion

3 15

50

14

6 17

6

53

203

4 4

41

192

26

9 15

47

--4

Figu

re 8

. E

xam

ple

of t

able

s in

ext

ract

ed d

ata

base

176

r l Data I Extraction I Functions/ Presentation I Aggregation t

I l Data Subsetting I Description I

Large Data Base Management

h I [°'''Ex"iracted / I 1 Data Base i:~X4 I Management

J i\\

F ' I ; ; ©-.-© Large Data Base

Figure 9. Interactive data extraction and problem solving system

177

Census- Compatible Property Data

T Property I nfo. System

Transportation Zones Data Base

Dynamically- Aggregated Property and Census Data (Compatible)

II

Added Data (Public Lands, etc.)

Figure 10. Extraction and aggregation relating property and census data

Z U

m--W

X.J

Wm.-$

Ol.-

b~

.Z

W

W2

>

..JW

JO

::) I--

0 Z

H

W Z

~t W

U

Ul

Z Z

Z O

O

O

~

-i H

~

I"-- I---

1"- ~

t ~

U

H

U

H

lUL

¢~

W

ell gL

1E

Z "~

O

Y

0

H

Z~

gL

E

Z 0 Z L~

2>

U~

I-- Z W

2> w

_J

gl

p..,~ H

Z

o'1

~9

Z W

Z H

O

Z ~J

o E

2~

E

o~

E

o x LU

o') LL

[RP

S

2

~'14

TCH

CH

1

3

1 rr

No

¢H

S

fL

AN

K1

CH

t

Sr~

rN~R

[ C

H

16

rTT

YP

E

CH

2

L~T

Y

CH

10

2"

IP

cz

s rL

RN

K2

C

H

1 C

Z 7

rt.~

NK

3

CH

2

TC

! 6

m

m

m

m

m

CR

[RT

[ A

ND

[D

ZT

D

AT

A

D[|

CR

ZP

TZ

ON

|

ZB

fl

[HP

LO

YE

E

FZ

L[

~Z

TH

X

,Y

HA

TC

H

KE

Y

rRO

fl

UN

XflR

TC

H

ST

R[[

T

HU

R|[

R

ST

RE

[T

NR

H[

9T

RE

[T

TY

P[

NR

fl£

CO

D(

CO

OR

DZ

NA

CO

OR

DZ

NR

T[

SC

RR

TC

HP

RD

~

0

]LIP

UT

R

OD

E

GE

T

PR

G[

6 R

[ST

OR

[ R

(PL

RC

[ R

OD

[ sO

N

UZ

(~

TIT

L[~

~'"

C

H[C

K

SR

P

AG

E

RE

TU

RN

R

(DR

RU

...

...

[RA

S[

DE

LE

TE

L

INE

IN

SE

RT

B

LA

NK

L

ZN

[ C

OP

Y

LIN

E

DO

WN

P

RZ

NT

Fig

ure

11a

. D

ata

de

scri

pti

on

fu

nct

ion

CR

ER

A

ND

[D

IT

S[L

.[¢

T][

ON

S

L,P

X<

nt

OR

X

~2

ttia

il

TH

EN

X

~t

;

Y(=

m

OR

Y

)lll

iJJ

88

0

TN

[N

yn

J;

~L

EC

T

MN

I[R

[ Z

IP

~ 9

51

25

SC

RA

TC

HP

AD

INP

UT

H

OD

E

=ON

G

ET

PA

GE

1

5

RE

ST

OR

E

RE

PL

AC

E

MO

DE

V

IEW

T

ITL

E~

"-'-

- C

HE

CK

S

AV

E

PA

GE

1

5

RE

TU

RN

R

ED

RA

M

....

....

E

RA

S[

DE

LE

TE

L

%N

E

INS

ER

T

BL

AN

K

LIN

E

CO

PY

L

IN£

D

ON

N

PR

INT

Figu

re 1

1b.

Dat

a se

lect

ion

func

tion

NT

I81

8~

2

NT

~I@

EB

4

MT

8~

18

~4

fl

T0

11

80

4

MT

me

eS

g4

h

Tll

eli

4

MT

~ll

il4

HT

00

00

04

MT808084

MT8580B4

MT008082

MT080804

MT008802

MT080004

MT

@08084

MT~08004

MT000084

MT

g0

02

84

MT~08~84

MT000804

MTeeeeQ4

MT8808~4

MT~gO~4

MT000004

GR

AD

DR

ES

S

MT8%OB~4

MT

~0

00

04

M

Tg

g~

ge

4

HT

80

08

~4

MTee0004

MTeeeee¢

1818

8 0

18

45

1

16

08

8

16

31

I1

19

1

81

14

8

01

48

7

e 8

15

39

8

01

49

9

8 ~

1~

54

01

93

0

01

74

2

81

34

8

a1

55

1

01

48

2

81

62

3

01

5~

8

81

55

0

81

06

0

01

8B

8

81

82

4

81

47

9

~1

51

4

01

50

2

02

24

3

~1

65

6

0 e

~9

4~

~

o9

18

e~870

ee

91

%

81

67

4

E1

43

6

LIS

TIN

G:

FIL

E

= £M

P

US

ING

P

lP2

: N

O

GL

EN

EY

RI[

G

N[Y

RI[

G

NF

IEL

D

GL

EN

FI£

LD

G

LEN

N

GLE

NN

G

LE

NP

INE

GL

EN

PIN

[

GL

EN

PIN

E

GLE

NU

NR

G

L[N

UN

A

GLE

NU

NA

G

LEN

WO

OD

G

RA

CE

G

RA

CE

G

RA

CE

G

RA

CE

G

UA

DA

LA

JAR

A

HA

MIL

TO

N

HA

RM

IL

HA

ZE

LWO

OD

H

ER

VE

Y

HE

RU

EY

H

ICK

S

HIC

KS

H

ILL

SD

~L

E

HU

MM

ING

BIR

D

HU

MM

ING

BIR

D

HU

MM

ING

BIR

D

HU

MM

ING

BIR

D

HU

ST

ED

H

US

TE

D SE

LE

CT

ION

Fig

ure

1 lc

. Li

stin

g of

dat

a se

lect

ed

SA

N3

0$

E

S5

12

5

15

91

72

3

29

|3|4

A

U

SA

NJO

8

51

28

1

59

88

39

2

97

9E

1

DR

9

AN

~O

B[

95

12

5

15

92

58

1

28

$7

80

D

R

SA

N3

08

E

95

12

5

15

92

37

?

28

58

47

A

U

8A

N3

0S

E

95

12

5

15

91

59

3

29

63

20

A

U

SA

NJO

SE

9

51

25

1

59

17

58

2

96

37

6

DR

S

AN

30

S[

95

12

5

e

D~

S

AN

3O

S[

95

12

5

g

DR

S

~N

30

9E

9

51

25

AU

S

AN

JOS

£

95

12

5

15

96

14

8

29

36

96

S

~N

JO

S[

85

12

5

15

96

62

8

29

28

22

A

U

SA

NJ

OS

E

95

12

5

15

96

14

#

29

36

98

9

RN

JOS

E

95

12

5

15

92

29

1

29

33

14

A

U

SA

NIO

SE

9

51

25

1

58

75

14

2

93

09

5

RU

9

AN

30

SE

9

51

25

1

58

77

19

2

94

19

2

AU

S

AN

3OS

E

95

12

5

15

87

51

4

28

30

85

A

U

SA

N3

0S

E

95

12

5

15

87

68

7

29

32

32

A

U

SA

N/O

SE

9

51

25

1

59

26

33

2

67

30

4

~Y

6

AN

IOS

E

95

12

5

15

91

10

9

29

24

71

N

Y

SA

NIO

SE

9

51

25

1

59

62

99

2

93

94

4

AU

S

AN

30

SE

9

51

25

1

59

60

56

2

92

68

1

LN

SA

N/O

SE

9

51

25

1

59

66

32

2

97

08

4

LN

S

AN

JO

SE

9

51

25

1

59

73

40

2

96

30

5

AU

S

AN

JO

SE

9

51

25

1

59

16

25

2

93

92

3

AU

S

AN

JO

SE

8

51

25

1

59

28

33

2

89

48

2

AU

5

AN

ZO

SE

9

51

25

DR

S

AN

JO

SE

9

51

25

1

58

91

86

2

89

58

7

DR

S

aN

/OS

£

95

12

5

I59

92

25

2

89

35

2

DR

9

AN

30

SE

9

51

25

1

59

92

25

2

89

35

2

DR

9

RN

JOS

E

95

12

5

15

99

22

5

28

93

52

A

U

SA

NIO

SE

9

51

25

1

59

13

35

2

85

41

2

~V

S

AN

IOS

E

95

12

5

15

94

50

1

28

76

5~

=

EM

P

RE

TU

RN

P

AG

E FO

RW

AR

D

~A

GE

5

FIL

E

= E

MP

S

EL

EC

TIO

N

= [M

P

M'C

lP:

DR

~W

EX

PI~

ND

S

NR

INK

N

OR

MR

L

SE

T

PlP

2:

DISPLAY

5~ EVENTS

SCALE

TO PIP2

CENTER

RE

TU

RN

EI~ASE

Fig

ure

11d.

Map

sho

win

g lo

catio

n of

sel

ecte

d da

ta