
Dissertation – University of California, San Diego
interoperability.ucsd.edu/docs/04Ribes_DissertationProspectus(re-write).doc

Cyberinfrastructure: An Emerging Organizational Form for Systematic Data Integration. The Case of GEON.

David Ribes
Dissertation Prospectus (re-write: Sept 2004)

Contents:

1. Introduction
2. Introduction II: Interoperability – a (socio)technical diatribe
3. Investigative Structure
4. Empirical Concentrations
5. Methods
  5a. Coding, Archiving and Data Management
6. Chapters
  6a. Outline
7. Workplan
  7a. Tasklist
  7b. Timeline
8. Works Cited

Introduction

There is now a driving thrust for the production of large-scale data and digital resource interoperability at the level of the state (digital government, see www.diggov.org), the sciences and engineering (cyberinfrastructure; Atkins 2003) and local services and government (e.g. http://www.nvc.cs.vt.edu/~dgov/). What policies, technical approaches, and particular organizational structures will guide these efforts remains an open and active debate. For relatively clear reasons, development efforts for digital government – the production of online access to government services and information – have already begun to receive substantial consideration from the social sciences (Fountain 2001), along with some significant financial and institutional backing for this research1: notions such as democracy, privacy, access and governance are well entrenched in these fields. Meanwhile the other large-scale state thrust to produce data interoperability has received relatively little attention from the broader social-research communities. The cyberinfrastructure initiative2 is primarily directed at the production of interoperability within the sciences and engineering, but with the broader goal of providing access and resources to policy, education and public spheres. Cyberinfrastructure development projects have, thus far, tended to focus primarily on sub-disciplines (e.g. BIRN – brain imaging), disciplines, or trans-disciplines (e.g. SEEK – the ecological sciences). Each of the CIs is mandated to provide computing resources, data storage and management, tools (such as visualization and other forms of representation), and data integration techniques, along with educational and broader access considerations. However, the organizations, experts and technologies of cyberinfrastructure are not specific to each science domain to which they are applied. The San Diego Supercomputer Center (SDSC) is an organizational hub for the production of CI – the CIs listed above, along with GEON, the focus of this study, are organizationally centered at the SDSC – and this single location has generated substantial ‘synergism’ between these projects. For example, the particular deployment of physical infrastructures and Grids3 can be consistently shared between CIs; the development of interoperability is specific to each domain, to each case in fact, but the methods and tools for its production are shared, and IT specialists working on multiple CIs are able to gather design experience.

One focus of this dissertation will be to demonstrate isomorphic4 tendencies between CIs. But the SDSC is also involved in a series of digital government ventures at various scales. Just as the IT experts and particular technologies under development for CIs are not necessarily specific to a single domain science, these developments can carry to a larger arena of interoperability projects. In short, the organizations, expertise and technologies for the development of data and digital resource interoperability have isomorphic tendencies across boundaries such as science, engineering and digital government.

1 Digital Government Research Center (USC/Columbia - http://www.dgrc.org/); Center for Technology in Government (University at Albany - http://www.ctg.albany.edu/); National Center for Digital Government (Harvard - http://www.ksg.harvard.edu/digitalcenter).

2 Broadly construed, CI includes efforts on the part of the NSF, the NIH, NASA, and the DoE, along with many coalitions with state, corporate and international ventures.

3 Systems for the management of heterogeneous computer resources: computing time; data storage, management and back-up; manipulation tools &c.

4 Isomorphic here will mean: tending towards similarity despite heterogeneous origins. In other words, projects for the interoperability of data and digital resources are initiated for a multiplicity of reasons, and through various means, but because they all converge on the contemporary organizations, experts and technologies of IT, coupled with policy mandates to produce greater and greater interoperability between data, their developmental trajectories will tend to converge.

This project will take as its primary case GEON – the geo-sciences network, www.geongrid.org – an ambitious emerging trans-disciplinary cyberinfrastructure for the geo-sciences, intended to provide computing resources, data interoperability and digital resources (mapping, visualization &c.). GEON is organizationally centered at the SDSC – where the majority of its IT developers are located, along with its management and administrative core – but distributed nationally across the U.S. through its geo-science principal investigators (PIs) and ties to data sources (e.g. USGS) and education resources (e.g. DLESE). GEON, then, sits at the nexus of an extremely diverse cast of actors (see fig. 1), representing the geo-sciences as well as computer science / information system practitioners and technologies.

[Fig. 1: quadrant diagram not reproduced; labels in the original include GEON, SDSC, IT @ SDSC, CS and IT Community, Geo-Science Community, Geo-PIs, IT PIs, CIs @ SDSC, CI Initiative, and Technical / Organization / Policy.]

Fig 1. GEON sits at the nexus of heterogeneous players, each of which sits in a larger organizational or institutional milieu. The inner circle is directly related to GEON through people; the outer circle represents a constituency in which these people participate. Quadrant A includes those IT experts immediately involved with GEON, participating in a larger community of IT at the SDSC, and in the general discipline of CS and IT. Quadrant B includes the geo-science PIs, and the geo-science community at large. Quadrant C is the organizational milieu of CIs at the SDSC, which exist as part of the CI Initiative. Quadrant D comprises the IT PIs directly involved with GEON, who partake in the SDSC. Thus while two quadrants may overlap in terms of particular individuals, each quadrant represents an irreducible scale of analysis, e.g. each IT PI is also part of the larger CS community; these roles are sometimes contradictory, but also sometimes do not appear simultaneously at all.

Empirically this research will focus on the design and implementation of a socio-technical system for the production of data and digital resource interoperability5. The production of data interoperability cannot be understood solely as a technical feat, nor be black-boxed as a fiat of policy implementation; rather, it must be understood as simultaneously both, and as a socio-technical organizational structure. Three data collection sites, or empirical concentrations, will serve as primary signposts for this research: i- the larger ‘politicized’6 milieu in which GEON operates, including policy circles, the geo-science community (with a focus on geo-informatics), and the SDSC as a player in the sphere of the cyberinfrastructure initiative; ii- the organizational structure of GEON, including both its SDSC technical and administrative core and its distributed geo-science nodes, but also GEON’s placement within the organizational structure of the SDSC and its multiple parallel CI projects; iii- the implementation of data and digital resource interoperability, and in particular the primary technical vehicle for this within GEON: ontologies.

5 GEON self-classifies as having three levels of developmental operation: i- systems (physical infrastructure, computing resources and the GRID), ii- knowledge representation (ontologies, data integration, controlled vocabularies) and iii- tools (mapping, visualization). In practice these divisions tend to become obfuscated; however, the majority of my technical research will focus on knowledge representation.

6 “Governments not only “power” … they also puzzle. Policy-making is a form of collective puzzlement on society’s behalf; it entails both deciding and knowing. The process of making … policies has extended beyond deciding what “wants” to accommodate, to include problems of knowing who might want something, what is wanted, and what should be wanted, and how to turn even the most sweet-tempered general agreement into concrete collective action. This process is political, not because all policy is a by-product of power and conflict but because some men have undertaken to act in the name of others.” (emphasis added; Heclo 1974, p.305)

Introduction II: Interoperability – a diatribe on the (socio)technical


Interoperability is a loose term for a very diverse, but highly intertwined, series of goals within contemporary computer science and information technology fields and, most importantly for this research, within Cyberinfrastructure. In short, interoperability refers to the ability of digital data to transfer unproblematically across various software and hardware platforms. Today we are able to open older Corel WordPerfect files (.wpd) with Microsoft Word – this is interoperability across software platforms – and we are able to move Adobe Portable Documents (.pdf) between the Macintosh Operating System and the Windows Operating System, or even to a high-end personal digital assistant (such as the Palm Pad) – this is interoperability across hardware platforms. So why is interoperability a site for sociologically informed investigation? The two examples above, apart from the hardware/software distinction, are also different because of their strategies for producing interoperability: in the case of .wpd, Word conducts an automated translation of the WordPerfect file in order to display it within the Word application; in the case of .pdf, a single standard has been created which all applications draw from in order to read the file. It is the particular strategies for the production of interoperability, and the organizational systems which enable their deployment, that will be a primary focus of the more technically oriented portions of this research; it is the particular work required today – programming and design, yes, but also arguments, agreements, compromises and concessions – which makes the automated interoperability of tomorrow appear so easy.
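The contrast between the two strategies has a simple combinatorial face. The sketch below is purely illustrative (it is not drawn from GEON or from the applications named above): with n formats, pairwise translation needs a converter for every ordered pair of formats, while a shared standard needs only an encoder to, and a decoder from, the standard for each format.

```python
# Illustrative arithmetic for two interoperability strategies.

def pairwise_converters(n_formats: int) -> int:
    """Translation strategy (as with .wpd opened in Word):
    one converter per ordered pair of formats."""
    return n_formats * (n_formats - 1)

def shared_standard_converters(n_formats: int) -> int:
    """Shared-standard strategy (as with .pdf): each format needs
    only an encoder to, and a decoder from, the common standard."""
    return 2 * n_formats

for n in (3, 10, 30):
    print(n, pairwise_converters(n), shared_standard_converters(n))
    # prints: 3 6 6 / 10 90 20 / 30 870 60
```

The point is organizational as much as technical: a shared standard economizes on translation work, but only at the price of the prior agreement of all parties on the standard itself.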

As those of us who have been working with digital documents for some time are aware, this interoperability across file formats has not always been so readily available, and even today it often remains the source of much frustration. I am still unable to recover documents written on my Mac Plus prior to 1994. Contrary to visions of digital data as a metamedium able to move smoothly between formats, times and places (e.g. the early works of Kittler 1997), the production of interoperability is work, and often quite contentious work within the relevant data communities. In many senses the two examples above – .wpd and .pdf – are deceptively simple cases of interoperability, and risk obscuring the kinds of very explicit tensions that are currently being played out within IT and domain constituencies. When I manage to convert an old document (for example my 1996 WordPerfect undergraduate paper “Lest We Forget the Cyborg: Theoretical Conflations”) I may be frustrated by the disjointed formatting that the Word translation displays: the disarray of my meticulous and semantically arranged dance of slashes, hyphens, semicolons and my strategic use of italicization. But the text remains legible, even comprehensible. It is possible to reconstruct meaning despite the lack of perfect data interoperability.

For some time now various branches of semantically oriented social theory (postmodern, semiotic, cultural) have been reminding us that texts are wholes, or that the meaning of texts is produced in relation to contexts, histories or experiences. This is one, fairly plausible, explanation for why, with my cyborg article, my interpretive ability remains despite a ‘not perfectly translated text’: I can place garbled portions of the text in relation to clearer portions, in relation to my experience of having written it, and in relation to a base of relevant literatures. The possibility of this kind of contextual reconstruction is precisely what differentiates ‘texts’ (or semantic data) from the kind of data that Cyberinfrastructure seeks to manage: large (sometimes in the terabyte range), vastly quantitative, and abstract datasets. In these cases, garbled data means useless data – there can be no reconstruction, and precision is a key consideration for scientists along with many other data-users.

As long as a seismologist who has a deep understanding of her data continues to use a single data-set on a single software/hardware platform (say DB2 on a Windows OS), interoperability is a moot question. But when that scientist wants to migrate her data to a Linux OS, collaborate with colleagues across the nation, or visualize the data using Arc IMS, the question of interoperability becomes relevant. How can one ensure that this carefully recorded data, meticulously measured and collected by precision instruments, is ‘the same’ across database platforms and visualization programs, and that the ‘hyphens, backslashes and semicolons’ (or in this case, the quantitative values representing x, y, z) have not been garbled in translation?

The question then becomes: what can be done with scientific data to ensure that interoperability is unproblematic? The problems of interoperability can be considered at three scales of action:

(a) what is required to meet the interoperability goal(s);

(b) how the goal(s) will be practically executed; and

(c) enrolling the necessary actors to accomplish the goal(s).

The scales are not a sequence, but rather must be considered simultaneously; collectively they are what I refer to as a strategy for interoperability. The specific strategy of interoperability cannot be identified holistically by pointing to a technology (e.g. ontology) or a goal (e.g. visualization), but rather will emerge through the process of implementation. A consideration of (a) a particular interoperability goal, say long-term data storage, is also a consideration of (b) the durability of storage media, resources for data management, and the stability of funding sources, and a consideration of (c) the willingness of scientists to invest their time in specifying the relevant data, adding metadata, and occasionally returning to manage the storage.

One of the strategies for interoperability within Cyberinfrastructure (as with digital archiving more broadly) is to encourage the production of metadata for scientific datasets. Metadata is something like a technical version of postmodern interpretive cues: it can include contextual information (‘who made this data, and why?’), semantic information (‘what does this data mean?’), and even verification structures (‘what values of data are acceptable, and thus which values may indicate corruption of the file?’, or ‘what kinds of operations can be conducted on this data?’). Metadata can be either machine-readable code or more conventional textual and numerical descriptions that a data-user can review. As with all the interoperability strategies, metadata production involves some very involved questions of planning, and a great deal of implementation work and maintenance. What metadata should be included? This depends on what people think the data will be for: a seismologist writing for a fellow seismologist can assume a large shared familiarity with domain knowledge, and even disciplinary data structures, but if the data is intended for geophysicists other kinds of additional information will be required. When the data is intended for a non-geoscientist altogether, say the US military’s geographic intelligence systems, the question of what metadata is relevant becomes highly ambiguous. Of course, interoperability is not only for travel across domains, but also across time, or across kinds of operations. What kind of metadata is necessary to solve the problems of legacy archives? How might the data be used in the future, and what will be necessary to facilitate this? What kinds of computational operations (such as visualization or clean-up) can be permitted on the data without causing corruption? Broadly speaking, we can think of ‘interoperability’ not simply as a technical capacity but also as a short-hand for three kinds of concerns, or possible goals of interoperability:

(i) long-term data archiving, maintenance and re-usability;

(ii) the provision of access to larger constituencies, more diverse constituencies or across traditional constituencies, and;

(iii) the facilitation of data manipulation, integration, and representation.
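The contextual, semantic and verification information described above can be made concrete with a small, purely hypothetical sketch; the field names, units and acceptable range below are illustrative inventions, not GEON conventions.

```python
# Hypothetical machine-readable metadata record for a seismic dataset.
# All field names, units and ranges are illustrative only.

record = {
    "context":   {"creator": "A. Seismologist", "purpose": "crustal survey",
                  "created": "1998-07-14"},
    "semantics": {"columns": ["x", "y", "z"], "units": "metres",
                  "datum": "WGS84"},
    "verification": {"z_range": (-11000.0, 9000.0)},  # plausible depths/elevations
}

def check_row(row, meta):
    """A verification structure in action: a value outside the declared
    acceptable range may indicate corruption in translation."""
    lo, hi = meta["verification"]["z_range"]
    return lo <= row["z"] <= hi

print(check_row({"x": 1.0, "y": 2.0, "z": -4000.0}, record))  # within range: True
print(check_row({"x": 1.0, "y": 2.0, "z": 7.2e15}, record))   # garbled value: False
```

A record like this does no scientific work by itself; as the surrounding discussion stresses, someone with deep knowledge of the dataset must decide what belongs in each field.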

Each of these goals, in various forms, is part of GEON’s mandate. GEON seeks to make data a community resource rather than the basis for a single scientist’s findings: this means making the data comprehensible to people who did not create it, and yet ensuring that it is not misunderstood or misused (re-usability, larger constituency). Furthermore, GEON seeks to serve the geo-science community – with a focus on the solid earth – with the particular goal of producing new scientific ventures crossing traditional disciplinary lines, such as collaborations between geo-physicists and paleobotanists (across traditional constituencies). In order to facilitate these collaborations it is necessary to draw data together from multiple sources and in multiple formats (integration) and to make tools for its accurate and comprehensible mapping and visualization for users (representation). This said, while all these goals are present, and some effort is made at an upper level of GEON to align them, in practice GEON is, at this point, a loose federation of multiple data interoperability efforts.

Returning to the scales of action: enrolling scientists in the work of developing metadata is a substantial hurdle – often the one least considered in advance within IT circles. Metadata, or interoperability more generally, is a form of infrastructure: it does not immediately benefit the developer who already understands her data, but instead is constructed with the goal of providing data-access to other colleagues, disciplines or even times. Furthermore, there are few reward structures for the development of metadata: scientists do not receive acclaim for making data public; these are not considered ‘science results’ (“new knowledge about the Rockies”) and they rarely count as publications towards tenure or grant success. Finally, while these are some of the concerns in producing metadata for particular kinds of interoperability goals, the work involved in implementing metadata is also highly consequential7: who will dedicate their time to writing the elaborate additional information? Determining what metadata is necessary for meeting a particular interoperability goal occurs at the same time as a strategy for writing metadata is constructed. The task of metadata writing cannot be easily delegated to a technical or administrative staff: the deep knowledge of database structure and a history of empirical inquiry is held in the hands of particular experts in a domain community. The work of extracting understandings about databases – often held tacitly amongst practitioners, and in an unstructured manner such that they do not immediately know what information is relevant – and communicating the metadata information to competent technical hands, who must then formalize and encode the information as metadata, is complicated enough (my sections on ontologies will discuss this task more thoroughly).

7 Enrollment is the work, or method, of involving scientists in producing metadata, or interoperability more generally, while the ‘work’ of the scientists is the active production of the formal knowledge necessary for machine computability or contextual interpretation.

It is for these reasons that we must call strategies for interoperability a sociotechnical endeavour. While every possible strategy for digital data interoperability involves what can be identified as a technical component (ontologies, metadata, standards), each also involves an organizational structure, strategic planning, various risks of failure, the enrollment of participating actors, and even some future forecasting. Interoperability work is a highly charged activity for participants, significant for contemporary action with data – including the production of new data, and new knowledge – and of progressive importance for future data-oriented activities. Because the practices of interoperability cannot be separated from a ‘purely technical substratum’ of software and hardware, the term ‘technical’ will serve as a shorthand for the component of this research which directly relates to the production, use and maintenance of data interoperability.


Investigative Structure

The three empirical angles, or ‘cuts,’ of this project – political, organizational and socio-technical – mirror the three-part analytical thrust of this research:

i- IT and the transformation of the extrinsic and intrinsic relations of science

-the progressive rationalization/bureaucratization of the sciences

-the production of a system of disciplinary visibility: a neo-census of the sciences

-literature core addressed: Science and the State, governmentality, IT and the State (digital government)

ii- the emergence of a techno-organizational form for the production of systematic data interoperability – cyberinfrastructure, techno-organizational isomorphism

-a system of organization, experts and technology which is significant not only for specific sciences, but potentially for all data-integration efforts, i.e. digital government

-literature core addressed: organizational theory (neo-institutionalism), CSCW/DCP (Computer Supported Collaborative Work / Distributed Collaborative Practice)

iii- the importance, organizational and epistemic, of the particularities of interoperability efforts for the domains of technological enactment

-automation: the work of developing and implementing a technology is crucial in understanding its effects (technology determines only after it has been determined)

-IT mobility: organizations, experts and technologies cross domain boundaries seamlessly

-literature core addressed: standardization (STS), IT and the domain science itself

i-In her studies of the rise of a virtual state, Jane Fountain argues that ‘government to

citizen’ (G2C) services has only been the first phase in the integration of IT with the

contemporary state (G2C is the provision of state information and services online) and

instead turns our focus to ‘government to government’ (G2G) transactions: linkages, data

exchange, and service sharing between institutions, boards, directorates &c.-- which she

believes will lead to radical transformations in the organizational and institutional form of

government8. In short she shifts attention from the transformation in the public face of the 8“ The promise of a seamless interface with public at the level of a computer screen is the promise of the first wave of G2C digital government. The second wave, G2G, is integration and connection across jurisdictions and

Page 12: Dissertation - University of California, San Diegointeroperability.ucsd.edu/docs/04Ribes_DissertationProspectus(re-write).doc · Web viewThe two examples above, apart from the hardware/software

state by informatics, to the internal relations of the state which are facilitated and

spearheaded by data exchange and interoperability. This study will contribute to showing a

similar transformation in the geo-sciences – and by showing cyberinfrastructure isomorphism

the influence on a broader set of science and non-science fields (see ii). While GEON will

certainly begin to transform the relationships of scientists to their data (including sharing of

collected data, access to shared data, and manipulation of that data) it will also spur an

internal transformation of the organizations and institutions of geo-science, creating trans-

disciplinary linkages, new purposes and outlets of collective action and new funding

interrelationships. While within geophysics, research has long been tied to large-scale

organization with costly instrumentation, geologies’ organizations – such as the Geological

Society of America (GSA) -- have tended to be primarily sites of sociality and exchange of

research findings, this is similarly true with paleobotany and metamorphic petrology. Thus

cyberinfrastructure harkens a new form of bureaucratization or rationalization (Weber 1978)

of the sciences with clearer hierarchical structures, administrative bodies and closer ties to

state establishments (in particular the NSF, in the case of GEON, but cyberinfrastructures are

variously funded by federal or state-level bodies).

But the particular future technologies of GEON, developing more generally in CS and

IT, are more fine grained and penetrative than traditional bureaucracy or methods of

rationalization. Information systems can be understood as largely determining the attentional

processes of an organization (Pfeffer and Salancik 1978), or in the methodological precept of

Bowker and Star “within an organizational context, it is easier to explore the distribution of

memory and forgetting than the distribution of representation,” (Bowker and Star 1999,

p.266). The new technologies of knowledge representation are attempts to make the very

semantic material of knowledge accessible to detailed encoding and retrieval, most often

programs behind the interface in the bricks and mortar of government. The second wave is a about politics and the structure of the state,” (Fountain 2001, p.202)

Page 13: Dissertation - University of California, San Diegointeroperability.ucsd.edu/docs/04Ribes_DissertationProspectus(re-write).doc · Web viewThe two examples above, apart from the hardware/software

with the intent of automating the ‘knowledge-work’ and delivering highly manipulated

results to the user (see iii). The result of such technologies should not be singularly

understood as tools for the scientific user, but also as the first steps towards a computability

of knowledge itself, permitting a finer grained vision of terrains of knowledge. Foucault has

tracked the emergence of concepts such as ‘population’ and ‘economy’ (see also Mitchell

2002) and these techniques dispositifs permitted a shift in scale to the analytic of ‘society,’

but more importantly led to new methods of intervention e.g. acting on the population

through policies related to birth-rates (Foucault 1991). The socio-techniques emerging within

cyberinfrastructure are thus not simply representations of individual knowledges, but have

already begun to be used to create larger meso- and macro- views of domains9; we can begin

to surmise that these representations will become sites of new interventions (Hacking 1983)10.

The broader argumentative stroke of Foucault is of the development of a more pervasive,

penetrative, subtle, ‘nodal’ version of power (Foucault 1978). Power, in its modern

configuration, is more effective when it is subtle, unseen, and offering fewer faces for

available critical techniques. We can see a similar movement in the creation of

cyberinfrastructure; sub-disciplines, disciplines or trans-disciplines become known not only

as budgetary objects, regulated at the federal, state and university levels, but as knowledge

objects, constituted and regulated by large-scale infrastructures which include available

resources and even calculable knowledge maps. The movement from a discipline as

budgetary/statistical object to an ‘ontological’/database object is a shift in the subtlety and

penetration of disciplinary regulation. It is not necessary to posit an intentionality here – a

9 http://www.1-900-870-6235.com/KnowledgeMap.htm a conference for the production of domain level knowledge maps, by following the links it becomes clear that the efforts are drawing substantively from very fine grained knowledge representations (ontologies &c.) available in each domain. 10 For example, within federated databases, standardized distributed databases and ontology enabled interoperable databases it is already possible to keep track of the access rate of particular portions of the data. In technical terms this is known as the ‘active set’. This impetus to produce this technology is from the scientists themselves, who believed that data sharing would be encouraged if the importance of a particular scientist’s contribution was calculable. In short the use of particular data points would be referenced, and thus add prestige, recognition &c. to the data-producer. The end result is a fine grained visibility of what data is accessed, who it was produced by, and who is using it.


state wanting to observe the content of knowledge makers within its territory – but rather to

trace an alignment between emerging forms of representation (‘what sub-fields are

blossoming’) and forms of intervention (‘who gets funded’); my goal is to observe the

construction of a chain between the content of knowledge practices and policy spheres.

ii-

The general drive for the production of interoperability is by no means solely a top-down

effort. While umbrella organizations and funding institutions such as the National Science

Foundation (NSF) and the Advanced Cyberinfrastructure Initiative (ACI) provide the

enabling structure for encouraging these projects, endeavours such as GEON are in many

senses bottom-up11, initiated from within the geo-science community and primarily managed

by IT- and geo- PIs (although with closer than usual supervision of a grant by NSF

representatives). For GEON, and many other cyberinfrastructures and interoperability

ventures, the bottom-up and top-down initiatives meet at the meso-organizational scale of the

SDSC. The SDSC acts as the site for both the technical and administrative core (Scott 1992)

of GEON, and a central location for the management of the distributed geo-science PIs.

A second analytical thrust of this research project will be to show the importance of

local, particular developments of information technologies to much broader venues. In short,

the SDSC is an organization of a particular sort: it is an engine for the systematic

transformation of digital resources. Similar to an obligatory passage point (Latour 1987)12,

11 Who’s bottom and who’s up is a relative question: from many geo-scientists’ perspective GEON is a top-down effort on the part of the NSF directorates along with senior-level geo-scientists; but from the perspective of the NSF and other state bodies, mobilization and organization has been initiated from within the domain itself, by the GEON geo-science PIs. This said, the NSF funding structure for Information Technology Research (ITR) is such that if the geo-science directorate does not request funding, others will secure it; in some sense geo-informatics is an incentive-driven institution, and if no ITRs are given to the geo-sciences, they are likely to think of it as a loss of potential funds. On other occasions the NSF has insisted on SDSC involvement in an IT project before permitting funding (e.g. CHRONOS); this is another form of top-down incentive.

12 The term is too strong, since the SDSC is by no means obligatory; however, the notion captures the sense in which multiple diverging trajectories (digital government, science and engineering disciplines) are enrolled in the use of similar IT technologies, goals of interoperability, and organizational structures to produce them.


the SDSC’s products are isomorphic (Meyer and Rowan 1977). The SDSC concentrates a

variety of leading-edge experts, technologies and domain specific funding grants in a single

institution. Funding grants are primarily application oriented (rather than ‘pure research’) and

‘soft money’ (rather than continuous funding) both of which encourage complete

development cycles rather than the half-baked efforts for which CS has often been criticized

(Star and Ruhleder 1994)13. Experts and technologies (software and hardware) are highly

mobile between particular cyberinfrastructure and other data integration projects (see fig.3)

and thus despite disparate domain applications, technological implementation will tend

towards isomorphism14. Furthermore, IT experts and technologies alike are somewhat agnostic towards the domains in which they operate. A harsh way of saying this is that IT specialists don’t care about the domain; more reasonably, their focus is on IT. The same

13 There are a variety of other incentive structures to encourage application success; see the organization field paper for a more thorough discussion.

14 Recent STS research in social informatics has tended toward an insistence on the specificity and locality of IT applications (Monteiro and Hanseth 1997; Kling and McKim 2000; MacKenzie 2001); however, another tendency of STS has been to understand the production of universals in the circulation of particulars (O'Connel 1993), that is to say the inspection of techniques which permit abstraction and mobility from the local (Bowker 1993; Porter 1994). My research will tackle both, as each ontology is a particular, domain-specific application of what is becoming a universalizing method.

Fig 3: Within the SDSC experts and technologies are mobile between particular projects. Experts may learn techniques for IT application in one project and these skills travel between them. These experts are also in dialogue with broader academic and research communities within their specialties outside the SDSC.


individuals building interoperability for GEON (geo-science), are also working on BIRN

(neurobiology), projects with General Atomics (private industry, with strong links to the

departments of energy and defense) and Socioinformatics (an as yet unfunded endeavour

between social sciences/economics and digital government). Jane Fountain’s statement below

-- which resonates with the literature on governmentality (Rose ????; Foucault 1991; Porter

1994; Rabinow 1999) -- takes on a new meaning when we begin to think about this

movement between government, private, and public-science spheres: “Yet information

architecture, both hardware and software, is more than a technical instrument; it is a powerful

form of governance. As a consequence, outsourcing architecture is effectively the

outsourcing of policy making,” (Fountain 2001, p.203). In the case of projects at the SDSC,

‘outsourcing’ brings with it an entire entourage of technologies in development, experts, and

organizational form.

iii-

Ontologies are another solution to the problem of interoperability. Other solutions include, but are not limited to, data federation or the creation of community standards. Data federation is the one-time translation between disparate datasets, producing a single unified database. Community standards require that the production of new datasets be encoded in a shared classificatory framework and controlled vocabulary. Ontologies are technologies for the in-the-moment and ongoing


translation between datasets, controlled vocabularies or program outputs and inputs. As with

a thesaurus there is no need to transform individual languages in order to translate: when

translations are necessary the thesaurus can be used to accomplish this work. This is at least

the dream of ontologies, as represented by IT to the domain. It is this myth which has permitted the motto of GEON to flourish: ‘we don’t make standards.’
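The thesaurus-like translation described above can be sketched as follows. The vocabularies, database names, and the `translate` function are invented for illustration; a real mediated-schema system is of course far richer:

```python
# Hypothetical mini-"ontology": each local vocabulary maps its own terms to
# shared concepts, so datasets translate through the hub without either side
# being forced onto a standard.
SHARED = {
    "db_A": {"qtz-arenite": "sandstone", "sh": "shale"},
    "db_B": {"sandstein": "sandstone", "tonstein": "shale"},
}

def translate(term, source, target):
    """Translate a local term via the shared concept, thesaurus-style."""
    concept = SHARED[source][term]          # local term -> shared concept
    for local, shared in SHARED[target].items():
        if shared == concept:               # shared concept -> target's term
            return local
    raise KeyError(f"no {target} term for concept {concept!r}")

print(translate("qtz-arenite", "db_A", "db_B"))
```

The design point is that neither `db_A` nor `db_B` changes its own categories; only the mappings to shared concepts must be produced, which is exactly where the work of the concept-space workshops lies.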

This rhetoric permits the immediate defusing of two outstanding issues in the minds of geo-scientists: i- the problem of legacy data and ii- the intervention in individual research practices. Aware, in various forms, of the politicized nature of standards, geo-scientists are wary of the difficulties of maintaining legacy data following standardization, and of the intrusions into individual work (quite important in most of the field-based geo-sciences) as standards are implemented and enforced. Ontologies are touted, by IT, as capable of integrating legacy data with contemporary research, and as able to travel seamlessly over idiosyncratic categorizations and languages (schemas). ‘We don’t make standards’ is a

depoliticization of the cyberinfrastructure project and ontologies are the enabling vehicle for

that move15.

If we are to assume that translations are politics by other means, then ontologies are a

shifting of the terrain of debate from a single act of translation (in the case of federation) or

(sometimes tacit, sometimes explicit and sometimes conflict ridden) community consent (in

the case of standards) to the production of ontologies. As with a thesaurus, ontologies require

the production of equivalencies (see Irreductions in Latour 1988) between terms. More than a

thesaurus these equivalencies must be machine computable – and thus are formally encoded

in the languages of logic (concepts, predicates, operators &c.)16. GEON is innovating a method for the production of ontologies – which they call concept-space workshops – in which domain experts are placed in a room along with knowledge representation specialists and together produce concept spaces (or ‘napkin drawings,’ pre-computable knowledge representations) which, over time, become formalized within ontologies. This particular technique of knowledge acquisition17 has some resemblance to Latour’s characterization of the laboratory (Latour 1983). Local, small-scale manipulability and the particularities of this setting become modalities, are simplified (Star 1983), and must then go through the trials of extension (Callon 1998). Similarly, the closed system of a concept-space workshop permits the bracketing of broader debates within a disciplinary community and the production of local, pragmatic concessions in translations, the particular histories of which become erased as the concept space is formalized into clean logical operators and the ‘objective technology’ is disseminated.

15 I don’t know if this has become the motto for any other project, most likely not as such per se; however, ontologies are generally understood by the knowledge representation community as extrinsic to the problems of standardization. Slides such as the GEON integration scenario above are available for various other projects, each unproblematically demonstrating the translation of a query into multiple ‘raw data’ languages and back into a synoptic (Latour 1986) view.

16 The particular form of logic is specific to the form of knowledge encoded (predicate, modal, temporal).
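A machine-computable equivalence of the kind described above might be written, in predicate-logic notation, roughly as follows; the geological terms and predicates are invented for illustration and are not drawn from GEON’s actual ontologies:

```latex
\forall x \; \bigl( \mathrm{Arenite}_{A}(x) \;\leftrightarrow\;
  \mathrm{ClasticRock}_{B}(x) \wedge \mathrm{grainSize}(x, \mathrm{sand}) \bigr)
```

Read: whatever database A classifies as an arenite, database B records as a clastic rock of sand grain size. Once such an equivalence is encoded, software can translate queries in either direction, but every pragmatic concession made in agreeing on it disappears into the clean operators.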

These three cuts – socio-technical, organizational, political – are not to be considered coeval with the three classic scales (micro, meso, macro), for the implications of each cross-cut throughout. An emerging model for knowing the terrain of science is premised on the fine-grained technologies of knowledge representation, but must be actuated through the

techno-organizations of GEON/SDSC/Cyberinfrastructure and translated through the domain

disciplines18. In her theory of technological implementation Jane Fountain argues that the

importance of IT technologies emerge at the moment of transformational integration with

organizational structure:

information technologies are not simply purchased and plugged in, even when off-the-shelf products and services are procured for government

17 Within knowledge representation there are many techniques for knowledge acquisition (Boose 1989); this particular model is in the lineage of expert solicitation (Meyer and Booker 1991), which has spawned an entire literature on the difficulties and use of this method (Booker and Meyer 1990 (submitted?)).

18 In this case it seems safe to say that domain disciplines can be considered obligatory passage points: some form of assent is required in order to authorize the knowledge. The particular translation, however, is still specific: assent may mean active consent on the part of a community, popular use of the software (implicit assent), use by representatives, and so on. The counterfactual, for which I have no empirical case as yet, is of a community which rejects an ontology, such that the particular domain knowledge mapping and representation becomes null. It seems safe to deduce that domains are still the arbiters of their knowledge…


organizations. They are always subject to extensive design of their use within an organization and must be integrated with work processes, communication channels, means of coordination, culture, authority structures -- every central element of an organization (Fountain 2001, p.195)

Her methodology is insightful: she argues for the pervasive importance of technological integration and organizational change, attempting to address a systematic gap within the neo-institutionalist literature. However, Fountain’s unit of analysis is limited to the implementation stage: technologies are endogenous, or inherently significant, but are not considered prior to the moment of integration into organizational structure. It is the ability to change lines of communication, to shift organizational form and memory practices, that interests Fountain. However, this is not a consideration of either the full development arc of the

technology or the technology’s various implementations. This research will demonstrate, first, that the development phases of an IT technology – in this case ontologies –

embed the particular choices of their designers, investments in articulation, and what are

considered the limitations of the technology itself. It will also demonstrate that experts,

expertise and organizations are often linked to implementations and that these travel along

with technologies. Organizations such as the SDSC are sites for the application of basic

research in IT, but while particular applications are specific, organizational substructures of

new cyberinfrastructures are informed by previous efforts; furthermore experts are mobile

between applications, gaining experience through the deployment of technologies over

various projects; finally experts produce a stream of formalized expertise in the form of

publications, presentations, FAQ’s and classes (many staff members at the SDSC hold

positions at the UCSD). In short, while a full understanding of the implications of a technology must take implementation into consideration, STS has shown the importance of tracing a full line of development (Rapp 1998): a technology’s importance bleeds beyond its immediate confines and is informed by various histories, such that a researcher may need to travel from technical details to classical political scales. This research will therefore follow:


i- full development arcs of a technology;

ii- the inter-domain mobility of a particular technology;

iii- the predicated basis of new cyberinfrastructure organizations on previous endeavours;

and iv- the knowledge built up by experts and their encoded knowledge in texts &c. which inform new implementations.

Empirical Concentrations

The GEON project is itself both ambitious in scope and diverse: a cast in flux

includes a large team of IT experts at the San Diego Supercomputer Center, eleven PIs spread across the nation, each with a handful of computer scientists, and collaborations with

USGS, DLESE and with the Canadian Geological Survey. The telescopic structure of this

research (technical, organizational and policy) demands a careful selection of empirical

concentrations in order to permit both a detailed understanding of developmental facets and a

larger vision of its place within the Cyberinfrastructure Initiative. My preliminary research

during the last sixteen months has relied on three empirical concentrations; a fourth concentration, discussed at the end of this section, is necessary to complete the telescopic nature of this

study:

i- The Organizational and Communications Structure of GEON: the weekly workgroup meetings of top administrative managers and the IT team are an excellent vantage point from which to observe the general organizational functioning of GEON.

ii- Concept-Space Workshops: These workshops are for the production of scientific workflows and ontologies; they are one of the points of greatest interaction between IT and domain sciences.

iii- Geo-Scientist Sub-Groups: Each geo-science PI works relatively autonomously on GEON projects with a local team of academic geo-scientists and information technologists.

While GEON is geographically distributed across the nation, the San Diego

Supercomputer Center and its team of IT experts could be described as its ‘core,’ and

constitute the organizational hub. Held on a weekly basis, these workgroup meetings bring together the


top level administrative managers of GEON, along with the central team of IT experts. It is

this team that is essentially responsible for the central administration of the GEON project.

GEON itself can be described as an ‘adhocracy’ (Mintzberg and McHugh 1985; Mintzberg

1992) in which there is a low division of operating and administrative roles; in GEON the act

of administration is tied to the act of technological development. In the past few months my

research has tracked the emergence of an organizational structure for GEON as they have

attempted to build successful lines of communication between this core and the distributed PI

team; they have also arranged to represent GEON with demos and booths at geo-science

conferences (Geological Society of America, American Geophysics Union). By observing

these meetings, and other planning meetings (see next section), I will be able to identify larger

trends in the development of cyberinfrastructure and trace the unfolding of organizational

structure. It is also at these meetings that the links to the NSF, the cyberinfrastructure

program, and other cyberinfrastructures are discussed. Participants are variously concerned

with future evaluations, the general direction of CI’s, and what bottlenecks or solutions they

have identified in other projects in which they participate.

The concept space workshops are held irregularly. They are two to four day meetings

in which IT experts and geo-science domain specialists meet to discuss the production of

ontologies and workflows19. Ontologies are software technologies of knowledge

representation; they are conceptual glues which allow the interoperability of databases and

enable sophisticated search devices and queries. These workshops are the exemplar case for

IT/domain interactions; although IT and geo-scientists come from radically divergent

disciplinary grounds the successful production of an ontology is predicated on reaching

19 ‘Workflows’ are technologies for the automation of certain aspects of practice – they link together strings of acts conducted on computers (e.g. extract data from source x y z => integrate => clean data => model data => create visual representation). This study may or may not turn to a greater focus on workflows, which have strong relationships with ontologies.
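The chained string of acts in the footnote’s example (extract, integrate, clean, model, visualize) can be sketched as a simple pipeline in which each step’s output feeds the next. The function names and the toy “model” step are illustrative assumptions, not any actual workflow system:

```python
def extract(sources):
    """Pull raw records out of each data source."""
    return [record for src in sources for record in src]

def integrate(records):
    """Merge records into one working set (a trivial placeholder merge)."""
    return list(records)

def clean(records):
    """Drop obvious nulls from the integrated set."""
    return [r for r in records if r is not None]

def model(records):
    """Toy 'model data' step: the mean of the cleaned values."""
    return sum(records) / len(records)

def run_workflow(sources):
    """Chain the acts: each step consumes the previous step's output."""
    data = sources
    for step in (extract, integrate, clean, model):
        data = step(data)
    return data

print(run_workflow([[3.0, None, 1.0], [2.0]]))
```

The automation lies precisely in fixing this chain in software: once linked, the sequence of computational acts runs without the scientist re-performing each translation by hand.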


understanding between participants. The majority of time in these meetings is spent

formalizing the domain knowledge of geo-scientists and explaining these formalizations to IT

experts. These IT computer scientists, in turn, will encode the geo-scientific knowledge into

ontologies. These practices require various forms of consensus building, pragmatic alignment

or temporary dismissal of disagreements, and thinking about the long term implications of

solutions to the problem of knowledge representation: this is doubly-so in the case of

ontology building, since the results become ‘hardwired’ within computer software

applications; misunderstandings at the conceptual level of an ontology can have serious

repercussions for the utility of tools produced by GEON.

Geo-science PIs are located across the nation. My research will involve travel to a

selection of these sites in order to record the particular progress of these projects, evaluate the

connections between these groups and the central organizing groups at the SDSC as well as

between geo-PI groups. Each of the geo-PIs within GEON has assembled a team of local

geo-scientists and IT specialists. The final goal of GEON is producing a single interoperable

architecture for scientific work within the geo-community. But producing interoperability is a

particularly difficult task (Harvey 1999; Harvey 2001). What organizational linkages and

lines of communication have been established between GEON teams located outside of its

central hub in San Diego? Furthermore what kinds of alignment mechanisms have been

established with the other geo-science PI’s, located across the nation, and participating state

organizations such as the US Geological Survey (USGS)? A tight series of organizational and

communications linkages will be necessary in order to ensure the success of GEON in

meeting the needs of such a diverse constituency. Finally, the geo-science PI’s are the portal

for understanding the larger dynamics of geo-informatics, and geo-science endeavours.


Together these three empirical concentrations – the central workgroup, concept space

workshops, and geo-PI sub-projects – will permit the triangulation of developments across the technical and organizational dimensions. The central workgroup offers an excellent vantage point from which to understand the organizational structure of GEON: what does a successfully structured cyberinfrastructure look like, and what does it do? The focus will be

on lines of communication, and linkages to other organizations or organizational sub-units.

The concept space workshops serve as an exemplar of IT/domain communications; in

cyberinfrastructure communications will generally be knowledge intensive, and require

sophisticated methods to ensure successful dialogue. Finally, the geo- PI’s are the crucial link

to the broader geo-science constituency that GEON is mandated to serve. These three

concentrations serve to address primarily the technical execution and the organizational

development and structure. The workgroup meetings will also provide an inlet for

understanding GEON’s relationship ‘up and across’ to the SDSC, to other

Cyberinfrastructures, to the NSF, and the Cyberinfrastructure Initiative. For the remaining

time of the fieldwork I propose to add a fourth empirical concentration which will seek to

further develop an understanding of the policy sphere – what is GEON to the NSF, and other

organizational representatives of the state:

iv- ‘follow GEON up’ to policy: through interview and document analysis this concentration will seek to understand how GEON fits into a larger developmental view of the cyberinfrastructure program, and to ‘state visions’ of the large-scale funding of information infrastructure development more generally. At this point it is unclear what specific pathway this research will take, but I propose to ‘follow the network’ from the highest level players within GEON up to the NSF stakeholders / program directors (in the geo-science and computer science directorates), and from there collect more sites for research in a snowball fashion.

Along with a certain ambiguity about the particular research pathway for concentration (iv), it is also not altogether clear what the results of such an inquiry will be. I do not expect to

find an explicit ‘statement of governmentality’ that will connect the knowledge bodies to

control mechanisms within the state, much less a direct connection between the granularity of


knowledge automation in ontologies and access of this knowledge for state functions20. My

goal instead will be to focus on (a) the particular goals in the development of GEON, the

justifications for such massive investment of resources, and the links between GEON and

other cyberinfrastructure projects; (b) the particular goals of the Cyberinfrastructure Initiative

(CII), how these alliances came about, the justifications for such a massive investment of

resources and the particular links between the various state agencies in the CII; and (c) the

relationships between CII and other state-funded, large-scale, information infrastructure

projects such as digital government. In order to answer these questions it will be necessary to

push beyond the passive ‘following the network’ and attempt to lead the discussion in

relevant directions.

Methods

At their inceptions cyberinfrastructures do not exist as anything more than an ad hoc

social network of experts collected from diverse fields. The work of beginning a CI is that of

building a common series of goals and expectations across domains as well as between

domain scientists and computer scientists, then creating a functional division of labor, and

securing an organizational structure to ensure long-term accountability. Thus, at the

beginning cyberinfrastructures are only peripherally technological artefacts and rather must

be seen as social and organizational endeavours.

Based on the research I have already conducted in the last year and a half I can detail

a general trajectory in the production of a physical infrastructure and the development of a

software base, but this has been a slow process. GEON held its kick-off meeting in

November of 2002, an assembly which primarily served to introduce the IT team, and their

planned technologies, to geo-scientists. Many of these geo-scientists were only loosely

20 Fortunately governmentality, as envisioned by Foucault, has never required this level of top-down planning, but rather the serendipitous alignment between forms of ruling and the development of technologies of knowing.


familiar with each other, coming from diverse sub-disciplines. Approximately one year later,

in December 2003 GEON began to purchase its hardware stacks to build its physical

infrastructure, and is currently deploying this GRID (a system for the distribution of data,

resources and tools, see Buyya 2002). In short, cyberinfrastructures exist initially as a social

network, require many years to become a formal organization, and even upon the deployment

of a physical infrastructure remain primarily bound together by human work. Social

informatics research is methodologically tailored to study these social networks and emergent

organizational forms: primary methods include ethnography, interviewing and content

analysis (Denzin and Lincoln 2003).

A discussion of the research practices I have been employing for the last sixteen

months in the study of GEON will serve as a surrogate for future methods, although in future

years it may be possible to collect data through web-based and quantitative evaluation,

depending on the development of the infrastructures themselves.

For the past sixteen months I have been attending and participating in community

meetings, conferences, workgroups, e-mail discussions, and informal get-togethers as a

participant observer in social informatics. This form of research is known as participatory

design and participant observation or action research as it involves both the collection of data

and evaluative feedback to the participants (Blomberg, Giacomi et al. 1993; Schuler and Namioka 1993). Throughout my work with these communities, I have provided formative

evaluation, both officially in presentations, but also informally in discussion of organizational

structure and inter-community tensions (Engestrom, Ahonen et al. 2000). Action research

usually presents two kinds of ‘problems’ which can be summarized as objectivity and

complicity. Does my participation, in the form of presence and feedback, alter the course of

the development of GEON? This form of criticism misrepresents the nature of

Cyberinfrastructure building today: the presence of a social researcher was requested, from


within GEON, pointing to an already present interest and awareness of the difficulties in

building cross-community organizations. Furthermore, CII, as exemplified by the Atkins

Report, already contains a smattering of references to social informatics, organizational

theory and even science studies. In short, if the question of objectivity is posing the ‘social’

as a pollution to pure research in CI building, then it is misconceiving the question. Rather

the more nuanced question of particular configurations of organization, management and

planning must be considered. Regarding the question of complicity there are two issues: does

my involvement compromise my ability to systematically analyze GEON? And am I

supporting the CII? These questions are more complicated and will require continuous

thought throughout the research/feedback process. This said, some important distinctions

must be made: I am not a ‘stakeholder’ in GEON, that is, unlike the PIs, I have no deep ties

to the larger success of GEON. While I have been in many ways contributing to the ‘social

lubrication’ of relations and planning within GEON, my primary justification is simply a

matter of reciprocity, rather than political support for GEON and CI more generally. That is,

the participants of GEON have gone out of their way to welcome me within their community

and have always both respected my research agenda, and contributed their time to a more

sophisticated understanding of the project. In the face of this it seems appropriate to

reciprocate, as I can.

The data collected are ethnographic, interview-based, and textual. I will use ethnographic and participatory design methodologies (Schuler and Namioka 1993; Star 1999). All meetings

and interviews are tape recorded, with consent, and partially transcribed. All textual material,

including presentations and slides, articles, schedules and so on, are collected and archived. A

systematic archival system will be devised in order to produce an ethnographic and historical

data/artifact repository. Data analysis will be conducted using computer assisted qualitative


data analysis software, which doubles as a systematic archival system for both primary and

secondary data. Somewhat labor intensive from the point of view of data entry and

maintenance, computer assisted systems do not replace familiarity with the data, nor do they

do the analysis for the researcher. What they do promise is to assist in the management of

large data sets by providing a flexible formal structure of storing, coding and retrieving notes,

interviews and memos. Specifically I am relying on the QSR program NVivo, along with

WebWhacker for storage of websites. Finally, regular formative evaluation provided to the

communities of GEON will also generate valuable feedback as to the validity of our own

research.

Interviews will be conducted throughout the data collection period, and will focus on

the project managers and PIs, but also on key IT players and geo-scientists within the

subject communities. These interviews will serve to understand the subjective success of

cyberinfrastructure deployment in the geo-science community; since we cannot rely solely on

technological success to evaluate interoperability, it is important to also focus on the general

enrolment of the larger science communities as participating constituencies of

cyberinfrastructures.

Technical literature produced both from the IT and domain-science components will

be collected, analyzed and archived. Technical literature can serve as a surrogate for the

success of a CI. A successful CI will result both in publications by users and in

collaborations across traditional disciplinary boundaries. The qualitative analysis of literature

will also be able to bring forth changes in the research fronts of users: e.g. is GEON having an

impact not just on how research is conducted but on what objects are researched? Thus

technical literature serves both as a marker of the success of cyberinfrastructure in usage but

also of cyberinfrastructure in effect.

Crucial to any social informatics endeavour is the willingness of participants to make

themselves available to researchers and provide access to relevant sites. As a whole, GEON

and the SDSC have provided excellent access, and data collection has been greatly facilitated

by this.

Coding, Archiving and Data Management

This section will detail the specific practices which I have used thus far to ensure that

data remain analyzable in the long term; it will also outline future plans for more meticulous

data management. All digital qualitative data is being filed with NVivo, and thus some

discussion of its technical capacities is necessary. Because GEON itself is quite IT intensive,

it has been possible to take ethnographic notes almost exclusively with access to a laptop, and

thus also to code ‘in vivo.’ Furthermore, although I have not transcribed the vast majority of

my recordings, all notes are indexed in relation to the recording timeframe, and thus it is

possible to connect specific notes to specific moments in a digital recording, also stored with

the ethnographic data. My particular transcription strategy has been to mark timeframes for

later transcription; typically this averages less than 5 minutes out of a given hour of

recording. These transcriptions can be coded using the NVivo software in the same manner as

ethnographic notes. Finally, I have also been collecting my notes on secondary literature

(relating to sociology or science studies) within NVivo, and coding these, again in the same

approximate manner.
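The note-indexing practice described above can be rendered schematically. The following is a hypothetical sketch in Python, not a feature of NVivo; all names, paths, and the example session are my own invention, meant only to illustrate how each note is tied to a position in a recording and how a small subset of spans is flagged for later transcription:

```python
# Hypothetical sketch of the note-indexing practice: each ethnographic
# note is tied to a position in the digital recording, and selected
# spans are flagged for later transcription.

from dataclasses import dataclass, field

@dataclass
class Note:
    timeframe: str                                # position in the recording, e.g. "00:42:10"
    text: str                                     # the ethnographic note itself
    codes: list = field(default_factory=list)     # 'in vivo' codes applied on the spot
    transcribe: bool = False                      # marked for later transcription

@dataclass
class Session:
    title: str
    recording: str                                # path to the stored digital recording
    notes: list = field(default_factory=list)

    def transcription_queue(self):
        """Return only the spans marked for transcription --
        typically under 5 minutes out of a given hour."""
        return [n for n in self.notes if n.transcribe]

# Illustrative usage: a meeting with two notes, one flagged for transcription.
session = Session("All-hands meeting", "recordings/allhands_2004.mp3")
session.notes.append(Note("00:12:30", "PI discusses data-sharing norms",
                          codes=["organization"]))
session.notes.append(Note("00:42:10", "Debate over ontology scope",
                          codes=["ontology"], transcribe=True))
queue = session.transcription_queue()
```

The point of the sketch is simply that notes, codes, and the recording remain linked through the timeframe, so a flagged moment can later be located and transcribed without transcribing the whole hour.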

The image to the left represents the coding scheme that I have been using thus far. The

primary categories have been ontology, organization, and visualization,

with the occasional use of methodology, and the more recent

addition of ‘field/politics.’ Each category has been subdivided into ‘daughter’ categories. For

example, organization has been divided into data about the organization of the SDSC, data

about GEON itself, and more general theoretical comments about organization; the category of

‘GEON organization’ is further divided between issues relating to IT and to geo-scientists. NVivo

permits bringing up all coded text within a ‘node,’ as well as bringing forth text surrounding

a coded portion, or more complex ‘assays’ in which proximate texts with multiple codes are

brought forth. In the future I intend to more systematically develop coding structures relating

to policy/political dimensions – these categories will be emergent depending on the data.
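The coding scheme described above is, in effect, a small tree of nodes with text fragments attached. The following is a hypothetical sketch in Python, loosely mirroring NVivo's node retrieval but in no way part of that software; the node paths and coded fragments are invented for illustration:

```python
# Hypothetical sketch of the hierarchical coding scheme: primary categories
# with 'daughter' sub-categories, and retrieval of all text coded at or
# below a given node (loosely mirroring an NVivo node query).

coding_scheme = {
    "ontology": [],
    "organization": ["SDSC", "GEON/IT", "GEON/geo-scientists", "theory"],
    "visualization": [],
    "methodology": [],
    "field-politics": [],
}

# Coded fragments as (node path, text) pairs -- invented examples.
coded_text = [
    ("organization/GEON/IT", "IT staff describe the portal architecture"),
    ("organization/SDSC", "note on SDSC management structure"),
    ("ontology", "geologists debate the term 'lithology'"),
]

def texts_at_node(node, corpus):
    """Bring up all coded text within a node, including its daughters."""
    return [t for path, t in corpus if path == node or path.startswith(node + "/")]

# Querying the parent node gathers its daughters' fragments as well.
geon_org = texts_at_node("organization/GEON", coded_text)
```

The design choice worth noting is that a query on a parent node ("organization") gathers everything coded under its daughters, which is what makes the parent/daughter subdivision analytically useful rather than merely tidy.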

I have also collected a significant archive of primary data produced by GEON, the

SDSC, the NSF and the CI Initiative. These texts include: email correspondence within

GEON, academic articles of results or plans, larger statements of CI vision by the SDSC

and the NSF, and so on. These texts are both physical and digital. Digital texts have been

filed, occasionally marked within NVivo (which permits referencing external documents), but

rarely further coded. Emails remain a somewhat daunting morass, and will require a

systematic coding practice in the future using NVivo. Physical texts have been filed with

labels regarding their topic.

In the past months it has become apparent that I have collected a significant amount

of data over the last two years, and that the next year will only intensify this accumulation.

Thus it is equally apparent that in the immediate future it will become necessary to begin

producing systematic summaries of collected data, of both ethnographic notes and primary

data. In the initial phases of my data collection I had been producing primary ethnographic

notes followed by “level 2” analyses that would summarize entire sessions in a page, along

with “level 3” analyses that would serve as specific theoretical formulations of an event (e.g. an

all-hands meeting). I intend to return to a more systematic execution of this practice to ensure

long-term data manageability.

6. Chapters

6a. Chapter Outline

1. Introduction

2. Methodology

3. Part I – The ‘Field’ of Cyberinfrastructure
   3.1 A Pre-History of Cyberinfrastructure Institutions (the ‘field’ of CS and IT)
   3.2 A Pre-History of GEON (the ‘field’ of geo-informatics)

4. Part II – Organizing for Cyberinfrastructure
   4.1 GEON the organization (Early History of organizational emergence)
   4.2 Technical Cores and Domain Peripheries (IT-mobility)

5. “We don’t set standards”: Interoperability, ontologies, and community consent (this chapter is the rationale for the micro-to-macro approach)

6. Part III – Cyberinfrastructure in Action: Ontologies and Database Interoperability
   6.1 Knowledge Representation and Ontologies
   6.2 Knowledge Representation at the SDSC: the sciences as object
   6.3 The Practice of Building Ontologies
   6.4 Of Ontologies in Practice

7. Conclusion

7. Workplan

7a. Tasklist

Empirical Work:
i. Continue with ethnographic research of GEON general organization
ii. History of GEON, history of the supercomputer center, history of the cyberinfrastructures program
iii. Ontologies being built, ontologies being used
iv. The organizational ‘field’ of geo-informatics
v. The political ‘field’ of cyberinfrastructure (NSF, SDSC, etc.)

Object Fields:
i. Domain geo-science (as determined by specific ontology foci)
ii. Ontologies – philosophy and CS
iii. Logic – description logics, and programming

Social Science Literatures:
i. Keep up on STS IT work, and broader IT work
ii. Learn the digital governments literature, governance, governmentality
iii. Get up to date on organization and technology, organization and IT
iv. Learn history of computing

Write-Up:
i. Write a dissertation

7b. Timeline

[Timeline table: columns are two-month periods (Sept-Oct, Nov-Dec, Jan-Feb, Mar-Apr, May-June, July-Aug) running from Sept-Oct 2004 through a final May-June; rows are Empirical Work, Revision & Defence, Object Fields, Social Science Lit., and Write-Up; cell shading indicates intensity on a scale of Little Work, Some Effort/Maintenance, Major Concentration, and Panic/Success. Cell values are not recoverable from this copy.]

8. Works Cited

Atkins, Daniel E. (Chair) (2003). Revolutionizing Science and Engineering Through

Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon

Advisory Panel on Cyberinfrastructure, National Science Foundation.

Blomberg, J., J. Giacomi, et al. (1993). Ethnographic Field Methods and Their Relation to

Design. Participatory Design: Principles and Practices. D. Schuler and A. Namioka

(ed.). Hillsdale, NJ, Lawrence Erlbaum Associates: 123-155.

Booker, J.M. and M.A. Meyer (1990 (submitted?)). “Common Problems in the Elicitation

and Analysis of Expert Opinion Affecting Probabilistic Safety Assessments.” CSNI

Workshop on PSA Applications and Limitations.

Boose, J.H. (1989). “A survey of knowledge acquisition techniques and tools.” Knowledge

Acquisition 1(1): 3-37.

Bowker, Geoffrey C. (1993). “How to Be Universal: Some Cybernetic Strategies, 1943-70.”

Social Studies of Science 23(1): 107-127.

Bowker, Geoffrey C. and Susan Leigh Star (1999). Sorting things out : classification and its

consequences. Cambridge, Mass., MIT Press.

Buyya, Rajkumar (2002). Economic-based Distributed Resource Management and

Scheduling for Grid Computing. Computer Science. Melbourne, Monash University.

Callon, Michel (1998). An essay on framing and overflowing: economic externalities

revisited by sociology. The Laws of the Markets. Michel Callon (ed.). Oxford,

Blackwell Publishers/Sociological Review.

Denzin, Norman K. and Yvonna S Lincoln, Eds. (2003). Strategies of Qualitative Inquiry.

Thousand Oaks, Calif., Sage.

Engestrom, Y, H Ahonen, et al. (2000). Knowledge Management -- the second generation:

Creating Competencies within and between work communities in the Competence

Laboratory. Knowledge Management and Virtual Organizations. Y. Malhotra (ed.).

Hershey, PA, Idea Group Publishing.

Foucault, Michel (1978). History of Sexuality: Volume 1. New York, Vintage Books.

Foucault, Michel (1991). Governmentality. The Foucault Effect: Studies in Governmentality.

G. Burchell and C. Gordon (ed.). Chicago, University of Chicago.

Fountain, Jane E. (2001). Building the Virtual State: Information Technology and

Institutional Change. Washington, D.C., Brookings Institution Press.

Hacking, Ian (1983). Representing and intervening : introductory topics in the philosophy of

natural science. Cambridge; New York, Cambridge University Press.

Harvey, Francis (1999). “Semantic interoperability: A Central issue for sharing geographic

information.” The Annals of Regional Science 33: 213-232.

Harvey, Francis (2001). “Constructing GIS: Actor Networks of Collaboration.” URISA

Journal 13(1): 29-37.

Heclo, Hugh (1974). Modern Social Politics in Britain and Sweden: From Relief to Income

Maintenance, Yale University Press.

Kling, Rob and Geoffrey McKim (2000). “Not Just a Matter of Time: Field Differences and

the Shaping of Electronic Media in Supporting Scientific Communication.” Journal of

the American Society for Information Science 51(14): 1306-1320.

Latour, Bruno (1983). Give Me a Laboratory and I will Raise the World. Science

Observed. K. Knorr-Cetina and Michael Mulkay (ed.). Beverly Hills, Sage.

Latour, Bruno (1986). “Visualization and Cognition: Thinking with Eyes and Hands.”

Knowledge and Society: Studies in the Sociology of Culture Past and Present 6: 1-40.

Latour, Bruno (1987). Science in action : how to follow scientists and engineers through

society. Cambridge, Mass., Harvard University Press.

Latour, Bruno (1988). The pasteurization of France. Cambridge, Mass., Harvard University

Press.

MacKenzie, Donald (2001). Mechanizing Proof: Computing, Risk and Trust. Cambridge,

Mass., MIT Press.

Meyer, J and B Rowan (1977). “Institutionalized Organizations: formal structure as myth and

ceremony.” American Journal of Sociology 83: 340-363.

Meyer, M.A and J.M. Booker (1991). Eliciting and Analyzing Expert Judgement: A Practical

Guide. London, UK, Academic Press.

Mintzberg, Henry (1992). Structure in Fives: Designing Effective Organizations. Englewood

Cliffs, N.J, Prentice-Hall.

Mintzberg, Henry and Alexandra McHugh (1985). “Strategy formation in an Adhocracy.”

Administrative Science Quarterly 30: 160-197.

Mitchell, Timothy (2002). Rule of experts : Egypt, techno-politics, modernity. Berkeley,

University of California Press.

Monteiro, Eric and O Hanseth (1997). “Inscribing behaviour in information infrastructure

standards.” Science, Technology & Human Values 21(4): 407-426.

O'Connell, Joseph (1993). “Metrology: 'The Creation of Universality by the Circulation of

Particulars'.” Social Studies of Science 23: 129-173.

Pfeffer, J and G Salancik (1978). The External Control of Organizations. New York, Harper

and Row.

Porter, T.M. (1994). Information, Power and the View from Nowhere. Information Acumen:

The Understanding and Use of Knowledge in Modern Business. L. Bud-Frierman (ed.).

London, Routledge.

Rabinow, Paul (1999). French DNA. Chicago, University of Chicago Press.

Rapp, R (1998). “Refusing prenatal diagnosis: The Meanings of bioscience in a multicultural

world.” Science, Technology & Human Values 23(1): 45-71.

Rose, Nikolas (????). Governing "advanced" liberal democracies. Foucault and political

reason. Andrew Barry, Thomas Osborne and Nikolas Rose (ed.). Chicago, University

of Chicago Press: 37-64.

Schuler, D and A.Namioka (1993). Participatory Design: Principles and Practices. New

Jersey, Lawrence Erlbaum Associates.

Scott, W.R. (1992). Organizations: Rational, Natural and Open. Englewood Cliffs, NJ,

Prentice-Hall.

Star, S.L. (1983). “Simplification in Scientific Work: An Example from Neuroscience

Research.” Social Studies of Science 13: 205-228.

Star, S.L. (1999). “The Ethnography of Infrastructure.” American Behavioral Scientist 43:

377-391.

Star, S.L. and K. Ruhleder (1994). Steps Towards an Ecology of Infrastructure: Complex

Problems in Design and Access for Large-Scale Collaborative Systems. Chapel Hill,

NC, USA.

Weber, Max (1978). Economy and Society. Berkeley, Ca, University of California Press.