Next Generation Research and the University of California: Planning for the Future of UC’s Cyberinfrastructure A report on the UC VCR-CIO 2015 Summit December, 2015


Next Generation Research and the University of California:

Planning for the Future of UC’s

Cyberinfrastructure

A report on the UC VCR-CIO 2015 Summit December, 2015


UC VCR-CIO Cyberinfrastructure Vision Steering Committee

Michael Pazzani, Vice Chancellor of Research and Economic Development, UC Riverside
Sandra Brown, Vice Chancellor of Research, UC San Diego
Tom Andriola, CIO, UC Office of the President
Larry Conrad, CIO, UC Berkeley
Jim Davis, Faculty Representative and Conference Panelist, UCLA
Willeke Wendrich, Faculty Representative and Conference Panelist, UCLA
Terry Gaasterland, Faculty Representative and Conference Panelist, UC San Diego
MacKenzie Smith, University Librarian, UC Davis

Support provided by Charles Rowley, Assoc. VC and CIO UC Riverside

The UC VCR-CIO Cyberinfrastructure Writing Group

Jim Davis, UCLA (Chair)
David Greenbaum, UC Berkeley
David Minor, UC San Diego
Arash Naeim, UCLA
Valerie Polichar, UC San Diego
Charles Rowley, UC Riverside
Yvonne Tevis, UC Office of the President

Conference Website: http://cnc.ucr.edu/uccybersummit/


Table of Contents

Introduction

Compelling Case for a UC Next Generation Cyberinfrastructure

UC Researchers – Advancing Cyberinfrastructure as a Systems of Systems

Definitions

o Central/shared services

o Federated services

o Intercampus services

o Cyber facilities

o Cyber collaboration infrastructure

o Platforms

o Sociotechnical infrastructure

Positioning UC – Cyberinfrastructure Needs

Recommended Actions

o Create a UC Cyberinfrastructure Alliance tasked to define, build, stage and

orchestrate federated and centralized operations and policy

o Develop systemwide and campus “Cyberinfrastructure Mediator” support

o Develop an effective systemwide “marketplace” for research cyberinfrastructure

o Make research data an institutional asset

o Develop cyberinfrastructure “connective tissue” and associated tools to join

services, create cyber platforms, and enable federated services

o Develop approaches to scale discipline-similar requirements across campuses

o Position health, patient and clinical data for research access, patient care, and

other strategic uses

o Build on UC’s expertise via a development structure for UC researchers and

support staff

Immediate Next Steps and Moving Forward

Summary and Conclusion

Appendices


Introduction

The modern research landscape continues to evolve dramatically: Science and digital

scholarship are becoming data driven (e.g. precision medicine, precision agriculture,

climate modeling, etc.), research now occurs in increasingly collaborative environments,

time to action and outcomes is a key driver, researchers must be both domain and data

experts, and data as a language enabling research and scholarship is the new normal.

Moreover, as researchers address more complex problems, an individual researcher in

isolation no longer develops a hypothesis, conducts experiments, collects data and analyzes

the data and the hypotheses. Rather, teams of researchers with different expertise and

perhaps in different locations are needed to address these grand, complex issues.

This new environment is driving change and presents new challenges and opportunities,

from ethics to data access to human analytical capacity. Clearly, this evolving environment

requires the University of California (UC) to consider and plan for its collective future, and

a thoughtful research cyberinfrastructure strategy is required to ensure UC addresses

these challenges and that every opportunity is leveraged.

The costs of not addressing this collective UC need are significant. The grand, complex

challenges facing humankind can only be resolved with robust, coordinated, and

collaboratively utilized cyberinfrastructures and related services and support. If UC does

not act, the University will have reduced capacity to address these challenges and realize

increasingly competitive funding opportunities that are trending toward resolving these

issues. Further, resting on UC’s collective laurels is not an option: inaction risks

compromising UC’s world-class reputation.

The 2015 VCR and CIO Cyberinfrastructure Summit was a call to action. Based on

conference themes and observations, this cyberinfrastructure vision document offers

prioritized recommendations and action plans to move the UC research enterprise to the

next level in its ability to innovate, collaborate, attract funding, and blaze new trails. This

plan has been reviewed and vetted by UC’s VCRs, CIOs, Librarians, and the more than twenty UC

faculty members who served as conference panelists.

The roadmap described herein will enable UC to optimally support the future success of its

research enterprises. Data-driven science, digital scholarship, and the associated (and

enabling) cyberinfrastructures this vision document discusses are core to UC campuses, its

laboratories, and to the University of California’s collective ability to address the grand

challenges facing California, the nation, and the entire world.

Compelling Case for a University of California Next Generation Cyberinfrastructure


A decade ago at the 2005 UC VCR-CIO Summit, the emphasis was on the cyber facilities

needed to provide capacity and capability for high-performance computation-based

research. By the 2011 Summit, the tenor of the discussion had shifted from cyber facilities

to a direct focus on the researcher-defined, front-end research capabilities that comprise

cyber collaboration infrastructure. The 2015 Summit revealed a much more extensive

cross-disciplinary research interest, an increased diversity of targeted uses, and an

expectation of precision in findings, predictions and insights. All disciplinary areas now

depend on data and analytics in some way. The 2015 Summit featured widely cross-

disciplinary breakout sessions, and all disciplines noted the importance of infrastructure

and expertise to support research data management, preservation and analytics. Facilities

such as compute, storage and transit were presumed to be essential, but are not always

present at the necessary levels. The term “informatics” was employed frequently. The

expected precision of solutions and team-based informatics amplified the dependence on

agile and flexible research tools that facilitate shared, team-based research. This in turn

generated further need for more purpose-built integrated, end-to-end collaboration

infrastructure capabilities, which are referred to here as platforms. The institutional role,

and the need for platforms that no single researcher or research group can individually

provide, were underscored, along with the role of people and the importance of

sociotechnical infrastructure.

All in all, the 2015 Summit provided a compelling case for action:

The grand challenge problems facing researchers today will require collaboration —

across disciplines, campuses, and capabilities — on an unprecedented scale to solve. UC

has the world-class faculty needed to address these challenges, but it must develop the

“connective tissue” to bring researchers, data, tools and capabilities together across

traditional boundaries, and must treat research data as a strategic asset.

To continue to attract top-notch faculty, students and staff, UC must continue to provide

top-notch facilities, including cyberinfrastructure. Technology facilities that were novel

and groundbreaking ten years ago are no longer sufficient to support modern research

methodologies or attract researchers in critical fields.

The ability to attract substantial grant funding will increasingly depend on the

capabilities and facilities available to the researcher, and on the partnerships

researchers can forge across disciplines to solve complex problems.

The increasingly data-driven research landscape means that data itself has become a

critical and valuable institutional asset. Effectively managed, curated and shared data

has both reputational and financial benefits that are increasing in importance over time.

In order to prepare our students for success, whether in further studies or in the

increasingly high-tech commercial world, UC must provide them with instructional and

research opportunities that employ cutting-edge technology, and give them access to


tools, data, support and resources to maximize its use.

The NSF’s long-term vision for cyberinfrastructure stresses that the complexity of

research analytics is increasing, and that solutions will demand approaches to big data,

strategies to deal with data from new technologies, interoperable capabilities and

resources, partnerships, and rational data access, analytics and archiving strategies.

These trends are reflected by other agencies and initiatives as well.

Facilities infrastructure is still important, but now it must be joined by development of

tools, middleware/connective tissue, platforms, and sociotechnical infrastructure (in

particular support, training, and facilitation) necessary to enable true collaborative

research. A new emphasis on data means partnering between appropriate technical and

Library entities to achieve goals important both to individual researchers and to the

institution as a whole. Creating a central entity to identify, develop and integrate tools and

best practices, to facilitate the sharing of our system's best solutions, and to prepare our IT

staff to serve a new generation of researchers is critical to realizing efficiencies, attracting

and nurturing novel research, top faculty, and research dollars, and maintaining/bettering

our reputation as an institution.

UC campuses are individually recognized as world-class research universities. Each campus supports a wide range of research and each campus claims particular areas of research leadership. When UC’s research areas, grants, patents, scholarship recognitions, etc., are considered as a whole, the University is unrivaled as an institution. Indeed, the University of California already has resources that are unrivaled by any other university system. These include:

The San Diego Supercomputer Center (SDSC) at UCSD was established as one of the nation’s first supercomputer centers by the National Science Foundation (NSF). The Center opened its doors on November 14, 1985 and has recently launched Comet, SDSC’s newest HPC resource, a petascale supercomputer.

The California Research and Education Network (CalREN) is a multitiered, advanced network managed by the Corporation for Education Network Initiatives in California, providing connectivity to UC campuses at speeds up to 100 Gbps.

Next-generation Research and Supporting Cyberinfrastructure: a geosciences researcher at one UC campus sees a program solicitation that would benefit from collaboration across multiple fields. She heads to a faculty profiling system, where she determines a list of potential collaborators and contacts them to hone a proposal team. Her campus digital technology resource advisor works with colleagues at the partner campuses to develop a compelling facilities description to accompany the proposal. The multi-campus, cross-disciplinary approach nets a large, multi-year grant. Once the proposal is awarded, the team selects appropriate data management, collaboration, analysis, visualization and workflow tools from a central marketplace to make collaboration seamless for their particular domain. They use the library to obtain geospatial data that they use to hone their data collection. Team members use local and remote instruments to collect experimental data, and easily analyze or visualize each other’s data without regard for the data’s origination or compute location. In addition to the published papers resulting from the work, the resultant data sets are curated and made available to future researchers in digital libraries. There the data are cited frequently, and are reused to both replicate the experimental work of the project and to build on that foundational work to take science forward.


Lawrence Berkeley Labs is the home of the National Energy Research Scientific Computing Center, or NERSC, one of the world’s leading supercomputing centers for open science, which serves nearly 6,000 researchers in the U.S. and abroad.

As part of UC Biomedical Research Acceleration, Integration, and Development (UC BRAID), the UC ReX Data Explorer is a secure online system that enables cross-institution queries of aggregate clinical data from 14+ million de-identified records.

Nevertheless, for the most part, UC research and cyberinfrastructure capabilities are

operationally separated by campus, with little inter-campus visibility, access or interaction.

Both in research and in cyberinfrastructure, UC is perceived as ten individual campuses,

not as a system. Indeed, UC has a history of competing as individual campuses rather than

aggregating strengths as a system or cluster of campuses when responding to state and

national initiatives and funding opportunities.

The NSF recently noted, “Although team science promises to address increasingly complex

scientific questions, conducting research collaboratively can introduce challenges that slow or

prevent projects from achieving their scientific goals.” In order for UC to be successful, we

must embrace this trend and build the experience with the tools to achieve frictionless access

to services and support.

UC Researchers – Advancing Cyberinfrastructure as a Systems of Systems

The cyberinfrastructure of the future will allow the researcher to leverage far more

integrated tools and services than are easily available today, with enormous potential impact

on the ability to win grants, achieve impact, and advance the reputation of the institution.

Importantly, cyberinfrastructure as a “systems of systems” makes on-demand, composable

cyberinfrastructure solutions a reality.

Researchers at the recent Pacific Research Platform Workshop at UCSD (October, 2015)

presented many specific and compelling science drivers for a more mature research

platform environment — and not just that represented by the PRP’s new high-speed

network to join campus science DMZs. Examples of science that could benefit from a more

comprehensive approach include the work of Sergio Baranzini, professor of neurology at

UCSF. He describes a set of research challenges that include democratization of data

collection equipment, distributed data generation, and individualized analysis. Frank

McKenna’s research at the Pacific Earthquake Engineering Research Center at UC Berkeley

involves the collection of relatively small data over long periods of time, and collaborations

with other university researchers, government, and industry. He notes that in an ideal

world, “it would appear to me like local files and applications were on my desktop. I just

define the workflow and the system figures out where to run it.” These descriptions sound

very much like the platforms described below, which should be built to accommodate

such research. Daniel Cayan, a researcher at the University of California, San Diego Scripps


Institution of Oceanography, describes needs for a “better catalogue describing available

data, better access tools for a range of users (small to large), remote processing and

analysis tools, and more accessible expert network knowledge” — approaches to address

most of these needs are described below.

Definitions

Because jargon is rampant and terminology is used differently in different contexts, the

following guide is provided to define the cyberinfrastructure terms contained in this

document:

Central/shared services are those managed centrally for the good of all campuses. Such

services are provisioned centrally and access is extended to campuses. An example of a

central/shared service is the California Digital Library shared subscriptions.

Federated services are distinguished from centrally shared services with respect to

approach, resources and operations. A federated service is a value-driven coordination

of services drawing upon the strengths and diversity of the distributed approaches, and

typically involves a coordinating front end plus middleware to provide access to

distributed back-end systems. An example of a federated service is the UC ReX secure

patient data search service, which allows secure searching for patient cohorts that are

assembled from multiple campuses’ independent patient data stores.

Intercampus services are services developed and managed locally by one or a set of

campuses, which are made available to other campuses within the UC system. An

example of an intercampus service is the SDSC Colocation facility.

Cyber facilities include the physical compute, storage, data center and network facilities

and the operational standards, software and code that comprise the computational,

storage and network system layers of cyberinfrastructure. Facilities also include

sophisticated routers, servers, fiber, cabling, data centers, power and cooling, etc.

Cyber collaboration infrastructure describes the tools, applications and processes that

are layered on the cyber facilities:

a. collaboration tools for multiple research groups to work together with analytics,

modeling, simulation and visualization capabilities

b. software-based processes for data management, data modeling, curation,

preservation, and aggregation for accessing, reusing and building broadly used

research data assets, as well as protecting and securing them

c. cyber environments for readily promoting, accessing, using and collaboratively

building software applications, i.e., research software stores


d. networked tools and search mechanisms for discovering and accessing expertise,

both formally and informally and in directed team-based projects, to spark

innovation, discovery and trial

e. network-based channels for conducting team-based R & D securely, tech transfer

that manages IP, processes that manage regulated data, etc., not only within higher

education, but also with commercial and industry partners, recognizing that data

are valuable intellectual property and technology transfer assets

Platforms combine cyber facilities (now considered basic needs) and cyber collaboration

infrastructure (new, enabling tools and processes) with “connective tissue” (e.g.

middleware, front ends, networks) to create integrated cyberinfrastructure capabilities

and services that, in aggregate, offer new functions, often taking into account the full

research data life cycle or the end-to-end process of collaboration. An institutional

research cyberinfrastructure platform might, for example, integrate network,

computation, data, workflow and security facilities and services to facilitate the ability

of researchers at different locations and institutions to progressively analyze data sets.

Mobility services might be added to facilitate distributed human-centered data input.

Different database structures might be integrated to facilitate different data analysis

and integration needs. A HIPAA-compliant platform might make it possible to do health

sciences research involving patient data. Discipline-specific platforms could be built

separately or over general-purpose platforms. Platforms are typically federated

environments (e.g. the upcoming Pacific Research Platform network) joining the

strengths of multiple campuses.

Sociotechnical infrastructure – this term, in increasing use in higher education, refers to

the technical expertise, guidance, workflow, procedures, interfaces and other human-

technology interventions (such as the Cyberinfrastructure Mediator service described

later in this document) that facilitate the use of cybertechnologies by humans in the

research environment. The importance of this type of service was stressed at the

conference and must be developed in concert with the facilities and infrastructure that

accompany it.

Positioning UC – Cyberinfrastructure Needs

The 2015 Summit generated a spectrum of topics worthy of consideration. However, seven

of these received particularly strong, cross-disciplinary attention, as measured by how

often they surfaced in the disciplinary sessions and summit panel sessions. These seven

resonant priorities form the basis for the recommended actions:

Cyberinfrastructure “concierge” service (here called Cyberinfrastructure Mediator)

Collaboration tools, portals, and services


Storage vision and ecosystem

Data management, curation, metadata / interoperability

Data access – UC and beyond

Skills development, training, “boot camps”

Policies and ethical considerations

These can be further organized into actionable themes: (1) make cyber collaboration tools

accessible, (2) expand and scale skills and access to expertise, (3) support data as research

assets to be managed, curated, and preserved; and (4) bring it all together into a platform

“ecosystem” of federated services, systems, and support. The following notes provide

detailed observations on these themes and UC’s emerging

cyberinfrastructure needs:

Make cyber collaboration tools accessible

° Enabling a broader base of researchers. Easier-to-use, self-guided and more highly

abstracted transformative tools and services that embody informatics expertise will

enable a broader base of researchers to conduct novel research without having to

develop or invest in the same expertise. In addition, new models for research

informatics support will assist researchers who may be in silos or who lack resources

to establish independent infrastructure and support systems. Such models may also

realize cost savings. Emerging technologies and standardized approaches to

data management will be accessible to all faculty, including those in fields where such

capabilities have traditionally been underdeveloped. Finally, widely available training

for students, research staff and faculty in applying new technologies to research will

help develop cyberinfrastructure skills into standard research techniques.

Expand and scale skills and access to expertise

° Cross-disciplinary collaboration. Collaboration and partnerships across departments,

schools and fields of study will increase our ability to solve complex research problems.

Innovative approaches for generating, collecting, and analyzing data to bridge disciplinary

languages, dictionaries, and areas of interest will provide vast opportunities for cross-

disciplinary researchers to share ideas, data, tools, and algorithms and to approach

research and global problems with a shared context. However, such collaborative

approaches and data driven inquiry require new skills and approaches to support success

within a shared, interdisciplinary context. UC therefore needs to invest in the development

and growth of both its researchers and information technology staff across the system.

Support data as research assets to be managed, curated, and preserved


° Data ownership and big data. Big data has three attributes: volume (scale), variety (its

many forms, e.g., structured/unstructured, text, multimedia), and velocity

(dynamism/real-time qualities). The ability to more readily collect, access and analyze

data beyond the walls of the institution, and to store and analyze large amounts of

disparate data (or big data) generated both locally and distally, will increase

opportunities for new kinds of research, analysis and decision-making. Real-time

dynamic data and analysis will transform traditional research approaches and

methodologies by accelerating the generate-analyze-apply-learn cycle. Systems will use

networked, information-based technologies to integrate intelligence in real time across

entire enterprises and will use data-driven modeling, simulations and Key Performance

Indicators to communicate optimal actions and results in real time. There are

significant policy, regulatory, security, privacy, and ethical issues to be managed.

° Multi-use data. The line between operational, business and research data is blurring.

Data is quickly becoming dual-purpose or multi-use as organizations integrate potential

research data collection more seamlessly into business workflow and operations. Policy

and governance will be critical to efficiently and effectively manage data in

organizations with potentially multi-purpose data, as will innovative approaches to human

subjects protection and compliance issues. Business operations will have to consider

how to support business and research simultaneously.

° Data visualization. Of increasing importance for managing large data sets, data

visualization involves the graphic display of data too complex for manual processing or

assessment; the resultant imagery is typically the end result of an algorithmic process

or generated from large-scale data sets. It encompasses a broad range of analytic tools

and techniques that include statistical visualization, GIS, and 3D modeling, all of which

share the common goal of organizing data into a coherent visual display that can be

readily interpreted and understood.

Platform “ecosystem” of Federated Services, Systems, and Support

o A federated but connected and interoperable infrastructure of platforms. UC campuses

and medical centers can and should build tools, services, and infrastructures to address

compelling local needs and support research innovation via agile and nimble service

provisioning. However, a federated approach will allow UC to discover ways to share,

leverage, and connect campus and medical center capacity (including data and data

services) to enhance UC’s collective research enterprise as a system of systems.

Such an approach is key to helping the campuses enhance capacity and capability

individually and across the system. Federated infrastructures will extend the tools and

capabilities that form an institutional “nervous system” (distributed resources,

capabilities, expertise, policy and ethics) through which data can be moved and

methodologies accessed. Organized for campus leverage, these federated platforms will


cultivate individual researcher capability. Mobile information and communication

technologies will play a major role. Policy will be an important driver, and initiatives

must reflect the ethical values that the UC wishes to project.

An example of such an approach is UC ReX, which begins with the premise that all six

UC medical centers have independent and effective clinical operations. Through a

federated approach, it is now possible to share patient cohort data so that each medical

center can use all of UC’s data to perform research and optimize clinical strengths (e.g. a

clinic that specializes in Alzheimer's has additional data for therapy optimization and

precision).
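
The federated pattern described above (a coordinating front end that queries independent back-end data stores and receives only aggregate results) can be illustrated with a minimal Python sketch. This is not the UC ReX implementation, whose internals are not described in this document; the `Record`, `SiteStore`, and `federated_count` names, the sample records, and the age and diagnosis fields are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical de-identified record; real clinical schemas are far richer."""
    age: int
    diagnosis: str


class SiteStore:
    """Stands in for one medical center's independent data store (hypothetical)."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self._records = records  # row-level data stays inside the site

    def count_matching(self, predicate: Callable[[Record], bool]) -> int:
        # Only an aggregate count crosses the site boundary.
        return sum(1 for record in self._records if predicate(record))


def federated_count(
    sites: List[SiteStore], predicate: Callable[[Record], bool]
) -> Tuple[Dict[str, int], int]:
    """Coordinating front end: send the query to every site, combine the counts."""
    per_site = {site.name: site.count_matching(predicate) for site in sites}
    return per_site, sum(per_site.values())


# Illustrative data only; site names and records are invented for this sketch.
sites = [
    SiteStore("site_a", [Record(70, "alzheimers"), Record(45, "asthma")]),
    SiteStore("site_b", [Record(82, "alzheimers"), Record(67, "alzheimers")]),
]

per_site, total = federated_count(
    sites, lambda r: r.diagnosis == "alzheimers" and r.age >= 65
)
```

In this sketch each site keeps its own records and answers only with a count, so the coordinator can report a combined cohort size without any campus giving up control of its data store. A production system would also need authentication, auditing, and safeguards against re-identification through small counts, none of which are modeled here.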

Recommended Actions

UC should begin by focusing efforts on the first two actions below. It is not necessary, nor is it

advisable owing to human resource limitations, to initiate all of these action items at once;

however, UC should complete at least the first two within twelve months in order to realize

measurable benefits quickly and to provide momentum for completing the effort.

Action 1: Create a UC Cyberinfrastructure Alliance tasked to define, build, stage and

orchestrate federated and centralized operations and policy.

The Cyberinfrastructure Alliance should be established and staffed as the initial federation

operating entity. As a start-up itself, the Alliance will be responsible for prioritizing

federated capabilities, commissioning working groups and supporting and orchestrating

the activities of each working group. This Alliance would start small, with the

recommended Actions indicated in this document; if proven effective, it could grow into a

larger and more permanent body. (Please see the note comparing the Cyberinfrastructure

Alliance organizational structure to CENIC on page 19 of this document.)

The Cyberinfrastructure Alliance will include a Federated Governance Board (FGB) as well

as project management and other support positions, since it will need to coordinate and

manage resources from the beginning. As capabilities become operational and others enter

the development process, the Alliance will become an operating entity. It is recommended

that the Federated Governance Board (FGB) comprise two VCRs, two CFOs, two CIOs, two

librarians and several key faculty members from multiple campuses. The

Cyberinfrastructure Alliance will interact with campuses through existing senate and

administrative structures, as well as create events, such as workshops, to define, shape and

build operational direction and interest and to build the infrastructure needed to facilitate

capability.

In time, the Cyberinfrastructure Alliance will address all the projects, initiatives, and

themes that surfaced during the Cyberinfrastructure Conference. The following notes

describe how the Cyberinfrastructure Alliance will position UC for the successful delivery

of these projects and initiatives:

Policies, Practices, Procedures, Organizations – Enhance, Modify, and Create Policies,

Practices, Procedures, and Organizations that Enable Federated and Intercampus Services

Enabling the UC Research Enterprise.

At present, UC is not organized operationally or financially to facilitate federation or

intercampus services. Some services exist despite this gap: for example, SDSC’s

provision of intercampus colocation services, or the federated UC ReX system for

securely searching patient data. In general, however, policies, practices, and incentives

often encourage the creation rather than the dissolution of silos. Although a

“federation” is challenging to the currently fully decentralized business and financial

structures of the UC system, it is highly valuable and should include the ability to

interoperate with services, facilities and support from across the United States and

beyond, as well as within UC. While difficult, UC should tackle and promote the

development and use of federated or intercampus-accessible services. The following

actions and organizing principles are essential to developing and promoting these

services:

o Marketplace of Services and Support. Establish as an organizing principle a

systemwide “Research Cyberinfrastructure Marketplace” consisting of available

central/shared, federated and intracampus-offered services, platforms, technical

expertise, and accessible, reusable research data [see Action 3 for initial development

steps] with an emphasis on federating offerings for the benefit of all campuses.

Precisely because of the broad nature of individual campus research strengths, UC is

well positioned to build and demonstrate the power of federation. UC federated

services would allow individual campuses to retain their interests and strengths,

and to build on them and draw on crossover strengths where there are multi-

campus benefits. Federation should be used to create interoperability opportunities

that take advantage of the system, infrastructure and expertise at each campus for

the purposes of accelerating, enhancing and promoting the development of each

campus’s unique research strengths.

There are several national Marketplace models that could be considered or mixed

(e.g. Smart Manufacturing through the Smart Manufacturing Leadership Coalition,

Industrial Commons through the Digital Manufacturing Design Innovation Institute,

HUBzero at Purdue, and Community Apps Sharing Architecture through the

Instructional Management Systems Coalition). Such an environment would make

available software, data, and service resources and allow users and developers to

review, access or contribute to the store to address a wide variety of research needs.

Tools, standards and best practices might also be made available in the Marketplace,

as well as brokering of cloud services. Such a Marketplace would need to be

continuously updated, and a process of inventory and discovery put into place to

work alongside it.

o Infrastructures, Tools, Services to Enable Federation. To make this work, determine

the appropriate infrastructure (such as network connectivity and systems

interoperability), transparency, and incentives necessary to facilitate federated

resource sharing between campuses [see Actions 4, 5, 6, 7 and 8 for initial

development steps]. Federated resources must not be determined solely in a top-

down, system-level manner, but must be allowed to emerge from individual or

collaborative campus efforts and identified and selected for federation. Bottom-up

structures are often more agile, approach new technologies sooner and address a

broader range of disciplinary and cross-disciplinary research activities. Top-down

transparency, organization and facilitation can be combined with campus-level

development, expertise, and emerging skills to maximize impact.

o Cyberinfrastructure Alliance Guiding and Organizing Principles. The following

guiding / organizing principles will allow UC to break down policy barriers to

collaboration with specific timelines and the following deliverables:

o Inventory of services, systems, and support. Strategies are needed to communicate

the existence of shared services and to facilitate intercampus use of such

devices, systems, tools, and services.

o Institutional support for sharing services across the UC system. The barriers to

entry for intercampus sharing and for utilizing common tools across campuses

must be eliminated or greatly reduced. These barriers include financial, cultural,

incentive, policy, and organizational constraints.

o Federated services strategy. Services should be selected for federation where

such action would lead to significant improvement in the technical support and

trusted partnerships that UC researchers most need, in a reasonable period of

time and in a cost-effective manner. Importantly, not all campuses must utilize a

particular service, nor is it necessary for all shared services to be provided by

UCOP or a particular campus or center. Rather, UC’s strategy should recognize

that intercampus collaborations of two or several campuses or research centers

might generate significant efficiencies and benefits. (This does not preclude such

services being identified as shared service opportunities at a later time.)

o Common approach to data access, security, etc. UC does not have a common

(campus, discipline, health sciences) approach to data access, security,

availability, etc. UC should develop and support a suite of transparent policies,

procedures, and incentives that are easy to understand / utilize and that

promote the wide availability of data and resources within UC. Issues that must

be addressed include compliance (e.g., HIPAA), security, bio-ethical topics, and

clinician / researcher relationships.

o Ethical considerations. As access to data increases, UC must ensure appropriate

policies and standards for privacy, confidentiality, data ownership, public /

private partnerships, etc., are considered and adopted.

o External (non-UC) data. UC must investigate policies and practices relating to

data security, access, privacy, etc., that will facilitate the acquisition of data from

organizations, firms, and other groups outside UC.

Successful Delivery of Federated Services and Support – Service Delivery Approaches

Designed for Success.

Federation must be viewed as an operation in its own right that facilitates and sustains value-driven federation-oriented policy, infrastructure activities and interoperability collaborations, which together produce measurably increased campus and collective research capability and capacity. In sharp contrast to centralization, federation involves sustaining an evolutionary development lifecycle that will generally consist of the following steps:

1. Identification of a high-potential federated capability

2. Inventory and visible exposure of campus capabilities, e.g., websites and workshops

3. Detailed review of federated potential, consideration of approaches and funding, policy and capacity needs/barriers

4. Highly visible pilot orchestrated with a small subset of campuses to champion, demonstrate and shape an approach

5. Resolution of funding, policy, infrastructure or capability barriers

6. Scaling from the successful pilot, moving to operational requirements and scaling to critical mass interest

7. Adjusting and sunsetting a capability when requirements, technologies and value change.

To execute on this development pattern, a working group for each potential federated

capability needs to be identified. Each working group must be supported with

increasing involvement and project management. This will ensure the demonstration of

value and review on the merits of capability, and will avoid the loss of capabilities

because of lack of support, resources or commitment at any one step. Federated

capabilities that survive the pilot process need to be able to move into a managed

operational start-up and scale-up mode with identification of appropriate federated

value, investment in resources, and resolution of policy barriers. The VCR-CIO Summit

identified a first slate of candidate federation capabilities. The descriptions for each of

the following recommended actions provide proposed agendas for the associated

working groups.

Per the project delivery strategy noted above, for each of the following actions, these basic

steps are envisioned:

1. Create a cross-campus working group to guide development

2. Survey existing offerings, tools & services; contribute to the central inventory and

identify federation possibilities for the Marketplace

3. Produce an online “best practices” guidebook/manual for campuses

4. Initiate a process to actively monitor/maintain the landscape over time

Action 2: Develop systemwide and campus “Cyberinfrastructure Mediator” support.

During the UC Cyberinfrastructure Conference, participants uniformly recommended

creating a “concierge” service, a capacity for digital technology resource guidance that

brings federated expertise and capabilities together to deliver appropriate

cyberinfrastructure services to meet individual research needs. This important capacity

has been named UC’s Cyberinfrastructure Mediator partnering and support function, and it

aims to reduce the time faculty spend bringing cloud, national, UC-wide and local campus

cyberinfrastructure capabilities together to address research goals and objectives. UC’s

Cyberinfrastructure Mediators will sponsor and create “ask an expert” services and provide

“how to do things or get things done” guidance; it is also envisioned that these support staff

will partner very closely with UC faculty and provide synergistic input relating to how

technology and technical services can be leveraged to address research challenges and

promote collaboration and next-generation science.

Action 3: Develop an effective systemwide “marketplace” for research

cyberinfrastructure.

The University of California has a vast array of cyberinfrastructure tools, services, support

and related data that facilitate and enable its research enterprise. However, in general,

these various cyberinfrastructures and associated data are not readily available to

researchers who do not “own” or have not directly participated in the provisioning of a

particular tool or service. Indeed, such siloed environments exist at the campus and

medical center levels.

As a result, leveraging UC’s collective cyberinfrastructure is relatively difficult and can be

quite costly given that each new partnership requires discovery of a particular service,

developing an understanding of how it might be federated or shared, discussion of fiscal /

financial issues, and addressing data interoperability challenges. UC should therefore

create a “Research Cyberinfrastructure Marketplace” with very low barriers to entry and

commensurately low “transaction processing” costs for sharing, utilizing, and federating

services and support.

The “Research Cyberinfrastructure Marketplace” will be built on a clear understanding of

available central/shared, federated and intracampus-offered services, platforms, technical

expertise, and accessible and reusable research data. UC’s Cyberinfrastructure Mediators

will routinely refer researchers to UC’s Research Cyberinfrastructure Marketplace as an

option for obtaining services and support, and the marketplace itself will expand over time

as UC’s Cyberinfrastructure Alliance identifies and acts on opportunities for expanding

federated service offerings. Importantly, other suggested actions in this document will lower

“barriers to entry” and “transaction costs” associated with utilizing the Research

Cyberinfrastructure Marketplace (see, for example, Action 4 relating to research data as an

institutional asset).

Associated Action - Build a shared software store.

One tangible component of UC’s cyberinfrastructure marketplace will be a software

brokerage infrastructure and appropriate policy for sharing/promoting/buying cloud

software applications across the UC system. Similarly, the UC federation should be set up to

facilitate a technology channel for data and software with respect to internal and external

partnerships. Collectively, UC research is a major producer of software, and this asset can

be leveraged within the system to enhance research achievements for all.

Action 4: Support research data as an institutional asset.

UC must acknowledge the role of research data as valuable University intellectual property,

and develop and implement a set of guidelines for its management. Further, UC must

develop new — and integrate existing — tools and services based on these guidelines,

bringing together local campus data management initiatives and system-level tools where

appropriate. The libraries’ critical role in building research data into a University research

asset emerged strongly in the Summit — issues relating to data management (short and

long term), data quality, curation, retention practices, and metadata structures that enable

interoperability, etc., are foundational to optimizing UC’s effectiveness and cementing UC’s

reputation as a leader. UC must leverage expertise within its libraries and partner with

technology organizations to address this important need.

Action 5: Develop cyberinfrastructure systems “connective tissue” and associated

tools to join services, create cyber platforms, and enable federated services.

It is essential to develop platform tools that bring researchers and their work into a more

visible, discoverable state to facilitate shared expertise and to increase the potential of

collaborations. For example, how does one researcher find another researcher doing

something similar with cyberinfrastructure, especially across disciplines? Collaboration

tools, federated data and database interfaces, interconnected networks and more are

required to empower UC as a system. UC needs to agree on standards and build the

necessary “connective tissue”: campus network interconnects, middleware, scheduler

technologies, cloud service management technologies, etc., to make it possible for federated

services, databases, facilities and tools to interoperate. This will make it possible to take

advantage of cross-system and commercial cloud technologies to assemble services for

particular research needs, and may also realize efficiencies.

Action 6: Develop approaches to scale discipline-similar requirements across

campuses.

Not all research areas have large, concentrated discipline-specific data needs that are

accommodated by formal structures such as centers. There is a huge diversity of research

and scholarship programs working with smaller but equally valuable data assets that lack

the ability to scale, share, and leverage data resources across campuses, and when

appropriate, the entire UC system. UC should leverage institutional and cross-institutional

discipline-specific data resources to allow smaller data assets to take advantage of shared

resources. Importantly, this initiative will also bring together UC faculty and practitioners

from across campuses who are thought leaders in their fields; such collaborations will

enable faculty to exchange innovations and novel approaches as well as discuss and resolve

field-specific data challenges.

Action 7: Position health, patient and clinical data for research access, patient care,

and other strategic uses.

The five UC medical centers and many health science programs and their attendant health,

patient and clinical data are unparalleled data assets for research. The UC ReX (discussed

above) and Big Cogito pilot are examples. Key challenges will be standardization of

terminology across UC, and the development of appropriate policies and data governance

that allow UC to simultaneously work as one collaborative system in certain situations

while promoting healthy competitive innovation and excellence as individual campuses. UC

must define a HIPAA-safe approach and infrastructure to advance research collaboration;

identify data workflows, interfaces, and data standards to allow for precision medicine

within the electronic medical records; and provide ready access to de-identified clinical data to

faculty outside of the school of medicine or outside of health sciences. In the process, UC

must examine challenges around specific types of data, highlight data visualization needs,

and engage patients and the community.

Action 8: Build on UC’s expertise via a development structure for UC researchers and support staff.

Collaboration and partnerships across departments, schools, fields of study, campuses, and

medical centers will increase UC’s capacity to solve complex research problems.

However, these collaborations and partnerships (that are often built around big data and

associated informatics / analysis) require new skills and approaches relating to

collaboration, data capture and analysis, systems and tools, and algorithms required to

approach research and global problems within a shared, interdisciplinary context.

UC therefore needs to invest in the development and growth of both its researchers and

information technology staff across the system.

The notion of cyberinfrastructure support staff includes the full range of domain experts

who choose non-faculty career paths supporting researchers, as well as technology experts

who are responsible for keeping research operations running. Professional development

will include the soft (interpersonal) and hard (technical) skills needed so that research

technology professionals can move comfortably from helping to address local problems to

participating in cross-campus and multi-campus collaborations. An ultimate goal of this

process should include the establishment of a UC community of well-connected research

technology consultants and cyberinfrastructure engineers who can serve as an adjunct

“community of experts” to the Cyberinfrastructure Mediators described above.

Importantly, UC must provide similar developmental opportunities for its faculty who now

require non-disciplinary expertise (data capture, data management, informatics / analysis

capabilities, etc.) and skills relating to collaborating with non-traditional colleagues and

partners (e.g. bridging disciplinary languages, dictionaries, areas of interest, etc.).

Immediate Next Steps and Moving Forward

As noted earlier in this document, UC is currently not organized to successfully provide or

facilitate federation or intercampus services. Indeed, current incentives and organizational

structures, in many cases, encourage the provisioning of tools, infrastructures, and services

that are inherently not shareable and/or interoperable.

As a result, UC should create an organizational approach and formal structure to facilitate

and enable the federated, collaborative vision outlined in this document. This approach

will not only provide immediate benefit to UC’s research enterprise, but will also provide a

structure to prioritize, implement, and manage initiatives over the next several years and

beyond. It is therefore proposed that UC act on and complete the following two action

items within the next twelve months:

UC should create the UC Cyberinfrastructure Alliance to provide oversight, guidance, and

structure for acting on the various recommendations resulting from the

Cyberinfrastructure Conference.

This body will create the framework necessary to support federated and intercampus

services. The UC Cyberinfrastructure Alliance will also prioritize initiatives and

coordinate overall project planning/management.

UC should develop a systemwide and campus Cyberinfrastructure Mediator-style service

and begin development of a systemwide “marketplace” for research cyberinfrastructure.

This effort will produce a formal group that provides immediate service and benefit to

the UC research community, including cybersecurity recommendations, while the UC

Cyberinfrastructure Alliance is formed.

Summary and Conclusion

The University of California’s most successful and important technical collaboration is

CENIC (Corporation for Education Network Initiatives in California). CENIC is a federated

service insofar as UC campuses are able to build, operate, and optimize local networks to

meet campus needs, but the “connective tissue” that integrates these networks is provided

by CENIC. California’s federated educational network is a best-of-breed solution that is

critical to UC’s teaching, research, and public service missions.

Similar to the CENIC model for providing federated services, the proposed UC

Cyberinfrastructure Alliance and the recommended service implementation plans will

provide (and/or facilitate) the tools, platforms, data management practices, and other

services that will provide UC researchers and scholars frictionless access to the marketplace

of UC cyberinfrastructures and support. Importantly, this approach will not compromise

individual campuses’ or researchers’ tools, systems, and initiatives, but will connect them in

novel and synergistic ways.

This effort to implement federated services will better position UC to support faculty like

Berkeley’s Frank McKenna (mentioned earlier in this report). Cyberinfrastructure-related

tools and services will ideally “appear to me like local files and applications were on my

desktop. I just define the workflow and the system figures out where to run it.”

UC’s Vice Chancellors of Research and CIOs are ready to produce a plan for creating and

operationalizing the Cyberinfrastructure Alliance for three years. This operating plan will

include the creation of the campus Cyberinfrastructure Mediator service as well as a suite of

metrics and reports defining and measuring impacts and success. If these next steps are

approved, this plan will be created by February 2016.