the changing collections for escience: what it means for libraries

Download The Changing Collections for  eScience: What it Means  for Libraries

If you can't read please download the document

Upload: maylin

Post on 09-Jan-2016

14 views

Category:

Documents


0 download

DESCRIPTION

The Changing Collections for eScience: What it Means for Libraries. Julia Gelfand University of California, Irvine 26 February 2009. And from the UK:. - PowerPoint PPT Presentation

TRANSCRIPT

  • The Changing Collections for eScience: What it Means for Libraries

    Julia GelfandUniversity of California, Irvine26 February 2009

    *

  • And from the UK:Is about global collaboration in key areas of science, and the next generation of infrastructure that will enable itand the purpose of the UK E-Science initiative is to allow scientists to do faster, better, or different research.

    John Taylor, Director General of Research Councils Office of Science & Technology*

  • Making Sense of eScience?Courtesy of http://www.wordle.net

  • e-Science Definede-Science is not a new scientific discipline in its own right: is shorthand for the set of tools & technologies required to support collaborative, networked science. The entire e-Science infrastructure is intended to empower scientists to do their research in faster, better and different ways. (Hey & Hey, 2006)

    Cyberinfrastructure more prevalent usage of term in USNFS: Revolutionizing science and engineering through Cyberinfrastructure, 2003 (Atkins Report)Describes new research environments in which advanced computational, collaborative, data acquisition and management services are available to researchers though high-performance networks more than just hardware and software, more than bigger computer boxes and wider network wires.It is also a set of supporting services made available to researchers by their home institutions as well as through federations of institutions and national and international disciplinary programsMore inclusive of fields outside STM, emphasizes supercomputing & innovation*

  • Overview

    E-Science is the big pictureOpen Data is the goalDigital Repositories and Open Access are the methodsVision of joined up research can be the processCombining cultures, connecting people; means new roles for libraries

    *

  • Data Deluge?The End of Theory: The Data Deluge Makes the Scientific Method ObsoleteGoogles founding philosophy is that we dont know why this age is better than that one: If the statisticssay it is, that is good enough. No semantic or causal analysis is required. Thats why Google can translate languages without actually knowing them

    Chris Anderson, Wired Magazine, 23.06.08*

  • e-ResearchE-Science is a shorthand for a set of technologies and middleware to support multidisciplinary and collaborative researchE-Science program is application driven: the e-Science/Grid is defined by its application requirementsThere are now e-Research projects in the Arts, Humanities, and Social Sciences that are exploiting these e-Science technologies*

  • What is e-Science? Possesses several attributes E-Science is:Digital data drivenDistributedCollaborativeTransdisciplinaryFuses pillars of science:Experiment, Theory, Model/Simulation, Observation/Correlation

    Chris Greer, 16.10.08

    *

  • Context of E-Education

    E-Education is:Information-DrivenAccessibleDistributed InteractiveContext-AwareExperience and Discovery-DrivenPotentially personalized*

  • Beyond the Web?Scientists developing collaboration technologies that go far beyond the capabilities of the webTo use remote computing resourcesTo integrate, federate and analyze information from many disparate, distributed, data resourcesTo access and control remote experimental equipmentCapability to access, move, manipulate and mine data is the central requirement of these new collaborative science applicationsData held in file or database repositoriesData generated by accelerator or telescopesData gathered from mobile sensor networks

    *

  • Academic Research Libraries

    It is the research library community that others will look to for the preservation of digital assets, as they have looked to us in the past for reliable, long-term access to the traditional resources and products of research and scholarship.

    Association of Research Libraries (ARL) Strategic Plan 2005-2009*

  • Orders of Magnitude

    In 2006, the amount of digital information created, captured, and replicated was 1.288 x 10 to the 18th bits (or 161 exabytes)this is about 3 million times the information in all the books ever written.

    Three years later, it far exceeds this.*

  • Some Examples e-Science ProjectsParticle PhysicsGlobal sharing of data and computationAstronomyVirtual observatory for multi-wavelength astrophysicsChemistryRemote control of equipment & electronic logbookEngineeringNanoelectronicsHealthcareSharing normalized mammograms, telemedicineEnvironmentClimate modeling*

  • Interdisciplinary ChallengesCritical support for eScienceClarifies distinctions in research methodologiesNot to be confused with multidisciplinary, transdisciplinary and other forms of disciplinarity-splicingSupports Evidence-Based scholarshipAligns the Clinical and Translational SciencesAffirms new emerging directions and disciplines

    *

  • Select Recommendations

    Educate trainees and current investigators on responsible data sharing and reuse practicesEncourage data sharing practices as part of publication and funding policies (NIH & other mandates)Fund the costs of data sharing and support for repositories*

  • Cyberinfrastructure/ e-Infrastructure and the GridThe Grid is a software infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collection of individuals, institutions and resources (Foster, Kesselman, and Tuecke)Includes not only computers but also data storage resources and specialized facilitiesLong term goal is to develop the middleware services that allow scientists to routinely build the infrastructure for their Virtual Organizations*

  • Cyberinfrastructure Goals

    A Call for ActionHigh Performance ComputingData, Data Analysis and VisualizationVisual Organizations for Distributed CommunitiesLearning and Workforce Development*

  • Key Drivers for e-ScienceAccess to Large Scale Facilities and Data Repositoriese.g. CERN LHC, ITER, EBINeed for production quality, open source versions of open standard Grid middlewaree.g. OMII, NMI, C-OmegaImminent Data Deluge with scientists drowning in datae.g. Particle Physics, Astronomy, BioinformaticsOpen Access movemente.g. Research publications and data*

  • Key Elements of a National e-InfrastructureCompetitive Research NetworkInternational Authentication and Authorization InfrastructureOpen Standard Middleware Engineering and Software RepositoryDigital Curation CenterAccess to International Data Sets and PublicationsPortals and Discovery ServicesRemote Access to Large-Scale Facilities, e.g. LHC, Diamond, ITER,International Grid Computing ServicesInteroperable International StandardsSupport for International StandardsTools and Services to support collaborationFocus for industrial Collaboration*

  • Digital Curation Centers (DCC)Identify actions needed to maintain and utilize digital data and research results over entire life-cycleFor current and future generation of usersDigital preservationLine-run technological/legal accessibility and usabilityData curation in scienceMaintenance of body of trusted data to represent current state of knowledge in area of researchResearch in tools and technologiesIntegration, annotation, provenance, metadata, security*

  • Digital Preservation: IssuesLong-term preservationPreserving the bits for a long time (digital objects)Preserving the interpretation (emulation/migration)Political/socialAppraisal What to keep?Responsibility Who should keep it?Legal Can you keep it?SizeStorage of/access to Petabytes of dataFinding and extracting metadataDescriptions of digital objects*

  • Data Publishing: BackgroundIn some areas most notably biology databases are replacing (paper) publications as a medium of communication think Genome mappingThese databases are built and maintained with great deal of human effortThey often do not contain source experimental data sometimes just annotation/metadataThey borrow extensively from, and refer to, other databasesYou are now judged by your databases as well as your (paper) publicationsUpwards of 1000 (public databases) in genetics*

  • Data Publishing: IssuesData integrationCompiling data from various sourcesAnnotationAdding comments/observations to existing dataBecoming a new form of communicationProvenanceWhere did this statistic come form?Exporting/publishing in agreed formatsTo other programs as well as peopleSecuritySpecifying/enforcing read/write access to parts of your data*

  • Considerations*

  • The Cosmic Genome Packages:ExamplesWorld Wide Telescope (http://www.worldwidetelescope.org)The Sloan Digital Sky Survey (www.sdss.org)Valuable New ToolsGenePattern (http://www.codeplex.org)GalaxyZoo (http://www.galaxyzoo.org)Semantic Annotations in WordChemistry Drawing for OfficeNew Models of Scientific PublishingHave to publish the data before astronomers publish their analysisIntegrates data and images into research papers*

  • Emergence of a Fourth Research ParadigmThousand years ago Experimental ScienceDescription of natural phenomenaLast few centuries Theoretical ScienceNewtons Laws, Maxwells EquationsLast few decades Computational ScienceSimulations of complex phenomenaToday Data-Intensive ScienceScientists overwhelmed with data sets for many different sourcesData captured by instrumentsData generated by simulationsData generated by sensor networkse-Science is the set of tools and technologies to support data federation and collaborationFor analysis and data miningFar data visualization and explorationFor scholarly communication and dissemination*

  • Cloud Science ExamplesUsing different tools & software to demonstrate the value of cloud services Scientific Applications on Microsoft AzureVirtual Research EnvironmentsOceanography Work BenchProject Junior @ Newcastle University

    *

  • Collaborative Online Services

    Exchange, Sharepoint, Live Meeting, Dynamics CRM, Google Docs, etc.No need to build your own infrastructure of maintain, manage serversMoving forward, even science-related service could move to the Cloud (e.g. RIC with British Library)*

  • A world where all data is linkedData/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y)Social networks are special case of data meshesImportant/key considerationsFormats or Well-known representations of data/informationPervasive access protocols are key (e.g. http)Data/information is uniquely identified (e.g. URLs)Links/associations between data/information*

  • How is it done?e-ScienceScience increasingly done through distributed global collaborations enabled by the InternetUsing very large data collections, terascale computing resources & high performance toolsGridNew generation of information utilityMiddleware, software & hardware to access, process, communicate & store huge quantities of dataInfrastructure enabler for e-ScienceCloud- New, easier & cheaper opportunities to host, store, share & integrate, tag & link; utilizes more Web 2.0 applications*

  • More Implications of Technology Trends

    Web 2.0More egalitarian affects scientists, students, educators, general publicCollaborative classification flickrPower of collective intelligence AmazonAlternative trust models Open SourceService Orientation within & outside of librariesSemantic Web Promotes linking*

  • Data Centric 2020Data-centric 2020 vision resulting from Microsoft Towards 2020 Science (2006)Data gold-mineMultidisciplinary databases also provide a rich environment for performing science, that is a scientist may collect new data, combine them with data from other archives, and ultimately deposit the summary data back into a common archive. Many scientists no longer do experiments the old-fashioned way. Instead they mine available databases, looking for new patterns and discoveries, without ever picking up a pipette.

    For the analysis to be repeatable in 20 years time requires archiving both data and tools. *

  • Organizing eScience Content: Examples TagsSubjectInstrumental

    NationalNational/SubjectInternationalRegionalConsortiaFunding AgencyProjectConferencePersonalMedia Type

    PublisherData RepositoriesProject Meaning or OriginarXiv, CogprintsUniversity Research Institutes Southampton, Glasgow, Nottingham (SERPA). Max PlanckDARE (all universities in the Netherlands), Scotland IRIS)OceanDocsAfricaInternet Archives Universal, Oaister (Harvester)White Rose UKSHERPA-LEAP (London E-Prints Access Project)NIH (PubMed), Wellcome Trust (UK PubMed), NERC (NORA)Public Knowledge Project Eprint Archive11th Joint Symposium on Neural Computation, May 2004Peer to peerVCILt Learning Objects Repository, NTSDL (Theses), Museum Objects, Repositories, ExhibitionsJournal ArchivesUK Data Archive; World Data Centre System; National Oceanographic Data Center (USA)*

  • Ongoing Issues for e-ScienceMacro and micro issues are similar for both text and data repositoriesIP and Licenses**Distributed over many researchersOver national boundariesLack of awareness amongst researchersCultural roots and resistance to changeFunding costs, sources & accountabilityPolitics institutionally & within the disciplinesStandardsInteroperabilityVocabularies & Ontologies

    **Necessary to understand science practices: technical, social & communicative structure in order to adapt licensing solutions to the practice of e-science

    Research Issues: Information retrievalInformation modeling,Systems interoperability, and policy issues associated with providing transparent access to complex data sets*

  • New Roles: Data ScientistNew Skills RequiredUnderstanding of basic research problem & interdisciplinary connectionsQuantitative & systems analysisData Curation & Text MiningIntegrate data management within the LIS curriculumStronger IT & negotiation skillsDeeper subject backgrounds; standards & resourcesIn PracticeVarious approaches to develop and obtain digital curation skillsEstablished ties to facultySkills are there but often in discrete communities: we need to bring communities togetherIntegration within the curriculum: undergraduate students library & information science, archival studies. Computer scienceProvide recognition and career path for emerging data managers & scientistsThere must be a blurring of the boundaries between previously well defined silos that existed between information managers and data managers*

  • Role for Libraries in Digital Data UniverseData as primary source material LibrariesWill not be primary providers of large scale storage infrastructure requiredWill not provide the specialized tools to work with dataWill not provide the detailed information about the dataUnlikely to provide the solutions to digital preservation because of costCan contribute library practicesCollection policies (appraisal, selection, weeding, destruction, etc.)Data clean up, normalization, descriptionData citationCuration and preservationCollaboration with researcher re scholarly communication, deposit, education, and trainingInnovative discovery and presentation mechanisms

    Data part of enhances publication Libraries:Well positioned to define standards forTaxonomies and ontologies (for complex publications that include data)Persistent identifiersConsistent description practicesData structuring conventionsInteroperability protocols for searching and retrievalWell positioned to exploit IR experiences*

  • Role of Digital Libraries - IRsInstitutional Repository is a key component of e-InfrastructureMostly in library domainAccess and preservationDigitization data archaeologyInteroperable with departmental, national, subject repositoriesData CurationCreation metadata, preservation institutional intellectual assetsBut disparate data types and ontologiesTraining ProvisionResearch methods training for researchersData creation, documentation, managementsAdvocacy, policy settingCross disciplinary approach to key issuesExpand OA agendaInterweave e-Research, OA, andVirtual Research Environments*

  • Roles for LibrariesInstitutional Repositories accept small datasets (size of subject outside remit of Data Repositories). Data deposited in IR until accepted by data repositoryDevelopment of Regional or Discipline Repositories alongside IRs (singly or via consortia). Research libraries a natural home for content curation, (with funding)Mapping of commonalities (e.g. metadata) across disciplines, maintaining ready interoperabilityManagement of metadata throughout a research projectAddress conditional and role-based access requirements for scientific dataSupport e-Science interface functions for local usersAdding Value: linking, annotation, visualizationLibraries and researcher can add value by creating e-Science Mashups - data needs to be re-used in multiple ways, on multiple occasions and at multiple location (reuse, remix)*

  • Reinventing the LibraryIntensity urgently needed to support eScience:Emphasis is Data thus, new forms of collections & auxiliary resourcesInstitutional commitmentSustainable funding modelsRedefining the library user community-include researchLegal and policy frameworksLibrary workforce skills infusion for data science managementLibrary as a computational center as well as a text & media centerSustainable technology framework*

  • Courtesy of http://www.wordle.netThinking about things in new ways

  • Closing Proverb

    If you want to go fast, go alone.If you want to go far, go together.

    African Proverb read by Al Gore when he accepted 2007 Nobel Peace Prize*

  • Thank You

    Questions / [email protected]*

    Changing Collections for E-Science: What it Means for Libraries*Changing Collections for E-Science: What it Means for LibrariesChanging Collections for E-Science: What it Means for Libraries*Changing Collections for E-Science: What it Means for LibrariesChanging Collections for E-Science: What it Means for Libraries*Changing Collections for E-Science: What it Means for Libraries