Dirk Roorda, coordinator infrastructure
Post on 24-Mar-2016
Overview
Part 1: The rising role of data
Part 2: The free use of data
Part 3: The care for data
Part 4: The re-use of data
Part 1: The rising role of data
http://en.wikipedia.org/wiki/Exabyte
Internet size (May 2009):
500 EB
500,000 PB
500 million TB
500 million fat USB disks
500 billion memory cards of 1 GB
70 memory cards per person
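The conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, a capacity of 1 TB for a "fat USB disk", and a world population of roughly 6.8 billion in 2009; none of these three figures is stated on the slide.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18

internet_size = 500 * EB  # estimate quoted on the slide (May 2009)

# Assumption: world population in 2009, roughly 6.8 billion.
WORLD_POPULATION_2009 = 6_800_000_000

petabytes = internet_size // PB
terabytes = internet_size // TB       # also the number of 1 TB disks (assumed size)
memory_cards = internet_size // GB    # memory cards of 1 GB each
cards_per_person = memory_cards / WORLD_POPULATION_2009

print(f"{petabytes:,} PB")
print(f"{terabytes:,} TB")
print(f"{memory_cards:,} memory cards of 1 GB")
print(f"about {cards_per_person:.0f} memory cards per person")
```

With the assumed population the last figure comes out slightly above the slide's rounded value of 70.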
Data deluge
http://www.datadeluge.com/
http://en.wikipedia.org/wiki/File:Tree_of_life_SVG.svg
http://tolweb.org/tree/
Where does it come from?
• Instruments
  • satellites, sensors, DNA sequencing
• Records
  • administrations, censuses, surveys
• Digitisation
  • the analog legacy
• Hobby
  • pictures, movies, genealogy
• Integration
  • better interoperability of existing data
The driving force
Information and Communication Technology
Babbage Analytical Engine, 1870
A datacenter
Genealogy
2.5 PB
5328 servers
1.12 MW
http://blog.familytreemagazine.com/insider/Inside+Ancestrycoms+TopSecret+Data+Center.aspx
http://www.ancestry.com/
A closer look
• Linguistics
  • text corpora, automatic translation
• Philology
  • how to read a million books?
• History
  • historical census data
• Archaeology
  • archive law, commercial research
Linguistics and Philology
• A chronometric approach to Indian alchemical literature
• Assessing frequency changes in multistage diachronic corpora
• Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets
• A Corpus Study of the Rigveda
• Dictionary generation for less-frequent language pairs using WordNet
• An exercise in non-ideal authorship attribution: the mysterious Maria Ward
http://llc.oxfordjournals.org/
Archaeology
http://edna.itor.org/nl/intern/upload_directory/a00002/downloads/IMG0013.tif
Archaeology (2)
http://edna.itor.org/nl/oai/oai_addi/oai_addi/OAI:EVALMA:a00002.xml/
Part 2: The free use of data
Open Access
Data is information
Information is knowledge
Knowledge is power
Why share it?
Open Access
Shared knowledge is double knowledge
Without free sharing of knowledge, scientific progress will halt
Tensions between sharing and not sharing remain, though
Work to do
• organise your data
• let your data work together with those of others
  • (colleagues, future scientists, the public)
• ask new questions of the data
  • because there is so much of it
• create new (virtual) data collections
Part 3: The care for data
Research Data Recycling
• existing data
• collecting by experiments, surveys
• primary research data
• verifying results by others
• preserving unique data from experiments
• compilation, aggregation, annotation
• databanks
• data mining, analysis, visualisation
• new data as research input
Challenge: Software
• Operating system (DOS, Windows 95, ...)
• Programming Languages (Basic, Pascal)
• File formats (Word Perfect, dBase)
• Applications (Addressbook, Websites)
Old data may be locked up in old software.
Meeting the challenge
To prevent the problem in the future
• Backward compatibility
• Open Standards
• Open Source Applications
• Modular software engineering
  • keep data separated from interface and business logic
To remedy the problems of the past
• Emulation
• Migration
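Migration can be as small as re-encoding a legacy text file. The sketch below assumes a plain-text file written under DOS in code page 437 and rewrites it as UTF-8; the function name `migrate_text` and the file names are hypothetical. Binary formats such as Word Perfect or dBase need format-specific converters, which this sketch does not cover.

```python
from pathlib import Path


def migrate_text(source: Path, target: Path, legacy_encoding: str = "cp437") -> None:
    """Re-encode a legacy text file as UTF-8, leaving the original untouched."""
    # Decode the raw bytes with the legacy code page (strict error handling
    # is Python's default), then write the same text as UTF-8 bytes.
    text = source.read_bytes().decode(legacy_encoding)
    target.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "notes_dos.txt"    # hypothetical legacy file
        new = Path(tmp) / "notes_utf8.txt"   # hypothetical migrated copy
        old.write_bytes("café".encode("cp437"))
        migrate_text(old, new)
        print(new.read_text(encoding="utf-8"))
```

Working on bytes rather than in text mode keeps the original line endings as they were, and writing to a separate target keeps the source file available for later verification.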
Challenge: Human organisation
• Forgotten jargon
• Forgotten knowledge
• No metadata
• Websites with broken links
Jargon
• II.17. Posterior berry aneurysm with subarachnoid bleed.
• II.18. Subarachnoid bleed with extension into the ventricles.
• II.19. Ruptured berry aneurysm at the end of the internal carotid artery, with obstructive hydrocephalus. Morgagni found the rupture.
• II.22. Subarachnoid hemorrhage.
http://www.pathguy.com/morgagni.htm
Meeting the challenge
• Persistent Identifiers
• Enough Metadata
• Codification of knowledge and practices
• Wikipedia
• Data management early on
Part 4: The re-use of data
Data management
Use common infrastructure rather than private means
Use open formats rather than proprietary formats
Use open source software rather than closed software
Use standard ways of documenting data: taxonomies, ontologies, metadata schemes
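One way to document data in a standard way is to store a small metadata record next to each data file, in an open format. The sketch below borrows a handful of element names from Dublin Core (title, creator, date, format, identifier) and adds a SHA-256 checksum, which is not a Dublin Core element. The choice of schema, the helpers `describe` and `write_sidecar`, the `.meta.json` suffix, and the use of the file extension as the format value are all assumptions for illustration, not something prescribed in the talk.

```python
import hashlib
import json
from pathlib import Path


def describe(datafile: Path, title: str, creator: str, date: str, identifier: str) -> dict:
    """Build a minimal metadata record for one data file."""
    return {
        "title": title,
        "creator": creator,
        "date": date,                  # ISO 8601 date string
        # Simplification: file extension instead of a registered media type.
        "format": datafile.suffix.lstrip(".") or "unknown",
        "identifier": identifier,      # ideally a persistent identifier
        "checksum_sha256": hashlib.sha256(datafile.read_bytes()).hexdigest(),
    }


def write_sidecar(datafile: Path, record: dict) -> Path:
    """Write the record as JSON next to the data file and return its path."""
    sidecar = datafile.with_name(datafile.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar
```

Because the record is plain JSON in UTF-8, it stays readable without the software that produced it, and the checksum lets a later user verify that the data file has not changed.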
Common Infrastructure
• Local file shares
• University repository
• DANS
• European Infrastructures
EASY
Dataset
Datafiles
Metadata
linguists make their technology accessible
- resources
- algorithms
- techniques
humanities and social sciences
- they are the target users
Geleerdenbrieven (scholars' letters) = Circulation of Knowledge
Archiving = circulation of information
Keep imagining