Storing and Organizing Data, Informatics I101, February 18, 2004, John C. Paolillo

Page 1: Storing and Organizing Data Informatics I101 February 18, 2004 John C. Paolillo

Storing and Organizing Data

Informatics I101

February 18, 2004

John C. Paolillo

Page 2:

Storing Data

• Encoding: fixed or variable width

• Memory

• Storage medium:
– Magnetic: tape, disk, hard disk
– Optical: CD, DVD, etc.
– Silicon: Programmable Read Only Memory (PROM), Erasable PROM, etc.
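The difference between fixed and variable width encoding can be seen in a few lines of Python. This is only an illustrative sketch, not part of the original slides: UTF-32 and UTF-8 are stand-ins chosen here because Python ships codecs for both, and the sample string is an assumption.

```python
# Illustrative sketch: compare a fixed-width and a variable-width encoding.
# UTF-32 and UTF-8 are stand-ins; the slides do not name specific encodings.

def bytes_per_char(text, codec):
    """Return how many bytes each character of text occupies in codec."""
    return [len(ch.encode(codec)) for ch in text]

# Hypothetical sample: an ASCII letter, an accented letter, the euro sign.
sample = "a\u00e9\u20ac"

fixed = bytes_per_char(sample, "utf-32-le")   # same width for every character
variable = bytes_per_char(sample, "utf-8")    # width depends on the character

print("fixed width   :", fixed)
print("variable width:", variable)
```

A fixed-width encoding makes it easy to jump to the n-th character, while a variable-width one usually saves space when most characters are common ones.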

Page 3:

Compact Disk Recording

[Figure: optical path of the drive. Labels: LED, light beam, lens (two), photocell.]

Data groove, etched in surface of plastic, has a slight “wobble” that helps locate the data

Page 4:

The Recording Process

[Figure labels:]

Crystalline metal alloy recording surface

Pits of amorphous solid left when metal re-cools

Light beam: pulses to record on and off states, steady for reading

Page 5:

[Figure labels: 1.6 µm, 0.74 µm, 0.32 µm]

Page 6:

CD Media States

• Crystalline: bright, reflects light well
– “off” state

• Amorphous: dark, scatters light
– “on” state

• Micro-crystalline: reflects light, but not brightly
– “erased” state (= “off”)
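The mapping from media state to bit value can be written out as a small Python sketch. It follows the slide's simplified picture of one state per bit; the names `MediaState` and `read_bit` and the sample track are assumptions made for illustration, and real drives use more elaborate channel coding than this.

```python
# Illustrative sketch: map the three media states from the slide to bit values.
from enum import Enum

class MediaState(Enum):
    CRYSTALLINE = "crystalline"              # bright, reflects light well
    AMORPHOUS = "amorphous"                  # dark, scatters light
    MICRO_CRYSTALLINE = "micro-crystalline"  # reflects light, but not brightly

def read_bit(state):
    """Only the dark amorphous state reads as "on"; both reflective states read as "off"."""
    return 1 if state is MediaState.AMORPHOUS else 0

# Hypothetical stretch of track: written, then partly erased.
track = [
    MediaState.CRYSTALLINE,
    MediaState.AMORPHOUS,
    MediaState.MICRO_CRYSTALLINE,
    MediaState.AMORPHOUS,
]
bits = [read_bit(state) for state in track]
print(bits)
```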

Page 7:

Page 8:

How Erasing Takes Place

Page 9:

Writing isn’t perfect

The center pits (dots) are partly erased by the heating caused by writing the nearby longer pits (dashes), which were written later.

Page 10:

Reference

van Houten, Henk; and Wouter Leibbrandt. 2000. “Phase change recording”. Communications of the ACM, 43.11: 64-71.

http://www.acm.org/dl

Page 11:

Storing Data

• Encoding: we may need to change from one encoding to another
– Task of the device driver
– Gives us a stream of bits

• Medium: different media require different treatment of the data for storage
– Task of the device hardware itself
– Gives us a stream of bits read/write-able by the device

But how do we find the data later?

Page 12:

Data Organization

• Index for the data
– File names, extensions
– Metadata (date, program that uses it, etc.)
– Directory structures

• All data storage systems use some kind of data organization
– The principles of data organization are the same no matter what the data or where it is organized
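An ordinary file system already keeps this kind of index. The Python sketch below collects the name, extension, directory, and a little metadata for one file. The helper `describe`, the directory name `i101`, and the file name `notes.txt` are all hypothetical, and the file is created in a temporary directory so the example does not depend on anything that already exists.

```python
# Illustrative sketch: the file system as an index (names, extensions, metadata, directories).
import tempfile
from datetime import datetime
from pathlib import Path

def describe(path):
    """Collect the indexing information the file system keeps about one file."""
    info = path.stat()
    return {
        "name": path.stem,
        "extension": path.suffix,
        "directory": path.parent.name,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
    }

# Build a throwaway directory tree; both names below are hypothetical.
with tempfile.TemporaryDirectory() as root:
    notes = Path(root) / "i101" / "notes.txt"
    notes.parent.mkdir()
    notes.write_text("storing and organizing data")
    entry = describe(notes)

print(entry["name"], entry["extension"], entry["directory"], entry["size_bytes"])
```

None of the values in `entry` come from the file's contents; they are all information about the file, which is what lets us find it again later.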

Page 13:

When Organization is Critical

• National Center for Biotechnology Information (NCBI) GenBank:
– 28 billion DNA base pairs (A, C, G, T)
– 22 million sequences (possible genes)

This is a lot of data to manage. In NCBI it has been indexed with many kinds of metadata and integrated with information from scientific publications, so the overall enterprise is larger yet.
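To get a rough sense of scale, the raw sequence size can be estimated from the base-pair count on the slide. The Python sketch below is a back-of-the-envelope calculation under stated assumptions: it counts only the four-letter sequence data, ignores all metadata and annotations, and compares a packed fixed-width encoding with plain one-byte-per-letter text.

```python
# Back-of-the-envelope sketch: storage for the base pairs quoted on the slide.
# Assumption: only the raw A/C/G/T sequence is counted, no metadata or annotations.
import math

BASES = "ACGT"
BASE_COUNT = 28_000_000_000   # 28 billion base pairs, from the slide

bits_per_base = math.ceil(math.log2(len(BASES)))   # bits needed to tell 4 symbols apart
packed_bytes = BASE_COUNT * bits_per_base // 8      # fixed-width packed encoding
text_bytes = BASE_COUNT                             # one 8-bit character per base

print(f"{bits_per_base} bits per base")
print(f"packed: {packed_bytes / 1e9:.0f} GB, as text: {text_bytes / 1e9:.0f} GB")
```

Under these assumptions the sequence data alone is on the order of gigabytes; the indexing and integration described above come on top of that.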

Page 14:

Page 15:

Other Similar Applications

• NASA Mars and other missions
– http://photojournal.jpl.nasa.gov/index.html

• The National Virtual Observatory
– http://www.us-vo.org/

• Centers for Disease Control
– http://www.cdc.gov/

• Homeland Security

Page 16:

Data and Metadata

Data: any object of interest which can be characterized and encoded in digital form

Metadata: data about data; data used to help index and locate data of interest in some application

Page 17:

Data Organization Schemes

• Hierarchical
– Data organized into object hierarchies for easy access
– Metadata is in the tree structure of the hierarchies
– XML Databases

• Network
– Objects link to some selected other objects
– Metadata is embedded in the data
– The World-Wide Web

• Relational
– Data organized into relations
– Metadata is in the structure of the relations
– Most Database Management Systems (DBMSs)
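The three schemes can be compared by storing the same facts three ways. The Python sketch below is an illustration only: it uses plain dictionaries, lists, and tuples rather than an XML database, the Web, or a DBMS, borrows two actor and movie pairs from the Relations slide, and all variable and function names are assumptions.

```python
# Illustrative sketch: the same facts under the three organization schemes.

# Hierarchical: metadata lives in the tree structure (the nesting and the keys).
hierarchical = {
    "actors": {
        "Meryl Streep": {"movies": ["The Hours"]},
        "Johnny Depp": {"movies": ["Dead Man"]},
    }
}

# Network: each object carries its own links to selected other objects.
network = {
    "Meryl Streep": {"links": ["The Hours"]},
    "The Hours": {"links": ["Meryl Streep"]},
    "Johnny Depp": {"links": ["Dead Man"]},
    "Dead Man": {"links": ["Johnny Depp"]},
}

# Relational: metadata is the structure of the relation (its column names).
columns = ("actor", "movie")
rows = [("Meryl Streep", "The Hours"), ("Johnny Depp", "Dead Man")]

def movies_hierarchical(actor):
    """Walk down the tree from the root to the actor's movies."""
    return hierarchical["actors"][actor]["movies"]

def movies_network(actor):
    """Follow the links stored on the actor object."""
    return network[actor]["links"]

def movies_relational(actor):
    """Select rows whose actor column matches, then project the movie column."""
    a, m = columns.index("actor"), columns.index("movie")
    return [row[m] for row in rows if row[a] == actor]

for lookup in (movies_hierarchical, movies_network, movies_relational):
    print(lookup.__name__, lookup("Johnny Depp"))
```

All three lookups give the same answer; what differs is where the metadata needed to find it is kept.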

Page 18:

Relations

Metadata:   Actor          Movie               Date

Data:       Meryl Streep   The Hours           Summer 2003
            Johnny Depp    Dead Man            Summer 1994
            Meg Ryan       Against the Ropes   Winter 2004
            ...            ...                 ...
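The relation on this slide can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, assuming an in-memory database and a hypothetical table name `appearances`; the rows are the ones shown on the slide, paired in the order they are listed.

```python
# Illustrative sketch: the slide's relation as a table in an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")

# The column names are the metadata; the table name "appearances" is hypothetical.
conn.execute("CREATE TABLE appearances (actor TEXT, movie TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO appearances VALUES (?, ?, ?)",
    [
        ("Meryl Streep", "The Hours", "Summer 2003"),
        ("Johnny Depp", "Dead Man", "Summer 1994"),
        ("Meg Ryan", "Against the Ropes", "Winter 2004"),
    ],
)

# Metadata: ask the database for the column names of the relation.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(appearances)")]

# Data: use a column name to locate a row of interest.
movie = conn.execute(
    "SELECT movie FROM appearances WHERE actor = ?", ("Meg Ryan",)
).fetchone()[0]

print(column_names)
print(movie)
conn.close()
```

The query never mentions where the row is stored; the structure of the relation is enough to find it.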

Page 19: