Post on 04-Jan-2016

Introduction to Scientific Data

Lecture 5

The Informatics Effect

Computers have transformed how we collect, store, analyze, and visualize data. Notably,

• scientific data vary more than ever before;

• we can store more data than ever before;

• we can analyze more data more quickly than ever before;

• we can visualize data in revealing new ways.

However, scientific data are not the same as computer data.

So, What Are Scientific Data?

We can ask several questions about scientific data, including:

• Do data need to be numerical?

• Do data need to be measured for a purpose?

• Is free text data?

– What about clinical reports from physicians?

– What about web sites?

• Can processing steps turn computer data into scientific data?

• Is the distinction a matter of quality?

Why is the distinction important?

Neuroscience Data

Blood flow in the brain

Genetic Data

Gene expression levels across conditions

Oceanographic Data

Sea surface temperature

What Are Scientific Data?

What do these examples have in common?

They aren’t data.

Then What Is This?

1. Satellites record infrared radiance as a collection of numbers.

2. Computers convert these numbers to temperature data using algorithms based on the theory of blackbody radiation.

3. One can plot these temperatures as a heat map to get a global view of sea surface temperature.

This is a visualization of data produced in multiple stages.
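
The conversion in step 2 can be illustrated with a minimal sketch. It assumes the simplest possible case: a single infrared wavelength and an ideal blackbody, so that temperature follows from inverting Planck's law. The 11 micrometer channel and the function names are assumptions made for illustration. Operational sea surface temperature retrievals also correct for the atmosphere and for sensor calibration, which this sketch ignores.

```python
import math

# Physical constants (SI units).
H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K


def planck_radiance(wavelength_m, temperature_k):
    """Spectral radiance of an ideal blackbody, in W / (m^2 * sr * m)."""
    a = 2.0 * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * K * temperature_k)
    return a / math.expm1(b)


def brightness_temperature(wavelength_m, radiance):
    """Invert Planck's law: the temperature at which a blackbody emits this radiance."""
    a = 2.0 * H * C**2 / wavelength_m**5
    return H * C / (wavelength_m * K * math.log1p(a / radiance))


# Hypothetical channel wavelength, chosen only for illustration.
WAVELENGTH = 11e-6

# Round trip: simulate the radiance of a 290 K surface, then recover the temperature.
radiance = planck_radiance(WAVELENGTH, 290.0)
recovered = brightness_temperature(WAVELENGTH, radiance)
print(f"recovered temperature: {recovered:.3f} K")
```

The round trip makes the point of the slide concrete: the satellite's numbers become temperature data only through a theory-based algorithm.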

Then What About DNA Microarrays?

Raw microarray images

geneID    15’    60’    360’
ssr3571   0.97   1.05   0.96
ssr3570   0.99   1.11   0.91
ssr3532   1.46   1.15   1.21
ssr3467   1.08   1.51   0.98
ssr3465   0.51   0.76   1.16
ssr3451   0.80   1.01   1.12

Expression levels

Heat map visualization

Raw expression levels undergo statistical normalization before serving as empirical evidence.
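
The slide does not say which normalization is applied, so the following is only a sketch of one common choice: a log2 transform followed by median centering of each condition. It reuses the six rows of the expression table purely as sample input, without assuming whether those values are raw or already normalized. The function name is hypothetical.

```python
import math
from statistics import median

# Sample input: gene -> expression levels at 15', 60', and 360'.
levels = {
    "ssr3571": [0.97, 1.05, 0.96],
    "ssr3570": [0.99, 1.11, 0.91],
    "ssr3532": [1.46, 1.15, 1.21],
    "ssr3467": [1.08, 1.51, 0.98],
    "ssr3465": [0.51, 0.76, 1.16],
    "ssr3451": [0.80, 1.01, 1.12],
}


def median_center(table):
    """Log2-transform every value, then subtract each condition's median."""
    logged = {gene: [math.log2(v) for v in vals] for gene, vals in table.items()}
    n_conditions = len(next(iter(logged.values())))
    medians = [
        median(vals[i] for vals in logged.values()) for i in range(n_conditions)
    ]
    return {
        gene: [v - m for v, m in zip(vals, medians)]
        for gene, vals in logged.items()
    }


normalized = median_center(levels)
for gene, vals in normalized.items():
    print(gene, " ".join(f"{v:+.2f}" for v in vals))
```

After centering, each condition has a median of zero, so values can be compared across conditions as relative increases or decreases.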

What About the fMRI Image?

1. An MRI scanner records the blood oxygen level at each voxel in a 3D grid over an extended period of time.

2. Computers convert the raw time series to data using algorithms that remove the effects of noise, motion, etc.

3. These data undergo further processing to visualize neuronal activity with respect to an anatomical image.

This is a visualization of the data after several stages of processing.
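
Step 2 covers many kinds of correction. As a minimal sketch of just one, the code below removes a slow linear drift from a single voxel's time series with a least-squares line fit. The drift model, the synthetic signal, and the function name are assumptions for illustration; real fMRI pipelines combine several such steps, including motion correction.

```python
def detrend(series):
    """Remove the least-squares straight line from an evenly sampled time series."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]


# Hypothetical voxel signal: a baseline, a slow drift, and an alternating response.
signal = [100.0 + 0.5 * t + (1.0 if t % 2 else -1.0) for t in range(20)]
cleaned = detrend(signal)
print(" ".join(f"{v:+.2f}" for v in cleaned))
```

What remains after detrending is the fluctuation around the fitted line, which is closer to the quantity a researcher wants to interpret than the raw time series.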

So, What Are Scientific Data?

The general characteristics seem tied to intent or purpose. In particular, scientists

• collect data for a purpose;

• process raw measurements to produce data that answer scientific questions;

• structure and interpret data in light of scientific theories.

What other distinctions come to mind?

Scientific Data Through History

• Early data consisted of drawings and tables of numbers and other properties.

• As science progressed, x-rays, clinical reports, and other modalities served as data.

• Currently, informatics solutions are changing our definitions of scientific data.

The nature of data changes with scientific progress, technological advancements, and problem-specific needs.

Should we be cautious about the term ‘scientific data’?

Tycho Brahe’s Data

Early data were often recorded as tables of numbers. Tycho Brahe’s observations, later published by Kepler as the Rudolphine Tables,

• provided precise astronomical records,

• recorded the positions of stars and planets, and

• enabled Kepler’s discovery of his laws of planetary motion.

Galileo’s Moon

Data were also recorded as drawings, particularly in astronomy and anatomy. Through his use of the newly invented telescope, Galileo reported data that

• presented a visual record of the moon’s surface,

• revealed evidence of craters and mountains for the first time, and

• challenged the prevailing view that celestial objects are perfect spheres.

From Sidereus Nuncius

Wilhelm Roentgen’s X-rays

Technology both refined the senses and expanded them. Roentgen discovered x-rays, which let scientists

• view internal anatomy noninvasively,

• determine the structure of crystals,

• analyze the elemental composition of solid materials, and

• detect interesting astronomical phenomena.

Rosalind Franklin’s DNA X-rays

Photo 51

A theory of wave interactions with atoms supports x-ray diffraction.

Franklin used this method to image the geometry of DNA molecules.

These images led Watson and Crick to the double-helix model.

Mass Spectrometry

Mass spectrometers produce data from chemical compounds.

The mass-to-charge ratios of the detected ions indicate the compound’s fragmentation pattern.

The graph suggests composition, structure, and other properties.

Isotope distributions of a peptide

The Informatics Effect: Data and the Human Genome Project

A large-scale endeavor, the Human Genome Project was greatly aided by informatics technology, such as

• databases for storing DNA sequences;

• sequence alignment tools for genome reconstruction;

• sequence annotation tools for labeling and relating important regions of the genome; and

• visualization tools that provide overviews of large areas of the genome.

The Informatics Effect: Data and the Large Synoptic Survey Telescope

Planned for 2015, the LSST will scan the sky semiweekly and yield 60 PB of raw images that are analyzed for

• events that would benefit from collaborative monitoring,

• variable objects (e.g., gamma-ray bursts), and

• moving objects (e.g., asteroids).

Further processing tools will

• measure properties of faint objects,

• map the x,y coordinates of the images to celestial coordinates, and

• classify objects based on static and dynamic behavior.

Scientific Data

Scientific activity depends upon the ability to collect, store, retrieve, analyze, and visualize data.

These data often result from the processing of raw measurements by informatics tools.

Researchers in all areas are recognizing the vital role of informatics solutions that drive the data lifecycle.

Advances in these solutions can ultimately lead to advances in scientific knowledge.
