Bioinformatics Lectures at Rice Lecture 2: High throughput technologies in genomics By Li Zhang

Upload: darlene-riley

Post on 17-Dec-2015

Bioinformatics Lectures at Rice

Lecture 2: High throughput technologies in genomics

By Li Zhang

Microarrays

• Biology: the biological problems

• Technology: microarray mechanism; experimental procedures

• Statistical methods: data analysis, quality checking, exploration, discovery

Microarray technology

• Microarray technology measures the copy number of molecules in a mixture on a small slide.

• Thousands or millions of different kinds of molecules can be measured simultaneously, thus creating large volumes of data per biological sample.

• The molecules can be DNA, RNA or protein.

Major types of microarrays

• Two-color short oligo arrays: http://www.youtube.com/watch?v=VNsThMNjKhM&feature=related

• Single-color short oligo arrays, synthesized by photolithography: http://www.youtube.com/watch?v=ui4BOtwJEXs&feature=related (Eric Lander)

• Bead arrays

The experimental procedure to produce microarray data

Affymetrix gene expression analysis, sample preparation protocol:

• RNA isolation

• cDNA synthesis

• cRNA synthesis

• Hybridization

• Amplification

• Scan

http://www.digizyme.com/competition/examples/genechip.html

Targets of Microarray measurements

• mRNA gene expression

• SNP genotyping

• DNA copy number (aneuploidy, chromosomal aberration, LOH)

• DNA methylation

• ChIP-chip: protein-DNA binding sites

• Nucleosome binding sites

Some key aspects of microarray technology

• Parallel. The technology is designed to measure a large number of different molecules at once.

• Almost comprehensive. It works for some or most of the molecules, but not for all, which results in some missing data.

• Noise and bias. The signals can be affected by unwanted sources, e.g., cross-hybridization, which creates biases. Contamination may also be asymmetrically distributed.

• Nonlinear response. Saturation causes nonlinear behavior.

• Evolving annotation. The identity assigned to a molecule may change over time, reflecting new knowledge.

• No units. The numbers are often on a relative scale, which means the data have not been calibrated.
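To make the saturation and background points concrete, here is a minimal sketch in Python. It assumes a Langmuir-style response curve plus a constant background; the function name and the parameter values (max_signal, half_saturation, background) are hypothetical and chosen only for illustration, not taken from any particular platform.

```python
def observed_signal(concentration, max_signal=10000.0,
                    half_saturation=500.0, background=50.0):
    """Hypothetical probe response (an assumption, not a platform model).

    Roughly linear at low concentration, flattening toward max_signal
    as the probe saturates, with a constant background added on top.
    """
    return background + max_signal * concentration / (concentration + half_saturation)


def measured_fold_change(low, high):
    """Fold change as it would appear in the measured signals."""
    return observed_signal(high) / observed_signal(low)


# The true fold change is 2x in both comparisons.
fold_low_range = measured_fold_change(10.0, 20.0)        # weak signals
fold_high_range = measured_fold_change(5000.0, 10000.0)  # near saturation

print(f"measured fold change, low range:  {fold_low_range:.2f}")
print(f"measured fold change, high range: {fold_high_range:.2f}")
```

Under these assumed parameters both measured fold changes fall below the true value of 2: background compresses the ratio for weak signals, and saturation compresses it far more strongly for abundant molecules. This is one reason uncalibrated, relative-scale intensities should not be read as direct copy numbers.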

Next generation sequencing techniques

Sequence by synthesis on an array

• Illumina / SOLiD / 454 Life Sciences: http://www.youtube.com/watch?v=g0vGrNjpyA8 (1.5 hr video, from a meeting in 2010)

• Illumina's animation: http://www.youtube.com/watch?v=l99aKKHcxC4&feature=related (3 min)

• SOLiD's animation: http://www.youtube.com/watch?v=nlvyF8bFDwM

• Complete Genomics (nanoball sequencing)

Nano-ball of Complete Genomics

Some key aspects of next generation sequencing technology

• Compared with microarrays, NGS has less noise, no cross-hybridization, and no saturation.

• Bias remains a problem. Some sequences, such as high-GC sequences and repeats, cannot be handled properly.

• Mapping reads to the genome can be challenging, but paired-end reads help a lot.

• Biases partly come from PCR amplification, whose efficiency differs depending on the sequence.
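A short worked example shows why small differences in PCR efficiency matter. The sketch below assumes the standard idealized model in which each cycle multiplies the copy number by (1 + efficiency), with efficiency between 0 and 1; the function names and the specific efficiencies and cycle count are hypothetical values picked for illustration.

```python
def copies_after_pcr(initial_copies, efficiency, cycles):
    """Expected copies after PCR under an idealized model (an assumption).

    Each cycle multiplies the copy number by (1 + efficiency),
    where efficiency = 1.0 means perfect doubling.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


def relative_representation(efficiency_a, efficiency_b, cycles):
    """Ratio of fragment A to fragment B after PCR, starting from equal amounts."""
    return (copies_after_pcr(1.0, efficiency_a, cycles)
            / copies_after_pcr(1.0, efficiency_b, cycles))


# Two fragments present in equal amounts before amplification.
# Hypothetical efficiencies: 0.95 for A, 0.85 for B (e.g. a GC-rich fragment).
ratio = relative_representation(0.95, 0.85, cycles=15)
print(f"A is over-represented {ratio:.2f}-fold after 15 cycles")
```

With these assumed numbers, a 10-point gap in per-cycle efficiency compounds to a bit more than a twofold difference in representation after 15 cycles, even though the two fragments started out equally abundant.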

3rd Generation sequencing

• Single molecule, with no PCR amplification.

• No fluorescent dyes, hence lower reagent cost.

• Longer sequences.

• Remaining problem: error-prone base calling.

Ion Torrent (http://www.youtube.com/watch?v=yVf2295JqUg)

Pacific Biosciences (http://www.youtube.com/watch?v=v8p4ph2MAvI)

Nanopores (http://www.youtube.com/watch?v=8kPfQNzR4FI&feature=results_main&playnext=1&list=PL0AC36A831CCB8690)

Challenges ahead

• Complexity of human diseases

• Heterogeneity

• Biological samples are fragile, subject to degradation and contamination.

• Biases, batch effects, standards.