![Page 1: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/1.jpg)
![Page 2: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/2.jpg)
Study of Gene Expression: Statistics, Biology, and Microarrays
Ker-Chau Li
Statistics Department
UCLA
![Page 3: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/3.jpg)
PART I. Cellular Biology
Macromolecules: DNA, mRNA, protein
![Page 4: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/4.jpg)
Why Biology?
![Page 5: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/5.jpg)
Human Genome Project
Begun in 1990, the U.S. Human Genome Project is a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project originally was planned to last 15 years, but effective resource and technological advances have accelerated the expected completion date to 2003.
Project goals are to
■ identify all of the approximately 30,000 genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
Recent Milestones:
■ June 2000: completion of a working draft of the entire human genome
■ February 2001: analyses of the working draft are published
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
![Page 6: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/6.jpg)
Future Challenges: What We Still Don’t Know
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental restoration
• Developmental genetics, genomics
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
![Page 7: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/7.jpg)
Medicine and the New Genomics
• Gene Testing
• Gene Therapy
• Pharmacogenomics
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
Anticipated Benefits
• improved diagnosis of disease
• earlier detection of genetic predispositions to disease
• rational drug design
• gene therapy and control systems for drugs
• personalized, custom drugs
![Page 8: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/8.jpg)
Anticipated Benefits
Molecular Medicine
• improved diagnosis of disease
• earlier detection of genetic predispositions to disease
• rational drug design
• gene therapy and control systems for drugs
• pharmacogenomics "custom drugs"
Microbial Genomics
• rapid detection and treatment of pathogens (disease-causing microbes) in medicine
• new energy sources (biofuels)
• environmental monitoring to detect pollutants
• protection from biological and chemical warfare
• safe, efficient toxic waste cleanup
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
![Page 9: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/9.jpg)
Anticipated Benefits
Agriculture, Livestock Breeding, and Bioprocessing
• disease-, insect-, and drought-resistant crops
• healthier, more productive, disease-resistant farm animals
• more nutritious produce
• biopesticides
• edible vaccines incorporated into food products
• new environmental cleanup uses for plants like tobacco
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
![Page 10: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/10.jpg)
![Page 11: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/11.jpg)
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
![Page 12: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/12.jpg)
What is a gene?
![Page 13: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/13.jpg)
![Page 14: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/14.jpg)
SNP and Genetic Disease
![Page 15: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/15.jpg)
![Page 16: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/16.jpg)
![Page 17: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/17.jpg)
![Page 18: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/18.jpg)
Mitochondrial ATP Synthase E. coli ATP Synthase
These images depicting models of ATP Synthase subunit structure were provided by John Walker. Some equivalent subunits from different organisms have different names.
![Page 19: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/19.jpg)
![Page 20: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/20.jpg)
PART II. Microarray
Genome-wide expression profiling
![Page 21: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/21.jpg)
Differential gene expression: tissues, organs
![Page 22: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/22.jpg)
![Page 23: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/23.jpg)
Next Step in Genomics
• Transcriptomics involves large-scale analysis of messenger RNAs (molecules that are transcribed from active genes) to follow when, where, and under what conditions genes are expressed.
• Proteomics (the study of protein expression and function) can bring researchers closer than gene expression studies to what’s actually happening in the cell.
• Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.
• Knockout studies are one experimental method for understanding the function of DNA sequences and the proteins they encode. Researchers inactivate genes in living organisms and monitor any changes that could reveal the function of specific genes.
• Comparative genomics (analyzing DNA sequence patterns of humans and well-studied model organisms side by side) has become one of the most powerful strategies for identifying human genes and interpreting their function.
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
![Page 24: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/24.jpg)
Microarray
![Page 25: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/25.jpg)
MicroArray
• Allows measuring the mRNA level of thousands of genes in one experiment -- system level response
• The data generation can be fully automated by robots
• Common experimental themes:
– Time Course
– Mutation/Knockout Response
![Page 26: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/26.jpg)
MicroArray Technique:
Synthesize gene-specific DNA oligos
Attach oligos to solid support
Tissue or cell
Extract mRNA
Amplification and labeling (reverse transcription; dyes: Cy3 = green, Cy5 = red)
Hybridize
Scan and quantitate
![Page 27: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/27.jpg)
Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale
Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*
![Page 28: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/28.jpg)
Time Course: [plot: expression level vs. time]
Or: Change of Condition
    A     B     C     D     E     …
A   --    2.1   0.8   1.3   0.5
B   0.2   --    -0.5  2.3   0.22
…   -1.2  --    0.3   -1.1
![Page 29: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/29.jpg)
PART III. Statistics
Low-level analysis
Comparative expression
Feature extraction
Classification, clustering
Pearson correlation
Liquid association
![Page 30: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/30.jpg)
Image analysis
• Convert an image into a number representing the ratio of the levels of expression between red and green channels
• Color bias
• Spatial, tip, spot effects
• Background noise
• cDNA, oligonucleotide arrays
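As a rough illustration of the conversion from two-channel intensities to one number per spot, the sketch below subtracts a local background from each channel, takes the log2 ratio of red to green, and removes a global median shift as one simple stand-in for a color-bias correction. The function name `log_ratios`, the `floor` parameter, and the choice of median centering are assumptions made for this example, not the procedure used in the talk; spatial, tip, and spot effects are not handled here.

```python
import numpy as np

def log_ratios(red, green, red_bg, green_bg, floor=1.0):
    """Background-corrected, median-centred log2(red/green) for each spot.

    red, green       : foreground intensities per spot (Cy5 and Cy3 channels)
    red_bg, green_bg : background intensities per spot
    floor            : hypothetical lower bound that keeps corrected
                       intensities positive before taking logs
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    # Subtract background, clipping so the ratio stays defined.
    r = np.maximum(red - np.asarray(red_bg, dtype=float), floor)
    g = np.maximum(green - np.asarray(green_bg, dtype=float), floor)
    m = np.log2(r / g)
    # Assumed colour-bias correction: most genes are taken to be unchanged,
    # so the median log ratio is shifted to zero.
    return m - np.median(m)

# Three made-up spots with a constant background of 100 in both channels.
example = log_ratios([1100, 500, 2100], [600, 500, 600],
                     [100, 100, 100], [100, 100, 100])
print(example)
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric around zero, which is why ratios are usually logged before further analysis.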
![Page 31: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/31.jpg)
Genome-wide expression profile: a basic structure
          cond1   cond2   …   condp
Gene 1    x11     x12     …   x1p
Gene 2    x21     x22     …   x2p
…         …       …           …
Gene n    xn1     xn2     …   xnp
![Page 32: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/32.jpg)
Cond1, cond2, …, condp denote various environmental conditions, time points, cell types, etc. under which mRNA samples are taken
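One way to hold such a profile in code is a two-dimensional array with genes as rows and conditions as columns. The sketch below is only an illustration: the gene and condition labels, the simulated values, and the helper names `profile` and `sample` are all invented for the example.

```python
import numpy as np

# Hypothetical labels; a real data set would have thousands of genes.
genes = ["Gene1", "Gene2", "Gene3"]
conditions = ["cond1", "cond2", "cond3", "cond4"]

# Simulated expression values: X[i, j] is gene i under condition j.
rng = np.random.default_rng(0)
X = rng.normal(size=(len(genes), len(conditions)))

def profile(gene):
    """Expression of one gene across all conditions (a row of X)."""
    return X[genes.index(gene)]

def sample(cond):
    """Expression of all genes under one condition (a column of X)."""
    return X[:, conditions.index(cond)]

print(X.shape)            # n genes by p conditions
print(profile("Gene2"))   # length-p vector
print(sample("cond1"))    # length-n vector
```

Rows are the unit of analysis for gene selection and clustering of genes, while columns are the unit for classifying samples.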
Note: numerous cells are involved
Data quality issues:
1. chip (manufacturer)
2. mRNA sample (user)
It is important to have a homogeneous sample so that cellular signals can be amplified
- Yeast Cell Cycle data: ideally all cells are engaged in the same activities
- synchronization
![Page 33: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/33.jpg)
Example 1
Comparative expression
Normal versus cancer cells
ALL versus AML
![Page 34: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/34.jpg)
E. Lander’s group at MIT
• Cancer classification (leukemia)
• ALL and AML (arising from lymphoid and myeloid precursors, respectively)
• Require different treatments
• Traditional methods: nuclear morphology
• Enzyme-based histochemical analysis (1960s)
• Antibodies (1970s)
• Genome-wide expression comparison
![Page 35: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/35.jpg)
ALL (acute lymphoblastic leukemia)
AML (acute myeloid leukemia)
![Page 36: Study of Gene Expression: Statistics, Biology, and Microarrays Ker-Chau Li Statistics Department UCLA kcli@stat.ucla.edu](https://reader035.vdocuments.us/reader035/viewer/2022062423/56649d085503460f949da4d9/html5/thumbnails/36.jpg)
Gene selection
• For each gene (row) compute a score defined by
score = (sample mean of X - sample mean of Y) / (standard deviation of X + standard deviation of Y)
• X=ALL, Y=AML
• Genes (rows) with highest scores are selected.
• Does it work?
• 34 new leukemia samples
• 29 are predicted with 100% accuracy; 5 are weak prediction cases
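The gene score defined above can be computed for every row at once. The following is a minimal sketch on made-up numbers, not the original analysis: the function names `gene_scores` and `top_genes` are hypothetical, and using the sample standard deviation (`ddof=1`) is an assumption, since the slide only says "standard deviation".

```python
import numpy as np

def gene_scores(X, Y):
    """Per-gene score (mean(X) - mean(Y)) / (sd(X) + sd(Y)).

    X, Y : arrays of shape (genes, samples) for the two classes,
           e.g. X = ALL samples and Y = AML samples.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X.mean(axis=1) - Y.mean(axis=1)
    # ddof=1 gives the sample standard deviation (an assumption here).
    spread = X.std(axis=1, ddof=1) + Y.std(axis=1, ddof=1)
    return diff / spread

def top_genes(scores, k):
    """Row indices of the k genes with the highest scores."""
    return np.argsort(scores)[::-1][:k]

# Toy data: 3 genes, 3 samples per class (values invented for illustration).
X = [[5, 6, 7], [1, 2, 3], [1, 2, 3]]
Y = [[1, 2, 3], [1, 2, 3], [5, 6, 7]]
scores = gene_scores(X, Y)
print(scores)
print(top_genes(scores, 1))
```

A gene that is constant within both classes has zero spread, so the score is undefined for it; real data would need such rows filtered out or a small constant added to the denominator. Genes with large negative scores are higher in Y than in X and can be informative as well.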