![Page 1: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/1.jpg)
From motif search to gene expression analysis
![Page 2: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/2.jpg)
Finding TF targets using a bioinformatics approach
Scenario 1 : Binding motif is known (easier case)
Scenario 2 : Binding motif is unknown (hard case)
![Page 3: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/3.jpg)
Are common motifs the right thing to search for?
![Page 4: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/4.jpg)
Solutions:
- Search for motifs that are enriched in one set but not in a random set
- Use experimental information to rank the sequences according to their binding affinity, and search for motifs enriched at the top of the list
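A minimal sketch of the first solution: count how many sequences contain a motif in a target set and in a random set, then ask how surprising the split is. The function names, the example sequences, and the choice of a hypergeometric tail as the enrichment test are assumptions for illustration and are not taken from the slides.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n sequences are drawn from N, of which B contain the motif."""
    upper = min(n, B)
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / comb(N, n)


def motif_enrichment(motif, target_seqs, random_seqs):
    """Return (hits in target set, hits in random set, hypergeometric tail probability)."""
    hits_target = sum(motif in seq for seq in target_seqs)
    hits_random = sum(motif in seq for seq in random_seqs)
    total = len(target_seqs) + len(random_seqs)
    total_hits = hits_target + hits_random
    p_value = hypergeom_tail(total, total_hits, len(target_seqs), hits_target)
    return hits_target, hits_random, p_value


# Hypothetical toy sequences, for illustration only.
target = ["AACTGTGATT", "CTGTGACC", "GGCTGTGA", "ACGTACGT"]
random_set = ["ACGTACGT", "TTTTGGGG", "CCCCAAAA", "CTGTGAAA"]
hits_target, hits_random, p_value = motif_enrichment("CTGTGA", target, random_set)
print(hits_target, hits_random, round(p_value, 4))
```

With sets this small the probability is far from significant; real analyses use hundreds or thousands of sequences.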
![Page 5: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/5.jpg)
ChIP-Seq
Sequencing the regions in the genome to which a protein (e.g. a transcription factor) binds.
![Page 6: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/6.jpg)
ChIP-Seq
Finding the p53 binding motif in a set of p53 target sequences which are ranked according to binding affinity
[Figure: ChIP-Seq target sequences ordered from best binders to weak binders.]
![Page 7: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/7.jpg)
A word search approach to search for enriched motifs in a ranked list
Candidate k-mers: CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC
[Figure: ranked sequences list in which occurrences of the k-mer CTGTGA are marked.]
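The word search idea can be sketched as two small steps: enumerate the candidate k-mers, then record for each k-mer which sequences of the ranked list contain it. The function names and the toy ranked list are hypothetical; only the k-mer CTGTGA comes from the slide.

```python
def candidate_kmers(sequences, k):
    """Collect every distinct k-mer that occurs in at least one sequence."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def occurrence_vector(kmer, ranked_sequences):
    """1 if the k-mer occurs in the sequence, else 0, keeping the ranked order."""
    return [1 if kmer in seq else 0 for seq in ranked_sequences]


# Hypothetical sequences, ordered from strongest to weakest binder.
ranked = ["TTCTGTGAAC", "GCTGTGATTT", "ACTGTGACGT", "ACGTACGTAC", "TTTTCTGTGA", "GGGGCCCCAA"]
kmers = candidate_kmers(ranked, 6)
vector = occurrence_vector("CTGTGA", ranked)
print(vector)
```

A k-mer whose occurrence vector has most of its 1s near the start of the list is a candidate for an enriched motif.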
![Page 8: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/8.jpg)
Uses the minimum hypergeometric (mHG) statistic to find enriched motifs. The statistic is computed from:
- The total number of input sequences
- The number of sequences containing the motif
- The number of sequences at the top of the list
- The number of sequences containing the motif among the top sequences
[Figure: ranked sequences list with occurrences of CTGTGA marked.]
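A rough sketch of the mHG computation, using assumed notation: N is the total number of input sequences, B the number containing the motif, n the number of sequences at the top of the list, and b the number containing the motif among those top n. The score is the smallest hypergeometric tail over all cutoffs n. The function names and the toy occurrence vector are hypothetical, and the score is not itself a p-value, because taking the minimum over many cutoffs needs a separate correction that is not shown here.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for a hypergeometric draw of n items from N with B successes."""
    upper = min(n, B)
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / comb(N, n)


def mhg_score(occurrences):
    """Minimum hypergeometric tail over all cutoffs n = 1..N of a ranked 0/1 vector.

    Returns (score, cutoff at which the minimum is reached).
    """
    N = len(occurrences)
    B = sum(occurrences)
    best_score, best_cutoff, b = 1.0, 0, 0
    for n in range(1, N + 1):
        b += occurrences[n - 1]  # motif hits among the top n sequences
        score = hypergeom_tail(N, B, n, b)
        if score < best_score:
            best_score, best_cutoff = score, n
    return best_score, best_cutoff


# Hypothetical ranked occurrence vector: 1 = sequence contains the motif.
occurrences = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
score, cutoff = mhg_score(occurrences)
print(round(score, 4), cutoff)
```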
![Page 9: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/9.jpg)
The enriched motifs are combined to get a PSSM which represents the binding motif
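One way to picture the combination step is a log-odds PSSM built from k-mers that are already aligned and of equal length. This is a sketch under those assumptions: the selection of k-mers, the pseudocount, and the uniform background of 0.25 per base are illustrative choices, not values from the slides.

```python
from math import log2


def build_pssm(kmers, pseudocount=1.0, background=0.25):
    """Log-odds PSSM: one dict {base: score} per motif position."""
    length = len(kmers[0])
    pssm = []
    for pos in range(length):
        column = [kmer[pos] for kmer in kmers]
        total = len(column) + 4 * pseudocount
        pssm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm


def score_site(pssm, site):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(pssm, site))


# Hypothetical set of enriched k-mers (assumed to be pre-aligned).
enriched = ["CTGTGA", "CTGTGC", "ATGTGA", "CTGTGA"]
pssm = build_pssm(enriched)
consensus = "".join(max(column, key=column.get) for column in pssm)
print(consensus, round(score_site(pssm, "CTGTGA"), 2))
```

Scanning a sequence with `score_site` over every window then gives a graded match score rather than a yes/no word match.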
![Page 10: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/10.jpg)
![Page 11: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/11.jpg)
Protein Motifs
Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile:
P[ED]XK[RW][RK]X[ED]
or as a PWM
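A consensus written in this style maps almost directly onto a regular expression. The sketch below assumes that X stands for any amino acid and that X never appears inside square brackets; the protein sequences are made up for illustration.

```python
import re


def consensus_to_regex(consensus):
    """Translate a bracket-style consensus into a compiled regex (X = any residue)."""
    return re.compile(consensus.replace("X", "."))


pattern = consensus_to_regex("P[ED]XK[RW][RK]X[ED]")

# Hypothetical protein sequences.
hit = pattern.search("MAPEAKRRLDGG")
miss = pattern.search("MAPEAKARLDGG")
print(hit.group(), hit.start(), miss)
```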
![Page 12: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/12.jpg)
Gene Expression Analysis
![Page 13: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/13.jpg)
Gene Expression
DNA → RNA → protein
![Page 14: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/14.jpg)
Gene Expression
[Figure: poly(A)-tailed mRNA molecules transcribed from gene1, gene2 and gene3.]
![Page 15: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/15.jpg)
Studying Gene Expression 1987-2013
Microarray (first high throughput gene expression experiments)
DNA chips
RNA-seq (Next Generation Sequencing)
![Page 16: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/16.jpg)
Classical versus modern technologies to study gene expression
Classical methods (spotted microarrays, DNA chips)
- Require prior knowledge of the RNA transcript
- Good for studying the expression of known genes
New generation RNA sequencing
- Does not require prior knowledge
- Good for discovering new transcripts
![Page 17: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/17.jpg)
Experimental protocol: two-channel cDNA arrays
http://www.bio.davidson.edu/courses/genomics/chip/chip.html
![Page 18: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/18.jpg)
One-channel DNA chips
• Each sequence is represented by a probe set colored with one fluorescent dye
• Target hybridizes to complementary probes only
• The fluorescence intensity is indicative of the expression of the target sequence
![Page 19: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/19.jpg)
Affymetrix Chip
![Page 20: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/20.jpg)
RNA-seq
![Page 21: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/21.jpg)
NEXT…
Clustering the data according to expression profiles.
[Figure: expression matrix; rows are genes, columns are expression in different conditions.]
![Page 22: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/22.jpg)
WHY? What can we learn from the clusters?
• Identify gene function
– Similar expression can imply similar function
• Diagnostics and therapy
– Differences in gene expression can indicate a disease state
– Genes which change expression in a disease can be good candidates for drug targets
![Page 23: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/23.jpg)
A molecular signature of metastasis in primary solid tumors
Ramaswamy et al., 2003, Nat Genet 33:49-54
Samples were taken from patients with adenocarcinoma. Hundreds of genes that differentiate between cancer tissues in different stages of the tumor were found. The arrow shows an example of tumor cells which were not detected correctly by histological or other clinical parameters.
![Page 24: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/24.jpg)
HOW? Different clustering approaches
• Unsupervised
– Hierarchical Clustering
– K-means
• Supervised Methods
– Support Vector Machine (SVM)
![Page 25: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/25.jpg)
Clustering
Clustering organizes things that are close into groups.
- What does it mean for two genes to be close?
- Once we know this, how do we define groups?
![Page 26: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/26.jpg)
What does it mean for two genes to be close?
We need a mathematical definition of the distance between the expression patterns of two genes
Gene 1 = $(E_{11}, E_{12}, \ldots, E_{1N})'$
Gene 2 = $(E_{21}, E_{22}, \ldots, E_{2N})'$
Euclidean distance: $d(\text{Gene 1}, \text{Gene 2}) = \sqrt{\sum_{i=1}^{N} (E_{1i} - E_{2i})^2}$
For example, the distance between gene 1 and gene 2
[Figure: expression profiles of Gene 1 and Gene 2 across conditions 01-22.]
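The Euclidean distance between two expression profiles can be written in a few lines. The function name and the two example profiles are hypothetical; each list holds one gene's expression values across the same N conditions.

```python
from math import sqrt


def euclidean_distance(gene1, gene2):
    """Square root of the summed squared differences across matching conditions."""
    if len(gene1) != len(gene2):
        raise ValueError("both genes must be measured in the same conditions")
    return sqrt(sum((e1 - e2) ** 2 for e1, e2 in zip(gene1, gene2)))


# Hypothetical expression values in four conditions.
gene1 = [2.0, 4.0, 1.0, 3.0]
gene2 = [1.0, 2.0, 1.0, 5.0]
distance = euclidean_distance(gene1, gene2)
print(distance)
```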
![Page 27: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/27.jpg)
Clustering the genes according to expression
Hierarchical Clustering
Generate a tree based on the distances between genes (similar to a phylogenetic tree).
Each gene is a leaf on the tree.
Distances reflect the similarity of their expression patterns.
[Figure: expression matrix (rows: genes; columns: expression in different conditions) with a gene cluster marked.]
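A small sketch of how such a tree can be grown bottom-up: start with every gene as its own cluster and repeatedly merge the two closest clusters. The slides do not say which linkage rule is used, so average linkage is an assumption here, as are the function names. The four profiles are the example vectors for genes a to d from this lecture's distance-table slide.

```python
from itertools import combinations
from math import sqrt


def euclidean(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def average_linkage(members1, members2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    dists = [euclidean(profiles[g1], profiles[g2]) for g1 in members1 for g2 in members2]
    return sum(dists) / len(dists)


def hierarchical_clustering(profiles):
    """Agglomerative clustering; returns the merges as (cluster1, cluster2, height)."""
    clusters = {name: [name] for name in profiles}  # cluster label -> member genes
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], profiles),
        )
        height = average_linkage(clusters[c1], clusters[c2], profiles)
        merges.append((c1, c2, height))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
    return merges


profiles = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}
merges = hierarchical_clustering(profiles)
for c1, c2, height in merges:
    print(c1, c2, round(height, 2))
```

The merge heights play the role of branch lengths: genes joined low in the tree have the most similar expression patterns.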
![Page 28: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/28.jpg)
Clustering the genes according to gene expression
Genes
GENE a   1, -1,  1,  1,  1, -1, -1, -1
GENE b   1,  1, -1,  1,  1,  1, -1,  1
GENE c   1, -1,  1, -1,  1, -1, -1, -1
GENE d  -1,  1, -1,  1,  1,  1, -1, -1
Distance Table
Distances (Euclidean distance)*
     a      b      c      d
a    0      4      2      4
b    4      0      4.47   2.83
c    2      4.47   0      4.47
d    4      2.83   4.47   0
Dab = 4, Dac = 2, Dad = 4, Dbc = 4.47, Dbd = 2.83, Dcd = 4.47
* Can be calculated using different distance metrics
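The distance table can be recomputed from the four gene vectors. This is a minimal sketch; the variable and function names are illustrative.

```python
from itertools import combinations
from math import sqrt

genes = {
    "a": [1, -1, 1, 1, 1, -1, -1, -1],
    "b": [1, 1, -1, 1, 1, 1, -1, 1],
    "c": [1, -1, 1, -1, 1, -1, -1, -1],
    "d": [-1, 1, -1, 1, 1, 1, -1, -1],
}


def euclidean(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# One entry per unordered gene pair, e.g. ("a", "b") -> Dab.
distances = {
    (g1, g2): euclidean(genes[g1], genes[g2])
    for g1, g2 in combinations(genes, 2)
}
for (g1, g2), d in distances.items():
    print(f"D{g1}{g2} = {d:.2f}")
```

Dbd is the square root of 8, approximately 2.828, and Dbc and Dcd are the square root of 20, approximately 4.472.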
![Page 29: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/29.jpg)
Analyzing the clusters of genes
[Figure: clustered expression data with Cluster 2, Cluster 3 and Cluster 4 marked.]
![Page 30: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/30.jpg)
What can we learn from clusters with similar gene expression?
![Page 31: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/31.jpg)
EXAMPLE: hnRNP A1 and SRp40
hnRNP A1 and SRp40 are not clear homologs based on BLAST E-value, but they have a very similar gene expression pattern in different tissues
![Page 32: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/32.jpg)
Are hnRNP A1 and SRp40 functional homologs?
[Figure: SRp40 and hnRNP A1, each shown with SF labels.]
YES!
![Page 33: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/33.jpg)
What else can we learn from clusters with similar gene expression?
• Similar expression between genes
– The genes have similar function
– One gene controls the other
– All genes are controlled by a common regulatory gene
![Page 34: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/34.jpg)
How can gene expression help in diagnostics?
![Page 35: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/35.jpg)
How can gene expression help in diagnostics?
RESEARCH QUESTION
Can we distinguish BRCA1 from BRCA2 cancers based solely on their gene expression profiles?
HERE we want to cluster the patients, not the genes!
[Figure: expression matrix; rows are genes, columns are different patients (BRCA1 or BRCA2).]
![Page 36: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/36.jpg)
How can gene expression be applied for diagnostics?
5 breast cancer patients
       Patient 1   Patient 2   Patient 3   Patient 4   Patient 5
Gen1       +           -           -           +           +
Gen2       +           +           -           +           -
Gen3       -           +           +           +           -
Gen4       +           +           +           -           -
Gen5       -           -           +           -           +
![Page 37: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/37.jpg)
How can gene expression be applied for diagnostics?
Two-way clustering = clustering the patients and the genes
       Patient 1   Patient 2   Patient 4   Patient 3   Patient 5
Gen1       +           -           +           -           +
Gen3       -           +           +           +           -
Gen4       +           +           -           +           -
Gen2       +           +           +           -           -
Gen5       -           -           -           +           +
Labels on the slide: Informative genes; BRCA1, BRCA2
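The effect of the reordering can be sketched as follows. This is not a clustering algorithm: it takes a patient grouping as given and looks for genes whose sign is constant within each group and differs between groups. The split into patients 1, 2, 4 (BRCA1) and 3, 5 (BRCA2) is an assumption read off the reordered column order, and all names in the code are illustrative.

```python
# Expression signs per gene for patients 1..5, in the original patient order.
expression = {
    "Gen1": "+--++",
    "Gen2": "++-+-",
    "Gen3": "-+++-",
    "Gen4": "+++--",
    "Gen5": "--+-+",
}
# Assumed grouping of patients (not stated explicitly in the slide text).
groups = {"BRCA1": [1, 2, 4], "BRCA2": [3, 5]}


def is_informative(row, groups):
    """True if the gene has one sign per group and the groups differ."""
    signs = []
    for members in groups.values():
        values = {row[p - 1] for p in members}
        if len(values) != 1:
            return False
        signs.append(values.pop())
    return len(set(signs)) == len(signs)


informative = [gene for gene, row in expression.items() if is_informative(row, groups)]
patient_order = [p for members in groups.values() for p in members]
gene_order = [g for g in expression if g not in informative] + informative
reordered = {
    gene: "".join(expression[gene][p - 1] for p in patient_order)
    for gene in gene_order
}
print(patient_order)
for gene, row in reordered.items():
    print(gene, row)
```

Under the assumed grouping, the reordered rows and columns come out in the same order as the matrix on the slide, with the informative genes at the bottom.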
![Page 38: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/38.jpg)
Supervised approaches for diagnostics based on expression data
Support Vector Machine (SVM)
![Page 39: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/39.jpg)
• An SVM begins with a set of samples from patients who have been diagnosed as either BRCA1 (red dots) or BRCA2 (blue dots).
• Each dot represents a vector of the expression pattern taken from the microarray experiment of a patient.
![Page 40: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/40.jpg)
How do SVMs work with expression data?
The SVM is trained on data which was classified based on histology.
After training the SVM to separate the BRCA1 from the BRCA2 tumors given the expression data, we can apply it to diagnose an unknown tumor (marked "?") for which we have the equivalent expression data.
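The train-then-diagnose workflow can be illustrated with a very small linear SVM trained by subgradient descent on the hinge loss. This is a sketch, not the method used in any particular study: the two-gene expression vectors, the labels (+1 for BRCA1, -1 for BRCA2), the hyperparameters, and the function names are all hypothetical, and real analyses would use an established SVM library and thousands of genes.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and a bias by subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + bias)
            # The regularization term shrinks the weights at every step.
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # Sample is inside the margin or misclassified: push the boundary away.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                bias += learning_rate * y
    return w, bias


def predict(w, bias, x):
    """Return +1 (BRCA1) or -1 (BRCA2) depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias >= 0 else -1


# Hypothetical training tumors: (expression of gene 1, expression of gene 2).
training = [[2.0, 0.5], [2.5, 1.0], [3.0, 0.2], [0.5, 2.0], [1.0, 2.5], [0.2, 3.0]]
labels = [1, 1, 1, -1, -1, -1]  # +1 = BRCA1, -1 = BRCA2
w, bias = train_linear_svm(training, labels)

# Hypothetical unknown tumors to diagnose.
unknown = [[2.8, 0.6], [0.4, 2.7]]
predictions = [predict(w, bias, x) for x in unknown]
print(predictions)
```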
![Page 41: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/41.jpg)
Projects 2013-14
![Page 42: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/42.jpg)
Instructions for the final project
Introduction to Bioinformatics 2013-14
Key dates
12.12 List of suggested projects published
** You are highly encouraged to choose a project yourself or find a relevant project which can help in your research
9.1 Submission of project overview (one page)
- Title
- Main question
- Major tools you are planning to use to answer the questions
Final week: meetings on projects
12.3 Poster submission
19.3 Poster presentation
![Page 43: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/43.jpg)
2. Planning your research
After you have described the main question or questions of your project, you should carefully plan your next steps.
A. Make sure you understand the problem and read the necessary background to proceed.
B. Formulate your working plan, step by step.
C. After you have a plan, start by extracting the necessary data and decide on the relevant tools to use at the first step. When running a tool, make sure to summarize the results and extract the relevant information you need to answer your question. It is recommended to save the raw data for your records; don't present raw data in your final project. Your initial results should guide you towards your next steps.
D. When you feel you have explored all the tools you can apply to answer your question, you should summarize and draw conclusions. Remember, NO is also an answer as long as you are sure it is NO. Also remember this is a course project, not only a HW exercise.
![Page 44: From motif search to gene expression analysis. Finding TF targets using a bioinformatics approach Scenario 1 : Binding motif is known (easier case) Scenario](https://reader036.vdocuments.us/reader036/viewer/2022081514/56649e565503460f94b4eb2f/html5/thumbnails/44.jpg)
3. Summarizing the final project in a poster (in pairs)
Prepare in PPT, poster size 90-120 cm
Title of the project
Names and affiliation of the students presenting
The poster should include 5 sections:
Background: should include a description of your question (can add a figure)
Goal and Research Plan: describe the main objective and the research plan
Results (main section): present your results in 3-4 figures, describe each figure (figure legends) and give a title to each result
Conclusions: summarize in points the conclusions of your project
References: list the references of papers/databases/tools used for your project
Examples of posters will be presented in class