practical bioinformaticshisto.ucsf.edu/bms270/bms270_2018/slides/slides04_distancematrices.pdfmark...
TRANSCRIPT
![Page 1: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/1.jpg)
Practical Bioinformatics
Mark Voorhies
4/6/2017
Mark Voorhies Practical Bioinformatics
![Page 2: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/2.jpg)
Loading and re-loading your functions
# Use i m p o r t th e f i r s t t ime you l o a d a module# ( And keep u s i n g i m p o r t u n t i l i t l o a d s# s u c c e s s f u l l y )import my module
my module . m y f u n c t i o n ( 4 2 )
# Once a module has been loaded , use r e l o a d to# f o r c e python to r e a d your new codefrom i m p o r t l i b import r e l o a dr e l o a d ( my module )
Mark Voorhies Practical Bioinformatics
![Page 3: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/3.jpg)
Pearson distances
Pearson similarity
s(x , y) =
∑Ni (xi − xoffset)(yi − yoffset)√∑N
i (xi − xoffset)2√∑N
i (yi − yoffset)2
Pearson distanced(x , y) = 1 − s(x , y)
Euclidean distance ∑Ni (xi − yi )
2
N
Mark Voorhies Practical Bioinformatics
![Page 4: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/4.jpg)
Pearson distances
Pearson similarity
s(x , y) =
∑Ni (xi − xoffset)(yi − yoffset)√∑N
i (xi − xoffset)2√∑N
i (yi − yoffset)2
Pearson distanced(x , y) = 1 − s(x , y)
Euclidean distance ∑Ni (xi − yi )
2
N
Mark Voorhies Practical Bioinformatics
![Page 5: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/5.jpg)
Pearson distances
Pearson similarity
s(x , y) =
∑Ni (xi − xoffset)(yi − yoffset)√∑N
i (xi − xoffset)2√∑N
i (yi − yoffset)2
Pearson distanced(x , y) = 1 − s(x , y)
Euclidean distance ∑Ni (xi − yi )
2
N
Mark Voorhies Practical Bioinformatics
![Page 6: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/6.jpg)
Comparing all measurements for two genes
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
−5 0 5
−5
05
Comparing two expression profiles (r = 0.97)
TLC1 log2 relative expression
YF
G1
log2
rel
ativ
e ex
pres
sion
Mark Voorhies Practical Bioinformatics
![Page 7: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/7.jpg)
Comparing all genes for two measurements
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
−10 −5 0 5 10
−10
−5
05
Array 1, log2 relative expression
Arr
ay 2
, log
2 re
lativ
e ex
pres
sion
●
●
Mark Voorhies Practical Bioinformatics
![Page 8: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/8.jpg)
Comparing all genes for two measurements
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
−10 −5 0 5 10
−10
−5
05
Euclidean Distance
Array 1, log2 relative expression
Arr
ay 2
, log
2 re
lativ
e ex
pres
sion
●
●
Mark Voorhies Practical Bioinformatics
![Page 9: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/9.jpg)
Comparing all genes for two measurements
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
−10 −5 0 5 10
−10
−5
05
Uncentered Pearson
Array 1, log2 relative expression
Arr
ay 2
, log
2 re
lativ
e ex
pres
sion
●
●
Mark Voorhies Practical Bioinformatics
![Page 10: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/10.jpg)
Measure all pairwise distances under distance metric
Mark Voorhies Practical Bioinformatics
![Page 11: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/11.jpg)
Hierarchical Clustering
Mark Voorhies Practical Bioinformatics
![Page 12: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/12.jpg)
Hierarchical Clustering
Mark Voorhies Practical Bioinformatics
![Page 13: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/13.jpg)
Hierarchical Clustering
Mark Voorhies Practical Bioinformatics
![Page 14: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/14.jpg)
Hierarchical Clustering
Mark Voorhies Practical Bioinformatics
![Page 15: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/15.jpg)
Hierarchical Clustering
Mark Voorhies Practical Bioinformatics
![Page 16: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/16.jpg)
It’s hard work at times, but you have to be realistic. If you have alarge database with many variables and your goal is to get a goodunderstanding of the interrelationships, then, unless you get lucky,
this complex structure is bound to require some hard work tounderstand.
Bill Cleveland and Rick Beckerhttp://stat.bell-labs.com/project/trellis/interview.html
Mark Voorhies Practical Bioinformatics
![Page 17: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/17.jpg)
Using JavaTreeView
Mark Voorhies Practical Bioinformatics
![Page 18: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/18.jpg)
Adjust pixel settings for global view
Mark Voorhies Practical Bioinformatics
![Page 19: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/19.jpg)
Adjust pixel settings for global view
Mark Voorhies Practical Bioinformatics
![Page 20: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/20.jpg)
Select annotation columns
Mark Voorhies Practical Bioinformatics
![Page 21: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/21.jpg)
Select annotation columns
Mark Voorhies Practical Bioinformatics
![Page 22: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/22.jpg)
Select URL for gene annotations
Mark Voorhies Practical Bioinformatics
![Page 23: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/23.jpg)
Select URL for gene annotations
Mark Voorhies Practical Bioinformatics
![Page 24: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/24.jpg)
Activate and detach annotation window
Mark Voorhies Practical Bioinformatics
![Page 25: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/25.jpg)
Activate and detach annotation window
Mark Voorhies Practical Bioinformatics
![Page 26: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/26.jpg)
Activate and detach annotation window
Mark Voorhies Practical Bioinformatics
![Page 27: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/27.jpg)
Clustering exercises – Negative controls
Write functions to reproduce the shuffling controls in figure 3 of theEisen paper (removing correlations among genes and/or arrays).
def s h u f f l e G e n e s ( s e l f , s eed = None ) :””” S h u f f l e e x p r e s s i o n mat r i x by row . ”””import randomi f ( seed != None ) :
random . seed ( seed )i n d i c e s = range ( l e n ( s e l f . genes ) )random . s h u f f l e ( i n d i c e s )genes = [ s e l f . geneName [ i ] f o r i i n i n d i c e s ]s e l f . geneName = genesa nno t a t i o n s = [ s e l f . geneAnn [ i ] f o r i i n i n d i c e s ]s e l f . geneAnn = genesnum = [ s e l f . num [ i ] f o r i i n i n d i c e s ]s e l f . num = num
Mark Voorhies Practical Bioinformatics
![Page 28: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/28.jpg)
Clustering exercises – Negative controls
Write functions to reproduce the shuffling controls in figure 3 of theEisen paper (removing correlations among genes and/or arrays).
def s h u f f l e G e n e s ( s e l f , s eed = None ) :””” S h u f f l e e x p r e s s i o n mat r i x by row . ”””import randomi f ( seed != None ) :
random . seed ( seed )i n d i c e s = range ( l e n ( s e l f . genes ) )random . s h u f f l e ( i n d i c e s )genes = [ s e l f . geneName [ i ] f o r i i n i n d i c e s ]s e l f . geneName = genesa nno t a t i o n s = [ s e l f . geneAnn [ i ] f o r i i n i n d i c e s ]s e l f . geneAnn = genesnum = [ s e l f . num [ i ] f o r i i n i n d i c e s ]s e l f . num = num
Mark Voorhies Practical Bioinformatics
![Page 29: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/29.jpg)
Clustering exercises – Negative controls
Write functions to reproduce the shuffling controls in figure 3 of theEisen paper (removing correlations among genes and/or arrays).
def s hu f f l eRows ( s e l f , s eed = None ) :”””Permute r a t i o v a l u e s w i t h i n rows . ”””import randomi f ( seed != None ) :
random . seed ( seed )f o r i i n s e l f . num :
random . s h u f f l e ( i )
def s h u f f l e C o l s ( s e l f , s eed = None ) :”””Permute r a t i o v a l u e s w i t h i n columns . ”””import randomi f ( seed != None ) :
random . seed ( seed )# Transpose t h e e x p r e s s i o n m a t r i xc o l s = [ ]f o r c o l i n x range ( l e n ( s e l f . num [ 0 ] ) ) :
c o l s . append ( [ row [ c o l ] f o r row i n s e l f . num ] )# S h u f f l ef o r i i n c o l s :
random . s h u f f l e ( i )# Transpose back to o r i g i n a l o r i e n t a t i o ns e l f . num = [ ]f o r row i n x range ( l e n ( c o l s ) ) :
s e l f . num . append ( [ c o l [ row ] f o r c o l i n row ] )
Mark Voorhies Practical Bioinformatics
![Page 30: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/30.jpg)
Clustering exercises – Negative controls
Write functions to reproduce the shuffling controls in figure 3 of theEisen paper (removing correlations among genes and/or arrays).
def s hu f f l eRows ( s e l f , s eed = None ) :”””Permute r a t i o v a l u e s w i t h i n rows . ”””import randomi f ( seed != None ) :
random . seed ( seed )f o r i i n s e l f . num :
random . s h u f f l e ( i )
def s h u f f l e C o l s ( s e l f , s eed = None ) :”””Permute r a t i o v a l u e s w i t h i n columns . ”””import randomi f ( seed != None ) :
random . seed ( seed )# Transpose t h e e x p r e s s i o n m a t r i xc o l s = [ ]f o r c o l i n x range ( l e n ( s e l f . num [ 0 ] ) ) :
c o l s . append ( [ row [ c o l ] f o r row i n s e l f . num ] )# S h u f f l ef o r i i n c o l s :
random . s h u f f l e ( i )# Transpose back to o r i g i n a l o r i e n t a t i o ns e l f . num = [ ]f o r row i n x range ( l e n ( c o l s ) ) :
s e l f . num . append ( [ c o l [ row ] f o r c o l i n row ] )
Mark Voorhies Practical Bioinformatics
![Page 31: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/31.jpg)
Clustering exercises – Negative controls
Write functions to reproduce the shuffling controls in figure 3 of theEisen paper (removing correlations among genes and/or arrays).
def s hu f f l eRows ( s e l f , s eed = None ) :”””Permute r a t i o v a l u e s w i t h i n rows . ”””import randomi f ( seed != None ) :
random . seed ( seed )f o r i i n s e l f . num :
random . s h u f f l e ( i )
def s h u f f l e C o l s ( s e l f , s eed = None ) :”””Permute r a t i o v a l u e s w i t h i n columns . ”””import randomi f ( seed != None ) :
random . seed ( seed )# Transpose t h e e x p r e s s i o n m a t r i xc o l s = [ ]f o r c o l i n x range ( l e n ( s e l f . num [ 0 ] ) ) :
c o l s . append ( [ row [ c o l ] f o r row i n s e l f . num ] )# S h u f f l ef o r i i n c o l s :
random . s h u f f l e ( i )# Transpose back to o r i g i n a l o r i e n t a t i o ns e l f . num = [ ]f o r row i n x range ( l e n ( c o l s ) ) :
s e l f . num . append ( [ c o l [ row ] f o r c o l i n row ] )
Mark Voorhies Practical Bioinformatics
![Page 32: Practical Bioinformaticshisto.ucsf.edu/BMS270/BMS270_2018/slides/Slides04_DistanceMatrices.pdfMark Voorhies Practical Bioinformatics. Homework 1 Explore di erent clustering methods](https://reader033.vdocuments.us/reader033/viewer/2022042414/5f2e83dc388774014f19ab1a/html5/thumbnails/32.jpg)
Homework
1 Explore different clustering methods and/or distance methods2 Try additional shufflings of the data: how do they affect your
ability to cluster the data? C.f. figure 3 the Eisen paper
Permute the columnsIndependently permute the columns of each row
Mark Voorhies Practical Bioinformatics