Visualisation of Multiple Sequence Alignments VIZBI 2011 Des Higgins Conway Institute University College Dublin Ireland

Visualisation of Multiple Sequence Alignments

VIZBI 2011

Des Higgins

Conway Institute

University College Dublin

Ireland

Multiple Alignment?

• Align 3 or more sequences together
  – Homologous residues lined up in columns

Whale myoglobin  ----VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT
Lamprey globin   GSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTP---EFFPKFKGLTT
Lupin globin     ---GALTESQAALVKSSWEEF--NIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

• Needed because of
  – Orthologues from different species
  But mainly:
  – Paralogues from gene duplications

• Multi-gene families
  – e.g. humans have approx. 500 protein kinases
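As a small illustration of what "lined up in columns" means, the sketch below walks the three globin rows from this slide column by column and reports the gap-free columns where all three sequences carry the same residue. The helper name conserved_columns is an illustrative assumption, not something from the talk.

```python
# Hypothetical helper (not from the talk): find columns with one shared residue.
def conserved_columns(rows):
    """Return 0-based indices of gap-free columns where every row has the same residue."""
    hits = []
    for index, column in enumerate(zip(*rows)):
        if "-" not in column and len(set(column)) == 1:
            hits.append(index)
    return hits


# The three aligned globin fragments from the slide (55 columns each).
globins = {
    "Whale myoglobin": "----VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT",
    "Lamprey globin": "GSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTP---EFFPKFKGLTT",
    "Lupin globin": "---GALTESQAALVKSSWEEF--NIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE",
}

# Print 1-based column numbers together with the shared residue.
for i in conserved_columns(list(globins.values())):
    print(i + 1, globins["Whale myoglobin"][i])
```

Reading the alignment with zip(*rows) treats each column as one unit, which is the same view the later PCA/CA slides take when they use alignment columns as variables.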

Human Protein Kinases

The human kinome comprises 40 atypical PKs and 478 classical PKs. The latter consist of 388 serine/threonine kinases and 90 tyrosine kinases; 50 sequences lack a functional catalytic site.

(Manning et al., Science, 2002)

Globin Multiple Alignment

Human beta       --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST
Horse beta       --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
Human alpha      ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-
Horse alpha      ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-
Whale myoglobin  ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT
Lamprey globin   PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
Lupin globin     --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

Human beta       PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL
Horse beta       PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL
Human alpha      ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL
Horse alpha      ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL
Whale myoglobin  EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF
Lamprey globin   ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV
Lupin globin     VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVAD-AHFPV

Human beta       LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------
Horse beta       LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------
Human alpha      LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
Horse alpha      LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------
Whale myoglobin  ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
Lamprey globin   LAAVIADTVAAG---D------AGFEKLMSMICILLRSAY-------
Lupin globin     VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---

1. Visualise the residues/gaps?

[Globin multiple alignment repeated, annotated: Alpha helices]

[Globin multiple alignment repeated, annotated: Haem binding Histidines]

2. Visualise the sequence groupings?

[Figure: tree grouping the sequences: Horse beta, Human beta, Horse alpha, Human alpha, Whale myoglobin, Lamprey cyanohaemoglobin, Lupin leghaemoglobin]

So: What is the Problem?

• What if N >> 100,000?

• e.g. SSU rRNA
  – www.arb-silva.de
  – 1,471,257 seqs

• e.g. ABC transporters
  – PFAM
  – ABC_tran PF00005
  – 127,458 seqs

• Metagenomics

• Sequence 10,000 vertebrate genomes!

=> 5,000,000 protein kinases, GPCRs

SequenceJuxtaposer: Fluid Navigation For Large-Scale Sequence Comparison In Context. James Slack, Kristian Hildebrand, Tamara Munzner, Katherine St. John. Proc. German Conference on Bioinformatics 2004, pp 37-42

Poster D03 VIZBI, 2011

Sequence Surveyor: scalable multiple sequence alignment overview visualisation. Danielle Albers, Colin Dewey, Michael Gleicher

Poster D09 VIZBI, 2011

JProfileGrid: visualising very large multiple sequence alignments. Alberto Roca, Aaron Abajian, David Vigerust

This talk

• How to make huge multiple alignments

• How to cluster > 100,000 sequences

• MDS/PCA on big datasets

Multiple Sequence Alignment

• NP-complete

• Mainly use: “Progressive Alignment”
  – Greedy heuristic
  – Use a tree/clustering of the seqs

• Barton and Sternberg (1988)
  Feng and Doolittle (1987)
  Higgins and Sharp (1988)
  Hogeweg and Hesper (1984)
  Willie Taylor (1987)
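To make the "use a tree/clustering of the seqs" step concrete, here is a minimal sketch that derives a join order from a pairwise distance matrix with UPGMA (average linkage). UPGMA is used here only as one common clustering choice; the function name and the distance values are illustrative assumptions, and a progressive aligner would then align sequences and profiles in the returned order.

```python
def upgma_join_order(names, dist):
    """Return the list of (left_members, right_members) joins made by UPGMA."""
    n = len(names)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    order = []
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        size_a, size_b = len(clusters[a]), len(clusters[b])
        order.append((sorted(names[i] for i in clusters[a]),
                      sorted(names[i] for i in clusters[b])))
        # Average-linkage distance from the merged cluster to every other one.
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return order


# Made-up distances for four of the globins, purely for illustration.
names = ["Human beta", "Horse beta", "Human alpha", "Horse alpha"]
dist = [
    [0.0, 0.2, 0.6, 0.6],
    [0.2, 0.0, 0.6, 0.6],
    [0.6, 0.6, 0.0, 0.1],
    [0.6, 0.6, 0.1, 0.0],
]
for left, right in upgma_join_order(names, dist):
    print(left, "+", right)
```

The greedy character of progressive alignment shows up in the join order: once two groups are merged, the merge is never revisited.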

[Globin multiple alignment repeated, shown with its “Guide Tree”: Horse beta, Human beta, Horse alpha, Human alpha, Whale myoglobin, Lamprey cyanohaemoglobin, Lupin leghaemoglobin]

Clustal

• 66,000 citations

• Clustal1-Clustal4
  – 1988, Paul Sharp, Dublin

• Clustal V 1992
  – EMBL Heidelberg
  – Rainer Fuchs
  – Alan Bleasby

• Clustal W, Clustal X 1994-2005
  – Toby Gibson, EMBL, Heidelberg
  – Julie Thompson, IGBMC, Strasbourg

• Clustal W and Clustal X 2.0 2007
  – University College Dublin

www.clustal.org

Complexity

• Guide tree construction
  O(N^2)

• Later Progressive Alignment
  O(N)

• Guide tree construction is limiting
  >10,000 seq alignment is tough
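To put a number on the quadratic term: a guide tree built from all pairwise distances needs N(N-1)/2 sequence comparisons. The short sketch below just evaluates that count for a few values of N; the function name is an illustrative assumption.

```python
def pairwise_comparisons(n):
    """Number of distinct sequence pairs among n sequences: n(n-1)/2."""
    return n * (n - 1) // 2


# The count grows with the square of N, which is why the guide tree is limiting.
for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: {pairwise_comparisons(n):,} pairwise distances")
```

At N = 10,000 this is already about 50 million sequence comparisons, each of which is itself far more expensive than a vector distance.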

PartTree

• MAFFT Package
• Select n sequences where n << N
• UPGMA on n sequences
• Cluster the remainder (N-n) with their closest clusters

Katoh, K., Toh, H., 2007. PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. Bioinformatics 23, 372–374.

Embedding?

• Replace each sequence by a vector
  – Vector-vector distances
    • MUCH faster than seq.-seq. distances

• Vectors very fast/simple to cluster
  • e.g. cluster 10,000 vectors of length 150
    • <<1 min on 1 processor
    • UPGMA
  • e.g. cluster 300,000 vectors of length 300
    • 6 mins
    • k-means, k = 300
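Once sequences are vectors, off-the-shelf clustering applies. The sketch below is a plain Lloyd-style k-means on a toy set of 2-D vectors; it is not the implementation behind the timings on the slide. The deterministic farthest-first initialisation and all names are assumptions chosen to keep the example reproducible.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centres)."""
    # Farthest-first initialisation: start from the first point, then keep
    # adding the point that is farthest from all centres chosen so far.
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(sqdist(p, c) for c in centres))))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sqdist(p, centres[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Toy data: two well-separated groups of vectors.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centres = kmeans(vectors, k=2)
print(labels)
```

Each iteration costs on the order of N x k vector distances, so the work grows linearly with the number of sequences rather than quadratically.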

Embedding papers

• FastMap
  • Faloutsos, C., Lin, K. (1995) FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualisation of Traditional and Multimedia Datasets. Proc. 1995 ACM SIGMOD International Conference on Management of Data, pp. 163–174.

• Sparsemap
  • G. Hristescu and M. Farach-Colton. Cluster-preserving embedding of proteins. Technical Report 99-50, Computer Science Department, Rutgers University, 1999.

mBED

• Select k seqs “randomly”
  – k << N
  – k ∝ log N

• Use distance to each of these k “references”
  – k-long vector for each sequence

• Use heuristics
  – avoid duplicates
  – find outliers

• Very fast and simple
  – Complexity O(kN), i.e. O(N log N)

• Blackshields G, Sievers F, Shi W, Wilm A, Higgins DG. (2010) Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms Mol Biol. 5:21.
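The core embedding idea can be sketched in a few lines: pick k reference sequences, then describe every sequence by its distances to those references. This is a simplified illustration, not the published mBed code. It omits the duplicate and outlier heuristics, and the shared-k-mer distance, the choice k = round(log2 N) and all function names are assumptions.

```python
import math
import random


def kmer_distance(a, b, k=2):
    """Crude stand-in for a sequence distance: 1 - fraction of shared k-mers."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a or not kmers_b:
        return 1.0
    return 1.0 - len(kmers_a & kmers_b) / min(len(kmers_a), len(kmers_b))


def embed_sequences(seqs, n_refs=None, seed=0):
    """Return (vectors, reference_indices); each vector holds distances to the references."""
    n = len(seqs)
    if n == 0:
        return [], []
    if n_refs is None:
        n_refs = max(1, min(n, round(math.log2(n))))  # k grows like log N
    refs = random.Random(seed).sample(range(n), n_refs)  # "random" references
    vectors = [[kmer_distance(s, seqs[r]) for r in refs] for s in seqs]
    return vectors, refs


# Unaligned toy sequences: the embedding needs no alignment.
seqs = ["VLSPADKTNVKAAWGKV", "VLSAADKTNVKAAWSKV", "VHLTPEEKSAVTALWGKV",
        "VQLSGEEKAAVLALWDKV", "VLSEGEWQLVLHVWAKV", "PLSAAEKTKIRSAWAPV",
        "ALTESQAALVKSSWEEF", "VLSPADKTNVKAAWGKI"]
vectors, refs = embed_sequences(seqs)
print(len(vectors), "vectors of length", len(vectors[0]))
```

Only N x k sequence distances are computed instead of N(N-1)/2, and every later step (clustering, PCA) works on the short vectors.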

[Figure: mBED diagram with an N x N matrix and an N x k matrix built from k seeds]

MDS visualisation?

• Do PCA on Embedded sequences

• 3994 H3N2 HA sequences
  – 1967 (blue) to 2008 (orange)

Guide Tree Quality

• 1000 random guide trees

• 1000 sparsemap trees

• Clustal tree

• mBED

Clustal Ω

• Release first version by April 2011

• Scalable
  – mBed
  – Gordon Blackshields

• Accurate
  – HMM-HMM alignment
  – HHalign
  – Johannes Söding, Munich

• Re-use old alignments
  – Kevin Karplus
  – UCSC

• Align 120,000 ABC transporters
  – 6 hours on 1 core

• More accurate than
  – MUSCLE or MAFFT

• Coming soon...

Fabian Sievers
Andreas Wilm
David Dineen

MDS/PCA etc.

• Dimension reduction

• Treat alignment columns as variables
  – PCA
    • Principal Components Analysis
  – CA
    • Correspondence Analysis, Jean-Paul Benzécri

• Use N x N distance matrix
  – MDS
  – PCOORD

Use CA, PCA for Sequences?

• Every alignment column:
  – 20 binary variables
  – Or several physicochemical properties
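The "20 binary variables per column" encoding can be sketched as follows: each aligned residue becomes a block of 20 zeros and ones, so an alignment of L columns turns into a table with 20 x L variables per sequence that PCA or CA can take as input. Encoding gaps (and any non-standard symbol) as all zeros is an assumption of this sketch, as are the names used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_alignment(rows):
    """Encode aligned sequences as rows of 20 binary indicators per column."""
    matrix = []
    for row in rows:
        encoded = []
        for residue in row.upper():
            # One indicator per amino acid; gaps match none and stay all zero.
            encoded.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
        matrix.append(encoded)
    return matrix


alignment = ["VLSPADK", "VLSAADK", "VHLTPEE", "-ALTESQ"]
table = one_hot_alignment(alignment)
print(len(table), "sequences x", len(table[0]), "binary variables")
```

Columns of the table that are constant across all sequences carry no information and could be dropped before the multivariate analysis.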

Trypsin-like serine proteases

15 Chymotrypsins
31 Trypsins
10 Elastases

[Figure: ordination plots (scales d = 0.05 and d = 0.1) of the sequences (labels such as EC_4_117, EC_1_1, EC_36_5) and of variables such as X3N, X98W and X229D, with the groups labelled Chymotrypsin, Elastase and Trypsin, plus a bar chart of eigenvalues]

• Correspondence Analysis
• Supervise:
  • Between Groups Analysis
  • Dolédec and Chessel (1987) (similar to PLS discriminant analysis)

[Figure repeated on two slides, with the added label “Trypsin”]

Wallace IM, Higgins DG. (2007) Supervised multivariate analysis of sequence groups to identify specificity determining residues. BMC Bioinformatics. 8:135.

MDS

• Multidimensional Scaling
• Fit distances to an N x N distance matrix
• Use Euclidean distances?
  – “Classical scaling” = Principal Co-Ordinates Analysis

• PCOORD, John Gower
  – Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325-328.
  – Higgins, D.G. (1992) Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets. CABIOS, 8, 15-22.
  – Complexity at least O(N^2)
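Classical scaling can be sketched directly from its definition: square the distances, double-centre them, and take the leading eigenvectors of the resulting matrix as coordinates. To stay dependency-free, the sketch below recovers only the first axis, using power iteration; a real analysis would use a linear algebra library for the full eigendecomposition. The function name and the toy data are assumptions.

```python
import math


def classical_mds_first_axis(dist, iterations=200):
    """First principal coordinate of a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 (D^2 - row means - column means + grand mean).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration towards the dominant eigenvector of B.
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Coordinates are the eigenvector scaled by the square root of its eigenvalue.
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [x * scale for x in v]


# Three points on a line at positions 0, 1 and 3.
dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(classical_mds_first_axis(dist))
```

The N x N matrix B has to be held and multiplied in full, which is where the "at least O(N^2)" cost on the slide comes from. For non-Euclidean distances B can have negative eigenvalues, which this sketch does not handle.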

Large scale MDS?

• SC-MDS
  • Jengnan Tzeng, Henry Horng-Shing Lu, and Wen-Hsiung Li (2008) Multidimensional scaling for large genomic data sets. BMC Bioinformatics 9:179.

• mBED
  • Blackshields et al. (2010)

• PCOORD or MDS on a subset of the sequences
  • add the rest later

• Landmark MDS + Nyström approximation
  • V. de Silva, J.B. Tenenbaum, “Sparse multidimensional scaling using landmark points.” (2004) Technical report, Stanford University.

Easily do MDS on >100,000 seqs

• 307,434 lentivirus (HIV etc) sequences from UniProt.

H3N2 flu sequences

• Weifeng Shi

• 8167 HA sequences
  – human H3N2 influenza viruses

• DNAdist in Phylip
  – K2P (Kimura two parameter) model

• Python: Matplotlib
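For reference, the Kimura two-parameter distance has a closed form: with P the proportion of transitions and Q the proportion of transversions between two aligned sequences, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below implements that textbook formula; it is not a reimplementation of DNAdist, whose estimation procedure may differ, and the function name is an assumption.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq_a, seq_b):
    """Closed-form Kimura two-parameter distance between two aligned DNA sequences."""
    # Keep only columns where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    # Transition: purine<->purine or pyrimidine<->pyrimidine change.
    transitions = sum(1 for x, y in pairs
                      if x != y and (x in PURINES) == (y in PURINES))
    # Transversion: change between a purine and a pyrimidine.
    transversions = sum(1 for x, y in pairs
                        if x != y and (x in PURINES) != (y in PURINES))
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    # math.log raises ValueError if the sequences are too divergent (saturation).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


print(k2p_distance("ACGTACGTAC", "ACGTACGTAC"))
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The corrected distance is always at least as large as the raw proportion of differing sites, because it accounts for multiple substitutions at the same site.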

[Figures: one plot per time period: 1960s, 1970s, 1980s, 1990s, then each year from 2000 to 2010]

BGA, CIA
Aedin Culhane
Ian Jeffery
Stephen Madden
Iain Wallace
Guy Perriere, Lyon

Clustal Omega
Fabian Sievers
Andreas Wilm
David Dineen
Johannes Soeding, Munich
Rodrigo Lopez, EBI

mBED
Gordon Blackshields
Mark Larkin

Flu MDS
Weifeng Shi

Supervised PCA or CA?

Malate Dehydrogenases

Lactate Dehydrogenases

ADE-4 http://pbil.univ-lyon1.fr/ADE-4/

Thioulouse J., Chessel D., Dolédec S., & Olivier J.M. (1997) ADE-4: a multivariate analysis and graphical display software. Statistics and Computing, 7, 1, 75-83.

• MADE4
  – Culhane, A., Thioulouse, J., Perriere, G., Higgins, D.G. (2005) MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics. 21(11):2789-2790.

Between Group Analysis (BGA)
Supervised Correspondence Analysis or PCA
Dolédec, S. & Chessel, D. (1987) Acta Oecologica, Oecologica Generalis, 8, 3, 403-426.

Co-Inertia Analysis (CIA)
2 datasets; simultaneous CA or PCA
Dolédec, S. & Chessel, D. (1994) Freshwater Biology, 31, 277-294.
Thioulouse, J. & Lobry, J.R. (1995) CABIOS, 11, 321-329.

Very large datasets

• e.g. 381,602 tRNA from RF00005

• 40 mins embedding
  Plus 6 mins to cluster with k-means
  – k = 300