Computational Physics Applied to Macromolecules
Pier Luigi Martelli
Università di Bologna
[email protected]
051 2094005, 338 3991609
Protein structure prediction
[Figure: 3D structure prediction of proteins. Ab initio prediction, threading and building by homology, placed over new folds and existing folds along a homology (%) axis running from 0 to 100.]
"Comparative modelling" of proteins
From: Martí-Renom et al. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:291
"Comparative modelling" of proteins
From: Sanchez et al. (2000) Nature Struct. Biol. (Suppl) 7:986
Homology modelling
Reliable models are available for only 45% of the proteins in Swiss-Prot (MODBASE)
http://alto.compbio.ucsf.edu/modbase
Can the sequence identity threshold be lowered?
On a large scale?
Comparative Modelling
1. Selection of templates
2. Alignment of the target sequence with the template
3. Modelling of the target on the template
4. Evaluation of the model
THE TEMPLATE: 1f13
Sequence alignment of TGL3_HUMAN with 1f13

TGL3   MAALGVQSINWQKAFNRQAHHTDKFSSQELILRRGQNFQVLMIMNKGLGSNERLEFIDTT 60
1F13A  VHLFKERWDTNKVDHHTDKYENNKLIVRRGQSFYVQIDFSRPYDPRRDLFRVEYVIGRYP 60

TGL3   GPYPSESAMTKAVFPLSNGSSGGWSAVLQASNGNTLTISISSPASAPIGRYTMALQIFSQ 120
1F13A  QENKGTYIPVPIVSELQSGKWGAKIVMREDRSVRLSIQSSPKCIVGKFRMYVAVWTPYGV 120

TGL3   GGISSVKLGTFILLFNPWLNVDSVFMGNHAEREEYVQEDAGIIFVGSTNRIGMIGWNFGQ 180
1F13A  LRTSRNPETDTYILFNPWCEDDAVYLDNEKEREEYVLNDIGVIFYGEVNDIKTRSWSYGQ 180

TGL3   FEEDILSICLSILDRSLNFRRDAATDVASRNDPKYVGRVLSAMINSNDDNGVLAGNWSGT 240
1F13A  FEDGILDTCLYVMDR-------AQMDLSGRGNPIKVSRVGSAMVNAKDDEGVLVGSWDNI 233

TGL3   YTGGRDPRSWDGSVEILKNWKKSGFSPVRYGQCWVFAGTLNTALRSLGIPSRVITNFNSA 300
1F13A  YAYGVPPSAWTGSVDILLEYRSSENPVRYGQCWVFAGVFNTFLRCLGIPARIVTNYFSAH 293

TGL3   HDTDRNLSVDVYYDPMGNPLDKGSDSVWNFHVWNEGWFVRSDLGPPYGGWQVLDATPQER 360
1F13A  DNDANLQMDIFLEEDGNVNSKLTKDSVWNYHCWNEAWMTRPDLPVGFGGWQAVDSTPQEN 353

TGL3   SQGVFQCGPASVIGVREGDVQLNFDMPFIFAEVNADRITWLYDNTTGKQWKNSVNSHTIG 420
1F13A  SDGMYRCGPASVQAIKHGHVCFQFDAPFVFAEVNSDLIYITAKKDGTHVVENVDATHIGK 413

TGL3   RYISTKAVGSNARMDVTDKYKYPEGSDQERQVFQKALGKLKPNTPFAATSSMGLETEEQE 480
1F13A  LIVTKQIGGDGMMDITDTYKFQEGQEEERLALETALMYGAKKPLNT--------EGVMKS 465

TGL3   PSIIGKLKVAGMLAVGKEVNLVLLLKNLSRDTKTVTVNMTAWTIIYNGTLVHEVWKDSAT 540
1F13A  RSNVDMDFEVENAVLGKDFKLSITFRNNSHNRYTITAYLSANITFYTGVPKAEFKKETFD 525

TGL3   MSLDPEEEAEHPIKISYAQYERYLKSDNMIRITAVCKVPDESEVVVERDIILDNPTLTLE 600
1F13A  VTLEPLSFKKEAVLIQAGEYMGQLLEQASLHFFVTARINETRDVLAKQKSTVLTIPEIII 585

TGL3   VLNEARVRKPVNVQMLFSNPLDEPVRDCVLMVEGSGLLLGNLKIDVPTLGPKERSRVRFD 660
1F13A  KVRGTQVVGSDMTVTVEFTNPLKETLRNVWVHLDGPGVTRPMKKMFREIRPNSTVQWEEV 645

TGL3   ILPSRSGTKQLLADFSCNKFPAIKAMLSIDVAE 693
1F13A  CRPWVSGHRKLIASMSSDSLRHVYGELDVQIQR 678
sequence identity 34%
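A sequence identity figure such as the 34% quoted above can be computed directly from two aligned rows. The sketch below is only illustrative: the function name `percent_identity` is hypothetical, and it assumes identity is counted over columns where neither row has a gap. Other conventions (for example dividing by the length of the shorter sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted (assumption)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy rows for illustration only, not the TGL3/1f13 alignment
print(round(percent_identity("ACDE-FGH", "ACDQWFG-"), 1))
```

In the toy rows six columns are compared and five of them match.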
Building the Model: MODELLER
http://salilab.org/modeller/modeller.html
THE TARGET: TGL3_HUMAN
Evaluating the Model: PROCHECK
http://biotech.ebi.ac.uk:8400/
Servers:
http://www.expasy.ch/swissmod/SWISS-MODEL.html
http://www.salilab.org/modbase/
Low-identity modelling
• Template selection based on experimental data
Determining the function experimentally, or establishing the presence of metals or prosthetic groups, greatly reduces the number of possible folds.
• Multiple alignment of proteins from the same family
Identifying the most conserved residues pins down the residues that matter within the family and whose position must be preserved.
• Use of predictors (secondary structure, solvent accessibility, cysteine bonding state, transmembrane segments, ...)
TARGET    PDDAEMQGTIRSLDENVRSKAKDYMRRIVSSICGIYGATCEVKFMEDVYPTTVNN-----
TEMPLATE  PASATLNADVRYARNEDFDAAMKTLEERAQQKKLP---EADVKVIVTR-----GRPAFNA

TARGET    PEVTDEVMKILSSISTV------VETEPVLGAEDFSRFLQKAPGTYFFLGTRNEKKGCIY
TEMPLATE  GEGGKKLVDKAVAYYKEAGGTLGVEERTGGGTDAAYAALSG---KPVIES--LGLPGFGY

Predicting structural features of the target helps its alignment with the template
Legend: α-helix, β-strand
Alcohol dehydrogenase from Sulfolobus solfataricus
Experimental data
• Contains 2 zinc atoms per monomer
• Active as a tetramer
Structures available in the database
• Alcohol dehydrogenases with 2 zinc atoms, dimeric: 2OHX (horse liver), ID: 24%
• Alcohol dehydrogenases with 1 zinc atom, tetrameric: 1YKF (Thermoanaerobacter brockii), ID: 23%
Similar monomers (RMSD < 0.2 nm). Differences in:
• the loop coordinating the second zinc atom
• the tetramerization regions
Ruler: 1 to 110
ADH1_SULSO  -----------MRAVRLVEIGKP-LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVH-MRQGRFGNLRIVEDLGVKLPVTLGHEIAGKIEEVGDEVVG--YSKGDLVAVNPWQG--EGNCYYCRIGEEHLCDSPR-------
ADHE_HORSE  ---STAGKVIKCKAAVLWEEKKP-FSIEEVEVAPPKAHEVRIKMVATGICRSDDH-VVSGTLV--------T-PLPVIAGHEAAGIVESIGEGVTT--VRPGDKV-IP-LFTPQCGKCRVCKHPEGNFCLKND-LSMPRG
ADHS_HORSE  ---STAGKVIKCKAAVLWEQKKP-FSIEEVEVAPPKAHEVRIKMVAAGICRSDDH-VVSGTLV--------A-PLPVIAGHEAAGIVESIGEGVTT--VRPGDKV-IP-LFIPQCGKCSVCKHPEGNLCLKN--LSMPRG
ADH_GADCA   ---ATVGKVIKCKAAVAWEANKP-LVIEEIEVDVPHANEIRIKIIATGVCHTDLYHLFEGKHK--------DG-FPVVLGHEGAGIVESVGPGVTE--FQPGEKV-IP-LFISQCGECRFCQSPKTNQCVKGWANES-PD
ADH7_HUMAN  --MGTAGKVIKCKAAVLWEQKQP-FSIEEIEVAPPKTKEVRIKILATGICRTDDH-VIKGTMV--------S-KFPVIVGHEATGIVESIGEGVTT--VKPGDKV-IP-LFLPQCRECNACRNPDGNLCIRSDIT-G-RG
ADHX_HUMAN  -----ANEVIKCKAAVAWEAGKP-LSIEEIEVAPPKAHEVRIKIIATAVCHTDAY-TLSGADP--------EGCFPVILGHEGAGIVESVGEGVTK--LKAGDTV-IP-LYIPQCGECKFCLNPKTNLCQKIRVTQG-KG
ADHB_HUMAN  ---STAGKVIKCKAAVLWEVKKP-FSIEDVEVAPPKAYEVRIKMVAVGICRTDDH-VVSGNLV--------T-PLPVILGHEAAGIVESVGEGVTT--VKPGDKV-IP-LFTPQCGKCRVCKNPESNYCLKND-LGNP--
ADH1_PEA    MS-NTVGQIIKCRAAVAWEAGKP-LVIEEVEVAPPQAGEVRLKILFTSLCHTDVY-FWEAKGQ--------TPLFPRIFGHEAGGIVESVGEGVTH--LKPGDHA-LP-VFTGECGECPHCKSEESNMCDLLRINTD-RG
ADH3_ECOLI  ---------MKSRAAVAFAPGKP-LEIVEIDVAPPKKGEVLIKVTHTGVCHTDAF-TLSGDDP--------EGVFPVVLGHEGAGVVVEVGEGVTS--VKPGDHV-IP-LYTAECGECEFCRSGKTNLCVAVRETQG-KG
ADH3_SOLTU  MS-TTVGQVIRCKAAVAWEAGKP-LVMEEVDVAPPQKMEVRLKILYTSLCHTDVY-FWEAKGQ--------NPVFPRILGHEAAGIVESVGEGVTE--LAPGDHV-LP-VFTGECKDCAHCKSEESNMCSLLRINTD-RG
ADH2_BACST  -----------MKAAVVNEFKKA-LEIKEVERPKLEEGEVLVKIEACGVCHTDLH-AAHGDWP-------IKPKLPLIPGHEGVGIVVEVAKGVKS--IKVGDRVGIP-WLYSACGECEYCLTGQETLCPHQL-------
ADH1_ZYMMO  -----------MKAAVITK-DHT-IEVKDTKLRPLKYGEALLEMEYCGVCHTDLH-VKNGDFG---------DETGRITGHEGIGIVKQVGEGVTS--LKAGDRASVA-WFFKGCGHCEYCVSGNETLCRNVE-------
ADHP_ECOLI  -----------MKAAVVTK-DHH-VDVTYKTLRSLKHGEALLKMECCGVCHTDLH-VKNGDFG---------DKTGVILGHEGIGVVAEVGPGVTS--LKPGDRASVA-WFYEGCGHCEYCNSGNETLCRSVK-------
ADH2_EMENI  --MAAPEIPKKQKAVIYDNPGTVSTKVVELDVPEPGDNEVLINLTHSGVCHSDFG-IMTNTWKILP----FPTQPGQVGGHEGVGKVVKLGAGAEASGLKIGDRVGVK-WISSACGQCPPCQDGADGLCFNQK-------
ADH_MYCTU   --------MSTVAAYAAMSATEP-LTKTTITRRDPGPHDVAIDIKFAGICHSDIH-TVKAEWG--------QPNYPVVPGHEIAGVVTAVGSEVTK--YRQGDRVGVG-CFVDSCRECNSCTRGIEQYCKPGAN------

Ruler: 120 to 230
ADH1_SULSO  ------WLGINF-DG----------AYAEYVIVPHYKYMYKLRRLNAVEAAPLTCSGITTY-RAVRKASLDPTKTLLVVGAGGGLGTMAVQI-AKAVSGATIIGVDVREEAVEAAKRAGADYVINASMQ----D---PLA
ADHE_HORSE  TMQ-DGTSRFT-CRGKPIHHFLGTSTFSQYTVVDEISVAKIDAASPLEKVCLIGCGFSTGYGSAVKVAKVTQGSTCAVFGLGG-VGLSVIMG-CKAAGAARIIGVDINKDKFAKAKEVGATECVNPQDYK---K--PIQE
ADHS_HORSE  TMQ-DGTSRFT-CRGKPIHHFLGTSTFSQYTVVDEISVAKIDAASPLEKVCLVGCGFSTGYGSAVKVAKVTQGSTCAVFGLGG-VGLSVIMG-CKAAGAARIIGVDINKDKFAKAKEVGATECVNPQDYK---K--PIQE
ADH_GADCA   VMS-PKETRFT-CKGRKVLQFLGTSTFSQYTVVNQIAVAKIDPSAPLDTVCLLGCGVSTGFGAAVNTAKVEPGSTCAVFGLGA-VGLAAVMG-CHSAGAKRIIAVDLNPDKFEKAKVFGATDFVNPNDHS---E--PISQ
ADH7_HUMAN  VLA-DGTTRFT-CKGKPVHHFMNTSTFTEYTVVDESSVAKIDDAAPPEKVCLIGCGFSTGYGAAVKTGKVKPGSTCVVFGLGG-VGLSVIMG-CKSAGASRIIGIDLNKDKFEKAMAVGATECISPKDST---K--PISE
ADHX_HUMAN  LMP-DGTSRFT-CKGKTILHYMGTSTFSEYTVVADISVAKIDPLAPLDKVCLLGCGISTGYGAAVNTAKLEPGSVCAVFGLGG-VGLAVIMG-CKVAGASRIIGVDINKDKFARAKEFGATECINPQDFS---K--PIQE
ADHB_HUMAN  TLQ-DGTRRFT-CRGKPIHHFLGTSTFSQYTVVDENAVAKIDAASPLEKVCLIGCGFSTGYGSAVNVAKVTPGSTCAVFGLGG-VGLSAVMG-CKAAGAARIIAVDINKDKFAKAKELGATECINPQDYK---K--PIQE
ADH1_PEA    VMLNDNKSRFS-IKGQPVHHFVGTSTFSEYTVVHAGCVAKINPDAPLDKVCILSCGICTGLGATINVAKPKPGSSVAIFGLGA-VGLAAAEG-ARISGASRIIGVDLVSSRFELAKKFGVNEFVNPKEH----DK-PVQQ
ADH3_ECOLI  LMP-DGTTRFS-YNGQPLYHYMGCSTFSEYTVVAEVSLAKINPEANHEHVCLLGCGVTTGIGAVHNTAKVQPGDSVAVFGLGA-IGLAVVQG-ARQAKAGRIIAIDTNPKKFDLARRFGATDCINPNDYD---K--PIKD
ADH3_SOLTU  VMINDGQSRFS-INGKPIYHFVGTSTFSEYTVVHVGCVAKINPLAPLDKVCVLSCGISTGLGATLNVAKPTKGSSVAIFGLGA-VGLAAAEG-ARIAGASRIIGVDLNASRFEQAKKFGVTEFVNPKDY----SK-PVQE
ADH2_BACST  ------NGGYS-VDG----------GYAEYCKAPADYVAKIPDNLDPVEVAPILCAGVTTY-KALKVSGARPGEWVAIYGIGG-LGHIALQY-AKAMG-LNVVAVDISDEKSKLAKDLGADIAINGLKE----D---PVK
ADH1_ZYMMO  ------NAGYT-VDG----------AMAEECIVVADYSVKVPDGLDPAVASSITCAGVTTY-KAVKVSQIQPGQWLAIYGLGG-LGNLALQY-AKNVFNAKVIAIDVNDEQLAFAKELGADMVINPKNE----D---AAK
ADHP_ECOLI  ------NAGYS-VDG----------GMAEECIVVADYAVKVPDGLDSAAASSITCAGVTTY-KAVKLSKIRPGQWIAIYGLGG-LGNLALQY-AKNVFNAKVIAIDVNDEQLKLATEMGADLAINSHTE----D---AAK
ADH2_EMENI  ------VSGYY-TPG----------TFQQYVLGPAQYVTPIPDGLPSAEAAPLLCAGVTVY-ASLKRSKAQPGQWIVISGAGGGLGHLAVQIAAKGMG-LRVIGVDHG-SKEELVKASGAEHFVDITKFPTGDKFEAISS
ADH_MYCTU   ----FTYNSIG-KDGQP-----TQGGYSEAIVVDENYVLRIPDVLPLDVAAPLLCAGITLY-SPLRHWNAGANTRVAIIGLGG-LGHMGVKL-GAAMG-ADVTVLSQSLKKMEDGLRLGAKSYYATADP---------D-

Ruler: 240 to 347
ADH1_SULSO  EIRRITE-SK-GVDAVIDLNNSEKTLSVYPKALAKQ-GKYVMVGLFG---ADLHYHAPLITLS-EIQFVGS-LVG--NQSDFLGIMRLAEAG--KVKPMITKTMKLEEANEAIDNLENFKAIGRQVLIP---
ADHE_HORSE  VLTEMSN-G--GVDFSFEVIGRLDTMVTALSCCQEAYGVSVIVGVPPD--SQNLSMNPMLLLS-GRTWKGAIFGGFKSKDSVPKLVADFMAKKFALDPLITHVLPFEKINEGFDLLRSGESI-RTILTF---
ADHS_HORSE  VLTEMSN-G--GVDFSFEVIGRLDTMVAALSCCQEAYGVSVIVGVPPD--SQNLSMNPMLLLS-GRTWKGAIFGGFKSKDSVPKLVADFMAKKFALDPLITHVLPFEKINEGFDLLRSGKSI-RTILTF---
ADH_GADCA   VLSKMTN-G--GVDFSLECVGNVGVMRNALESCLKGWGVSVLVG-WTD--LHDVATRPIQLIA-GRTWKGSMFGGFKGKDGVPKMVKAYLDKKVKLDEFITHRMPLESVNDAIDLMKHGKCI-RTVLSLE--
ADH7_HUMAN  VLSEMTG-N--NVGYTFEVIGHLETMIDALASCHMNYGTSVVVGVPPS--AKMLTYDPMLLFT-GRTWKGCVFGGLKSRDDVPKLVTEFLAKKFDLDQLITHVLPFKKISEGFELLNSGQSI-RTVLTF---
ADHX_HUMAN  VLIEMTD-G--GVDYSFECIGNVKVMRAALEACHKGWGVSVVVGVAAS--GEEIATRPFQLVT-GRTWKGTAFGGWKSVESVPKLVSEYMSKKIKVDEFVTHNLSFDEINKAFELMHSGKSI-RTVVKI---
ADHB_HUMAN  VLKEMTD-G--GVDFSFEVIGRLDTMMASLLCCHEACGTSVIVGVPPA--SQNLSINPMLLLT-GRTWKGA-VYGGFKSKEGIPKLVADFMAKKFSLDALITHVLPFEKINEGFDLLHSGKSIRTVLTF---
ADH1_PEA    VIAEMTN-G--GVDRAVECTGSIQAMISAFECVHDGWGVAVLVGVPSK--DDAFKTHPMNFLN-ERTLKGTFYGNYKPRTDLPNVVEKYMKGELELEKFITHTVPFSEINKAFDYMLKGESI-RCIIKMEE-
ADH3_ECOLI  VLLDINK-W--GIDHTFECIGNVNVMRAALESAHRGWGQSVIIGVAVA--GQEISTRPFQLVT-GRVWKGSAFGGVKGRSQLPGMVEDAMKGDIDLEPFVTHTMSLDEINDAFDLMHEGKSI-RTVIRY---
ADH3_SOLTU  VIAEMTD-G--GVDRSVECTGHIDAMISAFECVHDGWGVAVLVGVPHK--EAVFKTHPMNFLN-ERTLKGTFFGNYKPRSDIPSVVEKYMNKELELEKFITHTLPFAEINKAFDLMLKGEGL-RCIITMED-
ADH2_BACST  AIHDQVG-G---VHAAISVAVNKKAFEQAYQSVKRG-GTLVVVGLPN---ADLPIPIFDTVLN-GVSVKGS-IVG--TRKDMQEALDFAARG--KVRPIV-ETAELEEINEVFERMEKGKINGRIVLKLKED
ADH1_ZYMMO  IIQEKVG-G---AHATVVTAVAKSAFNSAVEAIRAG-GRVVAVGLPP---EKMDLSIPRLVLD-GIEVLGS-LVG--TREDLKEAFQFAAEG--KVKPKV-TKRKVEEINQIFDEMEHGKFTGRMVVDFTHH
ADHP_ECOLI  IVQEKTG-G---AHAAVVTAVAKAAFNSAVDAVRAG-GRVVAVGLPP---ESMSLDIPRLVLD-GIEVVGS-LVG--TRQDLTEAFQFAAEG--KVVPKV-ALRPLADINTIFTEMEEGKIRGRMVIDFRH-
ADH2_EMENI  HVKSLTTKG-LGAHAVIVCTASNIAYAQSLLFLRYN-GTMVCVGIPENEPQRIASAYPGLFIQKHVHVTGS-AVG--NRNEAIETMEFAARG--VIKAHF-REEKMEALTEIFKEMEEGKLQGRVVLDLS--
ADH_MYCTU   TFRKLR--G--GFDLILNTVSANLDLGQYLNLLDVD-GTLVELGIPEH--PMAVPAFALALMR--RSLAGSNIGG---IAETQEMLNFCAEH--GVTPEI-ELIEPDYINDAYERVLASDVRYRFVIDISAL
Alignment of 87 ADHs with 2 Zn atoms per monomer
• 38 residues are conserved in more than 90% of the sequences
• 12 residues are always conserved
Among these are the residues involved in coordinating the two metal centres
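Conservation counts like the ones on this slide can be obtained column by column from a multiple alignment. The following is a minimal sketch, not the procedure used for the 87-sequence alignment: the function names are hypothetical, the four-row alignment is a toy, and conservation is assumed to mean the fraction of all sequences carrying the most common residue in a column (gaps never count as conserved).

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Fraction of sequences sharing the most common residue, per column."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    fractions = []
    for col in range(width):
        residues = [row[col] for row in alignment if row[col] != gap]
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
        else:
            top_count = 0  # column made only of gaps
        fractions.append(top_count / len(alignment))
    return fractions


def columns_above(fractions, threshold):
    """0-based columns conserved in more than `threshold` of the sequences."""
    return [i for i, f in enumerate(fractions) if f > threshold]


def columns_invariant(fractions):
    """0-based columns where every sequence carries the same residue."""
    return [i for i, f in enumerate(fractions) if f == 1.0]


toy = ["GHEAG", "GHEGG", "GHE-G", "GHDAG"]  # hypothetical alignment
fractions = column_conservation(toy)
print(columns_above(fractions, 0.7), columns_invariant(fractions))
```

With a real alignment the threshold would be 0.9, matching the "more than 90%" criterion of the slide.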
Ruler: 1 to 110
ADH1_SULSO  ----------MRAVRLVEIGKP--LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVHMRQGRFGNLRIVEDLGVKLPVTLGHEIAGKIEEVGDEVVG--YSKGDLVAVNPWQG-EGNCYYCRIGEEHLCDS-----------
ADH_CLOBE   ----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA--------LGDRKNMILGHEAVGEVVEVGSEVKD--FKPGDRVIVPCTTPDWRSLEVQAGFQQHSN-------------
ADH_THEBR   ----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA--------IGERHNMILGHEAVGEVVEVGSEVKD--FKPGDRVVVPAITPDWRTSEVQRGYHQHSG-------------
ADH1_SOLTU  MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG--------QNPVFPRILGHEAAGIVESVGEGVTE--LGPGDHV-LPVFTGECKDCAHCKSEESNMCSL-----------
ADH2_LYCES  MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG--------QNPVFPRILGHEAAGIVESVGEGVTD--LAPGDHV-LPVFTGECKDCAHCKSEESNMCSL-----------
ADH1_ASPFL  ----MSIPEMQWAQVAEQKGGP--LIYKQIPVPKPGPDEILVKVRYSGVCHTDLHALKGDW-------PLPVKMPLVGGHEGAGVVVARGDLVT--EFEIGDHAGLKWLNGSCLACEFCKQADEPLCPN-----------
ADH1_EMENI  ----MCIPTMQWAQVAEKVGGP--LVYKQIPVPKPGPDQILVKIRYSGVCHTDLHAMMGHW-------PIPVKMPLVGGHEGAGIVVAKGELVH--EFEIGDQAGIKWLNGSCGECEFCRQSDDPLCAR-----------
ADH1_KLULA  --MAASIPETQKGVIFYENGGE--LQYKDIPVPKPKANELLINVKYSGVCHTDLHAWKGDW-------PLPTKLPLVGGHEGAGVVVAMGENVKG--WKIGDFAGIKWLNGSCMSCEYCELSNESNCPE-----------
ADH1_KLUMA  ----MAIPETQKGVIFYEHGGE--LQYKDIPVPKPKPNELLINVKYSGVCHTDLHAWQGDW-------PLDTKLPLVGGHEGAGIVVAMGENVTG--WEIGDYAGIKWLNGSCMSCEECELSNEPNCPK-----------
ADH1_YEAST  -----SIPETQKGVIFYESHGK--LEHKDIPVPKPKANELLINVKYSGVCHTDLHAWHGDW-------PLPVKLPLVGGHEGAGVVVGMGENVKG--WKIGDYAGIKWLNGSCMACEYCELGNESNCPH-----------
ADH1_CANAL  --MSEQIPKTQKAVVFDTNGGQ--LVYKDYPVPTPKPNELLIHVKYSGVCHTDLHARKGDW-------PLATKLPLVGGHEGAGVVVGMGENVKG--WKIGDFAGIKWLNGSCMSCEFCQQGAEPNCGE-----------
ADH1_PICST  ----MSVPTTQKAVVFESNGGP--LLYKDIPVPTPKPNEILINVKYSGVCHTDLHAWKGDW-------PLDTKLPLVGGHEGAGVVVGIGSNVTG--WELGDYAGIKWLNGSCLNCEFCQHSDEPNCAK-----------
ADH_SCHPO   ----MTIPDKQLAAVFHTHGGPENVKFEEVPVAEPGQDEVLVNIKYTGVCHTDLHALQGDW-------PLPAKMPLIGGHEGAGVVVKVGAGVTR--LKIGDRVGVKWMNSSCGNCEYCMKAEETICPH-----------
ADH2_EMENI  -MAAPEIPKKQKAVIYDNPGTVS-TKVVELDVPEPGDNEVLINLTHSGVCHSDFGIMTNTWK----ILPFPTQPGQVGGHEGVGKVVKLGAGAEASGLKIGDRVGVKWISSACGQCPPCQDGADGLCFN-----------
ADH_ALCEU   ------MTAMMKAAVFVEPGRIE---LADKPIPDIGPNDALVRITTTTICGTDVH-ILKGE--------YPVAKGLTVGHEPVGIIEKLGSAVTG--YREGQRVIAGAICPNFNSYAAQDGVASQDCSYLMASGQCGCHG

Ruler: 120 to 220
ADH1_SULSO  -PRWLG----INFDG------------------AYAEYVIVPHYKYMYKLRRLNAVEAAPLT--CSGITTYRAVRKASLDPTKTLLVVGAGGGLGTMAVQIAKAVSGATIIGVDVREEAVEAAKRAGADYVINASMQ---
ADH_CLOBE   -GMLAGWKFSNFKDG------------------VFGEYFHVNDADMNLAILPKDMPLENAVMITDMMTSGFHGAELADIQMGSSVVVIGIG-AVGLMGIAGAKLRGAGRIIGVGSRPICVEAAKFYGATDILNYKNG---
ADH_THEBR   -GMLAGWKFSNVKDG------------------VFGEFFHVNDADMNLAHLPKEIPLEAAVMIPDMMTTGFHGAELADIELGATVAVLGIG-PVGLMAVAGAKLRGAGRIIAVGSRPVCVDAAKYYGATDIVNYKDG---
ADH1_SOLTU  -LRINTDRGVMINDGQSRFSINGKPIYHFVGTSTFSEYTVVHVGCVAKINPLAPLDKVCVLS--CGISTLGATLNVAKPTKGSSVAIFGLG-AVGLAAAEGARIAGASRIIGVDLNASRFEQAKKFGVTEFVNPKDY---
ADH2_LYCES  -LRINTDRGVMLNDGKSRFSINGNPIYHFVGTSTFSEYTVVHVGCVAKINPLAPLDKVCVLS--CGISTLGASLNVAKPTKGSSVAIFGLG-AVGLAAAEGARIAGASRIIGVDLNASRFEQAKKFGVTEFVNPKDY---
ADH1_ASPFL  -ASLSG----YTVDG------------------TFQQYAIGKATHASKLPKNVPLDAVAPVL--CAGITVYKGLKESGVRPGQTVAIVGAGGGLGSLALQYA-KAMGIRVVAIDGGEEKQAMCEQLGAEAYVDFTKT---
ADH1_EMENI  -AQLSG----YTVDG------------------TFQQYALGKASHASKIPAGVPVDAAAPVL--CAGITVYKGLKEAGVRPGQTVAIVGAGGGLGSLAQQYA-KAMGIRVVAVDGGDEKRAMCESLGTETYVDFTKS---
ADH1_KLULA  -ADLSG----YTHDG------------------SFQQYATADAVQAAKIPVGTDLAEVAPVL--CAGVTVYKALKSANLKAGDWVAISGAAGGLGSLAVQYA-KAMGYRVLGIDAGEEKAKLFKDLGGEYFIDFTKS---
ADH1_KLUMA  -ADLSG----YTHDG------------------SFQQYATADAVQAARIPKNVDLAEVAPIL--CAGVTVYKALKSAHIKAGDWVAISGACGGLGSLAIQYA-KAMGYRVLGIDAGDEKAKLFKELGGEYFIDFTKT---
ADH1_YEAST  -ADLSG----YTHDG------------------SFQQYATADAVQAAHIPQGTDLAQVAPIL--CAGITVYKALKSANLMAGHWVAISGAAGGLGSLAVQYA-KAMGYRVLGIDGGEGKEELFRSIGGEVFIDFTKE---
ADH1_CANAL  -ADLSG----YTHDG------------------SFEQYATADAVQAAKIPAGTDLANVAPIL--CAGVTVYKALKTADLAAGQWVAISGAGGGLGSLAVQYA-RAMGLRVVAIDGGDEKGEFVKSLGAEAYVDFTKD---
ADH1_PICST  -ADLSG----YTHDG------------------SFQQYATADAVQAARLPKGTDLAQAAPIL--CAGITVYKALKTAQIQPGNWVCISGAGGGLGSLAIQYA-KAMGFRVIAIDGGEEKGEFVKSLGAEAYVDFTVS---
ADH_SCHPO   -IQLSG----YTVDG------------------TFQHYCIANATHATIIPESVPLEVAAPIM--CAGITCYRALKESKVGPGEWICIPGAGGGLGHLAVQYA-KAMAMRVVAIDTGDDKAELVKSFGAEVFLDFKKE---
ADH2_EMENI  -QKVSG----YYTPG------------------TFQQYVLGPAQYVTPIPDGLPSAEAAPLL--CAGVTVYASLKRSKAQPGQWIVISGAGGGLGHLAVQIAAKGMGLRVIGVDHGS-KEELVKASGAEHFVDITKFPTG
ADH_ALCEU   YKATAGWRFGNMIDG------------------TQAEYVLVPDAQANLTPIPDGLTDEQVLMCPDIMSTGFKGAENANIRIGHTVAVFAQG-PIGLCATAGARLCGATTIIAIDGNDHRLEIARKMGADVVLNFRNC---

Ruler: 230 to 347
ADH1_SULSO  -----DPLAEIRRITESKGVDAVIDLNNSEKTLSVYPKALAKQGKYVMVGLFGADLHYHAPLIT----LSEIQFVG-SLVGNQSDFLGIMRLAEAGK-----VKPMITKTMKLEEANEAIDNLENFKAIGRQVLIP--
ADH_CLOBE   -----HIVDQVMKLTNGEGVDRVIMAGGGSETLSQAVSMVKPGGIISNINYHGSGDALLIPRVEWGCGMAHKTIKGGLCPGGRLRAEMLRDMVVYNRVDL--SKLVTHVYHGFDHIEEALLLMKDKPKDLIKAVVIL-
ADH_THEBR   -----PIESQIMNLTEGKGVDAAIIAGGNADIMATAVKIVKPGGTIANVNYFGEGEVLPVPRLEWGCGMAHKTIKGGLCPGGRLRMERLIDLVFYKRVDP--SKLVTHVFRGFDNIEKAFMLMKDKPKDLIKPVVILA
ADH1_SOLTU  ---SKPVQEVIAEMTDGGVDRSVECTGHIDAMISAFECVHDGWGVAVLVGVPHKEAVFKTHPMN---LLNERTLKG-TFFGNYKPRSDIPSVVEKYMNKELELEKFITHTLPFAEINKAFDLMLKGEGLRCIITMED-
ADH2_LYCES  ---SKPVQEVIAEMTDGGVDRSVECTGHIDAMISAFECVHDGWGVAVLVGVPHKEAVFKTHPLN---FLNERTLKG-TFFGNYKPRSDIPCVVEKYMNKELELEKFITHTLPFAEINKAFDLMLKGEGLRCIITMAD-
ADH1_ASPFL  ---QDLVADVKAATPEGLGAHAVILLAVAEKPFQQAAEYV-SRGTVVAIGLPAG-AFLRAPVFN--TVVRMINIKG-SYVGNRQDGVEAVDFFARGL-----IKAPFK-TAPLQDLPKIFELMEQGKIAGRYVLEIPE
ADH1_EMENI  ---KDLVADVRHGR-GCLGAHAVILLAVSEKPFQQATEYVRSRGTIVAIGLPPD-AYLKAPVIN--TVVRMITIKG-SYVGNRQDGVEALDFFARGL-----IKAPFK-TAPLKDLPKIYELMEQGRIAGRYVLEMPE
ADH1_KLULA  ----KNIPEEVIEAT-KGGAHGVINVSVSEFAIEQSTNYVRSNGTVVLVGLPRD-AKCKSDVFN--QVVKSISIVG-SYVGNRADTREAIDFFSRGL-----VKAPIH-VVGLSELPSIYEKMEKGAIVGRYVVDTSK
ADH1_KLUMA  ----KDMVAEVIEAT-NGGAHAVINVSVSEAAISTSVLYTRSNGTVVLVGLPRD-AQCKSDVFN--QVVKSISIVG-SYVGNRADTREALDFFSRGL-----VKAPIK-ILGLSELASVYDKMVKGQIVGRIVVDTSK
ADH1_YEAST  ----KDIVGAVLKAT-DGGAHGVINVSVSEAAIEASTRYVRANGTTVLVGMPAG-AKCCSDVFN--QVVKSISIVG-SYVGNRADTREALDFFARGL-----VKSPIK-VVGLSTLPEIYEKMEKGQIVGRYVVDTSK
ADH1_CANAL  ----KDIVEAVKKAT-DGGPHGAINVSVSEKAIDQSVEYVRPLGKVVLVGLPAH-AKVTAPVFD--AVVKSIEIKG-SYVGNRKDTAEAIDFFSRGL-----IKCPIK-IVGLSDLPEVFKLMEEGKILGRYVLDTSK
ADH1_PICST  ----KDIVKDIQTAT-DGGPHAAINVSVSEKAIAQSCQYVRSTGTVVLVGLPAG-AKVVAPVFD--AVVKSISIRG-SYVGNRADSAEAIDFFTRGL-----IKCPIK-VVGLSELPKVYELMEAGKVIGRYVVDTSK
ADH_SCHPO   ----ADMIEAVKACT-NGGAHGTLVLSTSPKSYEQAAGFARPGSTMVTVSMPAG-AKLGADIFW--LTVKMLKICG-SHVGNRIDSIEALEYVSRGL-----VKPYYK-VQPFSTLPDVYRLMHENKIAGRIVLDLSK
ADH2_EMENI  DKFEAISSHVKSLTTKGLGAHAVIVCTASNIAYAQSLLFLRYNGTMVCVGIPENEPQRIASAYPGLFIQKHVHVTG-SAVGNRNEAIETMEFAARGV-----IKAHFR-EEKMEALTEIFKEMEEGKLQGRVVLDLS-
ADH_ALCEU   ----DVVDEVMKLTG-GRGVDASIEALGTQATFEQSLRVLKPGGTLSSLGVYSSD--LTIPLSAFAAGLGDHKINTALCPGGKERMRRLINVIESGRVDL--GALVTHQYR-LDDIVAAYDLFANQRDGVLKIAIKPH
Alignment of 24 tetrameric ADHs
Alignment of the target with two templates
Rows: target, ADH with 2 Zn atoms, tetrameric ADH
Legend: α-helix, β-strand
The alignment takes into account: conserved positions, secondary structure, solvent accessibility.
Structural zinc
Catalytic zinc
Coenzyme-binding domain, catalytic domain
Model of the monomer
Model of the tetramer
Casadio R, Martelli PL, Giordano A, Rossi M, Raia CA. A low-resolution 3D model of the tetrameric alcohol dehydrogenase from Sulfolobus solfataricus. Protein Eng 15:215-223 (2002)
Model vs. X-ray structure (1JVB)
RMSD = 0.25 nm
Casadio et al., Protein Eng 15:215 (March 2002)
Esposito et al., JMB 318:463 (April 2002)
Confirmation: the structure of the protein has been solved
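The RMSD quoted for the model against the X-ray structure is the root-mean-square distance between equivalent atoms. A minimal sketch of that formula follows; it assumes the two coordinate lists are already optimally superposed, paired atom by atom (for example one Cα per residue), and expressed in nm. The superposition step itself is not shown, and the coordinates are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in nm, assumed already superposed
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
xray = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(round(rmsd(model, xray), 3))
```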
Carboxypeptidase from Sulfolobus solfataricus
Experimental data
• Contains 1 zinc atom per monomer
• Active in oligomeric form; the number of monomers is unknown
Structures available in the database
• Carboxypeptidases with 1 zinc atom: 1OBR (Thermoactinomyces vulgaris), ID: 16%, symmetry compatible with hexamers
• Carboxypeptidases with 2 zinc atoms: 1CG2 (Pseudomonas spirullum), ID: 21%, symmetry compatible with tetramers
Structural superposition of the catalytic domains
[Figure: 1OBR superposed on 1CG2. Labelled residues: 1OBR:His69, 1OBR:Glu72, 1OBR:His204; 1CG2:His90, 1CG2:Asp119, 1CG2:Glu178.]
RMSD = 0.25 nm
Alignment of the target with 1OBR
Legend: α-helix, β-strand
The alignment takes into account: zinc ligands, secondary structure, solvent accessibility.
Model of CPSso based on 1OBR
[Figure: metal site of the model. Labelled: zinc, water, His 108, Asp 109, His 245, Glu 327. Legend: residues that coordinate the zinc; residue that coordinates the water.]
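Alignment of the target with 1CG2
Legend: α-helix, β-strand
The alignment takes into account: zinc ligands, secondary structure, solvent accessibility.
Model of CPSso based on 1CG2
[Figure: metal site of the model. Labelled: zinc, water, His 108, Asp 109, His 168, Glu 142. Legend: residues that coordinate the zinc; residue that coordinates the water.]
Residues that coordinate the zinc
Model based on 1CG2: His 168, Asp 109, His 108
Model based on 1OBR: His 245, Asp 109, His 108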
Site-directed mutagenesis
H108A  Inactive
D109L  Inactive
H245A  Active
H168A  Inactive
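One way to read the mutagenesis table against the two models is to check, mutant by mutant, whether the observed activity matches the zinc ligands each model proposes. The sketch below is a deliberately crude reading aid, not the analysis of the cited paper. It assumes that mutating a zinc ligand abolishes activity and, more strongly, that mutating a non-ligand leaves the enzyme active. The ligand sets are the ones shown on the two model slides; all function names are hypothetical.

```python
# Zinc ligands proposed by each model, as listed on the slides
MODEL_LIGANDS = {
    "1OBR-based": {"H108", "D109", "H245"},
    "1CG2-based": {"H108", "D109", "H168"},
}

# Mutant -> True if the mutant enzyme is still active (from the table)
MUTANTS = {"H108A": False, "D109L": False, "H245A": True, "H168A": False}


def mutated_position(mutant: str) -> str:
    """'H108A' -> 'H108' (wild-type residue plus position)."""
    return mutant[:-1]


def conflicts(ligands, mutants):
    """Mutants whose activity contradicts the ligand set under the stated assumptions."""
    found = []
    for mutant, active in mutants.items():
        is_ligand = mutated_position(mutant) in ligands
        if is_ligand and active:
            found.append(mutant)  # ligand removed, yet the enzyme works
        elif not is_ligand and not active:
            found.append(mutant)  # non-ligand mutated, yet activity is lost
    return found


for name, ligands in MODEL_LIGANDS.items():
    print(name, conflicts(ligands, MUTANTS))
```

Under these assumptions the 1CG2-based ligand set leaves no contradiction, while the 1OBR-based set conflicts with H245A and H168A.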
Aggregates
Model based on 1OBR: 6-fold symmetry
Model based on 1CG2: 4-fold symmetry
Small-angle X-ray scattering
Occhipinti E, Martelli PL, Spinozzi F, Corsi F, Formantici C, Molteni L, Amenitsch H, Mariani P, Tortora P, Casadio R. 3D structure of Sulfolobus solfataricus carboxypeptidase developed by molecular modeling is confirmed by site-directed mutagenesis and small angle X-ray scattering. Biophys J 85:1165-1175 (2003)
Conclusions
Modelling at low sequence identity can give good results if all the available information (both experimental and derived from predictions) is used for template selection and for the alignment.
These procedures are STILL largely non-automatic
A low-resolution 3D model of VDAC (the sequence from Neurospora crassa)
2omf_.seq/ AEIYNKDGNK VDLYGKAVGL HYFSKGNGEN SYGGNGDMTY ARLGFKGETQ
2omf_.str/ CCCCCCCCEE EEEEEEEEEE EEECCCCCCC CCCCCCCCCE EEEEEEEEEE
protx.str/ *******CCC CCCCEEEEEE EEEC****** ********CE EEEEEEEECC
protx.seq/ *******KGY NFGLWKLDLK TKTS****** ********SG IEFNTAGHSN

2omf_.seq/ I*NSDLTGYG QWEYNFQGNN SEGADAQTGN KTRLAFAGLK YADVGSFDYG
2omf_.str/ C*CCCEEEEE EEEEEEECCC CCCCCCCCCC EEEEEEEEEE ECCCEEEEEE
protx.str/ CCCCCEEEEE EEEEEEC*** ********** EEEEEEEEEC CCCCCEEEEE
protx.seq/ QESGKVFGSL ETKYKVK*** ********** DYGLTLTEKW NTDNTLFTEV

2omf_.seq/ RNYGVVYDAL GYTDMLPEFG GDTAYSDDFF VGRVGGVATY RNSNFFGLVD
2omf_.str/ ECCCCCCCCC CCCCCCCCCC CCCCCCCCCC CCCCCCEEEE EECCCCCCCC
protx.str/ EEEECC**** ********** ********** **CCEEEEEE EEECCCCCCC
protx.seq/ AVQDQL**** ********** ********** **LEGLKLSL EGNFAPQSGN

2omf_.seq/ GLNFAVQYLG KNER****** *********D TARRSNGDGV GGSISYEYE*
2omf_.str/ CEEEEEEEEC CCCC****** *********C CCCCCCCCEE EEEEEEEEC*
protx.str/ EEEEEEEEEE EEEECCCCCC CCCCCCCEEE EEEEEEEEEE EEEEEEECCC
protx.seq/ KNGKFKVAYG HENVKADSDV NIDLKGPLIN ASAVLGYQGW LAGYQTAFDT

2omf_.seq/ **GFGIVGAY GAADRTNLQE AQPLGNGKKA EQWATGLKYD ANNIYLAANY
2omf_.str/ **CEEEEEEE EEEECCCCCC CCCCCCCCEE EEEEEEEEEE ECCEEEEEEE
protx.str/ CCEEEEEEEE EEEEEEEEEE EEECCCCCCC EEEEEEEEEE CEEEEEEEEE
protx.seq/ QQSKLTTNNF ALGYTTKDFV LHTAVNDGQE FSGSIFQRTS DKLDVGVQLS

2omf_.seq/ GETRNATPIT NKFTNTSGFA NKTQDVLLVA QYQFDFGLRP SIAYTKSKAK
2omf_.str/ EEEECCCCCC CCCCCCCCCC CEEEEEEEEE EEECCCCEEE EEEEEEEEEE
protx.str/ EEECC***** ********** *CCCEEEEEE EEECCCCEEE EEEEEEC***
protx.seq/ WASGT***** ********** *SNTKFAIGA KYQLDDDARV RAKVNNA***

2omf_.seq/ DVEGIGDVDL VNYFEVGATY YFNKNMSTYV DYIINQIDSD NKLGVGSDDT
2omf_.str/ CCCCCCCEEE EEEEEEEEEE ECCCCEEEEE EEEEECCCCC CCCCCCCCCE
protx.str/ *********E EEEEEEEEEE EC***EEEEE EEEEECCC** *****CCCCE
protx.seq/ *********S QVGLGYQQKL RT***GVTLT LSTLVDGK** *****NFNAG

2omf_.seq/ VAVGIVYQF* ***
2omf_.str/ EEEEEEEEE* ***
protx.str/ EEEEEEEEEE EC*
protx.seq/ GHKIGVGLEL EA*
Structural alignment of VDAC with the template
Prediction with HMM
A low-resolution 3D model of VDAC: location of mutated residues
Casadio et al., FEBS Lett 520:1-7 (2002)
Threading
Thread the sequence ….ACDGGTKLMAG…… into each candidate fold:
Model 1: Score 1
Model 2: Score 2
Model 3: Score 3
The best-scoring model is chosen as the candidate fold for the sequence
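The score-and-rank idea of the threading slide can be sketched with a toy scoring function. Everything below is an illustrative assumption rather than the scoring used by any of the servers: each fold is reduced to a string of buried (B) or exposed (E) positions, a residue scores +1 when its hydrophobicity matches the environment and -1 otherwise, and the threading is gapless. The three templates are invented; only the sequence fragment comes from the slide.

```python
HYDROPHOBIC = set("AVLIMFWC")  # crude hydrophobic class (assumption)


def position_score(residue: str, environment: str) -> int:
    """+1 if hydrophobic residue sits in a buried site or polar in exposed, else -1."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1 if buried == hydrophobic else -1


def thread(sequence: str, template: str) -> int:
    """Best gapless score of the sequence over every offset along the template."""
    if len(sequence) > len(template):
        raise ValueError("template shorter than sequence")
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(
            position_score(res, template[offset + i])
            for i, res in enumerate(sequence)
        )
        if best is None or score > best:
            best = score
    return best


def best_fold(sequence: str, templates: dict):
    """Score the sequence on every template and return (best name, all scores)."""
    scores = {name: thread(sequence, env) for name, env in templates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical burial patterns for three candidate folds
templates = {
    "Model 1": "BBEEEEEBBBE",
    "Model 2": "EEEEEEEEEEE",
    "Model 3": "BEBEBEBEBEB",
}
print(best_fold("ACDGGTKLMAG", templates))
```

Real threading programs use much richer potentials and allow gaps; the sketch only shows how one sequence yields one score per fold and how the top score selects the candidate.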
Threading servers
TOPITS (PredictProtein), Burkhard Rost (Columbia Univ.): http://cubic.bioc.columbia.edu/predictprotein/
FRSVR, David Eisenberg (UCLA): http://fold.doe-mbi.ucla.edu/
3DPSSM, Michael Sternberg (Imperial Cancer Res. Fund): http://www.sbg.bio.ic.ac.uk/~3dpssm/
GenTHREADER, David Jones (Brunel Univ.): http://bioinf.cs.ucl.ac.uk/psipred/
FoRc
HoMo
1D
….the art of being humble
Ab initio methods:
•Knowledge based potentials
•Contact map predictions
Prediction of Contact Maps
[Figure: contact definition, illustrated on residues F 156, V 238, I 240, I 269, V 271, F 297 and V 299.]
Contact definition:
•C-C distance < 0.8 nm
•Sequence gap > 7 residues
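The two criteria above translate directly into a double loop over residue pairs. This is a minimal sketch, assuming one representative atom per residue with coordinates in nm (which atom is used is left open here), a strict distance cutoff of 0.8 nm and a sequence separation strictly greater than 7. The function name and the toy hairpin are hypothetical.

```python
import math


def contact_map(coords, cutoff_nm=0.8, min_separation=7):
    """Symmetric 0/1 matrix: 1 where |i - j| > min_separation and distance < cutoff."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # start at i + min_separation + 1 so that j - i > min_separation
        for j in range(i + min_separation + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff_nm:
                cmap[i][j] = 1
                cmap[j][i] = 1
    return cmap


# Toy hairpin of 10 points: five going out along x, five coming back 0.5 nm above
hairpin = [(0.38 * i, 0.0, 0.0) for i in range(5)]
hairpin += [(0.38 * (9 - i), 0.5, 0.0) for i in range(5, 10)]

cmap = contact_map(hairpin)
contacts = [(i, j) for i in range(10) for j in range(i + 1, 10) if cmap[i][j]]
print(contacts)
```

Points 4 and 5 are only 0.5 nm apart but are adjacent in sequence, so the separation filter excludes them.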
Computation of Contact Maps
[Figure: from 3D structure to contact map. The structure panel repeats the labelled residues F 156, V 238, I 240, I 269, V 271, F 297 and V 299; both axes of the contact map carry the sequence TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN.]
We can build the correct structure from the correct contact map
[Figure: 1QHJ (1.9 Å), its contact map, MARC, and the resulting model, with N and C termini marked. RMSD = 2.5 Å.]
(A) An alignment of 5 (hypothetical) sequences as they are represented in an HSSP file (Sander and Schneider, 1991). i and j stand for the positions of the two residues making or not making contact (A and D in the leading sequence, sequence 1). (B) Single sequence coding. The position representing the couple (AD) in the vector is set to 1.0 while the other positions are set to 0. (C) Multiple sequence coding. For each sequence in the alignment (1 to 5 in the scheme in A) the couple of residues in positions i and j is counted. The final input coding, representing the frequency of each couple in the alignment, is normalized to the number of sequences.
Representation of the input coding based on ordered couples.
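The two codings described in the caption can be written down in a few lines. The sketch below is illustrative only: it assumes a 400-component vector (20 x 20 ordered couples), an alphabetical residue order that is my own choice, and that couples containing a gap or unknown residue are simply not counted. The five-row alignment is hypothetical and stands in for the scheme in panel A.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # ordering is an assumption
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
N_COUPLES = len(AMINO_ACIDS) ** 2  # 400 ordered couples


def couple_index(a: str, b: str) -> int:
    """Position of the ordered couple (a, b) in the input vector."""
    return INDEX[a] * len(AMINO_ACIDS) + INDEX[b]


def single_sequence_coding(sequence: str, i: int, j: int):
    """1.0 at the couple found at positions i and j, 0 elsewhere."""
    vector = [0.0] * N_COUPLES
    vector[couple_index(sequence[i], sequence[j])] = 1.0
    return vector


def multiple_sequence_coding(alignment, i: int, j: int):
    """Frequency of each couple at (i, j), normalized to the number of sequences."""
    vector = [0.0] * N_COUPLES
    for row in alignment:
        a, b = row[i], row[j]
        if a in INDEX and b in INDEX:  # gaps are skipped (assumption)
            vector[couple_index(a, b)] += 1.0
    return [value / len(alignment) for value in vector]


# Five hypothetical aligned sequences; i = 0 and j = 3 give A and D in sequence 1
alignment = ["AKLD", "AKID", "GKLD", "AKLE", "AKLD"]
coded = multiple_sequence_coding(alignment, 0, 3)
print(coded[couple_index("A", "D")], coded[couple_index("G", "D")])
```

Because the couples are ordered, (A, D) and (D, A) occupy different positions in the vector.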
The neural network architecture for prediction of contact maps
T0087: 310 residues, A = 20% (FR/NF)
T0110: 128 residues, A = 30% (NF)
EVA
Evaluation of Automatic protein structure prediction
[ Burkhard Rost, Andrej Sali, http://maple.bioc.columbia.edu/eva/ ]
CASP
Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction
http://PredictionCenter.llnl.gov/casp5/
3D - Crunch
Very Large Scale Protein Modelling Project
http://www.expasy.org/swissmod/SM_LikelyPrecision.html
Model Accuracy Evaluation
Bioinformatics I
Protein Structure Resources
PDB: http://www.pdb.org
Protein Data Bank of experimentally solved structures (RCSB)
CATH: http://www.biochem.ucl.ac.uk/bsm/cath/
Hierarchical classification of protein domain structures
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
Alexey Murzin's Structural Classification of Proteins
DALI: http://www2.ebi.ac.uk/dali/
Liisa Holm and Chris Sander's protein structure comparison server
SS-Prediction and Fold Recognition
PHD: http://cubic.bioc.columbia.edu/predictprotein/
Burkhard Rost's Secondary Structure and Solvent Accessibility Prediction Server
3DPSSM: http://www.sbg.bio.ic.ac.uk/~3dpssm/
Fold Recognition Server using 1D and 3D Sequence Profiles coupled with Secondary Structure and Solvation Potential Information
Protein Homology Modeling Resources
SWISS MODEL: http://www.expasy.ch/swissmod/
Deep View - SPDBV:
Homepage: http://www.expasy.ch/spdbv/
Tutorials: http://www.usm.maine.edu/~rhodes/SPVTut/
http://www.bbsrc.ac.uk/molbiol/
WhatIf: http://www.cmbi.kun.nl/whatif/
Gert Vriend's protein structure modeling and analysis program WhatIf
Modeller: http://guitar.rockefeller.edu/modeller/
Andrej Sali's homology protein structure modelling by satisfaction of spatial restraints
FAMS: http://physchem.pharm.kitasato-u.ac.jp/FAMS/fams.html
Full Automatic Modelling System (FAMS); Kitasato University; Tokyo, Japan
3D-JIGSAW: http://www.bmm.icnet.uk/people/paulb/3dj/form.html
Comparative Modelling Server; Imperial Cancer Research Fund; London, UK
CPHmodels: http://www.cbs.dtu.dk/services/CPHmodels/
Centre for Biological Sequence Analysis; The Technical University of Denmark; Denmark
SDSC1: http://cl.sdsc.edu/hm.html
SDSC Structure Homology Modelling Server; San Diego Supercomputing Centre