

Post on 17-Jan-2018


Page 1: Contest Beta Test of Bioinformatics Toolbox in Matlab Hidden Markov Model for profile analysis of GPCR sequences ShannChing Chen

Contest Beta Test of Bioinformatics Toolbox in Matlab

Hidden Markov Model for profile analysis of GPCR sequences

ShannChing Chen

Page 2:

GPCR Database

Class A Rhodopsin like
Class B Secretin like
Class C Metabotropic glutamate / pheromone
Class D Fungal pheromone
Class E cAMP receptors (Dictyostelium)
Frizzled/Smoothened family

Page 3:

PFAM Database

HMM Profile of GPCR Class C

Page 4:

Test Data Set

Class A Rhodopsin like
Class B Secretin like
Class C Metabotropic glutamate / pheromone

Class     GPCR   PFAM
Class A   1122     64
Class B    217     37
Class C    131     31

Page 5:

Commands

gethmmprof        % get PFAM HMM profile
gethmmalignment   % get sequences used in the PFAM HMM profile
hmmprofestimate   % estimate parameters of a profile HMM
getgenpept        % retrieve sequence information from the NCBI GenPept database

Page 6:

Some Problems

Some sequences cannot be retrieved by getgenpept(ACCESSNUM)

>> getgenpept('Q93564')
??? Error using ==> /usr/local/matlab6p5/toolbox/bioinfo/bioinfo/private/getncbidata
Can not interpret NCBI url data.

Error in ==> /usr/local/matlab6p5/toolbox/bioinfo/bioinfo/getgenpept.m
On line 33 ==> gbout = getncbidata(accessnum,'database','protein',varargin{:});

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Search&db=protein&term=Q93564&dopt=GenBank&mode=file

This implies a database inconsistency (GPCRDB/SWISSPROT).

Suggestion: getswissprot(ACCESSNUM)
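To reproduce the failing lookup outside the toolbox, it can help to rebuild the query URL shown on this slide for any accession number. The following is a minimal Python sketch under that assumption only: the function name build_genpept_query_url is hypothetical, the base URL and parameters are copied from the slide, and nothing is fetched (the endpoint may no longer respond in this form).

```python
from urllib.parse import urlencode

# Base URL exactly as it appears on the slide; not verified to be live.
NCBI_QUERY_BASE = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"


def build_genpept_query_url(accession: str) -> str:
    """Rebuild the protein query URL from the slide for one accession.

    Hypothetical helper: it only assembles the string and performs
    no network request.
    """
    params = {
        "cmd": "Search",
        "db": "protein",
        "term": accession,
        "dopt": "GenBank",
        "mode": "file",
    }
    # urlencode keeps the insertion order of the dict.
    return NCBI_QUERY_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_genpept_query_url("Q93564"))
```

Pasting the printed URL into a browser shows what the server returns for an accession that getgenpept could not interpret.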

Page 7:

Some sequences are missing in the tests

Class     GPCR   PFAM
Class A   1122     64
Class B    217     37
Class C    131     31

[Figure: Class C sets, labeled GPCR 131 and PFAM 31]

Page 8:

Some sequences are missing in the tests

Class     GPCR   PFAM     MY   Intersection   Union
Class A   1122     64   1112             60    1116
Class B    217     37     88             25     100
Class C    131     31     34             12      53

[Figure: Class C sets, labeled GPCR 131, PFAM 31 and MY 34]

Page 9:

Some sequences are missing in the tests

Class     GPCR   PFAM     MY   Intersection   Union
Class A   1122     64   1112             60    1116
Class B    217     37     88             25     100
Class C    131     31     34             12      53

[Figure: Class C sets, labeled GPCR 131, PFAM 31, MY 34 and Intersection 12]
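The Intersection and Union columns are consistent with ordinary set arithmetic on the PFAM and MY sets: |PFAM ∪ MY| = |PFAM| + |MY| - |PFAM ∩ MY|. A short Python check of the counts in the table, where the helper name union_size is only illustrative:

```python
def union_size(size_a: int, size_b: int, size_intersection: int) -> int:
    """Inclusion-exclusion for two sets: |A or B| = |A| + |B| - |A and B|."""
    return size_a + size_b - size_intersection


# Counts copied from the table: (PFAM, MY, Intersection, Union).
table_counts = {
    "Class A": (64, 1112, 60, 1116),
    "Class B": (37, 88, 25, 100),
    "Class C": (31, 34, 12, 53),
}

# Recompute the Union column and compare it with the reported value.
recomputed = {
    name: union_size(pfam, my, inter)
    for name, (pfam, my, inter, _) in table_counts.items()
}
consistent = all(
    recomputed[name] == reported_union
    for name, (_, _, _, reported_union) in table_counts.items()
)

if __name__ == "__main__":
    print(recomputed, consistent)
```

All three rows satisfy the identity, which supports reading "Intersection" and "Union" as operations on the PFAM and MY sets rather than on the full GPCR set.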

Page 10:

Sequence                   Pfam        Union           My  Intersection
Q93564/579-850       392.064735   347.395797   264.384048   214.767834
MGR1_RAT/592-845     430.324765   490.156635   507.207180   366.073191
MGR5_RAT/578-831     454.253026   488.499817   504.205146   382.278028
MGR4_HUMAN/587-852   462.158205   534.799902   569.530580   436.156992
MGR7_RAT/590-855     456.809927   537.349099   576.424828   436.974014
MGR6_RAT/579-844     441.637205   508.217074   539.677731   415.271713
MGR2_HUMAN/567-824   474.183794   499.470152   514.069902   406.403925
MGR3_RAT/576-833     474.823802   503.952638   523.120933   403.944131
MGR_DROME/626-881    463.670698   434.844043   447.045332   383.043601
MGR1_CAEEL/681-934   431.536823   385.918002   390.459708   346.138481
O73636/604-859       509.878222   449.879078   232.315669   160.114672
O93552/570-825       499.175160   441.637844   207.251644   146.385214
O93553/585-840       535.782160   477.453567   252.858455   174.934421
O73638/590-844       505.409106   449.807293   253.060067   180.272999
O73639/613-868       498.097160   441.382544   238.116480   164.883093
O73640/609-867       490.355389   430.260647   225.572317   154.554065
O73637/600-856       479.058648   416.663951   230.852568   160.374462
CASR_HUMAN/612-867   477.146160   495.730963   435.172173   261.112868
O35268/595-850       493.412160   418.848716   190.366365   140.977472
O35266/504-757       498.585902   425.435595   192.551769   147.385798
O35189/588-839       484.897441   410.564569   177.347350   137.381278
O35267/404-655       478.584441   404.729878   178.483848   131.359061
O35265/275-530       481.888222   406.476280   180.506104   139.005235
O35269/510-762       476.080804   407.558095   181.848329   138.513683
O70409/593-848       485.920160   399.185334   201.261584   144.217952
O70410/620-874       419.333919   331.677518   172.271933   115.508341
Q20073/1026-1271     283.524022   146.811483    41.312498    29.265243
O75205/55-302        250.699080   103.702354    -1.632265    10.306800
O96954/137-418       313.661421   165.987772    75.331302    78.768892
GBR2_RAT/481-753     352.997624   279.991531   303.315709   241.765650
GBR1_HUMAN/593-867   310.464308   286.723440   324.437816   222.652110
BOSS_DROME           -42.948487   109.350882   116.872645   -26.158734
BOSS_DROVI           -50.399883   117.725196   134.150758   -15.832355
CASR_BOVIN           469.962081   493.745459   434.163431   256.921606
CASR_MOUSE           472.270365   494.962668   435.198261   257.240773
CASR_RAT             471.901365   494.982860   435.450371   257.396804
GBR1_MOUSE           307.748882   288.446423   329.215963   223.455430
GBR1_RAT             265.090364   260.119139   300.878203   187.018309
GBR2_HUMAN           355.852179   281.269532   305.503043   242.184040
MGR1_HUMAN           428.469326   489.348371   505.671282   363.345798
MGR1_MOUSE           426.393159   486.225029   503.275573   362.141585
MGR2_RAT             477.043532   503.908054   518.100194   407.338969
MGR3_HUMAN           465.240190   500.331401   519.156127   396.403286
MGR3_MOUSE           472.056956   501.604222   520.899501   401.373438
MGR4_RAT             462.028808   534.771011   568.913938   434.143238
MGR5_HUMAN           451.654868   485.449344   500.737734   378.856552
MGR6_HUMAN           429.971744   505.674784   539.636493   406.840972
MGR7_HUMAN           454.131180   534.670351   573.746080   434.295267
MGR8_HUMAN           458.613275   556.385737   596.067761   432.862807
MGR8_MOUSE           460.825275   559.364026   599.715567   435.011221
MGR8_RAT             460.940275   560.014303   600.531703   435.748855
Q90WL6               450.195335   476.755926   417.802864   243.027962
Q90ZF3               430.562323   490.551502   506.874414   364.893734

[Figure: Class C sets, labeled GPCR 131, PFAM 31, MY 34 and Intersection 12]

Sequence                    Pfam         Union
OAJ1_HUMAN/52-300    -186.612063   -112.162229
OL15_MOUSE/41-290    -178.345641   -119.201224
OLF6_RAT/44-293      -195.415970   -101.239046
OLF1_CHICK/41-290    -182.367313   -110.731995
GU27_RAT/22-271      -187.321125   -101.499059
MRGF_RAT/61-291      -198.150421   -117.267656
OPSB_HUMAN/51-303    -187.612752   -101.159913
OPS3_DROME/75-338    -211.224282   -133.483526
OPSD_LOLFO/51-315    -205.895209   -117.262574
OPS1_DROME/67-329    -209.269114   -111.675883
V2R_HUMAN/54-325     -203.950419   -126.516322
FSHR_BOVIN/379-626   -181.742988   -102.965039
TRFR_HUMAN/42-320    -168.667767   -108.577338
NTR1_HUMAN/80-364    -209.554125   -132.159324
NY1R_HUMAN/57-320    -193.652723   -116.647730
GLP2_RAT/175-443     -205.422264   -125.760343
GLR_HUMAN/138-407    -180.718411    -97.289324
GIPR_HUMAN/134-399   -203.089354   -120.706433
GLP1_RAT/141-409     -198.857712   -112.053221
CALR_PIG/146-415     -191.393060   -107.696115
CALR_RAT/145-435     -198.771887   -119.516172
CGRR_HUMAN/138-391   -193.500975   -107.340185
DIHR_ACHDO/130-393   -181.314215   -102.871218
DIHR_MANSE/83-351    -192.033661   -111.663686
CRF2_XENLA/115-368   -167.516292   -116.302817
CRF1_RAT/116-370     -176.533708   -106.044739
YQ44_CAEEL/267-539   -210.951110   -111.353016
YOW3_CAEEL/219-477   -177.623461   -102.753965
O95966/400-659       -159.442058    -95.656561
Q10922/771-1017      -148.153491    -91.036382

[Slide labels for the row groups: Class A, Class B, Class C]
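One way to read the two score tables is as a separation test: Class C sequences should score high against a Class C profile, and the Class A/B sequences should score low. The Python sketch below checks this on a handful of "Union" scores copied from the tables. The cut-off of 0 and the helper name separates are assumptions made for illustration, not something stated on the slides.

```python
# "Union" column scores copied from the slide tables (subset only).
class_c_union = {
    "Q93564/579-850": 347.395797,
    "MGR1_RAT/592-845": 490.156635,
    "O75205/55-302": 103.702354,
    "BOSS_DROME": 109.350882,
}
other_union = {
    "OAJ1_HUMAN/52-300": -112.162229,
    "OPS3_DROME/75-338": -133.483526,
    "GLR_HUMAN/138-407": -97.289324,
    "Q10922/771-1017": -91.036382,
}


def separates(positives: dict, negatives: dict, threshold: float = 0.0) -> bool:
    """True if every positive scores above the threshold and every negative
    scores at or below it. The default threshold of 0 is an assumed cut-off."""
    return (all(score > threshold for score in positives.values())
            and all(score <= threshold for score in negatives.values()))


# Margin between the weakest Class C score and the strongest other score.
margin = min(class_c_union.values()) - max(other_union.values())
clean_split = separates(class_c_union, other_union)

if __name__ == "__main__":
    print(clean_split, round(margin, 6))
```

On this subset the two groups do not overlap. Under the Pfam column the same cut-off of 0 would misplace BOSS_DROME (-42.948487) and BOSS_DROVI (-50.399883), although both still score well above every sequence in the second table.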

Page 11:

Other Problems/Suggestions

hmmprofalign cannot align sequences that contain ambiguous ("wobble") amino acid codes (e.g. X, B, Z…)
Preprocessing is needed before aligning the sequence with the profile.
Modify ‘hmmprofalign’ to make it more flexible

In Class C, align sequence GBR1_HUMAN with PFAM HMM Profile

'GBR1_HUMAN/593-867'   ---SVSVLSSLG-[…]VPKMRRLITRGEWQ
'GBR1_HUMAN/1-961'     LFISVSVLSSLG-[…]VPKMRRLITRGEWQ
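The preprocessing step mentioned above could take several forms. The Python sketch below shows one assumed strategy, not the toolbox's behavior: report every character outside the 20 standard amino acid letters, then drop those characters before alignment. The function names and the example sequence are hypothetical.

```python
# The 20 standard one-letter amino acid codes.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def find_ambiguous(sequence: str) -> list:
    """Return (position, code) pairs for every non-standard character.

    Positions are 0-based; the sequence is upper-cased first.
    """
    return [(index, code)
            for index, code in enumerate(sequence.upper())
            if code not in STANDARD_AMINO_ACIDS]


def strip_ambiguous(sequence: str) -> str:
    """Drop non-standard characters (e.g. X, B, Z).

    Assumed strategy: removal shifts all later positions, so keep the
    output of find_ambiguous if original coordinates matter.
    """
    return "".join(code for code in sequence.upper()
                   if code in STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    example = "LFISVXSVLBSSLG"  # hypothetical sequence with X and B inserted
    print(find_ambiguous(example))
    print(strip_ambiguous(example))
```

Dropping residues is the simplest option. Substituting a plausible standard residue (for example D or N for B) would preserve sequence length, at the cost of guessing the residue.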

Page 12:

Summary

Performed profile HMM analysis of GPCR sequences in Class A, Class B and Class C

Suggestions about ‘getswissprot’ and ‘hmmprofalign’

It is unclear how to choose the representative set used to train the HMM profile, although the HMM profile works pretty well

Page 13:

PFAM Database

HMM Profile of GPCR Class C