ncbi fieldguide ncbi molecular biology resources a field guide part 2 august 2-3, 2005

NCBI Molecular Biology Resources

A Field Guide, Part 2

August 2-3, 2005

Access

Entrez

Sequence

Structure

Why do we need similarity searching?

To identify and annotate sequences with…
• incomplete (or no) annotations (GenBank)
• incorrect annotations

To assemble genomes

To explore evolutionary relationships by…
• finding homologous molecules
• developing phylogenetic trees

NOTE: Similar sequences may NOT have similar function!

Searching with Sequences

BLAST: Basic Local Alignment Search Tool

• Widely used similarity search tool
• Heuristic approach based on the Smith-Waterman algorithm
• Finds best local alignments
• Provides statistical significance
• All combinations (DNA/Protein) of query and database:

– DNA vs DNA

– DNA translation vs Protein

– Protein vs Protein

– Protein vs DNA translation

– DNA translation vs DNA translation

• www, standalone, and network clients
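The five query/database combinations listed above correspond to five BLAST program names. A minimal lookup sketch; the dictionary and function names are illustrative and not part of any NCBI interface:

```python
# Map (query type, database type) to the BLAST program that compares them.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",  # DNA vs DNA
    ("nucleotide", "protein"): "blastx",     # DNA translation vs protein
    ("protein", "protein"): "blastp",        # protein vs protein
    ("protein", "nucleotide"): "tblastn",    # protein vs DNA translation
}


def choose_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program for a query/database pairing.

    translate_both=True selects tblastx (DNA translation vs DNA translation),
    which only applies to a nucleotide query against a nucleotide database.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and a nucleotide database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown combination: {query_type!r} vs {db_type!r}") from None


print(choose_program("nucleotide", "protein"))  # blastx
```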

Global vs. Local Alignment

[Figure: Seq 1 aligned to a second sequence, shown as a global alignment and as a local alignment]
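The difference between the two alignment types shows up directly in the dynamic-programming recurrences. This sketch uses a toy scoring scheme (match +1, mismatch -1, linear gap -2, all assumptions chosen for illustration) to compute a Needleman-Wunsch global score and a Smith-Waterman local score. BLAST itself finds local alignments heuristically rather than by filling this full matrix.

```python
def align_score(a: str, b: str, local: bool, match: int = 1,
                mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch global alignment (every residue is aligned or gapped).
    local=True:  Smith-Waterman local alignment (scores floored at 0; best cell wins).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps along the first row and column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                h = max(h, 0)  # a local alignment can restart anywhere
                best = max(best, h)
            H[i][j] = h
    return best if local else H[-1][-1]


print(align_score("TTACGTTT", "GGACGTGG", local=False))  # 0
print(align_score("TTACGTTT", "GGACGTGG", local=True))   # 4: the shared ACGT
```

The global score is dragged down by the mismatched flanks, while the local score reports only the best-matching region.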

Global vs. Local Alignment

Human:  15 IAKYNFHGTAEQDLPFCKGDVLTIVAVTKDPNWYKAKNKVGREGIIPANYVQKREGVKAGTKLSLMPWFH 84
           +A + +    + DL F K D+L I+  T+   W+      GR G IP+NYV + + +++       PW+
Worm:   63 VALFQYDARTDDDLSFKKDDILEILNDTQGDWWFARHKATGRTGYIPSNYVAREKSIES------QPWYF 125

H:  85 GKITREQAERLLYPP--ETGLFLVRESTNYPGDYTLCVSCDGKVEHYRI-MYHASKLSIDEEVYFENLMQ 151
           GK+ R  AE+ L     E G FLVR+S +   D +L V  +  V+HYRI + H     I     F  L
Worm:  126 GKMRRIDAEKCLLHTLNEHGAFLVRDSESRQHDLSLSVRENDSVKHYRIQLDHGGYF-IARRRPFATLHD 194

H: 152 LVEHYTSDADGLCTRLIKPKVMEGTVAAQDEFYRSGWALNMKELKLLQTIGKGEFGDVMLGDYRGN-KVA 220
           L+ HY  +ADGLC  L  P             Y   W ++ + ++L++ IG G+FG+V  G +  N  VA
Worm:  195 LIAHYQREADGLCVNLGAPCAKSEAPQTTTFTYDDQWEVDRRSVRLIRQIGAGQFGEVWEGRWNVNVPVA 264

H: 221 VKCIK-NDATAQAFLAEASVMTQLRHSNLVQLLGVIVEEKGGLYIVTEYMAKGSLVDYLRSRGRSVLGGD 289
           VK +K   A    FLAEA +M +LRH  L+ L  V   ++  + IVTE M + +L+ +L+ RGR
Worm:  265 VKKLKAGTADPTDFLAEAQIMKKLRHPKLLSLYAVCTRDE-PILIVTELMQE-NLLTFLQRRGRQCQMPQ 332

H: 290 CLLKFSLDVCEAMEYLEGNNFVHRDLAARNVLVSEDNVAKVSDFGLT----KEASSTQDTG-KLPVKWTA 353
            L++ S  V   M YLE  NF+HRDLAARN+L++     K++DFGL     KE      TG + P+KWTA
Worm:  333 -LVEISAQVAAGMAYLEEMNFIHRDLAARNILINNSLSVKIADFGLARILMKENEYEARTGARFPIKWTA 401

H: 354 PEALREKKFSTKSDVWSFGILLWEIYSFGRVPYPRIPLKDVVPRVEKGYKMDAPDGCPPAVYEVMKNCWH 423
           PEA    +F+TKSDVWSFGILL EI +FGR+PYP +   +V+ +V+ GY+M  P GCP  +Y++M+ CW
Worm:  402 PEAANYNRFTTKSDVWSFGILLTEIVTFGRLPYPGMTNAEVLQQVDAGYRMPCPAGCPVTLYDIMQQCWR 471

H: 424 LDAAMRPSFLQLREQLEHI 443
            D   RP+F  L+ +LE +
Worm:  472 SDPDKRPTFETLQWKLEDL 492

human  M--------------SAIQ----------------------AAWPSGT------------ECIAKYNFHG
worm   MGSCIGKEDPPPGATSPVHTSSTLGRESLPSHPRIPSIGPIAASSSGNTIDKNQNISQSANFVALFQYDA
       ...
human  REQLEHI--------KTHELHL
worm   QWKLEDLFNLDSSEYKEASINF 500

Align program (Lipman and Pearson)

BLASTp

Nucleotide Words

Query: GTACTGGACATGGACCCTACAGGAA

Word Size = 11

GTACTGGACAT

TACTGGACATG

ACTGGACATGG

CTGGACATGGA

TGGACATGGAC

GGACATGGACC

GACATGGACCC

ACATGGACCCT

...........

Make a lookup table of words

Minimum word size = 7
blastn default = 11
megablast default = 28
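Splitting the query into overlapping words and indexing them can be sketched in a few lines. This illustrates the idea only; it is not BLAST's internal data structure, and the function names are hypothetical:

```python
from collections import defaultdict


def words(seq: str, w: int) -> list[str]:
    """Every overlapping word of length w, in query order."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def lookup_table(seq: str, w: int) -> dict[str, list[int]]:
    """Map each word to the 0-based query offsets where it occurs."""
    table = defaultdict(list)
    for offset, word in enumerate(words(seq, w)):
        table[word].append(offset)
    return dict(table)


query = "GTACTGGACATGGACCCTACAGGAA"
print(words(query, 11)[:3])           # ['GTACTGGACAT', 'TACTGGACATG', 'ACTGGACATGG']
print(lookup_table(query, 3)["GGA"])  # [5, 11, 21]
```

A 25-base query yields 15 words of size 11; database sequences are then scanned for exact occurrences of these words.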

Protein Words

Query: GTQITVEDLFYNIATRRKALKN

Word Size = 3

Neighborhood Words

LTV, MTV, ISV, LSV, etc.

Make a lookup table of words

Word Size can be 2 or 3 (default = 3)
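Neighborhood words are the words that score at least a threshold T against a query word under the scoring matrix. A brute-force sketch, assuming the query word ITV (each listed neighbor, LTV, MTV, ISV, and LSV, differs from it by conservative substitutions) and using BLOSUM62 values transcribed from the BLOSUM62 table in this section. The thresholds used below are examples, not BLAST defaults:

```python
from itertools import product

ALPHABET = "ARNDCQEGHILKMFPSTWYV"

# Lower triangle of BLOSUM62, one row per residue in ALPHABET order.
_BLOSUM62_TRIANGLE = """
4
-1 5
-2 0 6
-2 -2 1 6
0 -3 -3 -3 9
-1 1 0 0 -3 5
-1 0 0 2 -4 2 5
0 -2 0 -1 -3 -2 -2 6
-2 0 1 -1 -3 0 0 -2 8
-1 -3 -3 -3 -1 -3 -3 -4 -3 4
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7
0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
"""

# Fill both halves so BLOSUM62[a, b] works for any pair.
BLOSUM62: dict[tuple[str, str], int] = {}
for row_aa, line in zip(ALPHABET, _BLOSUM62_TRIANGLE.strip().splitlines()):
    for col_aa, value in zip(ALPHABET, line.split()):
        BLOSUM62[row_aa, col_aa] = BLOSUM62[col_aa, row_aa] = int(value)


def word_score(query_word: str, other: str) -> int:
    """Ungapped BLOSUM62 score of two equal-length words."""
    return sum(BLOSUM62[a, b] for a, b in zip(query_word, other))


def neighborhood(query_word: str, threshold: int) -> set[str]:
    """All words scoring at least `threshold` against query_word (brute force)."""
    candidates = ("".join(p) for p in product(ALPHABET, repeat=len(query_word)))
    return {w for w in candidates if word_score(query_word, w) >= threshold}


listed = {"LTV", "MTV", "ISV", "LSV"}
print(sorted(listed & neighborhood("ITV", 7)))  # ['ISV', 'LSV', 'LTV', 'MTV']
```

The listed words score 11 (LTV), 10 (MTV), 9 (ISV), and 7 (LSV) against ITV, so all of them qualify at T = 7, while only LTV survives at T = 11. Raising T shrinks the neighborhood, which speeds the search at some cost in sensitivity.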

Initial Matches and Extensions

Protein BLAST requires two neighboring matches within 40 aa

GTQITVEDLFYNI
<---- SEI YYN ---->
neighborhood words: two matches

ATCGCCATGCTTAATTGGGCTT
<--- CATGCTTAATT ----->
exact word match: one match

Nucleotide BLAST requires one exact match
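The protein two-hit requirement can be sketched as a filter over word hits: a hit only triggers an extension if an earlier, non-overlapping hit lies on the same diagonal within 40 residues. This is a simplified illustration with a hypothetical function name; real BLAST adds further bookkeeping. Hits are (query offset, subject offset) pairs:

```python
def two_hit_triggers(hits: list[tuple[int, int]], word_size: int = 3,
                     window: int = 40) -> list[tuple[int, int]]:
    """Return the word hits that would trigger an extension under a two-hit rule.

    A hit qualifies when the previous hit on the same diagonal
    (query_offset - subject_offset) starts at least word_size and at most
    `window` residues earlier in the query.
    """
    last_on_diagonal: dict[int, int] = {}
    triggers = []
    # Sort by diagonal, then by query offset, so hits on a diagonal are in order.
    for q, s in sorted(hits, key=lambda h: (h[0] - h[1], h[0])):
        diagonal = q - s
        previous = last_on_diagonal.get(diagonal)
        if previous is not None and word_size <= q - previous <= window:
            triggers.append((q, s))
        last_on_diagonal[diagonal] = q
    return triggers


print(two_hit_triggers([(10, 5), (25, 20), (100, 3)]))  # [(25, 20)]
print(two_hit_triggers([(10, 5), (60, 55)]))            # []: hits are 50 residues apart
```

A nucleotide search, by contrast, would extend every exact word match on its own.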

An alignment that BLAST can’t find

1   GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG
    || | || || || |  || || ||   ||  |  ||| |||||| | | || | ||| |
1   GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG

61  GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT
    | || ||     ||  ||| ||  | |||||| || | ||||||  |||||  |     |
61  GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT

121 GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC
    |||| || ||||| ||  ||    |  | ||||  || |||
121 GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC

An Alignment BLAST Can Make

Solution: compare protein sequences; BLASTX

BLAST 2 Sequences (blastx) output:

Score = 290 bits (741), Expect = 7e-77
Identities = 147/331 (44%), Positives = 206/331 (61%), Gaps = 8/331 (2%)
Frame = +3

Scoring Systems - Nucleotides

    A   G   C   T
A  +1  -3  -3  -3
G  -3  +1  -3  -3
C  -3  -3  +1  -3
T  -3  -3  -3  +1

Identity matrix

CAGGTAGCAAGCTTGCATGTCA
|| ||||||||||||  |||||   raw score = 19 - 9 = 10
CACGTAGCAAGCTTG-GTGTCA
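The raw score above can be reproduced by summing column scores. The identity matrix gives +1 and -3; the slide's "19 - 9" implies that the single gap also costs 3, so that gap value is an assumption read off the arithmetic rather than a stated BLAST default:

```python
def raw_score(top: str, bottom: str, match: int = 1, mismatch: int = -3,
              gap: int = -3) -> int:
    """Sum column scores of a gapped nucleotide alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap       # assumed: 3 per gapped column, as 19 - 9 implies
        elif x == y:
            score += match     # identity: +1
        else:
            score += mismatch  # mismatch: -3
    return score


print(raw_score("CAGGTAGCAAGCTTGCATGTCA", "CACGTAGCAAGCTTG-GTGTCA"))  # 10
```

The 19 identities contribute +19; the two mismatches and the gap contribute -9.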

Scoring Systems - Proteins

Position Independent Matrices

PAM Matrices (Percent Accepted Mutation)
• Derived from observation; small dataset of alignments
• Implicit model of evolution
• All calculated from PAM1
• PAM250 widely used

BLOSUM Matrices (BLOck SUbstitution Matrices)
• Derived from observation; large dataset of highly conserved blocks
• Each matrix derived separately from blocks with a defined percent identity cutoff
• BLOSUM62 - default matrix for BLAST

Position Specific Score Matrices (PSSMs)

PSI- and RPS-BLAST

A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  X

BLOSUM62

Common amino acids have low weights

Rare amino acids have high weights

Negative for less likely substitutions

Positive for more likely substitutions
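The weight pattern follows from how BLOSUM scores are built. In the standard log-odds formulation (not spelled out on the slide), the score for aligning residues a and b is

$$s_{ab} = \frac{1}{\lambda}\,\ln\frac{q_{ab}}{p_a\,p_b}$$

where $q_{ab}$ is the frequency with which a and b are aligned in the conserved blocks, $p_a$ and $p_b$ are the background frequencies, and BLOSUM62 rounds the result to integers in half-bit units ($\lambda = \tfrac{1}{2}\ln 2$). For an identity the denominator is $p_a^2$, so a rare residue such as W tends to receive a large diagonal score (11), while common residues such as A and S receive 4. Pairs seen less often than chance ($q_{ab} < p_a p_b$) come out negative.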

Gapped Alignments

• Gapping provides more biologically realistic alignments
• Statistical behavior is not completely understood for gapped alignments
• Gapped BLAST statistical parameters (λ and K) must be estimated by simulation for each matrix and set of gap costs

• Affine gap costs = $-(a + bk)$, where a = gap open penalty, b = gap extend penalty, and k = gap length
• A gap of length 1 receives the score $-(a + b)$
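Because the open penalty a is paid once per gap, affine costs favor one long gap over several short ones. A small sketch; a = 11 and b = 1 are assumed here because they reproduce the -12 single-residue gap in the BLOSUM62 row of the Scores example:

```python
def affine_gap_score(k: int, a: int = 11, b: int = 1) -> int:
    """Score of one gap of length k under affine costs -(a + b*k)."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return -(a + b * k)


print(affine_gap_score(1))      # -12: a gap of length 1 scores -(a + b)
print(affine_gap_score(3))      # -14: one gap of three residues
print(3 * affine_gap_score(1))  # -36: three separate one-residue gaps
```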

Scores

              V    D    S    -    C    Y
              V    E    T    L    C    F
BLOSUM62     +4   +2   +1  -12   +9   +3   = 7
PAM30        +7   +2    0  -10  +10   +2   = 11

Simply add the scores for each pair of aligned residues

Different matrices produce different scores!
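The two totals can be checked by adding the columns. The per-pair scores are copied from the table above (they are not full matrices). The gap costs (open 11, extend 1 for BLOSUM62; open 9, extend 1 for PAM30) are assumptions, chosen because -(a + b) reproduces the -12 and -10 shown for the single gap:

```python
# Column pairs from the Scores example; "-" is the one-residue gap opposite L.
COLUMNS = [("V", "V"), ("D", "E"), ("S", "T"), ("-", "L"), ("C", "C"), ("Y", "F")]

# Substitution scores copied from the example.
PAIR_SCORES = {
    "BLOSUM62": {("V", "V"): 4, ("D", "E"): 2, ("S", "T"): 1, ("C", "C"): 9, ("Y", "F"): 3},
    "PAM30": {("V", "V"): 7, ("D", "E"): 2, ("S", "T"): 0, ("C", "C"): 10, ("Y", "F"): 2},
}

# Assumed (open a, extend b) gap costs that reproduce the -12 and -10 above.
GAP_COSTS = {"BLOSUM62": (11, 1), "PAM30": (9, 1)}


def alignment_score(matrix: str) -> int:
    """Add pair scores column by column; each gap column here is a length-1 gap."""
    a, b = GAP_COSTS[matrix]
    total = 0
    for x, y in COLUMNS:
        if "-" in (x, y):
            total += -(a + b)
        else:
            total += PAIR_SCORES[matrix][(x, y)]
    return total


print(alignment_score("BLOSUM62"), alignment_score("PAM30"))  # 7 11
```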

Local Alignment Statistics

High scores of local alignments between two random sequences follow the Extreme Value Distribution

(applies to ungapped alignments)

$E = Kmn\,e^{-\lambda S}$        $E = mn\,2^{-S'}$

K = scale for search space, $\lambda$ = scale for scoring system, $S'$ = bit score = $(\lambda S - \ln K)/\ln 2$

Expect Value

E = number of database hits you expect to find by chance

In $E = mn\,2^{-S'}$: E = expected number of random hits, n = size of database (m = query length), S' = your score
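The conversions between raw score, bit score, and E-value follow directly from the formulas above. In this sketch λ = 0.267 and K = 0.041 are assumed values (figures commonly quoted for BLOSUM62 with gap costs 11/1); with them, raw score 741 converts to about 290 bits, consistent with the blastx output shown earlier. m and n are the query and database lengths in residues:

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue(raw: float, lam: float, k: float, m: int, n: int) -> float:
    """E = K m n e^(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """E = m n 2^(-S'); equal to evalue() when bits is the matching bit score."""
    return m * n * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041  # assumed BLOSUM62 (gap 11/1) parameters
print(round(bit_score(741, LAMBDA, K)))  # 290
```

Working in bits removes λ and K from the E-value calculation: every 1-bit increase in score halves E.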

Advanced BLAST Options: Nucleotide

Example Entrez Queries
nucleotide all[Filter] NOT mammalia[Organism]
green plants[Organism]
biomol mrna[Properties]
gbdiv est[Properties] AND rat[organism]

Other Advanced
-e 10000 expect value
-v 2000 descriptions
-b 2000 alignments

Advanced BLAST Options: Protein

Matrix Selection
• PAM30 -- most stringent
• BLOSUM45 -- least stringent

Example Entrez Queries
proteins all[Filter] NOT mammalia[Organism]
green plants[Organism]
srcdb refseq[Properties]

Other Advanced
-e 10000 expect value
-v 2000 descriptions
-b 2000 alignments

Limit by taxon
Mus musculus[Organism]
Mammalia[Organism]
Viridiplantae[Organism]

sp|P27476|NSR1_YEAST NUCLEAR LOCALIZATION SEQUENCE BINDING PROTEIN (P67)
Length = 414

Score = 40.2 bits (92), Expect = 0.013
Identities = 35/131 (26%), Positives = 56/131 (42%), Gaps = 4/131 (3%)

Query: 362 STTSLTSSSTSGSSDKVYAHQMVRTDSREQKLDAFLQPLSKPLS---SQPQAIVTEDKTD 418
           S++S  SSS+S SS +  +     ++S      +  +  S   S   S+ +    E K
Sbjct:  29 SSSSSESSSSSSSSSESESESESESESSSSSSSSDSESSSSSSSDSESEAETKKEESKDS 88

Filtered / Unfiltered

Low Complexity Filtering

>gi|20140146|sp|Q96RF0|SNXI_HUMAN Sorting nexin 18
Length = 628

Score = 1048 bits (2710), Expect = 0.0
Identities = 528/628 (84%), Positives = 528/628 (84%)

Query: 1  MALRARALYDFRSENPGEISLREHEVLSLCSEQDIEGWLEGVNSRGDRGLFPASYVQVIR 60
          MALRARALYDFRSENPGEISLREHEVLSLCSEQDIEGWLEGVNSRGDRGLFPASYVQVIR
Sbjct: 1  MALRARALYDFRSENPGEISLREHEVLSLCSEQDIEGWLEGVNSRGDRGLFPASYVQVIR 60

Query: 61 XXXXXXXXXXXXXXXXXXXNVPPGGFEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXSTFQ 120
                             NVPPGGFE                             STFQ
Sbjct: 61 APEPGPAGDGGPGAPARYANVPPGGFEPLPVAPPASFKPPPDAFQALLQPQQAPPPSTFQ 120

Low Complexity Filter

low complexity sequence
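BLAST's own filters are SEG for proteins and DUST for nucleotides; neither is reproduced here. As a simplified stand-in that shows the idea, this sketch masks every window whose Shannon entropy falls below a cutoff. The window size, cutoff, and function names are illustrative assumptions:

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Mask every residue that falls inside a window with entropy < min_entropy."""
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < min_entropy:
            masked[i:i + window] = mask_char * window
    return "".join(masked)


print(mask_low_complexity("ACDEFGHIKLMN" + "S" * 12))  # ACDEFGH followed by 17 X
```

Because whole windows are masked, a few residues next to the serine run (IKLMN) are masked too; that edge effect is a limitation of this simple sketch.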

Neighbors: Precomputed BLAST

Nucleotide

Protein

Entrez Related Sequences produces a list of sequences sorted by BLAST score, but with no alignment details.

BLink – Protein BLAST Alignments

• Lists only 200 hits
• List is nonredundant

BLink – Best Hits

Megablast: NCBI’s Genome Annotator

• Long alignments of similar DNA sequences

• Greedy algorithm

• Concatenation of query sequences

• Faster than blastn; less sensitive

MegaBLAST

AI217550 AI251192 AI254381 BE645079

C:\seq\hs.4.fsa

> 1133045 gnl|UG|Hs#S1133045 qd43b11.x1 Homo sapiens cDNA, 3' end
CATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGTTTGGTGAGAAGTGCTCGATTAGTTCAGACAACATCTGGCACTTGATGTCTGTCCTTCCCTCCTTTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAAGGTGACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACACCGTCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAAAACCACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGC
> 1141828 gnl|UG|Hs#S1141828 qv37f11.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGTGCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATACATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAAGTCGTATCGATGT
> 1145899 gnl|UG|Hs#S1145899 qv33c06.x1 Homo sapiens cDNA, 3' end
GAGAAGACGACAGAAGGGGAGAAGAGAGTAGGAAAAAGGAGGGAAGGACAGACATCAAGTGCCAGATGTTGTCTGAACTAATCGAGCACTTCTCACCAAACTTCATGTATAAATAAAATACATATTTTTAAAACAAACCAATAAATGGCTTACATCAAAAAAAAAAAAAAAAAAAAAAAAGTCGTATCGATGT
> 2291670 gnl|UG|Hs#S2291670 7e65f04.x1 Homo sapiens cDNA, 3' end
TTTCATGTAAGCCATTTATTGGTTTGTTTTAAAAATATGTATTTTATTTATACATGAAGTTTGGTGAGAAGTGCTCGATTAGTTCAAACAACATCTGGCACTTGATGTCTGTCCTTCCCTCCTTTTTCCTACTCTCTTCTCCCCTCCTGCTGGTCATTGTGCAGTTCTGGAAATTAAAAAGGTGACAGCCAGGCTAAAAGCTAAGGGTTGGGTCTAGCTCACCTCCCACCCCCAACCACACCGTCTGCAGCCAGCCCCAGGCACCTGTCTCAAAGCTCCCGGGCTGTCCACACACACAAAAACCACAGTCTCCTTCCGGCCAGCTGGGCTGGCAGCCCGACCTGCCTCCCAACCGCATTCCTGCCTGTGTAGCAGGCGGTGAGCACCCAGAAGGGGCACATACCTCTCCAAGCCTTGAAAGCAAAGCATGGAGATCTACAAAAATAGGATTTCCACTTGGAGAAATGTCGCTGGGACAGT

Discontiguous Megablast

• Uses discontiguous word matches

• Better for cross-species comparisons

Templates for Discontiguous MegaBLAST

W = word size (number of 1s in the template); t = template length

W = 11, t = 16, coding: 1101101101101101
W = 11, t = 16, non-coding: 1110010110110111
W = 12, t = 16, coding: 1111101101101101
W = 12, t = 16, non-coding: 1110110110110111
W = 11, t = 18, coding: 101101100101101101
W = 11, t = 18, non-coding: 111010010110010111
W = 12, t = 18, coding: 101101101101101101
W = 12, t = 18, non-coding: 111010110010110111
W = 11, t = 21, coding: 100101100101100101101
W = 11, t = 21, non-coding: 111010010100010010111
W = 12, t = 21, coding: 100101101101100101101
W = 12, t = 21, non-coding: 111010010110010010111

Ma, B., Tromp, J., Li, M., "PatternHunter: faster and more sensitive homology search", Bioinformatics 2002 Mar;18(3):440-5
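A template reads a window of t bases and keeps only the positions marked 1. The sketch below (hypothetical helper names) applies the W = 11, t = 16 coding template to two made-up sequences that differ only at every third position. They share a spaced word but no contiguous 11-mer, which illustrates why coding templates help with cross-species comparisons, where third codon positions often differ:

```python
def spaced_word(window: str, template: str) -> str:
    """Keep only the bases at template positions marked '1'."""
    if len(window) != len(template):
        raise ValueError("window and template must have the same length")
    return "".join(base for base, keep in zip(window, template) if keep == "1")


def spaced_words(seq: str, template: str) -> set[str]:
    """All spaced words obtained by sliding the template along seq."""
    t = len(template)
    return {spaced_word(seq[i:i + t], template) for i in range(len(seq) - t + 1)}


def contiguous_words(seq: str, w: int) -> set[str]:
    """All ordinary (contiguous) words of length w."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


CODING_W11_T16 = "1101101101101101"  # W = 11, t = 16, coding template
s1 = "ACGTACGTACGTACGT"   # made-up sequence
s2 = "ACATATGTGCGCACAT"   # differs from s1 only where the template has 0
print(spaced_words(s1, CODING_W11_T16) & spaced_words(s2, CODING_W11_T16))  # {'ACTAGTCGACT'}
print(contiguous_words(s1, 11) & contiguous_words(s2, 11))                  # set()
```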

Nucleotide vs. Protein BLAST

aaccgggtgacggtggtgctcggtgcgcagtggggcgacgaaggc

Human: N R V T V V L G A Q W G D E G
       + + V +   V L G   Q W G D E G
A.th.: S Q V S G V L G C Q W G D E G

agtcaagtatctggtgtactcggttgccaatggggagatgaaggt

Comparing ADSS from H. sapiens and A. thaliana

BLASTp finds three matching words

BLASTn finds no match, because the two DNA sequences share no 7-bp word

Protein searches are generally more sensitive than nucleotide searches.
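The claim about 7-bp words can be checked by listing the words the two DNA strings above have in common, for increasing word lengths. A sketch with hypothetical helper names:

```python
def shared_words(a: str, b: str, w: int) -> set[str]:
    """Exact words of length w present in both sequences."""
    words_a = {a[i:i + w] for i in range(len(a) - w + 1)}
    words_b = {b[i:i + w] for i in range(len(b) - w + 1)}
    return words_a & words_b


def longest_shared_word(a: str, b: str) -> int:
    """Length of the longest exact word the two sequences have in common."""
    w = 0
    while shared_words(a, b, w + 1):
        w += 1
    return w


human = "aaccgggtgacggtggtgctcggtgcgcagtggggcgacgaaggc"
a_thaliana = "agtcaagtatctggtgtactcggttgccaatggggagatgaaggt"
print(longest_shared_word(human, a_thaliana))  # 6
print(shared_words(human, a_thaliana, 6))      # {'ctcggt'}
```

The longest shared word is 6 bp (ctcggt), one short of the 7-bp minimum word size, so a nucleotide search has no seed to extend even though the encoded peptides are clearly related.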

Translated BLAST

Program    Query                      Database
blastx     nucleotide (translated)    protein
tblastn    protein                    nucleotide (translated)
tblastx    nucleotide (translated)    nucleotide (translated)

Particularly useful for nucleotide sequences without protein annotations, such as ESTs or genomic DNA.
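Translated searches rest on six-frame translation: three reading frames on each strand. A sketch using the standard genetic code only (other codes are not handled; incomplete or unknown codons become X). Frame labels use the +1/-1 style of the "Frame = +3" line in BLAST output; the function names are hypothetical:

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand and -1..-3 on the reverse complement."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


print(translate("aaccgggtgacggtggtgctcggtgcgcagtggggcgacgaaggc"))  # NRVTVVLGAQWGDEG
print(six_frames("ATGGCC"))
```

The first print reproduces the human ADSS peptide from the Nucleotide vs. Protein BLAST example.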

Genomic BLAST

• These pages provide customized nucleotide and protein databases for each genome
• If a Map Viewer is available, the BLAST hits can be viewed on the maps

BLAST the Chicken Genome

Program

Accession for human TPO mRNA

BLAST Hit on the Genome

BLASTn Hit on the Map Viewer

TBLASTN Results Using NP_000538

Linking Protein Sequence, Structure, and Function

[Figure: BLASTp, PSI-BLAST, and RPS-BLAST linking sequence, structure, and function: sequence → structure; structure → structure; sequence → function (Pfam, SMART); sequence → structure + function (CD)]

Position Specific Substitution Rates

Active site serine

Weakly conserved serine

Position Specific Score Matrix (PSSM)

       A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
206 D  0 -2  0  2 -4  2  4 -4 -3 -5 -4  0 -2 -6  1  0 -1 -6 -4 -1
207 G -2 -1  0 -2 -4 -3 -3  6 -4 -5 -5  0 -2 -3 -2 -2 -1  0 -6 -5
208 V -1  1 -3 -3 -5 -1 -2  6 -1 -4 -5  1 -5 -6 -4  0 -2 -6 -4 -2
209 I -3  3 -3 -4 -6  0 -1 -4 -1  2 -4  6 -2 -5 -5 -3  0 -1 -4  0
210 S -2 -5  0  8 -5 -3 -2 -1 -4 -7 -6 -4 -6 -7 -5  1 -3 -7 -5 -6
211 S  4 -4 -4 -4 -4 -1 -4 -2 -3 -3 -5 -4 -4 -5 -1  4  3 -6 -5 -3
212 C -4 -7 -6 -7 12 -7 -7 -5 -6 -5 -5 -7 -5  0 -7 -4 -4 -5  0 -4
213 N -2  0  2 -1 -6  7  0 -2  0 -6 -4  2  0 -2 -5 -1 -3 -3 -4 -3
214 G -2 -3 -3 -4 -4 -4 -5  7 -4 -7 -7 -5 -4 -4 -6 -3 -5 -6 -6 -6
215 D -5 -5 -2  9 -7 -4 -1 -5 -5 -7 -7 -4 -7 -7 -5 -4 -4 -8 -7 -7
216 S -2 -4 -2 -4 -4 -3 -3 -3 -4 -6 -6 -3 -5 -6 -4  7 -2 -6 -5 -5
217 G -3 -6 -4 -5 -6 -5 -6  8 -6 -8 -7 -5 -6 -7 -6 -4 -5 -6 -7 -7
218 G -3 -6 -4 -5 -6 -5 -6  8 -6 -7 -7 -5 -6 -7 -6 -2 -4 -6 -7 -7
219 P -2 -6 -6 -5 -6 -5 -5 -6 -6 -6 -7 -4 -6 -7  9 -4 -4 -7 -7 -6
220 L -4 -6 -7 -7 -5 -5 -6 -7  0 -1  6 -6  1  0 -6 -6 -5 -5 -4  0
221 N -1 -6  0 -6 -4 -4 -6 -6 -1  3  0 -5  4 -3 -6 -2 -1 -6 -1  6
222 C  0 -4 -5 -5 10 -2 -5 -5  1 -1 -1 -5  0 -1 -4 -1  0 -5  0  0
223 Q  0  1  4  2 -5  2  0  0  0 -4 -2  1  0  0  0 -1 -1 -3 -3 -4
224 A -1 -1  1  3 -4 -1  1  4 -3 -4 -3 -1 -2 -2 -3  0 -2 -2 -2 -3

Serine is scored differently in these two positions

Active site nucleophile
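A PSSM column can be sketched as log-odds scores computed from the residues observed at one alignment position. This simplified version assumes a flat background frequency (0.05 for each of the 20 amino acids) and one pseudocount per residue type; PSI-BLAST uses real background frequencies and a more elaborate pseudocount scheme, so these numbers will not match the table above. The example columns are made up:

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_column(residues: str, pseudocount: float = 1.0,
                background: float = 0.05) -> dict[str, float]:
    """Log-odds scores (in bits) for one alignment column.

    Assumes a flat background frequency and a flat pseudocount for every
    amino acid; the numbers are only illustrative.
    """
    counts = Counter(r for r in residues.upper() if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS}


conserved = pssm_column("SSSSSSSSSS")  # serine in every sequence
variable = pssm_column("SSATSGSANT")   # serine in only 4 of 10 sequences
print(round(conserved["S"], 2), round(variable["S"], 2))
```

The invariant column scores serine higher than the column where serine is one residue among several, the same contrast as between the active-site and weakly conserved serines above.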

PSI-BLAST

Create your own PSSM: confirming relationships of purine nucleotide metabolism proteins

[Diagram: the query is searched with BLOSUM62 to give an alignment; the alignment yields a PSSM, which is searched to give a new alignment]

PSI-BLAST

>gi|113340|sp|P03958|ADA_MOUSE ADENOSINE DEAMINASE (ADENOSINE AMINOH
MAQTPAFNKPKVELHVHLDGAIKPETILYFGKKRGIALPADTVEELRNIIGMDKPLSLPGFLAKFDYYVIAGCREAIKRIAYEFVEMKAKEGVVYVEVRYSPHLLANSKVDPMPWNQTEGDVTPDDVVDLVNQGLQEQAFGIKVRSILCCMRHQPSWSLEVLELCKKYNQKTVVAMDLAGDETIEGSSLFPGHVEAYEGAVKNGRTVHAGEVGSPEVVREAVDILKTERVGHGYHTIEDEALYNRLLKENMHFEVCPWSSYLTGAWDPKTTHVRFKNDKANYSLNTDDPLIFKSTLDTDYQMTKKDMGFTEEEFKRLNINAAKSSFLPEEEKKELLERLY

e value cutoff for PSSM

PSI Results: Initial BLAST Run

First PSSM Search

Other purine nucleotide metabolizing enzymes not found by ordinary BLAST

Third PSSM Search: Convergence

Just below threshold, another nucleotide metabolism enzyme

Entrez Domains (CDD)

Single Domains
• Pfam (pfam01234, Sanger): Pfam-A seeds; HMM-based models representing a wide variety of functional domains derived from SWISS-PROT
• SMART (smart00123): HMM-based models originally concentrating on eukaryotic signaling domains, now expanding
• NCBI curated (cd01234): domains based on sequence and structural alignments

Protein Families
• COG (COG0123): BLAST-based alignments derived from complete proteomes of prokaryotes

Protein Links: Domains

Results of a CD-Search

Click on a colored bar to align your sequence to the CD

CDD Record – heme peroxidases

aligned query

red = high conservation

blue = low conservation

Curated CD Record

Curated CDs (cd12345) are based on sequence and structure alignments

Annotated features

Structural evidence

aligned query

BLink: Sequence to Structure

related structures

Related Structures

Cn3D

Entrez Structure

• Derived from experimentally determined PDB records
• Add value to PDB records by:
– Adding explicit chemical bonding information
– Validating and indexing the sequences
– Annotating 3D domains and secondary structure
– Adding links to CDD, Taxonomy, PubMed
– Converting PDB data to ASN.1
• Structure neighbors determined by Vector Alignment Search Tool (VAST)

MMDB: Molecular Modeling Database

Structure

Structure Summary Page

Conserved Domains

VAST Neighbors for chain C (domain 0)

VAST Neighbors for domain 2

VAST: Structure Neighbors
Vector Alignment Search Tool

For each 3D domain, locate SSEs (secondary structure elements) and represent them as individual vectors.

Human IL-4

VAST uses 3D Domains only!
Whole polypeptides are assigned 3D domain 0 (zero).
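The vector representation described above can be illustrated by reducing each SSE to a vector and comparing vectors by angle. In this sketch each SSE is approximated by the vector between its first and last C-alpha coordinates, which is a simplification; the function names and coordinates are hypothetical, and VAST's actual scoring of vector pairs is not reproduced:

```python
import math

Vec = tuple[float, float, float]


def sse_vector(start: Vec, end: Vec) -> Vec:
    """Approximate an SSE by the vector from its first to its last C-alpha."""
    return (end[0] - start[0], end[1] - start[1], end[2] - start[2])


def angle_degrees(u: Vec, v: Vec) -> float:
    """Angle between two SSE vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.hypot(*u) * math.hypot(*v)
    cos_theta = max(-1.0, min(1.0, dot / norms))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates (Angstroms) for a helix and a strand.
helix = sse_vector((0.0, 0.0, 0.0), (15.0, 0.0, 0.0))
strand = sse_vector((2.0, 3.0, 0.0), (2.0, 3.0, 10.0))
print(angle_degrees(helix, strand))  # 90.0
```

Comparing such angles (and distances) between pairs of SSEs lets two domains be compared without first aligning their sequences.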

VAST Neighbors

3D domains!

Viewing a VAST Alignment

RMSD in Angstroms

Sequence percent identity

VAST P value
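RMSD, reported in Angstroms, is the root-mean-square distance between matched atoms after superposition. A sketch that assumes the two coordinate lists are already superimposed and paired atom for atom (the superposition step itself is not shown); the coordinates are hypothetical:

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical C-alpha positions (Angstroms) for three aligned residues.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(a, b))  # 1.0
```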

Submitting a PDB File to VAST

• Redesigned interface!
• This is the best way to convert PDB into MMDB format!

Entrez PubChem

PC Substance: Primary database of chemical samples

PC Compound: Derived database of known chemicals from PC Substance records

PC BioAssay: Primary database of bioactivity screens of samples in PC Substance

Links from Structure

N-acetylglucosamine

mannose

fucose

Search for thyroxine

ChemID    24
KEGG       4
DTP-NCI    3
NIST       3
Biocyc     2
BIND       2
Chembank   2
NIAID      1
TOTAL     41

Sequence Polymorphisms

SNP (general polymorphisms)
• Primary database of submitted SNPs
• Curated database of reference SNPs
• Contains more than just SNPs:
– True SNPs
– MNP (multiple nucleotide)
– Insertions
– Deletions
– Microsatellites
– Mixed
– No variation (constant)

OMIM (human phenotypes)
• Clinical literature database
• Curated at Johns Hopkins Univ.
• Links human genes and genetic disorders to human disease
• Lists allelic variants that have clinical consequences

Variations in SNP are not necessarily in OMIM, and vice versa!

Linking to SNP

Links to SNP are also available from Nucleotide and Protein

Entrez Gene - TPO

Entrez SNP

primary data: ss#

SNP UID: rs#

Find Non-synonymous SNPs

#7 AND coding nonsynon[Function Class]

Function Class
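The non-synonymous function class covers coding changes that alter the encoded amino acid. A sketch of that classification for a single-base change within one codon, using the standard genetic code; the function name and the two-way labeling are illustrative assumptions, not dbSNP's implementation:

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def coding_snp_class(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at position 0, 1 or 2 of a codon."""
    codon = codon.upper()
    mutant = codon[:position] + alt_base.upper() + codon[position + 1:]
    if CODON[codon] == CODON[mutant]:
        return "synonymous"
    return "non-synonymous"  # includes changes to a stop codon


print(coding_snp_class("GCC", 2, "T"))  # synonymous (Ala -> Ala)
print(coding_snp_class("GCC", 0, "A"))  # non-synonymous (Ala -> Thr)
```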

Non-synonymous TPO SNPs

Link to Map Viewer

View all SNPs in locus

Link to related 3D structures

GeneView in dbSNP

Links to OMIM

Links to SNP are also available from Nucleotide and Protein

Entrez Gene - TPO

OMIM Record

Explore a Disease SNP

Curated CD Record

For More Information…

E-mail addresses
• General Help: info@ncbi.nlm.nih.gov
• BLAST: blast-help@ncbi.nlm.nih.gov

The (free!) NCBI Newsletter: http://www.ncbi.nih.gov/About/newsletter.html

The NCBI Handbook: follow the link from the NCBI Home Page

The NCBI Education Page: http://www.ncbi.nih.gov/Education/index.html
