![Page 1: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/1.jpg)
Sequence databases and retrieval systems
Guy Perrière
[ replaced by Manolo Gouy ]
Pôle Bio-Informatique Lyonnais
Laboratoire de Biométrie et Biologie Évolutive
UMR CNRS n° 5558
Université Claude Bernard – Lyon 1
![Page 2: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/2.jpg)
In the beginning
First paper compilation in 1965 (Atlas of Protein Sequences).
Development of real databanks at the beginning of the 1980s:
– Fast access.
– Makes possible analyses that require a lot of data:
  – Codon usage.
  – Molecular phylogeny.
![Page 3: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/3.jpg)
General databanks
Nucleotide sequences: EMBL/GenBank/DDBJ.
Protein sequences:
– Simple translations of coding regions:
  – GenPept (from GenBank).
  – TrEMBL (from EMBL).
– Systems containing additional data:
  – SWISS-PROT.
  – PIR.
![Page 4: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/4.jpg)
EMBL
Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
Maintained since 1994 at the European Bioinformatics Institute (EBI) near Cambridge.
Web server: http://www.ebi.ac.uk/embl
![Page 5: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/5.jpg)
GenBank
Set up in 1979 at the Los Alamos National Laboratory in New Mexico, US.
Maintained since 1992 at the National Center for Biotechnology Information (NCBI) in Bethesda.
Web server: http://www.ncbi.nlm.nih.gov/Genbank/index.html
![Page 6: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/6.jpg)
DDBJ
Active since 1984 at the National Institute of Genetics (NIG) in Mishima, Japan.
Web server: http://www.ddbj.nig.ac.jp
![Page 7: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/7.jpg)
EMBL / GenBank / DDBJ
The International Nucleotide Sequence Database Collaboration: EMBL / GenBank / DDBJ.
New sequences are exchanged daily between the three centers, so the three banks have identical content.
Data are mainly provided by direct submissions from the authors through the Internet:
– Web forms.
– Email.
![Page 8: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/8.jpg)
Data growth
[Figure: growth of GenBank, EMBL, PIR and SWISS-PROT from 03/83 to 03/03; x-axis: release date, y-axis: log(number of residues), from 5 to 11.]
![Page 9: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/9.jpg)
GenBank/EMBL size (April 2003)
– 31 × 10^9 nucleotides.
– 24 × 10^6 sequences.
– 1.8 million genes (proteins and RNA).
– 313,000 bibliographic references.
– 100 gigabytes on disk.
– Growth of 63 % in 12 months.
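The 63 % yearly growth can be turned into a doubling time. The sketch below assumes that growth is exponential at a constant rate, which the slide does not state; the function name is illustrative.

```python
import math

def doubling_time_months(annual_growth: float) -> float:
    """Months needed to double in size, assuming constant exponential growth.

    annual_growth is the fractional increase over 12 months (0.63 for 63 %).
    """
    return 12.0 * math.log(2.0) / math.log(1.0 + annual_growth)

# Growth figure quoted for GenBank/EMBL in April 2003.
months = doubling_time_months(0.63)
print(f"Doubling time: about {months:.0f} months")
```

Under that assumption the databank doubles roughly every 17 months.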
![Page 10: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/10.jpg)
Taxonomic sampling (April 2003)
There are 135,560 species for which at least one sequence is available.
Nine species (0.007 %) correspond to 62 % of the total.
77,900 species are represented by only one sequence!
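| Species | Share of total |
| --- | --- |
| Homo sapiens | 27.3 % |
| Mus musculus | 20.1 % |
| Zea mays | 3.0 % |
| Rattus norvegicus | 2.9 % |
| Brassica oleracea | 2.3 % |
| Arabidopsis thaliana | 2.0 % |
| Danio rerio | 2.0 % |
| Drosophila melanogaster | 1.4 % |
| Oryza sativa | 0.9 % |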
The nine most represented species in GenBank/EMBL
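The two percentages quoted on this slide can be checked with a few lines of arithmetic. This is only a sanity check on the numbers given above; the variable names are illustrative.

```python
# Shares (in %) of the nine most represented species, as listed on the slide.
shares = {
    "Homo sapiens": 27.3,
    "Mus musculus": 20.1,
    "Zea mays": 3.0,
    "Rattus norvegicus": 2.9,
    "Brassica oleracea": 2.3,
    "Arabidopsis thaliana": 2.0,
    "Danio rerio": 2.0,
    "Drosophila melanogaster": 1.4,
    "Oryza sativa": 0.9,
}
total_species = 135_560

cumulated_share = sum(shares.values())                    # close to 62 %
species_fraction = 100.0 * len(shares) / total_species    # close to 0.007 %

print(f"{len(shares)} species = {species_fraction:.3f} % of all species")
print(f"They hold {cumulated_share:.1f} % of the total")
```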
![Page 11: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/11.jpg)
Distribution format
The banks are distributed as a set of text files called divisions (292 for EMBL).
A division contains sequences related to:
– A taxon (e.g., bacteria, invertebrates, mammals).
– A class of sequences (EST, HTG, GSS).
Within a division, each sequence is called an entry.
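Reading a division means cutting the text file into entries. A minimal sketch, assuming each entry ends with a `//` line (the terminator visible in the SQ example of these slides); the two toy entries and the function name are made up for illustration.

```python
from typing import Iterable, Iterator, List

def split_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of lines per entry; an entry ends at a '//' line."""
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == "//":
            if current:
                yield current
            current = []
        else:
            current.append(line)
    if current:  # tolerate a missing final terminator
        yield current

# Toy division with two hypothetical entries.
division = [
    "ID   ENTRY1     standard; DNA; PRO; 8 BP.",
    "SQ   Sequence 8 BP;",
    "     acgtacgt                                   8",
    "//",
    "ID   ENTRY2     standard; DNA; PRO; 4 BP.",
    "SQ   Sequence 4 BP;",
    "     ttga                                       4",
    "//",
]
entries = list(split_entries(division))
print(len(entries), "entries")
```

The same function works on an open file object, since it only iterates over lines.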
![Page 12: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/12.jpg)
Entry structure
Information is introduced in structured fields.
The format differs in its form between EMBL and GenBank/DDBJ …
but not in substance.
![Page 13: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/13.jpg)
ID, AC, SV and DT fields
Contain identifiers and the creation and the last modification dates for the entries.
    ID   BSAMYL     standard; DNA; PRO; 2680 BP.
    XX
    AC   V00101; J01547
    XX
    SV   V00101.1
    XX
    DT   13-JUL-1983 (Rel. 03, Created)
    DT   12-NOV-1996 (Rel. 49, Last updated, Version 11)
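Because every line starts with a two-letter code, the fields are easy to collect by program. The sketch below is a simplified reader, not a full EMBL parser: it assumes the code occupies the first two characters of each line and that `XX` lines are only spacers.

```python
from collections import defaultdict
from typing import Dict, List

def parse_fields(entry_text: str) -> Dict[str, List[str]]:
    """Group the content of each line under its two-letter line code."""
    fields: Dict[str, List[str]] = defaultdict(list)
    for line in entry_text.splitlines():
        code, content = line[:2], line[2:].strip()
        if code in ("XX", "") or not code.strip():
            continue  # spacer or blank line
        fields[code].append(content)
    return dict(fields)

entry = """\
ID   BSAMYL     standard; DNA; PRO; 2680 BP.
XX
AC   V00101; J01547
XX
SV   V00101.1
XX
DT   13-JUL-1983 (Rel. 03, Created)
DT   12-NOV-1996 (Rel. 49, Last updated, Version 11)
"""

fields = parse_fields(entry)
name = fields["ID"][0].split()[0]
length = int(fields["ID"][0].split(";")[-1].split()[0])
accessions = [a.strip() for a in " ".join(fields["AC"]).split(";") if a.strip()]
print(name, length, accessions)
```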
![Page 14: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/14.jpg)
DE, KW, OS and OC fields
Definition, Keywords, Taxonomy.
    DE   Bacillus subtilis amylase gene.
    XX
    KW   amyE gene; amylase; amylase-alpha;
    KW   regulatory region; signal peptide.
    XX
    OS   Bacillus subtilis
    OC   Bacteria; Firmicutes; Bacillus/Clostridium group;
    OC   Bacillus/Staphylococcus group; Bacillus.
The NCBI maintains a unified taxonomy, largely based on sequence information.
![Page 15: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/15.jpg)
RN, RX, RA and RT fields
contain bibliographic information.
    RN   [1]
    RP   1-2680
    RX   MEDLINE; 83143299.
    RA   Yang M., Galizzi, A., Henner, D.J.;
    RT   "Nucleotide sequence of the amylase gene from
    RT   Bacillus subtilis";
    RL   Nucleic Acids Res. 11:237-249(1983).
    …
![Page 16: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/16.jpg)
FT field
contains the descriptions of functional regions.

    FT   Key             Location and qualifiers
    FT   promoter        369..374
    FT                   /note="put. promoter sequence P2 [3] (amyR1)"
    FT   RBS             414..419
    FT                   /note="rRNA-binding site rbs-1 [3]"
    FT   CDS             498..2480
    FT                   /gene="amyE"
    FT                   /db_xref="SWISS-PROT:P00691"
    FT                   /product="alpha-amylase precursor"
    FT                   /EC_number="3.2.1.1"
    FT                   /protein_id="CAA23437.1"
    FT                   /translation="MFAKRFKTSLLPLFAGFLLLFHLVLAGPAA
    FT                   ASAETANKSNELTAPSIKSGTILHAWNWSFNTLKHNMKDIHDAG...
![Page 17: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/17.jpg)
Intron/exon structure
    FT   CDS             join(242..610,3397..3542,5100..5351)
    FT                   /codon_start=1
    FT                   /db_xref="SWISS-PROT:P01308"
    FT                   /note="precursor"
    FT                   /gene="INS"
    FT                   /product="insulin"
    ...

[Diagram: the parent Sequence and the CDS Subsequence obtained by joining its three segments.]
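A `join(...)` location can be turned into a subsequence by concatenating the listed segments. The sketch below handles only the simple case shown here (forward strand, plain `start..end` ranges, 1-based inclusive coordinates); operators such as `complement(...)` or partial locations are deliberately left out, and the function names are illustrative.

```python
import re
from typing import List, Tuple

def parse_join(location: str) -> List[Tuple[int, int]]:
    """Return the (start, end) pairs of a location such as join(a..b,c..d)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def extract(sequence: str, segments: List[Tuple[int, int]]) -> str:
    """Concatenate segments; coordinates are 1-based and inclusive."""
    return "".join(sequence[start - 1:end] for start, end in segments)

segments = parse_join("join(242..610,3397..3542,5100..5351)")
lengths = [end - start + 1 for start, end in segments]
print(segments, lengths, sum(lengths))

# Toy parent sequence (made up) to show the extraction itself.
print(extract("aaCCCggTTTa", parse_join("join(3..5,8..10)")))
```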
![Page 18: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/18.jpg)
SQ field
Contains the sequence itself.

    SQ   Sequence 2680 BP; 825 A; 520 C; 642 G; 693 T; 0 other;
         gctcatgccg agaatagaca ccaaagaaga actgtaaaaa cgggtgaagc agcagcgaat        60
         agaatcaatt gcttgcgcct ttgcggtagt ggtgcttacg atgtacgaca gggggattcc       120
         ccatacattc ttcgcttggc tgaaaatgat tcttcttttt atcgtctgcg gcggcgttct       180
         gtttctgctt cggtatgtga ttgtgaagct ggcttacaga agagcggtaa aagaagaaat       240
         (...)
         gatggtttct tttttgttca taaatcagac aaaacttttc tcttgcaaaa gtttgtgaag      2580
         tgttgcacaa tataaatgtg aaatacttca caaacaaaaa gacatcaaag agaaacatac      2640
         cctgcaagga tgctgatatt gtctgcattt gcgccggagc                            2680
    //
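The SQ header carries the length and base composition, so an entry can be checked for internal consistency. A minimal sketch, assuming the header layout shown above and sequence lines made of lowercase blocks followed by a running position; the toy sequence lines are made up.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

def parse_sq_header(header: str) -> Tuple[int, Dict[str, int]]:
    """Extract the declared length and base counts from an SQ header line."""
    length = int(re.search(r"Sequence\s+(\d+)\s+BP", header).group(1))
    counts = {base: int(n)
              for n, base in re.findall(r"(\d+)\s+([ACGT]|other)\b", header)}
    return length, counts

def read_sequence(lines: Iterable[str]) -> str:
    """Join sequence lines, dropping blanks and the position numbers."""
    return "".join(ch for line in lines for ch in line if ch.isalpha())

# Header from the slide: the declared counts should add up to the length.
length, counts = parse_sq_header(
    "SQ   Sequence 2680 BP; 825 A; 520 C; 642 G; 693 T; 0 other;")
print(length, sum(counts.values()))

# Toy sequence lines to show the composition check on real letters.
toy = read_sequence(["     acgtacgtaa ccggtt        16"])
print(len(toy), Counter(toy.upper()))
```

For the header above, 825 + 520 + 642 + 693 + 0 gives the declared 2680 BP.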
![Page 19: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/19.jpg)
Errors in databanks
There are a lot of errors in the nucleotide sequence databanks:
– In annotations:
  – Inaccuracies, omissions, and even mistakes.
  – Inconsistencies between entries.
– In the sequences themselves:
  – Sequencing errors.
  – Cloning vectors inserted.
![Page 20: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/20.jpg)
Redundancy
Another major problem is redundancy.
A lot of entries are partially or entirely duplicated:
– 20% of vertebrate sequences in GenBank.
Duplicated entries often differ in their sequence.
[Figure: partial and complete sequence duplications.]
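The simplest form of redundancy, an entry identical to or fully contained in another, can be detected by plain string comparison. This sketch covers only that exact case; since duplicated entries often differ in sequence, real redundancy detection needs similarity searches. The entry names and sequences are made up.

```python
from typing import Dict, List, Tuple

def contained_pairs(seqs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (shorter, longer) pairs where one sequence occurs in another.

    Identical sequences are reported once, in alphabetical order of names.
    """
    pairs = []
    names = sorted(seqs)
    for a in names:
        for b in names:
            if a == b or seqs[a] not in seqs[b]:
                continue
            if seqs[a] == seqs[b] and a > b:
                continue  # identical pair already reported as (b, a)
            pairs.append((a, b))
    return pairs

toy_bank = {
    "X1": "acgtacgtag",
    "X2": "gtacg",        # partial duplicate of X1 and X3
    "X3": "acgtacgtag",   # complete duplicate of X1
    "X4": "ttttt",
}
print(contained_pairs(toy_bank))
```

The all-against-all loop is quadratic, which is fine for a toy set but not for a whole division.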
![Page 21: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/21.jpg)
Protein sequence databases
Translation of Coding DNA Sequences (CDS) from EMBL/GenBank/DDBJ.
Consultation of publications or patents.
Very small number of direct protein sequence submissions by authors.
In SWISS-PROT and PIR: additional annotations.
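Translating a CDS is a table lookup on successive codons. A minimal sketch, assuming the standard genetic code and a complete forward-strand CDS starting in frame; the toy CDS was chosen here to encode the first four residues (MFAK) of the /translation example in the feature table, and is not taken from the entry.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds: str) -> str:
    """Translate a CDS codon by codon; unknown codons give 'X', stops give '*'."""
    cds = cds.upper().replace("U", "T")
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

print(translate("atgtttgctaaataa"))
```

Some genomes (mitochondria, some bacteria) use other genetic codes, so a real pipeline would pick the table from the entry annotation.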
![Page 22: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/22.jpg)
SWISS-PROT
Created by Amos Bairoch in 1986 at the Department of Medical Biochemistry in Geneva.
Maintained by the Swiss Institute of Bioinformatics (SIB) and funded by GeneBio, and, very recently, by NIH.
Web server: http://www.expasy.ch/sprot/sprot-top.html
![Page 23: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/23.jpg)
SWISS-PROT characteristics
Almost no redundancy.
Cross-references with 60 other databanks.
High-quality annotations:
– Systematic control by a team of annotators.
– Help from a set of > 200 volunteer experts.
Embedded in Expasy, a WWW proteomics server (http://www.expasy.org).
![Page 24: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/24.jpg)
Annotations
– Protein function.
– Post-translational modifications.
– Structural or functional domains.
– Secondary and quaternary structures.
– Similarities with other proteins.
– Conflicts between positions for CDS.
– Disease-related mutations.
![Page 25: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/25.jpg)
Associated databanks
TrEMBL, built using only annotated CDS from the EMBL data library.
ENZYME, for the international enzyme nomenclature.
PROSITE, for biologically significant sites, patterns and profiles.
SWISS-2DPAGE, for two-dimensional polyacrylamide gel electrophoresis maps.
![Page 26: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/26.jpg)
PIR
PIR (The Protein Information Resource) was created by Margaret Dayhoff in 1965.
Aims:
– To provide exhaustive and non-redundant protein sequence data.
– To give a classification using taxonomic and similarity data: entries grouped in superfamilies, families and subfamilies.
![Page 27: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/27.jpg)
Data maintenance
Three organizations collect and organize the data entered in PIR:
– The National Biomedical Research Foundation (NBRF) in the United States.
– The Martinsried Institute for Protein Sequence (MIPS) in Germany.
– The Japan International Protein Sequence Information Database (JIPID) in Japan.
![Page 28: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/28.jpg)
Results
Coverage is no better than that obtained with SWISS-PROT + TrEMBL.
Still contains redundancy.
Less comprehensive annotation.
Low number of cross-references.
PIR has recently joined forces with EBI and SIB to establish UniProt (United Protein Databases), the central resource of protein sequence and function.
![Page 29: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/29.jpg)
Specialized databanks
A lot of specialized databanks have been developed, which are devoted to:
– Complete genomes.
– Families of homologous genes.
– Non-sequence data.
These systems are under the responsibility of curators:
– Data quality and homogeneity control.
![Page 30: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/30.jpg)
Complete genomes
There is a large number of databanks devoted to specific organisms.
These banks are associated with sequencing or mapping projects.
For some model organisms there are often several competing systems.
![Page 31: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/31.jpg)
Examples
| Organism | Available databanks |
| --- | --- |
| Bacillus subtilis | NRSub (Non-Redundant B. subtilis); SubtiList |
| Escherichia coli | Colibri; EcoGene (E. coli Gene Database); ECDC (E. coli Database Collection) |
| Various prokaryotes | CMR (Comprehensive Microbial Resource); EMGLib (Enhanced Microbial Genomes Library); Micado (Microbial Advanced Database Organization) |
| Saccharomyces cerevisiae | MYGD (MIPS Yeast Genome Database); SGD (Saccharomyces Genome Database); YPD (Yeast Proteome Database) |
| Drosophila melanogaster | FlyBase |
| Plasmodium falciparum | PlasmoDB (P. falciparum Database) |
| Caenorhabditis elegans | WormBase; WormPD (Worm Protein Database) |
| Arabidopsis thaliana | TAIR (The Arabidopsis Information Resource) |
![Page 32: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/32.jpg)
Gene family databanks
Built with automated procedures:
– Similarity search between sets of proteins (BLASTP, FASTP, Smith-Waterman).
– Clustering into homologous families using similarity criteria.
Include various data:
– Protein (and sometimes nucleotide) sequences.
– Multiple sequence alignments and trees.
– Taxonomy.
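The clustering step can be sketched as single-linkage grouping: two proteins fall in the same family as soon as a chain of sufficiently similar pairs connects them. This is one possible criterion, assumed here for illustration; each databank applies its own rules, and the protein names, scores and threshold are made up.

```python
from typing import Dict, Iterable, List, Tuple

def cluster(names: Iterable[str],
            hits: Iterable[Tuple[str, str, float]],
            threshold: float) -> List[List[str]]:
    """Single-linkage clustering of pairwise hits using a union-find structure."""
    parent: Dict[str, str] = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, similarity in hits:
        if similarity >= threshold:
            parent[find(a)] = find(b)      # merge the two families

    families: Dict[str, List[str]] = {}
    for n in parent:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())

proteins = ["P1", "P2", "P3", "P4", "P5"]
pairwise = [("P1", "P2", 62.0), ("P2", "P3", 55.0),
            ("P4", "P5", 40.0), ("P3", "P4", 30.0)]
print(cluster(proteins, pairwise, threshold=50.0))
```

Single linkage is simple but can chain unrelated proteins through multi-domain sequences, which is one reason real systems add further criteria.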
![Page 33: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/33.jpg)
ProtFam
Developed at MIPS.
Built with PIR sequences.
Includes four levels of classification:
– Superfamilies (based on function and similarity criteria).
– Families (50% similarity).
– Subfamilies (80% similarity).
– Entries (≥95% similarity).
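The three numeric thresholds can be read as a lookup from a pairwise similarity to the finest level two sequences share. The sketch assumes each threshold is a lower bound ("at least 50 %", and so on), which the slide states only for entries; superfamilies also depend on function, so they cannot be decided from similarity alone.

```python
from typing import Optional

# (threshold in %, level), from the finest level to the coarsest.
LEVELS = [(95.0, "entry"), (80.0, "subfamily"), (50.0, "family")]

def finest_shared_level(similarity: float) -> Optional[str]:
    """Finest classification level shared by two sequences, or None.

    None means the pair falls below the family threshold; whether it still
    belongs to a common superfamily needs functional criteria.
    """
    for threshold, level in LEVELS:
        if similarity >= threshold:
            return level
    return None

for value in (97.0, 85.0, 60.0, 35.0):
    print(value, finest_shared_level(value))
```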
![Page 34: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/34.jpg)
ProtFam characteristics
Allows users to visualize alignments and dendrograms for the families.
Integrates Pfam domains.
Allows users to classify their own protein sequences.
Web server: http://mips.gsf.de
![Page 35: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/35.jpg)
ProtoMap
Initially developed at the Hebrew University of Jerusalem; now hosted at Cornell University.
Built with SWISS-PROT & TrEMBL sequences.
Combines 3 sequence similarity measures (BLASTP, FASTA and Smith-Waterman).
![Page 36: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/36.jpg)
ProtoMap characteristics
Alignments and trees are visualized with Java applets.
Users can submit sequences and classify them.
Web server: http://protomap.cornell.edu/index.html
![Page 37: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/37.jpg)
Specialized systems
HOVERGEN (Homologous Vertebrate Genes Database): based on GenBank CDS.
HOBACGEN (Homologous Bacterial Genes Database) for prokaryotes and yeast: based on SWISS-PROT/TrEMBL.
HOBACGEN-CG for completely sequenced genomes: based on SWISS-PROT/TrEMBL.
![Page 38: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/38.jpg)
Other specialized systems
COG (Clusters of Orthologous Groups), also for complete genomes: Based on GenBank CDS.
NuReBase (Nuclear Receptors Database) for mammalian nuclear receptors: Based on EMBL CDS.
RTKdb (Tyrosine Kinase Receptors): Based on EMBL CDS.
![Page 39: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/39.jpg)
Are COGs real orthologs?
[Figure: phylogenetic tree of the glutamate synthase large subunit, with support values at the nodes and a "Reciprocal best BLAST hit" label. Leaves include GLTB_ECOLI, GLTB_BACSU, GLTB_SYNY3, GLTS_SYNY3 and GLT1_YEAST. Species legend: Escherichia coli, Bacillus subtilis, Pseudomonas aeruginosa, Vibrio cholerae, Synechocystis sp.]
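The reciprocal best hit criterion named on this slide can be written in a few lines: gene a from genome A and gene b from genome B are paired when b is the best hit of a and a is the best hit of b. The sketch assumes the search scores are already available as dictionaries; the gene names and scores are made up, and ties are not handled.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score

def best_hits(scores: Scores) -> Dict[str, str]:
    """For each query, keep the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) that are each other's best hit."""
    forward, backward = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)

# Toy scores between genes of two hypothetical genomes A and B.
a_vs_b = {("a1", "b1"): 900.0, ("a1", "b2"): 300.0, ("a2", "b2"): 250.0}
b_vs_a = {("b1", "a1"): 880.0, ("b2", "a1"): 310.0, ("b2", "a2"): 240.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data a2 is left unpaired because the best hit of b2 is a1. Gene duplications and losses can defeat the criterion in the same way, which is the question the slide title raises.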
![Page 40: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/40.jpg)
Beyond protein families
ProtFam, HOVERGEN, HOBACGEN and COGs gather protein sequences that are homologous over their whole length.
Patterns, profiles, domains, … are covered in Terry Attwood’s lecture.
![Page 41: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/41.jpg)
Non-sequence data
| Data | Available systems |
| --- | --- |
| Gene expression | GXD (Mouse Gene Expression Database); The Stanford Microarray Database |
| Mapping | GDB (Genome Data Base); EMG (Encyclopedia of Mouse Genome); MGD (Mouse Genome Database); INE (Integrated Rice Genome Explorer) |
| Protein quantification | SWISS-2DPAGE; PDD (Protein Disease Database); Sub2D (B. subtilis 2D Protein Index) |
| 3D structures | PDB (Protein Data Bank); MMDB (Molecular Modelling Data Base); NRL_3D (Non-Redundant Library of 3D Structures); SCOP (Structural Classification of Proteins) |
| Polymorphism | ALFRED (Allele Frequency Database) |
| Molecular interactions | DIP (Database of Interacting Proteins); BIND (Biomolecular Interaction Network Database) |
![Page 42: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/42.jpg)
Sequence Data retrieval
Made mainly through Internet access:
– With client software (e.g., Entrez, HobacFetch).
– By remote connections to servers providing online access to the banks (INFOBIOGEN).
– Using World-Wide Web servers and browsers.
![Page 43: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/43.jpg)
Advantages and limitations
Users do not have to cope with the usual database problems:
– Storing large amounts of data.
– Daily updates.
– Software upgrades.
Simplicity of use.
Net access is sometimes very slow at peak hours: consider using servers other than NCBI.
![Page 44: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/44.jpg)
The ACNUC retrieval system
Direct access to functional regions described in feature tables (CDS, tRNA, rRNA).
Selection of entries using various criteria:
– Sequence names and accession numbers.
– Bibliographic criteria.
– Keywords.
– Taxonomy.
– Organelle.
Developed at Lyon University.
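Selecting entries by combined criteria amounts to filtering records on several fields at once. The sketch below is a generic illustration in Python, not the ACNUC query language or its API; the `Entry` class, the `select` function and all records except the BSAMYL annotations shown earlier are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set

@dataclass
class Entry:
    name: str
    lineage: List[str]                       # taxonomy, from root to genus
    keywords: Set[str] = field(default_factory=set)

def select(entries: Iterable[Entry],
           taxon: Optional[str] = None,
           keyword: Optional[str] = None) -> List[str]:
    """Names of entries matching every criterion that was given."""
    selected = []
    for entry in entries:
        if taxon is not None and taxon not in entry.lineage:
            continue
        if keyword is not None and keyword not in entry.keywords:
            continue
        selected.append(entry.name)
    return selected

bank = [
    Entry("BSAMYL", ["Bacteria", "Firmicutes", "Bacillus"],
          {"amylase", "signal peptide"}),
    Entry("TOYECO1", ["Bacteria", "Proteobacteria", "Escherichia"], {"amylase"}),
    Entry("TOYBAC2", ["Bacteria", "Firmicutes", "Bacillus"], {"tRNA"}),
]
print(select(bank, taxon="Bacillus", keyword="amylase"))
```

Because a taxon matches anywhere in the lineage, a query on "Firmicutes" would also return both Bacillus entries.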
![Page 45: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/45.jpg)
ACNUC : possible accesses
Graphical interface distributed along with the databases themselves:
http://pbil.univ-lyon1.fr/databases/acnuc.html
Web access at Pôle Bio-Informatique Lyonnais (PBIL):
http://pbil.univ-lyon1.fr/search/query.html
![Page 46: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/46.jpg)
ACNUC characteristics
Allows querying of any bank in PIR, SWISS-PROT, EMBL, or GenBank formats.
Keywords and species browsing.
Complex queries.
Links with sequence analysis programs on the Web server (alignment, codon usage).
![Page 47: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/47.jpg)
![Page 48: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/48.jpg)
The Query form
![Page 49: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/49.jpg)
Building queries to the sequence databases
![Page 50: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/50.jpg)
![Page 51: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/51.jpg)
![Page 52: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/52.jpg)
![Page 53: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/53.jpg)
![Page 54: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/54.jpg)
Retrieving sequences
Save the retrieved sequence data locally.
![Page 55: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/55.jpg)
Browsing the species trees
![Page 56: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/56.jpg)
![Page 57: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/57.jpg)
![Page 58: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/58.jpg)
HOVERGEN: Families of homologous vertebrate genes
![Page 59: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/59.jpg)
Access to family members
Download tree or alignment
![Page 60: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/60.jpg)
![Page 61: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/61.jpg)
SRS
Public version developed at EMBL by Etzold and Argos (1993).
Presently available on the different Web servers belonging to EMBnet:
– EBI (England).
– INFOBIOGEN (France).
– DKFZ (Germany).
– …
![Page 62: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/62.jpg)
Characteristics
Database index built with the use of ODD (Object Design and Definition).
More than 250 databanks have been indexed and are accessible through 35 SRS servers.
Allows queries to operate simultaneously on different banks.
![Page 63: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/63.jpg)
Databanks interconnection
[Figure: network of links between the databanks indexed under SRS: SWISS-PROT, ENZYME, PDB, HSSP, SWISSNEW, YPDREF, YPD, PDBFINDER, ALI, DSSP, FSSP, NRL_3D, PMD, PIR, ProtFam, FlyGene, TFSITE, TFACTOR, EMBL, TrEMBL, ECDC, TrEMBLNEW, EMNEW, EPD, GenBank, MOLPROBE, OMIM, MIMMAP, REBASE, PROSITE, ProDom, PROSITEDOC, Blocks, SWISSDOM.]
![Page 64: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/64.jpg)
Entrez
Developed by Schuler et al. (1996) at NCBI.
Allows querying of several US-made databases: GenBank, GenPept, NR, MMDB, MEDLINE.
Access through client software (Unix, Mac or Windows) or Web server: http://www.ncbi.nlm.nih.gov
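Entrez can also be queried by program. The sketch below only builds a request URL for NCBI's E-utilities `efetch` service, which is not described in these slides and is assumed here as the present-day programmatic route; the `nuccore` database name and the `rettype`/`retmode` values follow NCBI's public documentation, and no network request is made.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities base URL (not mentioned in the slides).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession: str, rettype: str = "gb", db: str = "nuccore") -> str:
    """Build an efetch URL returning one record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"

# Accession number of the B. subtilis amylase entry used in the examples above.
print(efetch_url("V00101"))
print(efetch_url("V00101", rettype="fasta"))
```

The resulting URL can be opened with any HTTP client. NCBI asks users to limit their request rate, which echoes the remark above about slow access at peak hours.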
![Page 65: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/65.jpg)
Characteristics
Introduces the concept of neighbours between sequences, references and structures.
Sequence neighbours are established using similarity criteria.
No access to multiple alignments.
[Figure: links between the Entrez databases: Phylogeny (Taxman), Structures (MMDB), Refs. (PubMed), Complete Genomes, Nucl. Seq. (GenBank), Prot. Seq. (GenPept).]
![Page 66: Sequence databases and retrieval systems Guy Perrière [ replaced by Manolo Gouy ]](https://reader036.vdocuments.us/reader036/viewer/2022062802/568145e9550346895db2eb50/html5/thumbnails/66.jpg)
NAR 2003 database issue
http://nar.oupjournals.org/content/vol31/issue1/