

DATABASES

VSD: A Database for Schizophrenia Candidate Genes Focusing on Variations

Min Zhou,2,3 Yong-Long Zhuang,1 Qi Xu,2,3 Yan-Da Li,1* and Yan Shen2,3*

1 Institutes of Bioinformatics, Tsinghua University, Beijing, China; 2 Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences (CAMS) and Peking Union Medical College (PUMC), Beijing, China; 3 Chinese National Human Genome Center, Beijing, China

Communicated by Jaime Cuticchia

Schizophrenia is a common mental disease characterized by delusions, hallucinations, and formal thought disorder. It has been demonstrated with genetic evidence that the disease is a polygenic disorder. Pharmacological, neurochemical, and clinical studies have suggested a number of schizophrenia susceptibility loci. In order to systematically search for genes with small effect in the development of schizophrenia, a database called VSD was established to provide variation data for publicly available candidate genes. Most of the genes encode neurotransmitter receptors, neurotransmitter transporters, and the enzymes involved in their metabolism. Other candidate genes extracted from published literature are also included. The variation information has been collected from publicly available mutation and polymorphism databases such as dbSNP, HGVbase, and OMIM, with single nucleotide polymorphism (SNP) being the most abundant form of collected variations. Reference sequences from NCBI’s RefSeq database are used as references when positioning variation at transcript and protein levels. The nonsynonymous SNPs (nsSNPs) that lead to amino acid changes in the functional sites or domains of proteins are distinguished since they are more likely to affect protein function and would be target SNPs for association studies. In addition to variation data, gene descriptions, enzyme information, and other biological information for each gene locus are also included. The latest version of VSD contains 23,648 variations assigned to a total of 186 genes. Five hundred eighty-eight domains and sites annotated in the SWISS-PROT and InterPro databases are found to contain nsSNPs. VSD may be accessed via the World Wide Web (www.chgb.org.cn/vsd.htm) and will be developed as an up-to-date and comprehensive locus-specific resource for identifying susceptibility genes for schizophrenia. Hum Mutat 23:1–7, 2004. © 2003 Wiley-Liss, Inc.

KEY WORDS: database; SNP; nonsynonymous SNP; schizophrenia; neuropsychiatric

DATABASES:

www.chgb.org.cn/vsd.htm (VSD)

Received 30 May 2003; accepted revised manuscript 10 September 2003.

*Correspondence to: Prof. Yan Shen, Department of Biotechnology & Molecular Biology, Institute of Basic Medical Sciences, CAMS and PUMC, 5 Dong DanganTiao, Beijing 100005, China. E-mail: [email protected]; and Prof. Yan-Da Li, Institutes of Bioinformatics, Tsinghua University, Beijing 100084, China. E-mail: [email protected]

Grant sponsor: National High Technology Research and Development Program of China; Grant numbers: 2001AA221071; 2002BA711A07; Grant sponsor: China Nation Key Program on Basic Research; Grant number: 1998051003.

Min Zhou and Yong-Long Zhuang contributed equally to this work.

DOI 10.1002/humu.10289

Published online in Wiley InterScience (www.interscience.wiley.com).

© 2003 Wiley-Liss, Inc. Human Mutation 23:1–7 (2004)

INTRODUCTION

Schizophrenia is a common mental disease that affects approximately 1% of the population and has a devastating effect on the patients’ lives [McGuffin et al., 1994]. It is characterized by delusions, hallucinations, and formal thought disorder, together with a decline in socio-occupational functioning. The primary importance of the genetic component in schizophrenia has been demonstrated by family, twin, and adoption studies [Kendler et al., 1999]. Currently, the working hypothesis for schizophrenia is that multiple genes of small-to-moderate effect probably confer a compounding risk through interactions with each other and with nongenetic risk factors. Molecular geneticists have employed a number of strategies aimed at identifying susceptibility genes for schizophrenia. Both linkage (parametric and nonparametric) and association (case-control and family-based) strategies have been applied to the search for DNA sequence variations that may be responsible for the development of the disease.

So far, linkage studies have yielded some evidence for the location of genes of moderate effect; however, at present, none of these findings may be regarded as conclusive. Psychopharmacological models implicating the dopamine and 5-hydroxytryptamine (5-HT) neurotransmitter pathways have provided a number of leads for schizophrenia susceptibility loci, primarily because of their role as sites for antipsychotic drug action. Recent theories about the neurochemical mechanisms underlying the disorder have emphasized a potential primary role of two amino acid neurotransmitters, glutamate and gamma-aminobutyric acid (GABA), in the neuropathology of schizophrenia. Several variations of these candidate genes have been studied; however, the results were equivocal. The most promising findings involve alleles of the 5-HT2A receptor gene and the dopamine D3 receptor gene [Williams et al., 1996, 1997, 1998; Dubertret et al., 1998]. Another group of candidate genes is neurodevelopmental genes. These genes may disrupt maturation and growth of the brain, and may also cause decreased cerebral volume and neuronal disarray, which are often observed in the brain of schizophrenia patients [Harrison, 1999; Shenton et al., 2001]. However, the above results have been inconsistent [Nanko et al., 1994; Sasaki et al., 1997; Virgos et al., 2001]. More research has been performed to identify other genes associated with schizophrenia and to determine how they affect the brain and whether the identified genes will combine in an additive or interactive fashion to influence the risk for schizophrenia.

Since each susceptibility gene does not function in isolation, in recent years there has been an increasing interest in systematic searches for genes with small effect in polygenic disorders. Candidate gene studies based on convincing neurobiological hypotheses and prioritized by linkage studies could provide valuable clues for systematic association studies to elucidate the molecular basis of schizophrenia. Thus, the VSD database (Database for Schizophrenia candidate genes focusing on Variation) has been constructed to provide a collection of available variations of candidate genes from public databases such as HGVbase (http://hgvbase.cgr.ki.se) [Fredman et al., 2002], dbSNP (www.ncbi.nlm.nih.gov/SNP) [Sherry et al., 2001], and OMIM (www.ncbi.nlm.nih.gov/omim) [Hamosh et al., 2002]. In this study, candidate genes for genetic association studies are selected from public databases and the research literature based on pharmacological, neurochemical, and clinical evidence pointing at specific receptors, enzymes, or other molecules that might be involved in the etiopathogenesis of schizophrenia, which may be of interest to the researchers in this field. Since NCBI RefSeq (www.ncbi.nlm.nih.gov/LocusLink/refseq.html) [Pruitt and Maglott, 2001] provides reference sequence standards for the complete genomic nucleic acids, transcripts, and proteins, it is used to position variations in the VSD. In addition, VSD organizes information on gene loci, including nomenclature, chromosome positions, gene and protein descriptions, enzyme and pathway information, and links to other resources such as GeneCards and LocusLink, with the aim of giving a complete set of the knowledge currently available about the genes. The VSD database is unique in that variations are positioned at the levels of genomic DNA, transcript, and protein, and nonsynonymous SNPs (nsSNPs) that might affect protein function are analyzed.

DATABASE STRUCTURE

The VSD database mainly contains a list of nonredundant variation records implemented as follows. Four categories of variation are defined: 1) single-base differences; 2) insertion-deletion variants; 3) simple tandem repeat polymorphisms; and 4) “genetic” (or complex) changes involving alterations not described by the preceding three alternatives. The scope of the database includes disease-causing mutations as well as neutral polymorphisms. The current knowledge of general sequence variation suggests that SNP markers with unknown influences will constitute the majority of records.

In order to make this database a useful resource, four main types of information are included in each entry.

Variation Information

This is the core of the database. All variations are categorized into different gene loci. Both disease-causing mutations and polymorphisms are maintained within most gene loci, with mutations derived from OMIM and polymorphisms derived from dbSNP and HGVbase.

For polymorphism data, the record identifiers adopted are those assigned by dbSNP; in this way, a nonredundant catalog is maintained, which simplifies data extraction and subsequent experimental planning by database users. For each polymorphism, at least one file is included to provide genomic, transcript, and protein level reference where possible. In addition to the above information, other data fields used to describe sequence variation include: 25 base pairs 5′ and 3′ of the polymorphism, which may help to localize the polymorphism within a short DNA sequence; the intragenic location of a polymorphism, such as exonic, intronic, or 5′- or 3′-untranslated region, and, in many cases, the detailed codon and deduced amino acid changes; validation status; and hyperlinks to the HGVbase and LocusLink databases.
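The 25-base flanking field described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the record layout (`FlankedVariation`), the function name `extract_flanks`, and the bracketed display format are hypothetical and are not taken from the VSD implementation.

```python
from dataclasses import dataclass
from typing import Tuple

FLANK_LENGTH = 25  # flank size stated in the text


@dataclass
class FlankedVariation:
    """A variation with its flanking sequence (hypothetical record layout)."""
    five_prime: str
    alleles: Tuple[str, ...]
    three_prime: str

    def display(self) -> str:
        # Bracketed allele notation is an assumed display convention.
        return f"{self.five_prime}[{'/'.join(self.alleles)}]{self.three_prime}"


def extract_flanks(sequence: str, position: int, alleles: Tuple[str, ...],
                   flank: int = FLANK_LENGTH) -> FlankedVariation:
    """Return up to `flank` bases on each side of a 1-based position."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    index = position - 1              # convert to 0-based indexing
    start = max(0, index - flank)     # flanks are truncated at sequence ends
    return FlankedVariation(
        five_prime=sequence[start:index],
        alleles=alleles,
        three_prime=sequence[index + 1:index + 1 + flank],
    )
```

For example, `extract_flanks("AACGTT", 3, ("C", "T"), flank=2).display()` gives `AA[C/T]GT`. Near the ends of a reference sequence the flanks are simply shorter than 25 bases in this sketch.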

Reference Sequence Information

This information was mainly derived from the NCBI RefSeq database; the reviewed RefSeq record was used to represent a “review article” for the sequence. The mRNAs and encoded proteins are available as distinct NM_###### and NP_###### accession numbers. This section includes the accession number of transcript or protein records, their respective version, definition, maps, and sequence data.
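As an illustration of the accession and version fields, the following Python sketch splits an NM_/NP_ accession into its parts. The dot-separated version suffix, the names `RefSeqAccession` and `parse_accession`, and the accession strings used in the usage note are assumptions made for the example; other RefSeq prefixes are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Only the two prefixes named in the text are accepted in this sketch.
ACCESSION_PATTERN = re.compile(r"^(NM|NP)_(\d+)(?:\.(\d+))?$")


class RefSeqAccession(NamedTuple):
    prefix: str             # "NM" (mRNA) or "NP" (protein)
    number: str             # digits kept as text to preserve leading zeros
    version: Optional[int]  # None when no version suffix is given

    @property
    def molecule(self) -> str:
        return "mRNA" if self.prefix == "NM" else "protein"


def parse_accession(text: str) -> RefSeqAccession:
    """Split an accession such as 'NM_123456.2' into prefix, number, version."""
    match = ACCESSION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not an NM_/NP_ accession: {text!r}")
    prefix, number, version = match.groups()
    return RefSeqAccession(prefix, number, int(version) if version else None)
```

With the placeholder accession `NM_123456.2`, `parse_accession` returns prefix `NM`, number `123456`, and version `2`, and its `molecule` property is `mRNA`.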

Protein Feature Information

The annotations of the SWISS-PROT database (www.expasy.ch/sprot/) [Bairoch and Apweiler, 2000] and the InterPro database (www.ebi.ac.uk/interpro/) [Apweiler et al., 2001] were adopted to provide information on protein features. Annotations from the SWISS-PROT database consist of descriptions of the following items: functions of the protein; post-translational modifications (e.g., carbohydrates, phosphorylation, acetylation, etc.); domains and sites (such as ATP-binding sites, zinc fingers, SH2 and SH3 domains, etc.); secondary structure (e.g., alpha helix, beta sheet, etc.); diseases associated with deficiencies in the protein; and others. Descriptions of the protein family, domain, repeat, or post-translational modifications from the InterPro database are also included.
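The feature-table entries behind these annotations follow the layout described in the Figure 3 legend: a key name, two endpoints, and a free-text description. A minimal Python sketch of reading one such line is given below. The function name `parse_feature_line`, the optional leading `FT` line code, and the description text in the usage note are assumptions for illustration; uncertain endpoints, where they occur, are not handled.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str          # e.g., VARIANT, DOMAIN, TRANSMEM
    start: int        # first residue of the feature (1-based)
    end: int          # last residue of the feature (1-based)
    description: str  # remainder of the line


def parse_feature_line(line: str) -> Feature:
    """Parse 'KEY START END description...' into a Feature."""
    fields = line.split()
    if fields and fields[0] == "FT":   # tolerate a leading line code
        fields = fields[1:]
    if len(fields) < 3:
        raise ValueError(f"incomplete feature line: {line!r}")
    key, start, end = fields[0], int(fields[1]), int(fields[2])
    return Feature(key, start, end, " ".join(fields[3:]))
```

For instance, `parse_feature_line("VARIANT 25 25 illustrative description")` yields the key `VARIANT`, endpoints 25 and 25, and the remaining text as the description.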

Enzyme Information

The current version of the enzyme information is developed as a value-added resource, consisting of a classification of enzymes, chemical compounds (chemical structures of metabolites and other chemical compounds), and chemical elements (chemical reactions, mostly enzymatic reactions), which together are intended to represent functional aspects of proteins and small chemical compounds.
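Enzyme classification is conventionally expressed through four-level EC numbers, which VSD also accepts as a search key. The Python sketch below shows how such a number might be split and mapped to its top-level class. The function names are hypothetical, only the six classical top-level classes are listed, and partial numbers such as those containing a dash are not handled.

```python
from typing import Tuple

# The six classical top-level EC classes.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec_number(ec: str) -> Tuple[int, ...]:
    """Split 'EC 2.1.1.6' or '2.1.1.6' into four integer levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a complete EC number: {ec!r}")
    return tuple(int(part) for part in parts)


def top_level_class(ec: str) -> str:
    """Name the top-level enzyme class of an EC number."""
    return EC_CLASSES.get(parse_ec_number(ec)[0], "unknown")
```

As a usage example, catechol O-methyltransferase (the product of the COMT gene) has EC number 2.1.1.6, so `top_level_class("EC 2.1.1.6")` returns `Transferases`.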

DATABASE FEATURES

The VSD database differs from other variation databases in the following aspects.

VSD is a locus-specific database, concentrating on variations within schizophrenia candidate genes. It provides greater depth of information because of its specialized knowledge. Besides variation data, corresponding reference sequence information and enzyme data are also included. It therefore serves as an information platform for the particular research community around the world, and is maintained as an accurate and up-to-date data source.

VSD maintains a current set of literature citations for most candidate genes, which represents studies related to association between a specific gene or variation and schizophrenia. The citations give the paper title, authors, journal citation information, PubMed IDs linked to PubMed abstracts, and a comment on the reference that conveys useful information to researchers. These comments cover the basic results of studies, sometimes including experimental samples (population information) and the specific experimental conditions.

For reference sequence information, sequence data of mRNAs and encoded proteins are represented; the 5′ and 3′ UTRs and the CDS are also shown. The sequence is further modified by indicating variations at the transcript and protein levels. A graphical view of the sequence information above is shown in Figure 1. Only variations at the transcript level are displayed, and feature annotations for the nucleotide or protein (such as “misc_RNA” features, polyA signal) are also included, in this case for the CarA region.

FIGURE 1. A sample map illustrating reference sequences, positions of variations at transcript level, and feature annotations for the CAD gene (LocusID: 236). To display the relationship between reference mRNAs and protein sequence, these two sequences are shown on the same line, with their lengths (6,972 nt and 2,225 aa), 5′ and 3′ UTR (1–26 nt and 6,705–6,972 nt), and CDS. The variation positions at the mRNA level are labeled in blue (online). In addition, feature annotations for reference sequences are given, along with the feature descriptions. For example, CarA is a region from 27 nt to 1,106 nt on the mRNA reference sequence, which is a “Carbamoylphosphate synthase small subunit” and refers to COG0505. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

The VSD positions variations at three different levels. The VSD contains a map view of genomic DNA, mRNA, and protein reference sequences, and variation positions at these three levels for most candidate genes. The Spidey program [Wheelan et al., 2001] is used to align mRNA reference sequences to contigs. Figure 2 shows an example for the COMT entry, in which the genomic DNA sequence is displayed by a line; reference mRNA and corresponding protein are shown by a few blocks, and their variations are labeled. By clicking on these images, variation data and sequence information may be obtained.

FIGURE 2. A sample map illustrating gene structure and variation positions at genomic DNA, transcript, and protein levels for the COMT gene (LocusID: 1312). The gene has two splice forms; variation positions at three levels are labeled in blue (online). [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

For genes that have corresponding entries in the SWISS-PROT database, protein variants annotated in the SWISS-PROT database are also included in the VSD. The SWISS-PROT database has made great efforts to provide a high level of annotations, mainly consisting of descriptions of protein function and features. Those variants localized in feature regions draw our attention because of their potential to affect protein structure and function. Two examples (Figs. 3 and 4) show the relationships between variants and protein features for the HTR2A and ALDH2 genes. Variants are positioned on the SWISS-PROT protein sequence; the protein features containing these variants are represented. Figure 4 shows that some variants might have some effect on different secondary structures (strand, helix, and turn).

FIGURE 3. A sample map illustrating the relationship between variants and protein features for the HTR2A gene (LocusID: 3356). Protein features are extracted from SWISS-PROT annotation. Variants on P28223 are shown in blue (online), along with protein functional regions or sites containing these variants. Text descriptions of variants and protein features are added below. For instance, the first variant: “VARIANT” is the key name; “25 25” indicates the variant endpoints; the remaining portion of the line contains the description of this variant. DOMAIN, extent of a domain of interest on the sequence; TRANSMEM, extent of a transmembrane region; DISULFID, disulfide bond. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

FIGURE 4. A sample map illustrating the relationship between variants and protein features for the ALDH2 gene (LocusID: 217). Protein features are extracted from SWISS-PROT annotation. Variants on P05091 are labeled in blue (online), along with protein secondary structures containing these variants. CHAIN, extent of a polypeptide chain in the mature protein; HELIX, one type of secondary structure; TURN, one type of secondary structure; STRAND, one type of secondary structure. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Nonsynonymous SNPs (nsSNPs) are located in protein features. Although genes might be implicated by tests of functionally neutral polymorphisms if they are in tight linkage disequilibrium with a pathogenic locus, priority should be given, when testing for association, to those variations that might impact protein structure and/or expression. In this respect, those variations changing amino acid sequences (nsSNPs) seem attractive. Furthermore, nsSNPs located in protein functional sites or domains become the focus, since they are more likely to affect protein function than other nsSNPs, and would be selected as target SNPs for association studies. The VSD provides a graphical view of the relationship between nsSNPs and protein features for most candidate gene loci. Through comparing the reference protein sequence with the SWISS-PROT protein sequence, nsSNPs are positioned on the SWISS-PROT protein sequence. Protein features (such as post-translational modifications, domains and sites, or secondary structure) are extracted from the SWISS-PROT and InterPro sequence annotations. The nsSNP locations are then compared with those of the protein features. An example is given (Fig. 5) that displays all nsSNPs and protein variants annotated in the SWISS-PROT database that are cataloged to the GRIK1 gene, i.e., three nsSNPs affecting the 757th, 870th, and 902nd amino acids, corresponding to several protein features. In addition, text descriptions of the three nsSNPs, three variants, and protein features are added below. However, the location of an nsSNP within a SWISS-PROT or InterPro annotation does not directly imply that the nsSNP is critically involved in the function of the protein, since many of the annotations specify a broad range of the sequence, such as the “domain.”
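The comparison of nsSNP locations with protein features amounts to an interval-containment test. A minimal Python sketch is given below. The residue positions 757, 870, and 902 are those named in the text for GRIK1, but the feature coordinates are hypothetical values chosen only for illustration, and the function names are not taken from VSD.

```python
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    key: str    # feature key, e.g., DOMAIN or TRANSMEM
    start: int  # first residue covered (1-based, inclusive)
    end: int    # last residue covered (1-based, inclusive)


def features_containing(position: int, features: List[Feature]) -> List[Feature]:
    """Return every feature whose residue range includes the position."""
    return [f for f in features if f.start <= position <= f.end]


def annotate_nssnps(positions: List[int],
                    features: List[Feature]) -> Dict[int, List[str]]:
    """Map each nsSNP residue position to the keys of overlapping features."""
    return {p: [f.key for f in features_containing(p, features)]
            for p in positions}


# Hypothetical feature coordinates, chosen only for illustration.
EXAMPLE_FEATURES = [
    Feature("CHAIN", 31, 918),
    Feature("TRANSMEM", 750, 770),
    Feature("DOMAIN", 840, 918),
]
NSSNP_POSITIONS = [757, 870, 902]  # residues named in the text for GRIK1
```

With these made-up coordinates, `annotate_nssnps(NSSNP_POSITIONS, EXAMPLE_FEATURES)` assigns position 757 to CHAIN and TRANSMEM, and positions 870 and 902 to CHAIN and DOMAIN. As the text cautions, a hit in a broad feature such as CHAIN or DOMAIN is weak evidence on its own, so a filter on feature key or feature length may be a sensible refinement.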

FIGURE 5. A sample map illustrating the relationship between variations and protein features for the GRIK1 gene (LocusID: 2897). Variations include nsSNPs and protein variants. Protein features are extracted from SWISS-PROT and InterPro annotation. Through comparing the reference protein sequence with that of SWISS-PROT, nsSNPs are positioned on the SWISS-PROT protein sequence (P39086). nsSNPs and protein variants on P39086 are shown in red and blue, respectively (online); corresponding protein features are also represented. Text descriptions of variations and protein features are appended below (see Fig. 3). InterPro, protein features extracted from InterPro database; CHAIN, extent of a polypeptide chain in the mature protein; DOMAIN, extent of a domain of interest on the sequence; TRANSMEM, extent of a transmembrane region; VARSPLIC, description of sequence of variants produced by alternative splicing. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

DATA COLLECTION AND CONTENT

The aim of the VSD database is to collect all common DNA sequence variations within schizophrenia candidate genes, especially those that might influence gene regulation and protein function. For the sake of positioning variations at genomic, transcript, and protein levels, only those genes that have corresponding gene loci in RefSeq are included. At the time of writing this paper, VSD had acquired 186 candidate genes from several sources. In light of the neurotransmitter hypothesis of schizophrenia, genes involved in neurotransmission of dopamine, 5-hydroxytryptamine (5-HT), glutamate, and gamma-aminobutyric acid (GABA) have drawn particular attention. The VSD collects 141 candidate genes distributed on the amino acid pathways related to these neurotransmitters (tyrosine, tryptophan, and glutamate metabolism pathways) from the KEGG/PATHWAY database (www.genome.ad.jp/kegg/) [Kanehisa et al., 2002]. In addition, candidate genes for schizophrenia have been studied for over two decades and the results are scattered in the scientific literature. Thus far, we have gathered 54,544 documents by searching the PubMed database for “schizophrenia”-related studies, then selected 3,910 documents concerning “gene” or “association,” and parsed 45 schizophrenia candidate genes from them, excluding the 141 enzyme genes. These 45 candidate genes consist of 15 neurotransmitter receptors and transporters and 30 others, including genes located in schizophrenia susceptibility loci supported by linkage studies, such as PRODH2 and NOTCH4; those suggested by cytogenetic approaches, such as PCQAP and ZNF74; and those implicated by the neurodevelopmental hypothesis of the etiology of schizophrenia, such as FN1 and NTF3.

The scope of the VSD includes both disease-causing mutations and neutral polymorphisms. A great deal of the variation data originates from public polymorphism and mutation databases (such as dbSNP, HGVbase, and OMIM). Therefore, by repeatedly harvesting public datasets as they evolve, we expect to make the VSD an updated and comprehensive variation resource for schizophrenia research.

Through processing of the collected data, the current version of the VSD contains 23,648 polymorphisms and mutations (23,355 from dbSNP and HGVbase and 293 from OMIM) assigned to a total of 186 genes. Among them, SNPs are the most abundant form. A summary of the intragenic location of variations in host genes shows that 79.5% of all variations are located within an intronic region and 5.9% within an exonic region, and that 2.1% are nsSNPs. Among the total of 186 genes, 178 have corresponding SWISS-PROT entries. The types of annotations from the SWISS-PROT database and the number of corresponding variants and nsSNPs are summarized in Table 1. A total of 20 nsSNPs in disulfide bonds and four nsSNPs in transmembrane regions are identified. These nsSNPs seem more likely to affect protein function than other nsSNPs.
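The figures quoted in this section can be cross-checked with a few lines of Python. This is only a consistency sketch: the per-category counts are not given in the paper and are estimated here from the rounded percentages, so they are approximate; the variable and function names are illustrative.

```python
from typing import Dict

TOTAL_VARIATIONS = 23_648
BY_SOURCE = {"dbSNP and HGVbase": 23_355, "OMIM": 293}
GENES_BY_ORIGIN = {"KEGG/PATHWAY enzyme genes": 141, "literature-derived genes": 45}
LOCATION_FRACTIONS = {"intronic": 0.795, "exonic": 0.059, "nsSNP": 0.021}


def approximate_counts(total: int, fractions: Dict[str, float]) -> Dict[str, int]:
    """Estimate counts from rounded percentages (approximate by construction)."""
    return {name: round(total * fraction) for name, fraction in fractions.items()}


# The source counts and gene counts add up to the stated totals.
assert sum(BY_SOURCE.values()) == TOTAL_VARIATIONS
assert sum(GENES_BY_ORIGIN.values()) == 186

ESTIMATES = approximate_counts(TOTAL_VARIATIONS, LOCATION_FRACTIONS)
```

Under these assumptions, `ESTIMATES` comes to roughly 18,800 intronic variations, 1,395 exonic variations, and 497 nsSNPs.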

DATABASE USAGE AND REPRESENTATION

The VSD database can be searched directly via the Internet. Users can do a simple search just by choosing different subjects based on keywords for gene name, SWISS-PROT accession number, NCBI RefSeq accession number, chromosome location, dbSNP ID, OMIM ID, or EC number. The results are presented as a set of related gene loci, which list not only the gene symbol and chromosome location but also detailed descriptions of the genes. When a particular gene is selected, one can get a detailed record that mainly consists of reference sequences, variation information, and external links. Reference sequences are derived from RefSeq, providing reference for positioning exon-located variations at mRNA and protein sequence levels. Variation information gives a list of variations categorized into this gene locus, which includes SNPs, protein variants on the SWISS-PROT protein sequence, and allelic variants in OMIM. For SNPs, clicking on each SNP ID will lead to the data for that variation, which contains the dbSNP identifier, variation positions indicated within a contig or reference sequence, allelic bases, and the 25 bases of 5′ and 3′ flanking sequence in the context of the surrounding genome sequence. In addition, for variations cataloged to the human gene, a map view shows positions of variations at three levels, and includes a graphical view summarizing the relationship between nsSNPs and protein features. External links are provided for source information in public databases, such as LocusLink, GeneCards, UniGene, dbSNP, HGVbase, and OMIM, as well as PubMed for literature sources.

Besides online search tools, graphics for three neurotransmitter pathways, and chromosome maps onto which the candidate genes have been sorted, are available for browsing. The pathways, which comprise successive reaction steps in the metabolic procedure and summarize the protein interaction network, allow users to access the enzyme record and consequently the information for each gene locus.
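As noted earlier in this section, the RefSeq reference sequences serve as the coordinate system for positioning exon-located variations at the mRNA and protein levels. The Python sketch below shows one way such a conversion could work, using the transcript coordinates given in the Figure 1 legend for the CAD example. It assumes that the annotated CDS includes the stop codon, which is consistent with the stated lengths (6,678 coding bases for 2,225 residues plus a stop); the type and function names are illustrative and not part of VSD.

```python
from typing import NamedTuple, Optional


class Transcript(NamedTuple):
    length: int     # total mRNA length in nucleotides
    cds_start: int  # 1-based position of the first base of the start codon
    cds_end: int    # 1-based position of the last base of the stop codon


def region(transcript: Transcript, position: int) -> str:
    """Classify a 1-based mRNA position as 5' UTR, CDS, or 3' UTR."""
    if not 1 <= position <= transcript.length:
        raise ValueError("position lies outside the transcript")
    if position < transcript.cds_start:
        return "5' UTR"
    if position > transcript.cds_end:
        return "3' UTR"
    return "CDS"


def codon_number(transcript: Transcript, position: int) -> Optional[int]:
    """Return the 1-based codon index for a CDS position, else None."""
    if region(transcript, position) != "CDS":
        return None
    return (position - transcript.cds_start) // 3 + 1


# Coordinates from the Figure 1 legend: 5' UTR 1-26 nt, 3' UTR 6,705-6,972 nt.
CAD_MRNA = Transcript(length=6972, cds_start=27, cds_end=6704)
```

With these coordinates, position 27 falls in codon 1, position 6,701 in codon 2,225 (the last residue of the 2,225-aa protein), and position 6,704 in codon 2,226, which is the stop codon under the stated assumption.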

TABLE 1. Functional Sites and Domains from SWISS-PROT Affected by Variants and nsSNPs

SWISS-PROT feature    Variants    nsSNPs
ACT_SITE              1           0
CARBOHYD              2           0
CHAIN                 231         84
DISULFID              25          20
DOMAIN                148         71
HELIX                 20          7
METAL                 4           0
MUTAGEN               1           0
NP_BIND               0           2
PEPTIDE               0           4
PROPEP                2           2
REPEAT                22          8
SIGNAL                2           2
STRAND                15          3
TRANSIT               2           4
TRANSMEM              7           4
TURN                  6           2
VARSPLIC              44          15

Functional sites and domains (annotated in a feature table of the SWISS-PROT database) affected by variants and nsSNPs are listed, along with the number of involved variants and nsSNPs. The definitions of feature symbols are as follows: ACT_SITE, amino acid(s) involved in the activity of an enzyme; CARBOHYD, glycosylation site; CHAIN, extent of a polypeptide chain in the mature protein; DISULFID, disulfide bond; DOMAIN, extent of a domain of interest on the sequence; HELIX, one type of secondary structure; METAL, binding site for a metal ion; MUTAGEN, site which has been experimentally altered; NP_BIND, extent of a nucleotide phosphate binding region; PEPTIDE, extent of a released active peptide; PROPEP, extent of a propeptide; REPEAT, extent of an internal sequence repetition; SIGNAL, extent of a signal sequence; STRAND, one type of secondary structure; TRANSIT, extent of a transit peptide; TRANSMEM, extent of a transmembrane region; TURN, one type of secondary structure; and VARSPLIC, description of sequence of variants produced by alternative splicing.


FUTURE DIRECTIONS

As a locus-specific variation database, the VSD may help clinicians and researchers uncover the relationships between genes and schizophrenia. It will focus on research developments and undertake the effort to gather more candidate genes and corresponding variation data, and is expected to expand continuously. Data exchange with other public variation databases will facilitate such an expansion. Future database developments, including structure and content enhancements, will improve the query function and make the VSD more useful.

ACKNOWLEDGMENT

We thank Professor Jesse Li-Ling of China Medical University for his valuable comments and kind help on the manuscript.

REFERENCES

Apweiler R, Attwood TK, Bairoch A, Bateman A, Birney E, Biswas M, Bucher P, Cerutti L, Corpet F, Croning MD, Durbin R, Falquet L, Fleischmann W, Gouzy J, Hermjakob H, Hulo N, Jonassen I, Kahn D, Kanapin A, Karavidopoulou Y, Lopez R, Marx B, Mulder NJ, Oinn TM, Pagni M, Servant F, Sigrist CJ, Zdobnov EM. 2001. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res 29:37–40.

Bairoch A, Apweiler R. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28:45–48.

Dubertret C, Gorwood P, Ades J, Feingold J, Schwartz JC, Sokoloff P. 1998. Meta-analysis of DRD3 gene and schizophrenia: ethnic heterogeneity and significant association in Caucasians. Am J Med Genet 81:318–322.

Fredman D, Siegfried M, Yuan YP, Bork P, Lehvaslaiho H, Brookes AJ. 2002. HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res 30:387–391.

Hamosh A, Scott AF, Amberger J, Bocchini C, Valle D, McKusick VA. 2002. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 30:52–55.

Harrison PJ. 1999. The neuropathology of schizophrenia. A critical review of the data and their interpretation. Brain 122:595–624.

Kanehisa M, Goto S, Kawashima S, Nakaya A. 2002. The KEGG databases at GenomeNet. Nucleic Acids Res 30:42–46.

Kendler KS, MacLean CJ, Ma Y, O’Neill FA, Walsh D, Straub RE. 1999. Marker-to-marker linkage disequilibrium on chromosome 5q, 6p, and 8p in Irish high-density schizophrenia pedigrees. Am J Med Genet 88:29–33.

McGuffin P, Asherson P, Owen M, Farmer A. 1994. The strength of the genetic effect. Is there room for an environmental influence in the etiology of schizophrenia? Br J Psychiatry 164:593–599.

Nanko S, Hattori M, Kuwata S, Sasaki T, Fukuda R, Dai XY, Yamaguchi K, Shibata Y, Kazamatsuri H. 1994. Neurotrophin-3 gene polymorphism associated with schizophrenia. Acta Psychiatr Scand 89:390–392.

Pruitt KD, Maglott DR. 2001. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 29:137–140.

Sasaki T, Dai XY, Kuwata S, Fukuda R, Kunugi H, Hattori M, Nanko S. 1997. Brain-derived neurotrophic factor gene and schizophrenia in Japanese subjects. Am J Med Genet 74:443–444.

Shenton ME, Dickey CC, Frumin M, McCarley RW. 2001. A review of MRI findings in schizophrenia. Schizophr Res 49:1–52.

Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 2001. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311.

Virgos C, Martorell L, Valero J, Figuera L, Civeira F, Joven J, Labad A, Vilella E. 2001. Association study of schizophrenia with polymorphisms at six candidate genes. Schizophr Res 49:65–71.

Wheelan SJ, Church DM, Ostell JM. 2001. Spidey: a tool for mRNA-to-genomic alignments. Genome Res 11:1952–1957.

Williams J, Spurlock G, McGuffin P, Mallet J, Nothen MM, Gill M, Aschauer H, Nylander PO, Macciardi F, Owen MJ. 1996. Association between schizophrenia and T102C polymorphism of the 5-hydroxytryptamine type 2a-receptor gene. European Multicentre Association Study of Schizophrenia (EMASS) Group. Lancet 347:1294–1296.

Williams J, McGuffin P, Nothen M, Owen MJ. 1997. Meta-analysis of association between the 5-HT2a receptor T102C polymorphism and schizophrenia. EMASS Collaborative Group. European Multicentre Association Study of Schizophrenia. Lancet 349:1221.

Williams J, Spurlock G, Holmans P, Mant R, Murphy K, Jones L, Cardno A, Asherson P, Blackwood D, Muir W, Meszaros K, Aschauer H, Mallet J, Laurent C, Pekkarinen P, Seppala J, Stefanis CN, Papadimitriou GN, Macciardi F, Verga M, Pato C, Azevedo H, Crocq M-A, Gurling H, Kalsi G, Curtis D, McGuffin P, Owen MJ. 1998. A meta-analysis and transmission disequilibrium study of association between the dopamine D3 receptor gene and schizophrenia. Mol Psychiatry 3:141–149.
