The GeneCards™ Project at the Weizmann Institute of Science




Page 1

The GeneCards™ Project at the Weizmann Institute of Science

Page 2

Page 3

• For each gene - a card with displayed data and links to entries in major databases

• Genes with HUGO nomenclature symbols and others

• Automatic data mining and integration

• Advanced human-computer interaction

http://bioinformatics.weizmann.ac.il/cards/

Page 4

[Figure: gene-related terms: gene, chromosome, chromosomal location, genetic map, mutation, medical applications, protein, research article, similar mouse gene, marker, RNA, gene alias, disease, DNA sequence]

Page 5

Databases Containing Human Genome Information

Swiss-Prot, GenBank, EMBL, DDBJ, Sanger Centre, Whitehead/MIT, WashU, GESTEC, UniGene, TIGR, GAC, Stanford, GeneMap'98, CEPH, Genethon, CHLC, Marshfield, Utah, GDB, LDB, UDB, NCBI, GENATLAS, PIR, BLOCKS, PRODOM, PRINTS, Pfam, PDB, OMIM, GeneCards, TGD, IMGT, PKR, COPE, HGMD, dbSNP, BRCA1, CFTR, TP53, HOVERGEN

Page 6

GeneCards: From Chaos to Order

Data is retrieved and integrated automatically

A card for each gene

• Aliases
• DNA, RNA
• Protein
• Chromosomal location
• Disorders
• Medical applications
• Related mouse gene
• Research articles
• Links to more data

Page 7

Data Related to Genes

GENE

• Nucleotide SEQUENCE: genomic/cDNA, coding/regulatory
• VARIATION (polymorphism, mutation)
• Chromosomal LOCATION
• EXPRESSION (tissues, developmental, disease)
• PROTEIN: sequence, domains, 3D; subcellular location; 2D electrophoresis
• Biological PATHWAYS
• DISEASE
• PHARMA (diagnostics, vaccines, drugs)
• ORTHOLOGS (model organisms, knockout)
• Commercial DNA ARRAYS
• PATENTS

Page 8

GeneCard: Integrated Data and Starting Point

[Diagram labels: Mining and Integration of Data; GeneCard; Entries in Data Sources of GeneCards; other Data Sources; "link to" arrows; A Starting Point for More Data]

Page 9

A typical GeneCard: RUNX1

HUGO nomenclature gene symbol

Accession ID to other databases

If chromosome 21

LocusLink or HUGO location

Page 10

For chromosome 21 only

Information on proteins

Sequence accessions

Page 11

Disorders and mutations

Medical news from Doctor’s Guide

Published literature

Single nucleotide polymorphisms

Homologues

Page 12

Snapshot of additional GeneCard fields

Additional information

Start new search

Page 13

Improved Single Nucleotide Polymorphisms Summaries

Page 14

Current GeneCards Data Sources and Links

HUGO GDB OMIM SWISS-PROT

LocusLink UDB UniGene MGD DOTS UCSC

GenBank PubMed CroW 21 Doctor’s Guide

HUGE euGenes Genatlas ATLAS HGMD TGDB

BCGD MTDB RZPD MIPS PDB BLOCKS

HORDE dbSNP ENSEMBL SBCELEGANS

GeneLynx IMGT SOURCE

Page 15

Gene sources

HUGO

LocusLink

CroW 21

MGD

13,046

360

63

8,951

Page 16

How to search and find?

Simple search box

results

no results

spell corrections

query modification

outside resources

gene 1: name ... -keyword ... ... ... -keyword .

gene 2: name -keyword ...

search keywords
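The search slide lists spell corrections as one fallback when a query returns no results. The slides do not say how GeneCards computes its corrections, so the following is only a minimal Python sketch of the idea (the project itself is written in Perl). The index contents, the `search` function and the 0.75 similarity cutoff are all hypothetical; fuzzy matching uses the standard-library `difflib`.

```python
import difflib

# Hypothetical mini-index: gene symbol -> searchable text.
# Placeholder descriptions only; this is not the real GeneCards data layout.
INDEX = {
    "RUNX1": "runt-related transcription factor 1 leukemia",
    "TP53": "tumor protein p53 cancer",
    "CFTR": "cystic fibrosis transmembrane conductance regulator",
    "BRCA1": "breast cancer 1 early onset",
}


def search(keyword, index=INDEX):
    """Return (hits, suggestions); suggestions are offered only when there are no hits."""
    needle = keyword.lower()
    hits = sorted(
        symbol
        for symbol, text in index.items()
        if needle in symbol.lower() or needle in text.lower()
    )
    if hits:
        return hits, []
    # No results: propose close spellings drawn from symbols and indexed words.
    vocabulary = {symbol.lower() for symbol in index}
    for text in index.values():
        vocabulary.update(text.lower().split())
    suggestions = difflib.get_close_matches(needle, sorted(vocabulary), n=3, cutoff=0.75)
    return [], suggestions
```

Calling `search("cancer")` returns the matching symbols with no suggestions, while a misspelled keyword such as `"leukemai"` returns no hits and a short list of close spellings that the interface could offer back to the user.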

Page 17

Some GeneCards Statistics

27,612 GeneCards (November 2001)

13,548 HUGO approved genes

2,646,185 accesses to GeneCards (at WIS since January 1, 1998)

25 mirror sites around the world

Page 18

Page 19

Page 20

The Affymetrix System

Page 21

GeneChip Procedure

Sample preparation → Hybridization → Signal detection → Data analysis

Fluidic station, Scanner, Software

Page 22

ChipCards - A Functional Integration Tool for DNA Array Data

Tsviya Olender, Shirley Horn-Saban, Marilyn Safran, Vered Chalifa-Caspi, Michal Ronen and Doron Lancet

The Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100

Page 23

About ChipCards

ChipCards correlates DNA array data with comprehensive information from gene-specific databases. It is currently implemented for the Affymetrix GeneChip.

The ChipCards output is an HTML table with essential additional information for each gene, including: gene symbol, functional definition, accession number, protein information, chromosomal location and EST data.

Human data is integrated with GeneCards, UDB and UniGene.

Mouse data is integrated with information about the human orthologue via GeneCards, HomoloGene and MGD.
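The ChipCards description amounts to a join between array rows and per-gene annotation, rendered as an HTML table. A minimal Python sketch of that join follows. It is not the ChipCards implementation: the accession numbers, gene symbols, column names and the `annotate` / `to_html_table` helpers are all hypothetical placeholders.

```python
from html import escape

# Hypothetical annotation lookup keyed by accession number.
# Placeholder values, not real database records.
ANNOTATIONS = {
    "ACC0001": {"symbol": "GENE1", "definition": "example kinase", "location": "21q"},
    "ACC0002": {"symbol": "GENE2", "definition": "example receptor", "location": "7p"},
}

# Array columns first, then the annotation columns added by the join.
COLUMNS = ["probe_set", "accession", "signal", "symbol", "definition", "location"]


def annotate(rows, annotations=ANNOTATIONS):
    """Merge each array row with gene annotation; unknown accessions get empty fields."""
    merged = []
    for row in rows:
        extra = annotations.get(row["accession"], {})
        merged.append({col: row.get(col, extra.get(col, "")) for col in COLUMNS})
    return merged


def to_html_table(records):
    """Render the merged records as a plain HTML table, escaping all cell text."""
    head = "".join(f"<th>{escape(col)}</th>" for col in COLUMNS)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(rec[col]))}</td>" for col in COLUMNS) + "</tr>"
        for rec in records
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

A call such as `to_html_table(annotate(rows))`, where each row is a dict with `probe_set`, `accession` and `signal` keys, yields one table row per probe set. Probe sets whose accession is missing from the lookup keep their measured values and get empty annotation cells rather than being dropped.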

Page 24

Example of GeneChip output before ChipCards processing

Page 25

An Extract of Human Expression Data After ChipCards Processing

A snapshot of ChipCards output, with human Affymetrix expression data as input. Each probe set has a link to NCBI, GeneCards and UDB. Information about the cDNA sources of the gene is extracted from UniGene and given as a separate column in the table, as are the UDB coordinates.

[Screenshot callouts: NCBI link, GeneCards link, UDB link]

Page 26

Murine Expression Data After ChipCards Processing

A snapshot of ChipCards output for mouse Affymetrix expression data. Each probe set is linked to NCBI and UniGene. Information about the human orthologue is integrated into the table and includes links to NCBI, GeneCards and UniGene.

[Screenshot callouts: NCBI link, murine UniGene link, human orthologue data, NCBI link, human UniGene link, GeneCards link]

Page 27

Current Research - Adding Cards for Genes that Don’t Yet Have a Name

[Diagram with steps numbered 1 to 5. Labels: UniGene cluster; assembly-based resources; gene sequence tag; unique persistent gene identifier; GeneCard for novel gene]

Page 28

Version 3.0 Project Goals

Improving flexibility, allowing automated parameterized generation from partial sets of sources and/or genes, and appending to an existing database

Providing an Application Programming Interface for users of the generation software to incorporate their own data

Standardizing the format of the database to use XML

Page 29

Project Goals (cont’d)

Providing a foundation for supplying a stable identifier for each GeneCard, even when no known gene symbol exists

Improving the maintainability, testability, and quality of the software

Providing a seamless migration path from Version 2.xx while maintaining the current look and feel and functionality

Page 30

Pros and Cons of Using OOP

Cons:
• Perl not originally designed as an OOP language
• Type safety, proper encapsulation and aggregation aren’t enforced
• Can be between 20 and 50% slower

Pros:
• Allows for more robust implementations
• Greater modularity
• More comprehensible interface to modules
• Better abstraction of software components
• Less namespace pollution
• Greater code reusability
• Software scalability
• Cleaner and more compact code

Page 31

The 3.0 Hybrid Solution

Combines an object-oriented skeleton with some non-object-oriented internals

• The large data structure of gene-based data is implemented as a hash of hashes, avoiding numerous costly instantiations

• All other major components, including the extractors and administration classes, are implemented as objects
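The hybrid design on this slide can be illustrated with a short Python sketch, in which a dict of dicts plays the role of Perl's hash of hashes. This is an illustration under stated assumptions, not the GeneCards code: the `Extractor` class, its `fill` method, the `build_cards` function and the field names are all hypothetical.

```python
class Extractor:
    """Object-oriented component: one instance per data source (hypothetical interface)."""

    def __init__(self, source, records):
        self.source = source
        self._records = records  # symbol -> {field: value}

    def fill(self, cards):
        """Write this source's fields into the shared dict of dicts."""
        for symbol, fields in self._records.items():
            card = cards.setdefault(symbol, {})
            for field, value in fields.items():
                # Keep the originating source next to each value.
                card.setdefault(field, []).append((self.source, value))


def build_cards(extractors):
    """Gene data stays a plain nested dict; no per-gene object is instantiated."""
    cards = {}
    for extractor in extractors:
        extractor.fill(cards)
    return cards
```

Only a handful of extractor objects are created, one per source, while the per-gene records (tens of thousands in the real database) remain plain nested mappings. That is the trade-off the slide describes: object structure where it helps modularity, raw hashes where instantiation cost would dominate.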

Page 32

GeneCards Architecture

GeneCards Database

Generation Software

SwissProt Extractor

Customized Extractor

UniGene Extractor

Support Functions

API

Display Software

Page 33

Generation Software Classes

An underlying layer of support tools that manage extracting data from locally mirrored files and the internet, proxy connections, verification, security, file management, caching, conflict detection, error handling, statistics, and XML output formatting

A set of extractor classes, one for each source of information, using source-specific algorithms and heuristics (adapted from previous versions of GeneCards). Methods include new, prepare and search

A template for building extractor classes. All such classes can create new entries or append to old ones, and can generate data for all entries (genes) at once or one at a time

A main class that handles building sets of cards according to parameterized partial ordering rules
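The class layout on this slide (an extractor template with new, prepare and search methods, plus a main class that orders the work) can be sketched as follows. This is a hedged Python illustration, not the Perl implementation: Perl's `new` is mapped to `__init__`, the partial ordering rules are assumed to mean "this extractor runs after those", and all class names other than the three method names from the slide are hypothetical. Ordering uses the standard-library `graphlib` (Python 3.9 or later).

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # Python 3.9+


class ExtractorTemplate(ABC):
    """Template for source-specific extractors; Perl's `new` maps to __init__ here."""

    name = "BASE"

    def __init__(self):
        self.ready = False

    def prepare(self):
        """Load or index the source (a mirrored file, for example) before searching."""
        self.ready = True

    @abstractmethod
    def search(self, symbol):
        """Return a dict of fields for one gene, or an empty dict."""

    def extract(self, cards, symbols):
        """Create new entries or append to existing ones, one gene at a time."""
        if not self.ready:
            self.prepare()
        for symbol in symbols:
            cards.setdefault(symbol, {})[self.name] = self.search(symbol)


class DictExtractor(ExtractorTemplate):
    """Toy extractor backed by an in-memory dict (stands in for a real source)."""

    def __init__(self, name, data):
        super().__init__()
        self.name = name
        self.data = data

    def search(self, symbol):
        return dict(self.data.get(symbol, {}))


class Generator:
    """Main class: runs extractors in an order consistent with 'runs after' rules."""

    def __init__(self, extractors, run_after):
        self.extractors = {e.name: e for e in extractors}
        self.run_after = run_after  # name -> names that must run first

    def build(self, symbols, cards=None):
        cards = {} if cards is None else cards  # pass an existing dict to append
        known = set(self.extractors)
        # Ignore rules that mention extractors not included in this partial run.
        graph = {name: set(self.run_after.get(name, ())) & known for name in known}
        for name in TopologicalSorter(graph).static_order():
            self.extractors[name].extract(cards, symbols)
        return cards
```

Passing a subset of extractors or a subset of symbols to `Generator.build` gives the partial, parameterized generation named in the project goals, and passing an existing `cards` dict appends to it instead of starting over.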

Page 34

The XML-Based Database

XML is a meta-language that supports customized tags for describing and providing semantic meaning to structured data

Typed elements are arranged within other elements to form a nested hierarchy

The data is grouped by source in the XML files, but can be retrieved by function:

<GCresource>SWISSPROT
  <protein> </protein>
</GCresource>
<GCresource>OMIM
  <disorder>Colorectal Cancer</disorder>
  <disorder>Germline Cancer</disorder>
</GCresource>
<GCresource>GENECLINICS
  <disorder>Li-Fraumeni Syndrome</disorder>
</GCresource>

Each extractor module is responsible for its own Document Type Definition (DTD) specification to ensure that the XML is well formed and valid

Files are stored in a hierarchical directory structure, one file per gene
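The statement that data is grouped by source but can be retrieved by function can be sketched with Python's standard `xml.etree.ElementTree`. The layout below is an assumption made so the example parses as a single document: a root `GeneCard` element is added and the source name is moved into a `name` attribute. The real GeneCards schema and DTDs may differ, and the `by_function` helper and the protein text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the slide's example (root element and "name"
# attribute are additions for illustration; not the actual GeneCards schema).
CARD_XML = """
<GeneCard symbol="EXAMPLE">
  <GCresource name="SWISSPROT">
    <protein>example protein entry</protein>
  </GCresource>
  <GCresource name="OMIM">
    <disorder>Colorectal Cancer</disorder>
    <disorder>Germline Cancer</disorder>
  </GCresource>
  <GCresource name="GENECLINICS">
    <disorder>Li-Fraumeni Syndrome</disorder>
  </GCresource>
</GeneCard>
"""


def by_function(xml_text, tag):
    """Collect (source, text) pairs for one kind of element across all sources."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for resource in root.findall("GCresource"):
        for element in resource.findall(tag):
            results.append((resource.get("name"), (element.text or "").strip()))
    return results
```

Here `by_function(CARD_XML, "disorder")` gathers the disorder entries from both OMIM and GENECLINICS in document order, even though they are stored under separate sources. With one file per gene, a display script would only need to parse the single file for the requested card.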

Page 35

The Display Software

Currently in the design phase

The goal is to maintain the current look and feel while providing the flexibility of easy customization

Will use Perl XML parser modules in CGI scripts

Search will be expanded beyond current text-based capabilities to include context-specific searches

Page 36

3.0 Project Status and Open Issues

Procedural programs/ad-hoc flat file format

Object-oriented methodology/standardized XML

Easy to add new extractors

Flexible and extensible

Performance, searching strategies

Page 37

Unified Database (UDB)

Data mining and integration

Integrated chromosomal maps

Source-specific information

Thesaurus

Original public databases

Data mining

Semantic Integration

Megabase Integration

Page 38

Sequence-Based Repositioning (SBR)

Placing finished genomic sequences on UDB map.

Map fine tuning in sequenced regions.

Page 39

SBR (Sequence-Based Repositioning)

Elimination of overlaps between contigs

Object repositioning

UDB original map

SBR map

Page 40

Search Results - a Map Slice

to MarkerCard

to UniGene

to GeneCard

Page 41

A MarkerCard

Page 42

GeneCards Success Stories

• GeneCards as a bookmark for linkage analysis
• Mutations that were polymorphisms and not disease-causing
• Adult-onset diabetes without obesity in India
• Work on Chromosome 21 at the Weizmann Institute
• PVT – a heart disease found in Israeli Bedouins
• Parkinson’s disease paper

Page 43

Frequently Asked Questions

• What’s special about GeneCards?

• Can I interface my own data?

• Can I access my own in-house database mirrors instead of public internet sites?

Page 44

GeneCards/UDB Team

alumni: Michael Rebhan, Shai Shen-Orr, Inga Peter, Jaime Prilusky, Michal Ronen, Hershel Safer, Julie Stampnitzky, Liora Yaar

current: Avital Adato, Vered Chalifa-Caspi, Michal Lapidot, Zvia Olender, Naomi Rosen, Marilyn Safran (head), Orit Shmueli, Irina Solomon, Doron Lancet (PI)