Working with gene lists: Finding data using GEO & BioMart

June 5, 2014

Analyzing a gene list

With hundreds of genes but a limited budget and lab personnel, you need to prioritize the gene list to candidate genes for follow-up

Pick ones that are “interesting”:

  Known to be involved in other related processes but not (yet) in your process of interest

  Has protein features which suggest a function in your process, but it has not been characterized

  No known function or domain, but it shows up in other, related high-throughput experiments suggesting a key role in your process of interest

Our approach

Analyzing gene lists by:

1. Finding overlap with other high-throughput experiments

2. Finding additional information using BioMart

   1. Mouse/human homologs
   2. Protein domain content
   3. GO classification
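Once both lists use the same identifier type, finding the overlap with another experiment comes down to a set intersection. A minimal Python sketch, assuming case and stray whitespace are the only spelling differences; the gene IDs are placeholders (apart from C04F5.7, which is used later in these slides), and `normalize` and `overlap` are hypothetical helper names:

```python
def normalize(gene_ids):
    """Strip whitespace and upper-case IDs so trivially different spellings match."""
    return {g.strip().upper() for g in gene_ids if g.strip()}


def overlap(my_genes, other_genes):
    """Return the sorted list of gene IDs present in both lists."""
    return sorted(normalize(my_genes) & normalize(other_genes))


# Placeholder lists for illustration only (not real results).
my_list = ["C04F5.7", "geneA", "geneB ", "geneC"]
published_list = ["geneb", "geneD", "c04f5.7"]

shared = overlap(my_list, published_list)
print(shared)
```

If the two experiments report different ID types (for example probe IDs versus sequence names), the lists need to be converted to a common ID first, which is where the BioMart ID converter comes in.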

GEO (Gene Expression Omnibus)

GEO Datasets

  Curated gene expression datasets, i.e. there is a backlog of experiments that haven’t made it into the database

  Can search for experiments and conduct differential gene expression queries on some datasets

  Can download datasets & do offline analyses

GEO Profiles

  Profiles of expression data for genes

Why search GEO?

What other experiments have been done that are similar to yours?

  GEO datasets

How do my genes of interest behave in other large scale experiments?

  GEO profiles

GEO Profile search

Search on a gene name (C04F5.7):

GEO Dataset search

“C. elegans”: 4434

GEO Dataset searches

Query                           Total datasets   C. elegans datasets

C. elegans                      4434             4072

C. elegans AND response         131              121

C. elegans AND host response    5                5

C. elegans AND immune           24               20

C. elegans AND antimicrobial    109              94
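The same queries can be scripted against NCBI's E-utilities instead of typed into the web form. The sketch below only builds the search URLs and does not contact NCBI. It assumes that the "C. elegans datasets" column corresponds to restricting the hits by organism, which the slides do not state; `geo_search_url` is a hypothetical helper name. Counts returned today will differ from the 2014 numbers above.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, db="gds", organism=None):
    """Build an E-utilities esearch URL for a GEO query.

    db="gds" targets GEO DataSets; db="geoprofiles" targets GEO Profiles.
    """
    if organism:
        # Assumption: restrict hits to one organism with the [Organism] field tag.
        term = f"({term}) AND {organism}[Organism]"
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmode": "json"})


queries = [
    "C. elegans",
    "C. elegans AND response",
    "C. elegans AND host response",
    "C. elegans AND immune",
    "C. elegans AND antimicrobial",
]

urls = [geo_search_url(q) for q in queries]
restricted = [geo_search_url(q, organism="Caenorhabditis elegans") for q in queries]
for url in urls + restricted:
    print(url)
```

Passing `db="geoprofiles"` with a gene name such as C04F5.7 gives the equivalent of the GEO Profile search.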

Once dataset identified

Download data

  SOFT format: tab-delimited data

  Issues:

    Not necessarily processed such that they have the ratios of experiment/control

    If starting with raw data, may not be able to replicate exactly what authors did or lack expertise/software to generate a list of DE genes

Look for supplementary data from publication

  Usually they provide a list of all DE genes
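When a downloaded SOFT file has no experiment/control ratios, they can be computed offline. The following is a rough Python sketch, not a substitute for a proper differential expression analysis. It assumes a file with a single data table whose values are linear-scale intensities (if the values are already log-transformed, subtract them instead of dividing). The sample column names and numbers are made up, and `read_soft_table` and `log2_ratios` are hypothetical helper names.

```python
import csv
import io
import math


def read_soft_table(handle):
    """Yield one dict per data row of a SOFT file, skipping metadata lines.

    Lines starting with ^, ! or # carry metadata; the remaining
    tab-delimited lines form the data table, header row first.
    """
    data_lines = (line for line in handle if line.strip() and line[0] not in "^!#")
    yield from csv.DictReader(data_lines, delimiter="\t")


def log2_ratios(rows, experiment_col, control_col):
    """Compute log2(experiment/control) per row, skipping unusable values."""
    ratios = {}
    for row in rows:
        try:
            exp, ctl = float(row[experiment_col]), float(row[control_col])
        except (TypeError, ValueError, KeyError):
            continue  # missing column or non-numeric entry such as "null"
        if exp > 0 and ctl > 0:
            ratios[row["ID_REF"]] = math.log2(exp / ctl)
    return ratios


# Tiny made-up example in SOFT layout (values are not real data).
example = io.StringIO(
    "^DATASET = GDSxxxx\n"
    "!dataset_table_begin\n"
    "ID_REF\tIDENTIFIER\tGSM_control\tGSM_infected\n"
    "probe_1\tgeneA\t100\t400\n"
    "probe_2\tgeneB\t200\t50\n"
    "probe_3\tgeneC\tnull\t75\n"
    "!dataset_table_end\n"
)
ratios = log2_ratios(read_soft_table(example), "GSM_infected", "GSM_control")
for probe, value in ratios.items():
    print(probe, value)
```

With replicate samples you would average or model the replicates rather than compare two single columns.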

Choice of dataset for comparison

In class demo

BioMart – EBI Ensembl

Use series of menus

  Data source – organism (genes, variation, etc.)

  Filters – reduce the number of results

  Attributes – what data to return

Can set up very precise and multilayered queries

Can query across multiple organisms

Simple query:

  Given a list of gene IDs, you can obtain attributes or sequences for the entire list

Tools

  ID converter – very useful, easy to use
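The three menus (data source, filters, attributes) map directly onto BioMart's XML query format, which is what the web interface generates behind its "XML" button. The sketch below builds such a query for a list of gene IDs without sending it anywhere. The dataset, filter and attribute names are assumptions based on Ensembl naming and may differ between releases; the WBGene IDs are only example inputs, and `build_biomart_query` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string.

    dataset    -- data source, e.g. the C. elegans gene dataset
    filters    -- dict of filter name -> value (lists are comma-joined)
    attributes -- list of attribute names to return as columns
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


# Names below are assumptions; check them against the BioMart menus you are using.
xml_query = build_biomart_query(
    dataset="celegans_gene_ensembl",
    filters={"ensembl_gene_id": ["WBGene00000001", "WBGene00000002"]},
    attributes=["ensembl_gene_id", "external_gene_name", "go_id"],
)
print(xml_query)
```

The resulting string is what gets submitted as the `query` parameter to a BioMart "martservice" endpoint; building it in code makes it easy to rerun the same query on a new gene list.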

Two sites for BioMart access

www.biomart.org

Database journal issue on BioMart

Filtering in BioMart

Attributes in BioMart

BioMart

Filters

  C. elegans genes with a human homolog

  Specify only genes with >= # isoforms

  Protein coding genes with a transmembrane domain

Attributes

  Entrez Gene IDs, WormBase IDs, Affy IDs

  Sequence data

    transcript, protein, UTRs, flanking regions, etc.
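Attributes that can have several values per gene (GO terms, protein domains, homologs) come back from BioMart as one row per gene/value pair, so the exported table usually needs to be collapsed to one entry per gene. A small Python sketch of that step, assuming a tab-delimited export with a header line; the column headers, gene names and GO assignments below are made up for illustration, and `collect_by_gene` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def collect_by_gene(tsv_handle, id_col, value_col):
    """Map each gene ID to the sorted list of distinct values in value_col."""
    grouped = defaultdict(set)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        values = grouped[row[id_col]]  # keeps genes that have no annotation at all
        if row[value_col]:             # empty cell means no value for this row
            values.add(row[value_col])
    return {gene: sorted(values) for gene, values in grouped.items()}


# Made-up rows laid out like a tab-delimited export (not real annotations).
export = io.StringIO(
    "Gene stable ID\tGO term accession\n"
    "geneA\tGO:0045087\n"
    "geneA\tGO:0006955\n"
    "geneA\tGO:0006955\n"
    "geneB\tGO:0006955\n"
    "geneC\t\n"
)

go_by_gene = collect_by_gene(export, "Gene stable ID", "GO term accession")
for gene, terms in go_by_gene.items():
    print(gene, terms)
```

Genes that come back with an empty list are the "no known function" candidates from the prioritization slide.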

BioMart

In class demo

Today’s exercise

Compare current dataset from PLoS Pathogens paper to data from a different dataset

Identify & retrieve additional information about C. elegans genes using BioMart