Working with gene lists: Finding data using GEO & BioMart
June 5, 2014
Analyzing a gene list
With hundreds of genes but a limited budget and lab personnel, you need to prioritize the gene list down to candidate genes for follow-up
Pick ones that are “interesting”:
    Known to be involved in other related processes but not (yet) in your process of interest
    Has protein features which suggest a function in your process, but it has not been characterized
    No known function or domain, but it shows up in other, related high-throughput experiments, suggesting a key role in your process of interest
Our approach
Analyzing gene lists by:
1. Finding overlap with other high-throughput experiments
2. Finding additional information using BioMart
    1. Mouse/human homologs
    2. Protein domain content
    3. GO classification
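Step 1 of the approach, finding overlap with another high-throughput experiment, comes down to intersecting two gene lists once both use the same identifier type. The following is a minimal Python sketch of that idea. The function name `overlap` and the identifiers `geneA.1`, `geneB.2` and `geneC.3` are placeholders invented for illustration; only `C04F5.7` appears in the slides.

```python
def overlap(my_genes, other_genes):
    """Return genes from my_genes that also occur in other_genes.

    Comparison ignores case and surrounding whitespace, since published
    lists often differ in formatting. Order of my_genes is preserved.
    """
    other = {g.strip().lower() for g in other_genes}
    return [g for g in my_genes if g.strip().lower() in other]


# Placeholder identifiers, not real results.
my_list = ["C04F5.7", "geneA.1", "geneB.2"]
published_list = ["c04f5.7 ", "geneB.2", "geneC.3"]

shared = overlap(my_list, published_list)
print(shared)
```

This only works if both lists use the same kind of ID (for example WormBase sequence names); converting IDs first is what the BioMart ID converter is for.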
GEO (Gene Expression Omnibus)
GEO Datasets
    Curated gene expression datasets, i.e. there is a backlog of experiments that haven’t made it into the database
    Can search for experiments and conduct differential gene expression queries on some datasets
    Can download datasets & do offline analyses
GEO Profiles
    Profiles of expression data for genes
Why search GEO?
What other experiments have been done that are similar to yours?
    GEO Datasets
How do my genes of interest behave in other large-scale experiments?
    GEO Profiles
GEO Profile search
Search on a gene name (C04F5.7):
GEO Dataset search
“C. elegans”: 4434
GEO Dataset searches
Query                             Total datasets   C. elegans datasets
C. elegans                        4434             4072
C. elegans AND response           131              121
C. elegans AND host response      5                5
C. elegans AND immune             24               20
C. elegans AND antimicrobial      109              94
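The same kind of keyword search can also be scripted against NCBI's E-utilities instead of the web page. The sketch below only builds the `esearch` URLs for the GEO DataSets database (`db=gds`) and does not send them. The helper name `geo_dataset_query_url` is made up for this example, and restricting by the `[Organism]` field is an assumption about how the "C. elegans datasets" column was obtained. Counts returned today will not match the 2014 numbers in the table.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_dataset_query_url(organism, keyword=None, retmax=20):
    """Build an esearch URL for GEO DataSets (db=gds).

    organism is restricted with the [Organism] field tag; keyword, if
    given, is combined with AND, as in the searches on the slide.
    """
    term = f"{organism}[Organism]"
    if keyword:
        term += f" AND {keyword}"
    params = {"db": "gds", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


for kw in (None, "response", "host response", "immune", "antimicrobial"):
    print(geo_dataset_query_url("Caenorhabditis elegans", kw))
```

Fetching one of these URLs returns XML containing a count and a list of record IDs; the request itself is left out here to keep the sketch runnable offline.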
Once dataset identified
Download data
    SOFT format: tab-delimited data
Issues:
    Not necessarily processed such that they have the ratios of experiment/control
    If starting with raw data, you may not be able to replicate exactly what the authors did, or may lack the expertise/software to generate a list of differentially expressed (DE) genes
Look for supplementary data from publication
    Usually they provide a list of all DE genes
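To illustrate the first issue, here is a hedged Python sketch that reads the tab-delimited table from a SOFT-style file and computes log2(experiment/control) per gene. It assumes the values are on a linear scale, which needs to be checked for each dataset because many GEO values are already log-transformed. The accession `GDS0000`, the column names `GSM_control` and `GSM_infected`, and all values are invented. A single pair of columns is used for simplicity; a real analysis would use replicates and proper statistics to call DE genes.

```python
import csv
import io
import math


def read_soft_table(text):
    """Parse the data table of a SOFT file into a list of row dicts.

    Lines starting with '^', '!' or '#' are SOFT metadata and are skipped;
    the remaining lines are treated as a tab-delimited table with a header.
    """
    data_lines = [ln for ln in text.splitlines()
                  if ln.strip() and ln[0] not in "^!#"]
    return list(csv.DictReader(io.StringIO("\n".join(data_lines)),
                               delimiter="\t"))


def log2_ratios(rows, experiment_col, control_col):
    """Return {identifier: log2(experiment/control)} for usable rows."""
    ratios = {}
    for row in rows:
        try:
            exp = float(row[experiment_col])
            ctl = float(row[control_col])
        except (ValueError, TypeError):
            continue  # missing or non-numeric value such as 'null'
        if exp > 0 and ctl > 0:
            ratios[row["IDENTIFIER"]] = math.log2(exp / ctl)
    return ratios


# Invented example data in the shape of a GDS SOFT file.
sample = (
    "^DATASET = GDS0000\n"
    "!dataset_table_begin\n"
    "ID_REF\tIDENTIFIER\tGSM_control\tGSM_infected\n"
    "probe_1\tC04F5.7\t100\t400\n"
    "probe_2\tgeneB.2\t200\t50\n"
    "probe_3\tgeneC.3\tnull\t75\n"
    "!dataset_table_end\n"
)

ratios = log2_ratios(read_soft_table(sample), "GSM_infected", "GSM_control")
print(ratios)
```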
Choice of dataset for comparison
In class demo
BioMart – EBI Ensembl
Use a series of menus
    Data source – organism (genes, variation, etc.)
    Filters – reduce the number of results
    Attributes – what data to return
Can set up very precise and multilayered queries
Can query across multiple organisms
Simple query:
    Given a list of gene IDs, you can obtain attributes or sequences for the entire list
Tools
    ID converter – very useful, easy to use
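The three menu choices (data source, filters, attributes) map directly onto the XML query that BioMart can also accept programmatically. The sketch below builds such a query in Python without sending it. The helper name `build_biomart_query` is invented, and the dataset, filter and attribute names (`celegans_gene_ensembl`, `biotype`, `ensembl_gene_id`, `external_gene_name`) are assumptions that should be checked against the names shown by the BioMart site in use.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string.

    dataset    -> the data source (organism gene set)
    filters    -> {filter_name: value}, restricts which genes are returned
    attributes -> list of attribute names, the columns to return
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Assumed names; verify them in the BioMart interface before use.
xml_query = build_biomart_query(
    dataset="celegans_gene_ensembl",
    filters={"biotype": "protein_coding"},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

The resulting XML would be submitted as the `query` parameter to a BioMart martservice endpoint; the endpoint URL is left out because it depends on which BioMart site is used.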
Two sites for BioMart access
www.biomart.org
Database journal issue on BioMart
Filtering in BioMart
Attributes in BioMart
BioMart
Filters
    C. elegans genes with a human homolog
    Specify only genes with >= # isoforms
    Protein-coding genes with a transmembrane domain
Attributes
    Entrez Gene IDs, WormBase IDs, Affy IDs
    Sequence data
        transcript, protein, UTRs, flanking regions, etc.
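BioMart results come back as a table with one row per combination of attribute values, so a gene with several domains or homologs appears on several rows. A small Python sketch of collapsing such an export to one entry per gene is shown below. The column names `gene_id`, `human_homolog` and `domain` and every value in the sample are hypothetical; a real export uses whatever attribute headers were selected.

```python
import csv
import io
from collections import defaultdict


def summarize_export(tsv_text):
    """Collapse a tab-delimited BioMart-style export to one entry per gene.

    Returns {gene_id: {"homologs": set, "domains": set}}; empty cells
    are ignored.
    """
    summary = defaultdict(lambda: {"homologs": set(), "domains": set()})
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        entry = summary[row["gene_id"]]
        if row["human_homolog"]:
            entry["homologs"].add(row["human_homolog"])
        if row["domain"]:
            entry["domains"].add(row["domain"])
    return dict(summary)


# Hypothetical export: C04F5.7 repeats because it has two domains.
export = (
    "gene_id\thuman_homolog\tdomain\n"
    "C04F5.7\tHUMAN_1\tdomain_A\n"
    "C04F5.7\tHUMAN_1\tdomain_B\n"
    "geneB.2\t\tdomain_A\n"
)

summary = summarize_export(export)
with_homolog = sorted(g for g, e in summary.items() if e["homologs"])
print(with_homolog)
```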
BioMart
In class demo
Today’s exercise
Compare current dataset from PLoS Pathogens paper to data from a different dataset
Identify & retrieve additional information about C. elegans genes using BioMart