Adding GO for Large Datasets COST Functional Modeling Workshop 22-24 April, Helsinki

Upload: latham

Posted on 23-Feb-2016


Page 1: Adding GO for Large Datasets

Adding GO for Large Datasets
COST Functional Modeling Workshop, 22-24 April, Helsinki

Page 2

Large Datasets

RNA-Seq data sets, etc.:
• large data sets
• often little functional information is available
• many enrichment analysis tools will not accept large gene lists
• RNA-Seq data sets also contain “novel” genes

Page 3

1. Finding Existing GO

1. Use GOProfiler to search by taxon or name.

2. Check the GO Consortium website to see whether your species of interest has an active annotation effort, or to determine which related species may have GO annotations that can be transferred.

3. Use QuickGO or GOProfiler to download existing GO annotations.

4. Add your own GO annotations…
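Once existing annotations have been downloaded (step 3), a quick tally of how many there are for your species, and with which evidence codes, shows how much is experimental and how much is electronic (IEA). The sketch below assumes the download is in GAF 2.x format (tab-separated, "!" comment lines, evidence code in column 7, taxon in column 13); count_evidence is a hypothetical helper, not part of QuickGO or GOProfiler.

```python
from collections import Counter

# GAF 2.x column positions (0-based) used below
EVIDENCE = 6   # column 7: evidence code
TAXON = 12     # column 13: taxon, e.g. "taxon:9031" or "taxon:9031|taxon:10090"


def count_evidence(gaf_text: str, taxon_id: str) -> Counter:
    """Count annotations per evidence code for one NCBI taxon ID (hypothetical helper)."""
    counts: Counter = Counter()
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):  # skip blank and header/comment lines
            continue
        cols = line.split("\t")
        if len(cols) <= TAXON:  # malformed or truncated line
            continue
        # the first taxon is the annotated organism; a second one, if present,
        # is an interacting organism
        organism = cols[TAXON].split("|")[0]
        if organism == f"taxon:{taxon_id}":
            counts[cols[EVIDENCE]] += 1
    return counts
```

For example, count_evidence(open("annotations.gaf").read(), "9031") would tally annotations for chicken (NCBI taxon 9031); the file name is a placeholder.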

Page 7

Download the GO annotation file from this link.

Page 8

http://geneontology.org/

Page 9

2. Adding High-throughput GO

Inputs: nt FASTA file; species' taxon ID

• nt FASTA file → EMBOSS Transeq (or similar) → aa FASTA file
• aa FASTA file → InterProScan → list of motifs and domains → InterPro2GO → GO association file (IEA, ND)
• aa FASTA file + BLAST database of EXP GO annotations for related species → GOanna/Blast2GO, etc. → GO association file (ISA)
• Combine the GO association files to make a single GO annotation file

Note: AgBase & iPlant are working to make these tools freely available via the AgBase & iPlant websites.
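The InterProScan → InterPro2GO branch of this workflow can be sketched in Python. This is an illustration under stated assumptions, not the AgBase or iPlant tooling: it assumes InterProScan TSV output (InterPro accession in column 12, "-" when absent) and the interpro2go mapping line format "InterPro:IPRnnnnnn <entry name> > GO:<term name> ; GO:nnnnnnn". Treating proteins with no mapped term as ND annotations to the three GO root terms is an assumption about what the "(IEA, ND)" output contains; load_interpro2go and interproscan_to_go are hypothetical names.

```python
import re
from collections import defaultdict

# ND annotations are made to the root term of each GO aspect
ROOT_TERMS = ("GO:0003674", "GO:0008150", "GO:0005575")  # MF, BP, CC

# Assumed interpro2go line format:
# InterPro:IPRnnnnnn <entry name> > GO:<term name> ; GO:nnnnnnn
MAPPING_LINE = re.compile(r"^InterPro:(IPR\d+)\s.*;\s*(GO:\d{7})\s*$")


def load_interpro2go(text):
    """Return {InterPro ID: set of GO IDs}; comment lines ('!') do not match."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        match = MAPPING_LINE.match(line)
        if match:
            mapping[match.group(1)].add(match.group(2))
    return mapping


def interproscan_to_go(tsv_text, mapping, protein_ids):
    """Yield (protein ID, GO ID, evidence code) tuples.

    tsv_text is InterProScan TSV output; column 12 holds the InterPro
    accession, or '-' when the signature has no InterPro entry.
    """
    go_terms = defaultdict(set)
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if len(cols) > 11 and cols[11].startswith("IPR"):
            go_terms[cols[0]] |= mapping.get(cols[11], set())
    for protein in protein_ids:
        terms = go_terms.get(protein, set())
        if terms:
            for go_id in sorted(terms):
                yield protein, go_id, "IEA"
        else:
            # assumption: no mapped term -> ND to all three root terms
            for go_id in ROOT_TERMS:
                yield protein, go_id, "ND"
```

The species' taxon ID would then be written into column 13 when these tuples are formatted as GAF lines.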

Page 10

http://www.ebi.ac.uk/Tools/emboss

Page 11

Comments
1. Translating transcripts to proteins:
• there are many different programs
• most assume proteins are > 100 aa
• most assume that the protein is translated from the longest ORF
• EMBOSS: free and high-throughput; also available on Galaxy and iPlant
2. InterProScan:
• searches sequences for conserved domains and motifs
• very computationally intensive (needs HPC)
• online tools at EBI are limited to proteins and are low throughput
• iPlant is preparing an instance
• AgBase can help
3. InterPro2GO:
• maps InterPro IDs to their corresponding GO IDs
• available at geneontology.org
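The longest-ORF convention from comment 1 can be illustrated with a small pure-Python translator. This is a swapped-in sketch, not EMBOSS: it uses the standard genetic code, scans all six reading frames, keeps the longest stretch that starts with Met and runs to a stop codon (or to the end of the transcript), and applies a 100-aa cutoff by default. The names translate and longest_orf are hypothetical; for full datasets the slide points to EMBOSS (also on Galaxy and iPlant).

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI translation table 1); codons are enumerated
# TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate one reading frame; '*' marks stops, 'X' unknown codons (e.g. with N)."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf(transcript: str, min_len: int = 100) -> Optional[str]:
    """Return the longest Met-initiated protein over six frames, or None if < min_len aa."""
    seq = transcript.upper().replace("U", "T")
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    best = ""
    for strand in (seq, reverse_complement):
        for frame in range(3):
            # each piece between stop codons is stop-free; the last piece may
            # run to the end of the transcript without a stop codon
            for piece in translate(strand[frame:]).split("*"):
                start = piece.find("M")
                if start != -1 and len(piece) - start > len(best):
                    best = piece[start:]
    return best if len(best) >= min_len else None
```

Transcripts for which longest_orf returns None fall below the cutoff and can be set aside or rerun with a smaller min_len.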

Page 12

Comments
4. Adding GO using Blast:
• Identify related species that have experimental GO annotations.
• Search a database of experimental GO annotations (do not transfer annotations with IEA, ISS, etc. evidence codes).
• Use a test set of sequences to choose Blast parameters (e.g. the E-value (expect) threshold) for the full dataset.
5. Combining GO from InterProScan & Blast:
• Remove any duplicate annotations derived from both InterProScan (IEA) and Blast (ISA).
• Remove any “no data” (ND) annotations where you have added an annotation using Blast.
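Steps 4 and 5 can be sketched together. Several choices here are assumptions rather than the slide's method: BLAST results are read as tabular output (-outfmt 6, E-value in column 11, bit score in column 12), only the single best-scoring hit per query is used, the max_evalue default of 1e-10 is a placeholder to be tuned on a test set, the experimental codes are limited to EXP, IDA, IPI, IMP, IGI and IEP, and when InterProScan and Blast give the same gene/term pair the ISA copy is kept. The reference dictionary of experimental annotations for the related species is assumed to have been built beforehand; blast_to_isa and combine are hypothetical names.

```python
# Experimental evidence codes (assumed list; IEA, ISS, etc. are never transferred)
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def blast_to_isa(blast_tab, reference, max_evalue=1e-10):
    """Transfer experimental GO terms from each query's best BLAST hit as ISA.

    blast_tab: BLAST tabular output (-outfmt 6); column 11 is the E-value,
    column 12 the bit score.
    reference: {subject ID: [(GO ID, evidence code), ...]} for the related species.
    """
    best_hit = {}  # query -> (subject, bit score)
    for line in blast_tab.splitlines():
        cols = line.split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        evalue, bits = float(cols[10]), float(cols[11])
        if evalue <= max_evalue and (query not in best_hit or bits > best_hit[query][1]):
            best_hit[query] = (subject, bits)
    isa = set()
    for query, (subject, _) in best_hit.items():
        for go_id, code in reference.get(subject, []):
            if code in EXPERIMENTAL:
                isa.add((query, go_id, "ISA"))
    return isa


def combine(iea_nd, isa):
    """Merge InterProScan (IEA/ND) and Blast (ISA) annotations into one sorted list."""
    blast_genes = {gene for gene, _, _ in isa}
    isa_pairs = {(gene, go_id) for gene, go_id, _ in isa}
    merged = set(isa)
    for gene, go_id, code in iea_nd:
        if code == "ND" and gene in blast_genes:
            continue  # Blast added an annotation, so the ND is no longer needed
        if (gene, go_id) in isa_pairs:
            continue  # duplicate gene/term pair; assumption: keep the ISA copy
        merged.add((gene, go_id, code))
    return sorted(merged)
```

Keeping only the best hit is the most conservative transfer; a test set (as the slide suggests) would show whether accepting more hits per query adds useful annotations.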

Note: GO IEA annotations are continually updated (the mappings behind them are manually reviewed) and are considered out of date after one year.
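The one-year rule for IEA annotations can be checked from the annotation date in a GAF file. A minimal sketch, assuming GAF 2.x (evidence code in column 7, date as YYYYMMDD in column 14) and approximating one year as 365 days; stale_iea is a hypothetical helper.

```python
from datetime import date, timedelta


def stale_iea(gaf_text, today, max_age_days=365):
    """Return (object ID, GO ID, date) for IEA annotations older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):  # skip blank and header/comment lines
            continue
        cols = line.split("\t")
        if len(cols) < 14 or cols[6] != "IEA":  # column 7: evidence code
            continue
        d = cols[13]  # column 14: annotation date, YYYYMMDD
        made = date(int(d[:4]), int(d[4:6]), int(d[6:8]))
        if made < cutoff:
            stale.append((cols[1], cols[4], made))
    return stale
```

Annotations it reports are candidates for regenerating with a current InterPro2GO mapping.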

Page 13

For help with adding GO, contact AgBase.