Genome reannotation: dealing with the atypical, the ambiguous, and the contrary
Release 3.2 contributors
Kathy Campbell, Lynn Crosby, Beverley Matthews, Andy Schroeder, Brian Bettencourt, Yanmei Huang, Leyla Bayraktaroglu, Pavel Hradecky, Gillian Millburn, Sima Misra, Chris Smith, Eleanor Whitfield, Peili Zhang, Pinglei Zhou
Bottom lines
Annotate generously
  Criteria should not be too stringent
Label the ambiguous and atypical
  Define a “problematic” category
  Use a controlled vocabulary (CV) to describe them
  Devise a confidence-rating system or an evidence tally system
Comments for validation flags
  Unusual splice
  Short CDS
  Short intron
  Overlaps transposon
  Unconventional translation start
  Multiphase exon
  CDS overlap
  Dicistronic
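Several of these flags can be raised mechanically from a transcript model. The Python sketch below covers four of them (unusual splice, short CDS, short intron, unconventional translation start). The `TranscriptModel` fields, the `validation_flags` function, and both numeric cutoffs are hypothetical choices for illustration, not the criteria actually used for Release 3.2. For simplicity it also assumes that only GT-AG introns count as canonical.

```python
from dataclasses import dataclass, field

# Hypothetical cutoffs for illustration only (not Release 3.2 criteria).
MIN_CDS_NT = 300     # a CDS shorter than this is flagged "Short CDS"
MIN_INTRON_NT = 40   # an intron shorter than this is flagged "Short intron"

# Assumption: only GT...AG introns are treated as canonical here.
CANONICAL_SPLICE = ("GT", "AG")


@dataclass
class TranscriptModel:
    """Hypothetical minimal description of one annotated transcript."""
    cds_length: int                       # CDS length in nucleotides
    start_codon: str                      # e.g. "ATG"
    intron_lengths: list[int] = field(default_factory=list)
    # (donor, acceptor) dinucleotides, one pair per intron
    splice_sites: list[tuple[str, str]] = field(default_factory=list)


def validation_flags(model: TranscriptModel) -> list[str]:
    """Return the validation-flag comments that apply to a transcript model."""
    flags = []
    if any((d.upper(), a.upper()) != CANONICAL_SPLICE
           for d, a in model.splice_sites):
        flags.append("Unusual splice")
    if model.cds_length < MIN_CDS_NT:
        flags.append("Short CDS")
    if any(n < MIN_INTRON_NT for n in model.intron_lengths):
        flags.append("Short intron")
    if model.start_codon.upper() != "ATG":
        flags.append("Unconventional translation start")
    return flags


# Example: a short CTG-initiated CDS with one GC-AG intron.
example = TranscriptModel(cds_length=180, start_codon="CTG",
                          intron_lengths=[55, 120],
                          splice_sites=[("GT", "AG"), ("GC", "AG")])
print(validation_flags(example))
# ['Unusual splice', 'Short CDS', 'Unconventional translation start']
```

Flags such as “Overlaps transposon” or “CDS overlap” depend on the coordinates of neighbouring features and are left out of this sketch.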
The dubious annotation
  Categorized as problematic/provisional
  Described using controlled comments: “Short CDS”, “Gene prediction only”, “Possible gene fragment”
  Allows capture of the ORF without condoning the gene model
The dubious transcript
  Problematic transcript: “Truncated ORF”, “Supported by single cDNA”
  Controlled comments distinguish between:
    Truncated ORF
    Short CDS relative to cDNA length (stops throughout; no long ORF)
    Short CDS (as in the previous case)
Annotated, but…
  Third transcript classified as problematic
  Can be excluded
  Clearly flagged
  Controlled comments: “Truncated ORF”, “Supported by single cDNA”, “Suspect cDNA: possible unspliced intron”
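The idea of annotating a transcript while keeping it easy to exclude can be sketched in Python. The `Transcript` record, the `rigorous_subset` helper, and the transcript names are hypothetical; the sketch also assumes that a transcript carrying any problematic controlled comment counts as problematic. The comment strings are the ones quoted on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class Transcript:
    name: str
    # Controlled comments; a non-empty list marks the transcript as problematic.
    problematic: list[str] = field(default_factory=list)

    @property
    def is_problematic(self) -> bool:
        return bool(self.problematic)


def rigorous_subset(transcripts: list[Transcript]) -> list[Transcript]:
    """Keep only transcripts that carry no problematic controlled comments."""
    return [t for t in transcripts if not t.is_problematic]


gene_transcripts = [
    Transcript("gene-RA"),
    Transcript("gene-RB"),
    Transcript("gene-RC", problematic=["Truncated ORF",
                                       "Supported by single cDNA",
                                       "Suspect cDNA: possible unspliced intron"]),
]

# All three transcripts stay annotated; only the rigorous view omits gene-RC.
print([t.name for t in gene_transcripts])                   # ['gene-RA', 'gene-RB', 'gene-RC']
print([t.name for t in rigorous_subset(gene_transcripts)])  # ['gene-RA', 'gene-RB']
```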
Transcript confidence ratings: data types
  cDNA data (complete/partial)
  Protein homology/protein domain(s)
  Gene prediction
  Flagged as problematic
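One way to turn these data types into an ordinal rating is to rate each transcript by its strongest line of evidence. The ordering in this Python sketch (complete cDNA above partial cDNA, above protein homology or domains, above gene prediction alone, with a problematic flag overriding everything) is an assumption, as are the `Confidence` enum and the `rate_transcript` function.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical ordinal ratings; a higher value means better support."""
    PROBLEMATIC = 0
    GENE_PREDICTION = 1
    PROTEIN_SUPPORT = 2    # protein homology and/or protein domain(s)
    PARTIAL_CDNA = 3
    COMPLETE_CDNA = 4


def rate_transcript(*, complete_cdna: bool, partial_cdna: bool,
                    protein_support: bool, gene_prediction: bool,
                    problematic: bool) -> Confidence:
    """Return the rating of the strongest evidence type present.

    Assumption: a problematic flag overrides all other evidence.
    """
    if problematic:
        return Confidence.PROBLEMATIC
    if complete_cdna:
        return Confidence.COMPLETE_CDNA
    if partial_cdna:
        return Confidence.PARTIAL_CDNA
    if protein_support:
        return Confidence.PROTEIN_SUPPORT
    if gene_prediction:
        return Confidence.GENE_PREDICTION
    raise ValueError("transcript has no supporting evidence")


print(rate_transcript(complete_cdna=False, partial_cdna=True, protein_support=True,
                      gene_prediction=True, problematic=False).name)  # PARTIAL_CDNA
print(rate_transcript(complete_cdna=True, partial_cdna=False, protein_support=True,
                      gene_prediction=True, problematic=True).name)   # PROBLEMATIC
```

A single rating is easy to sort and filter on, but it discards which other kinds of evidence are present: the second transcript above has a complete cDNA yet rates as PROBLEMATIC. That loss is one reason to consider recording each kind of evidence separately.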
Evidence tally system
  Yes/no indication for each level of supporting data
  Flexible and open-ended
  Can be dense and nuanced
  Users can easily set different combinations of criteria for bulk data sets
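An evidence tally is essentially a table of yes/no cells, one row per transcript and one column per kind of evidence, and a bulk data set is a filter over those cells. This Python sketch stores each tally as a plain dictionary so that new kinds of evidence can be added without changing existing records. The evidence names, transcript IDs, and the `select` helper are hypothetical.

```python
from collections.abc import Iterable

Tally = dict[str, bool]   # evidence level -> yes/no


def select(tallies: dict[str, Tally],
           require: Iterable[str] = (),
           exclude: Iterable[str] = ()) -> list[str]:
    """IDs of transcripts that have every `require` level and no `exclude` level.

    Levels missing from a tally count as "no", so a new evidence type can be
    recorded for some transcripts without touching the others.
    """
    require, exclude = list(require), list(exclude)
    return sorted(
        tid for tid, t in tallies.items()
        if all(t.get(lvl, False) for lvl in require)
        and not any(t.get(lvl, False) for lvl in exclude)
    )


tallies = {
    "tx1": {"cdna_full_cds": True,  "protein_homology": True,  "gene_prediction": True},
    "tx2": {"cdna_full_cds": False, "protein_homology": True,  "gene_prediction": True},
    "tx3": {"cdna_full_cds": False, "protein_homology": False, "gene_prediction": True,
            "problematic": True},
}

# Strict set: full-length cDNA support required.
print(select(tallies, require=["cdna_full_cds"]))  # ['tx1']
# Looser set: protein support, but nothing flagged problematic.
print(select(tallies, require=["protein_homology"], exclude=["problematic"]))  # ['tx1', 'tx2']
```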
Evidence tally: cDNA and EST data
  Transcript structure supported
  UTRs supported
  CDS supported (full-length)
  CDS supported (partial)
  Transcript overlaps cDNA(s) or EST(s)
Evidence tally: supporting protein data
  Homologous proteins
    High-scoring, of similar length
    Less similar
    Indication of taxonomic range?
  Complete protein domain(s) identified
Evidence tally (cont.)
  Gene prediction(s)
  Problematic: [CV]
    Short CDS; possible gene fragment
    Truncated CDS
    Possible pseudogene
    CDS overlap
    etc.
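The evidence items for cDNA/EST data, protein data, and gene predictions can be gathered into one record per transcript. In this Python sketch, `EvidenceTally` has one yes/no field per item and a `problematic` field restricted to a controlled vocabulary. The field names and the `yes_no_row` method are hypothetical; the CV holds only the four terms shown on this slide (the real list continues, as the “etc.” indicates), and the open question about taxonomic range is left out.

```python
from dataclasses import dataclass, fields

# Only the terms shown on the slide; the real CV is longer.
PROBLEMATIC_CV = frozenset({
    "Short CDS; possible gene fragment",
    "Truncated CDS",
    "Possible pseudogene",
    "CDS overlap",
})


@dataclass
class EvidenceTally:
    # cDNA and EST data
    transcript_structure_supported: bool = False
    utrs_supported: bool = False
    cds_supported_full_length: bool = False
    cds_supported_partial: bool = False
    overlaps_cdna_or_est: bool = False
    # Supporting protein data
    homolog_high_scoring_similar_length: bool = False
    homolog_less_similar: bool = False
    complete_protein_domain: bool = False
    # Gene prediction(s)
    gene_prediction: bool = False
    # Problematic: terms must come from the controlled vocabulary
    problematic: frozenset[str] = frozenset()

    def __post_init__(self):
        self.problematic = frozenset(self.problematic)
        bad = self.problematic - PROBLEMATIC_CV
        if bad:
            raise ValueError(f"not in problematic CV: {sorted(bad)}")

    def yes_no_row(self) -> list[str]:
        """One Y/N cell per evidence field, then the problematic terms (or "-")."""
        cells = ["Y" if getattr(self, f.name) else "N"
                 for f in fields(self) if f.name != "problematic"]
        return cells + [" | ".join(sorted(self.problematic)) or "-"]


tally = EvidenceTally(transcript_structure_supported=True,
                      cds_supported_partial=True,
                      gene_prediction=True,
                      problematic=frozenset({"Truncated CDS"}))
print(tally.yes_no_row())
# ['Y', 'N', 'N', 'Y', 'N', 'N', 'N', 'N', 'Y', 'Truncated CDS']
```

Rejecting terms outside the CV at construction time keeps free-text comments out of the tally, so bulk queries can match on exact strings.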
Evidence tally: open-ended
  Experimental determination of 5’ end
  Northern data
  ORFeome data
  Microarray expression data
  In situ expression data
  Protein expression data
Dealing with the messy ones
  Allow provisional/problematic annotations
    Minimize biases of current knowledge
    Can exclude from rigorous data sets
  Describe and categorize using controlled comments
  Fold into a transcript rating system
    Evidence tallying system