2015-BMMB852D: Applied Bioinformatics
Week 13, Lecture 25
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State
Genome representation concepts
• At the simplest level of abstraction the genome is represented by a one-dimensional "space" (lines)
• The genome is double-stranded → a line corresponds to each strand
• Each strand has a polarity → each line has a direction
• Strands (lines) are paired
• The smallest unit is one base → one integer on the number line
• Annotations (features) are segments (coordinates) on each line
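Genomic coordinates: brief overview
DNA is double-stranded and directional, but there is only one coordinate system.
Standard formats use start < end even for the reverse strand.
The upstream region lies before the 5' end, relative to the direction of transcription.
[Figure: a feature spanning coordinates 200 to 300, drawn on both strands with their 5' and 3' ends labeled, showing the upstream region for the forward strand and for the reverse strand]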
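Because start < end on both strands, "upstream" means a different side of the interval depending on the strand. Here is a minimal Python sketch, assuming 0-based, half-open coordinates (as in BED); the function name `upstream` and the 100 bp window are illustrative choices, not part of any standard.

```python
def upstream(start: int, end: int, strand: str, size: int) -> tuple[int, int]:
    """Return (start, end) of the `size` bases upstream of a feature.

    Coordinates are assumed 0-based, half-open [start, end), with
    start < end regardless of strand, as in standard formats.
    """
    if strand == "+":
        # Transcription runs left to right: upstream lies before `start`.
        # Clip at 0 so we never produce negative coordinates.
        return max(0, start - size), start
    if strand == "-":
        # Transcription runs right to left: upstream lies after `end`.
        # (No clipping at the chromosome end; its length is not known here.)
        return end, end + size
    raise ValueError(f"unknown strand: {strand!r}")


# A feature spanning 200..300 on either strand.
print(upstream(200, 300, "+", 100))  # (100, 200)
print(upstream(200, 300, "-", 100))  # (300, 400)
```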
Coordinate systems
• 0-based → 0, 1, 2, … 9
• 1-based → 1, 2, 3, … 10
Typically:
• 0-based intervals are half-open (the end is not included): 10:20 → [10, 20)
• 1-based intervals include both ends: 10:20 → [10, 20]
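Python's own slicing is 0-based and half-open, so it makes a handy test bed. A small sketch (the helper names are hypothetical) showing that "10:20" selects different bases in the two systems:

```python
seq = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def get_0based(seq: str, start: int, end: int) -> str:
    """0-based, half-open [start, end): maps directly onto Python slicing."""
    return seq[start:end]

def get_1based(seq: str, start: int, end: int) -> str:
    """1-based, closed [start, end]: shift the start down by one."""
    return seq[start - 1:end]

# "10:20" means different bases in the two systems.
print(get_0based(seq, 10, 20))  # KLMNOPQRST  (10 bases)
print(get_1based(seq, 10, 20))  # JKLMNOPQRST (11 bases)
```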
Comparing coordinate systems
Vote for what you think is better: 1-based indexing or 0-based indexing?
Write each of the following in both systems:
• Third element
• First ten
• Second ten
• Third ten
• One-base-long interval starting at the 10th element
• Length of an interval
• Five elements starting at index 1000
• Empty interval
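One way to check your votes is to compute each case. The sketch below assumes 1-based intervals are closed [start, end] and 0-based intervals are half-open [start, end); for "five elements starting at index 1000" it assumes the index is given in each system's own numbering. These are worked answers under those assumptions, not the slide's official answers.

```python
def one_based(first: int, count: int) -> tuple[int, int]:
    """Closed interval [start, end] covering `count` elements from position `first`."""
    return first, first + count - 1

def zero_based(first: int, count: int) -> tuple[int, int]:
    """Half-open interval [start, end) covering `count` elements from index `first`."""
    return first, first + count

def length_one_based(start: int, end: int) -> int:
    return end - start + 1

def length_zero_based(start: int, end: int) -> int:
    return end - start

cases = {
    # name: (first position in 1-based numbering, number of elements)
    "third element": (3, 1),
    "first ten": (1, 10),
    "second ten": (11, 10),
    "third ten": (21, 10),
    "one base at the 10th element": (10, 1),
}
for name, (first1, count) in cases.items():
    print(f"{name:30} 1-based {one_based(first1, count)}  0-based {zero_based(first1 - 1, count)}")

# Length of the "second ten" in each system: both give 10.
print(length_one_based(11, 20), length_zero_based(10, 20))

# Five elements starting at index 1000, in each system's own numbering.
print(one_based(1000, 5), zero_based(1000, 5))   # (1000, 1004) (1000, 1005)

# Empty interval: natural in 0-based, awkward (end < start) in 1-based.
print(one_based(1000, 0), zero_based(1000, 0))   # (1000, 999) (1000, 1000)
```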
Fundamental interval formats
• SAM/BAM → Sequence Alignment/Map
• VCF/BCF → for variant calls
• BED/GFF → gene annotation representation
• bedGraph, Wiggle → values over intervals
What is a genomic feature?
• Feature: a genomic region (interval) associated with a certain annotation (description).
Typical attributes used to describe a feature:
1. chromosome
2. start
3. end
4. strand
5. name
Why do we have so many variants? There is no good rational reason… history, I guess.
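The five attributes map naturally onto a small record type. A minimal sketch, assuming 0-based, half-open coordinates (a choice made for this example, not implied by the attribute list) and allowing "." for an unknown strand, as BED and GFF do; the class name `Feature` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    """A genomic feature: an interval plus its annotation.

    Coordinates are assumed 0-based, half-open [start, end).
    """
    chrom: str
    start: int
    end: int
    strand: str  # "+", "-" or "." (unknown / unstranded)
    name: str

    def __post_init__(self) -> None:
        # Standard formats keep start <= end even on the reverse strand.
        if self.start > self.end:
            raise ValueError("start must not exceed end, even on the - strand")
        if self.strand not in ("+", "-", "."):
            raise ValueError(f"invalid strand: {self.strand!r}")

    @property
    def length(self) -> int:
        return self.end - self.start

    def overlaps(self, other: "Feature") -> bool:
        # Half-open intervals overlap when each starts before the other ends.
        return (self.chrom == other.chrom
                and self.start < other.end and other.start < self.end)

gene = Feature("chr1", 1000, 8000, "+", "G1")
print(gene.length)  # 7000
```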
Values on intervals
• A single value characterizes an entire interval → a score (value) for the interval
• Continuous values → a different value for each base of the interval → analogous to a series of 1 bp long intervals
Different data representation formats
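The equivalence between one scored interval and a series of 1 bp intervals can be checked directly. A sketch assuming 0-based, half-open coordinates; `expand` and `per_base_values` are hypothetical helper names.

```python
def expand(chrom: str, start: int, end: int, value: float):
    """Turn one scored interval into equivalent 1 bp intervals.

    Assumes 0-based, half-open coordinates: [start, end) covers end - start bases.
    """
    return [(chrom, pos, pos + 1, value) for pos in range(start, end)]

def per_base_values(records):
    """Map each (chrom, position) to its value; later records overwrite earlier ones."""
    values = {}
    for chrom, start, end, value in records:
        for pos in range(start, end):
            values[(chrom, pos)] = value
    return values

one_interval = [("chr1", 100, 104, 7.5)]
as_bases = expand("chr1", 100, 104, 7.5)
print(as_bases[0], len(as_bases))  # ('chr1', 100, 101, 7.5) 4
print(per_base_values(one_interval) == per_base_values(as_bases))  # True
```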
http://genome.ucsc.edu/FAQ/FAQformat.html
Two commonly used formats
• BED: UCSC Genome Browser → 0-based, end not included → also used to display tracks in the genome browser (US "standard") (variants: bigBed, bedGraph)
• GFF: Sanger Institute in Great Britain → 1-based inclusive indexing system ("European standard") (variants: GTF, GFF2.0)
BED format (search for "BED format")
Tab-separated, 3 required and 9 optional columns. Lower-numbered fields must be filled in before higher-numbered ones can be used.
1. chrom (name of the chromosome, sequence id)
2. chromStart (starting position on the chromosome)
3. chromEnd (ending position on the chromosome; note: this base is not included!)
4. name (feature name)
5. score (between 0 and 1000)
6. strand (+ or -)
7. thickStart (the starting position at which the feature is drawn thickly)
8. thickEnd (the ending position at which the feature is drawn thickly)
9. itemRgb (RGB color → 255,0,0; display color of the data contained)
10. blockCount (the number of blocks (exons) in the BED line)
11. blockSizes (a comma-separated list of the block sizes)
12. blockStarts (a comma-separated list of the block starts, relative to chromStart)
These files may also take a track definition line that is visualization-specific.
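A minimal BED reader makes the column rules concrete: 3 to 12 columns, positional fields, and block starts that are offsets from chromStart. This is a sketch, not a full parser; it assumes `score` is an integer (some tools write "." there, which this sketch does not handle), and the function names are hypothetical.

```python
BED_FIELDS = ["chrom", "chromStart", "chromEnd", "name", "score", "strand",
              "thickStart", "thickEnd", "itemRgb", "blockCount",
              "blockSizes", "blockStarts"]
INT_FIELDS = {"chromStart", "chromEnd", "score", "thickStart", "thickEnd", "blockCount"}

def parse_bed_line(line: str) -> dict:
    """Parse one BED line (3 to 12 tab-separated columns) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if not 3 <= len(cols) <= 12:
        raise ValueError(f"expected 3-12 columns, got {len(cols)}")
    record = {}
    for field, value in zip(BED_FIELDS, cols):
        if field in INT_FIELDS:
            record[field] = int(value)
        elif field in ("blockSizes", "blockStarts"):
            # Lists are comma-separated and may end with a trailing comma.
            record[field] = [int(x) for x in value.strip(",").split(",")]
        else:
            record[field] = value
    return record

def read_bed(lines):
    """Yield records, skipping blank, track, browser and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        yield parse_bed_line(line)

def blocks(record: dict) -> list[tuple[int, int]]:
    """Absolute 0-based, half-open exon coordinates from a BED12 record."""
    start = record["chromStart"]
    return [(start + offset, start + offset + size)
            for size, offset in zip(record["blockSizes"], record["blockStarts"])]

line = "chr1\t1000\t8000\tT1\t0\t+\t1000\t8000\t255,0,0\t3\t1000,1000,1000,\t0,2000,6000,"
rec = parse_bed_line(line)
print(rec["name"], blocks(rec))  # T1 [(1000, 2000), (3000, 4000), (7000, 8000)]
```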
BedGraph format
Tab-separated, 4 required columns.
1. chrom (name of the chromosome, sequence id)
2. chromStart (starting position on the chromosome)
3. chromEnd (ending position on the chromosome; note: this base is not included!)
4. dataValue (value of the data for that region)
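bedGraph saves space by merging runs of bases that share a value into a single interval. A sketch of that collapsing step, assuming the per-base values start at a given 0-based offset; `to_bedgraph` is a hypothetical name.

```python
def to_bedgraph(chrom: str, values: list[float], offset: int = 0) -> list[str]:
    """Collapse per-base values into bedGraph lines.

    values[i] is the value at 0-based position offset + i. Runs of equal
    consecutive values are merged into one half-open [start, end) record.
    """
    lines = []
    run_start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data or when the value changes.
        if i == len(values) or values[i] != values[run_start]:
            start, end = offset + run_start, offset + i
            lines.append(f"{chrom}\t{start}\t{end}\t{values[run_start]}")
            run_start = i
    return lines

for line in to_bedgraph("chr1", [5, 5, 5, 0, 0, 7], offset=100):
    print(line)
# Prints three tab-separated lines: chr1 100 103 5, chr1 103 105 0, chr1 105 106 7
```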
GFF format (search for GFF3 → http://www.sequenceontology.org/gff3.shtml)
Tab-separated with 9 columns. Missing attributes may be replaced with a dot → .
1. Seqid (usually the chromosome)
2. Source (where the data is coming from)
3. Type (usually a term from the Sequence Ontology)
4. Start (interval start relative to the seqid)
5. End (interval end relative to the seqid)
6. Score (the score of the feature, a floating-point number)
7. Strand (+ or -)
8. Phase (used to indicate the reading frame for coding sequences)
9. Attributes (semicolon-separated attributes → Name=ABC;ID=1)
People like to stuff a lot of information here.
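Since column 9 is where the information piles up, it deserves its own parser. A sketch of a GFF3 line reader, assuming GFF3 conventions: tag=value pairs separated by ";", multiple values separated by ",", and percent-encoded special characters. The function names and the "example" source value are hypothetical.

```python
from urllib.parse import unquote

GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]

def parse_attributes(text: str) -> dict[str, list[str]]:
    """Parse a GFF3 column 9 string such as 'ID=exon1;Parent=T1,T2'.

    A tag may carry several comma-separated values, so values are lists.
    Percent-encoded characters (e.g. %3B for ';') are decoded after splitting.
    """
    attrs = {}
    if text == ".":
        return attrs
    for pair in text.strip().strip(";").split(";"):
        pair = pair.strip()
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        attrs[unquote(tag)] = [unquote(v) for v in value.split(",")]
    return attrs

def parse_gff_line(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    rec = dict(zip(GFF_COLUMNS, cols))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])  # 1-based, inclusive
    rec["attributes"] = parse_attributes(rec["attributes"])
    return rec

rec = parse_gff_line("chr1\texample\texon\t1001\t2000\t.\t+\t.\tID=exon1;Parent=T1,T2")
print(rec["attributes"])  # {'ID': ['exon1'], 'Parent': ['T1', 'T2']}
```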
Wiggle format
• Two versions → fixedStep and variableStep, each trying to optimize the amount of data storage
fixedStep chrom=chr1 start=100 step=1
10
15
11
22
…
variableStep chrom=chr1
100 10
101 15
102 11
103 22
variableStep chrom=chr2
2000 23
2005 40
…
Wiggle is a nasty format: it looks simpler than it is. Please avoid it.
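Part of what makes Wiggle tricky is that its positions are 1-based while bedGraph is 0-based, and that `fixedStep` positions are implicit. A sketch that converts the two declaration styles into bedGraph-style records; it handles only the chrom, start, step and span keys and is not a complete Wiggle reader. The function name is hypothetical.

```python
def wig_to_bedgraph(lines):
    """Convert fixedStep/variableStep wiggle lines to (chrom, start, end, value).

    Wiggle positions are 1-based; the output uses BED-style 0-based,
    half-open coordinates, hence the `- 1` on every start.
    """
    records = []
    mode = chrom = None
    pos = step = span = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        words = line.split()
        if words[0] in ("fixedStep", "variableStep"):
            # Declaration line: key=value parameters for the data that follows.
            mode = words[0]
            params = dict(w.split("=", 1) for w in words[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos, step = int(params["start"]), int(params["step"])
            continue
        if mode == "fixedStep":
            # Position is implicit: it advances by `step` for every value.
            start, value = pos, float(words[0])
            pos += step
        elif mode == "variableStep":
            # Each data line carries its own position.
            start, value = int(words[0]), float(words[1])
        else:
            raise ValueError("data line before any declaration line")
        records.append((chrom, start - 1, start - 1 + span, value))
    return records

wig = ["fixedStep chrom=chr1 start=100 step=1", "10", "15",
       "variableStep chrom=chr2", "2000 23", "2005 40"]
for rec in wig_to_bedgraph(wig):
    print(rec)
```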
We may have data in different coordinate systems!
Being "off by one" is one of the most common errors in bioinformatics.
Conversion from GFF to BED:
(start, end) → (start - 1, end)
Conversion from BED to GFF:
(start, end) → (start + 1, end)
Note that there will be differences when selecting positions that depend on the END coordinate!
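The two conversion rules, plus the END-coordinate catch, fit in a few lines. A sketch with hypothetical helper names: the round trip is lossless, yet "the last base" is a different number in each system.

```python
def gff_to_bed(start: int, end: int) -> tuple[int, int]:
    """1-based closed [start, end] -> 0-based half-open [start, end)."""
    return start - 1, end

def bed_to_gff(start: int, end: int) -> tuple[int, int]:
    """0-based half-open [start, end) -> 1-based closed [start, end]."""
    return start + 1, end

def last_base_gff(start: int, end: int) -> int:
    # In GFF the end coordinate *is* the last base.
    return end

def last_base_bed(start: int, end: int) -> int:
    # In BED the end coordinate is one past the last base.
    return end - 1

gff = (1001, 2000)
bed = gff_to_bed(*gff)
print(bed)                                       # (1000, 2000)
print(bed_to_gff(*bed) == gff)                   # True: the round trip is lossless
print(last_base_gff(*gff), last_base_bed(*bed))  # 2000 1999: same base, different numbers
```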
Handling coordinates relative to intervals
What are the coordinates of the bases preceding and following the interval? Seems trivial, and it is, with a catch.
• GFF [start, end] → the base before start is at start - 1
• BED [start, end) → the base before start is at start - 1
• GFF [start, end] → the next base after end is at end + 1
• BED [start, end) → the next base after end is at end
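A quick sketch checking the four rules against a real string; `flanks_gff` and `flanks_bed` are hypothetical names, and no bounds checking is done at the sequence edges.

```python
def flanks_gff(start: int, end: int) -> tuple[int, int]:
    """1-based positions of the bases before and after a closed [start, end]."""
    return start - 1, end + 1

def flanks_bed(start: int, end: int) -> tuple[int, int]:
    """0-based positions of the bases before and after a half-open [start, end).

    The catch: the base after the interval sits at `end`, not `end + 1`.
    """
    return start - 1, end

seq = "ACGTACGTAC"
# The same three bases "TAC": 4..6 in GFF (1-based), [3, 6) in BED (0-based).
before, after = flanks_gff(4, 6)
print(seq[before - 1], seq[after - 1])  # G G  (subtract 1 to get Python indices)
before, after = flanks_bed(3, 6)
print(seq[before], seq[after])          # G G
```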
Representing interval relationships
• We have a gene with three splicing variants
It starts at 1000 and ends at 8000; each exon is 1 kb long and is separated from the next by 1 kb. How do we represent this relationship?
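It helps to write the exon coordinates out first. The sketch below assumes the numbers are 0-based, half-open (BED-style), which is the reading under which 1 kb exons separated by 1 kb fit exactly between 1000 and 8000, giving four exons; that layout is inferred from the description, not given explicitly.

```python
def exon_intervals(start: int, end: int, size: int, gap: int) -> list[tuple[int, int]]:
    """0-based, half-open exons of length `size` separated by `gap` bases."""
    exons = []
    pos = start
    while pos + size <= end:
        exons.append((pos, pos + size))
        pos += size + gap
    return exons

exons = exon_intervals(1000, 8000, 1000, 1000)
print(exons)                           # [(1000, 2000), (3000, 4000), (5000, 6000), (7000, 8000)]
print([(s + 1, e) for s, e in exons])  # same exons in GFF (1-based, closed) coordinates
```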
Data representation
• Both BED and GFF files can represent them
• Two common versions of GFF → GTF2 and GFF3 (note: tool documentation is often wrong and shows a weird combination of these two formats)
• In GFF the content of the ATTRIBUTE (9th) column specifies the relationship between features
GTF/GFF formats
GTF attributes:
– gene_id value; a globally unique identifier for the genomic source of the transcript
– transcript_id value; a globally unique identifier for the predicted transcript
gene_id "G1"; transcript_id "T1";
GFF attributes:
ID=exon1;Parent=T1
See the GFF3 site for the exact specification of what these mean.
Important: more than one parent may be listed (comma-separated, e.g. Parent=T1,T2)!
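Multiple parents mean one exon line can belong to several transcripts, so any code that rebuilds transcripts must fan each child out to every parent. A sketch working on already-parsed attribute dicts (tag to list of values); the IDs are made up, and exons without an ID get a hypothetical positional label.

```python
from collections import defaultdict

def group_by_parent(features: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Map each parent ID to the IDs of its children.

    A child with Parent=T1,T2 is recorded under both transcripts.
    """
    children = defaultdict(list)
    for i, attrs in enumerate(features):
        # Exons often have no ID; fall back to a hypothetical positional label.
        child = attrs.get("ID", [f"feature{i + 1}"])[0]
        for parent in attrs.get("Parent", []):
            children[parent].append(child)
    return dict(children)

features = [
    {"ID": ["exon1"], "Parent": ["T1", "T2", "T3"]},
    {"ID": ["exon2"], "Parent": ["T1"]},
    {"ID": ["exon3"], "Parent": ["T2", "T3"]},
]
print(group_by_parent(features))
# {'T1': ['exon1', 'exon2'], 'T2': ['exon1', 'exon3'], 'T3': ['exon1', 'exon3']}
```

In GTF, by contrast, each line carries a single transcript_id, so a shared exon has to be written once per transcript.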
Example interval as GTF
A distinct line is entered for each exon, repeated for each transcript.
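A sketch that generates such a GTF file. The exon coordinates (1-based, closed) and the three splice variants are hypothetical choices made for illustration, as is the "example" source column; they are not the slide's original example.

```python
# Hypothetical example: four 1-based exons and three made-up splice variants.
EXONS = [(1001, 2000), (3001, 4000), (5001, 6000), (7001, 8000)]
TRANSCRIPTS = {"T1": [0, 1, 2, 3], "T2": [0, 2, 3], "T3": [0, 1, 3]}

def gtf_lines(chrom: str, gene_id: str, strand: str = "+") -> list[str]:
    """One GTF line per exon per transcript; shared exons are repeated."""
    lines = []
    for tx_id, exon_idx in TRANSCRIPTS.items():
        for i in exon_idx:
            start, end = EXONS[i]
            attrs = f'gene_id "{gene_id}"; transcript_id "{tx_id}";'
            cols = [chrom, "example", "exon", str(start), str(end), ".", strand, ".", attrs]
            lines.append("\t".join(cols))
    return lines

lines = gtf_lines("chr1", "G1")
print("\n".join(lines))
print(len(lines), "lines for", len(EXONS), "distinct exons")  # 10 lines for 4 distinct exons
```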
Example interval as GFF3
The same exon may be part of different transcripts (parents).
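The same hypothetical gene written as GFF3: a gene line, one mRNA line per transcript, and each exon written once with all of its parents. Exon coordinates, variant choices and the "example" source are illustrative assumptions; the mRNA extent assumes each transcript's exon list is sorted.

```python
EXONS = [(1001, 2000), (3001, 4000), (5001, 6000), (7001, 8000)]  # 1-based, closed
TRANSCRIPTS = {"T1": [0, 1, 2, 3], "T2": [0, 2, 3], "T3": [0, 1, 3]}

def gff3_lines(chrom: str, gene_id: str, strand: str = "+") -> list[str]:
    """Gene, mRNA and exon lines; each exon appears once with all its parents."""
    def row(ftype, start, end, attrs):
        return "\t".join([chrom, "example", ftype, str(start), str(end),
                          ".", strand, ".", attrs])

    lines = ["##gff-version 3"]
    lines.append(row("gene", EXONS[0][0], EXONS[-1][1], f"ID={gene_id}"))
    for tx_id, idx in TRANSCRIPTS.items():
        # Transcript spans from its first exon's start to its last exon's end.
        lines.append(row("mRNA", EXONS[idx[0]][0], EXONS[idx[-1]][1],
                         f"ID={tx_id};Parent={gene_id}"))
    for i, (start, end) in enumerate(EXONS):
        parents = [tx for tx, idx in TRANSCRIPTS.items() if i in idx]
        if parents:  # skip exons that no transcript uses
            lines.append(row("exon", start, end,
                             f"ID=exon{i + 1};Parent={','.join(parents)}"))
    return lines

print("\n".join(gff3_lines("chr1", "G1")))
```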
Example interval in BED
From the BED format specification
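In BED each transcript becomes one BED12 line, with the exons encoded as blocks whose starts are offsets from chromStart. A sketch using the same hypothetical exons, now in 0-based, half-open coordinates; the score "0", color "0,0,0" and drawing the whole span thick are arbitrary choices.

```python
# Hypothetical exons in BED's 0-based, half-open coordinates.
EXONS = [(1000, 2000), (3000, 4000), (5000, 6000), (7000, 8000)]
TRANSCRIPTS = {"T1": [0, 1, 2, 3], "T2": [0, 2, 3], "T3": [0, 1, 3]}

def bed12_line(chrom: str, name: str, exons: list[tuple[int, int]], strand: str = "+") -> str:
    """Build a BED12 line; blockStarts are relative to chromStart."""
    exons = sorted(exons)
    chrom_start, chrom_end = exons[0][0], exons[-1][1]
    sizes = ",".join(str(e - s) for s, e in exons)
    starts = ",".join(str(s - chrom_start) for s, _ in exons)
    cols = [chrom, str(chrom_start), str(chrom_end), name, "0", strand,
            str(chrom_start), str(chrom_end),  # thickStart/thickEnd: whole span
            "0,0,0", str(len(exons)), sizes, starts]
    return "\t".join(cols)

for tx, idx in TRANSCRIPTS.items():
    print(bed12_line("chr1", tx, [EXONS[i] for i in idx]))
```

Unlike GFF3, BED cannot express that two transcripts share an exon; the shared blocks are simply repeated on each line.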
Visualizing in IGV
Homework 25
• Create and visualize in IGV an interval file that contains three splice variants of a 1 kb long gene with 5 exons.
• Show the file and a screenshot.