regioneR: an R/Bioconductor package for the management and comparison of genomic regions
Anna Díez, Bernat Gel, Roberto Malinverni

regioneR aims
Practical to use. Easy to understand.
Generic and useful
Efficient
Customizable
Something we would like to use

regioneR
Basic management of genomic regions
Statistical evaluation
Helper functions that make our lives easier

The Basics
Statistics
Customization
Helper Functions

THE BASICS

joinRegions
[Figure: region set A, with the gap labelled min.dist]
joinRegions(A, min.dist)
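
The idea behind joining nearby regions can be sketched in base R on plain start/end vectors. This is only an illustration under stated assumptions: `join_intervals` is a hypothetical helper, not part of regioneR, it handles a single chromosome with at least one interval, and it measures the gap as the next start minus the previous end, which may differ from the package's exact convention.

```r
# Hypothetical helper (not part of regioneR): join intervals on one chromosome
# whose gap to the previous joined interval is smaller than min.dist.
join_intervals <- function(start, end, min.dist = 1) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(out_end)
    if (start[i] - out_end[k] < min.dist) {
      # Close enough: extend the current joined interval
      out_end[k] <- max(out_end[k], end[i])
    } else {
      # Too far apart: start a new interval
      out_start <- c(out_start, start[i])
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

joined <- join_intervals(c(2000, 5000, 10000), c(4000, 5500, 12000),
                         min.dist = 1500)
print(joined)
```

With these toy coordinates the first two intervals are 1000 bases apart and are joined, while the third stays separate.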

subtractRegions
[Figure: region sets A and B]
subtractRegions(A, B)

splitRegions
[Figure: region sets A and B]
splitRegions(A, B, min.size=1, track.original=TRUE)

mergeRegions
commonRegions
extendRegions
Any others? flankingRegions? …

overlapRegions
[Figure: region sets A and B]
overlapRegions(A, B, colA, colB, type, min.bases, min.pctA, min.pctB, get.pctA, get.pctB, get.bases, only.boolean, only.count, ...)

Example: annotateRegions
regAnnotation(regions, annot.tab, ann.names, strands, descr, peak.point, gap3, gap5)

STATISTICS

overlapPermTest
[Figure: region set A, region set B and its permuted copies B', with the counts obtained for each permutation]

Example: TIs
TIs over: 81
TIs under: 66
SCNA gain: 60
SCNA losses: 53

Gains vs Overexpression
overlapPermTest(TIs_over, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 81
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   47.00   50.00   50.43   53.00   67.00
Standard score: 6.8117
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

~800s (~13min)
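
The printed p-value equals 1/1001. That is what one gets if the p-value is computed as (k + 1) / (ntimes + 1), where k is the number of permutations at least as extreme as the observed value, and k = 0 here because the permuted maximum (67) is below the observed 81. The sketch below works through that arithmetic, assuming this formula. The `permuted` vector is placeholder data spread between the slide's minimum and maximum, not the values behind the slide, so the standard score it yields will not match the printed one.

```r
# Sketch only: 'permuted' holds placeholder values, not the real permutation results.
ntimes <- 1000
observed <- 81
permuted <- rep(37:67, length.out = ntimes)

# alternative = "greater": count permutations at least as large as the observation
k <- sum(permuted >= observed)
pval <- (k + 1) / (ntimes + 1)

# Standard score: distance of the observation from the permuted mean, in SD units
zscore <- (observed - mean(permuted)) / sd(permuted)

print(pval)
print(zscore)
```

Under this formula the smallest reachable p-value with 1000 permutations is 1/1001, so a result at that floor says only that no permutation matched the observation.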

Losses vs Underexpression
overlapPermTest(TIs_under, SCNA.losses, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 66
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  37.00   48.00   50.00   50.18   53.00   60.00
Standard score: 4.4942
P-value: 0.000999000999000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Gains vs Underexpression
overlapPermTest(TIs_under, SCNA.gains, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 1
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

recomputePermTest(gains.under, alternative="l")

Gains vs Underexpression
overlapPermTest(TIs_under, SCNA.gains, alternative="l", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 25
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   39.00   41.00   41.25   44.00   52.00
Standard score: -4.2739
P-value: 0.000999000999000999 ***

Random Region Sets
overlapPermTest(10KrandomA, 10KrandomB, alternative="g", genome="hg19", ntimes=1000)

Number of permutations: 1000
Alternative: greater
Evaluation of the original region set: 68
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  42.00   57.00   62.00   62.16   67.00   89.00
Standard score: 0.7488
P-value: 0.215784215784216
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

~1850s (~30min) (Single core)
~800s (~13min) (Parallel 4 cores)

permTest
overlapPermTest
overlap
randomRegions
distance
resampling
value of a function

permTest(A, ntimes=1000, randomize.function, evaluate.function, alternative, min.parallel=1000, force.parallel=NULL, ...)
overlapPermTest <- permTest(A, randomize.function=randomizeRegions, evaluate.function=countOverlaps)
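
The way a randomization function and an evaluation function plug into a generic permutation test can be sketched in base R. Everything here is an assumption made for illustration: `perm_test_sketch`, `random_positions` and `count_near` are hypothetical names, regions are reduced to single positions on a toy genome, and the p-value uses the (k + 1) / (ntimes + 1) form. It is not the package's implementation.

```r
# Hypothetical stand-in for a generic permutation test; illustration only.
perm_test_sketch <- function(A, ntimes, randomize.function, evaluate.function,
                             alternative = c("greater", "less"), ...) {
  alternative <- match.arg(alternative)
  # Evaluate the original set once
  observed <- as.numeric(evaluate.function(A, ...))
  # Randomize and re-evaluate ntimes
  permuted <- vapply(seq_len(ntimes), function(i) {
    as.numeric(evaluate.function(randomize.function(A, ...), ...))
  }, numeric(1))
  extreme <- if (alternative == "greater") permuted >= observed else permuted <= observed
  list(observed = observed,
       permuted = permuted,
       pval = (sum(extreme) + 1) / (ntimes + 1),
       zscore = (observed - mean(permuted)) / sd(permuted))
}

# Toy randomization: place length(A) positions uniformly on the genome
random_positions <- function(A, genome.length, ...) {
  sample.int(genome.length, length(A))
}

# Toy evaluation: how many positions in A lie within 'window' of any position in B
count_near <- function(A, B, window, ...) {
  sum(vapply(A, function(a) any(abs(B - a) <= window), logical(1)))
}

set.seed(42)
B <- seq(1000, 100000, by = 1000)
A <- B[1:50] + 5   # every element of A sits 5 bases from an element of B
res <- perm_test_sketch(A, ntimes = 200,
                        randomize.function = random_positions,
                        evaluate.function = count_near,
                        alternative = "greater",
                        B = B, window = 10, genome.length = 100000)
print(res$observed)
print(res$pval)
```

Both toy functions accept `...` so that extra arguments meant for the other function pass through harmlessly, which mirrors the `randomize.function(A, ...)` and `evaluate.function(A, ...)` signatures on the slides.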

Example: Genes & ALUs
1,175,329 ALUs
9,111 overexpressed genes
51,796 genes
Are overexpressed genes closer to ALUs than expected by chance?

Resampling
Mean Distance
permTest(A=expressed, B=alus, ntimes=1000, randomize.function=resampleRegions, universe=genes2, evaluate.function=meanDistance, alternative="less")

Number of permutations: 1000
Alternative: less
Evaluation of the original region set: 353.371858193393
Summary of the evaluation of the permuted region set:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  912.1   992.8  1010.0  1011.0  1028.0  1095.0
Standard score: -25.0275
P-value: 0.000999000999000999 ***

CUSTOMIZATION

Available functions
Evaluation
countOverlaps
meanDistance
meanInRegions
GC content, TF binding sites, ENCODE classification, …
Randomization
randomizeRegions
resampleRegions
GC-aware randomization, …

Custom functions: Randomization
randomize.function(A, ...)
resampleRegions <- function(A, universe, ...) {
  # Draw length(A) elements at random, without replacement, from the universe
  resample <- universe[sample(seq_along(universe), length(A))]
  return(resample)
}
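
A quick way to see what this randomization does is to run the same function body on a plain vector. The sketch below repeats the slide's definition so that it is self-contained and uses made-up gene labels in place of genomic regions, purely as an assumption for illustration.

```r
# Same logic as the slide's resampleRegions, applied to a character vector
resampleRegions <- function(A, universe, ...) {
  universe[sample(seq_along(universe), length(A))]
}

set.seed(7)
universe <- paste0("gene", 1:20)       # made-up universe of 20 labels
A <- c("gene3", "gene8", "gene15")     # made-up set of interest
resampled <- resampleRegions(A, universe)
print(resampled)
```

The resampled set has the same size as A and is always drawn from the universe, so the permutations preserve whatever biases the universe itself carries.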

Custom functions: Evaluation
evaluate.function(A, ...)
meanDistance <- function(A, B, ...) {
  # Distance from each region in A to its nearest region in B
  d <- distanceToNearest(A, B, ...)
  return(mean(as.matrix(d@elementMetadata)[,1]))
}
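
The quantity being averaged can be illustrated without any genomic classes. The sketch below is a base R simplification under stated assumptions: `mean_nearest_distance` is a hypothetical helper, regions are reduced to single positions, and the coordinates are invented. `distanceToNearest` works on ranges rather than points, so the numbers are not directly comparable.

```r
# Hypothetical helper: mean distance from each position in 'a'
# to its nearest position in 'b'
mean_nearest_distance <- function(a, b) {
  mean(vapply(a, function(x) min(abs(b - x)), numeric(1)))
}

genes <- c(100, 500, 900)    # invented positions
alus <- c(120, 450, 2000)    # invented positions
md <- mean_nearest_distance(genes, alus)
print(md)
```

A smaller mean than in the permuted sets is what `alternative="less"` tests for in the Genes & ALUs example.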

HELPER FUNCTIONS

toGRanges & toDataframe
chr   start  end
chr1  2000   4000
chr1  5000   5500
chr1  10000  12000

GRanges with 3 ranges and 0 elementMetadata values
    seqnames         ranges strand |
       <Rle>      <IRanges>  <Rle> |
[1]     chr1 [ 2000,  4000]      * |
[2]     chr1 [ 5000,  5500]      * |
[3]     chr1 [10000, 12000]      * |

seqlengths
 chr1
   NA

Genomes & Masks
getGenome(genome)
getMask(genome)
getGenomeAndMask(genome, mask)
characterToBSGenome(genome.id)
maskFromBSGenome(bsgenome)
emptyCache()

Randomization
randomizeRegions(A, genome="hg19", mask=NULL, non.overlapping=FALSE, per.chromosome=FALSE, ...)
createRandomRegions(nregions=100, length.mean=250, length.sd=20, genome="hg19", mask=NULL, non.overlapping=FALSE)
resampleRegions(A, universe, per.chromosome=FALSE, ...)

Aaaaaalmost finished: Anyone with experience in packaging for Bioconductor?
Suggestions? Requests? Improvements?
Beta Testers Wanted