Intro to Bioconductor and GenomeGraphs

by Kyrylo Bessonov

(kbessonov@ulg.ac.be)

9 Oct 2012

Bioconductor repository

• Bioconductor is a repository of extensions and libraries for the R language, found at http://www.bioconductor.org/
• Bioconductor libraries cover:
  • Microarray analysis
  • Genetic variant analysis (SNPs)
  • Sequence analysis (FASTA, RNA-seq)
  • Annotation (pathways, genes)
  • High-throughput assays (Mass Spec)
• All libraries are free to use and contain documentation (i.e. vignettes)
  • Vignettes are short and require some previous knowledge of R and/or other dependent libraries

Configuring R for Bioconductor

• To install Bioconductor libraries, the R environment first needs to be configured:
    source("http://bioconductor.org/biocLite.R")
• To download any Bioconductor library, use the biocLite("package_name") function:
    biocLite("GenomeGraphs")
• Load the downloaded library functions via the library("lib_name") function:
    library("GenomeGraphs")
• Read help file(s) via ?? or browseVignettes():
    ??library_name (e.g. ??GenomeGraphs)
    browseVignettes(package = "GenomeGraphs")
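On R installations more recent than these slides, the biocLite.R script has been superseded by the BiocManager package (from Bioconductor 3.8 onward). A minimal sketch of the equivalent steps is given here. The name install_bioc is a hypothetical helper introduced for illustration, and whether a given package such as GenomeGraphs is still available depends on the Bioconductor release in use.

```r
# Hypothetical helper (not part of the original slides): install a
# Bioconductor package through BiocManager instead of biocLite().
install_bioc <- function(pkg) {
  # Install BiocManager from CRAN only if it is not already present
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg)
}

# Usage (needs network access, so it is left commented out):
# install_bioc("GenomeGraphs")
# library("GenomeGraphs")
```

Wrapping the steps in a function keeps the sketch safe to source: nothing is downloaded until the helper is actually called.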

Genome Data Display and Plotting

The GenomeGraphs library

Authors: Steffen Durinck and James Bullard

(Bioconductor library)

Intro to GenomeGraphs library

• Allows retrieving data from Ensembl, a comprehensive DB on genomes:
  • intron/exon locations
  • sequence genetic variation data
  • protein properties (pI, domains, motifs)
• The GenomeGraphs Bioconductor library allows displaying data on:
  • Gene expression (expression / location)
  • Comparative genomic hybridization (CGH)
  • Sequencing (e.g. variations/introns/exons)

GenomeGraphs: Plotting with gdPlot

• gdPlot is the main plotting function:
    gdPlot(gdObjects, minBase, maxBase, ...)
  • gdObjects = any objects created by GenomeGraphs: a) BaseTrack; b) Gene; c) GenericArray; d) RectangleOverlay
  • minBase = the lowest nt position to be plotted (optional)
  • maxBase = the largest nt position to display (optional)
• Objects are data structures that hold/organize a set of variables
  1) Create an object to plot using the makeBaseTrack() function:
      ObjBaseT = makeBaseTrack(1:100, rnorm(100), strand = "+")
  2) Plot the newly created object:
      gdPlot(ObjBaseT)
• Display object structure / properties (e.g. of BaseTrack):
    attributes(ObjBaseT)
  • The $ sign next to a name in the output marks an object variable
• Change or view individual object variables:
    attr(ObjBaseT, "strand")

Manipulating GenomeGraphs class graphical properties

• Changing values of variables (classical way):
    attr(Object, "variable/parameter") = value
    attr(ObjBaseT, "strand") = "-"
• If the variable is an array, access individual elements with [element number 1..n]:
    attr(ObjBaseT, "variable/parameter")[number] = new_value
    attr(ObjBaseT, "base")[1] = 0
• Changing graphical parameters such as color:
    showDisplayOptions(ObjBaseT)
      alpha = 1
      lty = solid
      color = orange
      lwd = 1
      size = 5
      type = p
    getPar(ObjBaseT, "color")
      [1] "orange"
    setPar(ObjBaseT, "color", "blue")
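GenomeGraphs objects are S4 objects, which is why attributes() and attr() expose their variables (slots). The sketch below reproduces the same access pattern on a toy class so that it runs without GenomeGraphs installed. ToyTrack and its slots are hypothetical names chosen for illustration and are not the real BaseTrack definition.

```r
library(methods)

# Hypothetical stand-in for a track object (not the GenomeGraphs class)
setClass("ToyTrack",
         slots = c(base = "numeric", value = "numeric", strand = "character"))

track <- new("ToyTrack",
             base = as.numeric(1:10), value = rnorm(10), strand = "+")

# List all variables (slots) of the object, plus its class
names(attributes(track))

# View, then change, a single variable
attr(track, "strand")
attr(track, "strand") <- "-"

# Change one element of an array-valued variable
attr(track, "base")[1] <- 0

# The slot operator reads the same values
track@strand
track@base[1]
```

In everyday R code the @ operator (or slot()) is the more common spelling for slot access; attr() is shown here because it is the form the slides use.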

gdPlot composite/group plots

• gdPlot(…) can take any number of gdObjects to plot
• Let’s plot two BaseTrack objects a and b:
    a = makeBaseTrack(1:100, rlnorm(100), strand = "+")
    b = makeBaseTrack(1:100, rnorm(100), strand = "-")
    ab = list(a, b)
    gdPlot(ab)

Manipulating values of the grouped gdObjects

• ab is a list of objects a and b
• To access an individual object within the list use [ ]:
    ab[1] will display object a
    ab[2] will display object b
• To modify an object within the group use double [[ ]]
• To change the color of b to red:
    getPar(ab[[2]], "color")
    setPar(ab[[2]], "color", "red")
  -OR- (re-create the object)
    b = makeBaseTrack(1:100, rnorm(100), strand = "-",
                      dp = DisplayPars(color = "red"))
    ab = list(a, b)
    gdPlot(ab)
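The difference between [ ] and [[ ]] is a general property of R lists rather than something specific to GenomeGraphs. A minimal base-R sketch follows, using plain lists as hypothetical stand-ins for the two track objects.

```r
# Plain lists standing in for the two tracks (hypothetical stand-ins,
# not GenomeGraphs objects)
a <- list(strand = "+", color = "orange")
b <- list(strand = "-", color = "orange")
ab <- list(a, b)

# Single brackets return a sub-list that still wraps the element
length(ab[2])     # 1
ab[2]$color       # NULL: the wrapper list has no element named "color"

# Double brackets return the element itself
ab[[2]]$color     # "orange"

# So modifications of a grouped element go through [[ ]]
ab[[2]]$color <- "red"
ab[[2]]$color     # "red"
b$color           # "orange": in this toy example the list holds a copy of b
```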

Plotting with labels and legend

• gdPlot does not provide functions to label axes
• Trick = use “labeled / tagged” objects:
    "label" = GenomeGraphs object
    "+ strand" = makeBaseTrack(1:100, rlnorm(100), strand = "+")
• To display a legend use makeLegend("text", "color"):
    ab = list("+" = a, "-" = b, makeGenomeAxis(),
              makeLegend(c("+", "-"), c("orange", "red")))
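The tags in the trick above are ordinary list element names. The short base-R sketch below shows how names are stored when only some elements are tagged; the character strings are hypothetical placeholders for the real track, axis and legend objects.

```r
# Placeholders stand in for the real objects (hypothetical values)
tracks <- list("+" = "track a", "-" = "track b", "genome axis")

# Tagged elements keep their label; untagged elements get an empty name
names(tracks)     # "+" "-" ""

# A tagged element can also be retrieved by its label
tracks[["-"]]     # "track b"
```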

Retrieving and Displaying Data from Public Databases

Combining capabilities of the biomaRt and GenomeGraphs libraries

Welcome to biomaRt

• The biomaRt library allows retrieving data from public DBs:
    ensembl            ENSEMBL GENES 68 (SANGER UK)
    snp                ENSEMBL VARIATION 68 (SANGER UK)
    unimart            UNIPROT (EBI UK)
    bacteria_mart_14   ENSEMBL BACTERIA 14 (EBI UK)
  ***Use listMarts() to see all available databases***
• Let’s retrieve gene data of the Bacillus subtilis strain
• useMart(database, dataset) connects to the specified database and to a dataset within this database:
    db = useMart("bacteria_mart_14")
    listDatasets(db)
      Dataset      Description                                 version
      …            …                                           …
      bac_6_gene   Bacillus subtilis genes (EB 2 b_subtilis)   EB 2 b_subtilis
    db = useMart("bacteria_mart_14", "bac_6_gene")

Exploring biomaRt object

• listAttributes() shows all properties of the biomaRt object
• The db object has a total of 4175 genes
• Use getBM(attribute, filter, value, biomaRt_obj) to extract values belonging to the specified attributes
  • attribute: general term such as gene name / chromosome # / strand (+ or -)
  • filter: parameter applied on an attribute, such as the genomic region to consider (i.e. start and end in nt)
  • value: actual values of the applied filter(s)

    getBM(c("external_gene_id", "description", "start_position",
            "end_position", "strand"),
          filters = c("start", "end"),
          values = list(1, 10000), db)

    ID     description                           start(nt)   end(nt)   strand
    metS   Methionyl-tRNA synthetase             45633       47627      1
    ftsH   Cell division protease ftsH homolog   76984       78897      1
    hslO   33 kDa chaperonin                     79880       80755      1
    Dgk    Deoxyguanosine kinase                 23146       23769     -1
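The roles of attribute, filter and value can be illustrated locally with an ordinary data frame. The sketch below is only an analogy under stated assumptions: it reuses the four genes from the table above, the window boundaries lo and hi are arbitrary illustration values, and the selection happens in local memory rather than through a biomaRt query.

```r
# The four genes from the table above, typed in by hand
genes <- data.frame(
  id     = c("metS", "ftsH", "hslO", "Dgk"),
  start  = c(45633, 76984, 79880, 23146),
  end    = c(47627, 78897, 80755, 23769),
  strand = c(1, 1, 1, -1),
  stringsAsFactors = FALSE
)

# "filters" = which columns to constrain (start, end)
# "values"  = the constraint itself (an arbitrary window chosen here)
lo <- 20000
hi <- 50000

# "attributes" = which columns to return for the matching rows
hits <- genes[genes$start >= lo & genes$end <= hi, c("id", "strand")]
hits
```

With this window only metS and Dgk fall entirely inside the region, so hits has two rows.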

Plotting the selected “Genome Region”

• Create an object with the makeGeneRegion() function:
    makeGeneRegion(start, end, chromosome name,
                   strand, biomaRt object, plotting options)
• Find the notation used for chromosome naming:
    getBM("chromosome_name", "", "", db)
        chromosome_name
      1 Chromosome
• Create the region object and plot it:
    gRegion = makeGeneRegion(1, 10000, chr = "Chromosome",
                             strand = "+", biomart = db,
                             dp = DisplayPars(plotId = TRUE, idRotation = 90,
                                              cex = 0.8, idColor = "black"))
    gdPlot(list(gRegion, makeGenomeAxis(),
                makeTitle("Position(nt)", cex = 3, "black", 0.1)))

Bacillus subtilis genome region (intron / exon)

(Figure: plotted gene region; the labels on the plot show ensembl_gene_id)

Mapping Expression Data (RNA-seq)

makeGenericArray()

Intro to RNA-seq data

• HT sequencing technologies allow sequencing mRNA as a series of short contigs of 50-200 bp
• In addition to gene expression analysis, it is possible to:
  • Map the location of introns (UTRs) / exons
  • Principle: one searches for rapid changes in abundance of the RNA-Seq signal (contigs)
  • Integrate sequence + expression information
• Ugrappa Nagalakshmi et al. 2008 used this strategy to accurately map the yeast genome¹
• Task: display part of the seqDataEx dataset having both
  • abundance of mRNA (cDNA) transcripts and
  • annotated yeast genome

¹ Ugrappa Nagalakshmi et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008
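The principle of searching for rapid changes in signal abundance can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the method of the cited paper: the coverage vector is made-up data and the threshold is an arbitrary value chosen for the example.

```r
# Made-up per-position abundance: low background with two "exon-like" blocks
coverage <- c(2, 3, 2, 40, 42, 41, 39, 3, 2, 2, 35, 36, 2)

# Change in signal between neighbouring positions
jump <- diff(coverage)

# Arbitrary cut-off for what counts as a "rapid" change
threshold <- 20

# A sharp rise marks the first position of a high-signal block,
# a sharp fall marks its last position
block_start <- which(jump >  threshold) + 1L
block_end   <- which(jump < -threshold)

data.frame(start = block_start, end = block_end)
```

For this toy vector the two detected blocks span positions 4 to 7 and 11 to 12, which are exactly the stretches with high abundance.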

Plotting RNA-seq data with makeGenericArray()

• Need to get mRNA contig abundance data:
    data("seqDataEx")
    rnaSeqAb = seqDataEx$david
• Create a GenericArray object with makeGenericArray(intensity, probeStart, probeEnd, trackOverlay, dp = NULL)
  • intensity: either microarray or RNA-seq transcript abundance signal
  • probeStart: start position of the probe (location in nt)
• Plot contig abundance w.r.t. genomic location:
    gdPlot(list("abun" = makeGenericArray(rnaSeqAb[, "expr", drop = FALSE],
                                          rnaSeqAb[, "location"]),
                makeGenomeAxis()))
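The drop = FALSE argument in the indexing above keeps the selected column two-dimensional instead of collapsing it to a plain vector, which is presumably why the slides use it for the intensity argument. A small base-R sketch on a made-up matrix follows; rnaSeqToy and its values are hypothetical, and the same indexing rule applies to data frames.

```r
# Made-up stand-in for the abundance data (hypothetical values)
rnaSeqToy <- cbind(location = c(100, 150, 200),
                   expr     = c(5.2, 7.9, 1.3))

# Default indexing drops the dimensions: the result is a plain vector
v <- rnaSeqToy[, "expr"]
dim(v)            # NULL

# drop = FALSE keeps a one-column matrix with its column name
m <- rnaSeqToy[, "expr", drop = FALSE]
dim(m)            # 3 1
colnames(m)       # "expr"
```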

Adding genomic annotation to the previous plot

• Need to get annotated yeast genome data:
    annotGen = useMart("ensembl", "scerevisiae_gene_ensembl")
• Create the mRNA contig abundance object:
    mRNAabun = makeGenericArray(rnaSeqAb[, "expr", drop = FALSE],
                                rnaSeqAb[, "location"])
• Create the annotated sequence covering the mRNA contig locations:
    annotSeq = makeGeneRegion(start = min(rnaSeqAb[, "location"]),
                              end = max(rnaSeqAb[, "location"]), chr = "IV",
                              strand = "+", biomart = annotGen,
                              dp = DisplayPars(plotId = TRUE, idRotation = 0,
                                               cex = 0.85, idColor = "black",
                                               size = 0.5))
• Combine the objects and plot them:
    gdPlot(list("abund" = mRNAabun, makeGenomeAxis(), "+" = annotSeq),
           1299000, 1312000)

Resulting plot: transcript abundance w.r.t. location

Overlays of Basic Shapes and Custom Text

• To overlay a rectangle use makeRectangleOverlay():
    makeRectangleOverlay(start, end, region = NULL,
                         coords = c("genomic", "absolute"), dp)
    rectOver = makeRectangleOverlay(1301500, 1302200, region = c(1, 2),
                                    "genomic", DisplayPars(alpha = 0.5))
• To overlay text use makeTextOverlay():
    tOver = makeTextOverlay("Ribosomal Large subunit",
                            1302000, 0.95, region = c(1, 1),
                            dp = DisplayPars(color = "red"))
• Combine all overlay objects into one vector with c(v1, v2):
    gdPlot(list("abund" = mRNAabun, makeGenomeAxis(), "+" = annotSeq),
           overlays = c(rectOver, tOver))

Overlay of rectangle and text


Alternative splicing of transcript

makeTranscript(id, type, biomart, dp = NULL)

Displaying alternative splicing of a gene

• mRNA coming from the same ORF could be spliced in many ways
  • E.g. case of VDR genes of IgG
• Given a biomaRt object, makeTranscript() will extract splicing information for the given id (i.e. gene)
• Connect to the human genome database:
    hGenome <- useMart("ensembl", "hsapiens_gene_ensembl")
• Select an Ensembl ID to look at:
    head(getBM(c("ensembl_gene_id", "description"), "", "", hGenome))
• Get splicing data from the biomaRt object (hGenome):
    spliceObj = makeTranscript("ENSG00000168309",
                               "ensembl_gene_id", hGenome)
• Plot the object with gdPlot:
    gdPlot(list(makeTitle("Transcript ID: ENSG00000168309"),
                spliceObj, makeGenomeAxis()))

The final result


Conclusion

• GenomeGraphs provides a wide range of tools for plotting genomic data
• Can use external databases through biomaRt
• Main useful features:
  • identifies exons/introns
  • allows cross-referencing expression / genome data
  • flexible albeit complex plotting capabilities
  • allows overlaying graphical objects and text
  • ability to create custom legends
  • annotation capabilities provided by the powerful biomaRt

Thank you for your patience! &

Happy Bioconductor/R Exploration!