introduction to analysis of genomic data using r...

11
- - : Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R Dr. Yen-Yi Ho ([email protected]) Jan 17, 2018 1/11

Upload: others

Post on 15-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Introduction to Analysis of Genomic Data Using RLab 2: Introduction to R

Dr. Yen-Yi Ho ([email protected])

Jan 17, 2018

1/11

Page 2: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

2/11

Page 3: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

R computing & graphics package

I R is a powerful, free statistical computing and graphicspackage.

I Popular with many researchers due to contributed packages:R functions to do specialized, advanced, & often complexstatistical analysis.

I R can also do many important, routine calculations, analysis,and provide common graphical displays used in this course.

I Installed in several of the computing labs across campus, e.g.Sloan 108 & 109, Gambrell 003.

I You can download it and install it from CRAN:http://cran.r-project.org/

3/11

Page 4: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

R: Pros and Cons

Pros Cons+ Free - No dedicated support+ Available for all major - Complex Syntax

platforms+ Powerful graphics - Not point-and-click+ Comprehensive - No warranty+ Easy interface with other languages

(such as C, Fortran) - Relatively slow+ Well-designed programming

language (object-oriented)+ Unlimited extensibility+ Widely used by statisticians+ Increasingly used for genomic

analyses (Bioconductor)

4/11

Page 5: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Bioconductor: a collection of R packages for genomic dataanalysis

I Started by Robert Gentleman housing R packages for genomicdata analysis.

5/11

Page 6: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Bioconductor installation

I Use biocLite.R script

I Installing a specific package from Bioconductor:

source("http://www.bioconductor.org/biocLite.R")

biocLite("limma")

6/11

Page 7: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Online resources: genome browser and public datarepositories

I UCSC genome browser: host genomic annotation data formany species.

7/11

Page 8: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Public high-throughput data repositories

I GEO: Gene expression omnibus.I Funded by NCBII Host array- and sequencing-based data.

I ArrayExpression: European version of GEOI Better curated than GEO but has less data.

I SRA: sequence read archive.I Designed for hosting large scale high-throughput sequencing

data (high speed file transfer).

8/11

Page 9: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Other public data resources

I TCGA (The Cancer Genome Atlas)I Host data generated by TCGA, a big consortium to study

cancer genomics.I Huge collection of cancer related data: different types of

genomic, genetic and clinical data for many different types ofcancers.

I ICGC (International Cancer Genome Consortium): Similar toTCGA but have a larger collection of studies.

I ENCODE (the ENCyclopedia Of DNA Elements) datacoordination center

I Host data generated by ENCODE, a big consortium to studyfunctional elements of human genome.

I Rich collection of genomic and epigenomic data.

I Many others ...

9/11

Page 10: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

Next Lab: R Topics Outline

I Get Started

I R as a calculator

I Vectors

I Matrices, Arrays, Factors, List, Data Frame

I Import/Export Data

I R Graphics

I Random number generating

I Writing R function

I for loops

I rep, seq, which, match

10/11

Page 11: Introduction to Analysis of Genomic Data Using R …people.stat.sc.edu/hoyen/PastTeaching/BIOL599-2018/...Introduction to Analysis of Genomic Data Using R Lab 2: Introduction to R

- - :

To do list after this class

I Review slides.

I Read WiKi for DNA, gene, genome, DNA microarray andDNA sequencing.

I Install R and Bioconductor on your computer.

I Start to learn R by reading Applied Statistics froBioinformatics Using R https://cran.r-project.org/doc/contrib/Krijnen-IntroBioInfStatistics.pdf

11/11