
Bioinformatics Tools in Context

How Informatics Improves Research Every Day

Adelaide Fletcher, MLIS

Tzu L. Phang Ph.D.

July 27, 2012


True or False?

You have to be a collaborator on someone’s clinical trial to make discoveries with their genetic data...


• Stanford School of Medicine's Atul Butte identified a new drug target for diabetes by downloading data from 130 gene-expression studies in mice, rats, and humans that were done by other researchers and doing a meta-analysis to look for a common link

• Wet lab experiments are more for validating hypotheses than making discoveries


Meet Our Hero...

• Name: Hunter

• Research Interests: The role of mammary epithelial cells in breast cancer

• Goal: Develop a genetic drug target for breast cancer

• Post-grad experience: < 1 year

• Funding: $0


Where should he start?

• A. Ask for $$!

• B. Do a lit search

• C. Try to find free genetic data


Finding out what’s known

• Google Scholar - http://scholar.google.com

• Web of Science - http://isiknowledge.com/WOS

– (http://hsl-ezproxy.ucdenver.edu/login?url=http://isiknowledge.com/WOS)


Google Scholar

• http://scholar.google.com - search “mammary epithelial cells”


What’s this? Free data?


Follow the path of serendipity

“Data”


Now that we’ve found “Data” what are we going to “Tzu”?

http://cctsi.ucdenver.edu/RIIC

GEO (Gene Expression Omnibus)

http://www.ncbi.nlm.nih.gov/geo/

As of July 19, 2012

Using GEO as an example

• Naming schemes: GPL, GSM, GSE, GDS

GPL (Geo PLatform)

• Describes the list of elements on the array (cDNAs, oligonucleotide probesets, ORFs, antibodies)

• Each platform is assigned a unique and stable GEO accession number (GPLxxx)

• Example:

– GPL570: Affymetrix GeneChip Human Genome U133 Plus 2.0 Array

GSM (Geo SaMple)

• Describes the conditions under which an individual Sample was handled, the manipulation it underwent, and the abundance measurement of each element derived from it!

• A Sample entity must reference only one Platform and may be included in multiple Series

• Example: GSM300166 (remember HW 2??!) PostcentralGyrus_female_91yrs_indiv10

GSE (Geo SEries)

• Defines a set of related Samples considered to be part of a group

• Provides a focal point and description of the experiment as a whole

• Example:

Let’s look at an example

• Go to the GEO site

• Under “GEO accession”, type:

– GSE11882

• Find these terms:

– GPL

– GSM

– GSE

GDS (Geo DataSet)

• Curated sets of GEO Sample data

• Represents a collection of biologically and statistically comparable GEO Samples

– Same platform

– Shared common set of probe elements

– Samples’ intensities calculated in an equivalent manner (background correction, normalization, etc.)

• Example: GDS200 (see next page)
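The four accession prefixes above can be told apart mechanically. The following is a minimal sketch, not part of the original slides: the helper names `classify_accession` and `accession_url` are hypothetical, and the URL pattern is an assumption based on the GEO accession viewer rather than something the slides specify.

```python
import re

# Prefix -> record type, using the names given on the slides.
GEO_RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# A GEO accession is one of the four prefixes followed by digits.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def classify_accession(accession):
    """Return (record_type, number) for a GEO accession, or raise ValueError."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    prefix, digits = match.groups()
    return GEO_RECORD_TYPES[prefix], int(digits)


def accession_url(accession):
    """Build an accession-viewer URL (assumed pattern, not from the slides)."""
    classify_accession(accession)  # validate before building the URL
    return ("https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="
            + accession.strip().upper())


if __name__ == "__main__":
    for acc in ["GPL570", "GSM300166", "GSE11882", "GDS2789"]:
        print(acc, classify_accession(acc), accession_url(acc))
```

A check like this is mainly useful for catching mistyped accessions before pasting them into the “GEO accession” box.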

What can you do in GEO?

Clustering Analysis

Class Comparison Analysis

Gene Expression Profile

Let’s import the dataset

• GDS2789

What’s wrong with the approach?

• Only shows one gene at a time

• Hard to select a gene set for downstream analysis such as clustering

• Hard to output a gene list
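To make the missing piece concrete, here is a rough sketch of what "outputting a gene list" from a two-class comparison could look like. It is an illustration only: the gene names, sample names, values, and the `rank_genes` helper are all hypothetical, and a plain difference of mean log2 intensities stands in for the statistical tests a real class comparison would use.

```python
from statistics import mean


def rank_genes(expression, group_a, group_b, min_abs_diff=1.0):
    """Return (gene, diff) pairs whose mean log2 difference passes a cutoff.

    expression maps gene -> {sample: log2 intensity}; diff is
    mean(group_b) - mean(group_a). Results are sorted by |diff|, largest first.
    """
    hits = []
    for gene, values in expression.items():
        diff = mean(values[s] for s in group_b) - mean(values[s] for s in group_a)
        if abs(diff) >= min_abs_diff:
            hits.append((gene, diff))
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: two control and two treated samples.
    toy = {
        "gene_1": {"ctl_1": 5.0, "ctl_2": 5.2, "trt_1": 7.1, "trt_2": 7.3},
        "gene_2": {"ctl_1": 8.0, "ctl_2": 8.1, "trt_1": 8.0, "trt_2": 8.2},
        "gene_3": {"ctl_1": 9.0, "ctl_2": 9.4, "trt_1": 6.0, "trt_2": 6.2},
    }
    for gene, diff in rank_genes(toy, ["ctl_1", "ctl_2"], ["trt_1", "trt_2"]):
        print(f"{gene}\t{diff:+.2f}")
```

The point is the shape of the output: a ranked list covering many genes at once, which can then be passed on to clustering or annotation tools.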

BRB-ArrayTools

• Free, open-source software

• Microsoft Excel plug-in

• Only works on the Windows platform

• Subject to all of Excel’s limitations

http://linus.nci.nih.gov/BRB-ArrayTools.html

BRB-ArrayTools

• Biometric Research Branch (BRB)

– Statistical/biomathematical component

– Division of Cancer Treatment and Diagnosis (NCI)

• Richard Simon & BRB-ArrayTools Development Team

• BRB-ArrayTools

– Visualization and statistical analysis of DNA microarray gene expression data

– Developed by statisticians

– Excel add-in

– Analytic/visualization tools: R statistical system, C and Fortran programs, Java applications

– Visual Basic for Applications integrates the components

Objectives

• “provide scientists with software … without requiring them to learn a programming language”

• “encapsulate into software the experience of professional statisticians”

• “facilitate education of scientists in statistical methods for the analysis of DNA microarray data”

Installing BRB-ArrayTools

• Windows 98/2000/NT/XP/Vista/7

• Loads package as add-in to Microsoft Excel

– Excel 2000 or later

– Creates ArrayTools menu on Excel menu bar

• Intensive computations performed in R or compiled programs

Installation

• Go to “http://linus.nci.nih.gov/BRB-ArrayTools.html”

• Click on “All required components in ONE file”

Installation

• Click on “Download Standard Version 3.7.1 (All in one file)”

• When prompted, enter User name and Password (these will be sent to you after your FREE registration)

Demonstration

Installation

• Follow the step-by-step procedures

• In the interest of time, the software has already been installed on your machine

Demonstration

Excel 2007: Security Setting

Now, a video demo ….

http://david.abcc.ncifcrf.gov/home.jsp

A quick recap...

List of 220 or so genes with potential indications for treatment or further understanding of breast cancer pathways

List of 6 or so genes with a shared biological pathway (transcription factor activity)

Do these genes have a cancer (CA) connection?

• In NCBI GENE search: “(TBX6 OR ZNF423 OR NR4A3 OR SCAND2 OR CEBPE OR SIX2) AND Cancer”
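The same search string can be assembled from a gene list rather than typed by hand. This is a minimal sketch under stated assumptions: `build_gene_query` and `esearch_url` are hypothetical helper names, and the NCBI E-utilities `esearch` endpoint is used here as one possible programmatic route; the slides themselves only show the query being typed into the NCBI Gene search box. The code builds the URL but does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption; the slides use the web form).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_query(symbols, topic):
    """Join gene symbols with OR, then AND the result with a topic term."""
    joined = " OR ".join(symbols)
    return f"({joined}) AND {topic}"


def esearch_url(term, db="gene", retmax=20):
    """Return a URL-encoded esearch request for the given term."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


if __name__ == "__main__":
    genes = ["TBX6", "ZNF423", "NR4A3", "SCAND2", "CEBPE", "SIX2"]
    term = build_gene_query(genes, "Cancer")
    print(term)               # the string to paste into NCBI Gene
    print(esearch_url(term))  # the equivalent programmatic request
```

Generating the string this way avoids unbalanced parentheses or a missing OR when the gene list grows.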


NCBI Gene - a one-stop shop


All Roads Lead to GENE


Browsing Genes and Genomes

• NCBI

• Ensembl

• UCSC Genome Browser

– Which one to use? http://cctsi.ucdenver.edu/RIIC/Pages/TranslationalInformaticsVideos.aspx#GenomeBrowsers

– A full day of Ensembl training: http://hsl2.ucdenver.edu/ensembl/


BLASTing

• To what gene does this nucleotide sequence most likely belong?

• gggtgaacag ccgcacggga gtaggtacgc acctgacctc gctggcactg ccgggcaagg cagagggtgt ggcgtcgctc accagccagt gcagctacag cagcaccatc gtccatgtgg gagacaagaa gccgcagccg gagttagaga tggtggaaga tgctgcgagt gggccagaat

• http://blast.ncbi.nlm.nih.gov/Blast.cgi

• http://www.ensembl.org/Danio_rerio/blastview

• http://genome.ucsc.edu/cgi-bin/hgBlat?command=start
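Sequences copied from slides or GenBank-style records often carry spaces and line breaks, as the one above does. Here is a small sketch of tidying such a sequence before pasting it into one of the BLAST or BLAT forms. The helper names and the record name `query` are hypothetical, and only the first two blocks of the slide's sequence are used in the demo.

```python
VALID_BASES = set("ACGTN")


def clean_sequence(raw):
    """Strip whitespace, upper-case, and check for non-nucleotide characters."""
    seq = "".join(raw.split()).upper()
    unexpected = set(seq) - VALID_BASES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def gc_content(seq):
    """Fraction of G and C bases in an already-cleaned sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def to_fasta(name, seq, width=60):
    """Format a cleaned sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines)


if __name__ == "__main__":
    # First two blocks of the sequence shown on the slide.
    fragment = "gggtgaacag ccgcacggga"
    seq = clean_sequence(fragment)
    print(len(seq), round(gc_content(seq), 2))
    print(to_fasta("query", seq))
```

The BLAST forms generally accept either bare sequence or FASTA, so the cleaning step matters more than the header line.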


BLASTing

• What about this one?

• acatttgctt ctgacacaac tgtgttcact agcaacctca aacagacacc atggtgcacc tgactcctga ggagaagtct gcggttactg ccctgtgggg caaggtgaac gtggatgaag ttggtggtga ggccctgggc aggctgctgg tggtctaccc ttggacccag aggttctttg agtcctttgg ggatctgtcc actcctgatg cagttatggg caaccctaag gtgaaggctc atggcaagaa agtgctcggt gcctttagtg atggcctggc tcacctggac aacctcaagg gcacctttgc cacactgagt gagctgcact gtgacaagct gcacgtggat cctgagaact tcaggctcct gggcaacgtg ctggtctgtg tgctggccca tcactttggc aaagaattca ccccaccagt gcaggctgcc tatcagaaag tggtggctgg tgtggctaat gccctggccc acaagtatca ctaagctcgc tttcttgctg tccaatttct attaaaggtt cctttgttcc ctaagtccaa ctactaaact gggggatatt atgaagggcc ttgagcatct ggattctgcc taataaaaaa catttatttt


Genetics in Literature

• What does this sequence:

• ATTAAAGATGATTTTTACAGTCAATGAGCCACGTCAGGGAGCGATGGCACCCGCAGGCGGTATCAACTGATGCAAGTGTTCAAGCGAATCTCAACTCGTTTTTTCCGGTGACTCATTCCCGGCCCTGCTTGGCAGCGCTGCACCCTTTAACTTAAACCTCGGCCGGCCGCCCGCCGGGGGCACAGAGTGTGCGCCGGGCCGCGCGGCAATTGGTCCCCGCGCCGACCTCCGCCCGCGAGCGCCGCCGCTTCCCTTCCCCGCCCCGCGTCCCTCCCCCTCGGCCCCGCGCGTCGCCTGTCCTCCGAGCCAGTCGCTGACAGCCGCGGCGCCGCGAGCTTCTCCTCTCCTCACGACCGAGGCAGGTAAACGCCCGGGGTGGGAGGAACGCGGGCGGGGGCAGGGGAGCCGCGGGGGCCGAGTGAGGACCCCGGGCCTCGGGTCCCAGGCGCAAGGGTGCCCGGCCGGGCGGGGTCGGGACCCCAGTGAGGAGGGGCCGGGGGCTGCCCCGCGGGCGCGTGACGCGTCTCGGGCCTGCCCGGCTGCGCTGGTCTCCGCTCGGGTGAGGCGGCTTGGCTTCGCTTTTCAGGTTAGGAAAGCTCCCTTTACTGCGCGTTGGGGGGCTGGGGGAGCTGGCGGAGCCCCGTTAGGGAGGTCGGTGGCGCCGGGGTGTCTCAGCGCCCCCTGCACCCCGCGCGGGTCCGGCCCAGCGGGCGATCGCTGGCGCCCAGGGAACTCCGGGAGGGCCGCCAGCGGGCTCCGCAGGGCGCGGGGCGGGGAGGGGCGCCTGGGGGCCGCGGGGCTCGCGCTCCCCGCCCGTTGGCCGCCCCTCGGAGGCCGAGATCGGGGCCCAGAACGCCCCTTGGCAAGGCCTGGCGCTTCCGCGATGCCCAGAGGGTGCTTGGGGGGATGGAGAGAGGGGCGCCCGCCGGGGGAGTTCCGGGAGCCTCGGTGCCTCCCGCCGCAGCTGCAGCGTTCCTCCCGGGAGGCGGCCCAGCCCTTCATCCTCGCCGCCTGAGCTTCTCCGAGGGGGGCTGCAGCCTTGCGGCCGTTGCCACCGCCTGGAGAAGCGGCCCACGCGGACTGACGGGCGGGGGCGGGGCCTCGGGCCTCGGCGGGGGCGGGGTCCGGGGAGGCCCCACCCTCTGTTCTCCAGGGGCGGGGAGAGAGGAGCTGCAGGTCTGCGGCCTGGC

• have to do with this book? http://www.amazon.com/The-Family-That-Couldnt-Sleep/dp/1400062454


Oh yeah, him

Phylogenetics

• Scientific procedure to reconstruct the evolutionary history of organisms or sequences

• Evolutionary theory: groups of similar organisms are descended from a common ancestor

• Cladistics:

– Developed by Willi Hennig, German entomologist (1950)

– Phylogenetic systematics: a mathematical approach

– Method of taxonomic classification of organisms based on their evolution

• So, why do we study phylogenetics?

What can phylogenetics tell you?

• Discovering the function of a gene

– Is your gene of interest orthologous to another well-characterized gene from another species?

• Retracing the origin of a gene

– Most genes travel together through evolutionary time

– Determine whether genes have undergone genomic modification such as mutation, deletion, duplication, speciation, loss and gain of function, inactivation, etc.

DNA: a good measurement

• Advantages over morphological taxonomic characters:

– Character states are unambiguous

– A large number of characters can be used to perform the analysis
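One way to see why DNA makes a convenient character set: once sequences are aligned (for example by a tool such as ClustalW), every column is a character with a discrete state. The sketch below is not from the slides. It uses a deliberately simple measure, the proportion of differing sites (p-distance), on made-up toy sequences; real phylogenetic software uses more elaborate models.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_table(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Hypothetical aligned toy sequences, not real genes.
    toy = {
        "seq_A": "ACGTACGTAC",
        "seq_B": "ACGTACGTTC",
        "seq_C": "ACGAACG-TC",
    }
    for pair, dist in distance_table(toy).items():
        print(pair, round(dist, 3))
```

A table of pairwise distances like this is the kind of input that distance-based tree-building methods start from.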

Using clustalw: www.ebi.ac.uk/clustalw

Now, a video demo …


Find collaborators

• Colorado Profiles: http://profiles.ucdenver.edu/Search.aspx

– Search: “mammary epithelial cells”

• Colorado Translational Informatics Community on Facebook: http://www.facebook.com/pages/Colorado-Translational-Informatics-Community/136023206424789


Get Informatics Help

• http://cctsi.ucdenver.edu/RIIC

– 5 x 5 Videos

– Find informatics experts

– Monthly podcast

– SeDLAC (Secondary Database Library and Analysis Center)

– Consultation and Data Analysis


Get $$

• NLM Professional Development Repository: http://cnx.org/content/m37008/latest/

• CCTSI Funding: http://cctsi.ucdenver.edu/Funding/Pages/default.aspx

• UC Denver Office of Grants and Contracts: http://www.ucdenver.edu/academics/research/AboutUs/GrantsContractsOffice/Pages/default.aspx


Find a Journal to Publish Findings

• http://www.biosemantics.org/jane/ - Example Search:

• “cDNA microarrays and a clustering algorithm were used to identify patterns of gene expression in human mammary epithelial cells growing in culture and in primary human breast tumors. Clusters of coexpressed genes identified through manipulations of mammary epithelial cells in vitro also showed consistent patterns of variation in expression among breast tumor samples. By using immunohistochemistry with antibodies against proteins encoded by a particular gene in a cluster, the identity of the cell type within the tumor specimen that contributed the observed gene expression pattern could be determined. Clusters of genes with coherent expression patterns in cultured cells and in the breast tumors samples could be related to specific features of biological variation among the samples. Two such clusters were found to have patterns that correlated with variation in cell proliferation rates and with activation of the IFN-regulated signal transduction pathway, respectively. Clusters of genes expressed by stromal cells and lymphocytes in the breast tumors also were identified in this analysis. These results support the feasibility and usefulness of this systematic approach to studying variation in gene expression patterns in human cancers as a means to dissect and classify solid tumors.”


Get Informatics Help!

• http://cctsi.ucdenver.edu/RIIC

– 5 x 5 Videos

– Find informatics experts

– Monthly podcast

– SeDLAC (Secondary Database Library and Analysis Center)

– Consultation and Data Analysis


Thank You!

• Tzu Phang, Ph.D. – [email protected]

• Addie Fletcher, MLIS – [email protected]