Introduction to BioComputing: Biology in silico
3rd February 2010

Carrie Iwema, PhD, MLS
Molecular Biology Information Specialist
Health Sciences Library System
University of Pittsburgh
[email protected]

http://www.hsls.pitt.edu/guides/genetics


General Topics

Information Overload

Genome Gene Protein


Specific Topics

Information Overload
  PubMed
  Alternatives to PubMed
    GoPubMed
    Novoseek
    PubGet
  Molecular Databases
  HSLS Molecular Biology Information Service

Genome Gene Protein
  Genome Biology
  Genome Browsers
    UCSC Genome Browser
    NCBI MapViewer
  Entrez Gene
  UniProt


Information Overload

Breast cancer: 209K
Colon cancer: 84K
p53: 52K
STAT1: 4K

5,394 journals

1.3 billion searches in 2009


Growth of Molecular Databases

2008: 1075
2009: 1170
2010: 1230

Source: Nodal Point Blog


Molecular Databases

Nucleic Acids Research (Oxford Journals)
  Annual Database Issue
  Annual Web Server Issue

Journals
  Bioinformatics (Oxford Journals)
  BMC Bioinformatics (BioMed Central)
  Database (Oxford Journals) *new in 2009*

Articles on “genetic databases”
  PubMed: 21,851 results
  MeSH: 16,398 results


HSLS Molecular Biology Information Service

Workshops

Website

Software Licensing

Bioinformatics Consultations


HSLS OBRC


HSLS OBRC in Science

HSLS OBRC

2441 links to databases and software

~3000 hits/day


search.HSLS.MolBio

Integrated search system
  Databases & Software
  Articles on Databases & Software
  Genes/Proteins
  Pathways
  Protocols
  Videos
  Recommended Articles

Tabbed browsing

Clustered search results


Hands-on exercises

Locate databases on: natural antisense, UTR, copy number variation

Retrieve gene information for: your favorite gene, BRCA1, STAT1

Find a suitable protocol for: methylation PCR, in situ hybridization, primer design

Identify videos on: protein structure prediction, Human Genome Project


Genome Biology


From Cell to Gene

Human Genome Project Video


Genome Biology Time Line

1976: RNA bacteriophage MS2
1995: Haemophilus influenzae
2001: Human genome draft sequence
2003: Published complete human reference genome
2007: Diploid genome sequence of an individual human
2008: Jim Watson genome
2010: Published complete genomes: 1191 organisms

Human Genome Project Video


Genome Resources

NCBI: Genomes Resources : Link

Genome Project Genome: 6108 species

Genomes OnLine Database (GOLD): Link

JGI: Integrated Microbial Genomes: Link


NCBI Genome Resources


Practice Question

Query: Check the status of genome sequencing for an organism, such as rabbit.

Answer:

Pick an organism or metagenome project name.

Search the Genome Project database. To get the most precise results, specify the organism field when searching with an organism name, for example: human[orgn].

Click on the desired Genome Project if there is more than one result.

The Genome Project summary page will provide information on available projects and sequencing status.
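
The organism-field search in this answer can also be written as a query string. The following is a minimal Python sketch: the `[orgn]` field tag comes from the slide and the ESearch endpoint is part of NCBI's E-utilities, but the `bioproject` database name is an assumption standing in for the Genome Project database, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def organism_term(organism: str) -> str:
    """Restrict a search term to the organism field, e.g. 'human[orgn]'."""
    return f"{organism.strip()}[orgn]"


def esearch_url(organism: str, db: str = "bioproject") -> str:
    """Build an ESearch URL for an organism-restricted query.

    The default database name is an assumption; substitute whichever
    Entrez database holds the genome project records you need.
    """
    params = {"db": db, "term": organism_term(organism)}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(organism_term("rabbit"))
print(esearch_url("rabbit"))
```

The sketch only builds the URL; it does not contact NCBI, so it can be checked offline before running a real search.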


NCBI Genome Project

A collection of complete and in-progress large-scale sequencing, assembly, annotation, and mapping projects for cellular organisms. The database is organized into organism-specific overviews that function as portals for browsing and retrieving projects pertaining to each organism.

Click: Rabbit


NCBI Genome Project : Rabbit Genome


NCBI Genome Project : Rabbit Genome


NCBI Entrez Genome:


Genomes Online Database (GOLD) http://genomesonline.org/index2.htm

Global resource for comprehensive access to information regarding complete and ongoing genome projects, metagenomes, and metadata.

“genome sequencing has come of age, and genomics will become central to microbiology's future. It may appear at the moment that the human genome is the main focus and primary goal of genome sequencing, but do not be deceived. The real justification in the long run, is microbial genomics”

Carl Woese, 1998


Genome Browsers


Genome Browsers: What are they?

Genome browsers enable researchers to visualize and browse entire genomes with annotated data, including gene prediction and structure, proteins, expression, regulation, variation, comparative analysis, etc.


Genome Browsers

The Big Three
  NCBI MapViewer
  UCSC Genome Browser
  EBI Ensembl

Generic Genome Browser (GBrowse)

JBrowse (Ajax-based, like Google Maps)

Display: Vertical

Display: Horizontal


UCSC Genome Browser


Navigating the Human Genome

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

UCSC Genome Browser
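
As a rough illustration of what this exercise asks for, the sketch below turns the slide's coordinates into a `chrN:start-end` position string of the kind typed into a genome browser's position box, and computes the size of the region. It assumes inclusive coordinates and the `chr7` chromosome naming; the helper names are hypothetical.

```python
from typing import Tuple

Region = Tuple[str, int, int]


def parse_bp(value: str) -> int:
    """Convert a base-pair coordinate such as '54,318,043' to an integer."""
    return int(value.replace(",", ""))


def make_region(chrom: str, start: str, end: str) -> Region:
    """Validate the coordinates and bundle them into a region tuple."""
    start_bp, end_bp = parse_bp(start), parse_bp(end)
    if start_bp > end_bp:
        raise ValueError("start must not exceed end")
    return chrom, start_bp, end_bp


def position_string(region: Region) -> str:
    """Format a region as 'chrom:start-end' with thousands separators."""
    chrom, start_bp, end_bp = region
    return f"{chrom}:{start_bp:,}-{end_bp:,}"


def region_length(region: Region) -> int:
    """Number of bases in the region, assuming inclusive coordinates."""
    _, start_bp, end_bp = region
    return end_bp - start_bp + 1


region = make_region("chr7", "54,318,043", "55,974,438")
print(position_string(region))
print(region_length(region))
```

The region spans roughly 1.66 million bases, which is why the later exercises zoom in to a single gene.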


UCSC Genome Browser: navigating a genomic region

Set up basic browser parameters


UCSC Genome Browser: navigating a genomic region


UCSC Genome Browser: navigating a genomic region


UCSC Genome Browser: navigating a genomic region

Start fresh


UCSC Genome Browser: navigating a genomic region

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

What genes are present in this region?


UCSC Genome Browser: navigating a genomic region


NCBI sequence databases

RefSeq: based on GenBank records; non-redundant, expert-verified databases of reference sequences (Link)

GenBank: archival database of nucleotide sequences from >160,000 organisms (Link)


International Nucleotide Sequence Database Collaboration


Primary vs. Derivative Databases


RefSeq Scope & Accessions

Genomic DNA
  NC_123456 - complete genome, chromosome, plasmid
  NG_123456 - genomic region
  NT_123456 - genomic contig

mRNA
  NM_123456

Protein
  NP_123456

more about RefSeq scope and accessions...
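
The prefix scheme above lends itself to a simple lookup. The sketch below is illustrative only: it covers just the five prefixes listed on this slide (RefSeq defines others), and the function name is hypothetical.

```python
# Prefix meanings exactly as listed on the slide; RefSeq has more prefixes.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA: complete genome, chromosome, plasmid",
    "NG": "genomic DNA: genomic region",
    "NT": "genomic DNA: genomic contig",
    "NM": "mRNA",
    "NP": "protein",
}


def describe_accession(accession: str) -> str:
    """Return the molecule type implied by a RefSeq accession prefix."""
    prefix, separator, number = accession.strip().upper().partition("_")
    if not separator or not number:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    return REFSEQ_PREFIXES.get(prefix, "prefix not listed on this slide")


for example in ("NC_123456", "NM_123456", "NP_123456"):
    print(example, "->", describe_accession(example))
```

The accessions used here are the placeholder numbers from the slide, not real records.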


UCSC Genome Browser: navigating a genomic region


UCSC Genome Browser: navigating a genomic region

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

Zoom in and display only the EGFR gene


UCSC Genome Browser: navigating a genomic region

Select the gene region from the “Scale” track to zoom in


UCSC Genome Browser: navigating a genomic region

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

Display all single nucleotide polymorphisms (SNPs) present in this gene


UCSC Genome Browser: navigating a genomic region


UCSC Genome Browser: navigating a genomic region

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

Retrieve the nucleotide sequence of this genomic region, showing all exons in blue and SNPs in red, bold, and underlined.


UCSC Genome Browser: navigating a genomic region


UCSC Genome Browser: navigating a genomic region: sequence view


UCSC Genome Browser: navigating a genomic region

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

Look in the probable promoter region and see if there’s anything interesting…


UCSC Genome Browser: navigating a genomic region


Zoom out


UCSC Genome Browser: navigating a genomic region

Browse the region of human chromosome 7 between bp 54,318,043 and 55,974,438

What transcription factors bind in this region?


UCSC Genome Browser: navigating a genomic region


Discovery Tool…


NCBI MapViewer


NCBI MapViewer

How To: View/download features around an object or between two objects on a chromosome

Starting with... CHROMOSOMAL COORDINATES

Begin on the Map Viewer home page. Click the "R" icon under Tools for the desired organism and build.

Select the chromosome, enter the coordinates in the From and To boxes, and click Go. Use either exact coordinates, e.g., 61551076, or values such as 61M or 61551K.

If necessary, use the Maps & Options dialog box to change displayed maps; the maps and region displayed determine the data available.
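
The shorthand coordinates mentioned in these steps can be expanded to exact base positions. The sketch below assumes that `K` means thousands of bases and `M` means millions, which matches the slide's examples but is not stated there explicitly; the function name is hypothetical.

```python
# Assumed meaning of the suffixes: K = 1,000 bases, M = 1,000,000 bases.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_coordinate(value: str) -> int:
    """Expand a coordinate such as '61M', '61551K', or '61551076' to bases."""
    text = value.strip().upper().replace(",", "")
    if not text:
        raise ValueError("empty coordinate")
    suffix = text[-1]
    if suffix in MULTIPLIERS:
        # Whole-number shorthand only, as in the slide's examples.
        return int(text[:-1]) * MULTIPLIERS[suffix]
    return int(text)


for example in ("61551076", "61M", "61551K"):
    print(example, "->", parse_coordinate(example))
```

Note that the shorthand forms are rounded: `61551K` expands to 61,551,000, not to the exact coordinate 61,551,076.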


Entrez Gene


Common Questions

What is its function?

What are its neighboring genes?

What is its genomic sequence?

How many splice variants are there?

What is its intron-exon architecture?

What diseases are associated with it?

In which tissues is it expressed?

How can I get its cDNA clone?


NCBI: Entrez Gene

SNP
Genomic Sequence
Expression Profile
Interacting Partners
3D Structure
mRNA Sequence
Chromosomal Localization
Disease
Amino Acid Sequence
Homologous Sequences


Entrez Gene

Find:
  gene symbols and aliases
  sequences: genomic, mRNA, protein
  intron-exon architecture
  genomic context: neighboring and antisense genes
  interacting partners
  associated Gene Ontology terms: function, cellular component, and biological process


Entrez Gene

a searchable database of genes, from RefSeq genomes, and defined by sequence and/or located in the NCBI Map Viewer

each record represents a single gene from a given organism


Entrez Gene Sequences

mRNA Seq

Protein Seq

Genomic Seq


Entrez Gene Links


Gene Ontology (GO)

Controlled vocabulary tagging

Function
Biological Processes
Cellular Component


Entrez Gene: Gene Table


Introns/Exons


Try it!

Find mRNA sequence for your gene of interest


Find mRNA Sequence for Reelin Gene
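
The slides retrieve the sequence through the Entrez Gene web page. As an alternative not shown in the slides, an mRNA record can be requested in FASTA format through NCBI's E-utilities EFetch endpoint once its RefSeq accession is known. The sketch below only builds the request URL; the accession is the placeholder `NM_123456` from the RefSeq slide, not the real Reelin accession, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities record-retrieval endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def mrna_fasta_url(accession: str) -> str:
    """Build an EFetch URL that returns a nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # nucleotide database
        "id": accession,      # RefSeq mRNA accession (NM_ prefix)
        "rettype": "fasta",   # sequence only, FASTA format
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Placeholder accession; look up the real one on the gene's Entrez Gene page.
print(mrna_fasta_url("NM_123456"))
```

Changing `rettype` to `gb` requests the full GenBank record instead of the bare sequence.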


FASTA vs GenBank records
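
A FASTA record is just a header line beginning with `>` followed by lines of sequence, whereas a GenBank record also carries annotation such as features and references. To show how little structure FASTA has, here is a minimal parser sketch; the sample record is invented for illustration and the function name is hypothetical.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, List[str]] = {}
    header = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Invented example record, not real sequence data.
SAMPLE = """>example_1 hypothetical record
ATGGCC
TTAA
"""

for name, sequence in parse_fasta(SAMPLE).items():
    print(name, len(sequence), sequence)
```

For real work, an established library parser is a safer choice than a hand-written one.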


UCSC Genome Browser: find a gene in the genome


UCSC Genome Browser: find a gene in the genome


UCSC Genome Browser: find a gene in the genome


Bioinformatics Databases & Software Providers

NCBI
  Home page
  Site map
  Resource Guide

EBI
  Home page
  Databases
  Software


UniProt


UniProt

world's most comprehensive catalog of information on proteins


a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR


UniProt


UniProt


Thank you!
Any questions?

Carrie Iwema: [email protected], 412-383-6887
Ansuman: [email protected], 412-648-1297
