1 introduction to bioinformatics. 2 what is bioinformatics?

104
1 Introduction to Bioinformatics

Upload: spencer-cross

Post on 16-Dec-2015

246 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

1

Introduction to Bioinformatics

Page 2: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

2

What is Bioinformatics?

Page 3: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

3

NIH – definitions

What is Bioinformatics? - Research, development, and application of computational tools and approaches for expanding the use of biological, medical, behavioral, and health data, including the means to acquire, store, organize, archive, analyze, or visualize such data.

What is Computational Biology? - The development and application of analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social data.

on molecular

Page 4: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

4

NSF – introductionLarge databases that can be accessed and analyzed with sophisticated tools have become central to biological research and education. The information content in the genomes of organisms, in the molecular dynamics of proteins, and in population dynamics, to name but a few areas, is enormous. Biologists are increasingly finding that the management of complex data sets is becoming a bottleneck for scientific advances. Therefore, bioinformatics is rapidly become a key technology in all fields of biology.

Page 5: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

5

NSF – mission statementThe present bottlenecks in bioinformatics include the education of biologists in the use of advanced computing tools, the recruitment of computer scientists into this evolving field, the limited availability of developed databases of biological information, and the need for more efficient and intelligent search engines for complex databases.

Page 6: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

6

NSF – mission statementThe present bottlenecks in bioinformatics include the education of the education of biologists in the use of advanced computing toolsbiologists in the use of advanced computing tools, the recruitment of computer scientists into this evolving field, the limited availability of developed databases of biological information, and the need for more efficient and intelligent search engines for complex databases.

Page 7: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

7

Molecular Bioinformatics

Molecular Bioinformatics involves the use of computational tools to discover new information in complex data sets (from the one-dimensional information of DNA through the two-dimensional information of RNA and the three-dimensional information of proteins, to the four-dimensional information of evolving living systems).

Page 8: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Bioinformatics (Oxford English Dictionary):

The branch of science concerned with information and information flow in biological systems, esp. the use of computational methods in genetics and genomics.

8

Page 9: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Biologists collect molecular data: DNA & Protein sequences,gene expression, etc.

Computer scientists (+Mathematicians, Statisticians, etc.)Develop tools, softwares, algorithms to store and analyze the data.

BioinformaticiansStudy biological questions by analyzing molecular data

The field of science in which biology, computer science and information technology merge into a single discipline

9

Page 10: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Some biological background….

A biologist

10

Page 11: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

The hereditary information of all living organisms, with the exception of some viruses, is carried by deoxyribonucleic acid (DNA) molecules.

2 purines:2 purines: 2 pyrimidines:2 pyrimidines:

adenine (A)adenine (A) cytosine (C)cytosine (C)guanine (G)guanine (G) thymine (T)thymine (T)

two ringstwo rings one ringone ring11

Page 12: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Eukaryotes may have up to 3 subcellular genomes: 1. Nuclear2. Mitochondrial 3. Plastid

Bacteria have either circular or linear genomes and may also carry plasmids

The entire complement of genetic material carried by an individual is called the genome.

Human chromosomes

Circular genome12

Page 13: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Central dogma: DNA makes RNA makes Protein

Modified dogma: DNA makes DNA and RNA, RNA makes DNA, RNA an Protein

13

Page 14: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Amino acids - The protein building blocks

14

Page 15: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

15

Page 16: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Any region of the DNA sequence can, in principle, code for six different amino acid sequences, because any one of three different reading frames can be used to interpret each of the two strands.

16

Page 17: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Protein folding

A human Hemoglobin:

17

Page 18: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

How does it all looks like on a computer monitor?

18

Page 19: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

A cDNA sequence

>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNAACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC

19

Page 20: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

A cDNA sequence (reading frame)

A protein sequence

>gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA

ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC

GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC

>gi|4504347|ref|NP_000549.1| alpha 1 globin [Homo sapiens]

MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

20

Page 21: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGCGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGCACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCG...

And, a whole genome…

21

Page 22: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

E. coli 4.6 x 106 nucleotides

– Approx. 4,000 genes

Yeast 15 x 106 nucleotides

– Approx. 6,000 genes

Human 3 x 109 nucleotides

– Approx. 30,000 genes

Smallest human chromosome 50 x 106 nucleotides

How big are whole genomes?

22

Page 23: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

What do we actually do with bioinformatics?

23

Page 24: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Sequence assembly

(next generation sequencing)

24

Page 25: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Genome annotation

25

Page 26: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Molecular evolution

Smith et al. (2009) Nature 459, 1122-1125

Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic

26

Page 27: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Analysis of gene expression

Gene expression profile of relapsing versus non-relapsing Wilms tumors. A set of 39 genes discriminates between the two classes of tumors.

(http://www.biozentrum2.uni-wuerzburg.de/, Prof. Gessler)27

Page 28: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Analysis of regulation

Toledo and Bardot (2009) Nature 460, 466-467

28

Page 29: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Protein structure predictionProtein docking

29

Page 30: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Luscombe, Greenbaum, Gerstein (2001)

30

Page 31: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

31

From DNA to Genome

Watson and Crick DNA model

Sanger sequences insulin protein

Sanger dideoxy DNA sequencing

PCR (Polymerase Chain Reaction)

1955

1960

1965

1970

1975

1980

1985

ARPANET (early Internet)

PDB (Protein Data Bank)

Sequence alignment

GenBank database

Dayhoff’s Atlas

Page 32: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

32

1995

1990

2000

SWISS-PROT database

NCBI

World Wide Web

BLAST

FASTA

EBI

Human Genome Initiative

First human genome draft

First bacterial genome

Yeast genome

Page 33: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

33

The first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues.

Origin of bioinformatics and biological databases:

Nearly a decade later, the first nucleic acid sequence was reported, that of yeast tRNAalanine with 77 bases.

Page 34: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

34

In 1965, Dayhoff gathered all the available sequence data to create the first bioinformatic database (Atlas of Protein Sequence and Structure).

The Protein DataBank followed in 1972 with a collection of ten X-ray crystallographic protein structures. The SWISSPROT protein sequence database began in 1987.

Page 35: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

35

Nucleotides

Page 36: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

36

as of August 2011:

Eukaryotes 37

Prokaryotes 1708

Total 1745

Complete Genomes

Page 37: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

37

What can we do with sequences and other type of molecular information?

Page 38: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

38

Annotation

Open reading frames

Functional sites

Structure, function

Page 39: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

39

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG AAT ......

.............. TGAAAAACGTA

Page 40: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

40

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG

AAT .................................

.............. TGAAAAACGTA

TF binding sitepromoter

Ribosome binding Site

ORF = Open Reading FrameCDS = Coding Sequence

Transcription

Start Site

Page 41: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

41

Comparativegenomics

Comparing ORFs

Identifying orthologs

Inferences on structure and function

Comparing functional sites

Inferences on regulatorynetworks

Page 42: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

42

Similarity profiles

Researchers can learned a great deal about the structure and function of human genes by examining their counterparts in model organisms.

Page 43: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

43

Xenopus MALWMQCLP-LVLVLLFSTPNTEALANQHLBos MALWTRLRPLLALLALWPPPPARAFVNQHL **** : * *.*: *:..* :. *:****

Xenopus CGSHLVEALYLVCGDRGFFYYPKIKRDIEQBos CGSHLVEALYLVCGERGFFYTPKARREVEG ***************:***** ** :*::*

Xenopus AQVNGPQDNELDG-MQFQPQEYQKMKRGIVBos PQVG---ALELAGGPGAGGLEGPPQKRGIV .**. ** * * *****

Xenopus EQCCHSTCSLFQLENYCNBos EQCCASVCSLYQLENYCN **** *.***:*******

Alignment preproinsulin

Page 44: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

44

Page 45: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

45

Ultraconserved Elements in the Human Genome Gill Bejerano, Michael Pheasant, Igor Makunin, Stuart Stephen, W. James Kent, John S. Mattick, & David Haussler (Science 2004. 304:1321-1325)There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development.

There are 156 intergenic, untranscribed, ultraconserved segments

Page 46: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

46

Junk is real!

Junk:Junk:Supporting evidenceSupporting evidence

Page 47: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

47

Functionalgenomics

Genome-wide profiling of:• mRNA levels• Protein levels

Co-expression of genesand/or proteins

Identifying protein-protein interactions

Networks of interactions

Page 48: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

48

Understanding the function of genes and other Understanding the function of genes and other parts of the genomeparts of the genome

Page 49: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

49

Structural genomics

Assign structure to all proteins encoded in a genome

Page 50: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Biological databases

50

Page 51: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Database or databank?

Initially

• Databank (in UK)• Database (in the USA)

Solution

• The abbreviation db

51

Page 52: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

What is a Database?

A structured collection of data held in computer storage; esp. one

that incorporates software to make it accessible in a variety of ways;

transf., any large collection of information.

database management: the organization and manipulation of data in

a database.

database management system (DBMS): a software package that

provides all the functions required for database management.

database system: a database together with a database

management system.

Oxford Dictionary52

Page 53: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

What is a database?• A collection of data

– structured – searchable (index) -> table of contents

– updated periodically (release) -> new edition

– cross-referenced (hyperlinks) -> links with other db

• Includes also associated tools (software) necessary for access, updating, information insertion, information deletion….

• Data storage management: flat files, relational databases…

53

Page 54: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Database: a « flat file » example

Accession number: 1

First Name: Amos

Last Name: Bairoch

Course: Pottery 2000; Pottery 2001;

//

Accession number: 2

First Name: Dan

Last name: Graur

Course: Pottery 2000, Pottery 2001; Ballet 2001, Ballet 2002

//

Accession number 3:

First Name: John

Last name: Travolta

Course: Ballet 2001; Ballet 2002;

//

• Easy to manage: all the entries are visible at the same time !

Flat-file database («  flat file, 3 entries  »):

54

Page 55: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Database: a « relational » example

Teacher Accession number

Education

Amos 1 Biochemistry

Dan 2 Genetics

John 3 Scientology

Course Year Involved teachers

Advanced Pottery

2000; 2001 1; 2

Ballet for Fat People

2001; 2002 2; 3

Relational database (« table file »):

55

Page 56: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Why biological databases?• Exponential growth in biological data.

• Data (genomic sequences, 3D structures, 2D gel analysis, MS analysis, Microarrays….) are no longer published in a conventional manner, but directly submitted to databases.

• Essential tools for biological research. The only way to publish massive amounts of data without using all the paper in the world.

56

Page 57: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Distribution of sequences

• Books, articles 1968 -> 1985• Computer tapes 1982 -> 1992• Floppy disks 1984 -> 1990• CD-ROM 1989 ->• FTP 1989 ->• On-line services 1982 -> 1994• WWW 1993 ->• DVD 2001 ->

57

Page 58: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Some statistics• More than 1000 different ‘biological’ databases

• Variable size: <100Kb to >20Gb– DNA: > 20 Gb

– Protein: 1 Gb

– 3D structure: 5 Gb

– Other: smaller

• Update frequency: daily to annually to seldom to forget about it.

• Usually accessible through the web (some free, some not)

58

Page 59: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Some databases in the field of molecular biology…

AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, Biolmage,

BioMagResBank, BIOMDB, BLOCKS, BovGBASE,BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,

CARBHYD, CATH, CAZY, CCDC, CD4OLbase, CGAP,ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,

CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,Picty_cDB, DIP, DOGS, DOMO, DPD, DPlnteract, ECDC,ECGC, EC02DBASE, EcoCyc, EcoGene, EMBL, EMD db,

ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,GCRDB, GDB, GENATLAS, Genbank, GeneCards,

Genline, GenLink, GENOTK, GenProtEC, GIFTS,GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,

HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,HIDB, HIDC, HlVdb, HotMolecBase, HOVERGEN, HPDB,HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,

KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP5

Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us,MPDB, MRR, MutBase, MycDB, NDB, NRSub, 0-lycBase,OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB,

PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD,PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,

PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE,SCOP, SeqAnaiRef, SGD, SGP, SheepMap, Soybase,SPAD, SRNA db, SRPDB, STACK, StyGene,Sub2D,

SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB,

TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE,VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD,

YPM, etc .................. !!!!59

Page 60: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

Categories of databases for Life Sciences

• Sequences (DNA, protein)• Genomics• Mutation/polymorphism• Protein domain/family• Proteomics (2D gel, Mass Spectrometry)• 3D structure• Metabolic networks• Regulatory networks• Bibliography• Expression (Microarrays,…)

• Specialized60

Page 61: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

61

NCBI:http://www.ncbi.nlm.nih.gov

EBI:http://www.ebi.ac.uk/

DDBJ:http://www.ddbj.nig.ac.jp/

Page 62: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

62

Bookshelf: A collection of searchable biomedical books linked to PubMed.

PubMed: Allows searching by author names, journal titles, and a new Preview/Index option. PubMed database provides access to over 12 million MEDLINE citations back to the mid-1960's. It includes History and Clipboard options which may enhance your search session.

PubMed Central: The U.S. National Library of Medicine digital archive of life science journal literature.

OMIM: Online Mendelian Inheritance in Man is a database of human genes and genetic disorders (also OMIA).

Literature Databases:

Page 63: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?
Page 64: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

PubMed (Medline)

• MEDLINE covers the fields of medicine, nursing, dentistry,

veterinary medicine, public health, and preclinical sciences

• Contains citations from approximately 5,200 worldwide journals in

37 languages; 60 languages for older journals.

• Contains over 20 million citations since 1948

• Contains links to biological db and to some journals

• New records are

added to

PreMEDLINE daily!

Page 65: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

65

• Alerting services– http://www.pubcrawler.ie/– http://www.biomail.org

Page 66: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

A search by subject: “mitochondrion evolution”

Page 67: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?
Page 68: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

68

Type in a Query term

• Enter your search words in the

query box and hit the “Go” button

http://www.ncbi.nlm.nih.gov/entrez/query/static/help/helpdoc.html#Searching

Page 69: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

69

The Syntax …1. Boolean operators: AND, OR, NOT must be entered in

UPPERCASE (e.g., promoters OR response elements). The default is AND.

2. Entrez processes all Boolean operators in a left-to-right sequence. The order in which Entrez processes a search statement can be changed by enclosing individual concepts in parentheses. The terms inside the parentheses are processed first. For example, the search statement: g1p3 OR (response AND element AND promoter).

3. Quotation marks: The term inside the quotation marks is read as one phrase (e.g. “public health” is different than public health, which will also include articles on public latrines and their effect on health workers).

4. Asterisk: Extends the search to all terms that start with the letters before the asterisk. For example, dia* will include such terms as diaphragm, dial, and diameter.

Page 70: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

70

Refine the Query• Often a search finds too many (or too few) sequences, so you

can go back and try again with more (or fewer) keywords in your query

• The “History” feature allows you to combine any of your past queries.

• The “Limits” feature allows you to limit a query to specific organisms, sequences submitted during a specific period of time, etc.

• [Many other features are designed to search for literature in MEDLINE]

Page 71: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

71

You can search for a text term in sequence annotations or in MEDLINE abstracts, and find all articles, DNA, and protein sequences that mention that term.

Then from any article or sequence, you can move to "related articles" or "related sequences".

•Relationships between sequences are computed with BLAST

•Relationships between articles are computed with "MESH" terms (shared keywords)

•Relationships between DNA and protein sequences rely on accession numbers

•Relationships between sequences and MEDLINE articles rely on both shared keywords and the mention of accession numbers in the articles.

Related Items

Page 72: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

72

Page 73: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

73

Page 74: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

74

Page 75: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

A search by authors: “Esser” [au] AND “martin” [au]

Page 76: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

A search by title word: “Wolbachia pipientis” [ti]

Page 77: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

77

Database Search Strategies

• General search principles - not limited to sequence (or to biology).

• Start with broad keywords and narrow the search using more specific terms.

• Try variants of spelling, numbers, etc.

• Search many databases.

• Be persistent!!

Page 78: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

78

Searching PubMed

• Structureless searches– Automatic term mapping

• Structured searches– Tags, e.g. [au], [ta], [dp], [ti]– Boolean operators, e.g. AND, OR, NOT, ()

• Additional features– Subsets, limits– Clipboard, history

Page 79: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

79

Start working:

Search PubMed

1. cuban cigars

2. cuban OR cigars

3. “cuban cigars”

4. cuba* cigar*

5. (cuba* cigar*) NOT smok*

6. Fidel Castro

7. “fidel castro”

8. #6 NOT #7

Page 80: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

80

“Details” and “History” in PubMed

Page 81: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

81

“Details” and “History” in PubMed

Page 82: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

82

The OMIM (Online Mendelian Inheritance in Man)

– Genes and genetic disorders– Edited by team at Johns Hopkins– Updated daily

Page 83: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

83

MIM Number Prefixes

* gene with known sequence+ gene with known sequence and

phenotype# phenotype description, molecular

basis known% mendelian phenotype or locus,

molecular basis unknownno prefix other, mainly phenotypes with

suspected mendelian basis

Page 84: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

84

Searching OMIM

• Search Fields– Name of trait, e.g., hypertension– Cytogenetic location, e.g., 1p31.6– Inheritance, e.g., autosomal dominant– Gene, e.g., coagulation factor VIII

Page 85: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

85

OMIM search tags

All Fields [ALL]Allelic Variant [AV] or [VAR]Chromosome [CH] or [CHR]Clinical Synopsis [CS] or [CLIN]Gene Map [GM] or [MAP]Gene Name [GN] or [GENE]Reference [RE] or [REF]

Page 86: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

86

Page 87: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

87

Start working:

Search OMIM

How many types of hemophilia are there?

For how many is the affected gene known?

What are the genes involved in hemophilia A?

What are the mutations in hemophilia A?

Page 88: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

88

Online Literature databases

1. How to use the UH online Library?

2. Online glossaries

3. Google Scholar

4. Google Books

5. Web of Science

Page 89: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

89

How to use the online UH Library? http://info.lib.uh.edu/

Page 90: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

90

Page 91: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

91

Online Glossaries

Bioinformatics :http://www.geocities.com/bioinformaticsweb/glossary.htmlhttp://big.mcw.edu/

Genomics: http://www.geocities.com/bioinformaticsweb/genomicglossary.html

Molecular Evolution: http://workshop.molecularevolution.org/resources/glossary/

Biology dictionary: http://www.biology-online.org/dictionary/satellite_cells

Other glossaries, e.g., the list of phobias:http://www.phobialist.com/class.html

Page 92: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

92

4. Google Scholarhttp://www.scholar.google.com/

Page 93: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

93

Enables you to search specifically for scholarly literature, including peer-reviewed papers, theses, books, preprints, abstracts and technical reports from all broad areas of research.

What is Google Scholar?

Page 94: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

94

Use Google Scholar to find articles from a wide variety of academic publishers, professional societies, preprint repositories and universities, as well as scholarly articles available across the web.

Page 95: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

95

Google Scholar orders your search results by how relevant they are to your query, so the most useful references should appear at the top of the page

This relevance ranking takes into account the: full text of each article. the article's author, the publication in which the article appeared and how often it has been cited in scholarly literature.

Page 96: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

96

What other DATA can we retrieve from the record?

Page 97: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

97

Page 98: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

98

Page 99: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

99

5. Google Book Search

Page 100: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

100

Page 101: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

101

Start working:

Search Google Books

How many times is the tail of the giraffe mentioned in On the Origin of Species by Mr. Darwin?

Page 102: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

102

6. Web of sciencehttp://http://apps.webofknowledge.com.ezproxy.lib.uh.edu/WOS_GeneralSearch_input.do?product=WOS&search_mode=GeneralSearch&SID=4FB7LbbLgDMhG9fDiLh&preferencesSaved=

Page 103: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

103

Page 104: 1 Introduction to Bioinformatics. 2 What is Bioinformatics?

104