Introduction to Bioinformatics

Fall 2008

Administration

• Adi Doron: doronadi@post.tau.ac.il
• Nimrod Rubinstein: rubi@post.tau.ac.il
• Dudu Burstein: davidbur@post.tau.ac.il

Reception hours: by appointment
Britania 405, 6409245

Course Website

http://bioinfo.tau.ac.il/~intro_bioinfo/

Exercises

Each student participates once every two weeks, in one of the following sessions:
• Sunday 16:00-18:00
• Monday 12:00-14:00
• Monday 14:00-16:00

Computer classroom Sherman 03

Requirements

• Exam: 80% of final grade
• Assignments (compulsory): 20% of final grade

Assignments include classwork and homework:

• Classwork is planned to be completed during the exercise session. It should be emailed to the TA. It will be checked but not graded.

• Homework should be handed in at the following exercise session (two weeks after the hand-out date). It will be checked and graded.

Goals

• To familiarize students with research topics in bioinformatics and with bioinformatic tools
• The emphasis will be on tools and their use

Prerequisites

• Familiarity with topics in molecular biology (cell biology and genetics)
• Basic familiarity with computers and the internet

BIOINFORMATIC DATABASES

What's in a database?

• Sequences: genes, proteins, etc.
• Full genomes
• Annotation (information about the gene/protein):
- function
- cellular location
- chromosomal location
- introns/exons
- protein structure
- phenotypes, diseases
• Publications

NCBI and Entrez

• One of the largest and most comprehensive databases, belonging to the NIH (National Institutes of Health, USA)
• Entrez is the search engine of NCBI
• Search for: genes, proteins, genomes, structures, diseases, publications and more

http://www.ncbi.nlm.nih.gov/

Search for published papers

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. "Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells". J Virol. 2006 May;80(9):4388-95.

Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J Virol[TA]

For the full list of field tags, go to Help -> Search Field Descriptions and Tags
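
The fielded query above can also be assembled programmatically. The following is a minimal sketch, not part of the original slides: it joins field-tagged terms with AND and encodes the result into a URL for NCBI's E-utilities ESearch service. The endpoint constant and the helper names `pubmed_query` and `esearch_url` are assumptions introduced for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint (not mentioned on the slide).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(author, title_word, year, journal):
    """Join field-tagged terms with AND, mirroring the slide's example."""
    parts = [
        f"{author}[AU]",      # author
        f"{title_word}[TI]",  # word in the title
        f"{year}[DP]",        # date of publication
        f"{journal}[TA]",     # journal title abbreviation
    ]
    return " AND ".join(parts)


def esearch_url(query, db="pubmed"):
    """Build (but do not fetch) a search URL for the given Entrez database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


query = pubmed_query("Yang", "glycoprotein", 2006, "J Virol")
url = esearch_url(query)
print(query)
print(url)
```

Pasting the printed query into the PubMed search box should behave like the hand-written version; the URL form is only useful if the search is to be scripted.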

Exercise

Retrieve all publications in which the first author is Pe'er I and the last author is Shamir R.

Using Limits

Retrieve the publications of Friedman N in the journals Bioinformatics and Journal of Computational Biology from the last 5 years.

Google Scholar

http://scholar.google.com/

NCBI gene & protein databases: GenBank

• GenBank is an annotated collection of all publicly available DNA sequences
• Holds 65 billion bases (Oct. 2007)
• GenPept is a database of translated coding sequences from GenBank

Searching for human CD4 using Entrez

Search demonstration

Using Field Descriptions, Qualifiers, and Boolean Operators

CD4[GENE] AND human[ORGN]
or
CD4[gene name] AND human[organism]

List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers

Boolean operators: AND, OR, NOT

Note: do not use the Protein Name field [PROT], only [GENE]!
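
The way field qualifiers and Boolean operators combine can be sketched with a few small helpers. This is an illustration under stated assumptions: the function names `tag`, `all_of`, `any_of` and `without` are hypothetical, and the mouse term in the second query is an invented example that does not appear on the slides. Each combinator wraps its result in parentheses so that nested expressions group explicitly rather than relying on the order in which the search engine evaluates operators.

```python
def tag(term, field):
    """Attach a field qualifier, e.g. tag("CD4", "GENE") -> "CD4[GENE]"."""
    return f"{term}[{field}]"


def all_of(*clauses):
    """Every clause must match (Boolean AND)."""
    return "(" + " AND ".join(clauses) + ")"


def any_of(*clauses):
    """At least one clause must match (Boolean OR)."""
    return "(" + " OR ".join(clauses) + ")"


def without(clause, excluded):
    """Match the first clause but not the second (Boolean NOT)."""
    return f"({clause} NOT {excluded})"


# The query from the slide.
human_cd4 = all_of(tag("CD4", "GENE"), tag("human", "ORGN"))

# Hypothetical extension: CD4 in either of two organisms.
two_species = all_of(
    tag("CD4", "GENE"),
    any_of(tag("human", "ORGN"), tag("mouse", "ORGN")),
)

print(human_cd4)
print(two_species)
```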

RefSeq

RefSeq: a sub-collection of NCBI databases containing only non-redundant, highly annotated entries (genomic DNA, transcripts (RNA), and protein products)

An explanation of GenBank records

Accession Numbers

• GenBank / EMBL: two letters followed by six digits, e.g. AY123456; or one letter followed by five digits, e.g. U12345

• GenPept (amino-acid translations of GenBank): three letters followed by five digits, e.g. AAA12345

• RefSeq: RefSeq accession numbers can be distinguished from GenBank accessions by their distinct prefix format of two characters plus an underscore, e.g. NP_015325. NM_: nucleotide, NP_: protein

• SWISS-PROT (another protein database): all are six characters, in the format
  position 1: [O,P,Q]; 2: [0-9]; 3: [A-Z,0-9]; 4: [A-Z,0-9]; 5: [A-Z,0-9]; 6: [0-9]
  e.g. P12345 and Q9JJS7

• PDB (Protein Data Bank, structure database): one digit followed by three alphanumeric characters, e.g. 1hxw
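
The formats listed above can be turned into a rough classifier. The sketch below encodes only the rules from this slide as regular expressions; it is not an official validator, and real databases have since added further formats. Two assumptions are made: the number of digits after a RefSeq prefix is left open, and the PDB pattern allows digits after the first character, since IDs such as 1G9M contain them. Because some formats overlap (P12345 fits both the one-letter GenBank pattern and the Swiss-Prot pattern), the hypothetical function `candidate_databases` returns every match instead of a single answer.

```python
import re

# Patterns transcribed from the slide's rules; see the assumptions above.
PATTERNS = {
    "GenBank/EMBL": re.compile(r"[A-Z]{2}\d{6}|[A-Z]\d{5}"),
    "GenPept": re.compile(r"[A-Z]{3}\d{5}"),
    "RefSeq": re.compile(r"[A-Z]{2}_\d+"),
    "Swiss-Prot": re.compile(r"[OPQ]\d[A-Z0-9]{3}\d"),
    "PDB": re.compile(r"\d[A-Z0-9]{3}"),
}


def candidate_databases(accession):
    """Return the names of all databases whose format the accession fits."""
    acc = accession.strip().upper()  # PDB IDs are often written in lowercase
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(acc)]


for example in ["AY123456", "U12345", "AAA12345", "NP_015325", "Q9JJS7", "1hxw", "P12345"]:
    print(example, candidate_databases(example))
```

An ambiguous result such as the one for P12345 is a reminder that the format alone does not always identify the source database.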

Swiss-Prot

A protein sequence database which strives to provide a high level of annotation:
* the function of a protein
* domain structure
* post-translational modifications
* variants

One entry for each protein

GenBank vs. Swiss-Prot

GenBank results Swiss-Prot results

Downloading & FASTA format

FASTA format:

>sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 precursor
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI

Save accession numbers for future use (makes searching quicker):
• RefSeq: NP_000607
• Swiss-Prot: P01730
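
A FASTA record is a header line starting with ">" followed by one or more lines of sequence. The sketch below shows one way to read such text and to pull the accession out of a Swiss-Prot style header. The function names `parse_fasta` and `swissprot_accession` are hypothetical, and the example holds only the first 20 residues of the CD4 sequence, split over two lines to show that sequence lines are joined.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def swissprot_accession(header):
    """Extract ACCESSION from a header of the form 'sp|ACCESSION|NAME ...'."""
    parts = header.split()
    if not parts:
        return None
    fields = parts[0].split("|")
    if len(fields) >= 3 and fields[0] == "sp":
        return fields[1]
    return None


# Truncated example: only the first 20 residues of the record.
example = """>sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 precursor
MNRGVPFRHL
LLVLQLALLP
"""

records = parse_fasta(example)
header, sequence = records[0]
print(swissprot_accession(header), len(sequence))
```

For real work an established library is usually the safer choice; the point here is only to show how little structure the format has.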

PDB: Protein Data Bank

• Main database of 3D structures
• Includes ~47,000 entries (proteins, nucleic acids, others)
• Proteins organized in groups, families, etc.
• Is highly redundant
• http://www.rcsb.org

CD4 in complex with gp120

[Figure: structure with gp120 and CD4 labeled; PDB ID 1G9M]

Organism specific

Model organisms have independent databases.

HIV database: http://hiv-web.lanl.gov/content/index

GeneCards

• All-in-one database of human genes (a project by the Weizmann Institute)
• Attempts to integrate as many databases, publications and as much of the available knowledge as possible

http://www.genecards.org

Summary

• General and comprehensive databases: NCBI, EMBL, DDBJ
• Genome-specific databases: ENSEMBL, UCSC Genome Browser
• Highly annotated databases:
- Human genes: GeneCards
- Proteins: Swiss-Prot, RefSeq
- Structures: PDB

The MOST important of all

1. Google (or any search engine)

And always remember:

2. RT(F)M: Read the manual!!

Help!

• Read the Help section
• Read the FAQ section
• Google the question!