Introduction to Bioinformatics Part 1 Mahdi Vasighi


Post on 21-May-2020


Bioinformatics

Biology is an information science

Cell theory, Evolution, Genetics, Molecular Biology

DNA: The Language of God

DNA stores information and can be copied

The four bases: Adenine (A), Thymine (T), Cytosine (C), Guanine (G)

DNA sequence matters

DNA controls cellular functions and contains genes
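The point that DNA can be copied rests on base pairing: A pairs with T, and C pairs with G, so one strand determines the other. The sketch below illustrates this in Python; the function names and the example strand are hypothetical and chosen only for illustration.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "ATGGCC"  # hypothetical example strand
    print(complement(example))
    print(reverse_complement(example))
```

Taking the reverse complement twice returns the original strand, which is one way to see that either strand carries the full information.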

Example: the insulin gene spans base pairs 2,159,779 to 2,161,209 on chromosome 11

A DNA “byte” is a three-letter codon

Genes can be translated into proteins

The protein language has an alphabet of 20 amino acids

A protein is made up of one or more chains of amino acids

MALWMRLLPLLALLALWGPDPAAAFVNQ

HLCGSHLVEALYLVCGERGFFYTPKTRR

EAEDLQVGQVELGGGPGAGSLQPLALEG

SLQKRGIVEQCCTSICSLYQLENYCN
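As a rough illustration of how three-letter codons map to the 20-letter protein alphabet, the following Python sketch builds the standard genetic code table and translates a short DNA string. The input string is a hypothetical example, not the annotated insulin coding sequence, and `translate` is an illustrative helper rather than a library function.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order T, C, A, G
# at each of the three positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))                  # number of possible codons
    print(translate("ATGGCCCTGTGGATGCGC"))   # hypothetical input
```

There are 4 x 4 x 4 = 64 possible codons but only 20 amino acids, so several codons encode the same amino acid.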

A protein chain must fold into a 3D structure to carry out its function


Bio + informatics

Bio (biological data): Biology, Biochemistry, Medicine

Informatics (knowledge extraction): Applied Mathematics, Computer Science, Statistics

Bioinformatics

Biochemistry

Molecular Biology

Biophysics

Pharmacology

Medicine

Pathology

Computer science

Statistics

Mathematics

Computer Science: Programming, Databases, Algorithms

Biology and Medicine: Basics in molecular and cell biology, Measurement techniques

Mathematics and Statistics: Calculus, Probability, Linear algebra

Bioinformatics: Biological sequence analysis; Biological databases; Analysis of gene expression; Modeling protein structure and function; Gene, protein and metabolic networks

Prof. Juho Rousu, 2006

Bioinformatics is the science of information and information flow in biological systems, especially the use of computational methods in genetics and genomics.

The National Center for Biotechnology Information (NCBI) defines bioinformatics as:

"Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.

There are three important sub-disciplines within bioinformatics:

- the development of new algorithms and statistics to assess relationships in large data sets;

- the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures;

- the development and implementation of tools that enable efficient access and management of different types of information."

Methods Inf Med 4/2001

The aims of bioinformatics are three-fold:

- to organize biological databases;

- to develop tools and resources for data analysis;

- to use these tools to analyze the data and interpret the results in a biologically meaningful manner.

PLOS Computational Biology, 2014

Bioinformatics User

Bioinformatics Scientist

Bioinformatics Engineer

General categories in Bioinformatics

Databases
- Building DBs
- Querying

Text String Comparison
- Pairwise Alignment
- Multiple Alignment
- Text Search

Finding Patterns
- AI / Machine Learning
- Clustering
- Data mining

Geometry
- Matching structure
- Computational Geometry
- Computer Vision

Physical Simulation
- Newtonian Mechanics
- Electrostatics
- Numerical Algorithms
- Simulation
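To make the "Pairwise Alignment" category above more concrete, here is a minimal Python sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch approach). The slides do not name a specific algorithm, so this choice is an assumption, as are the scoring values (match +1, mismatch -1, gap -1) and the function name.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch scoring)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    # Row 0: aligning the empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept because the sketch returns the score alone; recovering the alignment itself would require storing the full table and tracing back through it.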

What is not bioinformatics?

Biologically-inspired computation, e.g., genetic algorithms and neural networks, is not bioinformatics in itself.

However, applying neural networks and other biologically-inspired methods to solve a biological problem could be called bioinformatics.

Origin of bioinformatics

The first protein sequence reported was that of bovine insulin in 1956, consisting of 51 residues.

Nearly a decade later, the first nucleic acid sequence was reported, that of yeast alanine tRNA with 77 bases.

In 1965, Dr Margaret Dayhoff gathered all the available sequence data to create the first bioinformatics database (Atlas of Protein Sequence and Structure).

The Protein Data Bank (PDB) followed in 1972 with a collection of ten X-ray crystallographic protein structures.

The SWISSPROT protein sequence database began in 1987.

The GenBank sequence database was created in 1982. It includes all publicly available nucleotide sequences and their protein translations.

Nucl. Acids Res. (2015)

From DNA to Genome

[Timeline figure, 1955 to 1985: Watson and Crick DNA model; insulin protein; Sanger DNA sequencing; PCR (Polymerase Chain Reaction); ARPANET (early Internet); PDB (Protein Data Bank); sequence alignment; GenBank database; Dayhoff’s Atlas]

[Timeline figure, 1990 to 2000: SWISS-PROT database; NCBI; World Wide Web; BLAST; FASTA; EBI; Human Genome Initiative; first human genome draft; first bacterial genome; yeast genome]

Biological Databases

The European Molecular Biology Laboratory (EMBL): nucleotide sequences, protein sequences, and structural information for proteins and ligands.
http://www.ebi.ac.uk/

ExPASy is the SIB Bioinformatics Resource Portal, which provides access to scientific databases and software tools.
http://www.expasy.org/

REACTOME is an open access pathway database.
http://www.reactome.org

PubMed is a free database accessing the MEDLINE database of citations, abstracts and some full text articles on life sciences and biomedical topics.
http://www.ncbi.nlm.nih.gov/pubmed/

Bioinformatics Journals

Title: Bioinformatics
Discipline: Computational biology
Publisher: Oxford Journals (UK)
Frequency: 24 issues/year
Impact factor: 5.766 (2015)

Title: BMC Bioinformatics
Discipline: Bioinformatics
Publisher: Biomed Central
Frequency: Online publishing
Impact factor: 2.435 (2015)

Title: PLoS Computational Biology
Discipline: Computational biology
Publisher: Public Library of Science (USA)
Frequency: 12 issues/year
Impact factor: 3.020 (2015)

Calculation of Impact Factor?
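As a hedged answer to the question above: the commonly used two-year impact factor for year Y is the number of citations received in year Y by items the journal published in years Y-1 and Y-2, divided by the number of citable items it published in those two years. The Python sketch below encodes that ratio; the function name and the example numbers are hypothetical and do not describe any journal listed here.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Two-year impact factor for year Y.

    citations:     citations received in year Y by items published
                   in years Y-1 and Y-2
    citable_items: number of citable items published in Y-1 and Y-2
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items


if __name__ == "__main__":
    # Hypothetical journal: 400 citable items published in 2013-2014,
    # cited 1200 times during 2015.
    print(impact_factor(1200, 400))
```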

Title: Nucleic Acids Research
Discipline: Nucleic acids
Publisher: Oxford University Press (UK)
Frequency: Biweekly
Impact factor: 9.112 (2014)

Title: Amino Acids
Discipline: Amino acids and protein science
Publisher: Springer Verlag
Frequency: Monthly
Impact factor: 3.196 (2015)

Title: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Discipline: Bioinformatics
Publisher: The Association for Computing Machinery
Frequency: Bimonthly
Impact factor: 1.609 (2015)

Title: PLoS ONE
Discipline: Computational biology (Multidisciplinary)
Publisher: Public Library of Science (USA)
Frequency: Articles published upon acceptance
Impact factor: 3.234 (2014)

Title: Journal of Molecular Graphics and Modelling
Discipline: Computational chemistry & QSAR
Publisher: Elsevier
Impact factor: 1.674 (2015)

Title: Bioinformation
Discipline: Mathematical and computational analysis of biological data
Publisher: Biomedical Informatics
Frequency: Biweekly
Impact factor: 0.80 (2015)

Bioinformatics Journals

http://www.conferencealerts.com/

http://www.biostec.org/

Published: June 29, 2007

DOI: 10.1371/journal.pcbi.0030116

Course information

Course aims:

This course is intended for anyone interested in learning more about the fundamental concepts, research topics and applications of bioinformatics.

One major goal is to introduce bioinformatics data. In addition, you will learn several analysis methods, such as sequence alignment, and related research fields.

A third major group of learning goals relates to microarray technology and related data-mining methods.

Skills needed:

• A basic understanding of biochemistry and molecular biology

• Moderate experience in programming (e.g. MATLAB)

Final score:

• Mid-term exam (20%)

• Final exam (40%)

• Homework (10%)

• Final project (15%): implement a method or data analysis technique in MATLAB

• Oral presentation (15%): select a topic and find a research paper related to your topic

Literature:

Bioinformatics: Sequence and Genome Analysis, Second Edition
By David Mount
ISBN 978-087969712-9

Microarray Bioinformatics
By Dov Stekel
Cambridge University Press
ISBN 978-0-51-161553-5

Essential MATLAB for Engineers and Scientists
By Brian H. Hahn & Daniel T. Valentine
Academic Press
ISBN 978-0-12-374883-6