Introduction to Bioinformatics
DNA controls cellular functions
DNA contains genes
Example: the insulin gene, base pairs 2,159,779 to 2,161,209 on chromosome 11
A protein is made up of one or more chains of amino acids
MALWMRLLPLLALLALWGPDPAAAFVNQ
HLCGSHLVEALYLVCGERGFFYTPKTRR
EAEDLQVGQVELGGGPGAGSLQPLALEG
SLQKRGIVEQCCTSICSLYQLENYCN
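As a minimal sketch of treating this chain as data, the snippet below joins the four printed lines into one string, then reports its length and residue counts. The names SEQ and composition are illustrative, and the gene span assumes the chromosome coordinates quoted earlier are inclusive at both ends.

```python
from collections import Counter

# Insulin precursor sequence exactly as printed on the slide (four lines joined).
SEQ = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQ"
    "HLCGSHLVEALYLVCGERGFFYTPKTRR"
    "EAEDLQVGQVELGGGPGAGSLQPLALEG"
    "SLQKRGIVEQCCTSICSLYQLENYCN"
)

def composition(seq):
    """Return (residue, count) pairs sorted from most to least frequent."""
    return Counter(seq).most_common()

# Gene span on chromosome 11, assuming both coordinates are inclusive.
GENE_START, GENE_END = 2_159_779, 2_161_209
gene_span = GENE_END - GENE_START + 1

print(len(SEQ))          # 110 residues
print(gene_span)         # 1431 base pairs
print(composition(SEQ))  # one-letter residue codes with their counts
```

The chain is 110 residues long while the gene region covers 1,431 base pairs, so only part of the region codes for the protein.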
A protein chain must fold into a 3D structure to perform its function
Bio + informatics
Bio: Biology, Biochemistry, Medicine (biological data)
Informatics: Applied Mathematics, Computer Science, Statistics (knowledge extraction)
Bioinformatics
Biochemistry
Molecular Biology
Biophysics
Pharmacology
Medicine
Pathology
Computer science
Statistics
Mathematics
Computer Science: Programming, Databases, Algorithms
Biology and Medicine: Basics in molecular and cell biology, Measurement techniques
Mathematics and Statistics: Calculus, Probability, Linear algebra
Bioinformatics: Biological sequence analysis; Biological databases; Analysis of gene expression; Modeling protein structure and function; Gene, protein and metabolic networks; …
Prof. Juho Rousu, 2006
Bioinformatics is the science of information and information flow
in biological systems, especially the use of computational
methods in genetics and genomics.
The National Center for Biotechnology Information
(NCBI) defines bioinformatics as:
"Bioinformatics is the field of science in which
biology, computer science, and information technology
merge into a single discipline.
There are three important sub-disciplines within
bioinformatics:
- the development of new algorithms and statistics to
assess relationships in large data sets;
- the analysis and interpretation of various types of
data including nucleotide and amino acid sequences,
protein domains, and protein structures;
- the development and implementation of tools that
enable efficient access and management of different
types of information."
Method Inform Med 4/2001
The aims of bioinformatics are three-fold:
- to organize biological databases
- to develop tools and resources for data analysis
- to use these tools to analyze the data and interpret the
results in a biologically meaningful manner
PLOS Computational Biology, 2014
Bioinformatics User
Bioinformatics Scientist
Bioinformatics Engineer
General categories in Bioinformatics
- Databases: building DBs, querying
- Text string comparison: pairwise alignment, multiple alignment, text search
- Finding patterns: AI / machine learning, clustering, data mining
- Geometry: matching structure, computational geometry, computer vision
- Physical simulation: Newtonian mechanics, electrostatics, numerical algorithms, simulation
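To make the "pairwise alignment" category concrete, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The slides do not name a specific algorithm or scoring scheme, so the function name and the match, mismatch and gap values are assumptions chosen only for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of strings a and b.

    Uses the Needleman-Wunsch recurrence with a linear gap penalty.
    The scoring values are illustrative assumptions, not from the slides.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 5
```

Only two rows of the score table are kept in memory, which is enough when the score is needed but not the alignment itself.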
What is not bioinformatics?
Biologically-inspired computation, e.g., genetic
algorithms and neural networks.
However, applying neural networks and other
biologically-inspired methods to solve a
biological problem could be called
bioinformatics.
Origin of bioinformatics
The first protein sequence reported was that of bovine
insulin in 1956, consisting of 51 residues.
Nearly a decade later, the first nucleic acid sequence was
reported, that of yeast alanine tRNA with 77 bases.
In 1965, Dr Margaret Dayhoff gathered all
the available sequence data to create the
first bioinformatics database (Atlas of Protein
Sequence and Structure).
The Protein Data Bank (PDB) followed in 1972
with a collection of ten X-ray crystallographic
protein structures.
The SWISSPROT protein sequence database
began in 1987.
The GenBank sequence
database was created in
1982.
It is a collection of all
publicly available nucleotide
sequences and their protein
translations.
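A "protein translation" of a nucleotide sequence can be sketched as follows, assuming the standard genetic code and reading frame 1. The function name translate and the short DNA fragment are hypothetical and serve only as an illustration. They are not taken from a GenBank record.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical fragment: five codons followed by a stop codon.
print(translate("ATGGCCCTGTGGATGTAA"))  # MALWM
```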
From DNA to Genome
Timeline (axis 1955 to 1985): Watson and Crick DNA model; insulin protein sequence; Sanger DNA sequencing; PCR (Polymerase Chain Reaction); ARPANET (early Internet); PDB (Protein Data Bank); sequence alignment; GenBank database; Dayhoff's Atlas
Timeline (axis 1990 to 2000): SWISS-PROT database; NCBI; World Wide Web; BLAST; FASTA; EBI; Human Genome Initiative; first bacterial genome; yeast genome; first human genome draft
Biological Databases
The European Molecular Biology Laboratory (EMBL)
Nucleotide sequences, protein sequences, and structural information for proteins and
ligands.
http://www.ebi.ac.uk/
ExPASy is the SIB Bioinformatics Resource Portal which provides access to
scientific databases and software tools
http://www.expasy.org/
PubMed is a free resource that provides access to the MEDLINE
database of citations, abstracts and some full-text articles
on life sciences and biomedical topics.
http://www.ncbi.nlm.nih.gov/pubmed/
Bioinformatics Journals
Title: Bioinformatics
Discipline: Computational biology
Publisher: Oxford Journals (UK)
Frequency: 24 issues/year
Impact factor: 5.766 (2015)
Title: BMC Bioinformatics
Discipline: Bioinformatics
Publisher: Biomed Central
Frequency: Online publishing
Impact factor: 2.435 (2015)
Title: PLoS Computational Biology
Discipline: Computational biology
Publisher: Public Library of Science (USA)
Frequency: 12 issues/year
Impact factor: 3.020 (2015)
Calculation of Impact Factor?
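The slide leaves the calculation as an open question. As a minimal sketch, the standard two-year impact factor for year Y is the number of citations received in Y by items the journal published in Y-1 and Y-2, divided by the number of citable items it published in those two years. The function name and the numbers below are hypothetical and only illustrate the arithmetic.

```python
def impact_factor(citations, citable_items):
    """Two-year impact factor for year Y.

    citations: citations received in year Y by items published in Y-1 and Y-2.
    citable_items: number of citable items published in Y-1 and Y-2.
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    return citations / citable_items

# Hypothetical journal: 400 citable items in the two previous years,
# cited 1200 times in the current year.
print(round(impact_factor(1200, 400), 3))  # 3.0
```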
Title: Nucleic Acids Research
Discipline: Nucleic acids
Publisher: Oxford University Press (UK)
Frequency: Biweekly
Impact factor: 9.112 (2014)
Title: Amino Acids
Discipline: Amino acids and protein science
Publisher: Springer Verlag
Frequency: Monthly
Impact factor: 3.196 (2015)
Title: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Discipline: Bioinformatics
Publisher: The Association for Computing Machinery
Frequency: Bimonthly
Impact factor: 1.609 (2015)
Title: PLoS ONE
Discipline: Computational biology (Multidisciplinary)
Publisher: Public Library of Science (USA)
Frequency: Articles published upon acceptance
Impact factor: 3.234 (2014)
Title: Journal of Molecular Graphics and Modelling
Discipline: Computational chemistry & QSAR
Publisher: Elsevier
Impact factor: 1.674 (2015)
Title: Bioinformation
Discipline: Mathematical and computational analysis of biological data
Publisher: Biomedical Informatics
Frequency: Biweekly
Impact factor: 0.80 (2015)
Course information
Course aims: This course is intended for anyone who wants to learn more about the
fundamental concepts, research topics and applications of
bioinformatics.
One major goal is to introduce bioinformatics data. In addition, you will learn
several analysis methods, such as sequence alignment, and related research fields.
A third major group of learning goals relates to microarray technology and
related data-mining methods.
Skills needed:
• A basic understanding of biochemistry and molecular biology
• Moderate experience in programming (e.g. MATLAB)
Final score:
• Mid-term exam (20%)
• Final exam (40%)
• Homework (10%)
• Final project (15%): implement a method or data analysis technique in MATLAB
• Oral presentation (15%): select a topic and find a research paper related to it
Literature:
Bioinformatics: Sequence and Genome Analysis, Second Edition
By David Mount
ISBN 978-087969712-9
Microarray Bioinformatics
By Dov Stekel
Cambridge University Press
ISBN 978-0-51-161553-5
Essential MATLAB for Engineers and Scientists
By Brian H. Hahn & Daniel T. Valentine
Academic Press
ISBN 978-0-12-374883-6