Application of Computer Science in Bioinformatics
TRANSCRIPT
-
8/4/2019 Application of Computer Science in Bioinformatics
CIS427: Introduction to Bioinformatics
The Application of Computer Science in Bioinformatics
Dr V. C. Osamor
Department of Computer & Information Science (Bioinformatics unit)
Covenant University
Dr V.C. Osamor CIS427
-
Objectives
To expose students to areas of computing techniques applicable in Bioinformatics.
To expose students to how prediction is important in solving biological problems.
To expose students to how machine learning is applicable in Bioinformatics.
To expose students to how Expert Systems are applicable in Bioinformatics.
-
The Prediction Problem
Make any sentence from the English language.
We know that letters do not occur in English at random (e.g. e is more common than z).
Predicting symbols is fundamental to a wide range of important applications (e.g. encryption, compression).
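The observation that letters are not uniformly distributed can be turned into a very simple predictor. The sketch below is only an illustration, not part of the lecture: the function names are hypothetical, and "predict the most frequent letter seen so far" is the crudest possible model.

```python
from collections import Counter


def letter_frequencies(text):
    """Relative frequency of each letter a-z, ignoring case and non-letters."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    counts = Counter(letters)
    total = len(letters)
    return {letter: count / total for letter, count in counts.items()}


def most_likely_letter(text):
    """Predict the next symbol as the most frequent letter seen so far."""
    letters = (ch for ch in text.lower() if "a" <= ch <= "z")
    return Counter(letters).most_common(1)[0][0]


if __name__ == "__main__":
    sample = "We know that letters do not occur in English at random"
    print(letter_frequencies(sample))
    print(most_likely_letter(sample))
```

Better predictors condition on the preceding symbols rather than on overall counts, but the idea of learning frequencies from data stays the same.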
-
Prediction in Bioinformatics
This will largely require developing your own computational tools or using existing tools in areas such as:
Predicting the location of genes in DNA
Predicting gene roles in an organism
Predicting errors in a genetic transcription
Predicting the function of proteins
Predicting diseases from molecular samples
Anything that involves making a judgment: a yes/no decision about whether some sample datum does or does not have some property.
-
Representation
DATA - DATA
0101011101100101011001010111010000101101
To the computer, everything is binary!
-
0101011101100101011001010111010000101101
0101101100100111111011010011010000101101
A A C G T C A T T C G A T G A T T C G A
Just as we can teach a computer to predict things about a sequence of letters in English prose, we can also teach it to predict things about other sequences, like a genetic sequence.
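To make the "everything is binary" point concrete, here is a minimal sketch that stores a DNA string as bits, two bits per base. The mapping (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; the bit strings shown on the slide are not claimed to use this encoding.

```python
# Hypothetical 2-bit code for the four DNA bases (an illustrative choice).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_dna(sequence):
    """Return the sequence as a string of 0s and 1s, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def decode_dna(bits):
    """Invert encode_dna by reading the bit string two bits at a time."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    bits = encode_dna("AACGTCATTCGATGATTCGA")
    print(bits)
    print(decode_dna(bits))
```

With four possible bases, two bits per base is the smallest fixed-width code, so a 20-base sequence fits in 40 bits.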
-
A genetic prediction problem
ttgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgcta
atggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcata
cgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagc
-
A genetic prediction problem
ttgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagctgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacag
cgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcat
acgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgttgcgcacccacaccagttatatagagacgaactc
-
A genetic prediction problem
A gene encodes a protein
It is a blueprint that provides biochemical instructions on how to construct a sequence of amino acids so as to make a working protein that will perform some function in the organism.
-
A genetic prediction problem
[Figure: a gene divided into an untranslated region and an encoding region, with a transcription factor and RNA labelled]
-
A genetic prediction problem
untranslated region
ttgcaatcggcgctacgcttcaaaatttattatattcccggc
-
A genetic prediction problem
ttgcaatcggcgctacgcttcaaaatttattatattcccggc
What transcription factors bind to this gene?
Where is the transcription factor binding site?
-
A genetic prediction problem
ttgcaatcggcgctacgcttcaaaatttattatattcccggc
Clues: A binding site is often a short general pattern
E.g. CCGATNATCGG
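A pattern such as CCGATNATCGG can be searched for directly, treating N as "any base". The following is a minimal sketch under that assumption; the function names and the demo sequence are made up for illustration and are not data from the lecture.

```python
def matches_at(sequence, pattern, start):
    """True if pattern matches sequence at start; 'N' in the pattern matches any base."""
    window = sequence[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(p == "N" or p == s for p, s in zip(pattern, window))


def find_sites(sequence, pattern):
    """Return every 0-based start position where the pattern occurs."""
    sequence = sequence.upper()
    pattern = pattern.upper()
    last_start = len(sequence) - len(pattern)
    return [i for i in range(last_start + 1) if matches_at(sequence, pattern, i)]


if __name__ == "__main__":
    # Made-up sequence with one planted site.
    demo = "ttacCCGATGATCGGtta"
    print(find_sites(demo, "CCGATNATCGG"))
```

This brute-force scan takes time proportional to sequence length times pattern length, which is adequate for short patterns; indexed data structures become worthwhile at genome scale.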
-
A genetic prediction problem
ttgcaatcggcgctacgcttcaaaatttattatattcccggc
Clues: The patterns are often reverse complements
E.g. CCGATNATCGG
     GGCTANTAGCC
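The relationship between CCGATNATCGG and GGCTANTAGCC can be checked mechanically: the second string is the base-by-base complement of the first, and reading that complement backwards gives the first string again. A minimal sketch, with hypothetical function names and N treated as its own complement:

```python
# Watson-Crick pairing; 'N' (any base) is treated as its own complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def complement(sequence):
    """Complement each base without changing the order."""
    return "".join(COMPLEMENT[base] for base in sequence.upper())


def reverse_complement(sequence):
    """Sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return complement(sequence)[::-1]


def is_reverse_palindrome(sequence):
    """True if the sequence equals its own reverse complement."""
    return sequence.upper() == reverse_complement(sequence)


if __name__ == "__main__":
    print(complement("CCGATNATCGG"))
    print(reverse_complement("CCGATNATCGG"))
    print(is_reverse_palindrome("CCGATNATCGG"))
```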
-
A genetic prediction problem
ttgcaatcggcgctacgcttcaaaatttattatattcccggc
Clues: Where there is one binding site, often there is
another nearby.
-
A genetic prediction problem: ALGORITHMS and DATA STRUCTURES
All of these properties are the kinds of things that computer science has developed algorithms and data structures to identify quickly and efficiently, and therefore it is exactly the kind of problem computer scientists should be able to solve.
-
Proteomics
Three consecutive nucleotides in the coding region form a codon, i.e. encode an amino acid.
A string of amino acids makes a protein.
3 nucleotides, 4 possibilities each: A, C, T, G
4^3 = 64 possible codons
But there are only 20 amino acids!
-
Proteomics
Glycine: GGA, GGC, GGG, GGT
Tyrosine: TAT, TAC
Methionine: ATG
There is quite a bit of redundancy in codons.
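The redundancy can be seen in a few lines of code. The sketch below enumerates all 64 codons and translates a sequence with a deliberately partial table containing only the three amino acids named on the slide; the function names are hypothetical, and codons outside the table are reported as "?" rather than guessed.

```python
from itertools import product

# Partial codon table: only the three amino acids listed on the slide.
CODON_TABLE = {
    "GGA": "Glycine", "GGC": "Glycine", "GGG": "Glycine", "GGT": "Glycine",
    "TAT": "Tyrosine", "TAC": "Tyrosine",
    "ATG": "Methionine",
}


def all_codons():
    """Enumerate every 3-letter word over the alphabet A, C, G, T."""
    return ["".join(triple) for triple in product("ACGT", repeat=3)]


def translate(coding_sequence):
    """Translate codon by codon; codons outside the partial table become '?'."""
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


if __name__ == "__main__":
    print(len(all_codons()))
    print(translate("ATGGGATAT"))
    print(translate("ATGGGCTAC"))
```

The two example sequences differ at the DNA level but translate to the same three amino acids, which is exactly the redundancy the slide describes.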
-
Amino Acid
[Figure: structure of an amino acid, with the amino group, carboxyl group, and R group labelled]
-
Amino Acid
[Figure: structures of glycine and tyrosine]
-
Artificial Intelligence
Computers do things only human brains can otherwise do
expert system
expert
-
Artificial Intelligence & Machine Learning
Computers do things only human brains can otherwise do
learning system
expert system
-
Machine learning
What is machine learning?
creating computer programs that get better with experience
learn how to make expert judgments
discover previously hidden, potentially useful information (data mining)
How does it work?
user provides learning system with examples of concept to be learned
induction algorithm infers a characteristic model of the examples
model is used to predict whether or not future novel instances are also examples, and it does this very consistently, and very, very quickly!
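The three steps under "How does it work?" can be sketched with a toy learner for binding sites. This is an assumption-laden illustration, not the method used in the course: the model is a per-position base-frequency table with a pseudocount, the background is assumed uniform (0.25 per base), all examples and candidates are assumed to have the same length and contain only A, C, G, T, and the names train, score and predict are hypothetical.

```python
import math


def train(examples, pseudocount=1.0):
    """Step 2: infer a per-position base-frequency model from example sites."""
    length = len(examples[0])
    model = []
    for position in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for example in examples:
            counts[example[position].upper()] += 1
        total = sum(counts.values())
        model.append({base: counts[base] / total for base in "ACGT"})
    return model


def score(model, candidate):
    """Log-odds of the candidate under the model versus a uniform background."""
    return sum(math.log(column[base] / 0.25)
               for column, base in zip(model, candidate.upper()))


def predict(model, candidate, threshold=0.0):
    """Step 3: yes/no judgment on whether a novel instance is also an example."""
    return score(model, candidate) > threshold


if __name__ == "__main__":
    # Step 1: the user supplies examples (made up here for illustration).
    examples = ["CCGATAATCGG", "CCGATCATCGG", "CCGATGATCGG", "CCGATTATCGG"]
    model = train(examples)
    print(predict(model, "CCGATGATCGG"))
    print(predict(model, "TTTTTTTTTTT"))
```

The threshold of 0.0 is an arbitrary choice; in practice it would be tuned on held-out examples.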
-
Statistical and Mathematical Techniques
Statistical and mathematical techniques range from exact, heuristic, fixed-parameter and approximation algorithms for problems based on parsimony models to Markov chain Monte Carlo algorithms for Bayesian analysis of problems based on probabilistic models.
-
OTHER BIOINFORMATICS APPLICATIONS
-
Bioinformatics Applications
Bioinformatics was applied in the creation and maintenance of a database to store biological information at the beginning of the "genomic revolution", such as nucleotide and amino acid sequences.
Development of this type of database involved not only design issues but also the development of complex interfaces whereby researchers could both access existing data and submit new or revised data.
-
Biotechnology
Biologists know proteins, computer scientists know machine learning
Together, they can find out a lot of hidden information about genes and proteins
Biotechnology is a multi-billion dollar industry
Biotechnology is one of the best funded areas of scientific research
-
Modelling a Biological System
There are two fundamental ways of modelling a biological system (e.g. a living cell), both coming under bioinformatic approaches.
Static
Sequences - Proteins, Nucleic acids and Peptides
Structures - Proteins, Nucleic acids, Ligands (including metabolites and drugs) and Peptides
Interaction data among the above entities, including microarray data and networks of proteins and metabolites
-
Modelling a Biological System
Dynamic
Systems Biology comes under this category, including reaction fluxes and variable concentrations of metabolites
Multi-Agent Based modelling approaches capturing cellular events such as signalling, transcription and reaction dynamics
-
Major Research Efforts
Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution.
-
Major research areas
1 Sequence analysis
2 Genome annotation
3 Computational evolutionary biology
4 Analysis of gene expression
5 Analysis of regulation
6 Analysis of protein expression
7 Analysis of mutations in cancer
8 Comparative genomics
9 Modeling biological systems
10 High-throughput image analysis
11 Structural Bioinformatic Approaches - Prediction of protein structure, Molecular Interaction, Docking algorithms
-
Sequence Analysis
Sequence information is analyzed to determine genes that encode polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and repetitive sequences.
A comparison of genes within a species or between different species can show similarities between species (the use of molecular systematics to construct phylogenetic trees).
With the growing amount of data, it became impractical to analyze DNA sequences manually. Today, computer programs such as BLAST are used.
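To give a feel for what "comparing sequences" means computationally, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). This is not how BLAST works internally; BLAST uses a faster heuristic local search. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of a and b (Needleman-Wunsch)."""
    # previous[j] holds the best score for aligning the first i-1 letters of a
    # with the first j letters of b; row 0 is all gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(previous[j - 1] + pair,  # align a[i-1] with b[j-1]
                               previous[j] + gap,       # gap in b
                               current[j - 1] + gap))   # gap in a
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("ACGT", "ACT"))
```

Only two rows of the table are kept, so memory grows with the shorter dimension while time grows with the product of the two lengths, which is why heuristic tools are preferred for database-scale searches.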
-
Genome Annotation/Gene Finding
Annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team at The Institute for Genomic Research that sequenced and analyzed the first genome of a free-living organism to be decoded, the bacterium Haemophilus influenzae.
He built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes.
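A very rough first step in gene finding is to look for open reading frames: stretches that begin with the start codon ATG and run, in frame, to a stop codon (TAA, TAG or TGA). The sketch below is a simplified illustration, not a reconstruction of the 1995 annotation system; it scans only the given strand, and the function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) pairs of ATG...stop reading frames on the given strand.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Made-up sequence with one planted reading frame.
    demo = "CCATGGGATATTAGCC"
    for start, end in find_orfs(demo):
        print(start, end, demo[start:end])
```

Real gene finders also scan the reverse complement strand and use statistical models of coding sequence, since many open reading frames occur by chance.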
-
Computational Evolutionary Biology
Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:
Trace evolution
Compare genomes
Build computational models that predict populations over time
-
Comparative Genomics
The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms.
-
High-Throughput Imaging
Image analysis systems augment an observer's ability to make measurements from a complex set of images by improving accuracy, objectivity, or speed. Biomedical imaging is becoming more important for both diagnostics and research.