creativecommons/licenses/by-sa/2.0
http://creativecommons.org/licenses/by-sa/2.0/
Bioinformatics
Prof: Rui Alves, [email protected]
973702406, Dept Ciencies Mediques Basiques,
1st Floor, Room 1.08. Website: http://web.udl.es/usuaris/pg193845/testsite/
Course Website: http://web.udl.es/usuaris/pg193845/Courses/Bioinfo_Biomed_2011/
Language of the course
• Mine: English
• Slides: English
• Webpage: English
• Yours: Whichever you choose as long as I understand it. ALWAYS ASK WHEN YOU DON’T UNDERSTAND SOMETHING!!
Web Page of the course
http://web.udl.es/usuaris/pg193845/Bioinfo_Biomed_2011/
• There, you will find all the information about your tasks, links to bioinformatics resources, and the lectures.
• It will be up from tomorrow onwards.
Goals of this course
• Give you an integrated view of how to use computers and informatics to gain a systemic understanding of biological systems at the molecular level.
• Integrate bioinformatics, mathematical modelling and other areas of computational biology to save lab work and to address problems that cannot yet be solved in the lab.
What this course will be
• A course to teach you how to think about problems, not a course to teach you how to use programs.
Course Plan
• First part of the course (2-3 weeks): Broad introduction to bioinformatics and computational biology in molecular biology.
• Second part of the course: problems for you to solve in groups at home, plus in-depth lectures on the subjects you need to solve them.
Evaluation Plan
• 5 tasks in groups of four. At the end of each task you deliver a paper as a group (overall, all tasks account for 70% of the final grade).
• Final exam (with two sections) where a problem will be posed to each of you and you will have to outline how you would solve it (20%).
• My discretion (10%).
• CAUTION: YOU NEED TO HAVE AT LEAST 6 IN EACH TASK, AND IN EACH SECTION OF THE FINAL EXAM.
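As a worked example of the weighting above, the sketch below computes a final grade. It assumes a 0 to 10 scale (suggested by the minimum of 6), that the five tasks count equally within their 70%, and that the two exam sections count equally within their 20%; none of these splits is stated on the slide, and the name `final_grade` is hypothetical.

```python
from statistics import mean

PASS_MARK = 6.0  # minimum per task and per exam section (from the slide)

def final_grade(tasks, exam_sections, discretion):
    """Weighted final grade on an assumed 0-10 scale.

    Assumptions (not on the slide): the 5 tasks count equally within 70%,
    the 2 exam sections count equally within 20%. Returns None when any
    task or exam section is below the pass mark.
    """
    if len(tasks) != 5 or len(exam_sections) != 2:
        raise ValueError("expected 5 task grades and 2 exam-section grades")
    if min(list(tasks) + list(exam_sections)) < PASS_MARK:
        return None  # the minimum of 6 is not met somewhere
    return 0.7 * mean(tasks) + 0.2 * mean(exam_sections) + 0.1 * discretion

print(round(final_grade([7, 8, 6, 9, 7], [6.5, 7.5], 8), 2))  # 7.38
print(final_grade([5, 8, 6, 9, 7], [6.5, 7.5], 10))          # None
```

The second call shows the caution on the slide: a single task below 6 blocks the result regardless of the weighted average.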
Index
• Why bioinformatics?
• Ontologies & Classification schemes
• Databases and servers
Why Bioinformatics? Or: things to do when it is raining and you want an integrated view of biological systems…
What obvious problems do large-scale sets create?
• Imagine the 6 500 000 000 human beings born within the last 130 years and still alive.
• By and large, a majority of them have had an education.
• What problems need solving to ensure that education?
Knowledge
1 – Organize knowledge
2 – Organize its transmission
First problem: organizing knowledge
• We do not need to know all there is to know in order to be productive in society.
• Furthermore, we cannot learn everything at the same time.
• Problem: How to organize knowledge into bite-sized packages that can be handed out one after another, and that one can build upon?
Organizing knowledge
Communication (read, write, count)
Humanities
Sciences
…
Second problem: organizing the transmission of knowledge
• The school system is a way in which the most people can be trained with the least societal effort
Not effective
Schools and books are the servers and databases of education
(Figure: Users, Database, Server; New Server: You)
Hey, it’s raining!!! Why don’t we try and figure out how all the little molecular pieces in a cell work together?!?!?!
Understanding biological systems
You’re WRONG!!!!!
I need more data!!! How do I plan what to do now?
The “omics” revolution in molecular biology
• Over many decades, a huge amount of biological data has accumulated.
• Unlike the “KNOWLEDGE” we discussed before, this data is not well organized and the connections between the different parcels of data are obscure.
• The omics revolution has compounded this problem 1000-fold because data now accumulates faster than ever.
What is the “omics” revolution in molecular biology?
• The omics revolution is a period of about ten years in which several different technologies were developed that can be applied to study the complete molecular landscape of cells!!!
• Genomics
• Proteomics
• Metabolomics
• Et caeteromics
Understanding biological systems
I need more data!!! Why don’t they give it to me?
The “omics” revolution in molecular biology
• (We!!) Biologists want the data to make sense and they (we) want it now!!!
Comparison between the two problems
People organized the Knowledge transmission system and its connections over millennia of trial and error.
It is impossible for people to organize the biological knowledge brought about by omics in the 20 years that have passed since the beginning of the omics era.
Why?
• Data is not well classified.
• Data is not well connected.
• Data is not well understood.
• Not enough people to do it in a short amount of time.
New types of servers and databases are required for very fast organization and data mining
(Figure: Users, Database, Server; BIOINFORMATICS!!)
What is Bioinformatics?
• Development and application of computational/informatic tools to the solution of biological problems
• The standard of internet bioinformatics: LAMP
– L: Linux (operating system)
– A: Apache (internet server)
– M: MySQL (database server)
– P: Perl, PHP, Python (programming languages)
The standards are changing
• Java lets servers launch fewer processes by using the clients’ machines for computation, allowing a larger number of simultaneous connections.
• Tomcat “talks” very well with Java.
• LTMJ:
– L: Linux (operating system)
– T: Tomcat (internet server)
– M: MySQL (database server)
– J: Java (programming language)
What does a computer need to be effective?
• Well-classified data
– Ontologies, classification schemes
• Well-organized data
– Databases, servers
• Good users
Ontologies and classification schemes for data
Biological Classification Schemes
• What is an Ontology (in the Biological sense)?
A controlled vocabulary of defined terms with hierarchical relationships to one another, in a form that computers can easily handle
What are Bio-Ontologies?
Biological Ontologies (Bio-ontologies) can be defined as a complex
hierarchical structure in which biological concepts are
described by their meanings (definitions) and relationships to
each other.
There are many Bio-Ontologies available and in use by databases.
The Plant Ontology, along with other ontologies such as the
Gene Ontology, is included in the open-source Open
Biological Ontologies project at Sourceforge.
http://obofoundry.org/
The Gene Ontology
The best-known example of a bio-ontology is the Gene Ontology
(GO; http://www.geneontology.org) which describes three
biological domains: cellular component (where the gene product
locates), molecular function (what the gene product does) and
biological process (the cellular, developmental or physiological
events the gene product is involved in).
GO terms are used to describe gene products. Because these descriptions are
independent of species-specific nomenclature and uniformly
applied, it is possible to make meaningful and efficient
comparisons of genes across diverse taxa.
Three “Super Categories” of GO
• Molecular Function (what)
– Tasks performed at the molecular level
• Biological Process (why)
– How it pertains to the organism
• Cellular Component (where)
– Its location
Example
• Gene Name: BRCA1
• Molecular Function: protein binding
• Biological Process: DNA Replication and Chromosome Cycle
• Cellular Component: nucleus
Structure of GO
• How to define the relationship between concepts?
• Example: How to relate the terms “cell”, “nucleus” and “membrane”?
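GO answers this kind of question with a directed acyclic graph: a term can have several parents, linked by typed relations such as is_a and part_of. The sketch below encodes a toy graph around the three terms on the slide; the intermediate terms and edges are a simplified illustration (not the real GO graph or GO identifiers), and `TOY_ONTOLOGY` and `ancestors` are hypothetical names.

```python
# Toy ontology: term -> list of (relation, parent term).
# Hypothetical, simplified edges; not the real GO graph or GO IDs.
TOY_ONTOLOGY = {
    "cell": [],
    "membrane": [],
    "organelle": [("part_of", "cell")],
    "nucleus": [("is_a", "organelle")],
    "plasma membrane": [("is_a", "membrane"), ("part_of", "cell")],
    "nuclear membrane": [("is_a", "membrane"), ("part_of", "nucleus")],
}

def ancestors(term, ontology=TOY_ONTOLOGY):
    """Return every term reachable from `term` by following is_a or
    part_of edges upward (depth-first, each term visited once)."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in ontology[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(ancestors("nuclear membrane")))
# ['cell', 'membrane', 'nucleus', 'organelle']
```

Walking upward like this is what lets an annotation to a specific term (here “nuclear membrane”) also count for its more general ancestors, which GO calls the true path rule.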
How is GO Annotated?
• Manual
– Humans sifting through the primary literature
• Electronic
– GO terms assigned using information that already exists in databases.
Evidence Code for GO Annotation
IEA Inferred from Electronic Annotation
ISS Inferred from Sequence Similarity
IEP Inferred from Expression Pattern
IMP Inferred from Mutant Phenotype
IGI Inferred from Genetic Interaction
IPI Inferred from Physical Interaction
IDA Inferred from Direct Assay
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
NAS Non-traceable Author Statement
IC Inferred by Curator
ND No biological Data available
Detailed info available from:
http://www.geneontology.org/doc/GO.evidence.html
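Evidence codes are often used to filter annotations, for instance to keep only experimentally supported ones or to set electronic (IEA) annotations aside. A minimal sketch, assuming annotations arrive as (gene, term, evidence code) tuples; the genes and rows are hypothetical, and treating IDA, IPI, IMP, IGI and IEP as the “experimental” group is an illustrative choice for this sketch.

```python
# Hypothetical annotations: (gene, GO term name, evidence code).
ANNOTATIONS = [
    ("GENE_A", "protein binding", "IPI"),
    ("GENE_A", "nucleus", "IDA"),
    ("GENE_A", "nucleus", "IEA"),
    ("GENE_B", "protein binding", "IEA"),
    ("GENE_C", "nucleus", "ISS"),
]

# Illustrative grouping of the experimental codes from the table above.
EXPERIMENTAL = {"IDA", "IPI", "IMP", "IGI", "IEP"}

def filter_by_evidence(annotations, allowed):
    """Keep annotations whose evidence code is in `allowed`, in order."""
    return [row for row in annotations if row[2] in allowed]

print(filter_by_evidence(ANNOTATIONS, EXPERIMENTAL))
# [('GENE_A', 'protein binding', 'IPI'), ('GENE_A', 'nucleus', 'IDA')]
```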
How to use GO in data analysis
• Simple Queries
• Find over-represented GO categories in a list of genes
– Search for biological “themes”
• Binning
– Obtain a broad view of the distribution of major GO terms in a list of genes.
• Clustering genes on GO terms
– Group together functionally related genes based on GO terms.
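Finding over-represented categories is commonly done with a one-sided hypergeometric test (equivalent to a one-sided Fisher’s exact test): given N background genes of which K carry a GO term, how surprising is it to see at least k of them in a list of n genes? A minimal sketch using only the standard library; the numbers in the example are made up, and individual GO tools may use other statistics.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric test, P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the list of interest, k: of those, annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Made-up numbers: 3 of 4 listed genes carry a term that only 5 of 20
# background genes carry.
print(enrichment_p_value(k=3, n=4, K=5, N=20))  # 155/4845, about 0.032
```

Because such a test is run for every GO term, the resulting p-values need a multiple-testing correction (for example Bonferroni or Benjamini-Hochberg) before any term is called over-represented.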
GO Tools
• NetFlix – Get GO annotation
• AmiGO – Browser and simple queries
• GoTermMapper – Binning (GO Slim)
• GeneToolBox
– Finding over-represented GO categories
– Clustering based on similar GO terms
– Query for genes with similar function
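Clustering on GO terms and querying for genes with similar function both need a similarity score between genes. One simple choice, sketched below, is the Jaccard index of the two genes’ GO term sets; the genes and terms are hypothetical, and dedicated tools often use ontology-aware measures that also take the graph structure into account.

```python
# Hypothetical gene -> set of GO term names.
GENE_TERMS = {
    "GENE_A": {"protein binding", "nucleus", "DNA repair"},
    "GENE_B": {"protein binding", "nucleus", "DNA replication"},
    "GENE_C": {"transporter activity", "plasma membrane"},
}

def jaccard(a, b):
    """Shared terms divided by all distinct terms (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(gene, gene_terms=GENE_TERMS):
    """Return (other_gene, score) with the highest Jaccard similarity."""
    others = (g for g in gene_terms if g != gene)
    return max(((g, jaccard(gene_terms[gene], gene_terms[g])) for g in others),
               key=lambda pair: pair[1])

print(most_similar("GENE_A"))  # ('GENE_B', 0.5)
```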
GO is not very good
• EC numbers
• Protein classification schemes
• TF (transcription factor) classification schemes
• Transport protein classification schemes
• Etc.
The EC number database
The BRENDA database
The TF classification database
The signal transduction classification database
The transport proteins classification database
All these classifications are reminiscent of the Dewey classification system for books!!!! (Remember public libraries?)
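Like Dewey call numbers, EC numbers are read from general to specific: class, subclass, sub-subclass and serial number. The sketch below parses EC strings and checks membership in a partially specified group such as 3.4.-.-; it only handles numeric levels and the “-” placeholder (not, for example, preliminary serial numbers), and the function names are hypothetical.

```python
# Top-level EC classes (class 7, translocases, was added in 2018).
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases", 7: "Translocases",
}

def parse_ec(ec):
    """Split an EC number such as '3.4.21.4' into its four levels.
    A '-' marks an unspecified level, e.g. '3.4.-.-'."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"EC numbers have four levels: {ec!r}")
    return tuple(None if p == "-" else int(p) for p in parts)

def belongs_to(ec, group):
    """True if `ec` falls under the (possibly partial) EC `group`,
    much like a book shelved under a Dewey class."""
    for level, wanted in zip(parse_ec(ec), parse_ec(group)):
        if wanted is not None and level != wanted:
            return False
    return True

print(EC_CLASSES[parse_ec("3.4.21.4")[0]])  # Hydrolases
print(belongs_to("3.4.21.4", "3.4.-.-"))    # True
print(belongs_to("1.1.1.1", "3.-.-.-"))     # False
```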
A general protein classification database
How close are we to having good, comprehensive & universally used classifications?
• Far!!!!!
• BMC Bioinformatics and Bioinformatics publish papers proposing new ontologies and classifications almost every month, in one area or another of molecular biology.
• Wet-lab molecular biologists have still not been won over to the cause of a single name for a single entity…
• There is hope! The situation is much better than 5 years ago!!!
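Until a single name per entity is universal, analyses usually start by mapping every alias to one preferred name. A minimal sketch with a hypothetical alias table; real alias lists can map one alias to several genes, a case this sketch does not detect (a later entry would simply overwrite an earlier one).

```python
# Hypothetical alias table: each preferred (canonical) name -> its aliases.
ALIASES = {
    "GENE_A": ["gene-a", "GnA", "Prot-A"],
    "GENE_B": ["gene-b", "GnB"],
}

# Invert once into a case-insensitive lookup: alias -> canonical name.
# The canonical name itself is included so it also resolves.
LOOKUP = {
    alias.lower(): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases + [canonical]
}

def canonical_name(name):
    """Return the preferred name for `name`, or None if it is unknown."""
    return LOOKUP.get(name.strip().lower())

print(canonical_name("prot-a"))  # GENE_A
print(canonical_name(" GNB "))   # GENE_B
print(canonical_name("XYZ"))     # None
```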
What does a computer need to be effective?
• Well-classified data
– Ontologies, classification schemes
• Well-organized data
– Databases, servers
Databases & Servers
What is a Database?
• A database is a collection of data organized in such a way that it is easy to store in a computer and to mine by appropriate software
• A database is usually organized as a set of tables in which information about an object is stored
• The tables are related to each other in different ways.
What does database technology allow?
• Making information useful
• Avoiding “accidental disorganisation”
• Making information easily accessible and integrated with the rest of our work
S(tructured) Q(uery) L(anguage)
• ANSI (American National Standards Institute) standard computer language for accessing and manipulating database systems.
• SQL statements are used to retrieve and update data in a database.
• Includes:
– Data Manipulation Language (DML)
– Data Definition Language (DDL)
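To make DDL and DML concrete, the sketch below uses Python’s built-in `sqlite3` module in place of the MySQL server mentioned elsewhere in these slides, so that it runs without installing anything. The two-table schema and its rows are hypothetical; SQL dialects differ in details, but CREATE TABLE, INSERT and SELECT with a JOIN look much the same in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define two related tables (hypothetical schema).
# REFERENCES declares the link; SQLite only enforces it when
# PRAGMA foreign_keys = ON.
cur.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL UNIQUE
);
CREATE TABLE annotation (
    gene_id  INTEGER NOT NULL REFERENCES gene(gene_id),
    go_term  TEXT NOT NULL,
    evidence TEXT NOT NULL
);
""")

# DML: insert rows with ? placeholders instead of string formatting.
cur.executemany("INSERT INTO gene (gene_id, symbol) VALUES (?, ?)",
                [(1, "GENE_A"), (2, "GENE_B")])
cur.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                [(1, "nucleus", "IDA"), (1, "protein binding", "IPI"),
                 (2, "nucleus", "IEA")])
conn.commit()

# DML: a JOIN follows the relation between the two tables.
rows = cur.execute("""
    SELECT g.symbol, a.go_term
    FROM gene AS g JOIN annotation AS a ON a.gene_id = g.gene_id
    WHERE a.go_term = ?
    ORDER BY g.symbol
""", ("nucleus",)).fetchall()
print(rows)  # [('GENE_A', 'nucleus'), ('GENE_B', 'nucleus')]
```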
Web Databases
• Data is accessible through the Internet
• They have different underlying database models
• Example: biological databases
– Molecular data: NCBI, Swissprot, PDB, KEGG, GO
– Protein interaction: DIP, BIND
– Organism specific: Mouse, Worm, Yeast
– Literature: PubMed
– Disease: OMIM
How to make databases useful
• Attach it to a server
• Let people use it to mine for knowledge
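The class server in these slides attaches a MySQL database to Apache through PHP. As a self-contained stand-in, the sketch below serves a toy SQLite table over HTTP with Python’s standard-library `http.server`; the URL path `/annotations`, the port and the schema are hypothetical, and `http.server` is meant for demonstrations, not production.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def make_db():
    """Build a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE annotation (symbol TEXT, go_term TEXT)")
    conn.executemany("INSERT INTO annotation VALUES (?, ?)",
                     [("GENE_A", "nucleus"), ("GENE_A", "protein binding"),
                      ("GENE_B", "plasma membrane")])
    return conn

def lookup(conn, symbol):
    """Return the GO terms stored for `symbol`, as a JSON string."""
    rows = conn.execute(
        "SELECT go_term FROM annotation WHERE symbol = ? ORDER BY go_term",
        (symbol,)).fetchall()
    return json.dumps({"symbol": symbol, "go_terms": [r[0] for r in rows]})

class AnnotationHandler(BaseHTTPRequestHandler):
    conn = None  # database connection, set by serve()

    def do_GET(self):
        # Expect requests like /annotations?symbol=GENE_A
        url = urlparse(self.path)
        if url.path != "/annotations":
            self.send_error(404)
            return
        symbol = parse_qs(url.query).get("symbol", [""])[0]
        body = lookup(self.conn, symbol).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(host="127.0.0.1", port=8000):
    """Serve the toy database until interrupted (Ctrl+C)."""
    AnnotationHandler.conn = make_db()
    HTTPServer((host, port), AnnotationHandler).serve_forever()
```

Calling `serve()` and then opening http://127.0.0.1:8000/annotations?symbol=GENE_A in a browser should return the two stored terms as JSON.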
An example of WAMP
• The bioinformatics class server
(Figure labels: Wireless, Apache, MySQL, PHP)
How close are we to having good, comprehensive & universally used data repositories?
• Not far at all!!!!!
• NCBI, KEGG, Protein Data Bank, SGD, UniProt,…
• Problems:
– Redundant data over many databases…
– Conflicting information due to the use of different data sources, standards, and classifications
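A first step in dealing with redundant and conflicting entries is simply to find them. The sketch below pools records for the same identifier from several sources and reports every field on which the sources disagree; the source names and values are hypothetical, and deciding which source to trust remains a curation decision.

```python
from collections import defaultdict

def find_conflicts(sources):
    """sources: {source_name: {entry_id: {field: value}}} (hashable values).
    Returns {(entry_id, field): {source_name: value}} for every field
    on which at least two sources disagree."""
    values = defaultdict(dict)
    for source_name, records in sources.items():
        for entry_id, fields in records.items():
            for field, value in fields.items():
                values[(entry_id, field)][source_name] = value
    return {key: per_source for key, per_source in values.items()
            if len(set(per_source.values())) > 1}

# Hypothetical records for the same genes in two databases.
SOURCES = {
    "db_one": {"GENE_A": {"location": "nucleus", "length": 500},
               "GENE_B": {"location": "cytoplasm"}},
    "db_two": {"GENE_A": {"location": "nucleus", "length": 512},
               "GENE_B": {"location": "plasma membrane"}},
}

for (entry_id, field), per_source in sorted(find_conflicts(SOURCES).items()):
    print(entry_id, field, per_source)
```

GENE_A’s location agrees in both sources, so only its length and GENE_B’s location are reported.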
A glimpse at a useful present
(Figure: Data Sources, Data warehouse, Relational tools, Online analytical processing tools, Applications)
A glimpse of a useful present
A glimpse of possible futures
The future
• Cloud computing
• Distributed computation
• Artificial intelligence methods to facilitate data search, analysis and mining
Summary
• Why bioinformatics:
– Because there is simply too much data out there for human beings to deal with without computer assistance.
– Because many of the calculations needed to extract knowledge from the data would take too long without computers.
• How to do bioinformatics:
– Organize data well using appropriate classification systems.
– Use databases and server technology.