Introduction to Bioinformatics: A tale of myths and legends
The Queensland Brain Institute | April 12, 2023
[Freevector]
“Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.”
National Center for Biotechnology Information (NCBI)
Areas where bioinformatics is applied
• Genomics
  – Genomic feature prediction
  – Sequencing data analysis
• Proteomics
  – Protein 3D structure modeling
  – Drug design
• Systems Biology
  – Gene set enrichment
  – Pathway analysis
• Phenotype
  – Image analysis
  – Integration
Approach
1. Biological question
2. Generate data
3. Translate into a computer-solvable task
4. Develop an algorithm
5. Implement algorithm
6. Run algorithm
7. Condense result in human-readable form
8. Answer biological question

Example
1. Genes regulated by protein X
2. ChIP-Seq data
3. “Align reads and identify clusters in the genome”
4. Choose data structures
5. Write source code
6. Align reads
7. Write script to summarize results genome-wide
8. Report protein’s binding sites
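
Steps 3 to 7 of the example can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the talk: it assumes the reads have already been aligned, represents each read only by its start coordinate on one chromosome, and calls a "cluster" wherever a fixed-width window holds at least `min_reads` reads. The function name `find_clusters`, the window size, and the threshold are hypothetical choices; real peak callers model the background statistically rather than using a fixed cutoff.

```python
from collections import Counter

def find_clusters(read_starts, window=100, min_reads=3):
    """Return (window_start, window_end, read_count) for dense windows.

    read_starts: aligned read start positions on one chromosome (toy input).
    window and min_reads are illustrative parameters, not recommended values.
    """
    # Step 4, "choose data structures": a counter keyed by window index.
    counts = Counter(pos // window for pos in read_starts)
    clusters = []
    for index in sorted(counts):
        if counts[index] >= min_reads:
            start = index * window
            clusters.append((start, start + window, counts[index]))
    return clusters

# Toy alignment result: five reads pile up near position 1000, two are isolated.
reads = [120, 1003, 1010, 1024, 1051, 1099, 5400]
for start, end, n in find_clusters(reads):
    # Step 7, "summarize": one line per candidate binding region.
    print(f"{start}-{end}: {n} reads")
```

Binning by window index keeps the sketch short, but it can split a true cluster that straddles a window boundary; a sliding window or a merge step for adjacent bins would address that.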
The challenges in bioinformatics
• Acceptance by biological collaborators when all that matters for the publication is the biology
• Retaining quality work
  – Workflows poorly annotated in papers
  – Programs poorly written
  – No reproducibility
• Keeping up to date
  – New programs are published every week
  – New formats because there is no time to evaluate existing standards
  – New databases because existing ones are full of noise
Bioinformatics: a mythical creature?
Christos Ouzounis
Head of the Computational Genomics Group at the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
Myth #1: Anybody can do it!
• Assumption: Most bioinformatics analysis can be done by using web applications and commercial programs with GUIs
http://www.broadinstitute.org/cancer/software/genepattern/index.html
http://main.g2.bx.psu.edu/
Ouzounis C. Two or three myths about bioinformatics. Bioinformatics. 2000 Mar;16(3):187-9. PubMed PMID: 10869011.
Customized answers require expert input
• Anyone can do predefined analyses with web pages and off-the-shelf programs, however
  – Using tools without understanding the methodology is dangerous
    ➲ “Anyone” needs to understand algorithmic papers
    E.g. A program might produce output that has a certain bias; not knowing this, the researchers could publish this artificial bias as a biological result.
  – A standard bioinformatics tool that works well for general tasks does not exist
    E.g. Local-Alignment Algorithm (1981) vs. PCR (1983) in NGS
  – Only novel tools/pipelines can provide customized answers
    ➲ “Anyone” needs to be proficient in programming to write the required algorithms/scripts
    E.g. You might have to settle for comparing your features with known genes because the program is not able to compare to novel transcripts
Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.
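
For readers who have not seen it, the local-alignment idea cited above can be sketched as a short dynamic program. The version below is a simplified illustration under stated assumptions: it uses a linear gap penalty and arbitrary scoring values (`match=2`, `mismatch=-1`, `gap=-1`), returns only the best score rather than the alignment itself, and the function name `smith_waterman_score` is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    # One row of the dynamic-programming matrix is enough to get the score.
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + step
            # The 0 lets an alignment restart anywhere, which makes it local.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best

# A short read found inside a longer sequence scores as a full-length match.
print(smith_waterman_score("ACGT", "TTACGTAA"))
```

The running time grows with the product of the two sequence lengths, which is one reason a general-purpose algorithm like this is not applied unchanged to millions of short reads.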
Myth #2: Bioinformatics is a service
• Assumption: Bioinformatics merely supports the experimental research and can be a disconnected service
[Figure: two workflow diagrams. Traditional Biology: Hypothesis, Experimental Design, Experiment, Eyeballing, Evaluation, all within Biology. High Throughput Biology (assumption): Hypothesis, Experimental Design, Experiment, Evaluation within Biology, with Data analysis handed to Bioinformatics.]
Interdisciplinary analysis requires an interdisciplinary team throughout
• Standard data analysis can be a service task, however
  – Having a service performed without knowing the methodology is dangerous.
    ➲ “Service” needs to make scientific decisions to take into account the assumptions under which the data was produced.
    E.g. Repeating a statistical test for all genes requires an E-value to be calculated.
  – Producing data not suitable for the planned analysis is wasteful.
    ➲ “Service” needs to have scientific input in the experimental design to ensure the data can be analyzed.
    E.g. Comparing the distribution of mapped reads of runs with different read lengths will result in a difference that is due to the mapping bias of different read lengths.
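
The E-value remark can be made concrete with a small sketch. It assumes one common definition: the E-value is the number of tests expected to reach a given p-value by chance alone, that is, the p-value multiplied by the number of tests performed. The gene names, p-values, cutoff, and the figure of 20,000 tests are made up for illustration, and the helper names are hypothetical.

```python
def e_value(p_value, n_tests):
    """Expected number of tests that reach p_value by chance alone."""
    return p_value * n_tests

def significant(p_values, max_e=0.05):
    """Names of tests whose E-value stays below max_e (Bonferroni-style cutoff)."""
    n_tests = len(p_values)
    return [name for name, p in p_values.items() if e_value(p, n_tests) < max_e]

# Hypothetical per-gene p-values; a real analysis would have thousands.
p_values = {"geneA": 0.00001, "geneB": 0.01, "geneC": 0.04, "geneD": 0.5}
print(significant(p_values))

# A p-value of 0.01 looks convincing for one gene, but across 20,000 tests
# (an assumed gene count) it corresponds to roughly 200 chance hits.
print(e_value(0.01, 20000))
```

Deciding which correction is appropriate depends on how the data were produced, which is the scientific decision the slide says a pure "service" cannot make in isolation.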
[Figure: High Throughput Biology workflow: Hypothesis, Experimental Design, Experiment, Analysis, Evaluation]
Myth #4: Bioinformatics is quick
• Assumption: bioinformatics analysis can be done quickly because computers are involved.
http://www.ads-links.com/images/wp/keyboard-fast.gif
Bioinformatics analysis is a scientific experiment in itself
• Bioinformatics is faster than manual work, however
  – Quick tasks accumulated take a long time
    Task: Map 15 million reads of 76 bp length against the complete human genome (hg18)
    • Manual: couple of decades
    • Brute-Force: couple of years
    • BLAST (1990): couple of days
    • Modern Aligners: BWA ~ 4 h
  – Bioinformatics is a proper scientific experiment in itself and requires time for experimental design, development of controls, parameter tuning, evaluation, and summarizing.
Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ. Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet. 2011 Apr 28. PubMed PMID: 21525877.
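
A back-of-the-envelope calculation shows why a brute-force scan is impractical for the mapping task above. The read count and read length come from the slide; the genome length is rounded to three billion bases and the throughput of one billion base comparisons per second is purely an assumption, so the output is an order-of-magnitude illustration and not the source of the figures quoted on the slide. The function name `scan_years` is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def scan_years(n_reads, genome_length, read_length,
               comparisons_per_offset, comparisons_per_second):
    """Years needed to try every read at every genome offset."""
    offsets = genome_length - read_length + 1  # candidate start positions
    comparisons = n_reads * offsets * comparisons_per_offset
    return comparisons / comparisons_per_second / SECONDS_PER_YEAR

task = dict(n_reads=15_000_000, genome_length=3_000_000_000, read_length=76,
            comparisons_per_second=1e9)  # genome size rounded; speed assumed

# Worst case: all 76 bases are compared at every offset.
print(scan_years(comparisons_per_offset=76, **task))
# Stop at the first mismatch: about 4/3 comparisons per offset on random DNA.
print(scan_years(comparisons_per_offset=4 / 3, **task))
```

Either way the cost scales with the product of read count and genome length, which is what indexed aligners avoid by preprocessing the genome once.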
Myth #5: All dry-lab research does the same
• Assumption: Bioinformatics is interchangeable with other dry-lab research areas because they all “analyze data”.
• Assumption: All biological research areas are interchangeable because they all “work with samples”.
Three things to remember
1) Bioinformatics requires dedication and continuity
2) Bioinformatics data analysis is a full research experiment in itself
3) We get the most out of our research if we work as an interdisciplinary research team throughout
[Figure: workflow: Hypothesis, Experimental Design, Experiment, Analysis, Evaluation]
Next week:
Abstract: An introduction to second-generation sequencing will be given, with a focus on the production informatics. The basic approach of read mapping and feature extraction will be introduced, and the challenges associated with sequencing errors will be discussed.
http://web.qbi.uq.edu.au/labs/gseq/analysis/bioinformatics-seminar-series/
TIP
Puts several images in one file:
convert -adjoin unicorn.png unicorn.png unicorn.png adjoin.pdf
Joins several images into one image (stacked top to bottom):
convert -append unicorn.png unicorn.png unicorn.png append.pdf