Transcript
Page 1: Introduction to Bioinformatics

The Queensland Brain Institute

Introduction to Bioinformatics: A tale of myths and legends

April 12, 2023

Page 2: Introduction to Bioinformatics

“Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.”

National Center for Biotechnology Information (NCBI)

Page 3: Introduction to Bioinformatics

Areas where bioinformatics is applied

Genomics
– Genomic feature prediction
– Sequencing data analysis

Proteomics
– Protein 3D structure modeling
– Drug design

Systems Biology
– Gene set enrichment
– Pathway analysis

Phenotype
– Image analysis
– Integration

Page 4: Introduction to Bioinformatics

Approach

1. Biological question
2. Generate data
3. Translate into a computer-solvable task
4. Develop an algorithm
5. Implement algorithm
6. Run algorithm
7. Condense result in human-readable form
8. Answer biological question

Example
1. Genes regulated by protein X
2. ChIP-Seq data
3. “Align reads and identify clusters in the genome”
4. Choose data structures
5. Write source code
6. Align reads
7. Write script to summarize results genome-wide
8. Report protein’s binding sites
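Step 3 of the example ("identify clusters in the genome") can be illustrated with a toy sketch. It assumes the reads are already aligned and reduced to start positions on one chromosome. The function name `find_clusters`, the parameters `max_gap` and `min_reads`, and the positions are all hypothetical; real ChIP-Seq peak callers use statistical models rather than this simple gap rule.

```python
from typing import List, Tuple


def find_clusters(positions: List[int],
                  max_gap: int = 50,
                  min_reads: int = 3) -> List[Tuple[int, int, int]]:
    """Group read start positions into clusters.

    A read joins the current cluster if it starts at most max_gap bases
    after the previous read; clusters with fewer than min_reads reads
    are discarded. Returns (start, end, read_count) tuples.
    """
    clusters: List[Tuple[int, int, int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # A large gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_reads:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Close the last cluster.
    if len(current) >= min_reads:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Hypothetical aligned read start positions (illustration only).
reads = [100, 110, 125, 130, 900, 2000, 2010, 2030, 2045, 2050]
clusters = find_clusters(reads)
print(clusters)
```

The isolated read at position 900 is dropped, which mirrors step 7: the script condenses millions of alignments into a short list of candidate binding regions.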

Page 5: Introduction to Bioinformatics

The challenges in bioinformatics

• Acceptance by biological collaborators when all that matters for the publication is the biology

• Retaining quality work
– Workflows poorly annotated in papers
– Programs poorly written
– No reproducibility

• Keeping up-to-date
– New programs are published every week
– New formats, because there is no time to evaluate existing standards
– New databases, because existing ones are full of noise

Page 6: Introduction to Bioinformatics

Bioinformatics: a mythical creature?

Christos Ouzounis
Head of the Computational Genomics Group at the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Cambridge UK

Page 7: Introduction to Bioinformatics

Myth #1: Anybody can do it!

• Assumption: Most bioinformatics analysis can be done by using web applications and commercial programs with GUIs

http://www.broadinstitute.org/cancer/software/genepattern/index.html
http://main.g2.bx.psu.edu/

Ouzounis C. Two or three myths about bioinformatics. Bioinformatics. 2000 Mar;16(3):187-9. PubMed PMID: 10869011.

Page 8: Introduction to Bioinformatics

Customized answers require expert input

• Anyone can do predefined analysis with web pages and off-the-shelf programs, however:

– Using tools without understanding the methodology is dangerous
➲ “Anyone” needs to understand algorithmic papers
E.g. A program might produce output that has a certain bias; not knowing this, the researchers could publish this artificial bias as a biological result.

– A standard bioinformatics tool that works well for general tasks does not exist
E.g. Local-Alignment Algorithm (1981) vs. PCR (1983) in NGS

– Only novel tools/pipelines can provide customized answers
➲ “Anyone” needs to be proficient in programming to write the required algorithms/scripts
E.g. You might have to settle for comparing your features with known genes because the program is not able to compare to novel transcripts

Smith, Temple F.; and Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences". Journal of Molecular Biology 147: 195–197.
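To make the point about understanding the methodology concrete, here is a minimal sketch of local alignment scoring in the spirit of the Smith and Waterman paper cited above. It is an illustration under assumptions, not the published formulation: it uses a simple linear gap penalty, the scoring values (`match`, `mismatch`, `gap`) are arbitrary choices, and it returns only the best score rather than the alignment itself.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Dynamic programming over a (len(a)+1) x (len(b)+1) matrix; cell values
    never drop below zero, which is what makes the alignment local.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


# Toy sequences (illustration only): the shared substring "ACG" scores 3 * 2.
score = smith_waterman_score("TTACGTTT", "GGACGGG")
print(score)
```

Changing the scoring values changes which alignments win, which is one way a tool's output can carry a bias that the user never sees.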

Page 9: Introduction to Bioinformatics

Myth #2: Bioinformatics is a service

• Assumption: Bioinformatics merely supports the experimental research and can be a disconnected service

[Diagram: two workflows. Traditional Biology: Hypothesis, Experimental Design, Experiment, Eyeballing, Evaluation (labelled Biology). High Throughput Biology (assumption): Hypothesis, Experimental Design, Experiment (labelled Biology); Data analysis (labelled Bioinformatics); Evaluation.]

Page 10: Introduction to Bioinformatics

Interdisciplinary analysis requires an interdisciplinary team throughout

• Standard data analysis can be a service task, however:

– Having a service performed without knowing the methodology is dangerous.
➲ “Service” needs to make scientific decisions to take into account the assumptions under which the data was produced.
E.g. Repeating a statistical test for all genes requires an E-value to be calculated.

– Producing data not suitable for the planned analysis is wasteful.
➲ “Service” needs to have scientific input in the experimental design to ensure the data can be analyzed.
E.g. Comparing the distribution of mapped reads of runs with different read lengths will result in a difference that is due to the mapping bias of different read lengths.
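The E-value example can be made concrete with a small sketch. It assumes the common definition of an E-value as the number of results this extreme expected by chance across all tests, that is, the p-value multiplied by the number of tests; whether the slide intends exactly this definition is an assumption. The gene names, p-values, and the number of genes tested are hypothetical.

```python
def e_value(p_value: float, n_tests: int) -> float:
    """Expected number of tests reaching this p-value by chance alone."""
    return p_value * n_tests


# Hypothetical per-gene p-values from repeating one test for every gene.
p_values = {"geneA": 0.00001, "geneB": 0.003, "geneC": 0.04}
n_tests = 20000  # hypothetical number of genes tested

for gene, p in p_values.items():
    # A p-value of 0.04 looks significant in isolation, but across
    # 20000 tests hundreds of genes are expected to reach it by chance.
    print(f"{gene}: p = {p}, E = {e_value(p, n_tests):.1f}")
```

Under these assumed numbers, only `geneA` has an E-value below 1. That is the kind of scientific decision the analyst has to make instead of reporting raw p-values.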

[Diagram: High Throughput Biology workflow with the steps Hypothesis, Experimental Design, Experiment, Analysis, Evaluation]

Page 11: Introduction to Bioinformatics

Myth #4: Bioinformatics is quick

• Assumption: bioinformatics analysis can be done quickly because computers are involved.

http://www.ads-links.com/images/wp/keyboard-fast.gif

Page 12: Introduction to Bioinformatics

Bioinformatics analysis is a scientific experiment in itself

• Bioinformatics is faster than manual work, however:

– Quick tasks accumulated take a long time
Task: Map 15 million reads of 76 bp length against the complete human genome (hg18)
• Manual: couple of decades
• Brute-Force: couple of years
• BLAST (1990): couple of days
• Modern Aligners: BWA ~ 4 h

– Bioinformatics is a proper scientific experiment in itself and requires time for experimental design, development of controls, parameter tuning, evaluation, and summarizing.
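The gap between brute force and modern aligners comes largely from indexing the genome once and then looking reads up instead of scanning. The sketch below contrasts the two ideas on a toy sequence. It is a simplified stand-in: it uses a plain k-mer hash index and exact matching only, whereas BWA is based on the Burrows-Wheeler transform and tolerates mismatches. All function names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def brute_force_map(read: str, genome: str) -> List[int]:
    """Compare the read against every genome position (slow for large inputs)."""
    n = len(read)
    return [i for i in range(len(genome) - n + 1) if genome[i:i + n] == read]


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Record every position at which each k-mer occurs; built once per genome."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def indexed_map(read: str, genome: str,
                index: Dict[str, List[int]], k: int) -> List[int]:
    """Look up the read's first k bases, then verify only those candidates."""
    return [i for i in index.get(read[:k], [])
            if genome[i:i + len(read)] == read]


# Toy data (illustration only).
genome = "ACGTACGTTAGCACGTTAGG"
read = "ACGTTAG"
k = 4
index = build_index(genome, k)
hits_brute = brute_force_map(read, genome)
hits_index = indexed_map(read, genome, index, k)
print(hits_brute, hits_index)
```

Both functions report the same positions, but the indexed version checks three candidate positions here instead of all fourteen, and that saving grows with genome size and read count.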

Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ. Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet. 2011 Apr 28. PubMed PMID: 21525877.

Page 13: Introduction to Bioinformatics

Myth #5: All dry-lab research does the same

• Assumption: Bioinformatics is interchangeable with other dry-lab research areas because they all “analyze data”.

• Assumption: All biological research areas are interchangeable because they all “work with samples”.

Page 14: Introduction to Bioinformatics

Three things to remember

1) Bioinformatics requires dedication and continuity
2) Bioinformatics data analysis is a full research experiment in itself
3) We get the most out of our research if we work as an interdisciplinary research team throughout

[Diagram: Hypothesis, Experimental Design, Experiment, Analysis, Evaluation]

Page 15: Introduction to Bioinformatics

Next week:

Abstract: An introduction to second generation sequencing will be given, with a focus on production informatics: the basic approach of read mapping and feature extraction will be introduced, and challenges associated with sequencing errors will be discussed.

http://web.qbi.uq.edu.au/labs/gseq/analysis/bioinformatics-seminar-series/

Page 16: Introduction to Bioinformatics

TIP

Puts several images in one file:
convert -adjoin unicorn.png unicorn.png unicorn.png adjoin.pdf

Joins several images into one image:
convert -append unicorn.png unicorn.png unicorn.png append.pdf

