the recent flood of data from field combines elements of ... 100.pdf · types of datasets: genome...

Post on 28-Jul-2020

Page 1: The recent flood of data from field combines elements of ... 100.pdf · types of datasets: genome sequences, macromolecular structures, and functional genomics experiments. But bioinformatic

The recent flood of data from genome sequences and functional genomics has given rise to a new field, bioinformatics, which combines elements of biology and computer science.

I - Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied mathematics, computer science, and statistics) to understand and organize the information associated with these molecules on a large scale.

II - Bioinformatics is the use of computers and statistics to make sense out of the huge mounds of data that are accumulating from high-throughput biological and chemical experiments, such as sequencing of whole genomes, DNA microarray chips, two-hybrid experiments, and tandem mass spectrometry.

In other words, today bioinformatics is an applied science. We use computer programs to make inferences from the data archives of modern molecular biology, to make connections among them, and to derive useful and interesting predictions.

In short, bioinformatics is a management information system for molecular biology and has many practical applications.

Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments.

Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering.

The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline.

In general, the aims of bioinformatics are threefold:

First, at its simplest, bioinformatics organizes data in a way that allows researchers to access existing information and to submit new entries as they are produced, e.g. the Protein Data Bank for 3D macromolecular structures. While data curation is an essential task, the information stored in these databases is essentially useless until analyzed. Thus the purpose of bioinformatics extends much further.

The second aim is to develop tools and resources that aid in the analysis of data. For example, having sequenced a particular protein, it is of interest to compare it with previously characterized sequences.

The third aim is to use these tools to analyze the data and interpret the results in a biologically meaningful manner.
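Comparing a newly sequenced protein with previously characterized sequences can be illustrated with a small scoring routine. What follows is only a minimal sketch, assuming a plain global alignment score (Needleman-Wunsch style) with illustrative match, mismatch, and gap values; the function names, the scoring values, and the toy sequences are assumptions made for this example, and real tools use substitution matrices and more refined gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not a standard matrix.
    Only the previous row of the dynamic-programming table is kept.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def rank_by_similarity(query, known):
    """Rank named, previously characterized sequences by score against query."""
    scored = [(global_alignment_score(query, seq), name) for name, seq in known.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy sequences invented for illustration only.
known_sequences = {"toy_protein_a": "MKVLAT", "toy_protein_b": "GGHHWW"}
ranking = rank_by_similarity("MKVLT", known_sequences)
```

Here `ranking` lists the most similar characterized sequence first, which is the kind of comparison the second aim describes.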

Traditionally, biological studies examined individual systems in detail, and frequently compared those with a few that are related. In bioinformatics, we can now conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlight novel features.

Fig. 1 Plot showing the growth of scientific publications in bioinformatics between 1973 and 2000. The histogram bars (left vertical axis) count the total number of scientific articles relating to bioinformatics, and the black line (right vertical axis) gives the percentage of the annual total of articles relating to bioinformatics. The data are taken from PubMed.

You can work at the interface between biochemistry, computer science, and mathematics, creating new solutions for high-throughput chemistry, designing analysis systems for drug design, and many other things.

Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data.

Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine.

The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Biology in the 21st century is being transformed from a purely lab-based science to an information science as well.

Major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.

A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system.

For example, a record associated with a nucleotide sequence database typically contains information such as contact name, the input sequence with a description of the type of molecule, the scientific name of the source organism from which it was isolated, and often, literature citations associated with the sequence. For researchers to benefit from the data stored in a database, two additional requirements must be met: easy access to the information, and a method for extracting only that information needed to answer a specific biological question.
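The record layout and the two access requirements described above can be sketched in a few lines. This is a hedged illustration only: the class `SequenceRecord`, its field names, the `accession` identifier, the helper `find_by_organism`, and all sample values are assumptions invented for this example and do not reflect the schema of any real database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """A simplified nucleotide database record (hypothetical schema)."""
    accession: str        # hypothetical identifier, not from the source text
    contact_name: str     # person who submitted the sequence
    sequence: str         # the input sequence itself
    molecule_type: str    # description of the type of molecule, e.g. "DNA"
    organism: str         # scientific name of the source organism
    citations: List[str] = field(default_factory=list)  # literature citations


def find_by_organism(records, organism):
    """Extract only the records isolated from the given organism."""
    wanted = organism.lower()
    return [r for r in records if r.organism.lower() == wanted]


# Made-up sample records, for illustration only.
records = [
    SequenceRecord("EXAMPLE0001", "A. Researcher", "ATGGCC", "DNA", "Mus musculus"),
    SequenceRecord("EXAMPLE0002", "B. Researcher", "AUGGCC", "mRNA", "Homo sapiens",
                   citations=["placeholder citation"]),
]
mouse_records = find_by_organism(records, "mus musculus")
```

A filter such as `find_by_organism` stands in for the second requirement: extracting only the information needed to answer a specific biological question.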

Analyses in bioinformatics focus on three types of datasets: genome sequences, macromolecular structures, and functional genomics experiments.

But bioinformatic analysis is also applied to various other data, e.g. taxonomy trees, relationship data from metabolic pathways, the text of scientific papers, and patient statistics.

Bioinformatics has a large impact on biological research. Giant research projects such as the Human Genome Project would be meaningless without the bioinformatics component. The goal of sequencing projects, for example, is not to corroborate or refute a hypothesis, but to provide raw data for later analysis. Once the raw data are available, hypotheses may be formulated and tested in silico. In this manner, computer experiments may answer biological questions that cannot be tackled by traditional approaches.

• Three key areas are especially important:

• 1- the organization of knowledge in databases,

• 2- sequence analysis, and

• 3- structural bioinformatics.

The rationale for applying computational approaches to facilitate the understanding of various biological processes includes a more global perspective in experimental design.

The ability to capitalize on the emerging technology of database-mining - the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms.
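Database mining as described here, generating a hypothesis about a sequence from similar sequences that are better characterized, can be sketched roughly as follows. The approach shown (Jaccard similarity over shared k-mers) is a simple stand-in chosen for brevity, not the method used by any particular search tool; the function names, the value of `k`, and the toy annotated sequences are all assumptions made for this example.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Fraction of k-mers shared by a and b (0.0 when either has none)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_hits(query, database, k=3, top=3):
    """Return up to `top` (name, score) pairs, most similar first."""
    scored = [(jaccard(query, seq, k), name) for name, seq in database.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top]]


# Toy "characterized" sequences with invented names, for illustration only.
annotated = {
    "toy_kinase": "MKVLAAGIVGLT",
    "toy_transporter": "MSSPQRTTWWYY",
}
hits = best_hits("MKVLAAGIVG", annotated)
```

The annotation of the top hit would then serve as a testable hypothesis about the function of the query, to be confirmed experimentally.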

The human genome will have profound effects on the fields of biomedical research and clinical medicine. Every disease has a genetic component. This may be inherited (as is the case with an estimated 3000-4000 hereditary diseases, including cystic fibrosis and Huntington's disease) or a result of the body's response to an environmental stress that causes alterations in the genome (e.g. cancers, heart disease, diabetes).

The completion of the human genome means that we can search for the genes directly associated with different diseases and begin to understand the molecular basis of these diseases more clearly. This new knowledge of the molecular mechanisms of disease will enable better treatments, cures and even preventative tests to be developed.

Clinical medicine will become more personalized with the development of the field of pharmacogenomics. This is the study of how an individual's genetic inheritance affects the body's response to drugs. At present, some drugs fail to make it to the market because a small percentage of the clinical patient population shows adverse effects to a drug due to sequence variants in their DNA.

As a result, potentially life-saving drugs never make it to the marketplace. Today, doctors have to use trial and error to find the best drug to treat a particular patient, because patients with the same clinical symptoms can show a wide range of responses to the same treatment. In the future, doctors will be able to analyze a patient's genetic profile and prescribe the best available drug therapy and dosage from the beginning.

With the specific details of the genetic mechanisms of diseases being unraveled, the development of diagnostic tests to measure a person's susceptibility to different diseases may become a distinct reality. Preventative actions, such as a change of lifestyle or treatment at the earliest possible stage, when it is more likely to be successful, could result in huge advances in our struggle to conquer disease.

In the not too distant future, the potential for using genes themselves to treat disease may become a reality. Gene therapy is the approach used to treat, cure or even prevent disease by changing the expression of a person's genes. Currently, this field is in its infancy, with clinical trials ongoing for many different types of cancer and other diseases.

At present all drugs on the market target only about 500 proteins. With an improved understanding of disease mechanisms and using computational tools to identify and validate new drug targets, more specific medicines that act on the cause, not merely the symptoms, of the disease can be developed. These highly specific drugs promise to have fewer side effects than many of today's medicines.

Comparative genetics of plant genomes has shown that the organization of their genes has remained more conserved over evolutionary time than was previously believed. These findings suggest that information obtained from the model crop systems can be used to suggest improvements to other food crops. At present the complete genomes of Arabidopsis thaliana (thale cress) and Oryza sativa (rice) are available.

Genes from Bacillus thuringiensis that can control a number of serious pests have been successfully transferred to cotton, maize and potatoes. This new ability of the plants to resist insect attack means that the amount of insecticides being used can be reduced and hence the nutritional quality of the crops is increased.

Scientists have recently succeeded in transferring genes into rice to increase levels of vitamin A, iron and other micronutrients. This work could have a profound impact in reducing occurrences of blindness and anemia caused by deficiencies in vitamin A and iron, respectively. Scientists have inserted a gene from yeast into the tomato, and the result is a plant whose fruit stays longer on the vine and has an extended shelf life.

Progress has been made in developing cereal varieties that have a greater tolerance for soil alkalinity, free aluminum and iron toxicities. These varieties will allow agriculture to succeed in poorer soil areas, thus adding more land to the global production base. Research is also in progress to produce crop varieties capable of tolerating reduced water conditions.

Sequencing projects of many farm animals including cows, pigs and sheep are now well under way in the hope that a better understanding of the biology of these organisms will have huge impacts for improving the production and health of livestock and ultimately have benefits for human nutrition.

Analyzing and comparing the genetic material of different species is an important method for studying the functions of genes, the mechanisms of inherited diseases and species evolution. Bioinformatics tools can be used to make comparisons between the numbers, locations and biochemical functions of genes in different organisms.

Organisms that are suitable for use in experimental research are termed model organisms. They have a number of properties that make them ideal for research purposes, including short life spans and rapid reproduction; they are also easy to handle, inexpensive, and can be manipulated at the genetic level.

An example of a model organism for humans is the mouse. Mouse and human are very closely related (>98%), and for the most part we see a one-to-one correspondence between genes in the two species. Manipulation of the mouse at the molecular level and genome comparisons between the two species can reveal, and are revealing, detailed information on the functions of human genes, the evolutionary relationship between the two species and the molecular mechanisms of many human diseases.