What is Bioinformatics: Bioinformatics Definitions
Definitions: What is Bioinformatics?
- The Tight Definition
- The Loose Definition
- Definitions of fields related to bioinformatics
Roughly, bioinformatics describes any use of computers to handle biological information.
In practice, the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology"---the use of computers to characterize the molecular components of living things.
Most biologists talk about "doing bioinformatics" when they use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules. As computers become more powerful you could probably add "simulate" to this list of bioinformatics verbs. "Biomolecules" include your genetic material (nucleic acids) and the products of your genes (proteins). These are the concerns of "classical" bioinformatics, which deals primarily with sequence analysis.
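As a small illustration of "analyzing composition", here is a minimal Python sketch that counts the bases in a DNA sequence and computes its GC content (the fraction of G and C bases). The function names and the example fragment are hypothetical, chosen only for illustration.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, int]:
    """Count each nucleotide in a DNA sequence (case-insensitive)."""
    return dict(Counter(seq.upper()))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases among all bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    example = "ATGCGCGATTA"  # hypothetical fragment, not a real gene
    print(base_composition(example))
    print(round(gc_content(example), 3))
```

Real pipelines also have to decide what to do with ambiguity codes such as N; this sketch simply counts them as ordinary letters.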
Fredj Tekaia at the Institut Pasteur offers this definition of bioinformatics:
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."
It is a mathematically interesting property of most large biological molecules that they are polymers: ordered chains of simpler molecular modules called monomers. Think of the monomers as beads or building blocks which, despite having different colours and shapes, all have the same thickness and the same way of connecting to one another.
Monomers that can combine in a chain are of the same general class, but each kind of monomer in that class has its own well-defined set of characteristics.
Many monomer molecules can be joined together to form a single, far larger, macromolecule. Macromolecules can have exquisitely specific informational content and/or chemical properties.
According to this scheme, the monomers in a given macromolecule of DNA or protein can be treated computationally as letters of an alphabet, put together in pre-programmed arrangements to carry messages or do work in a cell.
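To make the alphabet analogy concrete, the following Python sketch translates a DNA reading frame into a protein string using the standard genetic code. Each three-letter "word" (codon) in the four-letter nucleotide alphabet maps to one letter of the twenty-letter amino-acid alphabet. It assumes the reading frame starts at the first base and ignores alternative genetic codes. The names are illustrative and do not come from any particular library.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon -> amino acid lookup table from the string above.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA reading frame into a one-letter protein string."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' for unrecognised codons
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAGTAA"))
```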
The greatest achievement of bioinformatics methods, the Human Genome Project, is currently being completed. Because of this, the nature and priorities of bioinformatics research and applications are changing. People often talk portentously of our living in the "post-genomic" era. My personal view is that this will affect bioinformatics in several ways:
- Now that we possess multiple whole genomes, we can look for differences and similarities between all the genes of multiple species. From such studies we can draw particular conclusions about species and general ones about evolution. This kind of science is often referred to as comparative genomics.
- There are now technologies designed to measure the relative number of copies of a genetic message (levels of gene expression) at different stages in development or disease, or in different tissues. Such technologies, for example DNA microarrays, will grow in importance.
- Other, more direct, large-scale ways of identifying gene functions and associations (for example yeast two-hybrid methods) will grow in significance and with them the accompanying bioinformatics of functional genomics.
- There will be a general shift in emphasis (of sequence analysis especially) from genes themselves to gene products.
This will lead to:
- attempts to catalogue the activities and characterize the interactions of all gene products (in humans): proteomics.
- attempts to crystallize and/or predict the structures of all proteins (in humans): structural genomics.
- fewer DNA double-helices in bad sci-fi movies.
- What some people refer to as research or medical informatics, the management of all biomedical experimental data associated with particular molecules or patients---from mass spectroscopy, to in vitro assays to clinical side-effects---will move from the concern of those working in drug company and hospital I.T. (information technology) into the mainstream of cell and molecular biology and migrate from the commercial and clinical to academic sectors.
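Measurements of "relative number of copies" from microarrays are commonly summarised as a base-2 logarithm of the ratio between a test signal and a reference signal. With this convention, 0 means no change, +1 means doubled and -1 means halved. Here is a minimal sketch. It assumes two-channel intensities that have already been background-corrected and normalised, and the gene names and numbers are hypothetical.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """log2 of test/reference intensity: 0 = no change, +1 = doubled, -1 = halved."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(test_signal / reference_signal)


if __name__ == "__main__":
    # Hypothetical (test, reference) intensities for three genes.
    spots = {"geneA": (1200.0, 300.0), "geneB": (500.0, 500.0), "geneC": (150.0, 600.0)}
    for gene, (test, ref) in spots.items():
        print(gene, log2_ratio(test, ref))
```

The logarithm makes up- and down-regulation symmetric. A fourfold increase and a fourfold decrease come out as +2 and -2, whereas the raw ratios would be 4 and 0.25.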
This FAQ concentrates on classical bioinformatics, but will, I hope, grow to cover more of the "post-genomic" aspects of the field. It is worth noting that all of the above non-classical areas of research depend upon established sequence analysis techniques.
There are other fields, for example medical imaging and image analysis, which might be considered part of bioinformatics. There is also a whole other discipline of biologically inspired computation: genetic algorithms, AI and neural networks. Often these areas interact in strange ways. Neural networks, inspired by crude models of the functioning of nerve cells in the brain, are used in a program called PHD to predict, surprisingly accurately, the secondary structures of proteins from their primary sequences.
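Window-based predictors of this kind look at a stretch of residues centred on each position of the chain. The sketch below shows only that input step, one-hot encoding a window around every residue. The window size of 13 and the extra "off the end / unknown" slot are assumptions made for illustration. PHD itself works from multiple sequence alignment profiles rather than single sequences, and no network is trained here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def window_features(sequence: str, window: int = 13) -> list[list[int]]:
    """One-hot encode a window centred on each residue.

    Each window position gets 21 slots: 20 for the amino acids plus one flag
    meaning the position falls off the end of the chain or is not a standard residue.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a centre")
    half = window // 2
    features = []
    for centre in range(len(sequence)):
        vector = []
        for offset in range(-half, half + 1):
            slot = [0] * (len(AMINO_ACIDS) + 1)
            pos = centre + offset
            if 0 <= pos < len(sequence) and sequence[pos] in INDEX:
                slot[INDEX[sequence[pos]]] = 1
            else:
                slot[-1] = 1  # padding beyond the chain, or an unknown residue
            vector.extend(slot)
        features.append(vector)
    return features


if __name__ == "__main__":
    feats = window_features("MKVLA")
    print(len(feats), len(feats[0]))
```

Each resulting vector would be one input row for a classifier that outputs a helix, strand or coil label for the central residue.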
What almost all bioinformatics has in common is the processing of large amounts of biologically-derived information, whether DNA sequences or breast X-rays.
"an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function"
More information about the various facets of the discipline can be found at the society's site hosted at Birkbeck College, London.
Mike Goodrich wrote to ask what the status of biophysics was given the definition of computational biology submitted by Paul Schulte (below). A recent article in The Scientist [free registration required] dealt with this question---thanks to Jo Wixon (Managing Editor of Comparative and Functional Genomics) for the reference.
Computational biologists might object (please do), but I find that people use "computational biology" when discussing that subset of bioinformatics (in the broadest sense) closest to the field of classical general biology.
Computational biologists interest themselves more in evolutionary, population and theoretical biology than in cell and molecular biomedicine. Molecular biology is inevitably of profound importance to computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology, computational biologists seem to have preferred statistical models of biological phenomena over physico-chemical ones. This is often wise...
One computational biologist (Paul J Schulte) did object to the above and makes the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells. He points out that biological fluid dynamics is a field of computational biology in itself. He argues that this, and any application of computing to biology, can be described as "computational biology" (see also the "loose" definition of bioinformatics below). Where we disagree, perhaps, is in the conclusion he draws from this---which I reproduce in full:
"Computational biology is not a "field", but an "approach" involving the use of computers to study biological processes and hence it is an area as diverse as biology itself."
"I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."
The Medical Informatics FAQ (no relation) provides the following definition:
"Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information."
That FAQ also links to this one.
Aamir Zakaria, the author of the FAQ, emphasises that medical informatics is more concerned with structures and algorithms for the manipulation of medical data, rather than with the data itself.
This suggests that one difference between bioinformatics and medical informatics as disciplines lies with their approaches to the data; there are bioinformaticists interested in the theory behind the manipulation of that data and there are bioinformatics scientists concerned with the data itself and its biological implications. (I believe that a good bioinformatics researcher should be interested in both of these aspects of the field.)
Medical informatics, for practical reasons, is more likely to deal with data obtained at "grosser" biological levels---that is information from super-cellular systems, right up to the population level---while most bioinformatics is concerned with information about cellular and biomolecular structures and systems.
On both of these points I'd be happy for any medical informatics specialists to correct me.
The Web advertisement for Cambridge Healthtech Institute's Sixth Annual Cheminformatics conference describes the field thus:
"the combination of chemical synthesis, biological screening, and data-mining approaches used to guide drug discovery and development"
but this, again, sounds more like a field being identified by some of its most popular (and lucrative) activities, rather than by including all the diverse studies that come under its general heading.
The story of one of the most successful drugs of all time, penicillin, seems bizarre, but the way we discover and develop drugs even now has similarities: it is the result of chance, observation and a lot of slow, intensive chemistry. Until recently, drug design always seemed doomed to remain a labour-intensive, trial-and-error process. The possibility of using information technology to plan intelligently and to automate processes related to the chemical synthesis of possible therapeutic compounds is very exciting for chemists and biochemists. The rewards for bringing a drug to market more rapidly are huge, so naturally this is what a lot of cheminformatics work is about.
Here is a page with a commercial slant which links to some interesting discussions of the term "cheminformatics", what it means, whether or not it exists as a distinct discipline, and even whether it should be replaced by "chemoinformatics".
The span of academic cheminformatics is wide, as exemplified by the interests of the cheminformatics groups at the Centre for Molecular and Biomolecular Informatics at the University of Nijmegen in the Netherlands. These interests include:
- Synthesis Planning
- Reaction and Structure Retrieval
- 3-D Structure Retrieval
- Computational Chemistry
- Visualisation Tools and Utilities
Genomics existed as a field before any genome had been completely sequenced, but only in the crudest of forms. For example, the oft-cited estimate of 100 000 genes in the human genome came from a(n) (in)famous piece of "back of an envelope" genomics: guessing the weight of chromosomes and the density of the genes they bear. Genomics is any attempt to analyze or compare the entire genetic complement of a species, or of several species. It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.
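The arithmetic behind an estimate of this kind reduces to a single division. The numbers below are illustrative round figures and do not reproduce the historical calculation. Here G is the approximate size of the haploid human genome in base pairs, and d is an assumed average spacing between genes.

```latex
N_{\text{genes}} \approx \frac{G}{d}
  = \frac{3 \times 10^{9}\ \text{bp}}{3 \times 10^{4}\ \text{bp per gene}}
  = 10^{5}
```

Because N scales as 1/d, halving or doubling the guessed gene spacing doubles or halves the answer. This is why such estimates were so fragile.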
Mathematical biology is easier to distinguish from bioinformatics than computational biology. Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware. Indeed, such methods need not "solve" anything; in mathematical biology it would be considered reasonable to publish a result which merely establishes that a biological problem belongs to a particular general class.
The distinction between bioinformatics and mathematical biology was illuminated by an email I received from Alex Kasman at the College of Charleston. According to his working definition, he distinguished bioinformatics which (under the tight definition at least)...
"...seems to focus almost exclusively on specific algorithms that can be applied to large molecular biological data sets..."
...from mathematical biology which...
"...includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data."
A recent review on proteomics in the journal Nature defined the field this way:
"The term proteome was first coined to describe the set of proteins encoded by the genome1. The study of the proteome, called proteomics, now evokes not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, the structural description of proteins and their higher-order complexes, and for that matter almost everything 'post-genomic'."
Michael J. Dunn, the Editor-in-Chief of Proteomics, defines the "proteome" as:
"the PROTEin complement of the genOME"
and proteomics to be concerned with:
"qualitative and quantitative studies of gene expression at the level of the functional proteins themselves"
"an interface between protein biochemistry and molecular biology"
Characterizing the many tens of thousands of proteins expressed in a given cell type at a given time, whether measuring their molecular weights or isoelectric points, identifying their ligands or determining their structures, involves the storage and comparison of vast amounts of data. Inevitably this requires bioinformatics. Here is a constructively skeptical review by Lukas Huber.
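Some of these properties can be computed from sequence alone. This minimal sketch estimates the average molecular weight of an unmodified protein by summing standard average residue masses and adding one water molecule for the free termini. It ignores isoforms and post-translational modifications, and the mass table should be checked against a reference before real use. The function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid mass minus one water),
# as commonly tabulated; verify against a reference table before relying on them.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water for the free N- and C-termini of the chain


def molecular_weight(protein: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    print(round(molecular_weight("GA"), 2))
```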
Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets. Examples include trawling entire genomes for potential receptors by bioinformatic means, investigating patterns of gene expression in both pathogens and hosts during infection, and examining the characteristic expression patterns found in tumours or patient samples for diagnostic purposes (possibly in pursuit of potential cancer therapy targets).
The term "pharmacogenomics" is also used for the more "trivial", but arguably more useful, application of bioinformatics approaches to the cataloguing and processing of information relating to pharmacology and genetics, for example the accumulation of information in databases like this one. (Thanks to Ivanovi.)
All individuals respond differently to drug treatments: some positively, others with little obvious change in their condition and yet others with side effects or allergic reactions. Much of this variation is known to have a genetic basis. Pharmacogenetics is a subset of pharmacogenomics that uses genomic and bioinformatic methods to identify genomic correlates of particular patient response profiles, for example SNPs (Single Nucleotide Polymorphisms). It then uses those markers to inform the administration and development of therapies. Strikingly, such approaches have been used to "resurrect" drugs previously thought to be ineffective but subsequently found to work in a subset of patients. They can also be used to optimize chemotherapy doses for particular patients.
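One of the simplest ways to ask whether a SNP is associated with drug response is to compare allele counts between responders and non-responders. The sketch below computes an odds ratio with an approximate 95% confidence interval using Woolf's log method. The counts are hypothetical. A real pharmacogenetic study would also need to handle multiple testing, population structure and zero counts, none of which this sketch attempts.

```python
import math


def allele_odds_ratio(resp_alt, resp_ref, nonresp_alt, nonresp_ref, z=1.96):
    """Odds ratio (with Woolf confidence interval) for an allele in responders vs non-responders.

    Arguments are allele counts: responders carrying the alternative (alt) or
    reference (ref) allele, then the same two counts for non-responders.
    """
    counts = (resp_alt, resp_ref, nonresp_alt, nonresp_ref)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive for this simple estimate")
    odds_ratio = (resp_alt * nonresp_ref) / (resp_ref * nonresp_alt)
    # Standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    se_log = math.sqrt(sum(1 / n for n in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts at a single SNP.
    print(allele_odds_ratio(30, 70, 10, 90))
```

An odds ratio well above 1, with an interval that excludes 1, would suggest that the alternative allele is more common among responders. It would be a lead to follow up, not proof of a causal marker.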