What is Bioinformatics: Bioinformatics Definitions
The information below is taken from the Bioinformatics
FAQ, prepared by Damian
Counsell, UK Medical Research Council Human Genome Mapping
Project Resource Centre.
What is Bioinformatics?
Bioinformatics describes any use of computers to handle biological information.
In practice, the definition used by most people is narrower; bioinformatics
to them is a synonym for "computational molecular biology": the
use of computers to characterize the molecular components of living things.
What is Bioinformatics?---The
Biologists talk about "doing bioinformatics" when
they use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules. As computers become
more powerful you could probably add "simulate" to this
list of bioinformatics verbs. "Biomolecules" include
your genetic material---nucleic acids---and the products of
your genes: proteins. These are the concerns of "classical"
bioinformatics, dealing primarily with sequence analysis.
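As a small illustration of what analyzing the composition of a nucleic acid can mean in practice, the Python sketch below computes GC content (the fraction of G and C bases) and the reverse complement (the sequence of the opposite strand, read in the usual 5' to 3' direction). It is a minimal sketch assuming plain DNA strings over A, C, G and T; the function names are hypothetical and not taken from any particular library.

```python
# Minimal sketch of two routine sequence-analysis tasks.
# Assumption: input is a plain DNA string over A, C, G, T (case-insensitive).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand: complement each base, then reverse."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCGCGATT"
    print(gc_content(dna))          # 0.5
    print(reverse_complement(dna))  # AATCGCGCAT
```

Real tools also handle ambiguity codes (such as N) and RNA (U); this sketch deliberately does not.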
Fredj Tekaia at the Institut Pasteur offers this
definition of bioinformatics:
"The mathematical, statistical and computing methods that aim to
solve biological problems using DNA and amino acid sequences
and related information."
It is a mathematically interesting property of most large biological
molecules that they are polymers: ordered chains
of simpler molecular modules called monomers.
Think of the monomers as beads or building blocks which, despite
having different colours and shapes, all have the same thickness
and the same way of connecting to one another.
Monomers that can combine in a chain are of the same general class,
but each kind of monomer in that class has its own well-defined
set of characteristics.
Many monomer molecules can be joined together to form a single, far
larger macromolecule. Macromolecules can have
exquisitely specific informational content and/or chemical properties.
According to this scheme, the monomers in a given macromolecule of DNA
or protein can be treated computationally as letters
of an alphabet, put together in pre-programmed arrangements
to carry messages or do work in a cell.
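The "letters of an alphabet" view can be made concrete in a few lines. The sketch below assumes two alphabets, the four DNA bases and the 20 standard amino acids in one-letter code (ambiguity codes are deliberately rejected); it checks that a sequence uses only those letters and counts how often each monomer occurs. The constant and function names are illustrative only.

```python
from collections import Counter

# Illustrative alphabets: the four DNA bases and the 20 standard amino acids
# in one-letter code. Ambiguity codes (e.g. N, X) are not included.
DNA_ALPHABET = frozenset("ACGT")
PROTEIN_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(seq: str, alphabet: frozenset) -> str:
    """Uppercase a sequence and reject any letter outside the alphabet."""
    seq = seq.upper()
    bad = sorted(set(seq) - alphabet)
    if bad:
        raise ValueError(f"letters not in alphabet: {bad}")
    return seq


def composition(seq: str, alphabet: frozenset) -> dict:
    """Count each monomer (letter), reporting zero for letters that never occur."""
    counts = Counter(check_sequence(seq, alphabet))
    return {letter: counts.get(letter, 0) for letter in sorted(alphabet)}
```

For example, `composition("GATTACA", DNA_ALPHABET)` returns `{'A': 3, 'C': 1, 'G': 1, 'T': 2}`. The same two functions serve both DNA and protein because only the alphabet changes, which is the point of treating monomers as letters.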
The greatest achievement of bioinformatics methods, the Human Genome Project, is currently
being completed. Because of this the nature and priorities of
bioinformatics research and applications are changing. People
often talk portentously of our living in the "post-genomic" era. My personal view is that this will
affect bioinformatics in several ways:
- As we possess multiple whole genomes, we can look for differences
and similarities between all the genes of multiple species.
From such studies we can draw particular conclusions about
species and general ones about evolution. This kind of science
is often referred to as comparative genomics.
- There are now technologies designed to measure the relative number
of copies of a genetic message (levels of gene expression)
at different stages in development or disease or in different
tissues. Such technologies, for example DNA
microarrays, will grow in importance.
- More direct, large-scale ways of identifying gene functions
and associations (for example yeast two-hybrid methods) will grow in significance and,
with them, the accompanying bioinformatics of functional genomics.
- There will be a general shift in emphasis (of sequence analysis
especially) from genes themselves to gene products.
This will lead to:
  - attempts to catalogue the activities and characterize interactions
  between all gene products (in humans): proteomics;
  - attempts to crystallize and/or predict the structures of all proteins
  (in humans): structural genomics.
DNA double-helices in bad sci-fi movies.
- What some people refer to as research or medical
informatics, the management of all biomedical experimental
data associated with particular molecules or patients (from
mass spectroscopy, to in vitro assays, to clinical side-effects), will
move from being the concern of those working in drug-company and
hospital I.T. (information technology) into the mainstream
of cell and molecular biology, and will migrate from the commercial
and clinical to the academic sectors.
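Expression-measuring technologies such as the microarrays mentioned in the list above typically report relative rather than absolute levels, and a common way to summarize the difference between two tissues or conditions is the log2 fold change. The sketch below assumes the values are already normalized and non-negative; the pseudocount of 1 (which avoids division by zero for unexpressed genes), the gene names and the numbers are all invented for illustration.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two expression levels; the pseudocount avoids log(0) and x/0."""
    if treated < 0 or control < 0:
        raise ValueError("expression levels must be non-negative")
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized expression levels (arbitrary units) for three
# invented genes in two tissues: (tumour, normal).
levels = {
    "geneA": (63.0, 15.0),
    "geneB": (7.0, 7.0),
    "geneC": (0.0, 31.0),
}

for gene, (tumour, normal) in levels.items():
    print(gene, round(log2_fold_change(tumour, normal), 2))
# geneA 2.0
# geneB 0.0
# geneC -5.0
```

Working in log2 makes increases and decreases symmetric: 2.0 means four times as much message in the tumour sample, and -5.0 means 32 times less.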
This FAQ concentrates on classical bioinformatics, but will,
I hope, grow to cover more of the "post-genomic" aspects
of the field. It is worth noting that all of the above non-classical
areas of research depend upon established sequence analysis techniques.
What is Bioinformatics?---The
There are other fields, for example medical imaging/image analysis,
which might be considered part of bioinformatics. There is also
a whole other discipline of biologically-inspired computation: genetic algorithms, AI, neural networks. Often these areas
interact in strange ways. Neural networks, inspired by crude
models of the functioning of nerve cells in the brain, are used
in a program called PHD to predict, surprisingly accurately,
the secondary structures of proteins from their primary sequences.
What almost all bioinformatics has in common is the processing of
large amounts of biologically-derived information, whether DNA
sequences or breast X-rays.
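Window-based neural-network predictors of secondary structure, of the kind described above, classify each residue of the primary sequence by looking at its neighbours. The sketch below shows only that input-encoding step (no network and no training): each residue becomes a flat one-hot vector describing a window of residues centred on it. The window size of 13 and the amino-acid ordering are assumptions chosen for illustration, not PHD's documented parameters.

```python
# Fixed order of the 20 standard amino acids; each gets one slot in a one-hot block.
# This ordering is an arbitrary illustrative choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_windows(seq: str, window: int = 13) -> list:
    """Encode each residue as a one-hot vector of the window centred on it.

    Window positions that fall off either end of the chain, or hold an
    unrecognised letter, are left as all zeros. Each vector has
    window * 20 entries.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a residue")
    half = window // 2
    width = len(AMINO_ACIDS)
    seq = seq.upper()
    vectors = []
    for centre in range(len(seq)):
        vec = [0] * (window * width)
        for offset in range(-half, half + 1):
            pos = centre + offset
            if 0 <= pos < len(seq) and seq[pos] in INDEX:
                slot = offset + half  # 0 .. window-1, left to right
                vec[slot * width + INDEX[seq[pos]]] = 1
        vectors.append(vec)
    return vectors
```

A trained network would map each of these vectors to a score for helix, strand or coil; the encoding is what lets it use local sequence context rather than a single residue.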
Definitions of Fields Related to Bioinformatics
Molecular biology itself grew out
of biophysics. The British Biophysical Society defines biophysics as:
"an interdisciplinary field which applies techniques from the
physical sciences to understanding biological structure and function."
More information about the various facets of the discipline can be found
at the society's site hosted
at Birkbeck College, London.
Goodrich wrote to ask what the status of biophysics was, given
the definition of computational biology submitted by Paul Schulte
(below). A recent
article in The Scientist dealt with this question; thanks
to Jo Wixon (Managing Editor of Comparative
and Functional Genomics) for the reference.
What is Computational Biology?
Computational biologists might object (please
do), but I find that people use "computational biology"
when discussing that subset of bioinformatics (in the broadest
sense) closest to the field of classical general biology.
Computational biologists interest themselves more with evolutionary, population
and theoretical biology than with cell and molecular biomedicine.
It is inevitable that molecular biology is profoundly important
in computational biology, but it is certainly not what
computational biology is all about (see next paragraph). In
these areas of computational biology it seems that computational
biologists have tended to prefer statistical models for biological
phenomena over physico-chemical ones. This is often wise...
One computational biologist (Paul J Schulte) did object to the above
and makes the entirely valid point that this definition derives
from a popular use of the term, rather than a correct one. Paul
works on water flow in plant cells. He points out that biological
fluid dynamics is a field of computational biology in itself.
He argues that this, and any application of computing to biology,
can be described as "computational biology" (see also
the "loose" definition
of bioinformatics below). Where we disagree, perhaps, is
in the conclusion he draws from this, which I reproduce in full:
"Computational biology is not a 'field', but an 'approach'
involving the use of computers to study biological processes
and hence it is an area as diverse as biology itself."
Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute,
expressed an interesting opinion on this distinction in an interview:
"I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected
with biology-related problems. In my opinion, bioinformatics
has to do with management and the subsequent use of biological
information, particularly genetic information."
What is Medical Informatics?
The Medical Informatics
FAQ (no relation) provides the following definition:
"Medical Informatics is an emerging discipline that has been defined
as the study, invention, and implementation of structures
and algorithms to improve communication, understanding and
management of medical information."
Zakaria, the author of the FAQ, emphasises that medical informatics
is more concerned with structures and algorithms for the manipulation
of medical data, rather than with the data itself.
This suggests that one difference between bioinformatics and medical
informatics as disciplines lies in their approaches to the
data; there are bioinformaticists interested in the theory behind
the manipulation of that data and there are bioinformatics
scientists concerned with the data itself and its biological
implications. (I believe that a good bioinformatics researcher
should be interested in both of these aspects of the field.)
Medical informatics, for practical reasons, is more likely to deal with
data obtained at "grosser" biological levels---that
is information from super-cellular systems, right up to the
population level---while most bioinformatics is concerned with
information about cellular and biomolecular structures and systems.
On both of these points I'd be happy for any medical informatics
specialists to correct me.
What is Cheminformatics?
A Web advertisement for Cambridge Healthtech Institute's Sixth
Annual Cheminformatics conference describes the field thus:
"the combination of chemical synthesis, biological screening, and
data-mining approaches used to guide drug discovery and development"
This, again, sounds more like a field being identified by some
of its most popular (and lucrative) activities, rather than
by including all the diverse studies that come under its general heading.
The story of one of the most successful drugs of all time, penicillin,
seems bizarre, but the way we discover and develop drugs even
now has similarities, being the result of chance, observation
and a lot of slow, intensive chemistry. Until recently, drug
design always seemed doomed to continue to be a labour-intensive,
trial-and-error process. The possibility of using information
technology to plan intelligently and to automate processes
related to the chemical synthesis of possible therapeutic compounds
is very exciting for chemists and biochemists. The rewards for
bringing a drug to market more rapidly are huge, so naturally
this is what a lot of cheminformatics work is about.
Here is a
page with a commercial slant which links to some interesting
discussions of the term "cheminformatics", what it
means, whether or not it exists as a distinct discipline, and
even whether it should be replaced by "chemoinformatics".
The span of academic cheminformatics is wide and is exemplified
by the interests of the cheminformatics groups at the Centre
for Molecular and Biomolecular Informatics at the University of Nijmegen in the Netherlands.
These interests include:
and Structure Retrieval
Tools and Utilities
University's Cheminformatics Web page,
for another example, concerns itself with cheminformatics as
the use of the Internet in chemistry.
What is Genomics?
Genomics is a field which existed before the completion of the sequences
of genomes, but in the crudest of forms, for example the oft-quoted
estimate of 100 000 genes in the human genome derived from a(n)
(in)famous piece of "back of an envelope" genomics,
guessing the weight of chromosomes and the density of the genes
they bear. Genomics is any attempt to analyze or compare the
entire genetic complement of a species or species (plural).
It is, of course, possible to compare genomes by comparing more-or-less
representative subsets of genes within genomes.
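Comparing genomes gene by gene requires annotation and orthology assignment, which is beyond a short example. A simpler, alignment-free way to compare two sequences, swapped in here purely as an illustration, is to compare their sets of k-mers (all substrings of length k) using the Jaccard index; tools such as Mash estimate this quantity at genome scale using MinHash. The sketch assumes plain DNA strings and computes the exact index, which is only practical for small inputs; the default k = 3 is arbitrary.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all overlapping substrings of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # A sequence shorter than k yields an empty range, hence an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared k-mers divided by all distinct k-mers (1.0 = identical k-mer sets)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return len(ka & kb) / len(union)
```

For example, "ACGT" and "ACGA" share one of their three distinct 3-mers (ACG, CGT, CGA), giving a similarity of 1/3. Larger k makes the measure more sensitive to differences; very small k makes unrelated sequences look similar.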
What is Mathematical Biology?
Mathematical biology is easier to distinguish from bioinformatics than computational
biology. Mathematical biology also tackles biological problems,
but the methods it uses to tackle them need not be numerical
and need not be implemented in software or hardware. Indeed,
such methods need not "solve" anything; in mathematical
biology it would be considered reasonable to publish a result
which merely establishes that a biological problem belongs to
a particular general class.
The distinction between bioinformatics and mathematical biology
was illuminated by an email I received from Alex Kasman at the College
of Charleston. Under his working definition, he distinguishes bioinformatics which (under the tight
definition at least)...
"...to focus almost exclusively on specific algorithms that can
be applied to large molecular biological data sets..."
...from mathematical biology which...
"...things of theoretical interest which are not necessarily algorithmic,
not necessarily molecular in nature, and are not necessarily
useful in analyzing collected data."
What is Proteomics?
A review on proteomics in the journal Nature defined the field thus:
"The term proteome was first
coined to describe the set of proteins encoded by the
genome[1]. The study of the proteome, called proteomics, now
evokes not only all the proteins in any given cell, but also
the set of all protein isoforms and modifications, the interactions
between them, the structural description of proteins and their
higher-order complexes, and for that matter almost everything
'post-genomic'."
Michael J. Dunn, the Editor-in-Chief of Proteomics, defines the "proteome" as:
"the PROTEin complement of the genOME"
and considers proteomics to be concerned with:
- "qualitative and quantitative studies of gene expression at the level of
the functional proteins themselves"
- "an interface between protein biochemistry and molecular biology"
Characterizing the many tens of thousands of proteins expressed in a given
cell type at a given time, whether measuring their molecular
weights or isoelectric points, identifying their ligands or
determining their structures, involves the storage and comparison
of vast amounts of data. Inevitably this requires bioinformatics.
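Molecular weight is one of the simplest of these quantities to compute from sequence alone. The sketch below sums average residue masses (each is the free amino acid's mass minus the water lost when a peptide bond forms) and adds one water for the two chain ends. The mass values are commonly tabulated averages (for example those listed by ExPASy) and should be checked before real use; the sketch ignores post-translational modifications, isotope patterns and disulfide bonds.

```python
# Approximate average residue masses in daltons (amino acid minus one H2O).
# Values are commonly tabulated averages; verify them before relying on results.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified chain."""
    protein = protein.upper()
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    if not protein:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER
```

As a sanity check, `average_mass("G")` gives about 75.07 Da, the molecular weight of free glycine, and `average_mass("GG")` about 132.12 Da for the dipeptide.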
Here is a
constructively skeptical review by Lukas Huber.
What is Pharmacogenomics?
Pharmacogenomics is the application of genomic approaches and technologies to
the identification of drug targets. Examples include trawling
entire genomes for potential receptors by bioinformatics means,
or by investigating patterns of gene expression in both pathogens
and hosts during infection, or by examining the characteristic
expression patterns found in tumours or patients' samples for
diagnostic purposes (possibly in the pursuit of potential cancer
therapy targets). The term "pharmacogenomics" is also used for the more "trivial", but
arguably more useful, application of bioinformatics approaches
to the cataloguing and processing of information relating to
pharmacology and genetics, for example the accumulation of information
in databases like this one.
(Thanks to Ivanovi.)
What is Pharmacogenetics?
Individuals respond differently to drug treatments: some positively,
others with little obvious change in their conditions and yet
others with side effects or allergic reactions. Much of this
variation is known to have a genetic basis. Pharmacogenetics
is a subset of pharmacogenomics which uses genomic/bioinformatic
methods to identify genomic correlates, for example SNPs (Single Nucleotide Polymorphisms), characteristic
of particular patient response profiles and use those markers
to inform the administration and development of therapies. Strikingly,
such approaches have been used to "resurrect" drugs
thought previously to be ineffective, but subsequently found
to work in a subset of patients. They can also be used for
optimizing the doses of chemotherapy for particular patients.
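The first step described above, finding single-nucleotide differences, and a crude look at how one of them tracks patient response can be sketched as follows. This is a toy under strong assumptions: the sequences are already aligned, of equal length and free of insertions or deletions, and the patient data are invented. Real pharmacogenetic studies use far more samples and proper statistical association tests, which this sketch omits.

```python
def find_snps(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) wherever two aligned sequences differ.

    Assumes equal-length, gap-free alignments; positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]


def response_rate_by_base(samples: list, position: int) -> dict:
    """Fraction of responders among patients carrying each base at `position`.

    `samples` is a list of (aligned_sequence, responded) pairs.
    """
    totals, responders = {}, {}
    for seq, responded in samples:
        base = seq[position].upper()
        totals[base] = totals.get(base, 0) + 1
        responders[base] = responders.get(base, 0) + int(responded)
    return {base: responders[base] / totals[base] for base in totals}


# Invented example: five patients, with a candidate SNP at position 2.
patients = [("ACGT", True), ("ACTT", False), ("ACGT", True),
            ("ACTT", False), ("ACTT", True)]
print(find_snps("ACGT", "ACTT"))              # [(2, 'G', 'T')]
print(response_rate_by_base(patients, 2)["G"])  # 1.0
```

In this invented data every G carrier responds while only one of three T carriers does; with real data, a difference like this would be a marker worth testing, not a conclusion.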