Definitions:
What is Bioinformatics?
Roughly, bioinformatics describes any use of computers to handle biological information. In practice, the definition used by most people is narrower; bioinformatics to them is a synonym for "computational molecular biology": the use of computers to characterize the molecular components of living things.
What is Bioinformatics? The Tight Definition
"Classical" bioinformatics
Most biologists talk about "doing bioinformatics" when they use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules. As computers become more powerful, you could probably add "simulate" to this list of bioinformatics verbs. "Biomolecules" include your genetic material (nucleic acids) and the products of your genes (proteins). These are the concerns of "classical" bioinformatics, which deals primarily with sequence analysis.
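As a small illustration of "analyzing the composition" of a biomolecule, the sketch below treats a sequence as a plain string and counts its monomers. It also computes GC content, the fraction of G and C bases in a DNA sequence. The function names `composition` and `gc_content` are hypothetical choices for this example and are not taken from any particular bioinformatics package.

```python
from collections import Counter


def composition(seq: str) -> dict[str, int]:
    """Count how often each monomer (one letter) occurs in a sequence."""
    return dict(Counter(seq.upper()))


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence; 0.0 for an empty string."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    print(composition("ACGTA"))  # letter counts for a short DNA string
    print(gc_content("GGCA"))    # 3 of the 4 bases are G or C
```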
Fredj Tekaia at the Institut Pasteur offers this definition of bioinformatics:
"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."
It is a mathematically interesting property of most large biological molecules that they are polymers: ordered chains of simpler molecular modules called monomers. Think of the monomers as beads or building blocks which, despite having different colours and shapes, all have the same thickness and the same way of connecting to one another.
Monomers that can combine in a chain are of the same general class, but each kind of monomer in that class has its own well-defined set of characteristics.
Many monomer molecules can be joined together to form a single, far larger, macromolecule. Macromolecules can have exquisitely specific informational content and/or chemical properties.
According to this scheme, the monomers in a given macromolecule of DNA or protein can be treated computationally as letters of an alphabet, put together in pre-programmed arrangements to carry messages or do work in a cell.
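To make the "letters of an alphabet" idea concrete, here is a minimal sketch that translates a DNA coding sequence, written in the alphabet A, C, G, T, into a protein sequence written in the one-letter amino-acid alphabet, using the standard genetic code. The 64-letter lookup string encodes the standard code with bases ordered T, C, A, G. The function name `translate` is a hypothetical choice. The sketch assumes the reading frame starts at the first base and ignores alternative genetic codes.

```python
# Standard genetic code, indexed with bases ordered T, C, A, G:
# index = 16 * first_base + 4 * second_base + third_base.
BASES = "TCAG"
CODON_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into a one-letter protein string.

    Codons are read from the first base; '*' marks a stop codon. Trailing
    bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if any(base not in BASES for base in codon):
            raise ValueError(f"unexpected base in codon {codon!r}")
        index = (16 * BASES.index(codon[0])
                 + 4 * BASES.index(codon[1])
                 + BASES.index(codon[2]))
        protein.append(CODON_TABLE[index])
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTGGTAA"))  # Met, Trp, then a stop codon
```

The lookup string is a compact, well-known encoding of the standard code. A dictionary of all 64 codons would work equally well and may be easier to audit.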
"New" bioinformatics
The greatest achievement of bioinformatics methods, the Human Genome Project, is currently being completed. Because of this, the nature and priorities of bioinformatics research and applications are changing. People often talk portentously of our living in the "post-genomic" era. My personal view is that this will affect bioinformatics in several ways:
- Now that we possess multiple whole genomes, we can look for differences and similarities between all the genes of multiple species. From such studies we can draw particular conclusions about species and general ones about evolution. This kind of science is often referred to as comparative genomics.
- There are now technologies designed to measure the relative number of copies of a genetic message (levels of gene expression) at different stages in development or disease, or in different tissues. These technologies, such as DNA microarrays, will grow in importance.
- Other, more direct, large-scale ways of identifying gene functions and associations (for example, yeast two-hybrid methods) will grow in significance, and with them the accompanying bioinformatics of functional genomics.
- There will be a general shift in emphasis (of sequence analysis especially) from genes themselves to gene products. This will lead to:
  - attempts to catalogue the activities and characterize the interactions between all gene products (in humans): proteomics.
  - attempts to crystallize and/or predict the structures of all proteins (in humans): structural genomics.
  - fewer DNA double-helices in bad sci-fi movies.
- What some people refer to as research or medical informatics, the management of all biomedical experimental data associated with particular molecules or patients (from mass spectrometry, to in vitro assays, to clinical side effects), will move from the concern of those working in drug company and hospital I.T. (information technology) into the mainstream of cell and molecular biology, and will migrate from the commercial and clinical sectors to the academic sector.
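The "relative number of copies" measured by technologies such as two-colour DNA microarrays is commonly summarized, for each gene, as the logarithm of the ratio between a sample's signal and a reference signal. Below is a minimal sketch. It assumes expression values arrive as plain per-gene dictionaries and adds a pseudocount of 1 to avoid division by zero. Both are assumptions, not part of the original text, and `log2_ratios` is a hypothetical name.

```python
import math


def log2_ratios(sample: dict[str, float], reference: dict[str, float],
                pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2 ratio of sample signal to reference signal.

    The pseudocount is added to both signals so that a zero signal does not
    cause division by zero. Only genes present in both inputs are returned.
    """
    shared = sample.keys() & reference.keys()
    return {
        gene: math.log2((sample[gene] + pseudocount) /
                        (reference[gene] + pseudocount))
        for gene in sorted(shared)
    }
```

Base-2 logarithms make a twofold increase and a twofold decrease symmetric (+1 and -1). This is one reason log ratios are usually preferred to raw ratios.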
This FAQ concentrates on classical bioinformatics but will, I hope, grow to cover more of the "post-genomic" aspects of the field. It is worth noting that all of the above non-classical areas of research depend upon established sequence analysis techniques.
What is Bioinformatics? The Loose Definition
There are other fields, for example medical imaging and image analysis, which might be considered part of bioinformatics. There is also a whole other discipline of biologically inspired computation: genetic algorithms, AI, neural networks. Often these areas interact in strange ways. Neural networks, inspired by crude models of the functioning of nerve cells in the brain, are used in a program called PHD to predict, surprisingly accurately, the secondary structures of proteins from their primary sequences.
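Window-based secondary-structure predictors do not feed a whole protein to the network at once. Instead, they look at a window of residues centred on the position being predicted. The sketch below shows only that input-encoding step: one-hot vectors over a sliding window. It does not implement the network, and it is not PHD's actual code. The window size of 13, the extra padding slot and the name `window_features` are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one letter each


def window_features(seq: str, window: int = 13) -> list[list[float]]:
    """One-hot encode a sliding window of residues centred on each position.

    Every position in `seq` yields window * 21 numbers: 20 slots for the
    standard amino acids, plus one extra slot that is set when the window runs
    off either end of the chain or meets a non-standard letter.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    seq = seq.upper()
    half = window // 2
    features = []
    for centre in range(len(seq)):
        vector: list[float] = []
        for pos in range(centre - half, centre + half + 1):
            slot = [0.0] * (len(AMINO_ACIDS) + 1)
            if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
                slot[AMINO_ACIDS.index(seq[pos])] = 1.0
            else:
                slot[-1] = 1.0  # padding beyond the chain ends, or unknown residue
            vector.extend(slot)
        features.append(vector)
    return features
```

Each row would then go to a classifier that outputs helix, strand or coil for the central residue. Training that classifier is outside the scope of this sketch.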
What almost all bioinformatics has in common is the processing of large amounts of biologically derived information, whether DNA sequences or breast X-rays.
Definitions of Fields Related to Bioinformatics
What is Biophysics?
Molecular biology itself grew out of biophysics. The British Biophysical Society defines biophysics as:
"an interdisciplinary field which applies techniques from the physical sciences to understanding biological structure and function"
More information about the various facets of the discipline can be found at the society's site hosted at Birkbeck College, London.
Mike Goodrich wrote to ask what the status of biophysics was, given the definition of computational biology submitted by Paul Schulte (below). A recent article in The Scientist dealt with this question; thanks to Jo Wixon (Managing Editor of Comparative and Functional Genomics) for the reference.
What is Computational Biology?
Computational biologists might object (please do), but I find that people use "computational biology" when discussing that subset of bioinformatics (in the broadest sense) closest to the field of classical general biology.
Computational biologists concern themselves more with evolutionary, population and theoretical biology than with cell and molecular biomedicine. Molecular biology is inevitably of profound importance in computational biology, but it is certainly not what computational biology is all about (see next paragraph). In these areas of computational biology, computational biologists seem to have tended to prefer statistical models of biological phenomena over physico-chemical ones. This is often wise...
One computational biologist (Paul J. Schulte) did object to the above and made the entirely valid point that this definition derives from a popular use of the term, rather than a correct one. Paul works on water flow in plant cells. He points out that biological fluid dynamics is a field of computational biology in itself. He argues that this, and any application of computing to biology, can be described as "computational biology" (see also the "loose" definition of bioinformatics above). Where we disagree, perhaps, is in the conclusion he draws from this, which I reproduce in full:
"Computational biology is not a 'field', but an 'approach' involving the use of computers to study biological processes and hence it is an area as diverse as biology itself."
Richard Durbin, Head of Informatics at the Wellcome Trust Sanger Institute, expressed an interesting opinion on this distinction in an interview:
"I do not think all biological computing is bioinformatics, e.g. mathematical modelling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information."
What is Medical Informatics?
The Medical Informatics FAQ (no relation) provides the following definition:
"Biomedical Informatics is an emerging discipline that has been defined as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of medical information."
That FAQ also links back to this one.
Aamir Zakaria, the author of the FAQ, emphasises that medical informatics is more concerned with structures and algorithms for the manipulation of medical data than with the data itself.
This suggests that one difference between bioinformatics and medical informatics as disciplines lies in their approaches to the data. There are bioinformaticists interested in the theory behind the manipulation of that data, and there are bioinformatics scientists concerned with the data itself and its biological implications. (I believe that a good bioinformatics researcher should be interested in both of these aspects of the field.)
Medical informatics, for practical reasons, is more likely to deal with data obtained at "grosser" biological levels (that is, information from super-cellular systems right up to the population level), while most bioinformatics is concerned with information about cellular and biomolecular structures and systems.
On both of these points I'd be happy for any medical informatics specialists to correct me.
What is Cheminformatics?
The Web advertisement for Cambridge Healthtech Institute's Sixth Annual Cheminformatics conference describes the field thus:
"the combination of chemical synthesis, biological screening, and data-mining approaches used to guide drug discovery and development"
but this, again, sounds more like a field being identified by some of its most popular (and lucrative) activities, rather than by including all the diverse studies that come under its general heading.
The story of one of the most successful drugs of all time, penicillin, seems bizarre, but the way we discover and develop drugs even now has similarities, being the result of chance, observation and a lot of slow, intensive chemistry. Until recently, drug design always seemed doomed to continue to be a labour-intensive, trial-and-error process. The possibility of using information technology to plan intelligently and to automate processes related to the chemical synthesis of possible therapeutic compounds is very exciting for chemists and biochemists. The rewards for bringing a drug to market more rapidly are huge, so naturally this is what a lot of cheminformatics work is about.
Here is a page with a commercial slant which links to some interesting discussions of the term "cheminformatics": what it means, whether or not it exists as a distinct discipline, and even whether it should be replaced by "chemoinformatics".
The span of academic cheminformatics is wide and is exemplified by the interests of the cheminformatics groups at the Centre for Molecular and Biomolecular Informatics at the University of Nijmegen in the Netherlands. These interests include:
- Synthesis Planning
- Reaction and Structure Retrieval
- 3-D Structure Retrieval
- Modelling
- Computational Chemistry
- Visualisation Tools and Utilities
Trinity University's Cheminformatics Web page, for another example, concerns itself with cheminformatics as the use of the Internet in chemistry.
What is Genomics?
Genomics is a field which existed before the completion of the sequences of genomes, but only in the crudest of forms. One example is the oft-quoted estimate of 100 000 genes in the human genome, derived from a(n) (in)famous piece of "back of an envelope" genomics that guessed the weight of chromosomes and the density of the genes they bear. Genomics is any attempt to analyze or compare the entire genetic complement of a species or species (plural). It is, of course, possible to compare genomes by comparing more-or-less representative subsets of genes within genomes.
What is Mathematical Biology?
Mathematical biology is easier to distinguish from bioinformatics than computational biology is. Mathematical biology also tackles biological problems, but the methods it uses to tackle them need not be numerical and need not be implemented in software or hardware. Indeed, such methods need not "solve" anything; in mathematical biology it would be considered reasonable to publish a result which merely establishes that a biological problem belongs to a particular general class.
The distinction between bioinformatics and mathematical biology was illuminated by an email I received from Alex Kasman at the College of Charleston. According to his working definition, he distinguished bioinformatics, which (under the tight definition at least)...
"...seems to focus almost exclusively on specific algorithms that can be applied to large molecular biological data sets..."
...from mathematical biology, which...
"...includes things of theoretical interest which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data."
What is Proteomics?
A recent review on proteomics in the journal Nature defined the field this way:
"The term proteome was first coined to describe the set of proteins encoded by the genome [1]. The study of the proteome, called proteomics, now evokes not only all the proteins in any given cell, but also the set of all protein isoforms and modifications, the interactions between them, the structural description of proteins and their higher-order complexes, and for that matter almost everything 'post-genomic'."
Michael J. Dunn, the Editor-in-Chief of Proteomics, defines the "proteome" as:
"the PROTEin complement of the genOME"
and proteomics to be concerned with:
"qualitative and quantitative studies of gene expression at the level of the functional proteins themselves"
that is:
"an interface between protein biochemistry and molecular biology"
Characterizing the many tens of thousands of proteins expressed in a given cell type at a given time (whether measuring their molecular weights or isoelectric points, identifying their ligands or determining their structures) involves the storage and comparison of vast amounts of data. Inevitably this requires bioinformatics.
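One of the simplest computations behind "measuring molecular weights" is predicting a protein's mass from its sequence. You sum the masses of its residues and add one water molecule for the free chain ends. The sketch uses average residue masses of the kind tabulated by tools such as ExPASy ProtParam. Treat the exact values, and the hypothetical function name `molecular_weight`, as assumptions to check against your own reference. The sketch ignores post-translational modifications, which is one reason a measured mass can differ from a predicted one.

```python
# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of H2O, restored for the two chain ends


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER
```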
Here is a constructively skeptical review by Lukas Huber.
What is Pharmacogenomics?
Pharmacogenomics is the application of genomic approaches and technologies to the identification of drug targets. Examples include trawling entire genomes for potential receptors by bioinformatics means, investigating patterns of gene expression in both pathogens and hosts during infection, or examining the characteristic expression patterns found in tumours or patients' samples for diagnostic purposes (possibly in the pursuit of potential cancer therapy targets).
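Trawling genomes for candidate proteins often starts by scanning predicted protein sequences for short sequence signatures. The sketch below converts a simple pattern in PROSITE notation into a Python regular expression and reports every match position. It handles only a subset of the notation: x, [..], {..}, repeat counts, and the < and > terminal anchors. The N-glycosylation pattern N-{P}-[ST]-{P} is used purely as a familiar example of the syntax. The function names are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string.

    Supported: x (any residue), [..] (allowed residues), {..} (forbidden
    residues), repeat counts such as x(2) or x(2,4), and a leading < or a
    trailing > anchoring the pattern to the N- or C-terminus.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a single residue letter or a [..] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(regex: str, sequence: str) -> list[int]:
    """0-based start positions of every match, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?={regex})", sequence)]


if __name__ == "__main__":
    glyco = prosite_to_regex("N-{P}-[ST]-{P}")
    print(glyco)
    print(find_sites(glyco, "MNGTAANPSA"))
```

The lookahead in `find_sites` is what allows overlapping matches. A plain `re.finditer` would skip any site that begins inside a previous match.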
The term "pharmacogenomics" is also used for the more "trivial", but arguably more useful, application of bioinformatics approaches to the cataloguing and processing of information relating to pharmacology and genetics, for example the accumulation of information in databases like this one.
(Thanks to Ivanovi.)
What is Pharmacogenetics?
All individuals respond differently to drug treatments: some positively, others with little obvious change in their conditions, and yet others with side effects or allergic reactions. Much of this variation is known to have a genetic basis. Pharmacogenetics is a subset of pharmacogenomics. It uses genomic/bioinformatic methods to identify genomic correlates characteristic of particular patient response profiles, for example SNPs (Single Nucleotide Polymorphisms), and uses those markers to inform the administration and development of therapies. Strikingly, such approaches have been used to "resurrect" drugs previously thought to be ineffective but subsequently found to work in a subset of patients. They can also be used to optimize the doses of chemotherapy for particular patients.
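In the simplest case, finding a SNP that is characteristic of a particular patient response profile starts by cross-tabulating patients by genotype and response. The sketch below computes the odds ratio for a single SNP from such a 2x2 table. The names are hypothetical. The 0.5 added to every cell is a Haldane-Anscombe-style correction, an assumption that keeps zero counts from breaking the calculation. A real study would also need a significance test and a correction for testing many SNPs.

```python
from dataclasses import dataclass


@dataclass
class GenotypeResponseTable:
    """Patient counts for one SNP, split by carrier status and drug response."""
    carriers_responding: int
    carriers_not_responding: int
    noncarriers_responding: int
    noncarriers_not_responding: int


def odds_ratio(t: GenotypeResponseTable, correction: float = 0.5) -> float:
    """Odds of responding among carriers divided by the odds among non-carriers.

    `correction` is added to every cell so that a zero count does not cause a
    division by zero; pass 0.0 to use the raw counts.
    """
    a = t.carriers_responding + correction
    b = t.carriers_not_responding + correction
    c = t.noncarriers_responding + correction
    d = t.noncarriers_not_responding + correction
    return (a * d) / (b * c)
```

An odds ratio well above 1 suggests that carriers of the variant are more likely to respond. A value near 1 suggests no association at that SNP.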