DNA Barcoding in Plants & Its Potential Applications

Today, barcodes, conceived in 1948 by Bernard Silver, a graduate student at Drexel Institute of Technology in Philadelphia, PA, and his friends Norman Woodland and Jordin Johanson, are used universally. They play a critical role in identification, relational information, and tracking. They are especially useful because scanners are relatively inexpensive, extremely accurate, and highly efficient at obtaining and transmitting information from barcodes and their databases.

Natural barcodes also exist and are well established in the animal kingdom. A natural barcode is a short strand of deoxyribonucleic acid (DNA), the genetic code unique to each living organism and some viruses, that consists of between 300 and 800 base pairs (bps) of Adenine (A)-Thymine (T) and Cytosine (C)-Guanine (G), which can be represented by different colors.

The idea was inspired by biologist Paul Hebert's futile efforts, dating back to the 1970s, to identify 2000 species of moth in Papua New Guinea (futile because of their taxonomic and morphological similarities) and by his subsequent "retreat to water fleas" (of which there are only 200 species). In a 2003 paper he described "the diversity of life as a 'harsh burden' to biologists" and suggested that "every species on Earth... be assigned a simple DNA bar code so it would be easy to tell them apart," as written in Scanning Life (National Geographic, May 2010). By sequencing the cytochrome oxidase 1 (CO1) gene, which is present in the mitochondrial DNA of every multi-cellular organism, scientists are able to readily identify species and infer their phylogeny on a molecular level, and to store the results in databases for easy retrieval. Per P.M. Hollingsworth, DNA bar-coding plants in biodiversity hot spots: Progress and outstanding questions (Heredity, 9 April 2008), "DNA bar-coding is now routinely used for organismal identification" in animals and "has contributed to the discovery of new species."

However, per Mark W. Chase, Nicolas Salamin, Mike Wilkinson, James M. Dunwell, Rao Prasad Kesanakurthi, Nadia Haidar, and Vincent Savolainen, Land plants and DNA barcodes: short-term and long-term goals (Philosophical Transactions of the Royal Society, 2005), this was not the case with plants until recently. The plant CO1 gene cannot serve as a barcode gene, and plants "have had the reputation of being problematic for DNA bar-coding" due to "low levels of variability" and lack of variation in "plastid phylogenetic markers."

This view prevailed until 2008, when a team led by Dr. Vincent Savolainen of Imperial College London's Department of Life Sciences and the Royal Botanic Gardens, Kew, studied the maturase K (matK) gene, which lies within the intron of the trnK gene in the chloroplast genome. Their research found that the matK gene could be used to differentiate between at least 90% of all plants, including cryptic species, which appear identical to the human eye but differ genetically. As reported by W. John Kress and David L. Erickson, DNA barcodes: Genes, genomics, and bioinformatics (PNAS, Vol. 105, No. 8, 26 February 2008), the gene "contained significant species-level genetic variability and divergence, conserved flanking sites for developing PCR primers for wide taxonomic application, [and] a short sequence length... to facilitate... DNA extraction and amplification." PCR (polymerase chain reaction) is a process that enables scientists to produce millions of copies of a specific DNA sequence in about two hours while bypassing the need to use bacteria to amplify DNA (Polymerase Chain Reaction (PCR), Gene Almanac, Dolan DNA Learning Center and Cold Spring Harbor Laboratory, Inc., 2009).

The matK gene, though, was found ineffective in distinguishing between up to 10% of plant species, in two main situations:

1. When the variation resulting from "rapid bursts of speciation" was small, and
2. Based on Anna-Marie Lever's article, DNA 'barcode' revealed in plants (BBC News, 6 February 2008), when plants were hybrids whose genome was rearranged through natural and artificial cross-breeding, which "confuse[d] matK gene information."

When the discovery was made that the matK gene could serve as a natural barcode in plants, its location proved consistent with that of the barcode gene in animals: both are located in cellular energy centers outside the nucleus (mitochondria serve as "tiny powerhouses" in animal cells, while chloroplasts carry out photosynthesis in plant cells). This matters because, per Anna-Marie Lever, DNA 'barcode' revealed in plants, "nuclear genes usually evolve too rapidly to distinguish between [organisms] of the same species." By contrast, and consistent with mitochondrial genes in animals, "chloroplast genes [in plants] evolve at a slower rate, allowing for [recognition of members of the same species, and] fast enough for differences to occur in the DNA code between species."

The main difference between plants and animals is the range of effectiveness of their respective barcode genes. The CO1 gene can be used to determine and record phylogeny in nearly 100% of animal species, while the matK gene is ineffective in about 10% of plant species. The key reason for the 90% effective range of the matK gene is natural crossbreeding, which is significantly more common in plants than in animals. Because of this, matK gene information needs to be supplemented by data from another gene. Studies using the trnH-psbA region, which shares similar characteristics with matK, showed promise: when matK and trnH-psbA were sequenced together in plants of the nutmeg family (Myristicaceae), the effective range for correct identification rose to approximately 95%. Nevertheless, a panel of 52 leading barcoding scientists opted to use the ribulose-bisphosphate carboxylase large subunit (rbcL) gene (also located in plant chloroplasts) to complete the barcode for the 10% group, as outlined in a 2009 paper published in Proceedings of the National Academy of Sciences and reported by Daniel Cressey, DNA barcodes for plants a step closer (Nature, 27 July 2009).
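The effect of supplementing one barcode gene with a second can be sketched in a few lines of Python. This is only an illustration: the species names and four-base "sequences" are invented, and the helper `resolved_fraction` is a hypothetical name, not part of any barcoding toolkit. It counts the share of species whose barcode is shared with no other species, first with matK alone and then with matK and rbcL combined.

```python
from collections import Counter
from typing import Dict, Tuple


def resolved_fraction(barcodes: Dict[str, Tuple[str, ...]]) -> float:
    """Share of species whose barcode is shared with no other species."""
    if not barcodes:
        raise ValueError("need at least one species")
    counts = Counter(barcodes.values())
    unique = sum(1 for code in barcodes.values() if counts[code] == 1)
    return unique / len(barcodes)


# Hypothetical species and made-up sequences, for illustration only.
MATK = {"sp1": "AAGT", "sp2": "AAGT", "sp3": "ACGT", "sp4": "AGGT"}
RBCL = {"sp1": "CCTA", "sp2": "CCTG", "sp3": "CCTA", "sp4": "CCTA"}

# One-locus barcode: sp1 and sp2 share the same matK sequence.
matk_only = resolved_fraction({sp: (MATK[sp],) for sp in MATK})
# Two-locus barcode: the rbcL difference separates sp1 from sp2.
two_locus = resolved_fraction({sp: (MATK[sp], RBCL[sp]) for sp in MATK})

print(f"matK alone: {matk_only:.0%}, matK + rbcL: {two_locus:.0%}")
```

In this toy data set two of the four species share a matK sequence, so matK alone resolves half of them, while the combined barcode resolves all four. Real discrimination rates depend on the taxa sampled.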

While discovery of the phylogenetic usefulness of the matK gene is relatively recent, studies indicating the phylogenetic usefulness of the rbcL gene date back as far as 1986, when Jane Aldrich, Barry Cherney, Ellis Merlin, and Jeff Palmer reported in Nucleic Acids Research that rbcL sequences from petunia and tobacco were 97.3% identical, and those from alfalfa and peas 94.1% identical, when their bps were compared.
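A percentage such as 97.3% is, in essence, the share of aligned positions at which two sequences carry the same base. The sketch below assumes the two sequences are already aligned and of equal length. The function name `percent_identity` and the 20-base fragments are invented for illustration and are not real rbcL data.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Made-up 20-base fragments that differ at a single position.
fragment_1 = "ACGTTGCAAGCTTAGGCTAA"
fragment_2 = "ACGTTGCAAGCTTAGACTAA"

identity = percent_identity(fragment_1, fragment_2)
print(f"{identity:.1f}% identical")
```

With one mismatch in 20 positions the fragments are 95.0% identical. Published figures also depend on how the alignment treats gaps, which this sketch ignores.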

Additional studies added further evidence of the phylogenetic usefulness of the rbcL gene; two are noted here. One, reported by Mitsuyasu Hasebe, Tomoyuki Omori, Miyuki Nakazawa, Toshio Sano, Masahiro Kato, and Kunio Iwatsuki in rbcL Gene sequences provide evidence for the evolutionary lineages of leptosporangiate ferns (Proceedings of the National Academy of Sciences, June 1994), used PCR-amplified rbcL fragments from 58 species of leptosporangiate ferns to capture their evolutionary links. These ferns belong to the pteridophytes (vascular plants that reproduce by spores in lieu of flowers and seeds), the group with the longest evolutionary history of any vascular land plants, which has consequently endured the greatest loss of phylogenetically useful data. The other, reported by Hiroaki Setoguchi, Takeshi Asakawa Osawa, Jean-Christophe Pintaud, Tanguy Jaffré, and Jean-Marie Veillon in Phylogenetic relationships within Araucariaceae based on rbcL gene sequences (American Journal of Botany, 1998), used rbcL gene sequencing to successfully determine the phylogenetic relationships among 29 species of Araucariaceae, a sample representing nearly every existing species of this ancient family of conifers, which achieved maximum diversification during the Jurassic (c. 199.6 ± 0.6 to 145.5 ± 4 million years ago (Ma)) and Cretaceous (c. 145.5 ± 4 to 65.5 ± 0.3 Ma) periods.

During the study that led to the discovery that a plant's matK gene could serve as a primary barcode gene, Dr. Savolainen's team compared eight potential candidate genes and analyzed more than 1600 plant DNA samples obtained from the tropical forests of Costa Rica and the temperate region of Kruger National Park, South Africa, two of the world's leading biodiversity hotspots.

Through sequencing of the matK gene (which has a slightly different code for plants of different species and a near identical code for plants of the same species), they were able to distinguish between a thousand orchid species, plants that are notoriously difficult to differentiate because of their near identical appearance, especially when sterile. Consequently, per Plant DNA 'Barcode' identified (Medical News Today, 6 February 2008), "...what was previously assumed to be one species of orchid was [found to be] two distinct species that live on different slopes of the mountains [with] differently shaped flowers adapted for different pollinating insects."

Scientific analysis in which the matK gene was divided into five sectors has determined that sector 3 (known as the 3' region) is the most effective area in providing useful phylogenetic information. When broken down further, 140 of the 306 bps of the 3' region (about 46%) were phylogenetically informative.
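As a rough illustration of what counting informative positions involves, the sketch below assumes that "phylogenetically informative" means parsimony-informative in the usual sense: a column of an alignment is informative when at least two different bases each occur in at least two sequences. The function names and the four toy sequences are invented. Only the figures 140 and 306 come from the text above.

```python
from collections import Counter
from typing import Sequence


def is_parsimony_informative(column: Sequence[str]) -> bool:
    """True if at least two bases each appear in two or more sequences."""
    counts = Counter(base for base in column if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative_sites(alignment: Sequence[str]) -> int:
    """Count parsimony-informative columns in an equal-length alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all sequences must have the same length")
    columns = zip(*(row.upper() for row in alignment))
    return sum(1 for column in columns if is_parsimony_informative(column))


# Made-up alignment of four sequences, eight positions each.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ATGTCCGA", "ATGTCCGT"]
informative = count_informative_sites(toy_alignment)

# Share reported in the text for the 3' region: 140 of 306 bps.
article_share = 140 / 306

print(f"toy alignment: {informative} of 8 sites informative")
print(f"3' region: {article_share:.1%} informative")
```

In the toy alignment, three of the eight columns are informative. The share for the 3' region works out to about 45.8%.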

Establishment of the matK gene's barcode function, supplemented by use of the rbcL gene, represents a major breakthrough in plant science. It offers scientists and plant taxonomists/systematists a diverse range of potential applications, as well as an opportunity to close the large gap that presently exists between plant and animal barcoding.

Such potential applications include but are not limited to:

1. Accurate identification of plant species, especially cryptic species that are difficult to differentiate, which could lead to the discovery of new species. Presently, as stated by Anna-Marie Lever in DNA 'barcode' revealed in plants, only a "few experts [can] accurately identify the plant composition of biodiverse hotspots."
2. Accurate identification of botanic components in foods and medicines. 
3. Detection of undesirable plant material in processed foods by health inspectors. 
4. Tracking of plant species (e.g. migration). 
5. Locating of endangered species for habitat preservation. 
6. Detection of illegal transport/trade of endangered species to protect them from potential harm. 
7. Confirmation or identification of plant-insect associations. 
8. Expansion and facilitation of botanical medical research.

However, before this can be achieved, the following steps must be taken:

1. Establishment of a genetic database that can be uploaded into a portable scanner so that data can be readily available based on the analysis of a mere leaf/tissue sample. To enhance identification of known species and speed up discovery of new species, such a database must be massive and available online. 
2. Establishment of a search method or algorithm to search and access DNA barcode information from an online database. 
3. Establishment of a set of reference standards (which includes barcoding based solely on bp extractions from matK and rbcL genes) utilizing existing plant DNA specimens held at botanical gardens, herbaria, museums, and other DNA repositories. For example, the Consortium for the Barcode of Life, based at the Smithsonian Institution's National Museum of Natural History in Washington, D.C., has identified over two million of the estimated ten million species of plants, animals, and fungi (many still unnamed), while the Royal Botanic Gardens, Kew, currently holds 23,000+ plant DNA samples. This is especially important since fresh collection efforts aimed at obtaining the DNA of every plant species are impractical: they would require significant effort and time.
4. Collection efforts conducted in accordance with international laws (e.g. Convention on Biological Diversity) to protect habitats and ensure specimen integrity. Per W. John Kress and David L. Erickson, DNA barcodes: Genes, genomics, and bioinformatics, such collection efforts are necessary since existing specimens are limited in quantity and may consist of degraded DNA. Currently, collection efforts are being made in temperate (Plummers Island, MD and New York City, among others) and tropical (Forest Dynamics Plot, Panama, among others) regions.
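The search method called for in step 2 could take many forms. As a minimal sketch, and not the method used by any existing barcode database, the Python below matches a query sequence against a small in-memory reference set by comparing shared k-mers (short overlapping subsequences) with Jaccard similarity. The species names, sequences, function names, and the 0.8 threshold are all invented for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(sequence: str, k: int = 4) -> Set[str]:
    """All overlapping substrings of length k in the sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(query: str, reference: Dict[str, str], k: int = 4,
             min_score: float = 0.8) -> Tuple[Optional[str], float]:
    """Best-matching reference name, or None if below min_score."""
    query_kmers = kmers(query, k)
    best_name, best_score = None, 0.0
    for name, ref_seq in reference.items():
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


# Made-up reference barcodes; the threshold is arbitrary.
REFERENCE = {
    "Hypothetical species A": "ACGTTGCAAGCTTAGGCTAACCGGATCA",
    "Hypothetical species B": "ACGTAGCAAGGTTAGCCTAACGGGTTCA",
}

name, score = identify("ACGTTGCAAGCTTAGGCTAACCGGATCA", REFERENCE)
print(name, score)
```

A query identical to a reference sequence scores 1.0 against it, while a query sharing no k-mers with any reference returns `None`. A real system would also need to handle sequencing errors, partial reads, and a reference set far too large to scan linearly.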

With a stamp of approval from the United Nations, which declared 2010 "the International Year of Biodiversity," vigorous global efforts aimed at barcoding 500,000 of the 1.7 million named species of plants, animals, and fungi by 2015 are being pursued by international teams of scientists as well as by groups/projects such as the Consortium for the Barcode of Life and, soon, the International Barcode of Life (iBOL) project, which is slated to launch in July 2010. Furthermore, Paul Hebert, the biologist who inspired the barcode movement and a major participant in the iBOL project, declared, per Scanning Life (National Geographic, May 2010), "the approach is scalable to the planet [so that by 2025 every] species humans encounter frequently will [have been] barcoded."

With regard to plants, the field of botanical phylogenetics and research will benefit greatly as technology is enhanced to exploit the genetic code of the matK and rbcL genes, through the establishment of a uniform database and the production of inexpensive portable scanners capable of analyzing leaf/tissue samples and matching their DNA barcodes with database information. Plant identification and classification will be available to more than a few experts, and will be more accurate than identification that relies solely on visual examination and physical morphology (especially with regard to cryptic species). In addition, endangered species will be easier to track and better protected, and people will have greater assurance about the food, drinks, and medicine they consume.

Additional References:

José A. Jurado-Rivera, Alfried P. Vogler, Chris A.M. Reid, Eduard Petitpierre, and Jesús Gómez-Zurita. DNA barcoding insect-host plant associations. The Royal Society. 17 October 2008.

Khidir W. Hilu and Hongping Liang. The MatK Gene: Sequence Variation And Application In Plant Systematics. American Journal of Botany 84(6). 1997.

Steve Newmaster, Aron Fazekas, Royce Steeves, and John Janovec. Testing plant barcoding regions in South American wild nutmeg trees. Botany 2008.

William Sutherland is a published poet and writer. He is the author of three books, "Poetry, Prayers & Haiku" (1999), "Russian Spring" (2003) and "Aaliyah Remembered: Her Life & The Person behind the Mystique" (2005) and has been published in poetry anthologies around the world. He has been featured in "Who's Who in New Poets" (1996), "The International Who's Who in Poetry" (2004), and is a member of the "International Poetry Hall of Fame." He is also a contributor to Wikipedia, the number one online encyclopedia and has had an article featured in "Genetic Disorders" Greenhaven Press (2009).

