Tuesday, April 15, 2014


Regular readers of my blog might have noticed that I have a weak spot for the DIY movement, especially in biology and biotechnology. I try to follow the news in this sector and have secured it a regular spot in the DNA Barcoding Bulletin that we produce quarterly. That being said, I was surprised that I hadn't heard about the BioCoder newsletter published by O'Reilly.

O'Reilly is an American media company that mostly publishes books on computer technology topics. Their distinctive brand features a woodcut of an animal on many of their book covers. If you don't know what I am talking about, I suggest you type "O'Reilly covers" into a Google Image search. The animal illustrations are quite beautiful. I have a few of their books on my shelf, and my favorite is a book on BLAST showing a coelacanth on the cover (see image below).

So what is BioCoder? Here is what their website has to say about that:
We’re at the start of a revolution that will transform our lives as radically as the computer revolution of the 70s. The biological revolution will touch every aspect of our lives: food and health, certainly, but also art, recreation, law, business, and much more.

BioCoder is the newsletter of that revolution. It’s about biology as it moves from research labs into startup incubators, hacker spaces, and even homes. It’s about a very old programming language that we’re just beginning to understand, and that’s written in a code made up of organic chemicals. It’s the product of a sharing community of scientists that stretches from grade school to post docs and university faculty.

The new spring issue contains an article about DNA Barcoding of fungi coming from a DIY lab in Victoria here in Canada. The choice of organism clearly tells me that DIY people are up for challenges and not necessarily aiming for the low-hanging fruit. The article is pretty interesting from a technical standpoint as well, and a part II will follow in the next issue.

Great new resource. Well, not that new, as this was already their third issue. So, it is new to me but surely not news to the DIY biohack community.

Monday, April 14, 2014

When pharmaceuticals become too effective

Sepsid fly
The veterinary pharmaceutical ivermectin has been used for more than thirty years all over the world to combat parasites like roundworms, lice and mites in livestock and pets. The active ingredient belongs to the chemical group of avermectins, which generally disrupt cell transport. However, when ivermectin is used at high dosages, excess quantities are excreted in the faeces of treated animals, which also harms dung-degrading beneficial insects like dung beetles and dung flies. This has a profound impact on the functioning of surrounding ecosystems. In extreme cases the dung is not decomposed and the pasture is destroyed.

Since 2000, public regulators in many countries have therefore mandated standardized safety tests for the use of avermectin derivatives. A research team of scientists from the University of Zurich and an ecotoxicology company in Germany has now shown that the currently used safety tests are not able to sufficiently prevent environmental damage. Even closely related dung organisms react with varying degrees of sensitivity to the same veterinary pharmaceutical.

The group examined 23 species of sepsid flies that typically live in cow dung. It turns out that individual species vary by a factor of 500 in their sensitivity to ivermectin. Standardized safety tests typically performed in toxicology laboratories today are based on single, arbitrarily selected dung organisms. This poses the considerable risk that more sensitive species will continue to be harmed by ivermectin and that important ecosystem functions will suffer long-term damage as a consequence. In order to prevent this, safety tests should be extended to include a representative selection of all dung-degrading organisms, if not the entire community:

We close by reiterating that sepsid flies are very well suited as test organisms for any toxic residues in the dung of livestock or other large vertebrates, due to their ease and speed of rearing and handling. While the choice of a particular species will be crucial because species vary strongly in sensitivity, use of several local species can offset the arbitrariness of choice to some degree, rendering overall representative results. Sepsids as ecotoxicological test organisms could be particularly useful and economical in the tropics, where high-tech laboratory equipment is often not available.

Including more species in the tests would increase costs for the authorization process, especially because all relevant organisms would need to be properly identified. For that reason the authors suggest including DNA Barcoding in the test protocol, as its inclusion would represent a rather modest increase in costs.

Friday, April 11, 2014

A different take on Escargot

Gastropod shells and bodies extracted after microwaving
And today for something completely different. Let's start with a description of the problem:
Extracting DNA from gastropods presents particular difficulties due to the capacity of the living animal to retract into the shell, resulting in poor penetration of the ethanol into the tissues. Because the shell is essential to establish the link between sequences and traditional taxonomic identity, cracking the shell to facilitate fixation is not ideal. 

This sounds very familiar to me. While working on my master's project I had to remove tissue from the coiled shells of a number of terrestrial gastropods, and some of those specimens were quite small and delicate. Most of the time I was working with a dissecting probe whose tip I had bent to be able to reach the fully retracted animal. This was a very tedious and not always successful method to retrieve a tiny tissue sample for DNA analysis. Over the years a variety of methods to retrieve tissue without damaging the shell have been developed, but for the most part they suffer from the same problem: because they all take a fair bit of time, they are not useful for large-scale surveys or expeditions.

In a new paper a group of French researchers present an alternative method for the easy, efficient and nondestructive tissue removal from shells. It involves the use of a regular microwave oven. The use of microwaves in molecular biology is actually not unknown and has been applied in the extraction of DNA from viruses, bacteria, soil micro-organisms, and animal tissue. The colleagues placed the living gastropods in a microwave oven in which the electromagnetic radiation very quickly heats both the animal and the water trapped inside the shell, which results in the separation of the muscles that anchor the animal to the shell. If done properly, the body can be removed intact from the shell and the shell voucher is undamaged as well. The authors conducted comparative tests to find out if microwaving the snail tissue will have any effect on DNA extraction or subsequent PCRs. They couldn't find any difference in DNA quantity or quality.

The method was then implemented on a large scale during expeditions, resulting in a higher percentage of successful DNA extractions. The microwaves are also effective for quickly and easily removing other molluscs, that is, bivalves and scaphopods, from their shells. Workflows implementing the microwave technique show a three- to fivefold increase in productivity compared with other methods.

That seems to be worth the effort. I wish we had thought of that 12 years ago.

Thursday, April 10, 2014

...and another record

Paul Hebert documenting the exciting find
Today's post shows how sampling campaigns launched for barcoding programs can deliver unexpected surprises.

Back in 2006 our institute decided to engage in the International Polar Year, a large scientific program that focused on the Arctic and the Antarctic and officially covered two full annual cycles from March 2007 to March 2009. Our contribution was to develop a comprehensive biodiversity inventory for a sub-arctic region, in our case Churchill, Manitoba. There is a variety of reasons to choose this spot of all places in Canada. Churchill is situated along the Hudson Bay seacoast at the meeting of three major biomes: marine, northern boreal forest, and tundra, which makes it biologically very interesting. Furthermore, Churchill is home to an accessible and active research centre/station which provides accommodations, meals, equipment rentals, and logistical support to researchers. A lot of stations in Canada's North have been closed over the past years, and only recently was it decided to build a new one in the High Arctic. Another contributing factor was that our department ran arctic ecology courses at this particular station, enabling us to engage students in the inventory work. The goal was rather simple: a comprehensive inventory of all life in the Churchill region, explicitly using DNA Barcoding to accomplish it.

I had the chance to participate in two expeditions, and I vividly remember the first one in 2006. We had about 20 highly motivated students and at least 10 senior researchers. One day we encountered a moth that was actually a rather rare visitor to the region, and a recent paper now proves that it was the most northerly find ever. The Black Witch Moth (Ascalapha odorata) is a seasonal migrant to more northerly regions of North America, but it had never been found that far north. It is thought to breed in Central America and the southernmost United States.

So after the German altitude record not long ago we have another record for the books.

Wednesday, April 9, 2014

Barcodes to validate Mitogenomes

mtDNA (image 'stolen' here)
Today I found an article published in Mitochondrial DNA which tackles a problem that I have encountered myself in a couple of situations. Unfortunately, the article is hiding behind a paywall, even for me at a university with rather good library access. This is particularly frustrating given the rather important message and recommendations provided in the publication.

The researchers deal with the issue arising from the misidentification of biological samples used for generating entire mitogenomes. As a consequence, mitogenomes are attributed to incorrect species. This can have even more profound implications if the misidentified sequence ends up as the reference mitochondrial genome for the species in public curated databases such as the RefSeq section of GenBank. Large genomic databases are often used for annotation of unknown genes. As a result, errors propagate quickly and a wrong species ID will spread across the entire database. That is not a fault of the people who designed and operate the databases but rather of the data submitters, who often fail to do their due diligence. Other problems that could reduce the quality of mitogenomic data are the potential occurrence of NUMTs or contamination, which are also known issues in DNA Barcoding research.

The example the colleagues used relates to a recently published sequence of the complete mitochondrial genome of a bat called Leschenault's rousette (Rousettus leschenaultii), allegedly providing the second mitogenome for this genus of pteropodid bats in addition to the available Egyptian fruit bat (Rousettus aegyptiacus). By re-analyzing the mitogenome in comparison with available mitochondrial sequences, the authors were able to show that this sequence does not belong to Rousettus leschenaultii and that it is most probably a second mitochondrial genome for Rousettus aegyptiacus.

I can relate as I share this experience. When I started working on an analysis of mitochondrial fish genes, the first step was to download all available mitogenomes, or rather all of their coding genes. As my analysis also included a comparison of divergence values for different lengths of COI, I started with identification runs on BOLD for each sequence, and sure enough I found two mitogenomes that were not identified correctly. The identifications were actually way off and could not be explained by any variation in the gene region. I knew the sequences on BOLD were properly identified by experts, and when I looked at the original publication that used the mitogenomic data I couldn't find any information on the location the fish were collected, let alone any voucher information. In the end I had to leave the sequences out of any further analysis.

The problem is not new and presumably well known in the scientific community, although it might have been underestimated. What makes this paper unique, though, is a set of recommendations provided to help with quality control. The short list that follows should be hanging over the desk of every genomics researcher:

(1) Provide detailed information on the origin of the sample used for mitogenomic sequencing. 
Ideally the sample should be attached to a specimen voucher deposited in a recognized museum and accessible through multi-institution, multi-collection databases.

(2) Conduct a phylogenetic analysis of the new mitogenome in the context of closely related species.
We therefore suggest using the 20 phylogenetically closest taxa that should allow for a clear depiction of both the evolutionary affinities of the new mitogenome and the degree of divergence as compared to its closest relatives.

(3) Provide a barcoding identification assessment of the sample thanks to a ML tree based on the closest available sequences.
..the strength of these databases [BOLD and Genbank] relies on the detection of misidentified sequences provided that sequences are available for the same marker for different individuals and populations of a given taxon.
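Recommendations (2) and (3) both hinge on the same first step: picking the closest available relatives of the new mitogenome as context. A minimal sketch of that selection step (the function name and data layout are my own, not from the paper):

```python
def closest_taxa(dists, taxa, k=20):
    """Return the k taxa with the smallest pairwise distance to the new
    mitogenome; these would serve as context for the phylogenetic and
    barcoding checks. dists[i] is the distance of taxa[i] to the query."""
    ranked = sorted(zip(dists, taxa))  # sort by distance, closest first
    return [name for _, name in ranked[:k]]

# Toy example with the bats from the post (distances invented for illustration):
context = closest_taxa([0.31, 0.08, 0.12],
                       ["Pteropus", "R. aegyptiacus", "R. leschenaultii"],
                       k=2)
# the two congeners would be selected for the comparison tree
```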

Tuesday, April 8, 2014

New tools review

Today, a post about new developments from the world of DNA barcoding informatics. I selected three publications from the last few months that provide new packages worth testing by the community. Without further ado, my little collection of new bioinformatics releases.

This idea starts with the notion that a modern DNA barcoding approach should incorporate the multispecies coalescent. The multispecies coalescent model was developed as a framework to infer species phylogenies from multilocus genetic data collected from multiple individuals. It assumes that speciation occurs at a specific point in time, after which two new species evolve in total isolation. However, in reality speciation may occur over an extended period of time, during which the two sister lineages likely remain in some sort of contact. Inferring phylogenies for multiple species under those conditions is actually very difficult and requires a fair amount of computation time. Using the approach with DNA Barcode data is a little simpler, as one element of complexity has been removed by using only one gene region, or, as in this publication, just two. The authors make the following bold statement: recent developments make a barcoding approach that utilizes a single locus outdated. I beg to differ, as I don't see the point of sequencing a plethora of loci when one does the trick already, but that is a topic for another post sometime. The nice thing about this approach is the fact that it utilizes already existing software and algorithms. Everyone familiar with these programs should find it easy to follow the recipe, which uses BPP (Yang and Rannala 2010) and *BEAST to produce a guide tree for the subsequent BPP analysis.

This coalescent-based *BEAST/BPP approach was used to identify species boundaries. The colleagues used a test set of Sarcophaga species to compare a distance-based approach with their new method: We found that, of the 31 species of Sarcophaga examined..., 27 could be reliably distinguished by barcoding when a 4% sequence divergence threshold was applied. The four problematic taxa were S. megafilosia, S. meiofilosia, S. crassipalpis and S. ruficornis. S. megafilosia and S. meiofilosia had an interspecific divergence of 2.81%, while S. crassipalpis and S. ruficornis had an interspecific divergence of 3.75%. The success rate of barcoding for this set of taxa is thus 87%, while the *BEAST/BPP approach had a success rate of 100%.

The only question I have is why divergence values of 2.81% and 3.75% were considered problematic in the first place. That can only happen if a fixed value is used to define species boundaries. Who does that?
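To make the point concrete, here is the fixed-threshold logic in sketch form (a toy illustration of the criterion, not code from the paper): any species pair whose divergence falls below the cutoff is declared indistinguishable, no matter how consistent the separation actually is.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of differing sites between two
    aligned sequences of equal length."""
    assert len(seq_a) == len(seq_b)
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)

def distinguishable(divergence, threshold=0.04):
    """Fixed-threshold rule: a species pair counts as 'barcodable' only if
    its interspecific divergence exceeds the threshold."""
    return divergence > threshold

# The two Sarcophaga pairs from the study fall below the 4% cutoff:
print(distinguishable(0.0281))  # S. megafilosia vs. S. meiofilosia -> False
print(distinguishable(0.0375))  # S. crassipalpis vs. S. ruficornis -> False
```

With a threshold tuned to the data, or with no fixed threshold at all, both pairs would have been unproblematic.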

ExcaliBAR is a small routine to facilitate one important initial step in DNA Barcoding analyses, namely the determination of the barcoding gap between pairwise genetic distances among and within species, based on original distance matrices computed by MEGA software. In addition, the software is able to rename sequences downloaded via the standard user interfaces of public databases such as GenBank, without the need of developing and applying specific scripts for this purpose.
This is an interesting little tool, although I have to admit that aside from the very useful renaming of sequences, which makes the resulting file compatible with other software, I don't see the full advantage. From what I understood reading the paper, the routine takes a MEGA output containing a pairwise distance matrix. ExcaliBAR then calculates intra- or interspecific pairwise distances that can be exported, e.g. into Excel, to determine a threshold above which sequences are likely to represent different species. The authors claim that the program actually performs better than other software such as ABGD or SpideR. They even have the guts to state that, similar to those programs, the 'Barcode Gap Analysis' option on BOLD was not devised to handle large datasets. I beg to differ, as BOLD is probably still the best option available to deal with large datasets, even though it has been criticized for using distance-based methods to accomplish this. ExcaliBAR still needs about 30 min to process a matrix generated from 5000 DNA Barcodes. Not bad, but are they really better than the others?
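As I understand it, the core computation is simple enough to sketch in a few lines (my own toy version, assuming a symmetric distance matrix such as the one MEGA exports, with a species label per sequence):

```python
from itertools import combinations

def barcoding_gap(labels, dist):
    """Split pairwise distances into intra- and interspecific comparisons
    and report the 'barcoding gap': the difference between the smallest
    interspecific and the largest intraspecific distance."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    return max(intra), min(inter), min(inter) - max(intra)

# Toy matrix for four sequences from two species:
labels = ["sp1", "sp1", "sp2", "sp2"]
dist = [[0.00, 0.01, 0.08, 0.09],
        [0.01, 0.00, 0.07, 0.10],
        [0.08, 0.07, 0.00, 0.02],
        [0.09, 0.10, 0.02, 0.00]]
max_intra, min_inter, gap = barcoding_gap(labels, dist)
# max_intra = 0.02, min_inter = 0.07, so a positive gap exists
```

A positive gap suggests a usable threshold between the two values; a negative one means intra- and interspecific distances overlap and no single cutoff will work.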

One criticism provided in the ExcaliBAR publication was about the fact that some programs use R. R is a free programming language and software environment for statistical computing and graphics, and yes, there is a bit of a learning curve involved in mastering it.

Adhoc is a new method to deal with incomplete reference libraries of DNA barcodes. It is based on ad hoc distance thresholds that are calculated for each library considering the estimated probability of relative identification errors. By using each sequence of a reference library as a query against all other reference sequences, the program can calculate the relative identification error (RE) of the best close match method. Prior to that, Adhoc generates some basic descriptive statistics of the imported dataset, providing two tables containing species names, full sequence identifiers, and numbers of sequences and haplotypes for each species. It also returns the length of each reference sequence, calculates all pairwise distances and separates intra- and interspecific pairwise comparisons. In their publication the authors also provide a very important disclaimer:
This method has been developed for specimen identification. It is intended to optimise the identification success rate by adapting the distance threshold according to a RE estimated from a particular reference library. Hence, using this method for species delimitation requires a careful interpretation of the output.

I think that disclaimer should be found on every bioinformatic tool for DNA Barcoding.
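For the curious, the 'best close match' criterion and the relative identification error can be sketched roughly like this (function names and data layout are mine; the real program works on sequence files, not precomputed distance lists):

```python
def best_close_match(true_species, ref_dists, ref_species, threshold):
    """Leave-one-out check: the query gets the species of its closest
    reference, but only if that reference lies within the threshold.
    Returns 'correct', 'ambiguous', 'incorrect', or 'no id'."""
    within = [(d, s) for d, s in zip(ref_dists, ref_species) if d <= threshold]
    if not within:
        return "no id"
    best_dist = min(d for d, _ in within)
    best_species = {s for d, s in within if d == best_dist}
    if best_species == {true_species}:
        return "correct"
    return "ambiguous" if true_species in best_species else "incorrect"

def relative_error(results):
    """Relative identification error: share of misleading identifications
    among all queries that received an identification at all."""
    identified = [r for r in results if r != "no id"]
    bad = sum(r in ("incorrect", "ambiguous") for r in identified)
    return bad / len(identified) if identified else 0.0
```

As I read the paper, the ad hoc threshold is then chosen per library so that the estimated RE stays acceptably low, rather than imposing one universal cutoff on every dataset.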

Friday, April 4, 2014

Cover cropping

Cover crops are typically defined as crops used to protect agricultural soils and to improve soil productivity. Historically, farmers have relied on green manure crops to add nutrients and organic matter to their soil. Typically, green manure crops are grown for a specific period of time, and then plowed under and incorporated into the soil while green or shortly after flowering. Cover crops have also been used to protect the soil from wind and water erosion, to interrupt disease cycles and suppress weeds, and sometimes as supplemental feed for livestock or to provide an additional food source for pollinators and other beneficial insects.

This traditional form of plant diversification may also promote natural regulation of agricultural pests by supporting alternative prey that in turn enable the increase of generalist arthropod predator densities and diversities. The larger the densities of these predators, the higher the consumption of herbivore pests - provided that the pest remains the favorite prey. However, predator diet composition changes induced by cover cropping are poorly understood.

A group of French researchers used a metabarcoding approach to assess the diet of eight ground-dwelling predators commonly found in banana plantations in Martinique. They sequenced a shortened fragment of COI from the gut contents of the predators to identify their prey and to find the predators of the major banana pest, Cosmopolites sordidus. The researchers were particularly interested in differences in the composition of predator diets between a bare-soil plot and a cover-cropped plot of the banana plantation, as the cover crop Brachiaria decumbens is increasingly used to control weeds and improve physical soil properties. They were able to demonstrate that the use of a cover crop in banana plantations altered the arthropod food web, with significant changes in the frequency of consumption of some of the prey. An increase in alternative prey in the diet of the predators induces a diet shift that seems to dampen the positive effects of cover crops on pest regulation: the predators actually increase their consumption of non-pests without increasing their consumption of pests.

The study closes with a general assessment of the use of metabarcoding for research on trophic interactions:
In conclusion, it is essential to disentangle trophic interactions in order to achieve a better understanding of ecosystem resilience and persistence following disturbances, such as plant diversification. DNA metabarcoding allows direct inference of trophic interactions and enables the assessment of arthropod diet. Although the method has limitations, including the inability to discriminate between direct predation, secondary predation, and scavenging, it has the potential to be very useful for describing arthropod food webs. Here, we identified new and unexpected trophic interactions in the predator–prey system in banana plantations. The accurate determination of trophic networks will challenge current models of trophic interactions and will contribute to food web theory and ecosystem management. In addition to its application to individual food webs, DNA metabarcoding could be used to link different food webs, such as those that describe micro-organisms, plants, arthropods, and larger animals.