Thursday, January 24, 2013

The seven deadly sins of DNA Barcoding (1)

Failure to test clear hypotheses


As promised, I will have a closer look at a recent publication listing deficiencies that its authors identified as common in DNA Barcoding research. Seven deadly sins were identified. I decided to give each 'sin' its own blog post in which I will try to briefly comment on it. And now, without further ado, the first sin.

According to the authors, Collins and Cruickshank, one of the gravest sins in many DNA Barcoding studies is the lack of clearly stated, objective hypotheses. I've heard similar criticism before, especially of barcoding-related research. In fact, some of these concerns were already raised when researchers began to sequence larger amounts of DNA, such as in the Human Genome Project. Objectively, the criticism has some merit. It is based on the classical hypothetical-deductive model, which states that scientific inquiry progresses by formulating a hypothesis that could be falsified by a test on observable data. Scientific hypotheses are generally based on previous observations that cannot be satisfactorily explained with the available theories. Unfortunately, many colleagues are not willing to accept that the massive accumulation of sequence information also qualifies as previous observation in this respect, and some (not the authors of the paper at hand) dismiss DNA Barcoding studies and theses as endeavors without a guiding hypothesis and their authors as "stamp-collectors".

I strongly recommend that the assembly of DNA libraries be considered a hypothesis-generating exercise. Today we are in the fortunate situation that biological observation is not limited to visual perception and experiments alone. Genetic information is relatively easy to retrieve, and having assembled large amounts of standardized data enables us to ask new questions. As a consequence, the barcode community has been approaching journals (PLoS ONE, Molecular Ecology Resources) that are willing to publish articles called data release papers. These are outlets for researchers who have assembled DNA Barcode libraries for their own work and for the community at large. In the classical hypothetical-deductive environment there would be no venue to present such data to the public, let alone to get credit for it, as the work was primarily a collection of data points generated in order to derive hypotheses from them. Data release papers usually come with only a limited number of analyses and are very descriptive. Nevertheless, this strategy has two main advantages. One is that the data are released much earlier and become available to the public through databases such as BOLD and GenBank (Sharing, sharing, sharing!). The other is that a researcher is credited for work that does not necessarily follow age-old standards (it's all about incentives!).

Collins and Cruickshank also state "If the data collected are intended to be used as an identification tool, then they should be tested as such. Conversely, if a study aims to test the suitability of DNA barcoding as a biodiversity assessment tool (species discovery), then hypotheses of species richness should be estimated independently of the taxonomic names, and then compared a posteriori." 

I believe the main criticism is that many DNA Barcoding studies do not explicitly test identification success. Usually a more descriptive analysis is given, showing that there is sufficient variation between species and very low variation within them. The famous "barcode gap" is what authors present, and not necessarily a statistical test based on simulations or on independent data. We could start heated discussions about whether such tests are really needed when a barcode gap is clearly present, especially in sufficiently sampled groups and/or environments, or whether they are demanded merely because only a hypothetical-deductive study counts as a good study.
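To make the distinction concrete, here is a minimal sketch in Python of the two kinds of analysis: a descriptive barcode gap (smallest between-species distance minus largest within-species distance) and a simple leave-one-out identification test. Everything in it is an assumption for illustration only: the function names are hypothetical, the sequences are made up, and uncorrected p-distance stands in for whatever distance model a real study would use. It is not the procedure proposed by Collins and Cruickshank.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def barcode_gap(records):
    """Return (max intraspecific, min interspecific, gap) for a library.

    records is a list of (species_name, aligned_sequence) tuples.
    A positive gap means the closest pair of different species is still
    further apart than the most divergent pair within any one species.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        (intra if sp1 == sp2 else inter).append(d)
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


def leave_one_out_success(records):
    """Fraction of sequences whose nearest other sequence is conspecific.

    Each sequence is treated in turn as an unknown query against the
    rest of the library. Ties are resolved by list order here, and a
    species with a single sequence can never be identified correctly;
    a real analysis would handle both cases explicitly.
    """
    correct = 0
    for i, (species, seq) in enumerate(records):
        others = [r for j, r in enumerate(records) if j != i]
        nearest_species, _ = min(others, key=lambda r: p_distance(seq, r[1]))
        correct += nearest_species == species
    return correct / len(records)


# Made-up toy library: two species, two sequences each (illustration only).
records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTGC"),
    ("species_B", "ACGAACCTGA"),
]

print(barcode_gap(records))
print(leave_one_out_success(records))
```

The first function only describes the library, while the second actually uses it as an identification tool, which is closer to what the quoted passage asks for. With well-sampled groups and a clear gap the two will usually tell the same story; they can diverge when sampling is sparse or species are represented by single specimens.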

As for the second part of the paragraph, I am a bit puzzled, as this would also call into question the more traditional, morphology-based approach to species discovery. For me there is no big difference between discovering a new species based on DNA differences and discovering one based on morphological variation. In both cases it is essential to show that these differences are characteristic of the respective species, backed up by other characteristics, and large enough to actually qualify as species-level differences rather than just resulting from a more pronounced population structure.

In summary, I do understand where the criticism comes from, but I do not necessarily agree with it on all points. When it comes to the hypothetical-deductive model, I would love to see some more flexibility, also for the sake of the many students who have to fight through a lot of skepticism when they decide to take on a DNA Barcoding project.

One thing is for sure: I don't think the failure to test clear hypotheses is a deadly sin in the context of DNA Barcoding and the paper at hand. It is certainly not a grave one.


  1. Glad the paper's generating some discussion!

    I certainly agree that not all studies need to be hypothetical-deductive, but it would be nice if this were clearly stated. Therefore, I think data release papers are a great idea. Perhaps in hindsight, mentioning and explaining this type of publication may have been a beneficial addition to the 'seven sins' paper ...

  2. For those interested in publishing BARCODE data release papers and the process of science I recommend the following links:

    Consortium for the Barcode of Life guidelines for authors:

    Berkeley interactive flowchart on how science works:

    This latter link is highly interactive and shows the fullness of the scientific enterprise beyond (and encompassing) the linear hypothetic-deductive model. Notably, barcoding relates to all aspects of science, from "testing ideas" to "exploration and discovery", "community analysis and feedback" and "benefits and outcomes". As Collins and Cruickshank suggest, barcode papers would do well to more clearly articulate their objectives, particularly in light of this expanded view of the scientific process.

  3. Nice point, well made. I am often surprised at the resistance that appears regarding 'non-traditional' approaches. If only the resistance could be reserved for the 'irrational non-traditional' approaches that occur from time to time.