There is a recent post from GBIF (@gbif.org) on Bluesky about some metabarcoding datasets for Sweden and Madagascar.
I took a peek at the results from Madagascar and there are quite a few cases of European and/or Palearctic species being found in Madagascar (and nowhere else), where the Madagascar records are based solely on the metabarcoding results, e.g.
I’m guessing that something here has gone rather wrong, resulting in a bunch of bad taxonomic identifications. Presumably the metabarcoding sequences have been clustered with existing barcodes, and those are likely geographically biased.
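One way to spot this pattern without clicking through maps is to ask the GBIF occurrence search API for a per-country count of records for a taxon. The following is only a rough sketch: the helper names and the `taxon_key` argument are mine, and it assumes the search response carries a `facets` list with `field`, `name` and `count` entries, and that countries are reported as ISO codes such as `MG`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def country_counts(search_response: dict) -> dict:
    """Pull the per-country record counts out of a faceted search response."""
    counts = {}
    for facet in search_response.get("facets", []):
        # Assumption: the country facet is labelled "COUNTRY" in the response.
        if facet.get("field") == "COUNTRY":
            for entry in facet.get("counts", []):
                counts[entry["name"]] = entry["count"]
    return counts


def fetch_country_counts(taxon_key: int) -> dict:
    """Ask GBIF for occurrence counts per country for one taxon (no records returned)."""
    query = urlencode(
        {"taxonKey": taxon_key, "facet": "country", "facetLimit": 300, "limit": 0}
    )
    with urlopen(f"{SEARCH_URL}?{query}") as response:
        return country_counts(json.load(response))


def only_in(counts: dict, country: str = "MG") -> bool:
    """True if every record for the taxon comes from the given country."""
    return set(counts) == {country}
```

Running `only_in(fetch_country_counts(key))` over the taxon keys in the Madagascar dataset would flag species that GBIF knows only from Madagascar, which is suspicious for a species described from Europe. It would not catch species that also have genuine European records, so a fuller check would need to separate metabarcoding records from the rest.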
Apart from the obvious question (“why didn’t anyone publishing the data have a look at these patterns and wonder what was going on?”) I’d really like to drill down and investigate further. But it is not obvious to me how I go from a GBIF occurrence for metabarcoding, such as Occurrence Detail 5162473492, to the actual sequence. Is there a simple way to do this, or do I need to do some database forensics?
Obviously metabarcoding is a huge source of biodiversity data. But it is hard to have confidence in its results when we see distributions like this with no obvious way of being able to work out what happened.
Looks like the sequences for each occurrence are in the source data file https://www.gbif.se/ipt/archive.do?r=iba_co1_litter_2019_mg but NOT in the GBIF version, so it looks as if there has been an error in processing this file (the DNA sequences have not been imported).
Spoke too soon, the sequences are there, but the web interface doesn’t make them very obvious. There is a scrollable panel and the DNA is way off to the right. Hence, at a quick glance, it appeared that there wasn’t any DNA in the record.
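For anyone who would rather skip the scrollable panel, the sequence can also be read from the occurrence API. This is a minimal sketch, not a tested recipe: the helper names are mine, and it assumes the occurrence JSON has an `extensions` object whose rows include a field ending in `dna_sequence` (the name used by the DNA derived data extension). Matching on the suffix avoids depending on the exact term URI.

```python
import json
from urllib.request import urlopen

GBIF_OCCURRENCE_URL = "https://api.gbif.org/v1/occurrence/"


def find_sequences(occurrence: dict) -> list:
    """Collect every non-empty extension value whose field name ends in 'dna_sequence'."""
    sequences = []
    # Assumption: "extensions" maps an extension row type to a list of rows,
    # and each row is a flat dict of field name -> value.
    for rows in occurrence.get("extensions", {}).values():
        for row in rows:
            for field, value in row.items():
                if field.lower().endswith("dna_sequence") and value:
                    sequences.append(value)
    return sequences


def fetch_occurrence(key: int) -> dict:
    """Download a single occurrence record as JSON."""
    with urlopen(f"{GBIF_OCCURRENCE_URL}{key}") as response:
        return json.load(response)
```

Calling `find_sequences(fetch_occurrence(5162473492))` should then return the sequence(s) attached to that record, which can be pasted into BLAST or the BOLD identification engine to see what they actually match.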
Perhaps the ENA accession records can provide some clues?
“We present the raw sequencing data from the Insect Biome Atlas project (IBA). Over 12 months, weekly Malaise trap samples were collected at 203 locations in Sweden and at 50 locations in Madagascar. This was complemented with soil and litter samples from each site. The field samples comprise 4,749 Malaise trap, 192 soil and 192 litter samples from Sweden and 2,566 Malaise trap and 190 litter samples from Madagascar. Samples were processed using mild lysis or homogenization, followed by DNA metabarcoding of COI (418 bp). This data allows characterizing the terrestrial arthropod faunas of Sweden and Madagascar.”
I’ve done a bit more work (see “A metabarcoding mess and the importance of just looking at the data”) and it looks like the main problems are with one of the Madagascar datasets (the litter samples). I’m in touch with the authors of the paper and they are trying to figure out what happened. I guess the three obvious candidates are:
- lab contamination
- lack of reference sequences from Madagascar taxa in BOLD
- software error in the pipeline used to process the data
Hope to know more soon.