I’ve written a blog post looking at a case where metagenomic data causes GBIF’s map for the parasitic plant Rafflesia (famous for its giant flowers) to show occurrences in the ocean: https://iphylo.blogspot.com/2019/12/gbif-metagenomics-and-metacrap.html
Turns out the identification is based on a short sequence for a picoplankton being matched to a flowering plant. I worry that these sorts of errors may be widespread, and that they are hard to track down (I had to get original sequence data and do a BLAST search).
I’m not arguing against including metagenomic data, quite the opposite. But this stuff has errors, and those errors may “leak” into unexpected places. GBIF already has enough issues with data quality, so perhaps we could think of ways to minimise the impact of spurious identifications in metagenomic data.
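
To make that a bit more concrete, here is a rough sketch of one possible sanity check: flag sequence-based records where the matched taxon's habitat conflicts with the environment the sample came from. This is only an illustration, not anything GBIF does. The `Occurrence` class and its fields (`taxon_habitat`, `sample_environment`, `from_sequence`) are hypothetical, and the two records are made up to mirror the Rafflesia case.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    taxon: str
    taxon_habitat: str       # hypothetical field: "terrestrial" or "marine"
    sample_environment: str  # hypothetical field: where the sample was collected
    from_sequence: bool      # True if the identification came from a sequence match


def is_suspect(occ: Occurrence) -> bool:
    """Flag sequence-based records whose taxon habitat conflicts with the sample environment."""
    return occ.from_sequence and occ.taxon_habitat != occ.sample_environment


# Made-up records for illustration only.
records = [
    Occurrence("Rafflesia", "terrestrial", "marine", True),
    Occurrence("Rafflesia", "terrestrial", "terrestrial", False),
]

suspects = [r for r in records if is_suspect(r)]
for r in suspects:
    print(f"Check identification: {r.taxon} reported from a {r.sample_environment} sample")
```

A check like this wouldn't fix the underlying misidentification, but it might at least surface records that deserve a second look before they end up on a map.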