Is GBIF's shared content aligned with its uses? Discuss

On the GigaScience blog for 2024-05-20, Scott Edmunds writes:

Problems with scientific data collection are hindering efforts to halt mass extinction and biodiversity loss. We require this data to produce accurate models, create policies based on these model[s] to address this loss, and also to determine and stop those who are responsible. The GBIF (Global Biodiversity Framework) platform has become the go-to home for this type of data, massively growing since its setup by OECD in 1999 to now hosting close to 3 billion records. Because it is scalable Citizen Science has become GBIF’s biggest source of data, particularly citizen science-derived data from the massively popular smartphone app-driven eBird and iNaturalist projects. These provide large volumes of data to GBIF but are also introducing biases in data types (birds are very much emphasized for example).

The bird bias is extraordinary. As of 2024-05-21, 1.95 B of the 2.57 B HumanObservation occurrence records shared with GBIF are for birds. That’s three out of four records. In the current GBIF “stock” of 2.94 B occurrence records, two out of three are class Aves (1.98 B).
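
For anyone who wants to check those proportions, here is a minimal sketch in Python. It uses only the record counts quoted above (in billions, as of 2024-05-21). The variable names and the `share` helper are illustrative assumptions, and nothing here queries GBIF itself.

```python
# Record counts quoted in the post, in billions (as of 2024-05-21).
human_obs_total = 2.57   # all HumanObservation occurrence records
human_obs_birds = 1.95   # HumanObservation records for birds
all_records = 2.94       # all GBIF occurrence records
all_birds = 1.98         # all records in class Aves


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole (hypothetical helper)."""
    return part / whole


bird_share_of_observations = share(human_obs_birds, human_obs_total)
bird_share_of_all = share(all_birds, all_records)
# Derived from the same figures: how much of the stock is HumanObservation.
observation_share_of_all = share(human_obs_total, all_records)

print(f"Birds among HumanObservation records: {bird_share_of_observations:.1%}")  # 75.9%
print(f"Birds among all occurrence records:   {bird_share_of_all:.1%}")           # 67.3%
print(f"HumanObservation among all records:   {observation_share_of_all:.1%}")    # 87.4%
```

The first two results match the "three out of four" and "two out of three" statements. The third is not stated in the text but follows from the same numbers.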

Another bias in GBIF is the enormous one towards citizen-science observations. As Edmunds notes, these are now by far the biggest component of GBIF occurrence data. The graph below shows the number of records added to the GBIF total from HumanObservation and PreservedSpecimen sources, dated in each of the recent 5-year periods, and before 1990.

Occurrence records dated before 1990 in GBIF are almost evenly split between observations and specimens in collections (museums and herbaria). In each subsequent 5-year period the dominance of new observations over new specimen records has been growing. The current ratio is approaching 400 to 1. The vast majority of the new observations come from citizen scientists rather than from biological specialists, although the overlap between these categories is significant.

Growing in importance, too, are DNA-derived occurrences (MaterialSample records), which like human observations are present-day snapshots of species distributions.

Given all this, it’s not clear how collections data shared with GBIF can remain relevant to the biodiversity conservation goals outlined by Edmunds.

Policy makers (and biologists) already know that many species’ ranges have been shrinking for centuries as the human population grows at the expense of the natural world. The ranges of invasive species have expanded for the same reason. Collections data can help document these changes if that’s necessary, but is it? For conservation purposes it’s more important to know what the current situation is, and targeted fieldwork, whether by citizen scientists (as in the jellyfish survey described by Edmunds) or expert biologists, is the most productive way to get up-to-date occurrence data.

Museums and herbaria will benefit if voucher specimens are collected as well, but I suspect that the occurrence records from such projects will become available for conservation purposes before the collections can update the datasets they share with GBIF.

I’m not arguing here that citizen science is pushing natural history collections into irrelevance. Collections will remain key resources for taxonomists, ecologists and taxon enthusiasts. These groups want as much data as they can find, past and present. Unlike conservation planners they’re (usually) willing to patiently query and correct the many gaps and errors in the collections data shared with GBIF. They will also (usually) be interested in data from collections that share only some of what they hold with GBIF, or that share nothing at all. The magnificent Biodiversity Heritage Library project will help them extract data from formerly hard-to-find literature.

Conservation specialists and planners, on the other hand, want occurrence records now. They get them from what Edmunds calls “the go-to home for this type of data”, GBIF. And what GBIF shares is overwhelmingly one kind of data: recent observations.

Agree? Disagree?

Robert Mesibov (“datafixer”)

