From the white paper:
Over the last two decades there have been enormous efforts to mobilize biodiversity data, which have resulted in the availability of massive amounts of published data that can be readily discovered, accessed and freely used for onward applications. The process of building meaningful EBVs that can inform indicators needs data that can be reliably tracked across not just organism, space and time but also provenance; the latter includes relevant, complete and searchable metadata about the inventory process and the methods that produced those data. Much of the data shared through biodiversity data platforms lack one or more of those four components, which limits or excludes their use in the creation of EBVs and biodiversity indicators. Furthermore, much of the data currently shared correspond to incidental records and lack any defined inventory or survey methods.
For EBVs that use multi-variable analyses to aggregate and homogenize data across species, space and time, a taxon name, an event date and a set of coordinates are not enough to account for bias or deficiencies in the available data. One way to help overcome these biases is to publish occurrence and event records with metadata that describe the collection methodology and processes as richly as possible. However, this type of correction will only be useful for certain types of analyses. Many species occurrence records that represent only the presence of a species (i.e. incidental records) will still not be useful for EBVs that require data enabling inference about the absence of species. For these EBVs, well-documented monitoring or inventory event data are needed.
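The four components discussed above (organism, space, time and provenance) can be illustrated with a simple completeness check. The following is a minimal sketch, not a method prescribed by the white paper: the field names are Darwin Core terms, but the mapping of terms to components, the choice of samplingProtocol as a stand-in for provenance, and the function name missing_components are assumptions made for illustration only.

```python
# Illustrative mapping of the four components to Darwin Core terms.
# The term names are real Darwin Core terms; the grouping is an assumption.
REQUIRED_TERMS = {
    "organism": ["scientificName"],
    "time": ["eventDate"],
    "space": ["decimalLatitude", "decimalLongitude"],
    "provenance": ["samplingProtocol"],  # assumed proxy for survey methods
}


def missing_components(record):
    """Return the components for which the record lacks at least one term."""
    missing = []
    for component, terms in REQUIRED_TERMS.items():
        # A term counts as absent if the key is missing or the value is empty.
        if any(record.get(term) in (None, "") for term in terms):
            missing.append(component)
    return missing


# Hypothetical records: one incidental observation, one survey event record.
incidental = {
    "scientificName": "Panthera onca",
    "eventDate": "2020-05-14",
    "decimalLatitude": -3.1,
    "decimalLongitude": -60.0,
}
survey = dict(incidental, samplingProtocol="camera trap")
```

Under these assumptions, the incidental record would be flagged as lacking provenance, while the survey record would pass. A real assessment would need to consider far richer metadata, such as sampling effort and scope.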
As more monitoring data become available, expanded best practice guidelines should include, but need not be limited to, how to share quality metadata containing details of the sampling methods employed, the scope, and descriptions and provenance of the collected data. To make this practical, biodiversity data platforms will need to review and amend current data-sharing standards and practices, and upgrade their infrastructures to host and display new types of data and data formats, as in GBIF's ongoing consultation to review its current data model. One new standard under review for implementation is the Humboldt extension to Darwin Core (Guralnick et al. 2017, Sica & Zermoglio 2021). Furthermore, data publishing institutions could be encouraged to create “sub-collections” of their data that meet these metadata requirements and to publish them separately from their larger corpus of data. An increased focus on publishing past and current monitoring and inventory datasets with the express purpose of supporting EBV and biodiversity indicator creation would require strengthened ties with the research and monitoring communities that produce those data.