Introducing Rule-based Annotations - GBIF Data Blog

datafixer · February 10, 2026, 6:26pm

“Rule-based annotations is an experimental tool that will allow users to mark certain occurrence data as suspicious. The main goal of the project is to facilitate data cleaning and user feedback to publishers.”

Nicely built and implemented, but the two goals are mistaken.

First, the “cleaning” shown in the blog post is not cleaning; it’s filtering (see “Filtering isn’t cleaning”). According to the post, the clean_download function “returns cleaned data with suspicious records removed or flagged”.

Removing suspicious records does nothing to clean them, and flagging them does not guarantee that anyone using the results will act on the flags.
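To make the distinction concrete, here is a minimal Python sketch. It assumes occurrence records are plain dictionaries and uses a hypothetical `is_suspicious` rule (coordinates of exactly 0,0); neither the records nor the rule come from GBIF’s clean_download or its actual annotation rules. Filtering drops the record and flagging adds a marker, but in both cases the suspicious value itself is left untouched.

```python
from copy import deepcopy

# Hypothetical toy records: field names follow Darwin Core terms,
# but the values are invented for illustration.
records = [
    {"occurrenceID": "a1", "decimalLatitude": -33.9, "decimalLongitude": 151.2},
    {"occurrenceID": "a2", "decimalLatitude": 0.0, "decimalLongitude": 0.0},
]


def is_suspicious(rec):
    """Hypothetical rule: treat records sitting exactly on 0,0 as suspicious."""
    return rec["decimalLatitude"] == 0.0 and rec["decimalLongitude"] == 0.0


def filter_records(recs):
    """Filtering: suspicious records are dropped; the errors remain at the source."""
    return [r for r in recs if not is_suspicious(r)]


def flag_records(recs):
    """Flagging: records are copied and marked, but their values are not changed."""
    flagged = deepcopy(recs)
    for r in flagged:
        r["suspicious"] = is_suspicious(r)
    return flagged


filtered = filter_records(records)
flagged = flag_records(records)

# Neither operation repairs the bad coordinates of record a2.
print([r["occurrenceID"] for r in filtered])  # ['a1']
bad = next(r for r in flagged if r["occurrenceID"] == "a2")
print(bad["decimalLatitude"], bad["decimalLongitude"], bad["suspicious"])  # 0.0 0.0 True
```

Cleaning record a2 would mean replacing 0.0, 0.0 with its correct coordinates, and neither function has the information needed to do that.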

The second mistaken goal is to give end users another way to contact data publishers, in the hope that publishers will investigate suspicious records and correct them where necessary.

I wrote “another” because GBIF has already been flagging issues for many years, and the great majority of data publishers take no action whatsoever on those flags. This is easy to demonstrate by tracking individual issues through successive versions of the datasets shared with GBIF.
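A minimal Python sketch of that kind of tracking, assuming you have already extracted, for two successive versions of a dataset, a mapping from each record’s occurrenceID to the set of GBIF issue flags on it (for example, from two occurrence downloads). The `persistent_issues` and `persistence_rate` helpers and the toy input are hypothetical; the flag names are GBIF issue names used only as labels, not real results.

```python
from typing import Dict, Set

# Each dataset version maps occurrenceID -> set of GBIF issue flags on that record.
IssueMap = Dict[str, Set[str]]


def persistent_issues(old: IssueMap, new: IssueMap) -> IssueMap:
    """For records present in both versions, return the flags still raised in the newer one."""
    persisting: IssueMap = {}
    for occ_id, old_flags in old.items():
        if occ_id not in new:
            continue  # record dropped from the dataset; we cannot tell whether it was fixed
        still_flagged = old_flags & new[occ_id]
        if still_flagged:
            persisting[occ_id] = still_flagged
    return persisting


def persistence_rate(old: IssueMap, new: IssueMap) -> float:
    """Fraction of (record, flag) pairs in the older version that are still present in the newer one."""
    total = sum(len(flags) for occ_id, flags in old.items() if occ_id in new)
    if total == 0:
        return 0.0
    kept = sum(len(flags) for flags in persistent_issues(old, new).values())
    return kept / total


# Toy input for illustration only; not real GBIF data.
v1 = {
    "occ-1": {"COUNTRY_COORDINATE_MISMATCH"},
    "occ-2": {"RECORDED_DATE_INVALID", "COORDINATE_ROUNDED"},
    "occ-3": {"ZERO_COORDINATE"},
}
v2 = {
    "occ-1": {"COUNTRY_COORDINATE_MISMATCH"},
    "occ-2": {"COORDINATE_ROUNDED"},
    "occ-3": set(),
}

print(persistent_issues(v1, v2))
print(persistence_rate(v1, v2))  # 0.5
```

A rate that stays high across many successive versions is the pattern described above. A flag that disappears does not by itself prove the publisher fixed the record, since GBIF’s processing may also have changed between versions.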

GBIF annotations are primarily for data consumers, not publishers, and their main use is filtering data, not cleaning it. I suggest editing the post’s summary to read:

“Rule-based annotations is an experimental tool that will allow users to mark certain occurrence data as suspicious. The main goal of the project is to expand data filtering capabilities.”

