Publishers share datasets, but also manage data quality. GBIF provides access to biodiversity data, but also flags suspicious or missing content. Users use data, but also clean and remove records. Each plays an important role in managing and improving data quality.
Thanks for this very useful post on GBIF issues and flags. This is a really cool feature of the GBIF.org portal, and one that data publishers hardly use. Would it be possible to include the GRSciColl issues (collection/institution match none)?
There is no way to download aggregated views (metrics) of issues and flags. You would have to create a regular download and compute the counts yourself. You can also look at the web portal for simple counts.
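Computing the counts yourself could look something like the Python sketch below. It assumes a tab-delimited download with an `issue` column that holds flags separated by semicolons; check the header of your own file, since the column name and separator are assumptions here. The function name `count_issues` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_issues(lines, column="issue", sep=";"):
    """Count how often each issue flag occurs in a tab-delimited download.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The column name and separator are assumptions about the download format.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        raw = row.get(column) or ""  # records without issues have an empty cell
        counts.update(flag.strip() for flag in raw.split(sep) if flag.strip())
    return counts


# Tiny made-up sample standing in for a real download file.
sample = io.StringIO(
    "gbifID\tissue\n"
    "1\tCOORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84\n"
    "2\tCOORDINATE_ROUNDED\n"
    "3\t\n"
)
issue_counts = count_issues(sample)
for flag, n in issue_counts.most_common():
    print(flag, n)
```

For a real file you would pass an open handle instead of the sample, for example `open("occurrence.txt", newline="", encoding="utf-8")`, where the file name is only a placeholder.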
Thanks for this! Super useful. I would be keen to figure out an easy way to exclude collections from zoos and botanic gardens, since that seems to be a common origin of geographic outliers. There is another post on that here: Understanding basis of record - a living specimen becomes a preserved specimen - GBIF Data Blog but it’s a reasonably common source of error that’s hard to avoid at the moment.
There is no easy way to exclude zoos and botanical gardens entirely. You can get close with a few filters.
I would have a look at the R function CoordinateCleaner::cc_inst().
You can also filter on the establishmentMeans column, removing records with “MANAGED”. Keep in mind that this column is often left empty, so it is probably good to keep records with empty values as well.
This will not remove all of the zoo and botanical garden records, but it will get you very close. You could also try to do some outlier analysis.
There is also basisOfRecord = LIVING_SPECIMEN, which you probably already know about from reading the previous article.
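As a rough sketch of the two column filters above in Python, assuming each record has already been read into a dict keyed by column name (for example with `csv.DictReader`). The helper name `keep_record` and the sample rows are made up, and the column names should be checked against your own download.

```python
def keep_record(row):
    """Return True unless the record is flagged as managed or as a living specimen.

    Records with an empty establishmentMeans are kept, since that column
    is often left blank.
    """
    establishment = (row.get("establishmentMeans") or "").strip().upper()
    basis = (row.get("basisOfRecord") or "").strip().upper()
    return establishment != "MANAGED" and basis != "LIVING_SPECIMEN"


# Made-up records for illustration only.
records = [
    {"gbifID": "1", "establishmentMeans": "MANAGED", "basisOfRecord": "PRESERVED_SPECIMEN"},
    {"gbifID": "2", "establishmentMeans": "", "basisOfRecord": "LIVING_SPECIMEN"},
    {"gbifID": "3", "establishmentMeans": "", "basisOfRecord": "HUMAN_OBSERVATION"},
    {"gbifID": "4", "establishmentMeans": "NATIVE", "basisOfRecord": "PRESERVED_SPECIMEN"},
]
kept = [r for r in records if keep_record(r)]
kept_ids = [r["gbifID"] for r in kept]
print(kept_ids)
```

This only covers the establishmentMeans and basisOfRecord filters. It does not replace an institution-based check such as CoordinateCleaner::cc_inst() or an outlier analysis.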
This blog post might give you more filtering ideas: