GBIF Issues & Flags - GBIF Data Blog

Publishers share datasets, but also manage data quality. GBIF enables access to and use of biodiversity data, but also flags suspicious or missing content. Users use data, but also clean and remove records. Each plays an important role in managing and improving data quality.


This is a companion discussion topic for the original entry at https://data-blog.gbif.org/post/issues-and-flags/

Thanks for this very useful post on GBIF issues and flags. This is a really cool feature of the GBIF.org portal, and one that data publishers hardly use. Would it be possible to include the GRSciColl issues (Collection/institution match none)?

Thanks @andre, we added the GRSciColl-related flags to the post. I hope it helps!

Hi, first of all thanks for the post.

I’ve got a question: is there a way to download metrics from GBIF, especially the ones related to issues and flags?

There is no way to download aggregated views (metrics) of issues and flags. You would have to create a regular download and compute the counts yourself. You can also look at the web portal for simple counts.
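
Computing the counts yourself could look roughly like the sketch below. It assumes a tab-separated download in which each record has an `issue` column holding its flags separated by `;`. Check that against the header of your own file. The `count_issues` helper and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_issues(tsv_file):
    """Count how often each issue flag appears in a tab-separated occurrence file."""
    counts = Counter()
    # Assumption: the file is unquoted TSV, so quoting is switched off.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Assumption: flags sit in an "issue" column, separated by ";".
        for flag in (row.get("issue") or "").split(";"):
            flag = flag.strip()
            if flag:
                counts[flag] += 1
    return counts


# Tiny made-up sample standing in for a real download file.
sample = io.StringIO(
    "gbifID\tissue\n"
    "1\tCOORDINATE_ROUNDED;GEODETIC_DATUM_ASSUMED_WGS84\n"
    "2\tCOORDINATE_ROUNDED\n"
    "3\t\n"
)
issue_counts = count_issues(sample)
for flag, n in issue_counts.most_common():
    print(flag, n)
```

For a real file, replace `sample` with something like `open("occurrence.txt", encoding="utf-8", newline="")`, where the file name is a placeholder for whatever your download contains.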

Thanks for the awesome information.

Thanks for this! Super useful. I would be keen to figure out an easy way to exclude collections from zoos and botanic gardens, since they seem to be a common origin of geographic outliers. There is another post on that here: Understanding basis of record - a living specimen becomes a preserved specimen - GBIF Data Blog, but it’s a reasonably common source of error that’s hard to avoid at the moment.

There is no easy way to exclude zoos and botanical gardens entirely. You can get close with a few filters.

I would have a look at the R function CoordinateCleaner::cc_inst().

You can also filter on the establishmentMeans column, removing records with the value “MANAGED”. Keep in mind that this column is often left empty, so it is probably good to keep empty values as well.

This will not remove all of the zoo and botanical garden records, but it will get you very close. You could also try to do some outlier analysis.

There is also basisOfRecord = LIVING_SPECIMEN, which you probably know about from reading the previous article.
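
The establishmentMeans and basisOfRecord filters can be sketched in Python as below. This is only an illustration under assumptions: a tab-separated download with `basisOfRecord` and `establishmentMeans` columns, a hypothetical helper named `is_probably_wild`, and made-up sample rows. It does not reproduce what CoordinateCleaner::cc_inst() does.

```python
import csv
import io


def is_probably_wild(row):
    """Return False for records marked as managed or as living specimens."""
    # Empty establishmentMeans values are kept, since the field is often blank.
    means = (row.get("establishmentMeans") or "").strip().upper()
    basis = (row.get("basisOfRecord") or "").strip().upper()
    return means != "MANAGED" and basis != "LIVING_SPECIMEN"


# Tiny made-up sample standing in for a real download file.
sample = io.StringIO(
    "gbifID\tbasisOfRecord\testablishmentMeans\n"
    "1\tPRESERVED_SPECIMEN\t\n"
    "2\tLIVING_SPECIMEN\t\n"
    "3\tHUMAN_OBSERVATION\tMANAGED\n"
    "4\tHUMAN_OBSERVATION\tNATIVE\n"
)
rows = csv.DictReader(sample, delimiter="\t", quoting=csv.QUOTE_NONE)
kept_ids = [row["gbifID"] for row in rows if is_probably_wild(row)]
print(kept_ids)
```

Records 2 and 3 are dropped here, while record 1 is kept even though its establishmentMeans is empty.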

This blog post might give you more filtering ideas:

Hi, hope you’re well.

I think there is an issue missing from the Other issues section, corresponding to Occurrence status inferred from basis of record.

Thank you for your previous answer.

Thanks! I added a definition for Occurrence status inferred from Basis Of Record, which is a newer flag.

Hi again!

I’m using this info in my thesis. Is there a way to cite this content more precisely, or do you have a suggestion for how to cite it?

Our citation guidelines provide details for almost any circumstance. In your case, check this one.