How to identify citizen science occurrences in GBIF?

Hey folks,

I am trying to identify citizen science datasets in GBIF and use their occurrences. Apart from taking the list of datasets in Chandler et al. 2016 and selecting those, I do not know of a consistent way to download citizen science occurrences from GBIF.

I have seen some posts filter on HUMAN_OBSERVATION, but that does not necessarily mean a record comes from a citizen science program, right?

Any thoughts?

@pserra, GBIF harvests datasets that are built with Darwin Core categories, and there is no DwC category for type-of-publisher, like “citizen-science project”. However, processed GBIF datasets also have a non-DwC field called “publisher”. In the 21st-century Amphibia dataset I wrote about, this field has 166 entries with publisher names like “” (635159 records), “” (199000) and “Froglife” (2769). You can select citizen-science projects from these names with either a look-up table of projects or a bit of googling. How you do the selecting will depend on your data-managing software. Email me if you’d like command-line suggestions.

Unfortunately, that’s not a tight solution because some publishers of citizen-science records are institutions (museums etc) that run nature observation programs, so you would need to check all the “publisher” entries from a HUMAN_OBSERVATION set of occurrence records if you wanted to be thorough!
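A minimal sketch of that workflow in base R, assuming the occurrence table has `basisOfRecord` and `publisher` columns. The toy data, the "Example ..." publisher names, the `cs_lookup` table and its `citizen_science` flags are all made up for illustration; a real look-up table would have to be filled in by checking each publisher by hand.

```r
# Toy stand-in for a GBIF occurrence table; real downloads have many more columns.
occ <- data.frame(
  gbifID = 1:6,
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION", "HUMAN_OBSERVATION"),
  publisher = c("Froglife", "Froglife", "Example Museum",
                "Example Museum", "Example Survey Agency", "Froglife"),
  stringsAsFactors = FALSE
)

# Hypothetical hand-made look-up table: one row per publisher.
# The flags are illustrative only; NA means "not checked yet" or "mixed".
cs_lookup <- data.frame(
  publisher = c("Froglife", "Example Museum", "Example Survey Agency"),
  citizen_science = c(TRUE, NA, FALSE),
  stringsAsFactors = FALSE
)

# Step 1: keep human observations only.
obs <- occ[occ$basisOfRecord == "HUMAN_OBSERVATION", ]

# Step 2: tabulate publishers so that every name can be reviewed.
publisher_counts <- sort(table(obs$publisher), decreasing = TRUE)
print(publisher_counts)

# Step 3: keep records whose publisher is flagged TRUE (NA counts as not flagged).
cs_publishers <- cs_lookup$publisher[cs_lookup$citizen_science %in% TRUE]
cs_obs <- obs[obs$publisher %in% cs_publishers, ]
print(nrow(cs_obs))
```

Publishers left as NA (for example a museum that also runs an observation program) stay out of the result until someone decides how to treat them, which keeps the manual-checking step visible.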

Hi @pserra
I try to tag all the citizen science datasets I can find. The list is available here:
It can also be downloaded by clicking on this link:

This is semi-automated tagging based on the approach described in "Finding citizen science datasets on GBIF" (GBIF Data Blog).
I try to update the tags twice a year. It is far from perfect but maybe it can help?

@mgrosjean thanks a lot.

However, I am not sure whether I am doing something wrong here, because the list I downloaded contains more than 85k datasets. When I look at some of them, I see forest inventory datasets in there, which are not citizen science projects. Also, when I try to subset the occurrences of a single species, such as a tree for which we clearly have some records that are not citizen science, the filter does not remove any records.

I just want to make sure I am not doing something wrong here. I understand this is based on machine tagging.

Example for Abies balsamea, code in R:

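library(readr)
library(dplyr)

# Occurrence download for the species
abba_GBIF <- read_delim('sampleGIBF_abbal/0213220-230224095556074.csv')
nrow(abba_GBIF)
[1] 19361
# Dataset list from @mgrosjean
mgj_ds <- read_delim('mgrosjean_CSdatasets/gbif_datasets.tsv')
nrow(mgj_ds)
[1] 85281

# Keep occurrences whose dataset key appears in the list
cs_abba <- abba_GBIF %>%
  filter(datasetKey %in% mgj_ds$dataset_key)
nrow(cs_abba)
[1] 19361
# Every occurrence's dataset key is found in the list
all(abba_GBIF$datasetKey %in% mgj_ds$dataset_key)
[1] TRUE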
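
One possible explanation, offered as a guess rather than a diagnosis: if the downloaded file lists every GBIF dataset and marks the tagged ones in a separate column (the 85281 rows hint at that possibility), then matching on `dataset_key` alone keeps everything. The sketch below shows the check on toy data in base R. The column name `is_citizen_science` is hypothetical; the real file would need to be inspected with `names()` to see whether it has a comparable column.

```r
# Toy stand-ins; only datasetKey and dataset_key come from the code above.
occ <- data.frame(
  gbifID = 1:5,
  datasetKey = c("a", "a", "b", "c", "c"),
  stringsAsFactors = FALSE
)
ds_list <- data.frame(
  dataset_key = c("a", "b", "c", "d"),
  is_citizen_science = c(TRUE, FALSE, FALSE, TRUE),  # hypothetical tag column
  stringsAsFactors = FALSE
)

# Inspect the available columns; on the real file, look for a tag or label field.
print(names(ds_list))

# If every occurrence key is found, the list may cover all datasets, tagged or not.
all_found <- all(occ$datasetKey %in% ds_list$dataset_key)
print(all_found)

# Restrict the list to tagged datasets first, then match occurrences against it.
cs_keys <- ds_list$dataset_key[ds_list$is_citizen_science %in% TRUE]
cs_occ <- occ[occ$datasetKey %in% cs_keys, ]
print(nrow(cs_occ))
```

In this toy case all keys are found, yet only the two records from dataset "a" survive once the list is restricted to tagged datasets.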
