Proper handling of missing values through predicates (rgbif)

Hi! I’m using the R package rgbif to request data from GBIF. To get a smaller, more manageable download, I am using predicates to exclude observations with specific establishmentMeans or basisOfRecord values from the start.
Browsing the web, I found a couple of webpages that differ in how they build such a request, specifically in how they handle missing values. Are these different implementations expected to yield different downloads? Can the GBIF fields establishmentMeans and/or basisOfRecord contain missing values? I compared the downloads generated with the first two syntaxes (1.5 million occurrences, 240 taxa) and found no relevant difference. In particular, the number of missing values was the same for establishmentMeans and basisOfRecord: 0.

pred_or(pred_not(pred("establishmentMeans", "MANAGED")), pred_not(pred_notnull("establishmentMeans")))

  • The simple version

pred_not(pred("establishmentMeans", "MANAGED"))

pred_or(
  pred_not(pred_in("establishmentMeans", c("MANAGED"))),
  pred_isnull("establishmentMeans")
)

@MarcRiera what is the error message you get? Did you fix the problem on your own?

Hi! The error I got was something like: Couldn’t find function “pred_isnull”
I sort of avoided the problem rather than finding the right (best?) way to use the predicates. I just made a request to GBIF without any predicates on establishmentMeans or basisOfRecord, and filtered the dataset with tidyverse after download.
Cheers
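
For anyone taking the same filter-after-download route, here is a minimal sketch of how missing values behave in a dplyr filter. The table below is a made-up stand-in for a downloaded occurrence table (the column names follow GBIF, the rows are invented), so treat it as an illustration rather than the exact workflow used above.

```r
library(dplyr)

# Toy stand-in for a downloaded occurrence table (values are invented)
occ <- tibble(
  gbifID = 1:4,
  establishmentMeans = c("NATIVE", "MANAGED", NA, "INTRODUCED")
)

# `!=` evaluates to NA for a missing value, and filter() drops such rows,
# so this removes both the MANAGED record and the record with no value
drops_na <- occ %>%
  filter(establishmentMeans != "MANAGED")

# Keeping records with a missing value needs an explicit is.na() branch
keeps_na <- occ %>%
  filter(is.na(establishmentMeans) | establishmentMeans != "MANAGED")

nrow(drops_na)  # 2
nrow(keeps_na)  # 3
```

The second form is the local counterpart of combining a "not MANAGED" predicate with an "is null" predicate in the download request.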

@MarcRiera pred_isnull is only available in the latest version of rgbif, so that is probably why it is not working. Download the latest version and it should work.
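
A quick way to check this, sketched with base R helpers only, is to look at the installed version and at whether the package exports the function. The variable name has_isnull is just illustrative.

```r
# Version of rgbif currently installed
packageVersion("rgbif")

# TRUE if this installation exports pred_isnull()
has_isnull <- "pred_isnull" %in% getNamespaceExports("rgbif")
has_isnull

# If FALSE, updating from CRAN should bring in the newer predicate helpers:
# install.packages("rgbif")
```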

Thank you for your reply. Out of curiosity, are the different implementations expected to yield different downloads?

Your download depends on which predicates were used, in what sequence, and what the resulting final query was, not on the implementation or version of rgbif.
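
One way to see the final query is to prepare the request without submitting it. The sketch below assumes a recent rgbif and GBIF credentials stored in the GBIF_USER, GBIF_PWD and GBIF_EMAIL environment variables. The taxonKey value is only a placeholder, not the query from this thread.

```r
library(rgbif)

# Variant without an explicit null branch
simple <- pred_not(pred("establishmentMeans", "MANAGED"))

# Variant that also asks for records where the field is empty
with_nulls <- pred_or(
  pred_not(pred("establishmentMeans", "MANAGED")),
  pred_not(pred_notnull("establishmentMeans"))
)

# occ_download_prep() builds the request but does not start a download.
# It needs credentials, assumed here to be set as environment variables.
creds <- Sys.getenv(c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL"))
if (all(nzchar(creds))) {
  # taxonKey 212 is a placeholder; substitute your own taxa
  print(occ_download_prep(pred("taxonKey", 212), simple))
  print(occ_download_prep(pred("taxonKey", 212), with_nulls))
}
```

Comparing the two printed requests, and then the record counts of the resulting downloads, is the most direct way to find out whether the variants differ for a given set of taxa.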