Is there a way to search the Catalogue number field in the GBIF occurrence user interface using predicates, e.g. “like”? For example, I want to see all of the specimen lots associated with a collecting event, and they have very helpful catalog numbers in the format “LACMIP 42801.1”, “LACMIP 42801.2”, etc. I would love to be able to search for “LACMIP 42801” and retrieve all of them. I can see how to accomplish this with the occurrence API (though only because the default search behavior there is less strict, not because I figured out how to search with a predicate), but not in the user interface. Am I missing something, or is this type of search not possible in the user interface? Thank you for any suggestions!
Note that this is essentially the same as Rich’s earlier, unanswered question here.
Hi
Short answer: no, it is not currently possible. But it makes perfect sense why it would be useful, and we are working to make it possible.
Our API version 1 has been around for many years (10 or more I believe).
The occurrence part of that API supports download and search.
For downloads we support searching with predicates, and those predicates allow you to do a LIKE filter, such as catalogNumber=LACMIP 42801*. When you write a predicate, you decide whether it should be an EQUAL filter or a LIKE filter.
But the search API does not support that; it is always an EQUAL filter. So searching catalogNumber for * returns occurrences whose catalog number is literally *. There are currently 3 of those: https://api.gbif.org/v1/occurrence/search?catalogNumber=*
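For anyone who wants to try the download route, here is a rough sketch of what a request body with a LIKE predicate could look like. It only builds and prints the JSON; actually submitting it means POSTing it to the download endpoint with your GBIF account credentials. The function name, user name and e-mail address are placeholders made up for illustration, and the field names are my understanding of the download request format, so please check them against the API documentation before relying on this.

```python
import json

# Endpoint that accepts download requests (needs a GBIF account to POST to).
DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_like_download(creator, email, pattern):
    """Return a download request body that filters catalogNumber with LIKE.

    Hypothetical helper for illustration; not part of any GBIF client library.
    """
    return {
        "creator": creator,                # placeholder GBIF user name
        "notificationAddresses": [email],  # placeholder e-mail address
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "like",                # "equals" would require an exact match
            "key": "CATALOG_NUMBER",
            "value": pattern,              # * stands for any run of characters
        },
    }


body = build_like_download("your_gbif_user", "you@example.org", "LACMIP 42801*")
print(json.dumps(body, indent=2))
```

Swapping "like" for "equals" in the predicate gives the strict behaviour that the search API always uses.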
The endpoint https://api.gbif.org/v1/occurrence/search/catalogNumber?q=LACMIP%2042801&limit=100 that you refer to is a suggest (autocomplete) service that we use to suggest values to users as they type in the catalogNumber field. So your best option is to use the UI, type LACMIP 42801, and then select all the available values. I can see it only suggests the top 10 or so. Let me change that so you at least get more suggestions to choose from.
We are also rewriting our search to a version that supports wildcards on fields such as catalogNumber.
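The same two-step workaround (ask the suggest service for matching values, then search for each exact value) can be sketched against the API as well. The snippet below only builds the URLs and makes no network calls. The helper names are hypothetical, and it assumes the search API accepts a repeated catalogNumber parameter and treats the values as alternatives.

```python
from urllib.parse import quote, urlencode

API = "https://api.gbif.org/v1/occurrence/search"


def suggest_url(prefix, limit=100):
    """URL of the catalogNumber suggest service for values starting with prefix."""
    query = urlencode({"q": prefix, "limit": limit}, quote_via=quote)
    return f"{API}/catalogNumber?{query}"


def search_url(catalog_numbers):
    """Search URL with one exact-match catalogNumber parameter per value.

    Assumes repeated parameters are combined as alternatives (OR).
    """
    params = [("catalogNumber", value) for value in catalog_numbers]
    return f"{API}?{urlencode(params, quote_via=quote)}"


print(suggest_url("LACMIP 42801"))
print(search_url(["LACMIP 42801.1", "LACMIP 42801.2"]))
```

The first URL is the suggest call quoted above; the values it returns would then be passed to search_url to get an exact-match search covering all of them.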
Hope that helps a bit
UPDATE: I have deployed a fix that includes more suggestions (50). It isn’t a general fix, but at least it helps in many cases, such as this one. You can now (with some effort) select all those values from the suggestions.
example Search
Thank you so much for the immediate and thorough reply, and for the interim solution of displaying more than 10 suggested values in the UI! Sounds like the API v2 and rewritten UI search will be very useful.
@ekrimmel I read your request, and I can relate to your desire to look for patterns in biodiversity data. I can see how GBIF’s powerful search engine can help to find matching data in their current snapshot of published biodiversity data.
I also realize that GBIF changes all the time, so I did a similar search using Preston (sort of like a git for biodiversity data). In about 2 minutes, I was able to produce the following results for the 2022-02-01 versioned collection (i.e. hash://sha256/4000d2a1af6da5b46f374038d884f91768782a1905d4a75fff3c8c3bb6629913) of Darwin Core archives registered with GBIF and iDigBio with URLs containing lacm.
Looks like LACMIP 2533 and related decimals LACMIP 2533.[something] were mentioned quite a bit in the lacm records. . .
Neat thing about this is that I can cite and archive the exact source data used to produce the result, and reproduce them without having to rely on some web service.
I do realize that the UI for Preston is a bit basic (i.e., command-line), but it offers some quite powerful discovery techniques for those versed in the command line or programming. I imagine that a UI could be developed on top of these versioned datasets to increase the reach of these tools.
For now, GBIF’s incredible tools definitely offer a better user experience, as long as you don’t worry about versioning or reproducing results 5-10 years from now.
You should be able to exactly reproduce the attached results below with the recipes provided above.
Hey thanks, Jorrit! Great to see another type of solution.
Eventually, it would be awesome if GBIF’s download DOIs could incorporate Preston’s ability to actually capture a version of the data so that use of individual specimens could be tracked, versus tracking use at a dataset level as the current download DOI system enables. I know one could do this right now with Preston and hashing–thanks, Jorrit!–but there is just a lot to be said for an intuitive UI and an integrated data pipeline/system–thanks, GBIF!
One use case for why we need a better way to search on catalog numbers is that often a collections manager will want to direct a researcher to a subset of specimen records, and catalog number may be a really useful way to circumscribe that subset. In this case, accessing data via command line won’t cut it; the researcher is almost always in an exploratory phase, and a UI that facilitates visualization and discovery beyond the known set of records is essential. Maybe this will evolve as new generations of researchers approach their work with more programmatic skills. But for now the ability to email a researcher a direct link to search results in a UI is an amazing time saver, the digital equivalent of physically pulling out a selection of specimens.
A second use case is that a collections manager wants to have a researcher cite specimens in a publication (e.g. here). In this case, a solution a la what Preston can do would be ideal. For many collections, the dataset activity tracking from GBIF is meeting an important need by demonstrating the value of digitized, mobilized data. Using multiple modalities (e.g. GBIF dataset activity + a separate list of dataset citations not mediated by GBIF) to track activity doesn’t sound difficult, but in reality, for most collections staff, it’s just one thing too many. Again, lots of room for this to evolve!