A couple of years ago, we asked the community about publishing soundscapes through GBIF, and we are wondering whether there are now any best practices for publishing these data.
We want to publish a biological collection of environmental sounds and would like to hear about any experiences or recommendations for representing these data well. I was thinking of using Event core for all soundscapes, with occurrences whenever an identification is possible.
But we are not 100% sure, because there are several ways to do it and many open questions, such as: Can we adapt Camtrap DP to soundscapes? Should we think about a new data model for soundscapes? Or can we adapt what is already available and perhaps write a quick guide on how to publish these data in GBIF?
Hi @EstebanMH-SiB, I am not aware of any best-practice guide to publishing soundscapes in GBIF. I believe that, so far, each data provider has structured their own datasets as they see fit.
We’ve had casual conversations about this in the context of the data model and with at least one potential partner network that could bring a large influx of data.
I should also note the likely potential audience among the authors of this recent paper:
It would be interesting, I think, to look for a pragmatic data field to indicate whether a soundscape includes multiple species and, if so, to map those identified species. Below is a descriptive comparison of randomly selected GBIF occurrence records from the data providers mentioned:
In the latter mapping method, it would be more logical to add a data field for the (calculated/estimated/derived) number of individuals per species heard in the soundscape, if they are audibly distinguishable. Defining an automatic method to make those distinctions consistently appears challenging, particularly if there is no visual documentation of the observation.
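Purely as a hypothetical sketch of what such a field could look like (the identifiers and values are invented), existing Darwin Core terms like individualCount and identificationRemarks might be candidates, though whether they fit soundscape-derived estimates is exactly the open question above:

```python
# Hypothetical occurrence row with an estimated per-species count.
# individualCount and identificationRemarks are existing Darwin Core terms;
# the values and identifiers below are invented for illustration only.
occurrence = {
    "occurrenceID": "soundscape-001-occ1",
    "scientificName": "Henicorhina leucophrys",
    "individualCount": 3,  # estimated number of audibly distinguishable individuals
    "identificationRemarks": "count estimated from overlapping songs in the recording",
}
```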
I apologize for not responding sooner to this; it came in at a busy time. For the SanctSound project that you linked to, the most important thing for us was to include a coordinateUncertaintyInMeters that accounted for the fact that the recorder can be quite far away from the identified animal. We also aggregated the detections to once per day per species because, according to the scientists who provided the data, that would be the safest way to prevent overestimation. Finally, we only included events when there were occurrences.
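In case it helps to make that aggregation concrete, here is a minimal sketch (not the actual SanctSound pipeline; the input file and the column names station_id, scientific_name and detection_time are assumptions) of collapsing raw acoustic detections to one record per species per day:

```python
# Minimal sketch: aggregate raw acoustic detections to one occurrence per
# species per station per day. Column names are assumed, not from SanctSound.
import pandas as pd

detections = pd.read_csv("detections.csv", parse_dates=["detection_time"])
detections["eventDate"] = detections["detection_time"].dt.date

occurrences = (
    detections
    .groupby(["station_id", "scientific_name", "eventDate"], as_index=False)
    .size()  # number of raw detections that day (kept for reference only)
    .rename(columns={"scientific_name": "scientificName",
                     "size": "detectionCount"})
)

# Fixed uncertainty reflecting how far the recorder may be from the animal
# (the 10 km radius is an assumed placeholder, not a SanctSound value).
occurrences["coordinateUncertaintyInMeters"] = 10000

occurrences.to_csv("occurrence.csv", index=False)
```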
I don't know whether Camtrap DP would be a good fit for soundscapes or not. Hopefully Peter will chime in. I think it has some things in it that make it pretty specific to camera traps, but the general idea of a stationary deployment with associated detections would be similar.
I think the topic of soundscapes has been raised a few times in the Machine Observations Interest Group working meetings but I don’t think it’s ever been directly addressed. Perhaps it would make a good topic for discussion at the next TDWG, although I won’t be there.
I am also pinging @sformel who might have some ideas or at least interest to engage on the topic.
Right now, you should likely use Darwin Core. I see two potential models:
Event core + Occurrence extension + Audiovisual Core extension: Each soundscape is an Event (in the core), with a location, a coordinateUncertaintyInMeters (as @abbybenson points out) and a duration in eventDate. Each identified species is an Occurrence (in the extension), at the same location (so no need to repeat that), but likely with a more precise duration or timestamp in eventDate. The (link to the) media files can be published in the Audiovisual Core media extension, associated with the event. The drawback of this model is that GBIF won't show the media files for your occurrences (since they are linked to the event, not directly to the occurrence); see the sketch after this list.
Occurrence core + Audiovisual Core extension: Each identified species is an Occurrence (in the core), at a location and with an eventDate (many of which are repeated). You can group occurrences into deployments by having a shared eventID. To indicate that the occurrence is based on an associated soundscape, you can publish the media file in the Audiovisual Core media extension (which will contain many repeated URLs). This model is a more flattened version of the one above (hence all the repetition), but has the advantage that GBIF will show the media files associated with an occurrence. We use that model to convert Camtrap DP to Darwin Core (function, example).
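To make the two models concrete, here is a rough sketch with invented identifiers, coordinates and URLs; the columns shown are illustrative, not a complete Darwin Core term list, and in a real Darwin Core Archive the extension rows link to the core record rather than carrying an explicit ID column like this:

```python
# Model 1: Event core + Occurrence extension + Audiovisual Core extension.
# All values are made up for illustration.
event = {
    "eventID": "soundscape-001",
    "eventDate": "2023-05-10T04:00:00Z/2023-05-10T05:00:00Z",  # deployment duration
    "decimalLatitude": 4.60971,
    "decimalLongitude": -74.08175,
    "coordinateUncertaintyInMeters": 500,  # assumed detection radius of the recorder
}
occurrence_ext = {
    "eventID": "soundscape-001",            # links to the core event
    "occurrenceID": "soundscape-001-occ1",
    "scientificName": "Henicorhina leucophrys",
    "eventDate": "2023-05-10T04:12:30Z",    # more precise timestamp within the event
}
media_ext = {
    "eventID": "soundscape-001",            # media linked to the event, not the occurrence
    "accessURI": "https://example.org/recordings/soundscape-001.wav",
    "dc:format": "audio/x-wav",
}

# Model 2: Occurrence core + Audiovisual Core extension (flattened).
occurrence_core = {
    "occurrenceID": "soundscape-001-occ1",
    "eventID": "soundscape-001",            # shared eventID groups the deployment
    "scientificName": "Henicorhina leucophrys",
    "eventDate": "2023-05-10T04:12:30Z",
    "decimalLatitude": 4.60971,             # location repeated for every occurrence
    "decimalLongitude": -74.08175,
    "coordinateUncertaintyInMeters": 500,
}
media_ext_flat = {
    "occurrenceID": "soundscape-001-occ1",  # media now linked to the occurrence
    "accessURI": "https://example.org/recordings/soundscape-001.wav",  # repeated URL
    "dc:format": "audio/x-wav",
}
```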
Longer term, I think Camtrap DP is a good fit for soundscapes. As @abbybenson points out:
This will require testing with acoustic data use cases, so we can see how to extend certain vocabularies and maybe rename terms. The good news is that we just submitted a Horizon Europe proposal, where I'm responsible for doing just that (extending Camtrap DP to acoustic and insect camera data). Let's hope it gets funded. The bad news is that you can't use it right now.
Thank you for highlighting this practice, Peter. For every occurrence record in GBIF, associated media files are indeed shown. Moreover, if there are several occurrences within an event, all of the event's media files appear to be shown repeatedly for every occurrence record within that event. For instance, for this event, the same 20 images are repeated in each of the three occurrence records in GBIF. To avoid redundantly repeated media and show only the relevant associated media file(s) per occurrence, it seems it is just a matter of time and of a finer selection of the relevant media file URL(s) per occurrence in the Audiovisual Core extension.
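As a small sketch of what that finer selection could mean in practice (identifiers and URLs are invented), each occurrence would get only the clip(s) it was actually identified from in the Audiovisual Core extension, instead of repeating every media file of the event:

```python
# Illustrative only: coarse vs. fine mapping of media rows to occurrences.
event_media = {
    "soundscape-001": [
        "https://example.org/recordings/soundscape-001-clip1.wav",
        "https://example.org/recordings/soundscape-001-clip2.wav",
        "https://example.org/recordings/soundscape-001-clip3.wav",
    ]
}

# Coarse mapping: every occurrence repeats all media of its event.
coarse_rows = [
    {"occurrenceID": occ, "accessURI": uri}
    for occ, evt in [("occ1", "soundscape-001"), ("occ2", "soundscape-001")]
    for uri in event_media[evt]
]

# Finer mapping: each occurrence points only to the clip it was identified from.
fine_rows = [
    {"occurrenceID": "occ1", "accessURI": event_media["soundscape-001"][0]},
    {"occurrenceID": "occ2", "accessURI": event_media["soundscape-001"][1]},
]
```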