Best practices to publish soundscapes in GBIF

EstebanMH-SiB · January 12, 2024, 7:13pm

A couple of years ago, we asked the community about publishing soundscapes through GBIF, so we are wondering whether there are now any best practices for publishing this data.

I have come across several approaches: Event core (Acoustic detections of birds using the SILIC in Yushan National Park, Taiwan), Occurrence core with identified species only (Sanctuary Soundscape Monitoring Project (SanctSound) Daily Aggregated Species Detections), and Occurrence core with “Sonus naturalis” to publish general audio without a specific species (Xeno-canto - Soundscapes from around the world).

We want to publish a biological collection of environmental sounds and would like to hear about any experiences or recommendations for representing this data well. I was thinking of using Event core for all soundscapes, with occurrences wherever an identification is possible.

But we are not 100% sure, because there are several ways to do it and many open questions, such as: Can we adapt Camtrap DP to soundscapes? Should we think about a new data model for soundscapes? Or can we adapt what is already available and perhaps write a quick guide on how to publish this data in GBIF?

I would like to hear your valuable opinion @abbybenson @mgrosjean @peterdesmet @dhobern @ylebras !

Thank you very much for your help and have a wonderful 2024!

mgrosjean · January 15, 2024, 12:19pm

Hi @EstebanMH-SiB, I am not aware of any best-practice guide for publishing soundscapes in GBIF. I believe that so far each data provider has structured their datasets as they saw fit.

I can see that @jeromeko is a contact for the dataset Acoustic detections of birds using the SILIC in Yushan National Park, Taiwan. Perhaps he has some insights on how best to work with this type of data?

kcopas · January 15, 2024, 12:45pm

We’ve had casual conversations about this in the context of the data model and with at least one potential partner network that could bring a large influx of data.

I should note, too, the likely potential audience among the authors of this recent paper:

Looby, A., Erbe, C., Bravo, S. et al. Global inventory of species categorized by known underwater sonifery. Sci Data 10, 892 (2023).

scooleman · January 23, 2024, 1:32pm

Interesting, I think, especially the search for a pragmatic data field to indicate whether a soundscape includes multiple species and, if so, to map those identified species. Here is a descriptive comparison of randomly selected GBIF occurrence records from the data providers mentioned:

Xeno-canto (human observation): background species are mapped to the DwC term associatedTaxa (concatenated in one data field).
E.g.: Occurrence Detail 3353669526
NB: Background species can be filled in as additional details, for example: XC667543 Steere's Liocichla (Liocichla steerii) :: xeno-canto (but filling them in is not obligatory, for instance: XC577844 Steere's Liocichla (Liocichla steerii) :: xeno-canto).
TBRI (machine observation): background species appear to be published as additional occurrences within the same event, e.g.: Acoustic detections of birds using the SILIC in Yushan National Park, Taiwan
i.e. 7 species in this soundscape: ZZG01_20210422_070300.wav - Google Drive
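
To make the difference between the two mappings concrete, here is a minimal Python sketch that turns a record with a concatenated associatedTaxa field into one occurrence per species within the same event. The `|` separator, the `:bg` ID suffix, the function name and the example values are all assumptions for illustration; they are not taken from either dataset.

```python
def expand_associated_taxa(record, separator="|"):
    """Turn one record with concatenated background species into
    one occurrence row per species, all sharing the same eventID.

    The separator is an assumption; check how the source dataset
    actually delimits associatedTaxa before reusing this.
    """
    raw = record.get("associatedTaxa", "")
    background = [name.strip() for name in raw.split(separator) if name.strip()]

    # The focal species keeps its original occurrenceID.
    rows = [{
        "occurrenceID": record["occurrenceID"],
        "eventID": record["eventID"],
        "scientificName": record["scientificName"],
    }]
    # Each background species becomes an additional occurrence in the event.
    for index, name in enumerate(background, start=1):
        rows.append({
            "occurrenceID": f"{record['occurrenceID']}:bg{index}",  # hypothetical ID scheme
            "eventID": record["eventID"],
            "scientificName": name,
        })
    return rows


# Illustrative record; the background names are placeholders.
example = {
    "occurrenceID": "occ-1",
    "eventID": "recording-1",
    "scientificName": "Liocichla steerii",
    "associatedTaxa": "Background species A | Background species B",
}
expanded = expand_associated_taxa(example)
```

The concatenated form keeps one row per recording, while the expanded form makes every species individually searchable at the cost of more rows.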

Additional remarks:

With the latter mapping method, it would be more logical to add a data field for the (calculated/estimated/derived) number of individuals per species heard on the soundscape, if they are aurally distinguishable. Defining an automatic method that makes those distinctions consistently appears to be challenging, particularly if there is no visual documentation of the observation.

Even automatically distinguishing the sounds of different species on one recording leaves room for improvement in some examples:
e.g. ZZG01_20210123_073300.wav - Google Drive includes more than one species (in my opinion), while its event only lists one species:
Acoustic detections of birds using the SILIC in Yushan National Park, Taiwan
– cf. Steere's Liocichla (Liocichla steerii) :: xeno-canto

Fundamentally, correct species identification is of course essential.

abbybenson · February 27, 2024, 12:24am

I apologize for not responding to this sooner; it came in at a busy time. For the SanctSound project that you linked to, the most important thing to us was to include a coordinateUncertaintyInMeters that accounted for the fact that the recorder can be quite far away from the identified animal. We also aggregated the detections to once per day per species because, according to the scientists who provided the data, that would be safest for preventing overestimation. We also only had events when there were occurrences.
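
As a rough illustration of that kind of aggregation, the following Python sketch collapses raw detections to one record per station, day and species, and stamps each record with an uncertainty value. It is not the SanctSound workflow: the function name, the input field names (`station`, `timestamp`), the eventID scheme and the 5000 m range are hypothetical.

```python
def aggregate_daily(detections, detection_range_m):
    """Collapse raw detections to one record per (station, day, species).

    detection_range_m is an assumed maximum distance between recorder and
    animal, used here as coordinateUncertaintyInMeters.
    """
    seen = set()
    for detection in detections:
        day = detection["timestamp"][:10]  # date part of an ISO 8601 timestamp
        seen.add((detection["station"], day, detection["scientificName"]))

    # Only station-days with at least one detection produce records,
    # so there are no events without occurrences.
    return [
        {
            "eventID": f"{station}:{day}",  # hypothetical ID scheme
            "eventDate": day,
            "scientificName": name,
            "coordinateUncertaintyInMeters": detection_range_m,
        }
        for station, day, name in sorted(seen)
    ]


# Illustrative input with placeholder species names.
raw_detections = [
    {"station": "S1", "timestamp": "2024-01-01T03:15:00Z", "scientificName": "Species A"},
    {"station": "S1", "timestamp": "2024-01-01T18:40:00Z", "scientificName": "Species A"},
    {"station": "S1", "timestamp": "2024-01-01T19:05:00Z", "scientificName": "Species B"},
    {"station": "S1", "timestamp": "2024-01-02T02:30:00Z", "scientificName": "Species A"},
]
daily = aggregate_daily(raw_detections, detection_range_m=5000)
```

In this example the four raw detections become three daily records, because the two detections of Species A on the first day collapse into one.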

I don’t know whether Camtrap DP would be a good fit for soundscapes. Hopefully Peter will chime in. I think it has some things in it that make it pretty specific to camera traps, but the general ideas of a stationary deployment and associated detections would be similar.

I think the topic of soundscapes has been raised a few times in the Machine Observations Interest Group working meetings but I don’t think it’s ever been directly addressed. Perhaps it would make a good topic for discussion at the next TDWG, although I won’t be there.

I am also pinging @sformel who might have some ideas or at least interest to engage on the topic.

pieter · February 28, 2024, 1:33pm

I’ll forward this thread to Peter

peterdesmet · March 1, 2024, 1:44pm

Hi @EstebanMH-SiB,

Right now, you should likely use Darwin Core. I see two potential models:

1. Event core + Occurrence extension + Audiovisual Core extension: Each soundscape is an Event (in the core), with a location, a coordinateUncertaintyInMeters (as @abbybenson points out) and a duration in eventDate. Each identified species is an Occurrence (in the extension), at the same location (so no need to repeat that), but likely with a more precise duration or timestamp in eventDate. The (link to the) media files can be published in the Audiovisual Core media extension, associated with the event. The drawback of this model is that GBIF won’t show the media files for your occurrences (since they are linked to the event, not directly to the occurrence).
2. Occurrence core + Audiovisual Core extension: Each identified species is an Occurrence (in the core), at a location and with an eventDate (many of which are repeated). You can group occurrences into deployments by giving them a shared eventID. To indicate that the occurrence is based on an associated soundscape, you can publish the media file in the Audiovisual Core media extension (which will contain many repeated URLs). This model is a flattened version of the one above (hence all the repetition), but has the advantage that GBIF will show the media files associated with an occurrence. We use that model to convert Camtrap DP to Darwin Core (function, example).
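
The two layouts can be sketched in Python as plain row dictionaries. This is only an illustration of where the location and media links end up in each model, not a Darwin Core Archive writer: the column names are simplified (for example `accessURI` stands in for the Audiovisual Core term), and the recording, IDs, coordinates and URL are hypothetical.

```python
EVENT_FIELDS = ("eventID", "eventDate", "decimalLatitude",
                "decimalLongitude", "coordinateUncertaintyInMeters")
LOCATION_FIELDS = ("decimalLatitude", "decimalLongitude",
                   "coordinateUncertaintyInMeters")

# Hypothetical soundscape recording and detections; all values are illustrative.
recording = {
    "eventID": "rec-001",
    "eventDate": "2024-01-12T06:00:00Z/2024-01-12T06:10:00Z",
    "decimalLatitude": 4.60,
    "decimalLongitude": -74.08,
    "coordinateUncertaintyInMeters": 100,
    "mediaURL": "https://example.org/audio/rec-001.wav",
}
detections = [
    {"occurrenceID": "rec-001:1", "scientificName": "Liocichla steerii",
     "eventDate": "2024-01-12T06:02:10Z"},
    {"occurrenceID": "rec-001:2", "scientificName": "Sonus naturalis",
     "eventDate": "2024-01-12T06:07:45Z"},
]


def event_core_model(recording, detections):
    """Model 1: Event core, Occurrence extension, media linked to the event."""
    events = [{field: recording[field] for field in EVENT_FIELDS}]
    # Occurrences inherit the location from the event, so it is not repeated.
    occurrences = [{"eventID": recording["eventID"], **d} for d in detections]
    media = [{"eventID": recording["eventID"], "accessURI": recording["mediaURL"]}]
    return events, occurrences, media


def occurrence_core_model(recording, detections):
    """Model 2: Occurrence core, media linked to each occurrence (flattened)."""
    location = {field: recording[field] for field in LOCATION_FIELDS}
    # Location and media URL are repeated for every occurrence.
    occurrences = [{"eventID": recording["eventID"], **d, **location}
                   for d in detections]
    media = [{"occurrenceID": d["occurrenceID"], "accessURI": recording["mediaURL"]}
             for d in detections]
    return occurrences, media


events, event_occurrences, event_media = event_core_model(recording, detections)
flat_occurrences, flat_media = occurrence_core_model(recording, detections)
```

In the first layout the single media row points at the event, while in the second each occurrence carries its own media row with the same URL, which is the repetition mentioned above.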

Longer term, I think Camtrap DP is a good fit for soundscapes. As @abbybenson points out, the general ideas of a stationary deployment and associated detections would be similar.

This will require testing with acoustic data use cases, so we can see how to extend certain vocabularies and maybe rename terms. The good news is that we just submitted a Horizon Europe proposal, in which I’m responsible for doing just that (extending Camtrap DP to acoustic and insect camera data). Let’s hope it gets funded. The bad news is that you can’t use it right now.

Hope this helps,

Peter

scooleman · October 10, 2024, 12:32pm

Thank you for highlighting this practice, Peter. For every occurrence record in GBIF, associated media files are indeed shown. Moreover, if there are several occurrences within an event, all of the event’s media files appear to be repeated for every occurrence record within that event. For instance, for this event, the same 20 images are repeated in each of the three occurrence records in GBIF. To avoid such redundant media and show only the relevant media file(s) per occurrence, it seems to be just a matter of time and of selecting the relevant media file URL(s) per occurrence more carefully in the Audiovisual Core extension.
