Soundscape recordings of nature are increasingly used as a tool to assess biodiversity through Passive Acoustic Monitoring (PAM), and more and more authors wish to publish these recordings through GBIF.
We do not have solid arguments for judging how appropriate it is to publish these soundscape recordings through GBIF when they contain no “occurrences”. In fact, one publisher has already tried it using the Event core:
I would like to share my questions about this and hear your point of view.
Is it appropriate to publish this type of data through GBIF?
If the answer is yes, is it correct to do so using the Event core?
I see GBIF as the aggregation platform for all evidence we have of the distribution (and as far as possible abundance) of species in time and space. Soundscapes can and should be included. They share some characteristics with satellite imagery and metagenomic data. In all cases, we have instruments that collect streams of digital information from which we can derive assertions of the presence/abundance of different species.
For metagenomics, the sequencing gives us a stream of sequences we can interpret through a combination of reference libraries like BOLD, UNITE and INSDC and suitable clustering algorithms. The quality of the data we get is a product of the quality of the original sequences, of the reference library coverage, completeness and correctness, and of the specific algorithms used.
For hyperspectral satellite imagery, we could be getting e.g. assessments of proportional tree cover by different species for a given area. Again this would be a product of the quality of the satellite imagery, the reference datasets for tree spectra, and the algorithms used.
For soundscapes, our quality will be a product of the quality of the recordings, and of the human or machine processes to diagnose species.
I would like to see the infrastructures in place that store the original sequences/images/recordings and publish versionable sampling-event (Event Core) data sets that list species (and applicable abundance measures) and link back 1) to the digital artifacts from which they derive and 2) to the algorithm versions used to derive the species lists. Over time, as reference data and algorithms improve, this processing may be repeated to increase the completeness or quality of the lists.
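To make the idea above more concrete, here is a minimal sketch of one sampling event with derived species rows that link back to the recording and to the algorithm version. The column names are Darwin Core terms, but everything else (identifiers, coordinates, the recording URL, the classifier name, the detections, and the choice of `identificationRemarks` to carry the algorithm version) is a hypothetical assumption for illustration, not a GBIF recommendation.

```python
import csv
import io

# Hypothetical event record: only the keys are Darwin Core terms.
event = {
    "eventID": "site01:2021-03-14T05",
    "eventDate": "2021-03-14T05:00:00Z/2021-03-14T06:00:00Z",
    "samplingProtocol": "passive acoustic monitoring",
    "decimalLatitude": "0.0",
    "decimalLongitude": "0.0",
}


def build_occurrences(event_id, recording_url, detections, algorithm):
    """Turn (scientific name, detection count) pairs into occurrence rows.

    Each row points back to its event, to the source recording and to the
    algorithm version that produced the identification.
    """
    rows = []
    for i, (name, count) in enumerate(detections, start=1):
        rows.append({
            "occurrenceID": f"{event_id}:{i}",
            "eventID": event_id,
            "scientificName": name,
            "basisOfRecord": "MachineObservation",
            "occurrenceStatus": "present",
            "organismQuantity": str(count),
            "organismQuantityType": "detections",
            "associatedMedia": recording_url,
            # Assumption: the algorithm version is carried as a free-text remark.
            "identificationRemarks": f"derived with {algorithm}",
        })
    return rows


def to_csv(rows):
    """Serialise rows as CSV text, one column per Darwin Core term."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up detections and a made-up classifier name, for illustration only.
occurrences = build_occurrences(
    event["eventID"],
    "https://example.org/recordings/site01-20210314T05.wav",
    [("Turdus merula", 12), ("Erithacus rubecula", 3)],
    "hypothetical-classifier v1.2",
)
occurrence_csv = to_csv(occurrences)
```

Re-running `build_occurrences` with a newer algorithm label against the same recording would give a new version of the species list while the event and the recording link stay unchanged.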
A key challenge, particularly in the case of soundscapes, will be to document the effective spatial area being sampled. This should be represented in the metadata for the dataset/sample, but over time it could come to be attached at the level of each species, based on profiles of the effective range of the sounds for each species.
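As a rough illustration of why the sampled area may differ per species, the sketch below assumes an omnidirectional recorder and a circular detection area, so the area is simply π·r². The detection radii are invented placeholders, not measured values; real ranges depend on habitat, weather, equipment and call loudness.

```python
import math


def effective_area_m2(detection_radius_m):
    """Area of a circle with the given detection radius (assumes an
    omnidirectional recorder and no obstacles)."""
    return math.pi * detection_radius_m ** 2


# Hypothetical detection radii in metres, for illustration only.
radii_m = {"species A (quiet call)": 50.0, "species B (loud call)": 200.0}

areas_m2 = {name: effective_area_m2(r) for name, r in radii_m.items()}
```

Under these assumptions, a species audible at four times the distance is sampled over sixteen times the area, which is why a single dataset-level area can be misleading for abundance comparisons.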
Thank you for this really interesting question, @daiesco, and for the detailed and interesting answer, @dhobern! I fully agree with Donald that soundscapes can and should be included. For now, exactly as mentioned, for data that are not direct “classical” occurrence files, such as DNA or satellite imagery, it seems to me that using GBIF can be quite a challenge because of 1/ its occurrence orientation (even if this is becoming less and less true) and 2/ the Darwin Core specification and configuration. And I am not sure how relevant it is to try to fit this kind of information into DwC by “forcing” the “Event core door”, as so many people do… To share raw data, derived data (any kind of data), related tools, software and algorithms, and other digital Research Objects, you can create a “datapackage”, as proposed for example by the DataONE federation, using EML to provide detailed metadata on these Research Objects. Then you can “extract” the “evidence you have of the distribution of related species in time and space” in DwC format, without having to put all the information into it if it does not fit “naturally”.
In building the French national biodiversity e-infrastructure, we are thinking particularly about this kind of “flow” between “all types of Research Objects” and populating this AMAZING GBIF knowledge base! We chose to follow DataONE practices, to create direct links to these practices, and to populate GBIF notably through a data/metadata mapping between “full EML” and DwC, implemented in a graphical tool we are developing, MetaShARK. So, working with the French soundscape community, notably on PAM, this is the approach I propose in order to 1/ be sure to capture and share all relevant and needed information and 2/ contribute to GBIF.
This is a hot topic for me and I would be very interested in discussing it in more detail, so don’t hesitate to get in touch!
I wanted to check out the soundscapes, but the links to the audio files in multimedia.txt are using RFC1918 private IP addresses and are as such unreachable, e.g.
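For publishers who want to catch this kind of problem before publishing, here is a minimal sketch that flags media links whose host is an IP literal in one of the three RFC 1918 private ranges. It assumes multimedia.txt is tab-delimited and that the link sits in a column named `identifier`; both are assumptions about the file, and the sample rows are invented.

```python
import csv
import io
import ipaddress
from urllib.parse import urlparse

# The three private IPv4 ranges defined in RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def uses_rfc1918_address(url):
    """Return True if the URL's host is an IP literal in an RFC 1918 range."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # The host is a DNS name, not an IP literal; not checked here.
        return False
    return any(address in network for network in RFC1918_NETWORKS)


def private_links(multimedia_text, column="identifier"):
    """List the links in a tab-delimited multimedia file that use private IPs."""
    reader = csv.DictReader(io.StringIO(multimedia_text), delimiter="\t")
    return [
        row[column]
        for row in reader
        if uses_rfc1918_address(row.get(column) or "")
    ]


# Invented sample content, for illustration only.
sample = (
    "coreid\tidentifier\n"
    "ev1\thttp://192.168.1.20/audio/a.wav\n"
    "ev2\thttps://example.org/audio/b.wav\n"
)
flagged = private_links(sample)
```

The check does not resolve DNS names, so a hostname that only resolves inside the publisher's network would still slip through.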