The video is available here: Health data publishing on Vimeo
Here is the transcript of the questions asked during the session.
Should we highlight the datasets that are relevant for a particular theme? For example, we have a lot of data about disease vectors, venomous species, etc., but they aren't highlighted in any specific way. How can we make it easier for users to find those datasets? Should we use specific keywords in the metadata, for example?
We don't currently have specific guidelines on how to improve the discoverability of health-related datasets. The first step would be to provide metadata that is as complete as possible. For example, as a medical entomologist interested in health-related data, I (Paloma) would look for datasets with words such as "infectious diseases" or "parasite", which are specific and not commonly used in data repositories.
We will be working on a list of keywords, terms like "surveillance" or "parasites", that can be used to tag and search for relevant datasets. We will communicate our recommendations as soon as possible.
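As a rough illustration of where such keywords end up: dataset metadata on GBIF is expressed in EML, where keywords sit in a keywordSet element. The Python sketch below builds one with the standard library. The keyword list is hypothetical (our recommended list does not exist yet) and "N/A" is a placeholder thesaurus name; in practice the IPT metadata editor generates this element for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword list for illustration only; GBIF has not yet
# published recommended keywords for health-related datasets.
HEALTH_KEYWORDS = ["surveillance", "parasites", "infectious diseases", "disease vectors"]


def build_keyword_set(keywords, thesaurus="N/A"):
    """Return an EML <keywordSet> element with one <keyword> per term."""
    keyword_set = ET.Element("keywordSet")
    for term in keywords:
        ET.SubElement(keyword_set, "keyword").text = term
    # The thesaurus names the vocabulary the terms come from ("N/A" is a placeholder).
    ET.SubElement(keyword_set, "keywordThesaurus").text = thesaurus
    return keyword_set


if __name__ == "__main__":
    print(ET.tostring(build_keyword_set(HEALTH_KEYWORDS), encoding="unicode"))
```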
On a related topic: we are trying to segment GBIF data thematically by community, so that we can tell you which part of GBIF is health-related. Right now there is no clear way to do this. We have worked on some criteria to help identify health-relevant data: a combination of taxon filters and publisher identifiers. We should consider using tags provided by publishers as well. This could allow us to create thematic reports in the future.
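To make the criteria above more concrete, here is a minimal sketch of how a taxon filter and publisher identifiers could be combined in a query to the public GBIF occurrence search API (https://api.gbif.org/v1/occurrence/search), using its taxonKey and publishingOrg parameters. The keys in the usage example are hypothetical placeholders, not our actual criteria, and the sketch only builds the URL; it does not call the API.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def health_occurrence_query(taxon_keys, publishing_orgs, limit=20):
    """Build an occurrence search URL filtering on taxa and publishing organisations."""
    params = {
        "taxonKey": list(taxon_keys),
        "publishingOrg": list(publishing_orgs),
        "limit": limit,
    }
    # doseq=True expands lists into repeated parameters,
    # e.g. taxonKey=111&taxonKey=222.
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params, doseq=True)}"


if __name__ == "__main__":
    # Hypothetical taxon keys and publisher UUID, for illustration only.
    print(health_occurrence_query([111, 222], ["00000000-0000-0000-0000-000000000000"]))
```

As far as we understand the API, different parameters are combined with AND and repeated values of one parameter with OR, so this query returns occurrences of the listed taxa from the listed publishers. If the criteria are meant as a union instead, two separate queries would be needed.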
When looking at the examples presented, a lot of the data shown seem like they could contribute to species trait information. How could we make this information available at the species level? Could this be integrated into the new data model?
This isn't something that we have been working on so far. It is possible to aggregate information from occurrences on GBIF species pages (for example, geolocated occurrences are displayed on maps and type specimen occurrences are emphasised); however, this requires the data to be standardised in a specific way. Right now, this would be difficult to achieve with health-related data. For example, there are several ways to model host-parasite relationships (with and without an extension), so it would be difficult to extract the information from occurrences automatically.
This could possibly be different with the outcome of the work on the new data model (particularly the work on biotic interactions).
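To illustrate why automatic extraction is hard, here is a small Python sketch that tries to pull (relationship, taxon) pairs out of dwc:associatedTaxa values. It assumes entries are separated by a vertical bar and written as relationship: name, with or without quotes; anything else, such as free text, cannot be interpreted safely and is returned without a relationship. This is a toy parser for illustration, not something GBIF runs.

```python
import re

# Accepts both '"host":"Salmo salar"' and 'host: Salmo salar' (assumed formats).
_PATTERN = re.compile(r'^"?(?P<rel>[A-Za-z ]+)"?\s*:\s*"?(?P<name>[^"|]+?)"?$')


def extract_relationships(associated_taxa):
    """Split an associatedTaxa value into (relationship, name) pairs.

    Entries that do not follow a 'relationship: name' pattern are returned
    with relationship None, because their meaning cannot be inferred safely.
    """
    pairs = []
    for entry in associated_taxa.split("|"):
        entry = entry.strip()
        if not entry:
            continue
        match = _PATTERN.match(entry)
        if match:
            pairs.append((match.group("rel").strip().lower(), match.group("name").strip()))
        else:
            pairs.append((None, entry))
    return pairs


if __name__ == "__main__":
    for value in ['"host":"Salmo salar"', "host: Salmo salar", "found on Salmo salar"]:
        print(value, "->", extract_relationships(value))
```

Only the first two values yield a usable host; the free-text one does not, which is the kind of heterogeneity that makes aggregation at the species level unreliable.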
We have a vector dataset, and working with the resourceRelationship extension is quite challenging. I see that you are using the dwc:associatedTaxa (https://dwc.tdwg.org/terms/#dwc:associatedTaxa) field in your examples. Which is better: the resourceRelationship extension or the associatedTaxa field?
The answer depends on the complexity of the host data you have. For example, if you only have the host species name, you can simply use the associatedTaxa field. If you have more complex information, you should consider the dwc:dynamicProperties field or an extension. It really depends on your data.
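As a minimal sketch of the two simpler options, the hypothetical helper below builds a flat occurrence row for a parasite, with the host name in associatedTaxa and, optionally, extra details as a JSON string in dynamicProperties. The "relationship":"name" pattern follows the style of the examples in the Darwin Core term definition; the keys inside dynamicProperties (hostLifeStage, infectionIntensity) are made up for the example, since that field has no fixed vocabulary, and the basisOfRecord default is a placeholder.

```python
import json


def occurrence_with_host(occurrence_id, parasite, host, extra=None,
                         basis_of_record="PreservedSpecimen"):
    """Return a flat Darwin Core occurrence row for a parasite recorded on a host."""
    row = {
        "occurrenceID": occurrence_id,
        "basisOfRecord": basis_of_record,
        "scientificName": parasite,
        # Simple case: only the host name is known.
        "associatedTaxa": f'"host":"{host}"',
    }
    if extra:
        # More complex host information goes into dynamicProperties as a JSON string.
        row["dynamicProperties"] = json.dumps(extra, sort_keys=True)
    return row


if __name__ == "__main__":
    # Gyrodactylus salaris on Atlantic salmon; the extra keys are hypothetical.
    print(occurrence_with_host("occ-1", "Gyrodactylus salaris", "Salmo salar",
                               extra={"hostLifeStage": "parr", "infectionIntensity": 12}))
```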
Note that, currently, most extensions aren't available in the download formats generated by the occurrence download interface.
If each host and each parasite has its own occurrence (linked with a relationship extension), we would encourage you to put them all in the same dataset so users can download everything together.
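A minimal sketch of that layout: hosts and parasites as rows of one occurrence core, linked by rows of the ResourceRelationship extension, both written as tab-separated files of the same archive. The coreid column links each extension row to its core record; "parasite of" is an illustrative value for relationshipOfResource, the occurrence IDs are hypothetical, and basisOfRecord is a placeholder. A real archive also needs a meta.xml descriptor and EML metadata, which the sketch leaves out.

```python
import csv
import io

OCCURRENCE_FIELDS = ["occurrenceID", "scientificName", "basisOfRecord"]
RELATIONSHIP_FIELDS = ["coreid", "resourceRelationshipID", "resourceID",
                       "relationshipOfResource", "relatedResourceID"]


def _to_tsv(fields, rows):
    """Serialise a list of dicts as a tab-separated table with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def host_parasite_tables(pairs):
    """Build occurrence core and ResourceRelationship extension tables.

    pairs: iterable of (parasite_id, parasite_name, host_id, host_name).
    Returns (occurrence_tsv, relationship_tsv).
    """
    occurrences, relationships, seen = [], [], set()
    for i, (par_id, par_name, host_id, host_name) in enumerate(pairs, start=1):
        # Each organism becomes its own occurrence; a host shared by
        # several parasites is only written once.
        for occ_id, name in ((par_id, par_name), (host_id, host_name)):
            if occ_id not in seen:
                seen.add(occ_id)
                occurrences.append({"occurrenceID": occ_id, "scientificName": name,
                                    "basisOfRecord": "PreservedSpecimen"})
        relationships.append({"coreid": par_id, "resourceRelationshipID": f"rr-{i}",
                              "resourceID": par_id, "relationshipOfResource": "parasite of",
                              "relatedResourceID": host_id})
    return _to_tsv(OCCURRENCE_FIELDS, occurrences), _to_tsv(RELATIONSHIP_FIELDS, relationships)


if __name__ == "__main__":
    occ_tsv, rel_tsv = host_parasite_tables([
        ("occ-p1", "Gyrodactylus salaris", "occ-h1", "Salmo salar"),
        ("occ-p2", "Gyrodactylus salaris", "occ-h1", "Salmo salar"),
    ])
    print(occ_tsv)
    print(rel_tsv)
```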
In any case, don't hesitate to contact health@gbif.org; we can help you map your data.
We usually advise publishers to publish parasites and hosts as separate occurrences, but it is a lot of work for them. It would be easier to publish only hosts or only parasites as occurrences and mention the other species in the associatedTaxa field. What would be best?
Ideally, publishers should share as much as possible; this is valuable when it comes to the "one health" approach. Right now, on GBIF.org, there is no way to search occurrences by the value of the associatedTaxa field. This means, for example, that if you published occurrences for parasites only, users would have no way to find those occurrences by searching for the name of the host. If you want both hosts and parasites to be discoverable, they both have to be published as occurrences. This could perhaps change with the new data model, but we don't know yet what will be possible.
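A toy model of this discoverability issue: the hypothetical search function below, like the current occurrence search described above, only matches an occurrence's own scientificName and ignores associatedTaxa. Searching for the host finds something only once the host is published as an occurrence in its own right. The records are invented for illustration.

```python
# Hypothetical records: a parasite published as an occurrence,
# with its host mentioned only in associatedTaxa.
PARASITE_ONLY = [
    {"occurrenceID": "occ-p1", "scientificName": "Gyrodactylus salaris",
     "associatedTaxa": '"host":"Salmo salar"'},
]
# The same dataset with the host also published as its own occurrence.
BOTH_PUBLISHED = PARASITE_ONLY + [
    {"occurrenceID": "occ-h1", "scientificName": "Salmo salar", "associatedTaxa": ""},
]


def search_by_name(records, name):
    """Toy search: match on scientificName only, never on associatedTaxa."""
    return [r["occurrenceID"] for r in records if r["scientificName"] == name]


if __name__ == "__main__":
    print(search_by_name(PARASITE_ONLY, "Salmo salar"))   # []
    print(search_by_name(BOTH_PUBLISHED, "Salmo salar"))  # ['occ-h1']
```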
It would be interesting to expose these health data in GloBI. Do we know whether this DwC-A format allows ingestion into GloBI (https://www.globalbioticinteractions.org/)?
This forum thread mentions GloBI being able to ingest data from DwC-A: Field Museum and iNaturalist Extending Specimen through DwC Resource Relationships - #9 by jhpoelen. Several steps seem to be needed for this to happen, so it isn't direct; please check the GloBI documentation.
It would make sense to make sure that interaction datasets published on GBIF are also compatible with GloBI, especially in the context of the new data model (having a standard that works on both platforms).
From Norway, we also have health data for organisms other than humans, such as Gyrodactylus on salmon (https://doi.org/10.15468/rcouob), and we are interested in learning best practices for exposing them.
We don't currently have specific recommendations for non-human hosts. Having concrete examples at hand will be very helpful for developing best-practice documents, thank you.
We are thinking of running one or two webinars with our publishers on best practices for publishing health data. Can we reuse the material provided by GBIF during workshop training? Do I need explicit permission?
It depends on the material concerned. Most GBIF training materials and guides are published under a license, so you should check the licenses associated with the documents that you would like to use. For example, the license for the GBIF Data Mobilization course is available here and the license for the DNA-derived data publishing guide is available here. If in doubt, you are welcome to email us and we can help you find the rightful owner. If you would like to advertise your webinar on GBIF, you can use this form to create an event page: Suggest an event for the GBIF.org calendar