Community Metrics

Join us for the next technical support hour for GBIF nodes on April 2nd at 4 pm CEST, where the topic is Community Metrics.

Community Metrics is a new initiative to develop metrics, indicators and time series data products from GBIF-mediated data that can better support decision-making. The aim is to build on what is currently provided via GBIF's data analytics and on work that the community has already done in this area. In the session, Andrew Rodrigues and John Waller present the idea behind the initiative and ways that the node community can actively contribute via the CommunityMetrics GitHub repository.

We will be happy to answer any questions, whether or not they relate to the topic. Please feel free to post questions in advance in this thread or write to helpdesk@gbif.org.

Very interesting subject, thanks.
Developing new metrics is really important but should take into account the inevitable biases of observations.
Occurrences published on GBIF only represent an infinitesimal part of what can be observed on earth. Taking into account over-monitored and under-monitored species or sites is crucial to deliver valuable indicators. Therefore, measuring the monitoring efforts of our community and the well-known temporal, geographical and taxonomic gaps is essential.
Happy to further discuss this during the next technical support hour.

Thanks André for this. In any kind of analysis there will be underlying assumptions and biases that we will have to deal with. Some of these will be easier to address than others, but what will be important is supporting people with the interpretation of those metrics, which is why we want to have fully documented and open processes around their development. We currently have no predefined idea of what metrics we might want to develop, but it may well be in the scope of this work to consider whether we can develop a metric that lets us evaluate to what extent new data coming into GBIF are addressing some of those inherent biases. Right now everything is up for discussion :)

The video recording is available here: Community Metrics on Vimeo

Here is a transcript of the Q&A:

Is there a way to integrate these metrics on the hosted portals?
Not at the moment. Some metrics were derived from the B3 project (https://b-cubed.eu) and we have started some conversations around integrating some of these metrics into hosted portals. We are still in an exploratory phase.

Would it be possible to create a report on a country + territories? Such as France and overseas.
This is an open question. Should country metrics include overseas territories or should each territory have its own metrics?

We need both metrics for individual territories as well as combined metrics.
As long as the territories concerned have ISO codes, we should be able to make this type of metric.
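As a rough sketch of the "individual as well as combined" idea, the Python below builds one count query per ISO 3166-1 alpha-2 code against the public GBIF occurrence search API and sums the results. The grouping for France is illustrative and incomplete, and the helper names are hypothetical; the `country` and `limit` query parameters and the `count` response field are those of the GBIF API v1 occurrence search, as far as we are aware.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.gbif.org/v1/occurrence/search"

# Illustrative, incomplete grouping (an assumption for this sketch):
# metropolitan France plus some overseas territories with their own ISO codes.
FRANCE_GROUP = ["FR", "GP", "MQ", "GF", "RE", "YT"]


def count_url(iso_code):
    """Build a search URL that returns only the record count (limit=0)."""
    query = urllib.parse.urlencode({"country": iso_code, "limit": 0})
    return f"{API}?{query}"


def fetch_count(iso_code):
    """Fetch the occurrence count for one ISO code (needs network access)."""
    with urllib.request.urlopen(count_url(iso_code), timeout=30) as resp:
        return json.load(resp)["count"]


def summarise(counts):
    """Return per-territory counts together with a combined total."""
    return {"by_territory": dict(counts), "combined": sum(counts.values())}
```

Calling `summarise({code: fetch_count(code) for code in FRANCE_GROUP})` would then give both views from the same set of requests, so the choice between separate and combined reporting becomes a presentation decision, not a data one.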

I would like to have a list of all the new datasets published in a given year (not just examples) in the country reports. Is there a way to get a full list?
Thanks, we will keep it in mind.

Should we start tagging the ideas on GitHub?
Yes! As ideas get logged, we hope to refine tags to facilitate sorting of the GitHub issues.

Will the community metrics repository be open for any collaborator or would it be only for nodes? Or should there be alternative ways to gather community input? For example, a specific GBIF address?
We could explore alternatives to GitHub, for example the Discourse forum (https://discourse.gbif.org) or a mailbox.

In our country, we collect input by email before dispatching it to the relevant channel(s). It works quite well.
Thank you for sharing. Having the nodes collect input for their community before logging the ideas on GitHub would be the ideal model for us.

Different topic: How are quality checks applied to those metrics? For example, the number of species based on occurrences for our region is actually higher than our national checklist. Would GBIF use national checklists to produce better-quality metrics? If the figures don’t correspond to the numbers expected by the experts, this will create a layer of noise.
Thank you, this is a helpful comment. When we start putting out these analytics, we will have to be very transparent and clearly document the workflows used to produce those metrics. We want to avoid producing maps and visuals without any context. It is important to highlight the limitations of such metrics.
A similar question is how to address the challenge of uneven sampling methods.
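One simple check along the lines of the checklist question can be sketched as follows, assuming both the occurrence-derived species list and the national checklist are available as plain lists of scientific names. The function and the example names are hypothetical, and matching on lower-cased strings is a deliberate simplification: real name matching would need synonym and authorship handling.

```python
def compare_with_checklist(occurrence_species, checklist_species):
    """Split species names into shared, occurrence-only and checklist-only."""
    # Crude normalisation only: trim whitespace and ignore case.
    occ = {name.strip().lower() for name in occurrence_species}
    chk = {name.strip().lower() for name in checklist_species}
    return {
        "shared": sorted(occ & chk),
        "only_in_occurrences": sorted(occ - chk),
        "only_in_checklist": sorted(chk - occ),
    }


# Made-up example data, for illustration only.
report = compare_with_checklist(
    ["Parus major", "Vulpes vulpes", "Lynx lynx"],
    ["Parus major", "Vulpes vulpes", "Lutra lutra"],
)
```

Listing the names behind a mismatch, instead of only the two totals, gives experts something concrete to review and document alongside the metric.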

What about indicators at regional level?
The context for this work is national-level reporting. However, some of these metrics could be applied at different levels. This is another topic open for discussion.

How about exploring metrics that help national reporting to CBD? I assume there should be universal ones that all nodes will find useful? I don’t have any clear idea in mind though.
All the work on analytics aims to support Target 21. It would be nice if we could start tagging analytics ideas with the corresponding target from the Kunming-Montreal Global Biodiversity Framework. For example, analytics on invasive species would correspond to Target 21 but also Target 6. Flagging these links in the repository would be useful.

What is the timeline for those metrics?
Ideally, we would want to present something at the next GBIF Governing Board in October 2025. So it would be great if you could start logging some ideas now. Another deadline is the B3 project (https://b-cubed.eu/) which finishes next year. By that time, we would like to have a clear workflow established to create some indicators.
The timeline also depends on what the community has.

What programming language are you thinking about for those metrics?
The language of implementation isn’t as important as the idea of the metrics (and whether it has support). In theory, any programming language could work. In practice, some might pose challenges during implementation (we won’t necessarily know from the start). The most important thing is to start with the idea. The GBIF SQL download functionality already allows a lot to be done.
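To give a flavour of what the SQL download functionality can express, here is a minimal sketch that only builds the JSON body of a request for a per-year occurrence count of one country. The helper name is hypothetical, and the field names (`format`, `sql`, `sendNotification`), the `SQL_TSV_ZIP` format value and the endpoint reflect our reading of the GBIF SQL download documentation, so they should be checked against it before use.

```python
import json

# Endpoint as we understand it from the GBIF API documentation (assumption).
DOWNLOAD_ENDPOINT = "https://api.gbif.org/v1/occurrence/download/request"


def yearly_count_request(iso_code):
    """Build the JSON body for a per-year occurrence count of one country."""
    # Accept only two ASCII letters so nothing else ends up inside the SQL.
    if not (len(iso_code) == 2 and iso_code.isascii() and iso_code.isalpha()):
        raise ValueError("expected a two-letter ISO country code")
    sql = (
        'SELECT "year", COUNT(*) AS occurrences '
        "FROM occurrence "
        f"WHERE countryCode = '{iso_code.upper()}' "
        'GROUP BY "year"'
    )
    return {"sendNotification": False, "format": "SQL_TSV_ZIP", "sql": sql}


body = json.dumps(yearly_count_request("fr"), indent=2)
```

The resulting `body` would be sent as a POST to `DOWNLOAD_ENDPOINT` with GBIF account credentials. The point is that a time series metric can often be described as a single query before any choice of programming language is made.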

When can we start logging issues?
Now is a good time to start. It would be great to see some activity on the repository. If you have some ideas on how best to handle the communication with your networks, please get in touch as well.

How do I currently get a node report from GBIF?
Country reports can be downloaded from country pages. For example, go to United Kingdom of Great Britain and Northern Ireland and click on “Activity report”.

What should be the frequency of the metrics produced? Yearly, monthly, daily, live?
Currently, we are producing analytics on a quarterly basis, but the country reports are done annually. The periodicity of the analytics should be specified when logging the ideas on GitHub.

Note that we (GBIF Nodes and Secretariat) are in the very early stages of planning the Node training for our meeting in Colombia in October, and some of the sessions will touch on what was discussed today. Here are the initial topics (to give you a flavor):

  • emerging publishing models
  • data access and reporting
  • metabarcoding and the metabarcoding toolkit
  • hosted portals

We have already implemented CamTrapDP and ColDP. But what is the timeframe for emerging models? People in our network are asking how they can participate in the development/implementation of this work.
We proposed a session at the Datos Vivos conference (in Bogota in October): Living Data 2025 - Conference Sessions (ID: 6798937, “Darwin Core Data Package: an updated model and format for exchanging biodiversity data”). A month before, the model will be presented and open for public review. We will have the opportunity to discuss it during the conference. The review will be open a month after the conference. The work is in progress at the moment.

Do we know if the node training will have parallel sessions or would anyone be able to attend everything?
We don’t know yet; there may be an opportunity for a couple of parallel sessions. Typically, each topic will be a half-day session. This year the training will be technical and we will aim to have introductory and advanced options.

Different hosted portals use different iconography on their front pages. I really like the icons on the French hosted portal (https://www.gbif.fr); they are hosted locally and come from https://icones8.fr/. Could there be a common library or set of icons that anyone can use for hosted portals?
You can use whichever icons you like. We don’t have such a library available, but you are welcome to use the ones from GBIF.org. A suggestion from the chat: check https://thenounproject.com/. You can also explore hosted portals and see if there are any you like. If you are interested in hosted portals, you are welcome to attend the related community calls where such matters are discussed. Don’t hesitate to consult GBIF hosted portals to find the next date.

More and more people seem to download and clean up data from GBIF, then build a database and publish a paper. What is the best approach to get those changes back to GBIF? Would there be an annotation tool?
We are running a pilot for a rule-based annotation system. There is going to be an event at the SPNHC conference 2025 about it: Workshops | Society for the Preservation of Natural History Collections (SPNHC) (Workshop 1 (WKSHO1)).

If anyone would find it useful to make downloadable reports from the GBIF data validator, please vote and comment here: https://github.com/gbif/portal-feedback/issues/5750
Note that for now, you can publish on the GBIF test website (https://www.gbif-uat.org) and download the annotated archive.