Statistics and graphics for Nodes

mgrosjean · April 21, 2023, 11:17am

Watch the recording here:
https://vimeo.com/manage/videos/835760197

In the May session of the support hour, the Data Products team will help you find and generate statistics and graphics for reporting on publishing and usage activities by Nodes, publishers and projects. The main goal is to let you know what is available where, and how to access it.

Nodes can either ask a question by replying to this Discourse forum post or write to helpdesk@gbif.org.

kdpearso · April 26, 2023, 2:06pm

Looking forward to it! I always have trouble finding the Zoom link… Can you remind me where it is posted?

mgrosjean · April 26, 2023, 2:12pm

Hi @kdpearso, the session is by invitation, which should have been sent to your Node manager. I will forward it to you.

vechocho · April 29, 2023, 12:08am

Thank you very much, this would be very helpful. I’d like to know how to use the API, or some other easy way, to get data for the country at the:

Species level
Administrative areas

Maybe some way to use the API (as I don’t know anything about programming)

Generate graphics (R scripts maybe?)

I always admire the way the @daiesco team makes those amazing reports for Colombia.

mgrosjean · May 1, 2023, 12:40pm

Hi @vechocho, thank you for the questions. We will try to cover the API for metrics in the session. If you have time before Wednesday and haven’t already done so, I strongly recommend checking our recorded webinar introducing the API: Data Use Club Practical Sessions: Introduction to the API, rgbif and pygbif on Vimeo

It will be really helpful to understand how to get started and what you might need.

ymgan · May 2, 2023, 12:57pm

Thank you for organizing! We are interested in:

cumulative graph showing the number of datasets published per year per node
amount of data published per country per area: we are thinking about regional differences (e.g. MEASO areas) in the amount of data available
statistics on data use
number of downloads per record and per dataset (could also be per node)
number of publications that use data from a node

Thank you and see you tomorrow!

estebanmhGBIF · May 4, 2023, 2:46pm

Thank you very much @mgrosjean for the space. I was wondering if you could share with us the Word document you used in the presentation, so we have the links and can explore them further.

Also, from SiB Colombia we want to share this Python script that we have built. It retrieves all resources published by a country and extracts some metrics (number of records per dataset, citations by organization, publication date and last modified date). It is really tailored to our needs, but it has comments (in Spanish) throughout. We hope it is useful to someone or can serve as inspiration.

vechocho · May 5, 2023, 2:56pm

Thank you very much, Esteban.

mgrosjean · May 11, 2023, 6:43pm

Thank you @estebanmhGBIF ! We will share the document, transcript of the questions as well as the edited video as soon as we can. We are a bit behind schedule.

mgrosjean · June 27, 2023, 1:30pm

The document that was used in the video with all the links for the examples:

Country page presentation and reminder: https://www.gbif.org/country/EC/summary
Another way of viewing data from a country, by way of the occurrence search (highlighting custom filters): https://www.gbif.org/occurrence/charts?publishing_country=EC&advanced=1&occurrence_status=present
1. Not an official feature – but a neat trick to make custom charts:
  
  i. Send a list of the datasets whose occurrences appear in a search:
  a. Replace “search” in the URL with “datasets”: https://www.gbif.org/occurrence/search?country=EC&taxon_key=6 becomes https://www.gbif.org/occurrence/datasets?country=EC&taxon_key=6
  b. Make a custom chart with the parameter “d=datasetKey”: https://www.gbif.org/occurrence/charts?country=EC&taxon_key=6&d=datasetKey
  
  ii. Make and send interactive custom charts:
  The parameters are hidden in the URLs: you can send a link to someone, but the parameters won’t be visible in their web browser once they click on it. The charts accept three parameters:
  - d: the first dimension (datasetKey, country, month, speciesKey, etc.)
  - d2: the second dimension
  - t: the type of chart (TABLE|COLUMN|PIE|LINE)
  You can combine these to create different charts, for example:
  - https://www.gbif.org/occurrence/charts?country=EC&taxon_key=6
  - https://www.gbif.org/occurrence/charts?country=EC&taxon_key=6&d=speciesKey&d2=month&t=COLUMN
  - https://www.gbif.org/occurrence/charts?country=EC&taxon_key=6&d=basis_of_record&t=PIE
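Because the chart parameters are ordinary query-string parameters, these links can also be generated from a script, for example to produce the same set of charts for several countries. A minimal Python sketch, assuming only the d, d2 and t parameters described above (this remains an unofficial feature, so the URL format may change); the function name chart_url is hypothetical:

```python
from urllib.parse import urlencode

# Occurrence charts view on gbif.org, as in the example links above.
CHARTS_BASE = "https://www.gbif.org/occurrence/charts"

# Chart types listed above for the "t" parameter.
CHART_TYPES = {"TABLE", "COLUMN", "PIE", "LINE"}


def chart_url(filters, d=None, d2=None, t=None):
    """Build a gbif.org chart link from occurrence filters plus the d/d2/t parameters."""
    if t is not None and t not in CHART_TYPES:
        raise ValueError(f"unknown chart type: {t!r}")
    params = dict(filters)
    # Only add the chart parameters that were actually given.
    for name, value in (("d", d), ("d2", d2), ("t", t)):
        if value is not None:
            params[name] = value
    return f"{CHARTS_BASE}?{urlencode(params)}"


# Species per month for Ecuador (taxon_key=6 as in the examples), as a column chart.
print(chart_url({"country": "EC", "taxon_key": 6}, d="speciesKey", d2="month", t="COLUMN"))
```

Looping over a list of country codes with the same d/d2/t values gives one comparable chart link per country.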
https://analytics-files.gbif.org/: Access data and work with it yourself → country → Ecuador → etc.
Dataset search interface: https://www.gbif.org/dataset/search?q= (projects, datasets, CSV download)
Literature search interface: https://www.gbif.org/resource/search?contentType=literature (multiple datasets: https://www.gbif.org/resource/search?contentType=literature&gbifDatasetKey=c3413793-cd8e-4f74-b0b7-e1f0c155102c&gbifDatasetKey=d1756e5c-0667-4b48-bdbb-3d59abf2bcab&gbifDatasetKey=e898796e-d96c-426f-aa67-dc4476b3e37f&gbifDatasetKey=b4c773d1-9360-4cf3-a01b-42ecfb98bdd6&gbifDatasetKey=c7bcf931-2494-4206-91b0-bbd00700cae2)
Download activity report from publisher page: https://www.gbif.org/publisher/1b5350a0-e445-4a39-b9ae-07d6b152c204
Download usage statistics by country: https://www.gbif.org/developer/occurrence

Occurrence API facets for breakdown of numbers by taxon or area, etc.
https://api.gbif.org/v1/occurrence/search?country=EC&taxonKey=1&facet=year&limit=0
https://api.gbif.org/v1/occurrence/search?country=EC&taxonKey=1&facet=gadmLevel1Gid&limit=0&facetLimit=100
Plausible · gbif.org: website traffic statistics for gbif.org (similar to Google Analytics)
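The facet links above return counts only (limit=0 means no occurrence records are sent), which makes them convenient for scripted reporting, for example the cumulative per-year graph requested earlier in this thread. A minimal Python sketch using only the standard library, assuming the response layout of the occurrence search API (a "facets" list whose entries hold "counts" with "name" and "count"); the function names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.gbif.org/v1/occurrence/search"


def facet_counts(filters, facet, facet_limit=100):
    """Return {value: count} for one facet of an occurrence search (no records fetched)."""
    params = dict(filters, facet=facet, limit=0, facetLimit=facet_limit)
    with urlopen(f"{API}?{urlencode(params)}") as resp:
        data = json.load(resp)
    return parse_facets(data)


def parse_facets(data):
    """Flatten the 'facets' block of a search response into a dict.

    Assumes a single facet was requested, so facet values cannot collide.
    """
    counts = {}
    for block in data.get("facets", []):
        for entry in block["counts"]:
            counts[entry["name"]] = entry["count"]
    return counts


def cumulative_by_year(year_counts):
    """Turn per-year counts into (year, running total) pairs, sorted by year."""
    total = 0
    running = []
    for year in sorted(year_counts, key=int):
        total += year_counts[year]
        running.append((int(year), total))
    return running
```

For example, cumulative_by_year(facet_counts({"country": "EC", "taxonKey": 1}, "year")) mirrors the first facet link above and gives running totals ready to plot. The same facet_counts call with "gadmLevel1Gid" mirrors the second link and gives a breakdown by first-level administrative area.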

The questions and answers during the session

You mentioned aggregation of metrics by project, but there aren’t many project pages on GBIF. How do you get a project registered so that data can be searched by project identifier?

Project pages like this one are for projects that receive GBIF-mediated funds (for example, projects in the context of BID or BIFA). They allow us to keep track of the projects and aggregate metrics easily. We don’t create project pages for every project.
With that in mind, you can still use project identifiers. You can add a project identifier to a dataset in the projectID section of the dataset metadata. This allows you to search occurrences, datasets and citations by that project identifier, whether or not there is a project page. For example, the project identifier Boyaca_BIO isn’t associated with any project page. Yet you can find all the datasets associated with that identifier here, all the occurrences here and all the citations here.
In other words, you don’t need to have a project page in order to use a project identifier.
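The same project identifier can also be used as a filter in the API to count occurrences and datasets for a project. A hedged sketch, assuming the occurrence and dataset search endpoints accept a projectId parameter; Boyaca_BIO is the example identifier from the answer above, and project_urls is a hypothetical helper:

```python
from urllib.parse import urlencode

API = "https://api.gbif.org/v1"


def project_urls(project_id):
    """Build API search URLs filtered on a project identifier (assumed `projectId` parameter)."""
    return {
        # Occurrence search with limit=0: read the "count" field for the number of occurrences.
        "occurrences": f"{API}/occurrence/search?{urlencode({'projectId': project_id, 'limit': 0})}",
        # Dataset search: "count" is the number of datasets; "results" lists them page by page.
        "datasets": f"{API}/dataset/search?{urlencode({'projectId': project_id})}",
    }


for kind, url in project_urls("Boyaca_BIO").items():
    print(kind, url)
```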

Is it possible to download a list of species for a dataset?

Yes. Open the dataset’s occurrences, go to the Download tab and select the species list format. This works with any occurrence query.
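The same species list can also be requested through the occurrence download API, which is handy when the list has to be regenerated regularly. A minimal sketch, assuming a GBIF.org account (user name, e-mail and password are placeholders) and the download request format with a SPECIES_LIST format and an equals predicate on DATASET_KEY; the helper names are hypothetical:

```python
import base64
import json
from urllib.request import Request, urlopen

DOWNLOAD_API = "https://api.gbif.org/v1/occurrence/download/request"


def species_list_request(dataset_key, user, email):
    """Request body for a SPECIES_LIST download of every occurrence in one dataset."""
    return {
        "creator": user,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SPECIES_LIST",
        "predicate": {"type": "equals", "key": "DATASET_KEY", "value": dataset_key},
    }


def submit(body, user, password):
    """POST the request with HTTP basic auth and return the response body (the download key)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = Request(
        DOWNLOAD_API,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.read().decode().strip()
```

Replacing the predicate with any other occurrence filter gives a species list for that query instead, in line with the answer above.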

My question relates to the GBIF occurrence search using verbatim scientific names. If I understand correctly, the verbatim scientific name field contains the value as provided in the scientific name field by the publisher. I can see that there is a Darwin Core term named verbatimIdentification which seems very similar. What are the differences?

The verbatimScientificName field isn’t a Darwin Core term. We created it for the practical purpose of making the original name provided to GBIF searchable. This is especially useful in cases for which there is no match in the GBIF backbone taxonomy. The occurrences can still be found using the scientific name provided.
The verbatimIdentification field is used to share the original identification associated with the record (before any normalization or update of the name), so it might differ from the scientific name. We don’t match the content of the verbatimIdentification field to the GBIF backbone taxonomy.
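In the occurrence search API the difference shows up as two separate filters: one on the name as supplied by the publisher, and one on the name matched to the GBIF backbone. A short sketch, assuming the verbatimScientificName and scientificName parameters of the occurrence search; the species name and the helper name_search_urls are illustrative only:

```python
from urllib.parse import urlencode

SEARCH = "https://api.gbif.org/v1/occurrence/search"


def name_search_urls(name, limit=20):
    """Contrast a search on the publisher's original name with one on the backbone-matched name."""
    return {
        # Matches the text exactly as supplied, even when it has no backbone match.
        "verbatim": f"{SEARCH}?{urlencode({'verbatimScientificName': name, 'limit': limit})}",
        # Matches the interpreted name, resolved against the GBIF backbone taxonomy.
        "backbone": f"{SEARCH}?{urlencode({'scientificName': name, 'limit': limit})}",
    }


print(name_search_urls("Puma concolor")["verbatim"])
```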

Is there a way to capture relationships between publishers? For example, several publishers belonging to the same parent organisation? Should we just have the acronym of the parent organisation in the publisher title?

We don’t have any guidelines for this specific case. For practical purposes, keeping the acronym of the parent institution in the publisher title would be the easiest. We also have machineTags that could be used in some cases, but they might not suit your needs. For now, we can discuss it on an ad hoc basis. If you have specific cases, please send us an email at helpdesk@gbif.org.

If someone wants to apply to create a GBIF-hosted portal, is it as simple as filling out the application and having the Node’s support? Is there anything else to keep in mind?

Yes. The main technical requirement is that the data must be available and searchable on GBIF (for example, all the occurrences in a geographic area or network). If the data isn’t searchable, a portal might be more difficult (or sometimes impossible, depending on the request). Otherwise, the application process is as easy as it looks.

Will the extensions be downloadable in the download interface?

Yes, some occurrence extensions will be downloadable, either only through the download API or through both the API and the web interface. Note that this concerns only the following extensions:

What about a trait and Plinian Core extension for a Taxon core?

It isn’t planned. We suggest logging a GitHub issue to let us know what would be of interest to the community.

How do publishers share trait data?

It depends on whether we are talking about traits and characteristics of a particular taxon or about measurements on specimens. See this thread for more discussion: https://discourse.gbif.org/t/publishing-trait-data-for-plants-is-it-possible-species-level-traits-from-literature-traits-measured-in-the-lab/

You showed the country pages with associated metrics. Is there an equivalent for a hosting organization?

There is no country page equivalent for non-country Nodes. If the Node is also a hosting organization, you can go to the organization page and find metrics there. For example:

here are metrics for all the occurrences hosted by the same organization
here are all the datasets hosted by the same organization

If your Node is not also a hosting organization, it is a bit more challenging. We don’t have “endorsing node” statistics. You have to get the keys of all the publishers endorsed by your Node and generate the statistics yourself.
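The two steps above (collect the endorsed publisher keys, then compute statistics per publisher) can be sketched as follows. This assumes the registry endpoint /node/{key}/organization, which lists the publishers endorsed by a Node in pages with an endOfRecords flag, and the publishingOrg filter of the occurrence search; the Node key is a placeholder and the helper names are hypothetical. The fetch function is passed in so the logic can be tested without network access:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://api.gbif.org/v1"


def get_json(url):
    """Fetch a URL and decode the JSON response."""
    with urlopen(url) as resp:
        return json.load(resp)


def endorsed_publishers(node_key, fetch=get_json, page_size=100):
    """Page through /node/{key}/organization and return the endorsed publisher keys."""
    keys, offset = [], 0
    while True:
        query = urlencode({"limit": page_size, "offset": offset})
        page = fetch(f"{API}/node/{node_key}/organization?{query}")
        keys.extend(org["key"] for org in page["results"])
        # Stop on the last page, or defensively if a page comes back empty.
        if page.get("endOfRecords", True) or not page["results"]:
            return keys
        offset += page_size


def occurrences_per_publisher(publisher_keys, fetch=get_json):
    """Occurrence count per publisher; limit=0 returns only the total, not the records."""
    counts = {}
    for key in publisher_keys:
        query = urlencode({"publishingOrg": key, "limit": 0})
        counts[key] = fetch(f"{API}/occurrence/search?{query}")["count"]
    return counts
```

Summing the values of occurrences_per_publisher(endorsed_publishers(NODE_KEY)) gives a total for the Node. The same loop can call the facet queries shown earlier (for example facet=year) for each publisher to build per-Node charts.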
