Close to 3,000 papers use GBIF-mediated data


As of 2 May 2018, we have knowledge of 2,992 peer-reviewed journal articles that make substantive use of GBIF-mediated data. Of these, 202 were published in 2018 alone, and for 50 of these, we were able to identify the specific data used in the paper.

Ideally, this happens because the author cites the data using the recommended download citation, e.g.

Sometimes, authors choose to cite the datasets directly, e.g.

  • Rivas Pava M D P, Muñoz Lara D G, Ruiz Camayo M A, Fernández Trujillo L F, Muñoz Castro F A, Pérez Muñoz N (2017). Colección Mastozoológica del Museo de Historia Natural de la Universidad del Cauca. Universidad del Cauca. Occurrence Dataset accessed via on 2017-09-13.

In both cases, a DOI is included that allows us to identify the data and link it with the paper. However, as the numbers above show, this only happens in about 25 per cent of the cases.

As a consequence, I have started to reach out to authors systematically whenever I discover a new paper that uses GBIF-mediated data but fails to cite the data properly. I have been doing this for about a month now, and overall, the response is positive. Once I have a bit more data, I’ll post an update.



Hi, I have a follow-up question that I’m hoping you can help me with. I have a colleague at the U.S. Geological Survey who would like to find instances where a paper has cited a DOI of interest, and she hasn’t been able to determine how to do that. Are you using Crossref to do it? Would it be possible to share the code so we can reuse it?



Hi there.

We get links between papers and datasets through Event Data, which is a shared Crossref and DataCite service. Are you looking for citations of dataset DOIs?




Thanks Daniel! Yes, we’re looking for citations of dataset DOIs. I’ll share that information with my colleague. I appreciate your help!



You can use the Event Data API. For example:

You’re interested in papers (i.e. from Crossref) citing the dataset that has the DOI 10.15468/39omei

This would be your query:


{
  "status": "ok",
  "message-type": "event-list",
  "message": {
    "next-cursor": "7d6eb784-3aa3-43ad-afad-49f1b6b83d3e",
    "total-results": 11,
    "items-per-page": 1000,
    "events": [
      {
        "license": "",
        "obj_id": "",
        "source_token": "8676e950-8ac5-4074-8ac3-c0a18ada7e99",
        "occurred_at": "2016-11-26T00:00:00Z",
        "subj_id": "",
        "id": "b7ce08e0-8688-4cd6-a1c4-3553ca7a2f21",
        "terms": "",
        "message_action": "create",
        "source_id": "crossref",
        "timestamp": "2017-05-31T10:30:30Z",
        "relation_type_id": "references"
      },
      ...
    ]
  }
}

You get a JSON response in which each event contains information about the subject (i.e. the paper) and the object (i.e. the dataset), as well as the type of relation, in this case “references”. My example returns 11 citations for the mentioned dataset.
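In case it helps with reuse, here is a minimal Python sketch of building such a query and pulling the subject/object pairs out of the response. The base URL and the `obj-id`, `source` and `rows` parameter names are my assumptions about the Event Data query API (please check them against the API documentation), and the helper names `build_query` and `extract_citations` are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Crossref Event Data query API.
EVENT_DATA_URL = "https://api.eventdata.crossref.org/v1/events"


def build_query(doi, source="crossref", rows=1000):
    """Build a query URL for events whose object is the given dataset DOI.

    The parameter names (obj-id, source, rows) are assumptions.
    """
    params = {"obj-id": doi, "source": source, "rows": rows}
    # Keep the slash in the DOI readable instead of percent-encoding it.
    return EVENT_DATA_URL + "?" + urlencode(params, safe="/")


def extract_citations(response):
    """Return (subject, object, relation) tuples from an event-list response."""
    events = response.get("message", {}).get("events", [])
    return [
        (e.get("subj_id", ""), e.get("obj_id", ""), e.get("relation_type_id", ""))
        for e in events
    ]
```

To run it against the live service, you could fetch the URL with something like `json.load(urllib.request.urlopen(build_query("10.15468/39omei")))` and pass the result to `extract_citations`.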

You can also filter by specific dates or other sources. If you’re interested in finding out which of your datasets have been cited, you can query based on a DOI prefix. Full API documentation is available here:
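As a rough illustration of the date and prefix filters, and of paging with the `next-cursor` value from the response, something along these lines might work. The base URL and the parameter names (`obj-id.prefix`, `from-occurred-date`, `until-occurred-date`, `source`, `cursor`) are assumptions on my part, and `fetch_page` is a hypothetical callable you would supply that takes a URL and returns the parsed JSON.

```python
from urllib.parse import urlencode

# Assumption: base URL of the Crossref Event Data query API.
EVENT_DATA_URL = "https://api.eventdata.crossref.org/v1/events"


def build_filtered_query(prefix=None, from_date=None, until_date=None,
                         source=None, cursor=None):
    """Build a query URL from optional filters (parameter names are assumed)."""
    params = {}
    if prefix:
        params["obj-id.prefix"] = prefix          # e.g. all DOIs under one prefix
    if from_date:
        params["from-occurred-date"] = from_date  # YYYY-MM-DD
    if until_date:
        params["until-occurred-date"] = until_date
    if source:
        params["source"] = source
    if cursor:
        params["cursor"] = cursor
    return EVENT_DATA_URL + "?" + urlencode(params, safe="/")


def iter_events(fetch_page, **filters):
    """Yield events page by page, following next-cursor.

    Stops when a page has no events or no next-cursor is returned.
    """
    cursor = None
    while True:
        page = fetch_page(build_filtered_query(cursor=cursor, **filters))
        message = page.get("message", {})
        events = message.get("events", [])
        if not events:
            break
        yield from events
        cursor = message.get("next-cursor")
        if not cursor:
            break
```

Passing the fetch function in keeps the paging logic easy to try out with canned responses before pointing it at the real service.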

It’s a very powerful tool, but keep in mind that the results rely on 1) authors citing datasets using DOIs, 2) journal editors allowing these citations, and 3) journals submitting these citations as structured metadata to Crossref. For a recent presentation I did on the use of GBIF data, I found some fairly depressing metrics: only 22 out of 1,000 citations ended up in Event Data.

But—things are improving! Let me know if anything’s unclear or there’s anything else I can do to help.




Quick follow-up:

For datasets published to GBIF, we aggregate citations and show them on dataset pages, e.g.

These can also be accessed programmatically through our content API (albeit undocumented), e.g.




This is exactly what we were looking for, so thank you! We can share our work on finding DOIs using xDD, if that would be of interest. Just let me know. Also, what kicked this conversation off in our group was the citation link on the GBIF dataset pages, which I just discovered last week. This is going to be extremely helpful in my work getting people to provide data to GBIF, so I’m really happy to see that implemented. I was also going to try to find the API to access that information, so thank you for anticipating my needs. Really appreciate it!



Please do share your experiences. Event Data is far from perfect, so we all need to chip in to make sure it improves.

Also, if you’d like to know more about how we track citations that don’t end up in Event Data, let me know.