Translating citation guidelines

dnoesgaard · February 10, 2022, 12:24pm

As you all know, citations are crucial to ensuring that GBIF data publishers are credited for their work. We work continuously to improve data citation culture, but language barriers remain. It’s essential that we provide simple, easy-to-understand guidelines to anyone wishing to use and cite GBIF-mediated data.

The citation guidelines page has recently been significantly updated with critical new details on, for example, derived datasets. While some translation work has already been done, parts of the existing translations may be outdated. The current status is as follows:

| Language | English name | Status |
| --- | --- | --- |
| العربية | Arabic | No translation exists |
| 简体中文 | Simplified Chinese | No translation exists |
| Français | French | No translation exists |
| Русский | Russian | No translation exists |
| Español | Spanish | Existing translation needs revising |
| 繁體中文 | Traditional Chinese | No translation exists |
| 日本語 | Japanese | Existing translation needs revising |
| Português | Portuguese | Existing translation needs revising |
| Українська | Ukrainian | Existing translation needs revising |

If you can help and are already set up to work in Contentful, please go ahead. If you have any questions about the page’s content or about getting access to Contentful, do let me know. To keep the formatting consistent across languages, I’ve copied the English body text into each language version for which no translation exists. If you need the source for updating, I’m also including the Markdown source below:

Data accessed through the GBIF network is free for all—but not free of obligations. Under the terms of the GBIF [data user agreement](/terms/data-user), users who download individual datasets or search results and use them in research or policy agree to cite them using a DOI, or Digital Object Identifier.

Good citation practices ensure scientific transparency and reproducibility by guiding other researchers to the original sources of information. They also reward data-publishing institutions and individuals by reinforcing the value of sharing open data and demonstrating its impact to their stakeholders and funders. Datasets published through GBIF are authored electronic data publications and, as such, should be treated as first-class research outputs and correctly cited.

> While all example citations below are formatted in Harvard style, please adapt them to the style format required by your institution, publisher or agency. However, please do include each element of content—most importantly the **DOI** expressed as a URL.
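
Because the DOI should always appear as a URL, a small helper can normalize DOIs that were noted down as bare identifiers or with a `doi:` prefix. The following is a minimal sketch, not an official GBIF tool; the function name `doi_to_url` and the list of prefixes it strips are assumptions chosen for illustration and are not exhaustive.

```python
import re

# Prefixes commonly found in front of a bare DOI (illustrative, not exhaustive).
_DOI_PREFIXES = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def doi_to_url(doi: str) -> str:
    """Return a DOI in the https://doi.org/<DOI> form recommended for citations."""
    bare = _DOI_PREFIXES.sub("", doi.strip())
    # Every DOI name starts with the "10." directory indicator.
    if not bare.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{bare}"
```

For example, `doi_to_url("doi:10.15468/39omei")` returns `https://doi.org/10.15468/39omei`, the form used in the example citations on this page.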

### Citing data

- [Occurrence data downloads](#occDataDownload)
- [Individual datasets](#datasets)
- [Species pages](#species)
- [Derived datasets](#derivedDatasets)
- [Occurrence data obtained using third-party tools (e.g. rgbif, pygbif, spocc, dismo)](#thirdParty)
- [Occurrence data accessed via the GBIF occurrence search API](#api)
- [Occurrence data accessed in a cloud computing environment](#cloud)

### Citing non-data content

- [GBIF.org](#nondata)
- [Authored content at GBIF.org (web page)](#nondata)
- [GBIF as an infrastructure/entity](#nondata)<p id="occDataDownload"></p>

---

### Occurrence data downloads

When downloading data from GBIF.org, a registered user is immediately redirected to a page that includes the following information:

![GBIF download citation example](//images.contentful.com/uo17ejk9rkwj/43tRSJk8iBdKSBJYVszVib/ec229db9ce0af72f800df30a4efe57a4/gbif_download_citation_example.png)

The same citation is included in the confirmation email sent to the registered user. Keep this reference at hand so you can cite it. Details of previous downloads can always be accessed in the registered user's [list of downloads](/user/download). Please contact <a href="mailto:comms@gbif.org?subject=Help me identify a previous download&body=(please include details of filters applied—which taxa, etc.—and roughly when data was downloaded)">GBIF</a> if you need help finding a previous download.

The download page provides a record listing all contributing datasets as well as a snapshot of all search terms, filters and facets. Users can quickly update search results from the download page and will also see links to any citations once they are picked up in GBIF's [literature tracking programme](/literature-tracking) ([for example](/occurrence/download/0010292-171124123535762)).

#### Citing filtered downloads

If you have significantly filtered the downloaded data, you can create a [derived dataset](#derivedDatasets) to cite only the records used in your downstream analysis. This requires that you preserve the `datasetKey` column through all filtering steps.

#### Citing multiple downloads

If you have used multiple downloads, you may not be able to include all of their citations in the reference list of your article. In this case, we recommend including a supplementary list or addendum of all downloads used. You may also choose to summarize the combined data using a [derived dataset](#derivedDatasets). Note that a single GBIF download request can include multiple taxa ([up to 100,000](https://data-blog.gbif.org/post/downloading-long-species-lists-on-gbif/)), which can reduce the number of downloads you need to cite.
<p id="datasets"></p>

[Back to top](#top)

### Individual datasets

Most downloads from GBIF.org contain records from multiple datasets (as above), but in some instances, such as internal reporting or the advance publication of a dataset for research, users may want or need to cite a single dataset, as in [this example](/dataset/bbaad610-29b7-480e-a7bc-f5a7dc10c191):

> Rivas Pava M D P, Muñoz Lara D G, Ruiz Camayo M A, Fernández Trujillo L F, Muñoz Castro F A, Pérez Muñoz N (2017). Colección Mastozoológica del Museo de Historia Natural de la Universidad del Cauca. Version 1.1. Universidad del Cauca. Occurrence dataset https://doi.org/10.15472/ciasei accessed via GBIF.org on 2020-03-02.

Note that, as datasets may change over time, even single-dataset downloads are assigned new, unique DOIs, which should be used in citations. Where appropriate, this can be done in combination with the original dataset citation, e.g.:

> Telenius A, Jonsson C (2017). Molluscs of the Gothenburg Natural History Museum (GNM). GBIF-Sweden. Occurrence download https://doi.org/10.15468/dl.f14yjv accessed via GBIF.org on 2020-03-02.
<p id="species"></p>

[Back to top](#top)

### Species pages

Each species page includes a default citation, [for example](https://www.gbif.org/species/5284517):

> GBIF Secretariat: GBIF Backbone Taxonomy. https://doi.org/10.15468/39omei Accessed via https://www.gbif.org/species/5284517 `[13 January 2020]`

Note: If you are making assertions about the distribution of a given taxon, consider downloading the occurrence data. This ensures a persistent, time-stamped snapshot of the data with a DOI that can be cited in the same way as other occurrence data downloads.
<p id="derivedDatasets"></p>

[Back to top](#top)

### Derived datasets

[Derived datasets](/derived-dataset/about) are citable records of GBIF-mediated occurrence data derived from:

- a GBIF.org download that has been filtered/reduced significantly, or
- data accessed in a [cloud computing environment](#cloud), or
- data obtained by any means for which no DOI was assigned, but one is required (e.g. third-party tools accessing the GBIF search API)

When created, a derived dataset is assigned a unique DOI that can be used to cite the data. To [create a derived dataset](/derived-dataset/register), you will need to sign in with a GBIF.org account and provide a list of the GBIF datasets (by DOI or datasetKey) from which the data originated, ideally with counts of how many records each dataset contributed.
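
Since the registration asks for source datasets with record counts, those counts can be tallied directly from filtered download files. The sketch below is illustrative only and assumes a GBIF "simple" download: a tab-separated file whose header row includes `datasetKey`. It accepts several record sets at once, so it also covers combining multiple downloads. The function names and the two-column CSV output are assumptions; check the registration form for the exact input format it accepts.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, Mapping


def read_download(path: str) -> Iterator[dict]:
    """Yield rows from a tab-separated GBIF download file (assumed "simple" format).

    QUOTE_NONE treats quote characters as ordinary text, so a stray quote in a
    free-text field cannot swallow the columns that follow it.
    """
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)


def count_by_dataset(*record_sets: Iterable[Mapping[str, str]]) -> Counter:
    """Count records per datasetKey, merging any number of (filtered) record sets."""
    counts: Counter = Counter()
    for records in record_sets:
        for record in records:
            counts[record["datasetKey"]] += 1
    return counts


def to_csv(counts: Counter) -> str:
    """Render the counts as two-column CSV text: datasetKey, record count."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for dataset_key, n in counts.most_common():  # largest contributors first
        writer.writerow([dataset_key, n])
    return buffer.getvalue()
```

For instance, `to_csv(count_by_dataset(r for r in read_download("occurrence.csv") if r["countryCode"] == "DK"))` would list the datasets behind a country-filtered subset; the file name and the filter are hypothetical. Keeping `datasetKey` through the filtering step is what makes this tally possible.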
<p id="thirdParty"></p>

[Back to top](#top)

### Occurrence data obtained using third-party tools (e.g. rgbif, pygbif, spocc, dismo)

Accessing occurrence data from GBIF in R, Python and other programming languages is fast and easy. It is important to remember, however, that the citation requirements of the GBIF [data user agreement](/terms/data-user) **still apply**.
<p id="api"></p>

For most users, we strongly recommend obtaining occurrence data with the *occ_download()* function of the rgbif package, as this ensures that downloads are assigned DOIs for easy citation.

Tools that return results directly from the GBIF search API (e.g. spocc, dismo and the *occ_data()* and *occ_search()* functions of rgbif) do **not** assign a DOI to the data retrieved. It is up to the user to identify the dataset publishers and properly acknowledge each of them when citing the data.

> For data obtained via tools based on the occurrence search API, we recommend using a [derived dataset](#derivedDatasets) as an easy way of obtaining a DOI for citing the data.
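
Identifying the publishers behind search API results starts with the `datasetKey` of each returned record. The sketch below shows one possible way to gather those keys with only the Python standard library. It pages through the public endpoint `https://api.gbif.org/v1/occurrence/search` using its `limit` and `offset` parameters and stops when the response reports `endOfRecords`. The page size, the `max_pages` safety cap and the function names are illustrative assumptions; rgbif and pygbif handle paging for you.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter
from typing import Iterable, Iterator

SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def search_pages(params: dict, limit: int = 300, max_pages: int = 10) -> Iterator[dict]:
    """Yield parsed JSON pages from the occurrence search API (needs network access)."""
    offset = 0
    for _ in range(max_pages):  # hypothetical cap so a broad query cannot run forever
        query = urllib.parse.urlencode({**params, "limit": limit, "offset": offset})
        with urllib.request.urlopen(f"{SEARCH_URL}?{query}", timeout=30) as response:
            page = json.load(response)
        yield page
        if page.get("endOfRecords", True):  # stop at the last page (or if the flag is missing)
            break
        offset += limit


def dataset_counts(pages: Iterable[dict]) -> Counter:
    """Count how many returned records came from each datasetKey."""
    counts: Counter = Counter()
    for page in pages:
        for record in page.get("results", []):
            counts[record["datasetKey"]] += 1
    return counts
```

A call such as `dataset_counts(search_pages({"taxonKey": 5284517}))` would return per-dataset record counts of the kind a derived dataset registration asks for.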

<p id="cloud"></p>

[Back to top](#top)

### Cloud environments

GBIF makes monthly snapshots of occurrence data available for analysis in a number of cloud computing environments:

- [Microsoft Planetary Computer](https://github.com/microsoft/AIforEarthDatasets#global-biodiversity-information-facility-gbif)
- [Amazon Web Services (AWS)](https://registry.opendata.aws/gbif/)

Users accessing and/or analysing data in these cloud environments should follow the specific instructions provided in the cloud computing repositories. For analyses where data are significantly filtered, please keep track of the `datasetKey` values used and cite the data with a [derived dataset](#derivedDatasets) record.

[Back to top](#top)

---
<p id="nondata"></p>

### Citing non-data GBIF content

#### GBIF.org

Those wishing to cite GBIF's website in general can use the following example: 

> GBIF.org (year) *GBIF Home Page*. Available from https://www.gbif.org `[13 January 2020]`.

#### Authored content at GBIF.org (web page)

Similarly, users can cite non-data pages on the GBIF website as, for example:

> GBIF.org (year) *Citation guidelines*. Available from https://www.gbif.org/citation-guidelines `[13 January 2020]`.

**Note**: this approach is not an accepted alternative for citing data downloads. 

#### GBIF as an infrastructure/entity

We recommend using the following citation when citing GBIF in a broader, more general context:

> GBIF: The Global Biodiversity Information Facility (year) *What is GBIF?* Available from https://www.gbif.org/what-is-gbif `[13 January 2020]`.

[Back to top](#top)

Thank you!
