Is <creator> required?

In the DwC-A we generate for GBIF at iNat, I am listed as the <creator>. I think this is why “Ueda K” appears at the top of our dataset page and why the citation GBIF recommends for each individual occurrence we publish starts with “Ueda K.” I find this very embarrassing since I am not the person who created the dataset as a whole or the individual records (you would think the citation for individual records would attribute the person in the recordedBy field). I think when we originally set the DwC-A up we just assumed the <creator> could be the same as the <contact> and didn’t give it much more thought.

So my questions are:

  1. If I just omit the <creator> tag in metadata.eml.xml will the archive still be valid and ingested by GBIF?
  2. Are we supposed to assign an individual person as the creator? Would an institution be more appropriate and if so should that be iNaturalist itself, or the legal entity that we are a part of (the California Academy of Sciences)?

Hi @kueda
Other citizen science initiatives, such as eBird (EOD – eBird Observation Dataset), have chosen to list multiple contributors. You could, for example, list the people shown on the iNaturalist About page.

You could publish a dataset without a creator, but the citation would then automatically fall back to the next type of contact (see this FAQ for more information). That said, we strongly advise against publishing datasets without any contacts.

Having iNaturalist as the creator, although possible, isn’t ideal. We would rather list people whenever possible, as it is sometimes necessary for GBIF to get in touch with dataset administrators, e.g. to resolve issues around indexing, licenses, data content, or data-use requirements.

I don’t know if individual citations could be crafted for occurrences based on the recordedBy field. Perhaps this is a question for @trobertson?

Overall, the best option would probably be to add the other people who worked to make the iNaturalist dataset available on GBIF (as other publishers have done).

Isn’t that what the <contact> is for? What would happen if we just had:

<creator>
  <organizationName>iNaturalist</organizationName>
</creator>
<contact>
  <organizationName>iNaturalist</organizationName>
  <individualName>
    <givenName>Ken-ichi</givenName>
    <surName>Ueda</surName>
  </individualName>
  <electronicMailAddress>my.email@address.org</electronicMailAddress>
</contact>

Would the citation then be attributed to “iNaturalist”, while still including the contact info of a real person? That seems ideal to me, unless there is some other reason GBIF needs a real person as <creator> other than for contact purposes.

FWIW, the schema says:

The creator is the person who created the resource (not necessarily the author of this metadata about the resource). This is the person or institution to contact with questions about the use, interpretation of a dataset.

No member of iNaturalist staff, nor the staff collectively, created the content of the DwC-A that we publish. Individual iNat users create the content, and it wouldn’t be practical to list every one of them in the metadata. I see that another purpose of this element is to provide contact info, but that seems redundant with the <contact> element. Next to that there’s this comment:

Current primary contact for the dataset. The creator of the resource might be dead, left the organisation or doesnt want to be bothered.

That suggests to me that the <creator> is really the author of the original content, not necessarily the person to talk to about content, licensing, etc.

@kueda, you are right; I think it would make more sense that way.
You would still need an <individualName> for the creator, but it can be “iNaturalist”.
I tested this modified EML file (eml.txt) in our test system, and it showed a citation without your name while keeping you as the primary contact.
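For reference, a creator block along those lines might look like the sketch below. It assumes the EML party structure in which an <individualName> needs at least a <surName>; the tested eml.txt isn’t reproduced in this thread, so treat this as an illustration rather than the exact file.

```xml
<!-- Sketch only: the institution name goes in individualName/surName,
     as suggested above; the real person stays in <contact>. -->
<creator>
  <individualName>
    <surName>iNaturalist</surName>
  </individualName>
  <organizationName>iNaturalist</organizationName>
</creator>
```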

I made the changes we discussed here and it’s looking better, but my name is still being listed in the citation for the archive. Can you recommend any other way to remove it? If not, I think I’m just going to replace the metadataProvider with something like “iNaturalist Admin.”
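A minimal sketch of that replacement, assuming the remaining name in the citation comes from <metadataProvider> and that GBIF formats personal names as surname plus given-name initials (as “Ueda K” suggests). Putting the whole label in <surName> with no <givenName> is an untested assumption meant to keep it from being abbreviated; the email address is the placeholder used earlier in this thread.

```xml
<!-- Hypothetical metadataProvider: a role label instead of a personal name. -->
<metadataProvider>
  <individualName>
    <!-- No givenName, so there is no initial to append (assumption). -->
    <surName>iNaturalist Admin</surName>
  </individualName>
  <organizationName>iNaturalist</organizationName>
  <!-- Placeholder address from earlier in the thread. -->
  <electronicMailAddress>my.email@address.org</electronicMailAddress>
</metadataProvider>
```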

FWIW, we just received a complaint from someone trying to cite an individual occurrence on GBIF representing an iNat observation they created. They were very confused by this recommended citation:

iNaturalist users, Ueda K (2021). iNaturalist Research-grade Observations. iNaturalist.org. Occurrence dataset iNaturalist Research-grade Observations accessed via GBIF.org on 2021-10-29. Occurrence Detail 3383911335

because they were the person who recorded the data, not “Ueda K”. This citation is kind of weird, since it is really a citation for the entire dataset, not for that individual record.

Hi @kueda, I think the easiest solution is probably to change your name to “iNaturalist Admin”.

The occurrence citations include the dataset DOI so that we can link the citations to the dataset (occurrences don’t have DOIs of their own). Perhaps @dnoesgaard can explain a bit more about how it works.

Note that we also have suggested attributions for images, and these include the user name.

I’m not sure if this would work on a general level or for other datasets, but couldn’t we adjust the citation string for individual occurrences to reflect record-level recordedBy and/or identifiedBy, e.g.,

Martin Reith (2021) Wercklea hottensis observation via iNaturalist. Accessed via GBIF.org on 2021-11-02. https://www.gbif.org/occurrence/3383911335

And yes, citing the DOI is important, but for someone citing a single record, it makes more sense to have the URL (if you had to choose one). Also, the dataset can be derived from the occurrence key (if it still exists).

IMO, recordedBy is the right agent to cite for individual records, so I like your proposal.
