Is <creator> required?

In the DwC-A we generate for GBIF at iNat, I am listed as the <creator>. I think this is why “Ueda K” appears at the top of our dataset page and why the citation GBIF recommends for each individual occurrence we publish starts with “Ueda K.” I find this very embarrassing since I am not the person who created the dataset as a whole or the individual records (you would think the citation for individual records would attribute the person in the recordedBy field). I think when we originally set the DwC-A up we just assumed the <creator> could be the same as the <contact> and didn’t give it much more thought.

So my questions are

  1. If I just omit the <creator> tag in metadata.eml.xml, will the archive still be valid and ingested by GBIF?
  2. Are we supposed to assign an individual person as the creator? Would an institution be more appropriate and if so should that be iNaturalist itself, or the legal entity that we are a part of (the California Academy of Sciences)?

Hi @kueda
Other citizen science initiatives like eBird (EOD – eBird Observation Dataset) have chosen to list multiple contributors. You could, for example, list the people shown on this page: About · iNaturalist

You could publish a dataset without a creator, but the citation would then automatically use the next type of contact (see this FAQ for more information). We also strongly advise against publishing datasets without any contact.

Having iNaturalist as creator, although possible, isn’t ideal. We would rather have people whenever possible as it is sometimes necessary for GBIF to get in touch with dataset administrators, e.g. to resolve issues around indexing, licenses, data content or data use requirements, etc.

I don’t know if individual citations could be crafted for occurrences based on the recordedBy. Perhaps this is a question for @trobertson?

Overall, the best option would probably be to add the other people who worked to make the iNaturalist dataset available on GBIF (as other publishers have done).

Isn’t that what the <contact> is for? What would happen if we just had

<creator>
  <organizationName>iNaturalist</organizationName>
</creator>
<contact>
  <organizationName>iNaturalist</organizationName>
  <individualName>
    <givenName>Ken-ichi</givenName>
    <surName>Ueda</surName>
  </individualName>
  <electronicMailAddress>my.email@address.org</electronicMailAddress>
</contact>

Would the citation then be attributed to “iNaturalist”, while also including the contact info of a real person? That seems ideal to me, unless there is some other reason GBIF needs a real person as <creator> other than for contact purposes.

FWIW, the schema says,

The creator is the person who created the resource (not necessarily the author of this metadata about the resource). This is the person or institution to contact with questions about the use, interpretation of a dataset.

No member of iNaturalist staff, nor the staff collectively, created the content of the DwC-A that we publish. Individual iNat users create the content, and it wouldn’t be practical to list every one of them in the metadata. I see that another purpose of this element is to provide contact info, but that seems redundant with the <contact> element. Next to that there’s this comment:

Current primary contact for the dataset. The creator of the resource might be dead, left the organisation or doesnt want to be bothered.

That suggests to me that the <creator> is really the author of the original content, not necessarily the person to talk to about content, licensing, etc.

@kueda you are right, I think it would make more sense that way.
You would still need to have an <individualName> for the creator, but it can be “iNaturalist”.
I tested this modified EML file (eml.txt, 28.4 KB) in our test system, and it showed a citation without your name while keeping you as primary contact.
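
For illustration, here is a minimal sketch of what a <creator> block along those lines might look like. It is an assumption about the shape of the change, not a copy of the attached eml.txt. It assumes that <surName> is the one required child of <individualName> (as I understand the EML schema), so the organization name is placed there, and it lists <individualName> before <organizationName> in case element order matters to the validator.

```xml
<!-- Sketch only: not taken from the attached eml.txt -->
<creator>
  <individualName>
    <!-- Assumption: surName carries the organization name so the citation reads "iNaturalist" -->
    <surName>iNaturalist</surName>
  </individualName>
  <organizationName>iNaturalist</organizationName>
</creator>
<contact>
  <individualName>
    <givenName>Ken-ichi</givenName>
    <surName>Ueda</surName>
  </individualName>
  <organizationName>iNaturalist</organizationName>
  <electronicMailAddress>my.email@address.org</electronicMailAddress>
</contact>
```

Whether the generated citation actually picks up “iNaturalist” from this structure is worth confirming in a test environment before publishing.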