Suppose you have a registered museum sample (SM1234) containing 1 male, 2 females and 3 juveniles of a particular animal species. According to the specimen label, they were all collected at the same place on the same day by the same collector. As a collection manager you want to share the information about sex and life stage, but how would you do that in Darwin Core?
This post considers some of the possibilities. In all cases the event details are the same for each occurrence record. For convenience I’ve shown individualCount rather than organismQuantity plus organismQuantityType (individuals).
Split the occurrence. A Darwin Core maintainer has recommended splitting the occurrence into three separate occurrence records, like this:
occurrenceID | catalogNumber | individualCount | sex | lifeStage |
---|---|---|---|---|
ddd1a-1 | SM1234 | 1 | male | adult |
ddd1a-2 | SM1234 | 2 | female | adult |
ddd1a-3 | SM1234 | 3 | | juvenile |
This solution allows sex and lifeStage to have controlled vocabularies. It could also greatly increase the number of individual records arising from a single collection or observation.
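As a rough sketch of what the split involves, the Python below turns one sample into one record per sex/lifeStage group. The function name `split_occurrence` and the "-1, -2, -3" suffix scheme are assumptions for illustration (the suffixes simply mirror the table above); they are not part of Darwin Core.

```python
# Hypothetical helper: one occurrence record per sex/lifeStage group.
def split_occurrence(base_id, catalog_number, groups):
    """groups is a list of (individualCount, sex, lifeStage) tuples."""
    records = []
    for n, (count, sex, life_stage) in enumerate(groups, start=1):
        records.append({
            "occurrenceID": f"{base_id}-{n}",   # assumed suffix scheme
            "catalogNumber": catalog_number,    # shared by all the new records
            "individualCount": count,
            "sex": sex,                         # blank where sex was not recorded
            "lifeStage": life_stage,
        })
    return records

records = split_occurrence(
    "ddd1a", "SM1234",
    [(1, "male", "adult"), (2, "female", "adult"), (3, "", "juvenile")],
)
for record in records:
    print(record)
```

Each output record repeats the catalogNumber, so the link back to the original sample survives the split.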
Split the occurrence and also summarise it (nested occurrences). This suggestion comes from Anne-Sophie Archambeau, Guillaume Body, Francesca Jaroszynska and Sophie Pamerlon. It proposes a new Darwin Core term, parentOccurrenceID, and uses it like this:
parentOccurrenceID | occurrenceID | catalogNumber | individualCount | sex | lifeStage |
---|---|---|---|---|---|
| | ddd1a | SM1234 | 6 | | |
ddd1a | ddd1a-1 | SM1234 | 1 | male | adult |
ddd1a | ddd1a-2 | SM1234 | 2 | female | adult |
ddd1a | ddd1a-3 | SM1234 | 3 | | juvenile |
Controlled vocabularies still apply. With this proposal, data publishers could have separate occurrence records for individuals from a group, for example if one of the females had an associated image or an associated DNA sequence.
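One consequence of nesting is that the summary record and its children can drift out of step. The sketch below checks that each parent's individualCount equals the sum of its children's counts. Note that parentOccurrenceID is only a proposed term, and `check_nested` is a hypothetical helper name; the rule that the counts must agree is my assumption, not something stated in the proposal.

```python
# Hypothetical consistency check for nested occurrences.
def check_nested(records):
    """Return parent IDs that are missing or whose count differs from their children's total."""
    by_id = {r["occurrenceID"]: r for r in records}
    child_totals = {}
    for r in records:
        parent = r.get("parentOccurrenceID")
        if parent:  # blank parentOccurrenceID means a top-level record
            child_totals[parent] = child_totals.get(parent, 0) + r["individualCount"]
    return [
        pid for pid, total in child_totals.items()
        if pid not in by_id or by_id[pid]["individualCount"] != total
    ]

records = [
    {"parentOccurrenceID": "", "occurrenceID": "ddd1a", "individualCount": 6},
    {"parentOccurrenceID": "ddd1a", "occurrenceID": "ddd1a-1", "individualCount": 1},
    {"parentOccurrenceID": "ddd1a", "occurrenceID": "ddd1a-2", "individualCount": 2},
    {"parentOccurrenceID": "ddd1a", "occurrenceID": "ddd1a-3", "individualCount": 3},
]
problems = check_nested(records)
print(problems)  # an empty list means every parent matches its children
```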
Pack the information into the sex or lifeStage field. This is by far the most popular choice, and can be seen in innumerable Darwin Core datasets in GBIF:
occurrenceID | catalogNumber | individualCount | sex | lifeStage |
---|---|---|---|---|
ddd1a | SM1234 | 6 | 1 male, 2 females, 3 juveniles | |
or
occurrenceID | catalogNumber | individualCount | sex | lifeStage |
---|---|---|---|---|
ddd1a | SM1234 | 6 | | 1 male, 2 females, 3 juveniles |
No controlled vocabularies here, but this possibility requires the least work from the collection manager’s point of view, and accords with a “one sample, one record” principle.
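The cost of packing falls on the data user, who has to unpack the string again. A minimal sketch of that unpacking, assuming entries always look like "count word" separated by commas, is below. The `unpack` function, the `SEX_TERMS` set and the naive strip-the-final-"s" singularising are all assumptions for illustration; real packed entries are far more varied than this.

```python
import re

SEX_TERMS = {"male", "female"}  # assumed vocabulary; anything else is treated as a life stage

def unpack(packed):
    """Turn '1 male, 2 females, 3 juveniles' into (count, field, term) tuples."""
    groups = []
    for count, word in re.findall(r"(\d+)\s+([A-Za-z]+)", packed):
        term = word.lower()
        if term.endswith("s"):      # naive singularising: 'females' -> 'female'
            term = term[:-1]
        field = "sex" if term in SEX_TERMS else "lifeStage"
        groups.append((int(count), field, term))
    return groups

for group in unpack("1 male, 2 females, 3 juveniles"):
    print(group)
```

Even this simple case needs a guess about which words belong to sex and which to lifeStage, which is exactly the information a controlled vocabulary would have supplied.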
Pack the information into an organismRemarks field. I’ve used this in a Darwin Core dataset of millipede records:
occurrenceID | catalogNumber | individualCount | sex | lifeStage | organismRemarks |
---|---|---|---|---|---|
ddd1a | SM1234 | 6 | | | 1 male \| 2 female \| 3 juvenile |
The entry style is the one recommended for multiple recordedBy entries, with data items separated by [space][bar][space].
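A consistent separator makes the entry easy to take apart and put back together. The sketch below does both for the [space][bar][space] style; `split_items` and `join_items` are hypothetical helper names.

```python
SEPARATOR = " | "  # [space][bar][space]

def split_items(value, sep=SEPARATOR):
    """Split a multi-item entry into a list, dropping empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]

def join_items(items, sep=SEPARATOR):
    """Join a list of items back into a single entry."""
    return sep.join(items)

remarks = "1 male | 2 female | 3 juvenile"
items = split_items(remarks)
print(items)
print(join_items(items) == remarks)
```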
Partition the information in the sex and lifeStage fields.
occurrenceID | catalogNumber | individualCount | sex | lifeStage |
---|---|---|---|---|
ddd1a | SM1234 | 6 | 1 male, 2 females | 3 adults, 3 juveniles |
This is a bit puzzling at first, and it would get quite confusing if juveniles were also distinguishable as male or female.
Further developments. GBIF is apparently discussing the use of “mixed” as an acceptable entry in the sex field. The blank sex field in the organismRemarks solution (above) could then be filled with “mixed”. I’m not clear on why “mixed” would not also be useful in lifeStage.
Comments welcome.
Robert Mesibov (“datafixer”); robert.mesibov@gmail.com