Confused about "bibliographicCitation"? You're not alone

The bibliographicCitation field hasn’t appeared in many of the Darwin Core datasets I’ve audited, but when it did, it usually contained the wrong data items.

A bibliographicCitation entry is metadata about a record. To quote from the Darwin Core Quick Reference Guide, a bibliographicCitation entry should be

A bibliographic reference for the resource as a statement indicating how this record should be cited (attributed) when used… The intended usage of this term in Darwin Core is to provide the preferred way to cite the resource itself - “how to cite this record”.

So bibliographicCitation is not for

(1) sources used to help with identifications. These belong in identificationReferences.

(2) sources used in a literature or online search for occurrences. These belong in associatedReferences.

It’s easy to become confused about bibliographicCitation if you were thinking that metadata belongs in the eml.xml file that accompanies a dataset in a Darwin Core archive.

That’s true, and an eml.xml file has plenty of options for storing information about a dataset, including long explanations that don’t really fit in a data table. But some metadata items belong in data tables, like the date/time of the last update of an individual record, in the modified field. Entries in the license field might also differ from record to record, although license is also a required field in eml.xml.

Then there’s meta.xml in the Darwin Core archive, which explains the structure of a data table by referring to external standards. In the case of bibliographicCitation, meta.xml points to http://purl.org/dc/terms/bibliographicCitation. That URL sends you on to https://www.dublincore.org/specifications/dublin-core/dcmi-terms/#bibliographicCitation. There you can find a brief explanation from the originators of the term bibliographicCitation, namely the librarians of the Dublin Core Metadata Initiative. (Metadata about metadata!)
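
To make that mapping concrete, here is a minimal Python sketch that reads a meta.xml and reports which column is mapped to the Dublin Core term. The sample meta.xml is a made-up, stripped-down example (the file name and column order are assumptions). The namespace and the field/index/term layout follow the Darwin Core text guide as I understand it, so check the sketch against your own archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal meta.xml for illustration; a real one lists many more fields.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://purl.org/dc/terms/bibliographicCitation"/>
  </core>
</archive>"""

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}
TERM = "http://purl.org/dc/terms/bibliographicCitation"


def column_index_for(meta_xml, term):
    """Return the 0-based column index mapped to `term` in the core table, or None."""
    root = ET.fromstring(meta_xml)
    for field in root.findall("dwca:core/dwca:field", NS):
        if field.get("term") == term:
            index = field.get("index")
            return int(index) if index is not None else None
    return None


print(column_index_for(META_XML, TERM))
```

If the function returns None, the core table has no bibliographicCitation column at all, which, as argued further on, is often fine.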

The OBIS Darwin Core manual has additional advice for bibliographicCitation users:

bibliographicCitation allows for providing different citations on record level, while a single citation for the entire dataset can and should be provided in the metadata (see EML). The citation at record level can have the format of a chapter in a book, where the book is the dataset citation. The record citation will have preference over the dataset citation. We do not, however, recommend to create different citations for every record, as this will explode the number of citations and will hamper the re-use of data.

(Note: I’ve seen Darwin Core datasets in which bibliographicCitation contained a unique UUID for every single record, and in which those thousands of UUIDs were duplicated in occurrenceID. A Google search for any of those UUIDs didn’t return a result.)
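
A quick way to spot that pattern is to flag records whose bibliographicCitation is only an identifier. The following is a rough Python sketch, not a validated tool: the sample rows are invented, the table is assumed to be tab-separated with a header line, and the function name is my own.

```python
import csv
import io
import re

# Matches a bare UUID such as 3f2b8c1e-5d4a-4e6f-9a7b-0c1d2e3f4a5b
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    re.IGNORECASE,
)

# Made-up sample rows for illustration only.
SAMPLE_TSV = (
    "occurrenceID\tbibliographicCitation\n"
    "3f2b8c1e-5d4a-4e6f-9a7b-0c1d2e3f4a5b\t3f2b8c1e-5d4a-4e6f-9a7b-0c1d2e3f4a5b\n"
    "rec-002\tSmith J (2020). Example dataset. rec-002\n"
    "rec-003\t\n"
)


def suspicious_citations(tsv_text):
    """Yield (occurrenceID, reason) for citation entries that are only an identifier."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        occurrence_id = (row.get("occurrenceID") or "").strip()
        citation = (row.get("bibliographicCitation") or "").strip()
        if not citation:
            continue  # a blank entry is not a problem
        if citation == occurrence_id:
            yield occurrence_id, "same as occurrenceID"
        elif UUID_RE.match(citation):
            yield occurrence_id, "bare UUID"


for occurrence_id, reason in suspicious_citations(SAMPLE_TSV):
    print(occurrence_id, reason, sep="\t")
```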

So unless you have an important reason to provide citations for individual records in your dataset, you don’t need to include a bibliographicCitation field. If you do, though, please do not fill this field with the references you used for identifications, or with references to literature or online sources for your occurrence records.

I sometimes wish that Darwin Core hadn’t borrowed so faithfully from Dublin Core. Imagine if bibliographicCitation had been renamed toCiteThisRecord.


Robert Mesibov (“datafixer”); robert.mesibov@gmail.com

@datafixer thanks for this great post highlighting how to make data better in this particular Darwin Core bucket. Could you add an example of a good bibliographicCitation for a given record in a dataset? (Perhaps one from a dataset currently in GBIF?)

Hi, @Debbie. You don’t like the examples given in the Darwin Core Quick Reference Guide? OK, here’s a suggestion based on the latest occurrence dataset from GBIF’s Dodo (https://twitter.com/GBIFDodo), namely Anfibios de la Ecoreserva San Antero (San Antero, Córdoba) - Proyecto Ecoreservas.

The citation for this dataset (in the EML) is “Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, Oleoducto Bicentenario de Colombia S.A.S., Cenit Transporte y Logística de Hidrocarburos S.A.S., Ecopetrol S.A. (2023). Anfibios de la Ecoreserva San Antero (San Antero, Córdoba) - Proyecto Ecoreservas. \https://doi.org/10.15472/op7ztx”.

[I’ve put a “\” before the DOI link to prevent this forum platform turning the URL into a title]

A representative record from this dataset is Occurrence Detail 4109994366, which has the unique occurrenceID “OBC-IAvH:ECORESERVAS:SANANTERO:AMPHIBIA:ESPECIMENPRESERVADO:I2D-BIO_2023_022:001”.

A bibliographicCitation for this particular record could therefore be

“Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, Oleoducto Bicentenario de Colombia S.A.S., Cenit Transporte y Logística de Hidrocarburos S.A.S., Ecopetrol S.A. (2023). Anfibios de la Ecoreserva San Antero (San Antero, Córdoba) - Proyecto Ecoreservas. \https://doi.org/10.15472/op7ztx. OBC-IAvH:ECORESERVAS:SANANTERO:AMPHIBIA:ESPECIMENPRESERVADO:I2D-BIO_2023_022:001”

There would be other ways to create a bibliographicCitation, but this one (dataset citation + unique record ID) only requires a little work, because it uses pre-existing data items.
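
As a minimal sketch of that approach in Python (the function name and the rule for joining the two parts are my own assumptions, not anything prescribed by Darwin Core):

```python
def record_citation(dataset_citation, occurrence_id):
    """Join a dataset citation and a record ID into one record-level citation."""
    base = dataset_citation.strip()
    if not base.endswith("."):
        base += "."  # separate the dataset citation from the record ID
    return f"{base} {occurrence_id.strip()}"


# Dataset citation as given in the EML (without the forum-only backslash before the DOI).
dataset_citation = (
    "Instituto de Investigación de Recursos Biológicos Alexander von Humboldt, "
    "Oleoducto Bicentenario de Colombia S.A.S., "
    "Cenit Transporte y Logística de Hidrocarburos S.A.S., Ecopetrol S.A. (2023). "
    "Anfibios de la Ecoreserva San Antero (San Antero, Córdoba) - Proyecto Ecoreservas. "
    "https://doi.org/10.15472/op7ztx"
)
occurrence_id = (
    "OBC-IAvH:ECORESERVAS:SANANTERO:AMPHIBIA:ESPECIMENPRESERVADO:I2D-BIO_2023_022:001"
)

print(record_citation(dataset_citation, occurrence_id))
```

Applied to every row of the occurrence table, this would fill bibliographicCitation from values the dataset already holds. Keep in mind the OBIS caution about a different citation for every record.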

Voila! Thanks, @datafixer. Putting the example with the conversation is a fabulous way to help this post do more. Those who read this now get the “what not to do” and a “good working example” of better practice all in one lovely thread. Context really helps, and now we can reference this post when we’re explaining this to others.

@tuco perhaps we can link this to our dwc-qa list in the wiki. Seem reasonable? I could create a “bibliographicCitation” wiki entry.

@Debbie @tuco Think of this post as a suggestion for Darwin Core: The Missing Manual, a publication I’ve been hoping to see listed on GBIF’s standards page someday.

For me, @datafixer, a related idea I’ve expressed lots of times is for a tool that lets us easily see things like

  • good examples
  • not good examples (the “not” case can be very helpful for understanding, as you know)

I’d like to also be able to (easily) see, for example

  • for a given group (say, paleo)
  • show me the distinct dwc terms that paleo datasets use (including those in extensions)
  • show me distinct values (with counts) for a given dwc term that wants a controlled vocab
  • make it so that I can see those cases easily (browse the hits, in other words).
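
As a rough sketch of the “distinct values (with counts)” bullet, something like the following Python would do for a single table. The sample rows are invented, basisOfRecord is picked only as an example of a term that wants a controlled vocabulary, and the table is assumed to be tab-separated with a header line.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only.
SAMPLE_TSV = (
    "occurrenceID\tbasisOfRecord\n"
    "1\tPreservedSpecimen\n"
    "2\tpreservedspecimen\n"
    "3\tPreservedSpecimen\n"
    "4\tFossilSpecimen\n"
    "5\t\n"
)


def distinct_values(tsv_text, term):
    """Count each distinct value (blanks included) found in the `term` column."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return Counter((row.get(term) or "").strip() for row in reader)


# List values from most to least common; repr() makes blank entries visible.
for value, count in distinct_values(SAMPLE_TSV, "basisOfRecord").most_common():
    print(f"{count}\t{value!r}")
```

Even on five rows the case variants and the blank show up as separate “distinct” values, which is the kind of thing such a tool would make easy to browse.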

What would I do with this? Why useful?
It’s an eas(ier) way to show people
a) the realities of what’s inside those various buckets
b) it makes it eas(ier) to see what your neighbor is / is not doing
c) it makes it easier to get to “aha” moments of what’s possible / what’s hard to do well
d) it makes all sorts of metrics and quality efforts measurable. We can see / count if “distinct” values count is going down as we could posit it might … with work to share better practices, good examples, and better options and validations in our local CMS.
e) it could easily be part of our data validation efforts / tools / methods

Your idea of The Missing Manual – I love it. And I’d imagine something like what I just described above to be in it. You could also imagine such a tool being part of what would help us (finally?) do annotations and track their downstream effects.

(Anyone Intrigued?)

