3.1. Pathways and tools for publishing collection records (TECHNOLOGY)

This is topic 3.1. in the Technology section of the Advancing the Catalogue of the World’s Natural History Collections consultation. Use this topic to discuss the questions listed below.

Existing information on collections is edited and maintained in different ways. Index Herbariorum (IH) allows herbaria to provide or edit their records and offers support for herbaria to provide updates via email or other channels. Other communities, such as national portals, have their own pathways for collections to provide or update information. Several tools help data publishers to create EML metadata for publishing data to GBIF and elsewhere. These could evolve to deliver collection records in preferred formats. The Integrated Publishing Toolkit (IPT) could be enhanced to offer collection records as one of the core record types that can be shared. This would allow either individual collections to publish one or more collection records as a small standalone dataset, or collection networks to manage and publish a dataset comprising many collection records. Wikidata could also serve as a tool or platform for editing catalogue information and making it widely accessible and reusable.

  • Which existing tools, databases and websites can help to mobilise and maintain collection records?
  • Is it possible to identify additional tools or pathways that need to be developed or supported?

This is quite an important topic, because ultimately the creation and maintenance of the software supporting the system is one of the larger costs of providing the system at all. As we have seen many times in the past, a lack of resources for software and hardware maintenance can end a project.

Re: Wikidata: I don’t really see Wikidata being a complete solution, as it is not authoritative in and of itself. Without authoritative sources, Wikidata would be a rather useless source of opinion rather than fact.
Wikidata would, however, be an excellent broker between different identifier systems and would provide a much-needed way for the community in general to contribute additional data to a collection’s record.
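
To make the "broker between identifier systems" idea concrete, here is a minimal sketch of a crosswalk in which one hub item per collection holds that collection's identifiers from several schemes. The item IDs, scheme names, and identifier values are all hypothetical placeholders, and no real Wikidata properties or API calls are used; this only illustrates the lookup pattern.

```python
# Hypothetical hub items: one per collection, each listing the identifiers
# that different registries use for that same collection.
hub_items = {
    "ITEM-1": {"herbarium_code": "XYZ", "national_registry_id": "NR-0042"},
    "ITEM-2": {"herbarium_code": "ABC"},
}


def build_crosswalk(items):
    """Index every (scheme, identifier) pair back to its hub item."""
    index = {}
    for item_id, identifiers in items.items():
        for scheme, value in identifiers.items():
            index[(scheme, value)] = item_id
    return index


def translate(items, index, scheme_from, value, scheme_to):
    """Translate an identifier from one scheme to another via the hub item.

    Returns None when the identifier is unknown or the hub item has no
    identifier in the target scheme.
    """
    item_id = index.get((scheme_from, value))
    if item_id is None:
        return None
    return items[item_id].get(scheme_to)


crosswalk = build_crosswalk(hub_items)
found = translate(hub_items, crosswalk, "herbarium_code", "XYZ", "national_registry_id")
missing = translate(hub_items, crosswalk, "herbarium_code", "ABC", "national_registry_id")
```

In this sketch the hub only links identifiers; the descriptive data for each collection would still come from whichever source is authoritative for it.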

Thanks, Quentin. The choices here matter. We would ideally like to see a single master record for each collection and to have efficient ways for the collection owners themselves to keep it current, but also to benefit from work elsewhere to document important aspects of each collection.

In 2008, TDWG, GBIF and the Royal Botanic Garden Edinburgh established the Biological Collections Index. This was an early attempt to organise various sources of information on collections. It aimed for a “mix of authoritative and community sourced data by having multiple records for each collection all hanging off a single globally unique ID”. This made it relatively easy to align collection information from multiple sources (and technologies like OpenRefine might make it easier to do something similar today). However, it did not offer an obvious way to progress towards agreeing a standard master version of the information.
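
As an illustration of "multiple records for each collection all hanging off a single globally unique ID", the following is a rough sketch and not a reconstruction of how the index was actually implemented. The class, field names, IDs, and values are hypothetical. It groups source records by the shared ID and lists the fields on which sources disagree, which is exactly where a master version would still need to be agreed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """One source's description of a collection (hypothetical structure)."""
    collection_id: str  # the shared globally unique ID
    source: str         # which source supplied this record
    fields: dict        # descriptive fields as supplied by that source


def group_by_collection(records):
    """Hang every source record off its collection's shared ID."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.collection_id].append(record)
    return dict(grouped)


def conflicting_fields(records):
    """Return the fields whose values differ between sources."""
    values = defaultdict(set)
    for record in records:
        for name, value in record.fields.items():
            values[name].add(value)
    return {name: sorted(v) for name, v in values.items() if len(v) > 1}


# Hypothetical example data.
records = [
    SourceRecord("urn:example:coll:1", "authoritative-source",
                 {"name": "Example Herbarium", "specimen_count": "50000"}),
    SourceRecord("urn:example:coll:1", "community-source",
                 {"name": "Example Herbarium", "specimen_count": "65000"}),
    SourceRecord("urn:example:coll:2", "national-portal",
                 {"name": "Other Museum"}),
]

grouped = group_by_collection(records)
conflicts = conflicting_fields(grouped["urn:example:coll:1"])
```

Aligning the records is the easy part here; nothing in this structure says which of the conflicting values should become the master value.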

I think it is important to allow some sort of automated updating of this information through existing publishing mechanisms for those who are publishing their data. The collections descriptors extension is a great start and we could incorporate this into our Specify publishing routine to seamlessly allow for publishing of this information.

I fully agree with Donald here. This calls for a clear and advanced mechanism to describe the provenance of a collection description.
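
One possible shape for such a provenance mechanism, sketched under assumptions: every field value is stored as an assertion that records who supplied it and when, and a master record is derived by a stated rule. The rule used here (values supplied by the collection owner win, then the most recent value wins) is only an assumption for illustration, as are all the names and example values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Assertion:
    """A single field value together with its provenance (hypothetical structure)."""
    field: str
    value: str
    source: str
    asserted_on: date
    by_owner: bool  # True if supplied by the collection owner


def resolve(assertions):
    """Pick one assertion per field.

    Assumed rule: owner-supplied values beat all others; among equals,
    the most recent assertion wins.
    """
    best = {}
    for assertion in assertions:
        current = best.get(assertion.field)
        rank = (assertion.by_owner, assertion.asserted_on)
        if current is None or rank > (current.by_owner, current.asserted_on):
            best[assertion.field] = assertion
    return best


# Hypothetical example data.
assertions = [
    Assertion("contact_email", "old@example.org", "community", date(2021, 5, 1), False),
    Assertion("contact_email", "curator@example.org", "owner", date(2020, 1, 1), True),
    Assertion("specimen_count", "1000", "community", date(2019, 3, 1), False),
]

master = resolve(assertions)
```

Because each chosen value keeps its source and date, a reader of the master record can still see where every field came from.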

From @maperalta in the Spanish thread

The current national system, the SNDB (Sistema Nacional de Datos Biológicos) portal in Argentina, can be useful for updating the data served to the catalogue. However, the information available about a collection's content can far exceed what is on the portal, especially for collections that are being digitized. Collections should regularly update the data served to the catalogue.

From @ErikaSalazar in this Spanish thread

The platform of the Registro Nacional de Colecciones (RNC) of Colombia keeps the records of the country's biological collections up to date. These updates are required by national mandate. The RNC's requirements for registering and updating Colombian biological collections are listed here: http://rnc.humboldt.org.co/wp/registro-y-actualizacion/ [in Spanish]