3.4. Collection management systems (TECHNOLOGY)

dhobern · March 27, 2020, 10:39am

This is topic 3.4. in the Technology section of the Advancing the Catalogue of the World’s Natural History Collections consultation. Use this topic to discuss the questions listed below.

Background
Most natural history collections maintain data on their specimens in a collection management system (CMS) such as Specify, Symbiota, EMu, DarWIN or BRAHMS. Some of these tools could develop to interface directly with the collection catalogue, providing up-to-date metadata and metrics.

Other materials

The following contributed materials are particularly relevant to this topic:

Questions

What present or future requirements are there for interfaces directly between CMS platforms and the collection catalogue?
Are there special opportunities that should be considered?
Could CMS platforms become a source of metadata for institutional collections within a global catalogue?

elyw · April 21, 2020, 3:24am

The missing question for me in this one is: are the vendors/developers of the various collection management systems willing to engage in this conversation and make changes according to developing standards? I’m sure some will be responsive, others less so.
It might also be another of those questions where vendors and even institutions ask - why would I store collections descriptive data in my collection management system? This was the challenge Australia faced in developing the Museum Metadata Exchange - all of the information had to be compiled before it could be published. No one in any institution needed the information for internal purposes, so it had never before been written down or gathered into one place.

dhobern · April 21, 2020, 5:35am

Thanks @elyw. It will only make sense for these functions to be included within collection management systems if the following things are true:

There are benefits that come to the institutions from having good current information accessible on their collections. Maybe there are parallels here with ORCID. As ORCIDs come more and more to be a useful tool for researchers to integrate and organise information on their work, it becomes more essential to have one. If collection records come to be important linkage points for information and services that matter to collections, they will become mission-critical.
There are efficient publication pathways for collection records that can readily be integrated into the workflow of collection management systems - management of a TDWG Collection Descriptions record could make this true.

emeyke · April 21, 2020, 3:20pm

The short answer to @elyw is (and I cannot speak for others) - we definitely want to follow the standards and implement the solution based on these discussions and technical specs.

tkarim · April 21, 2020, 3:42pm

In terms of metadata, I also think this is a great idea. If this could be automated and pulled periodically from the database, it is one less thing a collection manager has to do or remember to update.

In terms of metrics, there was a presentation at the SPNHC 2019 meeting about Arctos interfacing with some external sources and then providing metrics on things like publication and use for specimens. I would love to see something like that for Specify. It would make compiling annual reports easier and also make it easier to demonstrate the impact of our collection.

dhobern · April 21, 2020, 10:52pm

Thanks @emeyke and @tkarim - I’m glad to see the interest in this idea. The Arctos example is a really excellent one. It shows how a collection management system can be a tool for curating information from the side of the knowledge graph - ownership and responsibility for data owned by the institution/researcher, combined with the ability to mesh with BOLD and ecological data systems. I know that some other collection management systems (including EarthCape and Specify) have similar interests.

An interlinked question is how we know which of the potentially many records describing a given collection is a) the most authoritative and b) the most current. This ties in with other topics: 3.1. Pathways and tools for publishing collection records (TECHNOLOGY) and 4.1. Ownership of information for each collection (GOVERNANCE).

pzermoglio · April 22, 2020, 2:29am

I would like to point out that the assertion:

“Most natural history collections maintain data on their specimens in a collection management system (CMS) such as Specify, Symbiota, EMu, DarWIN or BRAHMS.”

may not be exactly true in many parts of the world. Unless you can call Excel a collection management system…

For those using CMSs, interfaces with the catalogue would be great and save people a lot of time and effort. But we should not neglect those using simple spreadsheets, for whom other solutions would be needed (the beauty of the IPT model, allowing just uploading a simple file).

trobertson · April 22, 2020, 7:44am

Thanks @pzermoglio. This is very much in line with our thinking on the sketch we put together. Going even beyond the IPT model, which requires someone to install a server, some simple web forms to fill in, along with the ability to upload Excel files with standardised field headings, could really lower the technical threshold for many to have an online search portal. Excel and databases like FileMaker are commonplace, and we need a technical solution that allows easy participation. I propose we consider offering that natively in the catalogue itself.
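As a rough illustration of the "standardised field headings" idea, a minimal Python sketch could check the header row of an uploaded file before accepting it. The required set below is an assumption, picked for illustration from Darwin Core term names; no such list has been agreed for the catalogue, and `check_headings` is a hypothetical helper. It reads CSV text, so an Excel sheet would first need to be exported to CSV.

```python
import csv
import io

# Assumed minimum set of column headings, using Darwin Core term names.
# Illustrative only: no such requirement has been agreed.
REQUIRED_HEADINGS = {"occurrenceID", "catalogNumber", "scientificName", "basisOfRecord"}


def check_headings(csv_text):
    """Return (missing, unrecognised) heading lists for an uploaded CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row, or an empty list for an empty file
    found = {h.strip() for h in header if h.strip()}
    missing = sorted(REQUIRED_HEADINGS - found)
    unrecognised = sorted(found - REQUIRED_HEADINGS)
    return missing, unrecognised


# Made-up sample upload with two required headings absent.
sample = "catalogNumber,scientificName,collector\n0001,Quercus alba,A. Smith\n"
missing, unrecognised = check_headings(sample)
print("Missing headings:", missing)
print("Unrecognised headings:", unrecognised)
```

A check like this could run as soon as a file is uploaded, so that the person uploading gets immediate feedback on which columns to rename or add.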

emeyke · April 22, 2020, 10:40am

@trobertson if you are talking about just the collections catalogue and maintaining one's own collection record, then Excel upload seems like a bit of overkill. A simple web form for creating/updating the record should do it.

trobertson · April 22, 2020, 11:18am

Thanks @emeyke, sorry that was not clear. I meant web forms for descriptive content and Excel for specimen records.

emeyke · April 22, 2020, 11:33am

Well, this is interesting and I might have missed a big discussion on that. Are we talking about direct upload of occurrence datasets via CSV/Excel files into GBIF? We can take this elsewhere as it seems to be drifting off the current topic.

Rich87 · April 22, 2020, 5:57pm

One aspect where a link between a CMS and a Global Catalogue might be useful is to provide metadata on digitization progress in a collection. As an example, Index Herbariorum entries now have this optional set of fields (data shown for NY Botanical Garden):

Drawing similar data directly from an institution’s CMS, if possible, should reduce the effort needed in keeping this type of information at least somewhat current.
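To make the idea concrete, here is a minimal Python sketch of how such counts might be derived from specimen records exported from a CMS. The field names (`databased`, `imaged`, `georeferenced`) and the function name are hypothetical; they are not taken from Index Herbariorum or from any particular CMS schema, and the sample records are made up.

```python
from collections import Counter


def digitization_summary(records, flags=("databased", "imaged", "georeferenced")):
    """Count how many specimen records have each digitization flag set.

    `records` is a list of dicts; a missing or falsy flag counts as not done.
    The flag names are assumptions for illustration only.
    """
    counts = Counter()
    for record in records:
        for flag in flags:
            if record.get(flag):
                counts[flag] += 1
    summary = {"total_specimens": len(records)}
    for flag in flags:
        summary[flag] = counts[flag]
    return summary


# Made-up sample records standing in for a CMS export.
sample_records = [
    {"databased": True, "imaged": True, "georeferenced": False},
    {"databased": True, "imaged": False, "georeferenced": False},
    {"databased": False},
]
print(digitization_summary(sample_records))
```

Run on a schedule against the CMS, a summary like this could be pushed to the catalogue without anyone having to retype the numbers.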

abentley · April 22, 2020, 9:31pm

Specify software certainly is. We have most of the relevant tables and fields that could accommodate this information and successfully publish it through existing IPT and XML infrastructure.

trobertson · April 23, 2020, 7:58am

Thanks, @Rich87. Please know GBIF are actively working on bringing this into GRSciColl and it is available in the API already. For example, you can see these counts in the collectionSummary field at the bottom of the response for NY Botanical Garden. Today this is populated only for records sync’ed from IH records (happens automatically now) but will be expanded for more collections and moved into the user interface.

This is limited of course, and more expressive descriptors should be available when metadata in (N)CD Standard is provided.
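For anyone wanting to read those counts programmatically, a minimal Python sketch of pulling the `collectionSummary` field out of a JSON response might look like the following. It makes no network call. The sample payload is made up, and treating the field as a simple mapping from label to integer count is an assumption here, so check a real response before relying on it. `extract_collection_summary` is a hypothetical helper name.

```python
import json


def extract_collection_summary(response_text):
    """Return the collectionSummary mapping from a JSON response, or {} if absent.

    Assumes the field, when present, is an object mapping labels to counts.
    """
    payload = json.loads(response_text)
    summary = payload.get("collectionSummary") or {}
    # Keep only entries whose value is an integer count (booleans excluded).
    return {
        label: count
        for label, count in summary.items()
        if isinstance(count, int) and not isinstance(count, bool)
    }


# Made-up payload for illustration; not a real API response.
sample_response = json.dumps({
    "name": "Example Herbarium",
    "collectionSummary": {"Group A": 1200, "Group B": 300},
})
for label, count in extract_collection_summary(sample_response).items():
    print(f"{label}: {count}")
```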

trobertson · April 23, 2020, 8:06am

Thanks, @abentley. I have noticed a few mentions in these threads of the IPT (GBIF’s publishing tool) and its associated dataset metadata, which is an extension of the Ecological Metadata Language (EML) that brought in elements of the Natural Collections Descriptions standard as it stood at that time.

Given the existence of this operational infrastructure today, I wonder if we should consider exploring a revision of the EML profile in use by GBIF and others, along with promotion of metadata-only resource sharing as @dhobern described in this thread.

waddink · April 23, 2020, 8:15pm

Relevant here is also MIDS (minimum information about a digital specimen) that is under development and describes digitization status, planned to be included in TDWG CD.

Rich87 · April 26, 2020, 11:30pm

Thanks @waddink.

I am very curious about the MIDS standard - where could I get more information about that standard? I think this is a very pertinent issue that likely is of interest to quite a few people in our community.

emeyke · April 27, 2020, 8:43am

Definitions are here: 2.5 Digitisation classification
