The short answer to @elyw (and I cannot speak for others) is that we definitely want to follow the standards and implement the solution based on these discussions and technical specs.
In terms of metadata, I also think this is a great idea. If this could be automated and pulled periodically from the database, it would be one less thing a collection manager has to do or remember to update.
In terms of metrics, there was a presentation at the SPNHC 2019 meeting about Arctos interfacing with some external sources and then providing metrics on things like publication and use for specimens. I would love to see something like that for Specify. It would make compiling annual reports easier and also make it easier to demonstrate the impact of our collection.
Thanks @emeyke and tkarim - I’m glad to see the interest in this idea. The Arctos example is a really excellent one. It shows how a collection management system can be a tool for curating information from the side of the knowledge graph - ownership and responsibility for data owned by the institution/researcher, combined with the ability to mesh with BOLD and ecological data systems. I know that some other collection management systems (including EarthCape and Specify) have similar interests.
An interlinked question is how we know which of the potentially many records describing a given collection is a) the most authoritative and b) the most current. This ties in with other topics: 3.1. Pathways and tools for publishing collection records (TECHNOLOGY) and 4.1. Ownership of information for each collection (GOVERNANCE).
I would like to point out that the assertion:
- “Most natural history collections maintain data on their specimens in a collection management system (CMS) such as Specify, Symbiota, EMu, DarWIN or BRAHMS.”
may not be exactly true in many parts of the world. Unless you can call Excel a collection management system…
For those using CMSs, interfaces with the catalogue would be great and save people a lot of time and effort. But we should not neglect those using simple spreadsheets, for whom other solutions would be needed (the beauty of the IPT model, allowing just uploading a simple file).
Thanks @pzermoglio. This is very much in line with our thinking on the sketch we put together. Going beyond even the IPT model, which requires someone to install a server, having some simple web forms to fill in, along with the ability to upload Excel files with standardised field headings, could really lower the technical threshold for many to have an online search portal. Excel and databases like FileMaker are commonplace, and we need a technical solution that allows easy participation. I propose we consider offering that natively in the catalogue itself.
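To make the "standardised field headings" idea concrete, here is a minimal sketch of the kind of check an upload form could run. The heading names below are invented for illustration; the real list would come from whatever standard the catalogue adopts:

```python
import csv
import io

# Hypothetical set of standardised column headings the catalogue might accept
# (illustrative only; the agreed standard would define the real list).
EXPECTED_HEADINGS = {"institutionCode", "collectionCode", "collectionName",
                     "preservationType", "specimenCount", "contactEmail"}

def validate_headings(csv_text):
    """Return (unrecognised, missing) headings for an uploaded CSV/Excel export."""
    reader = csv.reader(io.StringIO(csv_text))
    found = set(next(reader, []))
    return found - EXPECTED_HEADINGS, EXPECTED_HEADINGS - found

unknown, missing = validate_headings(
    "institutionCode,collectionCode,specimenCnt\nNY,BOT,7800000\n")
print(sorted(unknown))   # headings the catalogue does not recognise
print(sorted(missing))   # expected headings the file did not supply
```

Feedback like this at upload time would let spreadsheet-only collections participate without ever touching an IPT installation.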
@trobertson if you are talking about just the collections catalogue and maintaining one's own collection record, then Excel upload seems like a bit of overkill. A simple web form for creating/updating the record should do it.
Thanks @emeyke, sorry that was not clear. I meant web forms for descriptive content and Excel for specimen records.
Well this is interesting and I might have missed a big discussion on that. Are we talking about direct upload of occurrence datasets via csv/excel files into GBIF? We can take this elsewhere as it seems to be drifting off the current topic.
One aspect where a link between a CMS and a Global Catalogue might be useful is to provide metadata on digitization progress in a collection. As an example, Index Herbariorum entries now have this optional set of fields (data shown for NY Botanical Garden):
Drawing similar data directly from an institution’s CMS, if possible, should reduce the effort needed in keeping this type of information at least somewhat current.
Specify software certainly is. We have most of the relevant tables and fields that could accommodate this information and can successfully publish it through the existing IPT and XML infrastructure.
Thanks, @Rich87. Please know GBIF are actively working on bringing this into GRSciColl, and it is available in the API already. For example, you can see these counts in the collectionSummary field at the bottom of the response for NY Botanical Garden. Today this is populated only for records synced from IH (this now happens automatically), but it will be expanded to more collections and moved into the user interface.
This is limited of course, and more expressive descriptors should be available when metadata in (N)CD Standard is provided.
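For anyone wanting to experiment with the collectionSummary field mentioned above, a sketch of consuming it might look like the following. The payload here is invented (not real NYBG figures), and the exact keys inside collectionSummary are assumptions; the live record would come from the GRSciColl collection API (e.g. `https://api.gbif.org/v1/grscicoll/collection/{key}`):

```python
# Illustrative GRSciColl-style record; counts and key names are hypothetical.
sample_record = {
    "name": "Example Herbarium",
    "collectionSummary": {
        "numSpecimens": 100000,
        "numDatabased": 45000,
        "numImaged": 20000,
    },
}

def digitisation_progress(record):
    """Express each collectionSummary count as a fraction of total specimens."""
    summary = record.get("collectionSummary", {})
    total = summary.get("numSpecimens")
    if not total:
        return {}
    return {k: v / total for k, v in summary.items() if k != "numSpecimens"}

print(digitisation_progress(sample_record))
# {'numDatabased': 0.45, 'numImaged': 0.2}
```

Derived fractions like these are exactly the kind of digitization-progress metric discussed earlier in the thread, and could be refreshed automatically rather than compiled by hand for annual reports.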
Thanks, @abentley. I have noticed a few mentions on threads of the IPT (GBIF’s publishing tool) and the associated dataset metadata, which is an extension of the Ecological Metadata Language (EML) bringing in elements of the Natural Collections Description (NCD) standard from that time.
Given the existence of this operational infrastructure today, I wonder if we should consider exploring a revision of the EML profile in use by GBIF and others, along with promotion of
metadata-only resource sharing as @dhobern described in this thread.
Relevant here is also MIDS (Minimum Information about a Digital Specimen), which is under development and describes digitization status; it is planned to be included in TDWG CD.
I am very curious about the MIDS standard - where could I get more information about that standard? I think this is a very pertinent issue that likely is of interest to quite a few people in our community.
Definitions are here: 2.5 Digitisation classification
I’m hoping we can have a sort of “software summit” to gather developers and some of the key users of these software platforms for a decadal meeting (where have we come in 10 years, and where do we want to go?). In this meeting, we’d talk about the needs and a path to integrating TDWG DQ standards into CMSs, and about interoperability that is needed or opportunities for more of it, for example linking Agents tables with Wikidata.

We could also move toward (or at least discuss the feasibility of) “metadata” tables in the CMS to reduce the burden of creating, mapping, exporting, updating, and publishing these data. This would have the added benefit of simplifying annual reporting. As you point out @elyw, this data must often be compiled anew each time it’s needed locally, let alone regionally or globally. The globally aggregated understanding of what we have is still far from complete and very difficult to estimate; as we continue to digitize, our estimates will continue to improve.
Thanks for joining us Rich! If you have questions about MIDS, lots of folks would be happy to discuss with you such as Elspeth Haston at RBGE. @waddink too, and me
This is an interesting idea to explore as a possible STEP 1.
If we added CD fields to it, I could see it working.
My main observation is that many of the EML fields (being free text) often result in confusing data.
Sometimes collections describe their dataset in the EML file. Other times they describe their entire collections (which may not all be included in the dataset). And sometimes, a mix. This makes it difficult to understand what’s being described in the EML without looking at the associated data records.
If we could use the EML (+ CD fields) and then
a) make it clear we’re looking for metadata, and
b) make it possible to link to any specimen-record-level datasets being shared,
this could be a first pass at a workable system, I think.
Even better, once collections have metadata tables in their databases, a view of that table data could be linked automatically to the IPT.
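To illustrate points (a) and (b) above, a collection record could carry an explicit statement of what it describes plus links to any specimen-level datasets. The field names and URLs below are invented for illustration; they are not taken from EML or the CD draft:

```python
# Hypothetical shape for a metadata-only collection record that says what
# scope it describes and links to specimen-level datasets being shared.
collection_record = {
    "title": "Example Entomology Collection",
    "scope": "entire_collection",        # (a) explicit: metadata, whole collection
    "linkedDatasets": [                  # (b) pointers to specimen-level data
        "https://ipt.example.org/resource?r=ento-pinned",
        "https://ipt.example.org/resource?r=ento-wet",
    ],
}

def describes_whole_collection(record):
    """True when the metadata covers the entire collection, not just one dataset."""
    return record.get("scope") == "entire_collection"

print(describes_whole_collection(collection_record))
print(len(collection_record["linkedDatasets"]), "linked specimen datasets")
```

An explicit scope flag like this would remove the current ambiguity of free-text EML, where one cannot tell whether the description covers the dataset or the whole collection without inspecting the records.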
How might one link a collection’s Wikidata page into this vision of using the IPT for EML metadata files?
@rich87, MIDS is still work in progress, you can find the latest draft here: https://docs.google.com/document/d/1fpfn_bh2htvJl1bfDrdxd6hIZPdZnuTq9KK-cetmWRE/edit#heading=h.svg0ouedud33. Comments are welcome!
Among the various ways of providing data to the catalogue that have been mentioned in this and other topics of this consultation, integration with the CMS would be my preferred route.
I would like to be able to manage collection information (that is then channeled to this catalogue or other destinations) seamlessly alongside information on the specimens that make up these collections. Once a collection is defined, whether in terms of its constituent specimens or by other criteria, I would want information aggregated at the collection level to be updated automatically from the specimen-level information, and I would like to be able to configure automatic publication of that collection-level data.
I would not want to compile, manage, and export data at the collection level in a second, independent system.
Also, when collections are defined according to intensional criteria (e.g., all specimens collected by a particular collector), such integration would automatically pick up updated knowledge about the specimens; we expect that digitization will expose many such hitherto unknown specimen properties.
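The aggregation described above can be sketched very simply. The specimen records and field names here are invented for illustration (loosely Darwin Core-flavoured), and the intensional criterion is passed in as a predicate, so re-running the summary after new specimens are digitized updates the collection-level view automatically:

```python
# Invented specimen-level records for illustration.
specimens = [
    {"catalogNumber": "A1", "recordedBy": "Smith", "family": "Rosaceae"},
    {"catalogNumber": "A2", "recordedBy": "Smith", "family": "Fabaceae"},
    {"catalogNumber": "B1", "recordedBy": "Jones", "family": "Rosaceae"},
]

def summarise(records, criterion=lambda r: True):
    """Aggregate collection-level data from whichever specimens match the
    collection's defining criterion (extensional or intensional)."""
    matched = [r for r in records if criterion(r)]
    return {
        "specimenCount": len(matched),
        "families": sorted({r["family"] for r in matched}),
    }

# An intensional collection: all specimens collected by Smith.
smith_collection = summarise(specimens, lambda r: r["recordedBy"] == "Smith")
print(smith_collection)
# {'specimenCount': 2, 'families': ['Fabaceae', 'Rosaceae']}
```

A CMS that exposed a view like this could feed the catalogue directly, removing the need for a second, independent system at the collection level.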