Annotating specimens and other data

Hi Deb:

Interesting situation that takes the concept one step further than most CMs might consider. You are right - the ideal would be to get the data back to the providers but are the providers even acknowledged in the publication? I see the attribution problem rearing its ugly head as well…



I see - thanks for the link. I agree it’s a problem that we, as a community, need to try to address. Yes, one can only do so much in a day, but when one has data “out there”, one has to think of being a good steward of that data where possible. Might be a good question to pose in a survey of CMs - where do they feel this task should fall amongst their other tasks??


Hi @Rich87 Let’s dream big here. We know that some CMS will be able to manage, others we know will not. So even if some collections want to be the authoritative source (as @jegelewicz posits for ARCTOS), it will not be possible for some of them, at least as we currently stand. But, they will be (usually) the repositories of the physical objects and always the arbiters of access to those specimens and other related products (e.g. sensitive data not published, derivative objects, etc). I think they’d remain in control of what data they want to maintain / what annotations they want to accept / ingest.

  • So when we think about designing and implementing annotations (and credit tracking), we have to think about how digital information about an object gets linked with other information about that same object or its derivatives, and who needs recognizing - AND how automated can we make this?
  • We definitely need to get past “Acknowledgements” in a powerpoint slide or paragraph at the end of a paper – to something we can track / measure / cite. We in this case includes the collectors, those doing identifications, collection managers, database managers, preparators, georeferencers, the collections themselves, etc. We need systems that facilitate (require?) the use of QID / ORCID (or similar) wherever possible. In short, we need identifiers, for everyone and everything and they need to be included in publications. Pensoft is supporting “semantic publishing” to accomplish at least some of these ideas / needs at the publication level.
  • Let’s see attribution as an opportunity @Rich87! So much potential for finally seeing credit where it’s long overdue!

Agree. See what I was writing as you posted this! Analyzing/mining specimen data for novel applications - #9 by jegelewicz

I think this would be very interesting information, along with “Is this task part of your annual evaluation?”

I suspect DO/ES will require a change in frame of reference for collections folks and will represent two scopes, wherein CMSs will represent the authoritative database for an individual collection and a global dataset will represent the global version of the CMS specimens, with annotations and additions from many human and machine sources. As the global dataset grows, it is questionable whether CMs will want or have the capacity to accommodate what BCoN calls all extended data. Hence, round-tripping is likely to become the purview of CMs as they select the data elements they wish to incorporate into their CMS.

Hi Rich, we do have some related data about this from SPNHC-TDWG 2018 and SPNHC 2019 Symposia and related survey data …

I would posit they would all love to do this, but it’s being added to the huge list of responsibilities / roles they are already expected to do (see @jegelewicz presentation from SPNHC-TDWG2018)

  • Do we need more career paths?
  • How much of this work can be automated? Jobs / careers evolve. A nurse of the 1930s is not the same as a nurse today. See the talk by McCuller and Hogue about the huge changes in the “Collection Manager” role over 20 years, SPNHC 2020 symposium.
  • What does a museum’s human infrastructure need in order to function now that it is managing 2 collections (physical and digital), each of which requires specific skills and knowledge?
  • What happens with small collections in this respect? They can be more agile (fewer administrators to liaise with or navigate), but have fewer resources too.
  • Can we network some of these tasks around annotation (e.g. vetting people identifiers and people data)? In other words, can we network our expertise? (I think maybe we have to).
  • @Rich87 does museum leadership recognize the need to invest in annotations? That the annotation and attribution pieces hold much promise for improving awareness of collections, collections expertise, and the value of these, and so facilitate FAIR too?
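
On the "vetting people identifiers" point, one small piece that can be automated is checking that an ORCID iD is well formed before it is attached to an annotation. The sketch below is only that: it checks the 16-character shape and the ISO 7064 MOD 11-2 check character that ORCID iDs use. It does not confirm that the iD exists or belongs to the person named, which would still need a registry lookup or a human. The function names are made up for illustration.

```python
import re

# Four groups of four characters; the last character may be a digit or "X".
ORCID_SHAPE = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """True if the string has the ORCID iD shape and a matching check character."""
    if not ORCID_SHAPE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


# A sample iD commonly used in ORCID documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))  # True
```

A check like this catches typos at data entry; deciding whether two people records are the same person remains the networked-expertise problem raised above.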

One issue is that we can talk amongst ourselves all we want, but those who hold the purse strings need to be in on the conversation. We need more directors, faculty, provosts, and the like to understand what is required. This is often not easy for collections staff to do on their own. I have considered writing to AAM on the need for updates to their Collections Stewardship Core Standard to include something about collections data management and collaboration with the wider community of collections, which might partially be implied by:

A system of documentation, records management and inventory is in effect to describe each object and its acquisition (permanent or temporary), current condition and location and movement into, out of and within the museum.

but that seems dated.

attribution and repatriation are tied to each other

Certainly. I would love to see all collections have a collection data manager - even if that person is shared by several museums (which seems doable, especially now that we can all work remotely).

I concur. It now, one hopes, might do much more than is encompassed in that quote. It’s about more than tracking the physical object.

This is the same with the annotations in ALA. I like it that way, if only because not all Collection Management Systems, even the best, handle annotations (or all types of annotations) well, and having the annotations taken care of by the regional infrastructures, like DiSSCo, ALA etc., means that lesser-resourced collections can benefit from them too.

I am not a fan of “round-tripping” annotations either. I know that some (most) of my fellow Australian collections database managers would like to write annotations back to their CMS, but I personally do not. We copied some of the geo-reference related data quality assertions into our CMS (not as annotations) at the start of the first lock-down, so that our curation officers who had to work from home could verify the coordinates without having to go back and forth between CMS and ALA for every record. Also, collections can subscribe to alerts on annotations on their records, which all AVH providers do, so you can work very well with annotations without “round-tripping”. The annotations in the regional infrastructures will be a lot more F.A.I.R. than in individual collections’ CMS.

On the other hand, I would like to be able to get annotations from our CMS and other applications (we make a lot of annotations in our online Flora for example) into the regional infrastructures. This means that I would like my CMS to be able to handle annotations other than determinations.

Whether this should be done globally? If this means that it should be possible to annotate records in GBIF, I think that would be nice, as then collections that do not have access to regional infrastructures that provide annotation services can benefit from annotations (I mostly see annotations as value-adding). If this means that all annotating should be done in one place globally, I would rather see a global service that gives access to annotations from wherever they are made, or rather a more limited list of annotation services that would all use the same standard.

If we have all this in place, CMS can use external annotation services and there will be no need to store annotations that are made elsewhere in the CMS.

thanks @NielsKlazenga

Agree, best to get data back to the publisher; if that can’t be done, what are the alternatives? GBIF has datasets from literature that we can cluster with the specimen so that the relationship between them is discoverable. Is there value in publishing the data as occurrences that can get clustered and then become discoverable? Does that make a mess of duplicates, having many occurrences with only partial data? Is that a problem or a solution? Annotating those new data to the original record in a global or regional store seems plausible if we build one.

If we are looking at more integrated institutional data and the implementation of annotation, how do you manage which annotations individual institutions accept? Without this you would drive information separation, not normalisation.


I would assume that collection managers who subscribe to such alerts have a goal different from subscribing to a news feed that would be read and discarded. Some annotations might be valuable and worth drawing into the CMS through copy/paste or some other means. So, I think it’s worth making a distinction between fully automated ingestion vs curatorial oversight vs precisely what is drawn in.

I too doubt that any CMS today is prepared to store annotations in the native structure in which they were created. That is a heavy technical requirement. Rather, I fully expect that collections managers would cherry-pick items within remotely stored annotations that are deemed relevant and useful because they are knowledgeable about their local data structures & would know precisely where new or amended pieces of information could reside. However, if we want to drive positive feedback loops, there must be some way for the CMS to additionally indicate from where & by whom the annotation was created. If we fully sever the communication & do not acknowledge the parties responsible for having made enhancements that a collection manager decided were locally useful (or outright rejected), is there any point really to having an annotation store?
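
To make the cherry-picking idea concrete, here is a minimal Python sketch of drawing selected values from a remote annotation into a local record while keeping a note of where and by whom each value was created. Everything in it is an assumption for illustration: the `LocalRecord` class, the dictionary layout of the annotation (`id`, `creator`, `source`, `values`) and the `draw_in` function are hypothetical and do not describe any existing CMS or annotation store.

```python
from dataclasses import dataclass, field


@dataclass
class LocalRecord:
    """Stand-in for one CMS record; field names are hypothetical."""
    catalog_number: str
    fields: dict
    remarks: list = field(default_factory=list)


def draw_in(record: LocalRecord, annotation: dict, wanted: list) -> LocalRecord:
    """Copy only the curator-selected keys and note who created each value and where."""
    for key in wanted:
        if key in annotation["values"]:
            record.fields[key] = annotation["values"][key]
            record.remarks.append(
                f"{key} from annotation {annotation['id']} "
                f"by {annotation['creator']} via {annotation['source']}"
            )
    return record


# Made-up example: the curator accepts the coordinates but not the new name.
record = LocalRecord("EX-0001", {"scientificName": "Taxon A"})
annotation = {
    "id": "ann-42",
    "creator": "annotator-1",          # ideally a persistent person identifier
    "source": "regional-annotation-store",
    "values": {"decimalLatitude": 42.28, "scientificName": "Taxon B"},
}
draw_in(record, annotation, wanted=["decimalLatitude"])
print(record.remarks)
```

The remark line plays the role of a free-text provenance note; a CMS with structured provenance fields could store the same three facts (annotation, creator, source) separately.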

Your description sounds very close to what I am doing to update records at MICH based on finding bits of data elsewhere. A new determination is an easy add to our CMS. For georeferences, I often add a short phrase to a “georeference remarks” field something to the effect of “georef information from duplicate collection at IND”.

@Rich87 Hi, Rich. I expect such workflows are very common throughout our community. We also have a breadth of “solutions” for end-users to report or comment on the data we share, ranging from email (gasp!) to GitHub tickets with pre-constructed templates. You have to be a very impassioned user to continually provide feedback when there is no evidence that anyone took action or, worse, that your feedback was even received at all. At the very least, I would hope that an annotation store can coordinate the accepted/rejected notifications/signals between recipient and sender, if indeed there is a recipient and the annotation is something that they deemed useful and drew into their CMS.
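
The accepted/rejected signalling described here can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the `AnnotationStore` class, its method names and the message wording are all hypothetical, and the "outbox" list simply stands in for whatever notification channel a real store would use.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Annotation:
    annotation_id: int
    record_id: str   # identifier of the annotated specimen record
    sender: str      # who made the annotation
    body: str
    status: Status = Status.PENDING


class AnnotationStore:
    """Toy store that records a recipient's decision and tells the sender about it."""

    def __init__(self):
        self._annotations = {}
        self._outbox = []  # (sender, message) pairs standing in for real notifications

    def submit(self, record_id: str, sender: str, body: str) -> int:
        annotation_id = len(self._annotations) + 1
        self._annotations[annotation_id] = Annotation(annotation_id, record_id, sender, body)
        return annotation_id

    def decide(self, annotation_id: int, accepted: bool, decided_by: str) -> Annotation:
        ann = self._annotations[annotation_id]
        ann.status = Status.ACCEPTED if accepted else Status.REJECTED
        self._outbox.append(
            (ann.sender, f"Annotation {annotation_id} on {ann.record_id} "
                         f"was {ann.status.value} by {decided_by}")
        )
        return ann

    def notifications_for(self, sender: str) -> list:
        return [message for who, message in self._outbox if who == sender]
```

The point of the sketch is the `decide` step: the sender hears back either way, which is the feedback loop that email and issue trackers often fail to close.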

thanks everyone, please see a summary here with some questions to focus on use cases

Daily Summaries - 3. Annotating specimens and other data

One use case that I have encountered on several occasions comes from my own floristics work. I have used the SEINet and Consortium of Pacific Northwest Herbaria portals while writing treatments of the Caryophyllaceae for floras of New Mexico and Oregon respectively. They are especially handy for being able to view specimens from herbaria (some of which I had not known of), esp. of taxa that I wouldn’t expect to find, and for finding county records that might expand a species distribution. In both cases, I have found quite a few misidentifications; in my most recent work, I have a file of 92 misidentified New Mexico specimens from 25 different herbaria. I’d like to get these corrections “into the system” so that others may, in this case, not be misled by misdeterminations, but given my current options of reporting either individually via a comment to each portal record, or composing 25 different e-mails, it’s unlikely I will get to reporting my findings anytime soon. Is there a better way???
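
As a rough illustration of what a "better way" could look like, the Python sketch below takes one researcher's file of corrections and groups it into one batch per herbarium, so that a shared annotation service (if one existed) could route each batch instead of the researcher writing 25 e-mails. The `Redetermination` fields, the herbarium codes and the taxon names are all made-up placeholders, not data from the portals mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Redetermination:
    """One proposed correction; all field names are hypothetical."""
    herbarium: str       # herbarium code of the holding collection
    catalog_number: str
    current_name: str    # name shown on the portal record
    proposed_name: str   # the annotator's determination
    determiner_id: str   # persistent identifier for the annotator


def group_by_herbarium(corrections):
    """Return {herbarium: [corrections]} so each collection receives one batch."""
    batches = defaultdict(list)
    for correction in corrections:
        batches[correction.herbarium].append(correction)
    return dict(batches)


# Placeholder data for illustration only.
corrections = [
    Redetermination("HERB-A", "0001", "Taxon X", "Taxon Y", "annotator-1"),
    Redetermination("HERB-B", "0417", "Taxon X", "Taxon Z", "annotator-1"),
    Redetermination("HERB-A", "0002", "Taxon Q", "Taxon Y", "annotator-1"),
]

for herbarium, batch in group_by_herbarium(corrections).items():
    print(herbarium, len(batch), "proposed redeterminations")
```

With the determiner's identifier carried on every row, the same batch could also feed the attribution tracking discussed earlier in the thread.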


My dream research-related annotation project is to measure all specimens and annotate them with leaf, stem and inflorescence character measurements. Methodological/technical progress is being made in many groups. Acacia has over 1,000 species, and there are currently 46,700 specimen images with most but not all species represented. Could a large-scale morphometric study recapitulate the taxonomy? With all of these specimens sequenced in a phylogeny too (and distributions) we could learn a lot about evolution. Is this a broad case study across the tree of life that would appeal to funders?

What is a high level use case that our broad community can support and use to support annotations infrastructure development?