Annotating specimens and other data

Agreed, it is best to get the data back to the publisher; if that can’t be done, what are the alternatives? GBIF has datasets from literature that we can cluster with the specimen so that the relationship between them is discoverable. Is there value in publishing the data as occurrences that can get clustered and then become discoverable? Does having many occurrences with only partial data make a mess of duplicates? Is that a problem or a solution? Annotating those new data to the original record in a global or regional store seems plausible if we build one.

If we are looking at more integrated institutional data and the implementation of annotation, how do you manage which annotations individual institutions accept? Without this, you would drive information separation, not normalisation.


I would assume that collection managers who subscribe to such alerts have a goal different from subscribing to a news feed that would be read and discarded. Some annotations might be valuable and worth drawing into the CMS through copy/paste or some other means. So, I think it’s worth making a distinction between fully automated ingestion, curatorial oversight, and precisely what is drawn in.

I too doubt that any CMS today is prepared to store annotations in the native structure in which they were created. That is a heavy technical requirement. Rather, I fully expect that collections managers would cherry-pick items within remotely stored annotations that are deemed relevant and useful because they are knowledgeable about their local data structures & would know precisely where new or amended pieces of information could reside. However, if we want to drive positive feedback loops, there must be some way for the CMS to additionally indicate from where & by whom the annotation was created. If we fully sever the communication & do not acknowledge the parties responsible for having made enhancements that a collection manager decided were locally useful (or outright rejected), is there any point really to having an annotation store?



Your description sounds very close to what I am doing to update records at MICH based on finding bits of data elsewhere. A new determination is an easy add to our CMS. For georeferences, I often add a short phrase to a “georeference remarks” field, something to the effect of “georef information from duplicate collection at IND”.

@Rich87 Hi, Rich. I expect such workflows are very common throughout our community. We also have a breadth of “solutions” for end-users to report or comment on the data we share, ranging from email (gasp!) to GitHub tickets with pre-constructed templates. You have to be a very impassioned user to continually provide feedback when there is no evidence that anyone took action or, worse, that your feedback was even received at all. At the very least, I would hope that an annotation store can coordinate the accepted/rejected notifications/signals between recipient and sender, if indeed there is a recipient and the annotation is something that they deemed useful and drew into their CMS.
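
To make the accepted/rejected signalling concrete, here is a minimal Python sketch of how a store might record a curator's decision and queue a notice back to the person who made the annotation. Nothing here reflects an existing system: `Annotation`, `AnnotationStore`, `record_decision`, the `outbox` list, and the identifiers in the usage note are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Annotation:
    annotation_id: str            # hypothetical identifier scheme
    target_record: str            # the specimen record being annotated
    creator: str                  # who made the annotation
    body: str                     # the proposed change or comment
    decision: Decision = Decision.PENDING
    decided_by: Optional[str] = None


@dataclass
class AnnotationStore:
    annotations: Dict[str, Annotation] = field(default_factory=dict)
    # Queued (recipient, message) pairs; a real system would deliver these.
    outbox: List[Tuple[str, str]] = field(default_factory=list)

    def submit(self, annotation: Annotation) -> None:
        self.annotations[annotation.annotation_id] = annotation

    def record_decision(self, annotation_id: str, decision: Decision,
                        decided_by: str) -> Annotation:
        """Store the curator's decision and queue a notice to the creator."""
        ann = self.annotations[annotation_id]
        ann.decision = decision
        ann.decided_by = decided_by
        message = (f"Annotation {annotation_id} on {ann.target_record} "
                   f"was {decision.value} by {decided_by}")
        self.outbox.append((ann.creator, message))
        return ann
```

With this shape, a call such as `store.record_decision("a1", Decision.REJECTED, "curator-x")` still produces a message for the sender, so a rejection is acknowledged just as an acceptance would be.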


Thanks, everyone. Please see a summary here, with some questions to focus on use cases:

Daily Summaries - 3. Annotating specimens and other data

One use case that I have encountered on several occasions comes from my own floristics work. I have used the SEINet and Consortium of Pacific Northwest Herbaria portals while writing treatments of the Caryophyllaceae for floras of New Mexico and Oregon respectively. They are especially handy for being able to view specimens from herbaria (some of which I have not known of), esp. of taxa that I wouldn’t expect to find, and for finding county records that might expand a species distribution. In both cases, I have found quite a few misidentifications; in my most recent work, I have a file of 92 misidentified New Mexico specimens from 25 different herbaria. I’d like to get these corrections “into the system” so that others are, in this case, not misled by misdeterminations, but given my current options of reporting either individually via a comment on each portal record or composing 25 different e-mails, it’s unlikely I will get to reporting my findings anytime soon. Is there a better way???
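
As a rough illustration of what a "better way" could look like on the reporting side, the sketch below groups a single file of corrections by herbarium so that each institution receives one batch rather than one message per specimen. It assumes each correction is a dictionary with `herbarium`, `catalog_number`, `current_name`, and `proposed_name` keys; those field names and the function names are hypothetical and do not correspond to any portal's format.

```python
from collections import defaultdict


def group_by_herbarium(corrections):
    """Group correction rows by herbarium code, one batch per institution."""
    batches = defaultdict(list)
    for row in corrections:
        batches[row["herbarium"]].append(row)
    return dict(batches)


def format_batch(herbarium, rows):
    """Render one institution's batch as a plain-text report."""
    lines = [f"Proposed redeterminations for {herbarium} ({len(rows)} specimens):"]
    for row in rows:
        lines.append(
            f"- {row['catalog_number']}: "
            f"{row['current_name']} -> {row['proposed_name']}"
        )
    return "\n".join(lines)
```

The same grouped structure could just as well be submitted to an annotation store as a set of assertions; the plain-text report is only the simplest possible output.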


My dream research-related annotation project is to measure all specimens and annotate them with leaf, stem and inflorescence character measurements. Methodological/technical progress is being made in many groups. In Acacia, with over 1000 species, there are currently 46,700 specimen images with most but not all species represented. Could a large-scale morphometric study recapitulate the taxonomy? With all of these specimens sequenced in a phylogeny too (and distributions) we could learn a lot about evolution. Is this a broad case study across the tree of life that would appeal to funders?

What is a high level use case that our broad community can support and use to support annotations infrastructure development?


Another plant-based research use case that I like is to score phenology across major groups (of course, some are easier to score than others). This would generate a massive data set to evaluate change in flowering time, a key indicator of climate change that has implications on ecosystem services like pollination. Of course, this can be put in an evolutionary context too. There have been some great regional studies done but in a more global context this would be powerful (Nice California example: This can be a great citizen science project (a few tools have already been developed) and at some point we may be able to get at least some semi-automated results from image analysis/AI approaches. To my knowledge, the tools do not use an annotation framework but spreadsheets with the data are available and in other cases the data is being recorded directly into CMSs.


Two comments on the use cases suggested by James and Joe:

  • these are good examples of what can be done with “our” data that are beyond “typical” uses we often think about.
  • they, esp. Joe’s, might be very difficult to capture in a CMS. Given the problem of “too much to do” that has been raised already, I can see incorporating “extended” specimen information somewhere “down the list”…

@Rich87 Yes, good point. This is the reason that we need a global annotation store that can house these assertions, no matter where they are generated. The assertions can then be searched/mined by researchers to address these research use cases. Thus, they do not rely on the information being stored in a CMS, nor do they rely on a given aggregator to store this information. However, a collection manager can also access the annotation store and query for information that has been added about their specimens. They can choose to download this information in a standardized form and “push” it into their CMS either manually or potentially at least semi-automatically. This is what we envisioned for the FilteredPush network. Of course, the reality here is that the annotation store would get very big, very quickly and we would need the cyberinfrastructure to support this :wink:
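
A minimal sketch of the collection manager's side of this, assuming the store can be read as a list of dictionaries: filter for annotations about one institution's specimens, then export them in a flat, standardized form ready to be reviewed or loaded into a CMS. This is not FilteredPush's actual interface; the function names, the `institution_code` key, and the default column names are assumptions made for illustration.

```python
import csv
import io


def annotations_for_institution(store, institution_code):
    """Return every annotation whose target specimen belongs to the institution."""
    return [a for a in store if a["institution_code"] == institution_code]


def export_csv(annotations,
               columns=("catalog_number", "field", "proposed_value", "creator")):
    """Serialize annotations to CSV text, keeping only the agreed columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns),
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(annotations)
    return buffer.getvalue()
```

Because the export ignores any keys outside the agreed columns, researchers can attach richer assertions to the store without breaking the simpler download that a collection manager pulls into a CMS.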

Is this an area where we could seek to apply emerging but off-the-shelf tech, rather than having to invest, build and maintain it ourselves? Thinking specifically of, though other options may exist.


I see another question arising - can annotations placed in such a global store be edited and if so, by whom??

I wonder if we have practical examples of annotation systems and storage solutions for them. In particular, it would be nice to see how and how much those annotations are actually used.


I would think annotations should not be edited by anyone other than the contributor. An additional annotation correcting or commenting on a previous annotation is the way it is done in the collection.


@giln Exactly! Annotations would not be editable. Anyone can make a new annotation and assert evidence that a previous annotation is incorrect or more knowledge is available, etc. However, in FilteredPush we did have the concept of annotation “conversations” where information is iteratively improved through additional evidence being provided by other agents/machines (mostly related to data quality improvement).
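
The "never edit, only add" idea, together with annotation conversations, can be sketched as an append-only store in which each annotation is immutable and a correction is simply a new annotation that points at an earlier one. This is an illustrative sketch and not the FilteredPush data model; `Annotation`, `AppendOnlyStore`, and the `in_reply_to` field are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: fields cannot be changed after creation
class Annotation:
    annotation_id: str
    target_record: str
    creator: str
    assertion: str
    in_reply_to: Optional[str] = None  # id of the annotation being answered


class AppendOnlyStore:
    def __init__(self) -> None:
        self._annotations: List[Annotation] = []

    def add(self, annotation: Annotation) -> None:
        """Append a new annotation; existing ones are never overwritten."""
        known = {a.annotation_id for a in self._annotations}
        if annotation.annotation_id in known:
            raise ValueError("annotation ids cannot be reused")
        if annotation.in_reply_to is not None and annotation.in_reply_to not in known:
            raise ValueError("a reply must point at an existing annotation")
        self._annotations.append(annotation)

    def conversation(self, root_id: str) -> List[Annotation]:
        """Return the root annotation and all direct or indirect replies, in order added."""
        thread_ids = {root_id}
        thread = []
        for a in self._annotations:
            if a.annotation_id == root_id or a.in_reply_to in thread_ids:
                thread_ids.add(a.annotation_id)
                thread.append(a)
        return thread
```

Since replies can only reference annotations that already exist, walking the list in insertion order is enough to rebuild a whole conversation, and the original assertion stays visible alongside whatever later evidence disputes it.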


Yes, I certainly think some proofs of concept or pilots would be valuable. There are several annotation systems out there with quite different approaches. We can dive more deeply into that conversation next week.


@jmacklin That’s my hope as well. I know of one instance where data submitted to an aggregator was georeferenced via the permissions of the aggregator without the knowledge of the data owners - hopefully an aberration.

@Rich87 I was just considering what the equivalent in the physical sense would be to editing an annotation… In the extreme, this might mean making the piece of paper the annotation was on “disappear.” Sadly, I am sure this happens and this is where an image is invaluable (at least in the botanical case). But, I do recall from my CM days seeing an annotation crossed out and someone’s handwriting saying “NO!!!” beside it :grimacing:


@jmacklin @JoeMiller : And that (conveniently…) gets us back to imaging. Should one reimage a specimen each time it is annotated? The easy answer would be “yes”, but…

We’ve had that discussion here at MICH and, beyond taking the image, there is the matter of curating the image and making sure the “current” version is available wherever the images can be viewed. In a collection where getting the initial image taken is/was an accomplishment, re-imaging might be seen as not even possible.