Summaries - 3. Annotating specimens and other data

This is the compilation of daily summaries written by the topic facilitators. The goal of this page is to orient new readers of the topic. Please go to the thread to comment.

Summary Number 2 March 1, 2021

There has been a good and lively continuation of the discussion on annotations.

Several research use cases were suggested involving measurements of traits and states such as phenology (California Phenology Network, Morphobank, TRY). The idea is that these could provide a scientific rationale for seeking funding to develop an annotation system. This led to a discussion about how these kinds of data would be captured and stored as annotations. Several participants pointed out that our community has no standardized approach to integrating these data types into our systems, and that there are opportunities to work with other relevant communities to develop one. This lends itself well to the ES/DS concept, because these important research data must remain linked to the source specimen no matter where or how they reside.

There was also a discussion about possibilities for implementing an annotation system. Most agreed that we should start simple, with dataset-level or record-level text blobs/strings that contain new knowledge or corrections that enhance the data. Data curators would then at least be able to get information about their specimens, although ingesting it would still require manual effort. There was interest in how many annotations had been generated in systems where this function already exists. In general the numbers were low, but it was acknowledged that they could easily grow if machines/services were involved. Two further points were raised: how far the information provided in an annotation can be trusted, and whether annotation of data records should be enforced in the CMS of the owning institution. Both need more discussion.
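To make the "start simple" proposal concrete, here is a minimal sketch of a record-level or dataset-level text annotation. It is an illustration only, not a design agreed in the thread: the class name, the field names, the identifiers, and the in-memory list used as a store are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    """A free-text note attached to a whole dataset or to one record."""
    target_id: str     # hypothetical identifier of the dataset or specimen record
    target_level: str  # "dataset" or "record"
    body: str          # the text blob: new knowledge or a proposed correction
    author: str        # a person or a machine/service name (hypothetical values below)
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self):
        # Reject annotations that cannot be routed or that carry no text.
        if self.target_level not in ("dataset", "record"):
            raise ValueError("target_level must be 'dataset' or 'record'")
        if not self.body.strip():
            raise ValueError("annotation body must not be empty")


def annotations_for(store, target_id):
    """Return every annotation in `store` that points at `target_id`."""
    return [a for a in store if a.target_id == target_id]


# A plain list stands in for whatever local, regional or global store is used.
store = [
    Annotation("specimen-001", "record", "Collector name is misspelled.", "j.doe"),
    Annotation("dataset-A", "dataset", "Coordinates use an unstated datum.", "qc-service"),
    Annotation("specimen-001", "record", "Flowering at time of collection.", "phenology-bot"),
]

for note in annotations_for(store, "specimen-001"):
    print(note.author, "->", note.body)
```

A curator could query such a store by specimen identifier and read the notes, but, as the summary says, bringing them into the collection database would still be a manual step.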

Summary Number 1 Feb 17, 2021

Thanks for the initial wave of comments.

Use case example:
Image annotation, including annotation of text documents stored as media, was seen as critical. Images can hold many types of data, including text. They may require different methods to integrate than other records do, but the information they provide is valuable.

The best case is that images are integrated with the metadata of the specimen(s) they refer to. If this can’t easily be done in volume, should image datasets of specimens be published separately and linked retroactively via text matching?
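As a rough illustration of what retroactive linking via text matching could look like, the sketch below searches the text associated with each image (file name, caption, or OCR output) for known catalog numbers. The function names, the catalog numbers, and the normalization rule are assumptions made for this example. Plain substring matching can produce false positives, so the results should be read as candidate links for review rather than confirmed ones.

```python
import re


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def link_images(images, catalog_numbers):
    """Map each image name to the catalog numbers found in its text.

    images: dict of image name -> free text (file name, caption, OCR output)
    catalog_numbers: iterable of catalog numbers known to the collection
    """
    # Skip catalog numbers that normalize to an empty string; they would
    # otherwise match every image.
    keys = {normalize(c): c for c in catalog_numbers if normalize(c)}
    links = {}
    for name, text in images.items():
        haystack = normalize(text)
        links[name] = sorted(c for k, c in keys.items() if k in haystack)
    return links


# Hypothetical data for illustration only.
catalog_numbers = ["ABC000123", "ABC000456"]
images = {
    "img1.jpg": "Herbarium sheet ABC-000123, scanned 2020",
    "img2.jpg": "no label visible",
    "img3.jpg": "sheets abc 000123 and ABC 000456",
}

links = link_images(images, catalog_numbers)
for name, found in links.items():
    print(name, found)
```

Normalizing both sides before comparing lets "ABC-000123", "abc 000123" and "ABC000123" match each other, which is the kind of formatting drift that separately published image datasets are likely to show.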

There was discussion in the thread on where annotations are best applied. Some think that annotations (especially machine-derived ones) will overwhelm the ability or willingness of local databases to update local records. Many would prefer that curators and collections managers be empowered to pick and choose which annotations to take back to the collections. At this point in our global process there is no consensus that every annotation should be applied to the local database. There appears to be a role for regional or global annotation data stores.

Does it matter where the definitive version of the data is retained, as long as access is available? Perhaps thinking in terms of an authoritative/definitive version is not the right way to look at it. A goal is to allow everyone access to the data and annotations and let them decide what to use. Is that reasonable?

During the first week of the consultation we wanted to focus on use cases.

What is the use case that gives the highest value to well-annotated, up-to-date specimen data?
What are the most important use cases at the collection level, and are they critical for the collection?
What are the most important use cases for data users?

What questions could we address if we had specimens annotated by machines that we can’t address now?
What is the use case that would convince a funding source?
What is the use case that would be most valuable to policy makers (climate change, biodiversity loss)?