Annotations are a way to convey information about a resource or associations between resources (see the PLoS paper in the references below). Common uses of annotations are to bring the scientific names of specimens up to date with current classification and nomenclatural concepts, to dispute or correct the identification of a specimen, and to comment on or correct locality, georeference, or other specimen information. Scientists and curators want to annotate specimens with the latest opinions and determinations, and they want to see past annotations and their status. Collection managers want to be able to review such annotations and optionally accept them back into their systems as updates to the information already there.
Classic annotation, including taxonomic identifications, phenotypic observations, type status, etc., involves associating slips of paper with the physical specimen and is the key documentation path (provenance) by which knowledge moves through generations of researchers. However, with the advent of high-resolution images, this practice has quickly become less practical and does not scale. The problem is that the invaluable information in these annotations is often neither digitized nor discoverable by researchers without visiting a collection or taking a loan from it.
Annotation relies on round-tripping data, which we currently cannot do at scale. We can easily publish data to aggregators, making it publicly accessible to the larger community. When users work with the data, they often have updates or suggestions, but there is no simple way to send this information back to the collections, especially at scale. Currently, many suggested annotations are sent to aggregators (iDigBio, ALA, GBIF), but the aggregators hold only a cached copy of the provider's data and so cannot make changes, nor do they have a mechanism to easily share the annotations with the provider. This is a particular problem in botany, where duplicates of the same collection may be deposited in many institutions: how does one send the same information to many institutions?
For the purposes of this consultation, we will address annotation in two phases. First, we would like to develop strong use cases for a digital annotation system. Second, we would like to hear about implementation mechanisms, ranging from centralized to distributed.
This thread differs from the Extending, enriching and integrating data category because it is not about linking new data types; rather, it focuses on enhancing existing data by correcting errors, refining or enhancing what is there, and adding data that was not previously recorded or known. The addition and integration of data types such as traits, DNA sequences and phylogenies will therefore be considered in the Extending, enriching and integrating data category, while this category focuses on the annotation of information after links are made and data is available.
Annotation is the act of filling in information that does not yet exist, or of correcting or enhancing existing information.
Extension is making new fields available in response to needs and wants as they are communicated: places that can eventually themselves be annotated.
- Practical use cases for annotations: Tschöpe, O., Macklin, J.A., Morris, R.A., Suhrbier, L. and Berendsohn, W.G. 2013. Annotating biodiversity data via the Internet. Taxon 62(6): 1248-1258. https://doi.org/10.12705/626.4
- Deep dive into the details of models: Morris, R.A., Dou, L., Hanken, J., Kelly, M., Lowery, D.B., Ludäscher, B., Macklin, J.A., Morris, P.J. 2013. Semantic Annotation of Mutable Data. PLoS ONE 8(11): e76093. https://doi.org/10.1371/journal.pone.0076093
- Web Annotation Data Model (W3C Recommendation)
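To make the Web Annotation Data Model concrete, the following is a minimal sketch (in Python, emitting JSON-LD) of how a suggested re-determination of a specimen record might be expressed. The structure (`@context`, `type`, `body`, `target`) follows the W3C model; all identifiers, names, and URLs are hypothetical placeholders, and field-level targeting via a selector is only one of several possible design choices.

```python
import json

# A minimal W3C Web Annotation (JSON-LD) suggesting a correction to a
# specimen record. All IRIs and names below are hypothetical examples.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/annotations/1",   # hypothetical annotation IRI
    "type": "Annotation",
    "motivation": "editing",                     # the annotator proposes a change
    "creator": {
        "type": "Person",
        "name": "A. Botanist",                   # hypothetical annotator
    },
    "created": "2020-01-15T10:00:00Z",
    # The body carries the proposed new value...
    "body": {
        "type": "TextualBody",
        "value": "Updated determination: Quercus alba L.",
        "format": "text/plain",
    },
    # ...and the target points at the record (or the part of it) being annotated.
    "target": {
        "source": "https://example.org/specimens/ABC-12345",
        "selector": {
            "type": "FragmentSelector",
            "value": "scientificName",           # field-level targeting, one option
        },
    },
}

print(json.dumps(annotation, indent=2))
```

Because the annotation is a standalone, addressable document with its own provenance (creator, date, motivation), it can be stored centrally, regionally, or locally and still be shared with every institution holding a duplicate of the same collection.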
Questions to promote discussion
Use cases (first phase)
- What is the value of a digital annotation?
- “Round-tripping” challenges: Is it necessary, worthwhile, possible to “round-trip” data back to the owner/provider?
- Social challenges: Exposing annotation histories (dirty laundry; privacy concerns); annotation “wars” (disagreement over a subject).
- Text vs objects: Should we annotate images and other media as well?
Implementation (second phase)
- Is a global annotation store necessary, or are there other models such as regional or local stores?
- Technical challenges of implementation (e.g. distributed vs. centralized); the challenge of "pushing" data back to the provider in a form that they can easily assess and digest (collection management systems are not built to do this).
- What standards and provenance are required to implement an annotation network?
- Scaling issues: what is most important to track (determinations, georeferences, ...)?