Summaries - 5. Analyzing/mining specimen data for novel applications

This is the compilation of daily summaries written by the topic facilitators. The goal of this page is to orient new readers to the topic. Please go to the thread to comment.

Go to Analyzing/mining specimen data for novel applications

Summary Number 2; Mar 3, 2021

Several discussions have emerged in this topic thread that point to interest in research applications at a global scale, in the roles and responsibilities of collections and of users of collections data, and in links to annotation under an implemented digital extended specimen plan.

It’s clear that there is interest in taking advantage of an ES/DS system to revisit research questions and scale them up spatially, temporally and/or taxonomically (@jmheberling, @libby). The scale of biodiversity research is currently limited by, among other things, the amount of manual data wrangling required of researchers. Research can be made more efficient and effective by centralizing data, enhancing search capabilities, and allowing for the integration of numerous types of data (@mikewebster, @nickyn, @troberston, @mhoefft, @jbates606, @waddink). Likewise, in an implemented ES/DS, machine learning methods can be more easily leveraged to benefit fields including taxonomy and systematics, landscape-level ecology, and even collections management (@RogerBurkhalter). Further, extending specimens with SNP datasets could support kinship estimation and the inference of specimen pedigrees across time and space (@AYoung).
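
To make the data-wrangling point concrete, the sketch below shows how a single cross-collection query against an aggregator can replace collection-by-collection exports. It assumes the GBIF occurrence API accessed through the pygbif client, and the taxon is only a placeholder; an implemented ES/DS could expose a comparable aggregated search.

```python
# Minimal sketch: one query against aggregated specimen records instead of
# wrangling exports from individual collections. Assumes the pygbif client
# (pip install pygbif); the taxon below is a hypothetical example.
from pygbif import occurrences as occ

results = occ.search(
    scientificName="Quercus alba",        # placeholder example taxon
    basisOfRecord="PRESERVED_SPECIMEN",   # restrict to physical specimens
    hasCoordinate=True,                   # keep only georeferenced records
    limit=300,
)

for rec in results["results"]:
    print(rec.get("institutionCode"), rec.get("year"), rec.get("decimalLatitude"))
```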

As the capabilities of biodiversity data grow, the community wants to ensure that data providers, collections managers, and data users are not burdened with every additional duty that an ES/DS may entail. Just because data for ecological modeling are more readily available, for example, doesn’t mean that every researcher should have to become an expert modeler. Similarly in collections: a collections manager with experience in maintaining physical collections may not be best suited to maintaining the digital and extended representations of those collections.

We invite followers of this thread to also read the @Annotating Specimens thread for additional discussion on this intersecting topic.

Summary Number 1; Feb 17, 2021

Great conversations are already emerging in this thread.

Discussion kicked off on the topic of images, and several contributors identified uses of images and image data in research applications. Image-based research, however, is currently hindered by our limited ability to readily and reliably access images from numerous collections. Small-scale image-based research has proven effective and valuable, but the methods developed for focused studies are not scalable to the degree necessary to address global problems. At the crux of this issue is that researchers currently need to pull images themselves in order to process them, a task requiring considerable storage space and computing power.
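
As an illustration of the current pattern, the sketch below resolves media URLs from aggregated occurrence records and streams each image for processing, rather than bulk-downloading a whole collection. It assumes the GBIF occurrence API via pygbif and the requests library; the taxon and the processing step are placeholders.

```python
# Minimal sketch of the current image-access pattern: resolve media URLs from
# aggregated occurrence records, then stream each image to the analysis step.
# Assumes pygbif and requests; taxon and processing are placeholders.
import requests
from pygbif import occurrences as occ

hits = occ.search(
    scientificName="Danaus plexippus",  # placeholder example taxon
    mediaType="StillImage",             # only records with images
    limit=50,
)

for rec in hits["results"]:
    for media in rec.get("media", []):
        url = media.get("identifier")
        if not url:
            continue
        resp = requests.get(url, timeout=30)
        if resp.ok:
            image_bytes = resp.content
            # process_image(image_bytes)  # placeholder for measurement / ML pipeline
```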

Another issue that slows the use of image data is that “new” data derived from images, e.g. annotations of traits, measurements, etc., usually have no way to be associated with the image and label data for future users. One researcher may measure an insect wing, for example, and another researcher will need to redo that work if they are interested in the same measurement. Machine learning is helping to make some of these tasks more efficient, but image annotations still aren’t easily housed in current online databases, and “round-tripping” data back to providers isn’t always easy.
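
One way to picture a reusable, image-derived annotation is sketched below: a small record that ties a measured value to persistent identifiers for the specimen and the image, plus basic provenance, so a later researcher can reuse rather than repeat the measurement. All field names and identifiers here are hypothetical; a production model would more likely follow a community standard such as the W3C Web Annotation data model.

```python
# Minimal sketch of a reusable, image-derived annotation record. All field
# names and identifiers are hypothetical illustrations, not a standard.
from dataclasses import dataclass

@dataclass
class TraitAnnotation:
    specimen_id: str   # persistent identifier of the (digital) specimen
    image_uri: str     # the exact image the measurement was taken from
    trait: str         # what was measured, ideally a controlled term
    value: float
    unit: str
    method: str        # how the value was obtained (manual, ML model, etc.)
    agent: str         # who or what produced it
    created: str       # ISO 8601 timestamp

example = TraitAnnotation(
    specimen_id="https://example.org/specimen/ABC123",      # placeholder identifier
    image_uri="https://example.org/media/ABC123_dorsal.jpg",
    trait="forewing length",
    value=23.4,
    unit="mm",
    method="manual measurement in ImageJ",
    agent="https://orcid.org/0000-0000-0000-0000",          # placeholder ORCID
    created="2021-03-03T00:00:00Z",
)
```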

As with so many aspects of the extended digital specimen, our ability to realize the full potential of specimen data relies on a workforce that is given the training, resources, time, and funding to incorporate these tasks into their (likely already maxed-out) workloads. How can the biodiversity community support collections staff and researchers in this work? How can we engage with new communities of data users to further demonstrate novel applications of collections data?