9. Workforce capacity development and inclusivity

As a humanities scholar, I think this point is highly relevant. Awareness is critical to ensuring that people understand the value and importance of collections and their management and care, including digital and specimen management and data models that include linking, etc. When I was Programme Director at the Centre for Collections Based Research at the University of Reading, I had interdisciplinary PhD students from across the humanities working in archives and collections: I always took them on a peripatetic tour of the University’s (impressive) collections. That meant going to the Herbarium and also to the Cole Zoological Collections. This really opened their eyes to the differences in collection management between, say, literary archives and natural history collections. In some instances, this actually changed the design of their research questions and involved them more closely in natural history. All of them will remember those encounters forever. University collections have a huge role to play.

There are also senior researchers in environmental history, history of natural history, global history, and history of knowledge production who are not only aware, but spreading awareness of nathist collections and their value through their research (see below). Not only that, but many of these colleagues have the research skills and analytical ability to contribute to the kinds of data (transcriptions, collation of field notes with correspondence, etc) that are needed for the realisation of the dream of the Digital Extended Specimen. It will be critically important to include such highly skilled colleagues in the thinking behind the data models for the DES.

Further, many of these humanities colleagues who are already engaged with nathist collections are also quite advanced in processes of developing inclusion and diversity in their work – be that teaching, research, or decolonial practices. It would be great to share this kind of know-how across disciplines.

Some of us have started an international and interdisciplinary research group (albeit informal), along with a research blog. It is called ‘collection<>ecologies’ and here is the link: We are the Collection Ecologists – Collection Ecologies

But that is the tip of an iceberg – it would be worth planning an effective, well-structured consultation for GBIF with the humanities and social sciences communities. Happy to be part of that!


Great point! Lots of communities need different types of data, and specimen data are just one type of information key to building a biodiversity knowledge base and community that serves many. Thanks for sharing your insights, much appreciated.

Aha, so you’re saying we also need to build some new networks – cross-discipline, and across political boundaries too, I’m thinking? This is one of the questions we have in 11. Partnerships to collaborate more effectively. Please see the questions there too. For example, under “Our shared experiences”: How does your institutional affiliation affect your ability to collaborate? Is collaboration supported outside your niche space? How do you learn about potential augmented data for things in your collection? And what is your network and model, and do they cross discipline, geographic, and language boundaries? If so, what makes it work?

Fascinating. Thanks Martha. You took the initiative to look outward from your position, and to help others do the same – to build a richer understanding of “connectedness” and relevance (to one’s work, degree program, and broader interests too, I suspect?). I wonder how many folks have this as part of their role in what they do?

Martha, I’m personally so excited to see this idea surface. Thank you. I’m always pushing for #STEAM (instead of #STEM) for similar reasons. I’d extend this to our Alliance for Bio community (includes GBIF). We need richer, connected networks (large and small). Do you have ideas for existing network models (collaborations like this that you know about already as examples?) If yes, could you share them here: What is your network and model and do they cross discipline, geographic, and language boundaries? If so, what makes it work?

This is great to hear. We definitely need to learn more about and connect with this community and find out more about the research skills and analytical ability you mention.


A few comments on these two and related topics.

One of the barriers to accessibility and equality is the failure to acknowledge multiple perspectives or systems of knowledge. In the Western natural science domain, a certain way of knowing is still privileged. This is not to suggest opting for the binary trope of “western” vs “eastern” or “modern” vs “traditional” – I think the challenge is to support institutions and policies that can bring different perspectives to the same level. Scholars such as Sonya Atalay and Robin Wall Kimmerer have been using the idea of “braiding knowledge” to think about this issue. From the article Indigenous Science for a World in Crisis:

Within the context of research, braided knowledge can refer to multiple forms of braiding and weaving. Work in this area might include examination of cases in which western and Indigenous knowledge complement each other, examples of how community and university knowledge can be integrated and understanding where they are incongruent or contradict one another.

In examining the political economy of knowledge production in western academic institutions, I have found that researching, teaching, and learning are changed, transformed, and improved through various forms of braiding knowledge. I argue that bringing Indigenous concepts of knowledge production into research and teaching practices is part of a larger project of decolonizing our institutions, which is beneficial both within and outside the academy, for Indigenous and non-Indigenous peoples and multiple, diverse groups and communities. Braiding knowledge in support of science is not only an ethical imperative, it is essential for our survival as we face devastating climate change.

Our commitments and values will be reflected in what data sources and links we prefer to include in our schema, and in what types of data training we provide to students and researchers. The article above mentions storywork to increase science literacy and share historical knowledge. In our data-intensive domain, this can mean bringing together narrative and unstructured data with structured datasets. We already do that to some extent – bringing literature and specimen data together. “Braiding” also moves our focus away from simpler, linear narratives towards complex, distributed systems.

Another article, entitled Stewardship of global collective behavior and recently shared by @Debbie, mentions similar ideas within the context of collective behaviour, the role of algorithms, and complexity science. The article talks about how friction, noise, and latency – often viewed as negative elements – can play important roles in promoting cooperation. I found some similarities with the notion of “braiding” in the following ideas:

While noise, latency, and information decay are often viewed as unwanted in other areas of study, in collective systems they can serve several important functions. Noise can disrupt gridlock and promote cooperation (100), facilitate coherence (101), and improve detection of weak signals through phenomena akin to stochastic resonance (102). Evidence from fish schools revealed that noise and decay are important for preventing the spread of false alarms (39). Further, rapid information flows may overwhelm cognitive processes and yield less accurate decisions (103, 104). Through multiple iterations of high-fidelity transmission, communication technology allows information in tweets and articles to propagate beyond the three or four degrees of separation inherent to noisier forms of communication (83). Facsimiles of false information (e.g., misinformation and disinformation) can now spread across vast swaths of society without the risk of decay or fact checking along the way. Adding friction to this process has become one of the more promising approaches to reducing misinformation online (105).

Given that the impacts of communication technology on patterns of behavior cross the lines that divide academic disciplines, a transdisciplinary synthesis and approach to managing our collective behavior are required. Between the complexity of our social systems, the specter of ongoing human suffering, and the urgency required to avert catastrophe, we must face these challenges in the absence of a complete model or full understanding (14, 134). In this way, the field of human collective behavior must join the ranks of other crisis disciplines such as medicine, conservation biology, and climate science (20).

As we think about a truly diverse and inclusive workforce landscape, maybe we need to embrace the elements of noise and friction more widely.


Great discussions so far! Building off of some points that have been raised, we’d like to pose an additional question: What other barriers or challenges are you aware of, beyond lack of awareness and the exclusion of multiple perspectives and systems of knowledge?


@Debbie – I am glad to read that you are interested in the idea of ‘planning an effective, well-structured consultation for GBIF with the humanities and social sciences communities’! You suggest that I post in the Network Model section of the consultation, but I am not sure that this really fits. There are existing networks like learned societies for environmental humanities, for geography, for history of science – but the particular convergence of people who have history-of-natural-history interests and skills and who also understand the importance of collections is not yet a ‘discipline’ and only has a loose network that is growing fast. I have given some examples of individual research projects here: Extending, enriching and integrating data - #53 by MAFleming

I can give a more structured overview. At institutional level – and of course, an institution is not necessarily a network – here are some developments:

2009-2016: Centre for Arts and Humanities Research at the Natural History Museum, London
I was one of the people who set up this centre, and we ran a range of humanities research projects related directly to collections, which benefited collection knowledge/catalogues. These included projects on Hans Sloane’s early modern collections and on Nathaniel Wallich (in collaboration with Kew).

2012 - ongoing. Humanities of Nature Department, Museum für Naturkunde Berlin
Set up by Johannes Vogel when he left the NHM for Berlin, using the model of the NHM Centre. Currently the most active of these institutional departments at nathist museums, working on colonial histories of palaeontology, on historical relationships between the Berlin Tiergarten and the MfN, on the logistics of natural history collecting in the 19th century, and on the philosophy of data modeling and digitisation of specimens.

2015 - ongoing. Humanities Institute of the New York Botanical Gardens
Supported by the Andrew W Mellon Foundation (the same funder as JSTOR Plants), the centre runs research projects and is based in the Gardens’ library. Topics include botany and medicine, coffee plantation practices, histories of expeditions to Cuba, etc.

2018 - ongoing. Dumbarton Oaks Plant Humanities Initiative
Funded by the Andrew W Mellon Foundation, working with scholars directly to correlate the rare books and special collections of DOAKS with JSTOR Plants, in collaboration with the team of developers at JSTOR Labs.

2020 - ongoing. Kew / Royal Holloway Humanities of Nature project
This project picks up where the Mobile Museums project left off, with the same leadership, and with the intent of articulating the value of the humanities to botanical collections and vice versa. Kew has made a specific commitment to collaborations with humanities scholars in their recent strategic development document.

So there are researchers, there are projects, there are institutional departments, there are learned societies, and there are emergent networks. It could be so valuable to look at where humanities, socsci, biology, and collections can help each other to do better research. This connective tissue is critical to many of the questions relating to the DES: equity, IP, credit, inclusion, accurate point-reference data, data models, decolonial activities, etc.

As part of ICEDIG (the development programme for DiSSCo), there was a WP concerning ‘cultural heritage’ and a survey of ‘Humanities Researcher Synergies with Natural Science Collections and Archives’. Unfortunately, not enough time was taken with this (barely two months, and over Christmas!), and the report itself states that there were only 34 surveys returned and that ‘humanities researchers were underrepresented owing to short project timelines compounded by the holidays’. Further, ‘D. Koureas began by suggesting that these preliminary discussions could use better follow-up and continuity. In similar situations the tendency has been to put them on a shelf without further intervention.’ Here is where GBIF could step in with your well-developed, non-chronological, discourse-driven consultation platform!


Here’s a new question to think about:

How would you define the role of a Digital Extended Specimen Data Curator?

Personally, I’d expect that DES data curation will be an activity performed broadly by the biodiversity research community and beyond (e.g., by amateur enthusiasts), though I’d expect that we can recognize a few core skills and competencies related to DES data curation that it’d be good for specimen curators to have. So there might not be anyone with a job title that is “Digital Extended Specimen Data Curator.”

Similarly, I would expect that the set of data “curated” by a specimen curator might stretch well beyond those data about specimens in their own collection. It might be circumscribed by the taxa for which they are experts.

Others’ thoughts?


Hi Austin, this is a great question. I think there are lots of elements of data management that are currently not taught relative to curation of collections and generation of FAIR data. While some can be acquired on the job, I am interested in developing a list of skills, from the basic ideas of developing tidy datasets and data archiving to the more complex requirements related to monitoring online data and annotating virtual data. What “new vocabulary” is needed to understand Darwin Core, unique identifiers, and digital records? Once we start thinking about data integration, what are the basic and more advanced skills needed by a curator or data provider? I also think we have a new job and discipline in which biodiversity-literate individuals need the computational and technical skills to build the datasets and interfaces to support the DES. Maybe we already have some skill lists and vocabulary that would help, and if we do, how can we get them into mainstream training? I feel like we are missing the instruction manual to bring someone from zero knowledge of collection-based data to curator level, and we need a list of skills so we can facilitate training.


Hi Anna, it is exactly this type of conversation we had when creating Data Carpentry at our 2014 (I think) meeting across existing NSF BioCenters’ IT staff. We really need to get the Carpentries folks as collaborators on this topic going forward. The peer-to-peer structure of the Carpentries also naturally supports our goals to be equitable, inclusive, and responsive in addressing skills needs in a sustainable manner.

Not a question previously identified to seed discussion, just one that is of interest to me: How critical is language translation to inclusivity, and how would that most usefully occur in the Digital Extended Specimen framework?

Hello everyone,

@Debbie has asked me to chime in on this thread to bring the perspective from The Carpentries regarding inclusivity and skills/training.

Providing education to improve awareness of the resources, and teaching the skills required, are both needed to make a space/community more accessible. However, this is only the first step, as improving equity and diversity requires ongoing support and opportunities for continued learning. One way to achieve this is by supporting the development of communities of practice where participants can share their experiences and support each other on their learning paths. For such initiatives to be successful, they must have a positive and supportive environment (an enforceable code of conduct is one component needed to create such an environment).

Creating communities of practice also helps gather feedback to ensure that the skills taught in training hold up to the test of day-to-day tasks and responsibilities.


Thanks @francois! You clearly point out here that offering workshops to fill skills gaps is not enough – that if we really want to realize sustainable change and local empowerment, we need to “create communities of practice.” In doing so, these entities can meet their local, distinct, and changing needs. Thanks for adding this to our understanding of what’s needed to address capacity development and inclusivity.

In another thread about GBIF dataset evaluation, @datafixer gives us some succinct examples of data issues seen. These clearly offer some insights into skills needed (see question 7). This discovery process is happening at the level of the data publisher. Thanks @datafixer for that post. I wonder which point or points in our research data pipelines are the most critical ones at which to address a) skills/knowledge needs, b) infrastructure (tools and methods) needs, and c) sustainable community development beyond workshops.

On skills:

In the “100 GBIF datasets, improved” post I showed that many compilers of biodiversity data “do not have the necessary knowledge or skills to produce tidy, complete, consistent and Darwin Core-compliant datasets”.

Some of the present discussion about increasing workforce capacity and inclusivity seems to be about the democratisation of the necessary knowledge and skills - getting more people competent. This sounds like a solution to the problem of low data quality, along the lines of “If you give a man a fish, you feed him for a day. If you teach a man to fish, you feed him for a lifetime”.

That’s indeed a solution for an individual biodiversity data worker, but it isn’t a solution for the global problem.

The “100 GBIF datasets, improved” post was about a demonstrably successful solution to the global problem: put “gatekeeper” data specialists between data compilers and data disseminators, to look for data problems and to advise compilers on what, exactly, needs to be fixed.
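To make the “gatekeeper” idea concrete, here is a minimal sketch (in Python) of the kind of automated first-pass checks a data specialist might run over occurrence records before advising a compiler on what needs fixing. The field names are standard Darwin Core terms, but the specific checks and thresholds are illustrative assumptions, not an actual Pensoft or GBIF workflow.

```python
import re

def check_record(record):
    """Return a list of human-readable problems found in one occurrence record.

    A sketch of "gatekeeper"-style checks; the required-field list and
    rules below are illustrative, not a real service's rule set.
    """
    problems = []

    # Required Darwin Core fields should be present and non-empty.
    for field in ("occurrenceID", "basisOfRecord", "scientificName"):
        if not record.get(field, "").strip():
            problems.append(f"missing required field: {field}")

    # Coordinates, if given, must be numeric and within valid ranges.
    lat, lon = record.get("decimalLatitude"), record.get("decimalLongitude")
    if lat not in (None, "") or lon not in (None, ""):
        try:
            lat_f, lon_f = float(lat), float(lon)
            if not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
                problems.append("coordinates out of range")
        except (TypeError, ValueError):
            problems.append("non-numeric coordinates")

    # Dates should be ISO 8601: YYYY, YYYY-MM, or YYYY-MM-DD.
    event_date = record.get("eventDate", "")
    if event_date and not re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", event_date):
        problems.append(f"eventDate not ISO 8601: {event_date!r}")

    return problems
```

The point of the sketch is that the automated pass only finds candidate problems; the specialist's real value is in the advice back to the compiler about what, exactly, to fix.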

Of course you can teach vehicle operators how to service the car or truck they drive, but wouldn’t you expect a better-serviced vehicle fleet if the servicing was done by trained vehicle mechanics?

A discussion on skills would benefit from consideration of (a) how to recruit data specialists for “gatekeeping” roles and (b) how to insert data specialists between compilers and disseminators.

A few words about (b): The aggregator operating model (I’m reluctant to call it a “business model”) doesn’t require the aggregator to serve high-quality data. The operating model assumes that end-users will do data cleaning. To assist end users, aggregators flag a small set of data problems and attach flags to individual records. Data providers can also see these flags, but are not required to do anything about them. The threshold for outright data rejection - how awful does data have to be before it doesn’t get aggregated? - is set remarkably low.
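The flag-based operating model described above can be sketched in a few lines: issues are recorded against each record, but nothing is rejected. The flag names here are illustrative, loosely modelled on the style of aggregator interpretation flags, not an actual aggregator's vocabulary.

```python
def flag_record(record):
    """Attach a list of issue flags to a record; never reject it.

    A sketch of the aggregator operating model: problems are flagged for
    end users (and visible to providers), but the record is served anyway.
    Flag names are hypothetical.
    """
    flags = []
    if not record.get("countryCode"):
        flags.append("COUNTRY_MISSING")
    lat, lon = record.get("decimalLatitude"), record.get("decimalLongitude")
    if lat in (None, "") and lon in (None, ""):
        flags.append("COORDINATES_MISSING")
    elif lat == "0" and lon == "0":
        flags.append("ZERO_COORDINATE")
    # The record is served regardless of how many flags it carries.
    return {**record, "issues": flags}
```

Note the design choice this encodes: the burden of acting on the flags falls on end users doing their own cleaning, which is exactly why the model leaves room for a separate “gatekeeping” role.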

Just as aggregators aren’t really troubled when data quality is dreadful, they aren’t really enthusiastic when data quality is excellent. In GBIF’s case, the work described in “100 GBIF datasets, improved” is evidently seen as a private arrangement between data providers and Pensoft, and GBIF has never, to my knowledge, directed a data provider to Pensoft or to any other third-party data-checking service.

To sum up the last two paragraphs, I think we can forget the aggregators. Other participants in this discussion may have a different view, but I think the most that aggregators would be willing to contribute to “gatekeeping” would be advice to data publishers that data-checking services exist.

Now back to (a). Data specialists already exist. They’re turned out every year by information and library science courses, and they also work in the corporate world as “data scientists” (which sounds glamorous until you hear that “80% of a data scientist’s time is spent cleaning data”). The training required to bring a data specialist up to speed to turn messy biodiversity data into tidy, complete, consistent and Darwin Core-compliant records needn’t be either long or taxing.

Discussion point: How to set up that specialist training?

Next discussion point: recruit data specialists for what? Apart from Pensoft, is there any organisation in the world that carefully checks biodiversity datasets from any and all sources? Where, and for what reward, would data specialists be working?

To half-answer my own question, I think there might already be in-house data specialists at some of the larger museums/herbaria. Is there a way to expand their remit, so that they not only clean the house data, but also data from other institutions, as part of their paid work?

[cough] Call for data papers describing datasets from Russia (closed)

@kcopas Read the call. It doesn’t direct authors to Pensoft for data checking, just for publication. On quality: “Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit.” The call also doesn’t point out that if a dataset’s problems aren’t fixed, then the data paper won’t be published. If you’re trying to say that in this case GBIF is using Pensoft as a data quality filter, you need more evidence.