9. Workforce capacity development and inclusivity

Hello everyone,

@Debbie has asked me to chime in on this thread to bring the perspective from The Carpentries regarding inclusivity and skills/training.

Providing opportunities for education that improve awareness of available resources and teach the required skills is needed to make a space/community more accessible. However, this is only the first step: improving equity and diversity requires providing ongoing support and opportunities for continued learning. One way to achieve this is by supporting the development of communities of practice where participants can share their experiences and support each other on their learning paths. For such initiatives to be successful, they must have a positive and supportive environment (an enforceable code of conduct is one component needed to create such an environment).

Creating communities of practice also helps gather feedback, ensuring that the skills taught in training hold up once put to the test of day-to-day tasks and responsibilities.


Thanks @francois! You clearly point out here that offering workshops to fill skills gaps is not enough, and that if we really want to realize sustainable change and local empowerment, we need to “create communities of practice.” In doing so, these communities can meet their own local, distinct, and changing needs. Thanks for adding this to our understanding of what’s needed to address capacity development and inclusivity.

In another thread about GBIF dataset evaluation, @datafixer gives us some succinct examples of data issues seen. These clearly offer some insights into the skills needed (see question 7). This discovery process is happening at the level of the data publisher. Thanks @datafixer for that post. I wonder which point or points in our research data pipelines are the most critical ones at which to take on a) skills/knowledge needs, b) infrastructure (tools and methods) needs, and c) sustainable community development beyond workshops.

On skills:

In the “100 GBIF datasets, improved” post I showed that many compilers of biodiversity data “do not have the necessary knowledge or skills to produce tidy, complete, consistent and Darwin Core-compliant datasets”.
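To make “tidy, complete, consistent and Darwin Core-compliant” a little more concrete, here is a minimal sketch of the kind of structural checks involved. The file name and the particular Darwin Core terms checked are illustrative assumptions, not something prescribed in the original post:

```python
# Minimal sketch: structural checks on a Darwin Core occurrence table.
# File name and the particular terms checked are illustrative assumptions.
import csv

REQUIRED_TERMS = {"occurrenceID", "basisOfRecord", "scientificName", "eventDate"}

with open("occurrence.txt", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")
    fields = set(reader.fieldnames or [])
    missing = REQUIRED_TERMS - fields
    if missing:
        print(f"Missing Darwin Core terms: {sorted(missing)}")

    seen_ids = set()
    for row_num, row in enumerate(reader, start=2):
        # Completeness: required terms should not be blank.
        for term in REQUIRED_TERMS & fields:
            if not row.get(term, "").strip():
                print(f"Row {row_num}: blank {term}")
        # Consistency: occurrenceID must be unique within the dataset.
        oid = row.get("occurrenceID", "").strip()
        if oid:
            if oid in seen_ids:
                print(f"Row {row_num}: duplicate occurrenceID {oid}")
            seen_ids.add(oid)
```

Real datasets need many more checks (coordinates, vocabularies, character encoding), but the pattern is the same: mechanical tests first, human judgment for whatever they flag.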

Some of the present discussion about increasing workforce capacity and inclusivity seems to be about the democratisation of the necessary knowledge and skills - getting more people competent. This sounds like a solution to the problem of low data quality, along the lines of “If you give a man a fish, you feed him for a day. If you teach a man to fish, you feed him for a lifetime”.

That’s indeed a solution for an individual biodiversity data worker, but it isn’t a solution for the global problem.

The “100 GBIF datasets, improved” post was about a demonstrably successful solution to the global problem: put “gatekeeper” data specialists between data compilers and data disseminators, to look for data problems and to advise compilers on what, exactly, needs to be fixed.

Of course you can teach vehicle operators how to service the car or truck they drive, but wouldn’t you expect a better-serviced vehicle fleet if the servicing was done by trained vehicle mechanics?

A discussion on skills would benefit from consideration of (a) how to recruit data specialists for “gatekeeping” roles and (b) how to insert data specialists between compilers and disseminators.

A few words about (b): The aggregator operating model (I’m reluctant to call it a “business model”) doesn’t require the aggregator to serve high-quality data. The operating model assumes that end-users will do data cleaning. To assist end users, aggregators flag a small set of data problems and attach flags to individual records. Data providers can also see these flags, but are not required to do anything about them. The threshold for outright data rejection - how awful does data have to be before it doesn’t get aggregated? - is set remarkably low.
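For readers unfamiliar with these flags: GBIF’s public occurrence API exposes them per record, so you can see exactly what the aggregator flags (and still serves anyway). A small sketch using the search endpoint; the datasetKey shown is a placeholder, not a real dataset:

```python
# Sketch: list GBIF's quality flags on the first few records of a dataset.
# The datasetKey value below is a placeholder, not a real dataset.
import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "datasetKey": "00000000-0000-0000-0000-000000000000",  # placeholder
    "limit": 5,
})
url = f"https://api.gbif.org/v1/occurrence/search?{params}"

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

for rec in data.get("results", []):
    # Each record carries an "issues" array of flag names,
    # e.g. COORDINATE_ROUNDED or TAXON_MATCH_FUZZY.
    print(rec.get("key"), rec.get("issues", []))
```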

Just as aggregators aren’t really troubled when data quality is dreadful, they aren’t really enthusiastic when data quality is excellent. In GBIF’s case, the work described in “100 GBIF datasets, improved” is evidently seen as a private arrangement between data providers and Pensoft, and GBIF has never, to my knowledge, directed a data provider to Pensoft or to any other third-party data-checking service.

To sum up the last two paragraphs, I think we can forget the aggregators. Other participants in this discussion may have a different view, but I think the most that aggregators would be willing to contribute to “gatekeeping” would be advice to data publishers that data-checking services exist.

Now back to (a). Data specialists already exist. They’re turned out every year by information and library science courses, and they also work in the corporate world as “data scientists” (which sounds glamorous until you hear that “80% of a data scientist’s time is spent cleaning data”). The training required to bring a data specialist up to speed to turn messy biodiversity data into tidy, complete, consistent and Darwin Core-compliant records needn’t be either long or taxing.
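As one illustration of how short that learning curve can be, consider a very common fix: non-standard dates. Once a specialist knows that Darwin Core expects ISO 8601 values in eventDate, most of the work is mechanical. A hedged sketch, where the input formats handled are assumed examples of typical messiness:

```python
# Sketch: normalise assorted date strings to ISO 8601 for Darwin Core eventDate.
# The input formats handled here are assumed examples of common messiness.
from datetime import datetime

KNOWN_FORMATS = ["%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d", "%d %b %Y"]

def to_event_date(raw: str) -> str | None:
    """Return an ISO 8601 date, or None if the string can't be parsed."""
    raw = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave for a human: ambiguous or unparseable

print(to_event_date("12/03/2019"))   # 2019-03-12 (day-first assumed)
print(to_event_date("12 Mar 2019"))  # 2019-03-12
print(to_event_date("March 2019"))   # None: flag for manual review
```

The hard part is not the code but the judgment, e.g. knowing when “12/03/2019” is day-first and when it isn’t, which is exactly where a trained specialist earns their keep.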

Discussion point: How to set up that specialist training?

Next discussion point: recruit data specialists for what? Apart from Pensoft, is there any organisation in the world that carefully checks biodiversity datasets from any and all sources? Where, and for what reward, would data specialists be working?

To half-answer my own question, I think there might already be in-house data specialists at some of the larger museums/herbaria. Is there a way to expand their remit, so that they not only clean the house data, but also data from other institutions, as part of their paid work?

[cough] Call for data papers describing datasets from Russia (closed)

@kcopas Read the call. It doesn’t direct authors to Pensoft for data checking, just for publication. On quality: “Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit.” The call also doesn’t point out that if a dataset’s problems aren’t fixed, then the data paper won’t be published. If you’re trying to say that in this case GBIF is using Pensoft as a data quality filter, you need more evidence.