Here is a real-life example: M. Andrew Johnston Research Collection.
A strong YES to including biobanks in particular (meaning tissue, DNA and environmental samples); otherwise this catalogue is useless for molecular research, which today is one of the most important approaches in the life sciences. GGBN started this task many years ago and is urgently waiting for a central solution. These collections are usually overlooked: only the sequence and maybe the voucher specimen are treated as mattering, which is a huge problem for good scientific practice in general and for the implementation of the Nagoya Protocol in particular. Traceability is crucial these days.
The living collections should be included, but a synchronised approach like the one planned for IH may be the way to go if they already have their own registries (I think at least the zoos have something, but I don’t know for sure). Since GGBN already has members from all the mentioned kinds of living collections, plus veterinary, crop and livestock collections, we are happy to help work on this topic if needed.
- Collections must have an institutional basis. Scientific publications should not include material from personal collections that are inaccessible to the rest of the community.
- Data sets can involve distinct collections.
- Collected material can end up in several collections from distinct institutions.
We should include geological and paleontological collections, anthropological collections, ethnobotanical collections, xylotheques, tissue banks and DNA repositories.
Responding to these questions in the prompt: “What is the definition for our purposes (minimal and sufficient criteria) of a natural history collection? How do collections relate to and differ from institutions? How do collections relate to and differ from datasets? How do collections relate to and differ from collecting events (e.g. expeditions)?” I’m not convinced that the issue about what should be included in a general catalogue of collections is best settled by a definition of what a collection is. More likely, the question should be which collections are appropriate to include? Perhaps requiring validation of clear answers about legal ownership and public access rights may be more definitive. For example, would the catalogue include a personal collection whose owner illegally removed specimens from another country? Or what about a collection with no public access? Legal issues may provide a common ground here for determining minimal requirements that sidesteps the vague conceptual issue of what is a “collection.”
Let me attempt to answer with an example (apologies if I digress with content better related to other topics):
INBio (Costa Rica) had an Institutional Collection but the Research Department (Inventario) would probably clarify that it was actually several collections (Plantae, Insecta, Fungi, Mollusca, Nematoda, Arachnida, Myriapoda, Onychophora). Now Collection Managers (Curators) would argue there were different Collections (“Dry” Mounted Duplicates, Seeds, “Xyla”, Wet, Vouchers Collections and so on) and each had its own protocols and management methods.
For example, a specialist could visit and take specimens of her group from the wet collection of insects (the Malaise traps’ soups), then mount and identify them for the dry collection. Sometimes, duplicates from a single collecting event would be sent to other institutions. Other times, live specimens would be grown in-house and the by-products of the process would be catalogued as natural history vouchers associated with the specimen: the plant it fed on would go to one collection, and the final dead specimen would go to the dry-mounted collection, next to all of its by-products.
Counting the number of specimens depended on the definition of a specimen that each Collection Manager adopted (the mollusks in one rock, the shells in a vial, the sheets from the same plant, the nematodes mounted on one plate). Sometimes they would count each of them separately, sometimes they would count all of them together as one. It didn’t matter what each collection considered a specimen; numbers were provided for the annual account and the reports to donors. In some cases, while digitization was still in progress, an estimate was made using the same sampling method every year and the totals kept growing until, in the year the backlog digitization finally caught up, the total (no longer estimated) number of specimens dropped considerably for some of the collections. Much later, INBio gave its collections away to another institution, which eventually incorporated them into its own collections.
- There is a hierarchy in Collections that has to be reflected in the data handling.
- Collections will definitely overlap (thinking Dimensions in TDWG CD Standard here).
- A Collection could be housed in different Buildings/Departments, but we are tending towards an Institution-based model. So CollectionA[OneHalf@InstitutionX] is a different Collection from CollectionA[OtherHalf@InstitutionY], and a Collector’s life-long collecting events (one possible dimension?) will likewise be divided by (Hosting) Institution.
- Depending on the Managers’ definition of their Collections, a collecting event could produce specimens in different Collections.
- Total numbers estimated and reported by Collection Managers should use a calculation method that stays consistent across reports. The estimation method might therefore be worth stating in the metadata (and maybe a categorization/vocabulary should be defined for it?)
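To make the last two points more concrete, here is a minimal sketch of a collection record that carries its parent collection and states how its specimen total was obtained. Everything in it is an assumption for illustration: the field names, the `CountMethod` vocabulary, the collection codes and the figures are hypothetical and are not taken from the TDWG CD standard or from INBio's actual reports.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CountMethod(Enum):
    """Hypothetical vocabulary for how a specimen total was obtained."""
    EXACT_FROM_DATABASE = "exact_from_database"
    ESTIMATE_BY_SAMPLING = "estimate_by_sampling"
    UNKNOWN = "unknown"


@dataclass
class CollectionRecord:
    code: str                           # hypothetical collection code
    institution: str                    # hosting institution
    parent_code: Optional[str] = None   # reflects the hierarchy of collections
    specimen_count: Optional[int] = None
    count_method: CountMethod = CountMethod.UNKNOWN
    counting_unit: str = "unspecified"  # e.g. "individual", "lot", "sheet"


def comparable(a: CollectionRecord, b: CollectionRecord) -> bool:
    """Totals are only comparable when method and counting unit are known and equal."""
    return (
        a.count_method is not CountMethod.UNKNOWN
        and a.count_method == b.count_method
        and a.counting_unit == b.counting_unit
    )


# Made-up figures: an estimated total and a later exact total for the same collection.
estimated = CollectionRecord("INS-DRY", "InstitutionX", "INS", 500_000,
                             CountMethod.ESTIMATE_BY_SAMPLING, "individual")
exact = CollectionRecord("INS-DRY", "InstitutionX", "INS", 350_000,
                         CountMethod.EXACT_FROM_DATABASE, "individual")

print(comparable(estimated, exact))  # False: the methods differ
```

With the method recorded, a drop in the total between two reports can be recognised as a change of method rather than a loss of specimens.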
Thanks @WUlate - this example is great. It certainly seems to reinforce the importance that others have already highlighted of letting institutions determine what breakdown into “collections” makes most sense to them, their staff and their users.
Your point about the same institutional holdings being treated both as a set of taxonomically organised collections and as a set of collections categorised by preservation methods is interesting. I could imagine an extreme case where a collection chooses to reorganise completely from the first model to the second (more likely than the other way around) and wishes/needs to present all of its holdings to the wider world as a new set of collections with complex historical relationships to the older ones (many partial transfers of old collections to new ones).
Ideally, in such a case, metadata would be clear enough that we would be able to determine both the original and the current collection for any specimen that preceded the reorganisation. In practice, I expect that we would need simply to handle the old collection codes as ambiguous synonyms for multiple new collection codes and rely on human effort or business logic to map older references (e.g. in an older taxonomic treatment) to the correct present-day one.
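A minimal sketch of what "ambiguous synonyms" could look like in practice, assuming a simple lookup table from old codes to sets of current codes. The codes and the `allowed` hint (standing in for whatever business logic or human judgement narrows the choice) are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical codes: taxonomically organised collections reorganised by preservation type.
SYNONYMS: Dict[str, Set[str]] = {
    "OLD-INSECTA": {"NEW-DRY", "NEW-WET"},
    "OLD-PLANTAE": {"NEW-DRY", "NEW-SEEDS", "NEW-XYLA"},
}


def candidates(code: str, synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return the current codes an old code may refer to; unknown codes map to themselves."""
    return sorted(synonyms.get(code, {code}))


def resolve(code: str, synonyms: Dict[str, Set[str]],
            allowed: Optional[Set[str]] = None) -> Optional[str]:
    """Resolve to a single current code, or None if the reference stays ambiguous.

    `allowed` is an optional set of current codes compatible with extra context,
    e.g. a preparation type mentioned in an older taxonomic treatment.
    """
    options = candidates(code, synonyms)
    if allowed is not None:
        options = [c for c in options if c in allowed]
    return options[0] if len(options) == 1 else None


print(candidates("OLD-INSECTA", SYNONYMS))            # ['NEW-DRY', 'NEW-WET']
print(resolve("OLD-INSECTA", SYNONYMS))               # None: needs human review
print(resolve("OLD-INSECTA", SYNONYMS, {"NEW-WET"}))  # NEW-WET
```

Returning `None` rather than guessing keeps the ambiguity visible, so unresolved references can be queued for manual mapping.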
I agree that the example by @WUlate nicely illustrates the complexities that may arise for the relations between collections in virtue of the histories of their constituent specimens.
Handling old collection codes as “ambiguous synonyms for multiple new collection codes” might fall short of achieving a meaningful representation of the actual relations among the collections concerned.
I would recommend drafting a carefully chosen and defined set of relations between collections that corresponds to user expectations in the context of the catalogue and that represents collections’ relations along the lines of splits, merges, and in general transformations in terms of their constituent specimens (and I’d gladly try to contribute to such an approach).
This could be done so that it covers both relations among collections existing in a temporal sequence (e.g., a historical collection being absorbed wholly or partially into a contemporary collection) and relations among collections existing concurrently (e.g., some specimens transferred from one to another in workflows such as the one described by @WUlate).
Last but not least, such an approach could also support cases where collections, even concurrently, are conceived in terms of different partitions of the attribute space (“dimensions”) of an institution’s (or several institutions’) holdings, e.g. one partition based on taxonomy vs. one based on preparation type.
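As a rough sketch of what such a set of relations might look like as data, the following uses a small hypothetical vocabulary of relation types and follows only the temporal ones to find where the holdings of a historical collection may sit today. The relation names, collection codes and the `successors` helper are assumptions for illustration, not a proposed standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class RelationType(Enum):
    """Hypothetical relation vocabulary."""
    SPLIT_INTO = "split_into"        # temporal: source divided among targets
    ABSORBED_INTO = "absorbed_into"  # temporal: source merged into target
    TRANSFERS_TO = "transfers_to"    # concurrent: specimens move in a workflow
    OVERLAPS_WITH = "overlaps_with"  # concurrent: different partition, same holdings


TEMPORAL = {RelationType.SPLIT_INTO, RelationType.ABSORBED_INTO}


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: RelationType
    partial: bool = False  # True if only some specimens are concerned


def successors(code: str, relations: List[Relation]) -> Set[str]:
    """Follow temporal relations transitively to the collections with no further successor."""
    found: Set[str] = set()
    seen = {code}
    stack = [code]
    while stack:
        current = stack.pop()
        nexts = [r.target for r in relations
                 if r.source == current and r.kind in TEMPORAL]
        if not nexts:
            found.add(current)
        for target in nexts:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return found


relations = [
    Relation("HIST-A", "TAXON-INSECTA", RelationType.SPLIT_INTO, partial=True),
    Relation("HIST-A", "TAXON-PLANTAE", RelationType.SPLIT_INTO, partial=True),
    Relation("TAXON-INSECTA", "HOST-ENTOMOLOGY", RelationType.ABSORBED_INTO),
    Relation("WET", "DRY", RelationType.TRANSFERS_TO, partial=True),
    Relation("TAXON-PLANTAE", "DRY", RelationType.OVERLAPS_WITH),
]

print(sorted(successors("HIST-A", relations)))  # ['HOST-ENTOMOLOGY', 'TAXON-PLANTAE']
```

Keeping concurrent relations (`TRANSFERS_TO`, `OVERLAPS_WITH`) in the same structure but out of the succession walk is one possible way to separate "what became of this collection" from "which collections share or exchange specimens".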
Thanks @cboelling - I fully agree that we need the kind of approaches you describe and as far as possible to represent the actual relations among the collections. My “ambiguous synonymy” was because we know in advance that there will be cases where this is not possible and we need to be ready to handle the lowest common denominator as well as what we hope to be best practice.
I think a first priority for the catalog should be, given some of the potential usages mentioned in earlier discussions, to get a more complete and up-to-date overview of each institute's collection holdings (the sum of the collections in an institute). Once that is established, AI tooling could be used to discover collection relationships, where I think the most important use case is relating mentions of collections in the literature to current collections.