2.3. Hierarchical collection structures and subcollections (INFORMATION)

I find it helpful to separate thinking about the real-world entities from thinking about the information structures used to represent them in an information system (for a given set of use cases, such as the planned catalogue). The former informs the latter.

In addition, I understand a hierarchical relationship to hold when entities are related to one another by a relation that puts them in a nested sequence of increased/decreased generality:

a cat is_a mammal; a mammal is_a vertebrate.
collection-A is_part_of collection-B; collection-B is_part_of collection-C.

In this sense I think that institutions, collections and datasets stand in various relations to each other, but they don’t form a hierarchy. I tried to sketch the basic relations between them in this post in topic 2.1. (I am actually unsure if the term “dataset” is used here in its general sense or in some special sense with regard to collections that I have not understood.)

I agree that one can and should exploit commonalities in the properties of the real-world entities when designing the information structures used to represent them. In this vein, it is certainly an option to arrange and aggregate data fields so that, on the representation level (in the data structures of the information system), there is a hierarchy of representations. However, in my experience a design that mirrors the distinctions between the real-world entities we want to capture is often easier to maintain, and easier to extend later if new functionality is desired (e.g., representing and querying more complex relations between and among institutions, collections, or other entities that we don’t consider now). I would therefore argue that on the representation level, too, a collection should always be treated separately from its holding institution, even if, in the real world, all the holdings of an institution are subsumed in a single collection.
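
To make this concrete, here is a minimal sketch in Python of what such a separation could look like on the representation level. All class and field names (Institution, Collection, held_by, parent) are hypothetical and chosen for illustration only; they are not taken from any actual catalogue design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Institution:
    """Hypothetical record for a holding institution."""
    identifier: str
    name: str


@dataclass(frozen=True)
class Collection:
    """Hypothetical record for a collection, kept separate from its institution."""
    identifier: str
    name: str
    held_by: str                   # identifier of the holding institution
    parent: Optional[str] = None   # identifier of the enclosing collection, if any


# Even an institution with a single collection gets two separate records.
museum = Institution("inst-1", "Example Museum")
holdings = Collection("coll-1", "Example Museum holdings", held_by=museum.identifier)
```

Because the collection only refers to the institution by identifier, further relations (between collections, or between collections and other institutions) could be added later without touching the institution record.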

I agree that this would be a desirable feature, and this approach keeps a clear distinction between the institution and the (possibly nested) set of collections it administers.

As you, @dhobern, note, the relations between two collections may be more complex, for instance when they aren’t nested but partially overlap with regard to the specimens each collection subsumes. Given adequate data from the relevant sources, this could be represented in the catalogue as a separate symmetric relation between collections (A overlaps_with B). The (non-symmetric) parthood discussed in this topic could then be treated as a special case of overlap (A part_of B ==> A overlaps_with B). (The relations between historical and contemporary collections, and the merge/split situations discussed in the context of identifiers, could also be modeled as sub-relations of such an overlap relation.)
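
As a rough illustration of how the two relations could interact, here is a small Python sketch. The data, the function names and the choice to treat parthood as transitive are my assumptions for the example, not a description of how the catalogue works.

```python
# Asserted facts, as they might be supplied by data providers (hypothetical data).
part_of = {("A", "B"), ("B", "C")}   # directed pairs: (part, whole)
asserted_overlap = {("C", "D")}      # undirected, stored in one direction only


def is_part_of(a, b):
    """True if a is a part of b, directly or via intermediate collections.

    Treating parthood as transitive is an assumption of this sketch.
    """
    seen, stack = set(), [a]
    while stack:
        current = stack.pop()
        for part, whole in part_of:
            if part == current and whole not in seen:
                if whole == b:
                    return True
                seen.add(whole)
                stack.append(whole)
    return False


def overlaps_with(a, b):
    """Symmetric overlap: asserted directly, or implied by parthood either way."""
    return (
        (a, b) in asserted_overlap
        or (b, a) in asserted_overlap
        or is_part_of(a, b)
        or is_part_of(b, a)
    )
```

With this data, overlaps_with("A", "B") and overlaps_with("B", "A") both hold because A is part of B, while overlaps_with("A", "D") does not: that A is part of C and C overlaps D says nothing about whether A itself shares specimens with D.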

Regarding the second question of this topic: I think that if collection B is identified as a sub-collection of collection A (in the sense that B is part of A), then it is reasonable to conclude that what applies to the whole collection also applies to the sub-collection, so there is inference from parent to child but not vice versa. However, I think this very much depends on the kinds of attributes available for a collection entry. There might be edge cases where, based on the parthood alone, no other information can legitimately be transferred from the parent to the child. So maybe, until there is a better understanding of what information can safely be copied to sub-collections on the side of the catalogue, indicating parthood on the basis of the provided information is what the catalogue should aim for.
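
One cautious way to sketch such one-directional inference in Python is to copy only attributes from an explicit list of those considered safe. The entries, the attribute names and the contents of INHERITABLE below are purely hypothetical; which attributes really are safe to inherit is exactly the open question.

```python
# Hypothetical collection entries; attribute names are illustrative only.
collections = {
    "A": {"parent": None, "attrs": {"institution": "inst-1", "specimen_count": 50000}},
    "B": {"parent": "A", "attrs": {"specimen_count": 1200}},
}

# Attributes assumed (for this example only) to be safe to pass from whole to part.
INHERITABLE = {"institution"}


def effective_attrs(coll_id):
    """Own attributes plus inheritable ones from ancestors; own values take priority."""
    entry = collections[coll_id]
    result = {}
    if entry["parent"] is not None:
        inherited = effective_attrs(entry["parent"])
        result.update({k: v for k, v in inherited.items() if k in INHERITABLE})
    result.update(entry["attrs"])
    return result
```

Here the sub-collection B picks up the institution from A but keeps its own specimen count, and nothing flows from B back up to A. An attribute like a specimen count is an example of something that applies to the whole but clearly must not be copied to a part.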