Thanks @Lyubomir. I am very interested in whether OpenRefine could give us the framework we need to handle situations such as mapping (InstitutionCode+CollectionCode+CatalogNumber+other contextual information) to locate the CollectionID and the current, accurate version of the collection record, and then to determine whether the Specimen is also digitised. In the ELODINS proposal we all submitted a few years ago to seek funds to increase interlinkages between European biodiversity data sources, I called this "linking the linkable", and I still see it that way. In very many cases, a combination like (InstitutionCode+CollectionCode+CatalogNumber) is fully adequate for a trained human reader to know which collection and which specimen is referenced. We can do this even when there are some typos or glitches in the code strings, because we have a strong probabilistic understanding of the context.
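To make the "linking the linkable" idea a little more concrete, here is a minimal sketch of typo-tolerant matching of an (InstitutionCode, CollectionCode) pair against a registry of collections. It is not OpenRefine's reconciliation service; it swaps in Python's standard `difflib.SequenceMatcher` as a simple string-similarity measure. The registry contents, the collection IDs, the `match_collection` helper and the 0.8 threshold are all hypothetical, chosen only for illustration, and a real matcher would also weigh CatalogNumber patterns and other contextual information.

```python
from difflib import SequenceMatcher

# Hypothetical registry: (InstitutionCode, CollectionCode) -> CollectionID.
# Codes and IDs are invented for illustration only.
REGISTRY = {
    ("NHMUK", "BOT"): "coll-001",
    ("RBGE", "E"): "coll-002",
    ("MNHN", "P"): "coll-003",
}

def normalise(code: str) -> str:
    """Upper-case and drop whitespace/punctuation so trivial variants compare equal."""
    return "".join(ch for ch in code.upper() if ch.isalnum())

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised code strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def match_collection(institution: str, collection: str, threshold: float = 0.8):
    """Return (collection_id, score) for the best-scoring registry entry,
    or None if no entry reaches the (hypothetical) threshold."""
    best = None
    for (inst, coll), coll_id in REGISTRY.items():
        # Plain average of the two field similarities; a real scorer
        # would weight fields and use more context.
        score = (similarity(institution, inst) + similarity(collection, coll)) / 2
        if best is None or score > best[1]:
            best = (coll_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

For example, `match_collection("NHMUKK", "BOT")` still resolves to `coll-001` despite the doubled letter, while a string with no plausible counterpart returns `None` rather than a forced guess.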
There was some initial discussion of OpenRefine options in the Topic 3.5 thread.