I agree with @sabaj that one way to go is to collect what users of collection codes actually do and then build some sort of lookup table. Here is a glimpse into the collectionCodes we extract from publications. Be aware that it is dirty data, not least because some botanists use single letters as collection codes, which makes them very difficult to mine. Still, this might be another starting point.
For each collection code, we have the treatment and the publication from which we extracted the data.
Here is the CSV output: http://tb.plazi.org/GgServer/srsStats/stats?outputFields=colls.code+colls.name&groupingFields=colls.code+colls.name&format=CSV&separator=%2C. You can get more through the Plazi stats at http://tb.plazi.org/GgServer/srsStats
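A minimal Python sketch of how that export could feed a lookup table. `build_stats_url` only reassembles the query string of the link above (output and grouping on `colls.code` and `colls.name`, CSV, comma separator). Everything else is an assumption for illustration, not part of the Plazi service: that the CSV has a header row with the code in the first column and the name in the second, that it decodes as UTF-8, the hypothetical function names, and the "single letter or several names" rule for flagging codes that need manual review.

```python
import csv
import io
import urllib.parse
import urllib.request
from collections import defaultdict

# Base endpoint taken from the stats link above.
STATS_URL = "http://tb.plazi.org/GgServer/srsStats/stats"


def build_stats_url(fields=("colls.code", "colls.name")):
    """Rebuild the CSV query URL; the same fields are used for output and grouping."""
    joined = " ".join(fields)  # urlencode turns the space into '+'
    params = {
        "outputFields": joined,
        "groupingFields": joined,
        "format": "CSV",
        "separator": ",",  # encoded as %2C
    }
    return STATS_URL + "?" + urllib.parse.urlencode(params)


def fetch_csv(url, timeout=30):
    """Download the CSV export as text (needs network access; UTF-8 is assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def build_lookup(csv_text):
    """Map each collection code to the set of names it was extracted with.

    Assumes a header row, then rows whose first two columns are code and
    name; any further columns are ignored.
    """
    lookup = defaultdict(set)
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the assumed header row
    for row in rows:
        if len(row) < 2:
            continue
        code, name = row[0].strip(), row[1].strip()
        if code:
            lookup[code].add(name)
    return dict(lookup)


def ambiguous_codes(lookup):
    """Codes that are a single letter or map to more than one name (hypothetical review rule)."""
    return sorted(c for c, names in lookup.items() if len(c) == 1 or len(names) > 1)


if __name__ == "__main__":
    table = build_lookup(fetch_csv(build_stats_url()))
    for code in ambiguous_codes(table):
        print(code, sorted(table[code]))
```

Keeping every name seen for a code, rather than the first one, is what makes the dirty cases visible: a single-letter botanical code that maps to several institutions shows up in `ambiguous_codes` instead of silently resolving to one of them.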