@sharif.islam, recording individual annotations (and corrections) in a structured way is of course a good idea, but while you are thinking about it, please consider:
- suggested corrections that apply to many records at once (thousands or more)
- suggested corrections that involve multiple fields at the same time
- suggested corrections that involve multiple records at the same time (“These records contradict each other”)
- suggested corrections that identify strict duplicate records
- suggestions without explicit corrections (“This can’t be right, please check”; “There is a missing-but-expected value here”; “One or more of these values is incorrect, please check”; see here)
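To make the list above concrete, here is a minimal sketch of what an annotation record would need to look like to cover these cases. All names here are invented for illustration; this is not a DiSSCo or TDWG schema. The key points are that the target is a *set* of records and fields, not a single value, and that the suggested correction is optional (a flag or query, not always a fix):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical annotation shape (invented names, not a real DiSSCo schema).
@dataclass
class Annotation:
    annotation_id: str
    target_record_ids: list[str]           # one, several, or thousands of records
    target_fields: list[str]               # one or more fields per record
    comment: str                           # e.g. "These records contradict each other"
    suggested_value: Optional[str] = None  # None = a query/flag, not a correction

# A suggestion without an explicit correction, spanning two fields:
note = Annotation(
    annotation_id="ann-0001",
    target_record_ids=["rec-17", "rec-18"],
    target_fields=["eventDate", "year"],
    comment="One or more of these values is incorrect, please check",
)
```

An atomic one-value-one-field assertion is just the special case where both target lists have length one and `suggested_value` is set.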
Atomic annotations on individual values in individual fields suit TDWG-style “assertions”, but not the real-world checking that I and other data auditors do, and that DiSSCo and its participants will presumably also do when they begin their own checking.
An alternative type of record-keeping is to have an original dataset version, a corrected or queried dataset version, and a diff that applies to the whole dataset. Each diff could be given a PID, and new versions could be generated either by the data publisher (a DiSSCo participant or DiSSCo staff) or by data checkers.
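A whole-dataset diff of this kind could be sketched as below. This is an assumption-laden illustration (the PID string and function names are invented, and it only tracks changed values in records present in the original, not added or deleted records), but it shows that one identified object can carry corrections to any number of records and fields at once:

```python
# Hypothetical whole-dataset diff (invented names, not a DiSSCo API).
def dataset_diff(original: dict, corrected: dict) -> dict:
    """Return {record_id: {field: (old_value, new_value)}} for every changed value."""
    changes = {}
    for rec_id, old_rec in original.items():
        new_rec = corrected.get(rec_id, {})
        changed = {
            f: (v, new_rec.get(f))
            for f, v in old_rec.items()
            if new_rec.get(f) != v
        }
        if changed:
            changes[rec_id] = changed
    return changes

v1 = {"rec-1": {"country": "Brasil"}, "rec-2": {"year": "1899"}}
v2 = {"rec-1": {"country": "Brazil"}, "rec-2": {"year": "1899"}}

# The diff object as a whole gets the PID, not each individual change:
diff = {"pid": "diff-pid-0001", "changes": dataset_diff(v1, v2)}
```

Applying the diff to `v1` reproduces `v2`, so storing original + diff is enough to regenerate the corrected version on demand.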
Yes, this is information-dense and requires substantial storage, but it is less complicated than tracking individual annotations/corrections.