I read that “GBIF doesn’t deduplicate the occurrences between the datasets” in this topic.
But I would be glad if the community could clarify the answer for my particular case.
We have a large herbarium (LE, ~6 million specimens), but we haven’t yet published its specimens with metadata to GBIF as one big dataset. My colleagues have already published several small datasets containing LE specimens.
My question is: when I publish a big dataset representing all LE specimens with metadata and images, it will also contain the same specimens that are already present in these small datasets published earlier (all DwC fields, including occurrenceID, will be the same).
Is this a problem?
If it is, the “big” dataset is the main priority, because the specimens belong first to the LE collection and only second to individual projects.
Off-topic: Is it possible to include links to small thumbnail images of specimens in a GBIF dataset, but organize the metadata so that clicking a thumbnail on the GBIF occurrence page opens an external link to our collection site? We have a good large-image viewer of our own that lets users view images at several resolutions. At present we publish the data in such a way that clicking a thumbnail on the GBIF page opens it full screen (which of course produces a very blurry image), and our site with the viewer opens only if you click the link below the thumbnail, which is counterintuitive.

I don’t want to include links to the big images in GBIF, although that would solve the problem of the blurry image on the GBIF site, because it would greatly increase the load on our server. As far as I can see, GBIF doesn’t create its own thumbnails for large images, so when you open the GBIF gallery it loads the original files. If I give such a gallery links to the big images, our server’s internet connection will be overloaded: our “medium size” images are ~10 Mb each, and if several people open a GBIF gallery with our images simultaneously, our site will suffer and may go down.