Hello, where can we publish a large number (>1k) of images that can be linked to Occurrence records that will be published to GBIF/OBIS? This question is asked a lot in the marine/OBIS community. I have also personally encountered datasets where the number of specimen photos far exceeds Zenodo’s limit (see Zenodo FAQ - What are the size limitations of Zenodo?).
Curious if anyone has the answer to this question please? Thanks a lot!
I know that you can also use Wikimedia, which is free, but I do not have experience with it or an example from our publishers. Maybe someone else has experience with it and can share information.
@EstebanMH-SiB @ymgan This seems like a good solution, thanks for identifying it! It’s not clear if they mint persistent identifiers for the individual files, or the collections, of the things they archive. Do either of you know if that’s possible?
The limitations for Zenodo in size and file number are only per record. You can make a community and add all images to it as separate records, like this example with ca. 270k images we did as a pilot for the ICEDIG project years ago: Search Belgium Herbarium of Meise Botanic Garden
Cautionary note here: the IA is battling serious litigation. As a result, the Biodiversity Heritage Library, whose page scans have been in IA for decades, is exploring alternatives like AWS.
Thanks for the useful insights @dshorthouse and @MatDillen. We will explore Zenodo a little so we can recommend it to our publishers, and keep an eye on IA, hoping they continue working without much trouble. They have been a great resource so far!
Thank you very much for this Mat! The Herbarium specimen records are nice. From what I understood, the examples are 1 image = 1 occurrence. I am wondering if you have any recommendations/examples on how to do this for images that can have multiple occurrences please?
I’m not sure what else you need? That Zenodo record you linked to already does the job of hosting (multiple) multi-specimen images. You can enrich the record with a more standardized data file, like a Darwin Core archive that lists all identified observations from the images and video as occurrences. You can then use the Audiovisual Core extension to link the occurrences to the images they occur in. You could even link the occurrence to a region of interest within the image if you know where each of the organisms can be spotted: Audiovisual Core Term List - Audiovisual Core
What you probably can’t do is embed the data in the record as subjects, making them easily accessible through the Zenodo API, like we did in this example.
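For anyone wanting to see what the occurrence-to-image linking could look like in practice, here is a minimal Python sketch. It is only an illustration under assumptions: the occurrence IDs, species, image URL and region values are made up, and the column names (`accessURI`, `xFrac`, `yFrac`, `widthFrac`, `heightFrac`) are my reading of the Audiovisual Core term list, so please check them against the extension definition your publishing tool actually uses. The point it shows is that several occurrences can reference the same image, each with its own region of interest.

```python
import csv
import io

# Made-up identifications: two organisms spotted in the same image.
# roi = (x, y, width, height) as fractions of the image size (assumption).
observations = [
    {"occurrenceID": "occ-001", "scientificName": "Asterias rubens",
     "image_url": "https://example.org/images/quadrat_01.jpg",
     "roi": (0.10, 0.20, 0.25, 0.30)},
    {"occurrenceID": "occ-002", "scientificName": "Mytilus edulis",
     "image_url": "https://example.org/images/quadrat_01.jpg",
     "roi": (0.55, 0.40, 0.15, 0.10)},
]


def build_tables(observations):
    """Split observations into occurrence rows and media extension rows."""
    occurrences, media = [], []
    for obs in observations:
        occurrences.append({
            "occurrenceID": obs["occurrenceID"],
            "scientificName": obs["scientificName"],
            "basisOfRecord": "MachineObservation",
        })
        x, y, width, height = obs["roi"]
        # One media row per occurrence; the same accessURI may repeat.
        media.append({
            "occurrenceID": obs["occurrenceID"],
            "accessURI": obs["image_url"],
            "format": "image/jpeg",
            "xFrac": x,
            "yFrac": y,
            "widthFrac": width,
            "heightFrac": height,
        })
    return occurrences, media


def to_csv(rows):
    """Serialize a list of dicts to CSV text, header taken from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


occurrences, media = build_tables(observations)
print(to_csv(occurrences))
print(to_csv(media))
```

The two CSV outputs correspond to the core file and the extension file of an archive; the `occurrenceID` column in the media table is what ties each image region back to its occurrence.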
I showed the Zenodo record as an example of the type of images that I have. It works for that dataset because there are <100 files, within Zenodo’s size limitations (per record).
I have another new, not yet published dataset with >700 images of the same type as in the Zenodo record, which far exceeds Zenodo’s size limitations (per record), and I am curious how to do it like the example you shared.
Why can’t you put each image in a separate record (i.e. >700 similar records), or bundle them in smaller batches? That would require splitting up the data, or making a separate record for the overarching data and linking that one to each image record (like this one), but other than that I don’t see why it wouldn’t work?
At some point, you might exceed Zenodo’s Fair Usage policy. But that mainly depends on your total data volume (how many such datasets you have and how many gigabytes their media files amount to). Individual cases like this should be fine.
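The "smaller batches" option mentioned above could be planned with something like the following Python sketch. It only groups file names into batches that respect a per-record file count and total size; it does not talk to Zenodo at all. The function name `batch_files` and the limit values are placeholders I chose for illustration, so substitute the limits currently stated in the Zenodo FAQ.

```python
def batch_files(files, max_files, max_bytes):
    """Greedily group (name, size_in_bytes) pairs into batches.

    Each batch holds at most max_files files and at most max_bytes in total.
    Both limits are placeholders to be set by the caller.
    """
    batches = []
    current, current_size = [], 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the per-record size limit")
        # Start a new batch when adding this file would break either limit.
        if current and (len(current) >= max_files
                        or current_size + size > max_bytes):
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical dataset: 750 images of 200 MB each.
files = [(f"img_{i:04d}.tif", 200 * 10**6) for i in range(750)]
batches = batch_files(files, max_files=100, max_bytes=50 * 10**9)
for number, batch in enumerate(batches, start=1):
    print(f"record {number}: {len(batch)} files")
```

Each resulting batch would then become one record, with a separate overarching record linking to all of them, as suggested above.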