I’ve searched around and couldn’t find any existing topics on here, but apologies if I’ve missed something.
I’m trying to carry out an analysis that uses an occurrence download for a whole plant family, so it has > 2 million records (Myrtaceae, for example).
When I try to load the file into R or Python, I always end up with fewer rows in the table than the number of occurrences stated on the download page.
I’ve tried a few different packages, but get the same problem each time. For instance, using the R package vroom:
library(vroom)

# Columns to keep from the download
col_names <- c("gbifID", "genus", "species", "taxonRank",
               "scientificName", "countryCode",
               "decimalLatitude", "decimalLongitude",
               "day", "month", "year", "taxonKey",
               "speciesKey", "basisOfRecord", "issue")

# MYRTACEAE_DL_PATH holds the path to the downloaded occurrence file
d <- vroom(MYRTACEAE_DL_PATH, delim = "\t", col_select = col_names)
This gives a table with 1,785,588 rows, whereas the download page says there should be 2,638,956 occurrences.
Is there something I’m missing here? Am I using the wrong delimiter? Or quotation character?
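To make the quotation-character question concrete, here is a minimal Python sketch of the check I have in mind. It uses only the standard `csv` module on a small made-up sample (the three records and the `count_rows` helper are hypothetical, not taken from my download). It shows how a stray `"` inside a tab-delimited field can make a parser swallow several lines into one row, and how disabling quote handling avoids that. I can't yet say whether this is what is happening with my file.

```python
import csv
import io


def count_rows(text, **reader_kwargs):
    """Count parsed data rows (header excluded) in tab-delimited text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", **reader_kwargs)
    next(reader)  # skip the header row
    return sum(1 for _ in reader)


# Made-up sample: the first record opens a double quote that is only
# closed two lines later, as can happen with free-text fields.
sample = (
    "gbifID\tscientificName\tlocality\n"
    "1\tEucalyptus globulus\t\"Mt Wellington\n"
    "2\tSyzygium aromaticum\tnear road\n"
    "3\tMyrtus communis\tgarden\"\n"
)

# Default behaviour: '"' is treated as a quote character, so everything
# between the two quotes (including newlines) becomes a single field.
rows_default = count_rows(sample)

# Quote handling disabled: every physical line is one record.
rows_no_quotes = count_rows(sample, quoting=csv.QUOTE_NONE)

print(rows_default)    # 1
print(rows_no_quotes)  # 3
```

If the same thing is going on in the real file, counting its physical lines (minus the header) and comparing that with the parsed row count should show the gap. In vroom, I assume the equivalent of `csv.QUOTE_NONE` is passing `quote = ""`.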
Any insight you can give would be greatly appreciated.