GBIF backbone - long format to wide format


I am trying to convert the GBIF backbone (Taxon.tsv) from its long format to a wide format that is more conducive to analysis. By wide format I mean that each rank is a column and each species is a row, e.g.:

"kingdom", "phylum", "class", "order", "family", "species"
animalia, arthropoda, insecta, odonata, coenagrionidae, ischnura elegans

So far, I have been using a quite involved script to map “children” taxa to their “parents”, including some backward-recursive logic to find a parent at a higher level if a direct merge fails. This procedure is slow and seems clumsy, so I wanted to check whether there is a pipeline available to generate what I want from the GBIF backbone.

Maybe this is a question for @mgrosjean?

Hi @mluerig,

The Taxon.tsv file has the fields kingdom, phylum, class, order, family and genus, which contain names. If you select the lines where the rank is “species”, you should get the higher taxonomy in those fields for each species.
One thing you would still have to do is find the accepted species for the species that don’t have the taxonomic status “accepted”, by using the acceptedNameUsageID field.
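
In case it helps, here is a minimal Python sketch of those two steps (keep the species rows, then follow acceptedNameUsageID for anything that is not accepted). The column names taxonID, taxonRank, taxonomicStatus and canonicalName are my assumptions based on the usual Darwin Core terms, so please check them against the header of your Taxon.tsv. The helper names read_taxon_tsv and species_to_wide are made up for this example.

```python
import csv

# Higher-rank columns named in the post above.
RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def read_taxon_tsv(path):
    """Yield one dict per line of a tab-separated Taxon.tsv file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters in names as plain text.
        yield from csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def species_to_wide(rows):
    """Return one wide record per species row.

    Assumed columns (check your file): taxonID, taxonRank,
    taxonomicStatus, canonicalName, acceptedNameUsageID.
    """
    rows = list(rows)  # holds the whole table in memory
    by_id = {row["taxonID"]: row for row in rows}
    wide = []
    for row in rows:
        if row["taxonRank"].lower() != "species":
            continue
        accepted = row
        if row["taxonomicStatus"].lower() != "accepted":
            accepted_id = row.get("acceptedNameUsageID", "")
            # Fall back to the row itself when no accepted record is linked.
            accepted = by_id.get(accepted_id, row) if accepted_id else row
        record = {rank: accepted[rank] for rank in RANK_COLUMNS}
        record["species"] = accepted["canonicalName"]
        record["providedName"] = row["canonicalName"]
        wide.append(record)
    return wide
```

Usage would be something like species_to_wide(read_taxon_tsv("Taxon.tsv")). The full backbone has millions of lines, so the in-memory lookup may need a lot of RAM; a database or a chunked join is probably the better choice at that scale.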

Does it make sense?

thanks, this is of course the solution - should have checked before throwing it into my pipeline

very cool - is this a recent implementation?

Hey @mluerig -

Long form taxonomies are supported out of the box through Nomer, a tool that “maps identifiers and names to other identifiers and names” (GitHub: globalbioticinteractions/nomer).

To generate gbif long form - type the following in your terminal/command line -

nomer ls --include-header gbif 

to produce stuff like -

providedExternalId providedName relationName resolvedExternalId resolvedName resolvedAuthorship resolvedRank resolvedCommonNames resolvedPath resolvedPathIds resolvedPathNames resolvedPathAuthorships resolvedExternalUrl
GBIF:1423395 Ischnura elegans HAS_ACCEPTED_NAME GBIF:1423395 Ischnura elegans (Vander Linden, 1820) species Animalia | Arthropoda | Insecta | Odonata | Coenagrionidae | Ischnura | Ischnura elegans GBIF:1 | GBIF:54 | GBIF:216 | GBIF:789 | GBIF:8577 | GBIF:1423281 | GBIF:1423395 kingdom | phylum | class | order | family | genus | species | | | | | Charpentier, 1840 | (Vander Linden, 1820) Ischnura elegans (Vander Linden, 1820)
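
If you want the wide layout from the original question, the resolvedPath and resolvedPathNames columns can be zipped together. A minimal Python sketch, assuming both columns use " | " as separator as in the line above; the function name path_to_wide is made up for this example and is not part of Nomer:

```python
WIDE_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def path_to_wide(resolved_path, resolved_path_names, ranks=WIDE_RANKS):
    """Pair each name in resolvedPath with its rank from resolvedPathNames."""
    names = [part.strip() for part in resolved_path.split("|")]
    rank_labels = [part.strip() for part in resolved_path_names.split("|")]
    by_rank = dict(zip(rank_labels, names))
    # Ranks missing from the path become empty strings.
    return {rank: by_rank.get(rank, "") for rank in ranks}


row = path_to_wide(
    "Animalia | Arthropoda | Insecta | Odonata | Coenagrionidae | Ischnura | Ischnura elegans",
    "kingdom | phylum | class | order | family | genus | species",
)
print(row["family"], "/", row["species"])  # Coenagrionidae / Ischnura elegans
```

Intermediate ranks that are not listed in WIDE_RANKS (subfamily, tribe and so on) are simply dropped, which is usually what you want for a fixed set of columns.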

other supported taxonomies include, but are not limited to, ITIS, World of Flora Online, Catalogue of Life, DiscoverLife, Paleobiology Database, Mammal Diversity Database and NCBI Taxonomy.

And . . . Nomer uses versioned snapshots of taxonomic resources to reduce variability due to upgrades or internet outages [1].

See also the name alignment template (GitHub: globalbioticinteractions/name-alignment-template, “align names with known taxonomic resources”) and the associated re-usable workshop at Big-Bee-Network (18 January 2023). The name alignment template is configured to produce name alignments like:

providedName                Adoretus
parseRelation               SAME_AS
parsedName                  Adoretus
parsedNameSource            gbif-parse
parsedNameSourceAccessedAt  hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e
alignRelation               HAS_ACCEPTED_NAME
alignedCatalogName          itis
alignedName                 Adoretus
alignedAuthorship           Dejean, 1833
alignedRank                 genus
alignedKingdomName          Animalia
alignedKingdomId            ITIS:202423
alignedPhylumName           Arthropoda
alignedPhylumId             ITIS:82696
alignedClassName            Insecta
alignedClassId              ITIS:99208
alignedOrderName            Coleoptera
alignedOrderId              ITIS:109216
alignedOrderAuthorship      Linnaeus, 1758
alignedFamilyName           Scarabaeidae
alignedFamilyId             ITIS:114493
alignedFamilyAuthorship     Latreille, 1802
alignedSubfamilyName        Rutelinae
alignedSubfamilyId          ITIS:678509
alignedSubfamilyAuthorship  MacLeay, 1819
alignedTribeName            Adoretini
alignedTribeId              ITIS:926256
alignedTribeAuthorship      Burmeister, 1844
alignedGenusName            Adoretus
alignedGenusId              ITIS:187484
alignedGenusAuthorship      Dejean, 1833
alignedPath                 Animalia | Bilateria | Protostomia | Ecdysozoa | Arthropoda | Hexapoda | Insecta | Pterygota | Neoptera | Holometabola | Coleoptera | Polyphaga | Scarabeiformia | Scarabaeoidea | Scarabaeidae | Rutelinae | Adoretini | Adoretus
alignedPathIds              ITIS:202423 | ITIS:914154 | ITIS:914155 | ITIS:914158 | ITIS:82696 | ITIS:563886 | ITIS:99208 | ITIS:100500 | ITIS:563890 | ITIS:914213 | ITIS:109216 | ITIS:112747 | ITIS:678302 | ITIS:114486 | ITIS:114493 | ITIS:678509 | ITIS:926256 | ITIS:187484
alignedPathNames            kingdom | subkingdom | infrakingdom | superphylum | phylum | subphylum | class | subclass | infraclass | superorder | order | suborder | infraorder | superfamily | family | subfamily | tribe | genus
alignedPathAuthorships      |  |  |  |  |  |  |  |  |  | Linnaeus, 1758 | Emery, 1886 | Crowson, 1960 | Latreille, 1802 | Latreille, 1802 | MacLeay, 1819 | Burmeister, 1844 | Dejean, 1833
alignedNameSource           itis
alignedNameSourceAccessedAt hash://sha256/d2903d0384a8b8193819b8061c8c4e6fec8cc2f7fe72dc0e91c90c07ba2fe15e

Big thanks to GBIF, ITIS, World of Flora Online, Catalogue of Life, DiscoverLife, Paleobiology Database, Mammal Diversity Database, NCBI Taxonomy and many other projects for making their taxonomic resources available for re-use online in bulk. With these comprehensive, structured taxonomic resources, (fast!) tools like Nomer (GitHub: globalbioticinteractions/nomer), taxadb (GitHub: ropensci/taxadb) [2] and Global Names Resolver [3] can be developed independently of the specific tools and services provided by the taxonomic projects themselves.

Hope this helps provide some perspective,


[1] Poelen, J. H. (ed.). (2024). Nomer Corpus of Taxonomic Resources hash://sha256/83617875e84bb8ae7ac2a257ad50eb8e82d8935d975f465b8ee8f3a803f72b48 hash://md5/c639d7e3fcd5603f6c48e9d5e6c49672 (0.24) [Data set]. Zenodo.

[2] Norman KEA, Chamberlain S, Boettiger C. taxadb: A high-performance local taxonomic database interface. Methods Ecol Evol. 2020; 11: 1153–1159.

[3] Mozzherin, D. (2023). gnames/gnverifier: v1.1.5 (v1.1.5). Zenodo.


cool, this is really helpful for all sorts of quantitative research on these datasets. thanks for taking the time to elaborate, much appreciated!

