Downloading data from Zenodo using zenodo_get

From Zenodo's Terms of Use page:

Zenodo is an open dissemination research data repository for the preservation and making available of research, educational and informational content. Access to Zenodo’s content is open to all, for non-military purposes only.

As stated on Zenodo's main page, reasons for using Zenodo include:

  • Safe — your research is stored safely for the future in CERN’s Data Centre for as long as CERN exists.
  • Trusted — built and operated by CERN and OpenAIRE to ensure that everyone can join in Open Science.
  • Citeable — every upload is assigned a Digital Object Identifier (DOI), to make them citable and trackable.
  • No waiting time — Uploads are made available online as soon as you hit publish, and your DOI is registered within seconds.
  • Open or closed — Share e.g. anonymized clinical trial data with only medical professionals via our restricted access mode.
  • Versioning — Easily update your dataset with our versioning feature.
  • GitHub integration — Easily preserve your GitHub repository in Zenodo.
  • Usage statistics — All uploads display standards compliant usage statistics

Given these features, it is not surprising that some researchers share their data on Zenodo. I have used Zenodo myself to create a citeable DOI for a small R package I wrote.

You can simply visit a Zenodo record and download the shared data by clicking "Download all", but a command-line tool is more convenient because the download can be scripted and repeated.

We will use zenodo_get, a command-line tool that is available as a Conda package on conda-forge. If you aren't already familiar with it, Conda provides package, dependency, and environment management for any language; I use it a lot to install packages in isolated environments.

For Conda, I recommend Miniforge3, so install it first: visit the Miniforge GitHub page and download the installer for your operating system and computer architecture.
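On Linux and macOS, the Miniforge README suggests picking the installer from the output of `uname` and `uname -m`, i.e. a file named `Miniforge3-<OS>-<architecture>.sh` under the `releases/latest/download` URL. The sketch below assumes that naming scheme still holds; the function name `miniforge_installer_url` is hypothetical, and Windows users should download the `.exe` installer from the GitHub page instead.

```shell
# Hypothetical helper: build the Miniforge3 installer URL for Linux/macOS.
# Assumes the "Miniforge3-$(uname)-$(uname -m).sh" naming shown in the
# Miniforge README; if the download fails, check the GitHub releases page.
miniforge_installer_url() {
  # Default to the current machine; arguments allow overriding OS and arch.
  local os="${1:-$(uname)}" arch="${2:-$(uname -m)}"
  printf 'https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-%s-%s.sh\n' \
    "$os" "$arch"
}
```

You could then fetch and run the installer with `curl -L -O "$(miniforge_installer_url)"` followed by `bash "Miniforge3-$(uname)-$(uname -m).sh"`.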

After you have installed Miniforge3, you can install zenodo_get in an isolated environment.

mamba create --name zenodo -c conda-forge zenodo_get
mamba activate zenodo
zenodo_get --version
zenodo_get 1.3.4
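If you set this up repeatedly (for example on several machines), a small idempotent wrapper avoids recreating an environment that already exists. This is a sketch under a few assumptions: `conda` and `mamba` from Miniforge3 are on your `PATH`, the function names `env_exists` and `ensure_zenodo_env` are hypothetical, and the check parses the plain-text output of `conda env list`, whose layout could change between conda versions.

```shell
# Hypothetical helpers: create the "zenodo" environment only if it is missing.
# Assumes conda and mamba (from Miniforge3) are on PATH.

env_exists() {
  # `conda env list` prints one environment per line with the name in the
  # first column; lines starting with "#" are comments. Parsing this text
  # output is an assumption about its format.
  conda env list | awk '!/^#/ && NF {print $1}' | grep -qxF -- "$1"
}

ensure_zenodo_env() {
  local name="${1:-zenodo}"
  if env_exists "$name"; then
    echo "Environment '$name' already exists; skipping creation."
  else
    # Same command as above, with --yes so it can run unattended.
    mamba create --yes --name "$name" -c conda-forge zenodo_get
  fi
}
```

Running `ensure_zenodo_env` a second time simply prints a message instead of rebuilding the environment.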

We can now download Zenodo record 8164711. zenodo_get also checks each downloaded file against the checksum listed in the record, so we know the files were downloaded correctly.

zenodo_get -r 8164711
Title: PBMC datasets for harmony
Keywords: 
Publication date: 2023-07-19
DOI: 10.5281/zenodo.8164711
Total size: 132.5 MB

Link: https://zenodo.org/api/records/8164711/files/pbmc_stim.RData/content   size: 30.5 MB
100% [........................................................................] 31973441 / 31973441
Checksum is correct. (dab7de4203a0f5bec3ba67d5c1e614c4)

Link: https://zenodo.org/api/records/8164711/files/pbmc.RData/content   size: 102.0 MB
100% [......................................................................] 106968050 / 106968050
Checksum is correct. (23b2f9e77a602de4d4bd9c69c1024db0)
All files have been downloaded.
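zenodo_get already reports whether each checksum matches, but if you later copy the files elsewhere you may want to re-check them. The 32-character values printed above are MD5 digests, so a minimal sketch can compare them with `md5sum`. It assumes GNU coreutils `md5sum` is available (on macOS, `md5 -q FILE` prints the digest instead), and the function name `verify_md5` is hypothetical.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Returns 0 on a match, 1 on a mismatch, and 2 if the file is missing.
# Assumes GNU coreutils md5sum.
verify_md5() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "MISSING  $file" >&2
    return 2
  fi
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(md5sum -- "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK       $file"
  else
    echo "MISMATCH $file (expected $expected, got $actual)" >&2
    return 1
  fi
}
```

For the record above, you would run `verify_md5 pbmc.RData 23b2f9e77a602de4d4bd9c69c1024db0` in the download directory, and likewise for pbmc_stim.RData with `dab7de4203a0f5bec3ba67d5c1e614c4`.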

Once we have finished, we can deactivate the environment.

mamba deactivate

The next time you need to download a Zenodo record, make a note of the record number, activate the Conda environment, and download using zenodo_get!

mamba activate zenodo
zenodo_get -r RECORD_NUM
mamba deactivate
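If you need several records, the steps above can be wrapped in a small script. This sketch makes a few assumptions: the `zenodo` environment from earlier exists; it uses `conda run` (with `--no-capture-output` so progress is shown live) instead of `mamba activate`, since activation inside a non-interactive script needs extra shell initialisation; and it takes the record number to be the trailing digits of a Zenodo DOI or record URL, as with DOI 10.5281/zenodo.8164711 for record 8164711 above. The names `record_id`, `fetch_record`, and `download_records` are hypothetical.

```shell
# Hypothetical helpers for downloading several Zenodo records in one go.

# Reduce a record number, Zenodo DOI, or record URL to the bare record number,
# e.g. 10.5281/zenodo.8164711 -> 8164711. Assumes the record number is the
# trailing run of digits once any query string and trailing slash are removed.
record_id() {
  local input="${1%%[?]*}"        # drop a query string such as "?preview=1"
  input="${input%/}"              # drop a trailing slash
  local id="${input##*[!0-9]}"    # keep only the trailing digits
  if [ -z "$id" ]; then
    echo "Cannot find a record number in: $1" >&2
    return 1
  fi
  printf '%s\n' "$id"
}

# Run zenodo_get inside the "zenodo" environment without activating it.
fetch_record() {
  conda run --no-capture-output -n zenodo zenodo_get -r "$1"
}

# Download each record into its own directory, record_<number>.
# Returns 1 if any argument could not be parsed or downloaded.
download_records() {
  local arg id failed=0
  for arg in "$@"; do
    id=$(record_id "$arg") || { failed=1; continue; }
    mkdir -p "record_$id"
    # Subshell, so the cd does not change the caller's directory.
    if ! (cd "record_$id" && fetch_record "$id"); then
      echo "Download failed for record $id" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `download_records 8164711` and `download_records 10.5281/zenodo.8164711` would both fetch the PBMC record into `record_8164711`.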

This work is licensed under a Creative Commons Attribution 4.0 International License.