pooch.DOIDownloader
- class pooch.DOIDownloader(progressbar=False, chunk_size=1024, **kwargs)
- Download manager for fetching files from Digital Object Identifiers (DOIs).

Open-access data repositories often issue DOIs for their data, which provide a stable link and citation point. The difficulty is finding the download URL for a file given only the DOI.

When called, this downloader uses the repository’s public API to find the download URL from the DOI and file name. It then uses pooch.HTTPDownloader to download the URL into the specified local file. This allows “URL”s to be specified with the DOI instead of the actual HTTP download link. The requests library is used to manage downloads and interact with the APIs.

The format of the “URL” is: doi:{DOI}/{file name}

Notice that there is no // like in HTTP/FTP, and that you must specify a file name after the DOI (separated by a /).

Use with pooch.Pooch.fetch or pooch.retrieve to download files given the DOI instead of an HTTP link.

Supported repositories include figshare and Zenodo, both of which are used in the Examples section.

Attention: DOIs from other repositories will not work, since we need to access their particular APIs to find the download links. We welcome suggestions and contributions adding new repositories.

- Parameters
- progressbar (bool or an arbitrary progress bar object) – If True, will print a progress bar of the download to standard error (stderr). Requires tqdm to be installed. Alternatively, an arbitrary progress bar object can be passed. See Using custom progress bars for details. 
- chunk_size (int) – Files are streamed chunk_size bytes at a time instead of loading everything into memory at once. Usually doesn’t need to be changed.
- **kwargs – All keyword arguments given when creating an instance of this class will be passed to requests.get.
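To make the chunk_size parameter more concrete, the following is a minimal sketch of chunked streaming in general. It is not Pooch's implementation: copy_in_chunks is a hypothetical helper, and in-memory buffers stand in for the HTTP response and the output file.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024):
    """Copy a binary stream chunk_size bytes at a time.

    Hypothetical helper for illustration only; it is not part of Pooch.
    Returns the number of chunks written.
    """
    chunks = 0
    while True:
        # Only chunk_size bytes are held in memory at any moment.
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        chunks += 1
    return chunks


# In-memory stand-ins for the remote file and the local output file.
payload = b"x" * 2500
source = io.BytesIO(payload)
destination = io.BytesIO()

n_chunks = copy_in_chunks(source, destination, chunk_size=1024)
print(n_chunks)  # 3 (1024 + 1024 + 452 bytes)
```

A larger chunk_size means fewer read and write calls at the cost of a larger buffer, which is why the default rarely needs to be changed.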
 
- Examples

Download one of the data files from the figshare archive of Pooch test data:

>>> import os
>>> from pooch import DOIDownloader
>>> downloader = DOIDownloader()
>>> url = "doi:10.6084/m9.figshare.14763051.v1/tiny-data.txt"
>>> # Not using with Pooch.fetch so no need to pass an instance of Pooch
>>> downloader(url=url, output_file="tiny-data.txt", pooch=None)
>>> os.path.exists("tiny-data.txt")
True
>>> with open("tiny-data.txt") as f:
...     print(f.read().strip())
# A tiny data file for test purposes only
1 2 3 4 5 6
>>> os.remove("tiny-data.txt")

Same thing but for our Zenodo archive:

>>> url = "doi:10.5281/zenodo.4924875/tiny-data.txt"
>>> downloader(url=url, output_file="tiny-data.txt", pooch=None)
>>> os.path.exists("tiny-data.txt")
True
>>> with open("tiny-data.txt") as f:
...     print(f.read().strip())
# A tiny data file for test purposes only
1 2 3 4 5 6
>>> os.remove("tiny-data.txt")

Methods Summary

- DOIDownloader.__call__(url, output_file, pooch) – Download the given DOI URL over HTTP to the given output file.
- DOIDownloader.__call__(url, output_file, pooch)
- Download the given DOI URL over HTTP to the given output file.

Uses the repository’s API to determine the actual HTTP download URL from the given DOI.

Uses requests.get.
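The url argument follows the doi:{DOI}/{file name} format described for the class. As a rough illustration of how such a string breaks down, the sketch below splits it into its two parts. The split_doi_url function is a hypothetical helper written for this example and is not how Pooch necessarily parses these URLs internally.

```python
def split_doi_url(url):
    """Split 'doi:{DOI}/{file name}' into (doi, file_name).

    Hypothetical helper for illustration only; it is not part of Pooch.
    """
    prefix = "doi:"
    if not url.startswith(prefix):
        raise ValueError(f"Not a DOI URL: {url!r}")
    if url.startswith(prefix + "//"):
        raise ValueError("DOI URLs do not use '//' after the scheme")
    # A DOI contains a '/' itself, so split on the last '/' only.
    doi, sep, file_name = url[len(prefix):].rpartition("/")
    if not sep or not doi or not file_name:
        raise ValueError(f"Expected 'doi:{{DOI}}/{{file name}}', got {url!r}")
    return doi, file_name


doi, file_name = split_doi_url("doi:10.5281/zenodo.4924875/tiny-data.txt")
print(doi)        # 10.5281/zenodo.4924875
print(file_name)  # tiny-data.txt
```

Because the DOI itself contains a slash, a string that omits the file name cannot be told apart from a valid one by splitting alone, which is one reason the file name must always be given explicitly.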
