.. _beginner:

Fetching files from a registry
==============================

If you need to manage the download of multiple files from one or more
locations, then this section is for you!

Setup
-----

In the following example we'll assume that:

1. You have several data files served from the same base URL (for example,
   ``"https://www.somewebpage.org/science/data"``).
2. You know the file names and their hashes.

We will use :func:`pooch.create` to set up our download manager:

.. code:: python

    import pooch

    odie = pooch.create(
        # Use the default cache folder for the operating system
        path=pooch.os_cache("my-project"),
        base_url="https://www.somewebpage.org/science/data/",
        # The registry specifies the files that can be fetched
        registry={
            "temperature.csv": "sha256:19uheidhlkjdwhoiwuhc0uhcwljchw9ochwochw89dcgw9dcgwc",
            "gravity-disturbance.nc": "sha256:1upodh2ioduhw9celdjhlfvhksgdwikdgcowjhcwoduchowjg8w",
        },
    )

The return value (``odie``) is an instance of :class:`pooch.Pooch`. It contains
all of the information needed to fetch the data files in our **registry** and
store them in the specified cache folder.

.. note::

    The Pooch **registry** is a mapping of file names to their associated
    hashes (and optionally download URLs).

.. tip::

    If you don't know the hash or are otherwise unable to obtain it, it is
    possible to bypass the check. This is **not recommended** for general use
    and should be done only if it can't be avoided. See :ref:`hashes`.

.. attention::

    You can have data files in **subdirectories** of the remote data store
    (URL). These files will be saved to the same subdirectories in the local
    storage folder. However, the names of these files in the registry **must
    use Unix-style separators** (``'/'``) **even on Windows**. Pooch will
    handle the appropriate conversions.

Downloading files
-----------------

To download one of our data files and load it with xarray:

.. code:: python

    import xarray as xr

    file_path = odie.fetch("gravity-disturbance.nc")
    # Standard use of xarray to load a netCDF file (.nc)
    data = xr.open_dataset(file_path)

The call to :meth:`pooch.Pooch.fetch` will check if the file already exists in
the cache folder.

If it doesn't:

1. The file is downloaded and saved to the cache folder.
2. The hash of the downloaded file is compared against the one stored in the
   registry to make sure the file isn't corrupted.
3. The function returns the absolute path to the file on your computer.

If it does:

1. The hash of the cached file is compared against the one in the registry.
2. If the hashes match, no download happens and the file path is returned.
3. If they don't match, the file is downloaded once more to get an updated
   version on your computer.

Why use this method?
--------------------

With :class:`pooch.Pooch`, you can centralize the information about the URLs,
hashes, and files in a single place. Once the instance is created, it can be
used to fetch individual files without repeating the URL and hash everywhere.

A good way to use this is to place the call to :func:`pooch.create` in a
Python module (a ``.py`` file). Then you can ``import`` the module in ``.py``
scripts or Jupyter notebooks and use the instance to fetch your data. This
way, you don't need to define the URLs or hashes in multiple
scripts/notebooks.

Customizing the download
------------------------

The :meth:`pooch.Pooch.fetch` method supports all of Pooch's downloaders and
processors. You can use HTTP, FTP, and SFTP (even with authentication),
decompress files, unpack archives, show progress bars, and more with a bit of
configuration.
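
As a rough illustration of the cache check described under "Downloading files"
above, the decision logic can be sketched with only the standard library. This
is **not** Pooch's actual implementation: ``fetch_sketch``, ``sha256_of``, and
the ``download`` callable are hypothetical names introduced here, and the
sketch assumes every registry entry uses the ``sha256:`` prefix.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA256 hex digest of the file at ``path``."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so large files don't need to fit in memory
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_sketch(fname, registry, cache_dir, download):
    """
    Hypothetical stand-in for the logic described for ``Pooch.fetch``.

    ``download(fname, destination)`` is an assumed callable that writes the
    remote file to ``destination``.
    """
    # Assumes entries look like "sha256:<hex digest>"
    expected = registry[fname].split(":", 1)[1]
    # Registry names use '/' separators; pathlib converts them as needed
    path = Path(cache_dir) / fname
    # Download if the file is missing or the cached copy no longer matches
    if not path.exists() or sha256_of(path) != expected:
        path.parent.mkdir(parents=True, exist_ok=True)
        download(fname, path)
        # Verify the fresh download to catch corrupted files
        if sha256_of(path) != expected:
            raise ValueError(f"Hash mismatch for '{fname}' after download.")
    # Return the absolute path to the local copy
    return str(path.resolve())
```

Passing the downloader in as an argument keeps the sketch independent of any
network library. In real code, :meth:`pooch.Pooch.fetch` already performs
these steps for you.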