pooch.create

pooch.create(path, base_url, version, version_dev, env=None, registry=None)

Create a new Pooch with sensible defaults to fetch data files.

The Pooch will be versioned, meaning that the local storage folder and the base URL depend on the project version. This is necessary if your users have multiple versions of your library installed (using virtual environments) and you updated the data files between versions. Otherwise, every time a user switched environments, the data would be downloaded again.

The version string will be appended to the local storage path (for example, ~/.mypooch/cache/v0.1) and inserted into the base URL (for example, https://github.com/fatiando/pooch/raw/v0.1/data). If the version string contains +XX.XXXXX, it will be interpreted as a development version.
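The version handling described above can be sketched in plain Python. This is only an illustration, not Pooch's actual implementation: the helper names resolve_version and versioned_url are hypothetical, and the check assumes that any "+" in the version string marks a development version (in PEP 440, "+" introduces the local version segment).

```python
def resolve_version(version, version_dev):
    """Return version_dev for development versions, otherwise version."""
    # Assumption: a "+" (the PEP 440 local version separator) is what
    # identifies a development version such as "v0.1+12.do9iwd".
    if "+" in version:
        return version_dev
    return version


def versioned_url(base_url, version, version_dev):
    """Fill the {version} formatting mark in base_url."""
    return base_url.format(version=resolve_version(version, version_dev))


release = versioned_url("http://some.link.com/{version}/", "v0.1", "master")
development = versioned_url(
    "http://some.link.com/{version}/", "v0.1+12.do9iwd", "master"
)
print(release)      # http://some.link.com/v0.1/
print(development)  # http://some.link.com/master/
```

The same resolved version is what gets appended to the local storage path, so a release and a development install never share a cache folder.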

Parameters:
path : str, PathLike, list or tuple

The path to the local data storage folder. If this is a list or tuple, we’ll join the parts with the appropriate separator. The version will be appended to the end of this path. Use pooch.os_cache for a sensible default.

base_url : str

Base URL for the remote data source. All requests will be made relative to this URL. The string should have a {version} formatting mark in it. We will call .format(version=version) on this string. If the URL is a directory path, it must end in a '/' because we will not add one.

version : str

The version string for your project. Should be PEP440 compatible.

version_dev : str

The name used for the development version of a project. If your data is hosted on GitHub (and base_url is a GitHub raw link), then "master" is a good choice.

env : str

An environment variable that can be used to override path. This allows users to control where they want the data to be stored. We’ll append version to the end of this value as well.

registry : dict

A record of the files that are managed by this Pooch. Keys should be the file names and the values should be their SHA256 hashes. Only files in the registry can be fetched from the local storage. Files in subdirectories of path must use Unix-style separators ('/') even on Windows.

Returns:
pooch : Pooch

The Pooch initialized with the given arguments.
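One way to prepare a dictionary for the registry parameter is to hash every file in a data folder with the standard library. The sketch below is an assumption about how you might do this yourself, not a function provided by Pooch: sha256_of and build_registry are hypothetical helpers, and the temporary folder only stands in for a real data directory. Keys are written with '/' separators on every platform, as the registry description requires.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_registry(folder):
    """Map each file under folder (Unix-style relative name) to its hash."""
    folder = Path(folder)
    return {
        # as_posix() yields '/' separators even on Windows
        file.relative_to(folder).as_posix(): sha256_of(file)
        for file in sorted(folder.rglob("*"))
        if file.is_file()
    }


# Stand-in data folder with one file in a subdirectory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "data.txt").write_bytes(b"hello\n")
    (Path(tmp) / "sub" / "grid.txt").write_bytes(b"1 2 3\n")
    registry = build_registry(tmp)

print(sorted(registry))  # ['data.txt', 'sub/grid.txt']
```

The resulting dictionary has the shape expected by registry: file names as keys and 64-character hexadecimal SHA256 digests as values.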

Examples

Create a Pooch for a release (v0.1):

>>> pup = create(path="myproject",
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master",
...              registry={"data.txt": "9081wo2eb2gc0u..."})
>>> print(pup.path.parts)  # The path is a pathlib.Path
('myproject', 'v0.1')
>>> print(pup.base_url)
http://some.link.com/v0.1/
>>> print(pup.registry)
{'data.txt': '9081wo2eb2gc0u...'}

If this is a development version (12 commits ahead of v0.1):

>>> pup = create(path="myproject",
...              base_url="http://some.link.com/{version}/",
...              version="v0.1+12.do9iwd",
...              version_dev="master")
>>> print(pup.path.parts)
('myproject', 'master')
>>> print(pup.base_url)
http://some.link.com/master/

To place the storage folder at a subdirectory, pass in a list and we’ll join the path for you using the appropriate separator for your operating system:

>>> pup = create(path=["myproject", "cache", "data"],
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master")
>>> print(pup.path.parts)
('myproject', 'cache', 'data', 'v0.1')

The user can override the storage path by setting an environment variable:

>>> # The variable is not set so we'll use *path*
>>> pup = create(path=["myproject", "not_from_env"],
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master",
...              env="MYPROJECT_DATA_DIR")
>>> print(pup.path.parts)
('myproject', 'not_from_env', 'v0.1')
>>> # Set the environment variable and try again
>>> import os
>>> os.environ["MYPROJECT_DATA_DIR"] = os.path.join("myproject", "from_env")
>>> pup = create(path=["myproject", "not_from_env"],
...              base_url="http://some.link.com/{version}/",
...              version="v0.1",
...              version_dev="master",
...              env="MYPROJECT_DATA_DIR")
>>> print(pup.path.parts)
('myproject', 'from_env', 'v0.1')
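The path resolution shown in these examples can be approximated with a few lines of standard-library code. This is a hedged sketch of the behavior described on this page, not the library's source: resolve_storage_path is a hypothetical helper, and it assumes the environment variable, when set, replaces path entirely before the version is appended.

```python
import os
from pathlib import Path


def resolve_storage_path(path, version, env=None):
    """Pick the storage folder and append the version to it."""
    # Assumption: a set environment variable takes precedence over path
    if env is not None and env in os.environ:
        path = os.environ[env]
    # Lists and tuples are joined with the OS-specific separator
    if isinstance(path, (list, tuple)):
        path = os.path.join(*path)
    return Path(path, version)


# Make sure the variable is unset, then resolve with the default path
os.environ.pop("MYPROJECT_DATA_DIR", None)
default = resolve_storage_path(
    ["myproject", "not_from_env"], "v0.1", env="MYPROJECT_DATA_DIR"
)

# Set the variable and resolve again; clean up afterwards
os.environ["MYPROJECT_DATA_DIR"] = os.path.join("myproject", "from_env")
overridden = resolve_storage_path(
    ["myproject", "not_from_env"], "v0.1", env="MYPROJECT_DATA_DIR"
)
del os.environ["MYPROJECT_DATA_DIR"]

print(default.parts)     # ('myproject', 'not_from_env', 'v0.1')
print(overridden.parts)  # ('myproject', 'from_env', 'v0.1')
```

Because the version is appended after the override is applied, users who redirect the storage location still get one folder per version.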