Decompressing

If you have a compressed file that is not an archive (zip or tar), you can use pooch.Decompress to decompress it after download.

For example, large binary files can be compressed with gzip to reduce download times but will need to be decompressed before loading, which can be slow. You can trade storage space for speed by keeping a decompressed copy of the file:

from pooch import Decompress

def fetch_compressed_file():
    """
    Load a large binary file that has been gzip compressed.
    """
    # Pass in the processor to decompress the file on download
    fname = GOODBOY.fetch("large-binary-file.npy.gz", processor=Decompress())
    # The file returned is the decompressed version which can be loaded by
    # numpy
    data = numpy.load(fname)
    return data

pooch.Decompress returns "large-binary-file.npy.gz.decomp" as the decompressed file name by default. You can change this behaviour by passing a file name instead:

import os
from pooch import Decompress

def fetch_compressed_file():
    """
    Load a large binary file that has been gzip compressed.
    """
    # Pass in the processor to decompress the file on download
    fname = GOODBOY.fetch("large-binary-file.npy.gz",
        processor=Decompress(name="a-different-file-name.npy"),
    )
    # The file returned is now named "a-different-file-name.npy"
    data = numpy.load(fname)
    return data

Warning

Passing in name can cause existing data to be lost! For example, if a file already exists with the specified name it will be overwritten with the new decompressed file content. Use this option with caution.