Downloading data

Ensaio provides functions for downloading datasets from the fatiando/data collection to your computer. The functions are available through different modules for each major release of the data collection. For example, datasets from the version 1.X series are available through ensaio.v1.

The recommended way to use Ensaio is to import a particular version module like so:

# Load Pandas as well so we can read in some data
import pandas as pd

import ensaio.v1 as ensaio

To download a particular dataset, say our Southern Africa gravity data, call the corresponding fetch_ function and print the path it returns:

Out:

/home/runner/work/_temp/cache/ensaio/v1/southern-africa-gravity.csv.xz
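A minimal sketch of the call that would produce a path like the one above. The function name fetch_southern_africa_gravity is an assumption here, inferred from the fetch_ naming pattern and the file name in the output; check the API listing for the exact name. The helper fetch_gravity_path is hypothetical and only wraps the call so that nothing is downloaded until it is invoked.

```python
def fetch_gravity_path():
    """Download the Southern Africa gravity data if needed and return its path."""
    # Imported inside the function so nothing is loaded until it is called.
    import ensaio.v1 as ensaio

    # Assumed function name, following the fetch_ pattern described above.
    return ensaio.fetch_southern_africa_gravity()
```

Calling print(fetch_gravity_path()) should then display the location of the file inside the cache folder.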

Tip

You can browse a list of all available datasets in List of functions and classes (API) or Available datasets (v1).

If the data are not yet available on your computer, Ensaio will automatically download them and return the path to the downloaded file. If the file has already been downloaded, Ensaio won’t repeat the download and will only return the path to the existing file.

As a result, anyone who runs a Python script or Jupyter notebook containing the code above is guaranteed to get the data on their computer. Running the code multiple times, or using the same data in multiple places, will only trigger a single download, saving bandwidth and storage space.
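The download-once behaviour described above can be illustrated with a small standard-library sketch. This is not the code Ensaio or Pooch actually run (Pooch also does things like checksum verification); fetch_cached and fake_download are hypothetical names used only to show the idea of checking the cache before downloading.

```python
import tempfile
from pathlib import Path


def fetch_cached(fname, cache_dir, download):
    """Return the path to fname in cache_dir, downloading only if it is missing."""
    path = Path(cache_dir) / fname
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)
    return path


calls = []


def fake_download(path):
    """Stand-in for a real download: record the call and write a placeholder."""
    calls.append(path)
    path.write_text("placeholder contents\n")


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("example.csv", tmp, fake_download)
    second = fetch_cached("example.csv", tmp, fake_download)
    same_path = first == second

# Number of times the stand-in download was triggered.
print(len(calls))
```

Both calls return the same path, but the stand-in download only runs on the first one.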

Note

Ensaio uses Pooch under the hood to make all of this work.

Once we have the path to the data file, we can load it like we would any other data file. In this case, our data are in a CSV file, so the natural choice is to use Pandas:

       longitude   latitude  height_sea_level_m  gravity_mgal
    0   18.34444  -34.12971                32.2     979656.12
    1   18.36028  -34.08833               592.5     979508.21
    2   18.37418  -34.19583                18.4     979666.46
    3   18.40388  -34.23972                25.0     979671.03
    4   18.41112  -34.16444               228.7     979616.11
  ...        ...        ...                 ...           ...
14354   21.22500  -17.95833              1053.1     978182.09
14355   21.27500  -17.98333              1033.3     978183.09
14356   21.70833  -17.99166              1041.8     978182.69
14357   21.85000  -17.95833              1033.3     978193.18
14358   21.98333  -17.94166              1022.6     978211.38

14359 rows × 4 columns
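A sketch of the loading step that yields a table like the one above. pandas.read_csv infers xz compression from the .xz extension, so no manual decompression is needed. To keep the sketch runnable without a download, it writes the first three rows shown above to a temporary file; the file name sample-gravity.csv.xz is a hypothetical stand-in for the path returned by the fetch function, and the comma delimiter is an assumption.

```python
import lzma
import tempfile
from pathlib import Path

import pandas as pd

# First three rows of the table above, used as stand-in data.
sample = (
    "longitude,latitude,height_sea_level_m,gravity_mgal\n"
    "18.34444,-34.12971,32.2,979656.12\n"
    "18.36028,-34.08833,592.5,979508.21\n"
    "18.37418,-34.19583,18.4,979666.46\n"
)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the path returned by the fetch function.
    fname = Path(tmp) / "sample-gravity.csv.xz"
    with lzma.open(fname, "wt") as output:
        output.write(sample)
    # compression="infer" is the default, so ".xz" triggers xz decompression.
    data = pd.read_csv(fname)

print(data)
```

With the real file, the same pd.read_csv(fname) call applies directly to the path obtained from Ensaio.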



Using Ensaio in your project documentation?

Make sure you take a look at Using Ensaio in your project for useful tips and tricks.

Where are the data?

The location of the cache folder varies by operating system. Use the ensaio.v1.locate function to get its location on your computer.

print(ensaio.locate())

Out:

/home/runner/work/_temp/cache/ensaio/v1

You can also set the location manually by creating an ENSAIO_V1_DATA_DIR environment variable with the desired path. Ensaio will search for this variable and, if it is found, will use its value instead of the default cache folder.
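The lookup order described above can be sketched as follows. This only illustrates the idea of "environment variable first, default second" and is not Ensaio's own code; resolve_data_dir and both example paths are hypothetical.

```python
import os
from pathlib import Path


def resolve_data_dir(default, env_var="ENSAIO_V1_DATA_DIR"):
    """Return the folder named by env_var if it is set, else the default."""
    value = os.environ.get(env_var)
    return Path(value) if value else Path(default)


# Hypothetical default location, for illustration only.
default = Path.home() / ".cache" / "ensaio" / "v1"

# Without the variable, the default folder is used.
os.environ.pop("ENSAIO_V1_DATA_DIR", None)
without_override = resolve_data_dir(default)

# With the variable set, its value takes precedence.
os.environ["ENSAIO_V1_DATA_DIR"] = "/tmp/my-ensaio-data"
with_override = resolve_data_dir(default)

# Clean up so the demonstration does not leak into later code.
del os.environ["ENSAIO_V1_DATA_DIR"]

print(without_override)
print(with_override)
```

In practice the variable is usually set in the shell or the system environment rather than from inside Python, so that it is already in place when Ensaio looks for it.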

Similar variables and functions are available for each data collection version.
