Downloading data

Ensaio provides functions for downloading datasets from the Fatiando a Terra Datasets collection to your computer. These functions don’t load the data into memory; they only return the path of the downloaded file on your computer.

To take care of the actual loading of the data, we’ll import Pandas as well since the data we’ll use is in CSV format.

import pandas as pd

import ensaio

To download a particular dataset, say version 1 of our Southern Africa gravity data, call the corresponding fetch_* function:

fname = ensaio.fetch_southern_africa_gravity(version=1)
print(fname)

/home/runner/work/_temp/cache/ensaio/v1/southern-africa-gravity.csv.xz

Tip

The version of the data should always be explicitly included so that your code continues to work in the same way even if a newer version of the data is released.

If the data are not yet available on your computer, Ensaio will automatically download them and return the path to the downloaded file. If the file has already been downloaded, Ensaio won’t repeat the download and will only return the path to the existing file.

As a result, whoever runs a Python script or Jupyter notebook containing the code above is guaranteed to get the data on their computer. Running the code multiple times or using the same data in multiple places will only trigger a single download, saving bandwidth and storage space.
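The download-once behaviour described above can be pictured with a small sketch. This is not Ensaio’s or Pooch’s actual code: fetch_once and the download callable are hypothetical names used only for illustration, and the "download" here just writes a placeholder file so that the example runs offline.

```python
import tempfile
from pathlib import Path


def fetch_once(cache_dir, file_name, download):
    """Return the path to file_name in cache_dir, downloading only if missing.

    Hypothetical helper: `download` is any callable that writes the file
    to the path it receives.
    """
    path = Path(cache_dir) / file_name
    if not path.exists():
        # First call: the file is not cached yet, so fetch it.
        path.parent.mkdir(parents=True, exist_ok=True)
        download(path)
    # Later calls skip the download and only return the path.
    return path


calls = []


def fake_download(path):
    """Stand-in for a real download: record the call, write a placeholder."""
    calls.append(path)
    path.write_text("placeholder")


cache = tempfile.mkdtemp()
first = fetch_once(cache, "example.csv", fake_download)
second = fetch_once(cache, "example.csv", fake_download)
print(first == second, len(calls))
```

Both calls return the same path, and the stand-in download runs only once, so the script prints True 1.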

Note

Ensaio uses Pooch under the hood to make all of this work.

Once we have the path to the data file, we can load it like we would any other data file. In this case, our data is in a CSV file so the natural choice is to use Pandas:

data = pd.read_csv(fname)
data

	longitude	latitude	height_sea_level_m	gravity_mgal
0	18.34444	-34.12971	32.2	979656.12
1	18.36028	-34.08833	592.5	979508.21
2	18.37418	-34.19583	18.4	979666.46
3	18.40388	-34.23972	25.0	979671.03
4	18.41112	-34.16444	228.7	979616.11
...	...	...	...	...
14354	21.22500	-17.95833	1053.1	978182.09
14355	21.27500	-17.98333	1033.3	978183.09
14356	21.70833	-17.99166	1041.8	978182.69
14357	21.85000	-17.95833	1033.3	978193.18
14358	21.98333	-17.94166	1022.6	978211.38

14359 rows × 4 columns
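The file name ends in .csv.xz, which means the CSV is xz-compressed. Pandas infers the compression from the file extension by default, which is why no extra arguments are needed above. If Pandas is not available, the same kind of file can be read with the Python standard library. The sketch below assumes a comma-separated file with a header row, and it builds a tiny stand-in file from the first two rows shown above so that it runs without a download.

```python
import csv
import lzma
import tempfile
from pathlib import Path

# Build a small stand-in file holding the first two rows of the table above.
# With Ensaio, use the path returned by the fetch function instead.
sample = Path(tempfile.mkdtemp()) / "sample.csv.xz"
with lzma.open(sample, "wt", newline="") as f:
    f.write("longitude,latitude,height_sea_level_m,gravity_mgal\n")
    f.write("18.34444,-34.12971,32.2,979656.12\n")
    f.write("18.36028,-34.08833,592.5,979508.21\n")

# lzma.open decompresses on the fly; csv.DictReader uses the header as keys.
with lzma.open(sample, "rt", newline="") as f:
    rows = list(csv.DictReader(f))

# Values are read as strings and need converting before doing arithmetic.
gravity = [float(row["gravity_mgal"]) for row in rows]
print(len(rows), gravity[0])
```

Unlike Pandas, the csv module does not convert column types, so the explicit float conversion is needed.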

Where are the data?

The location of the cache folder varies by operating system. Use the ensaio.locate function to get its location on your computer.

print(ensaio.locate())

/home/runner/work/_temp/cache/ensaio

You can also set the location manually by creating an ENSAIO_DATA_DIR environment variable with the desired path. Ensaio looks for this variable and, if it is set, uses its value instead of the default cache folder.
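The lookup order described here (environment variable first, default folder otherwise) can be sketched as follows. This is an illustration of the pattern only, not Ensaio’s implementation: resolve_data_dir, the default folder, and the custom folder are all hypothetical.

```python
import os
import tempfile
from pathlib import Path


def resolve_data_dir(default, env_var="ENSAIO_DATA_DIR"):
    """Return the folder named by env_var if it is set, otherwise default."""
    value = os.environ.get(env_var)
    if value:
        return Path(value)
    return Path(default)


# Hypothetical folders used only for this illustration.
default_dir = Path(tempfile.gettempdir()) / "default-cache"
custom_dir = Path(tempfile.gettempdir()) / "my-ensaio-data"

# Without the variable, the default folder is used.
os.environ.pop("ENSAIO_DATA_DIR", None)
without_variable = resolve_data_dir(default_dir)

# With the variable set, its value takes precedence.
os.environ["ENSAIO_DATA_DIR"] = str(custom_dir)
with_variable = resolve_data_dir(default_dir)

print(without_variable)
print(with_variable)
```

In practice, the variable is usually set in the shell or system settings before starting Python, so that every script and notebook picks up the same folder.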
