Select points that fall inside a region

3. Select points that fall inside a region#

When we have irregularly sampled data, it can be tedious to select the points that fall inside a given region (bounding box). It can be done with numpy or pandas boolean indexing, of course. But Bordado offers bordado.inside to make this easier and generalizable to n-dimensions. Let’s see how it works on a real dataset!

import ensaio
import pygmt
import pandas as pd
import bordado as bd

We’ll use ensaio.fetch_caribbean_bathymetry to fetch some bathymetry data from the Caribbean. The dataset is a CSV file, which can be loaded with pandas.read_csv:

fname = ensaio.fetch_caribbean_bathymetry(version=2)
data = pd.read_csv(fname)
data
survey_id latitude longitude bathymetry_m
0 86005311 16.09652 -61.52117 -187
1 86005311 16.09415 -61.52104 -177
2 86005311 16.09177 -61.52091 -185
3 86005311 16.08940 -61.52078 -188
4 86005311 16.08703 -61.52066 -192
... ... ... ... ...
294316 JR336 15.28529 -57.01258 -5276
294317 JR336 15.28705 -57.00994 -5277
294318 JR336 15.28883 -57.00732 -5278
294319 JR336 15.29057 -57.00467 -5277
294320 JR336 15.29234 -57.00203 -5276

294321 rows × 4 columns

Let’s plot the data with pygmt to see what we’ve got:

region = bd.get_region((data.longitude, data.latitude))

fig = pygmt.Figure()
pygmt.makecpt(
    cmap="cmocean/topo+h",
    series=[data.bathymetry_m.min(), data.bathymetry_m.max()],
)
fig.plot(
    x=data.longitude,
    y=data.latitude,
    fill=data.bathymetry_m,
    cmap=True,
    style="c0.02c",
    projection="M15c",
    region=region,
    frame=True,
)
fig.colorbar(frame='af+l"bathymetry [m]"')
fig.coast(land="#666666")
fig.show()
../_images/inside_2_0.png

Now, let’s say we wanted only the data that around Barbados. We may have the exact region that we want to crop the data:

region_barbados = (-60.3, -58.7, 12.5, 13.7)  # W, E, S, N

We can use bordado.inside to get an index for our dataset that will select only data inside the given region:

barbados = bd.inside(
    coordinates=(data.longitude, data.latitude),
    region=region_barbados,
)
barbados
array([False, False, False, ..., False, False, False], shape=(294321,))

With the index, we can select the data like so:

data_barbados = data[barbados]
data_barbados
survey_id latitude longitude bathymetry_m
11349 19920018 13.21314 -60.11236 -2396
11350 19920018 13.22323 -60.16074 -2406
11351 19920018 13.23367 -60.21238 -2396
11352 19920018 13.24463 -60.26391 -2396
13175 ODP156JR 13.68979 -59.54782 -1469
... ... ... ... ...
290849 80003511 13.63710 -58.74000 -2198
290850 80003511 13.63640 -58.73740 -2184
290851 80003511 13.63580 -58.73490 -2174
290852 80003511 13.63520 -58.73240 -2145
290853 80003511 13.63470 -58.72980 -2098

11615 rows × 4 columns

And plot the selected data:

fig = pygmt.Figure()
pygmt.makecpt(
    cmap="cmocean/topo+h",
    series=[
        data_barbados.bathymetry_m.min(),
        data_barbados.bathymetry_m.max(),
    ],
)
fig.plot(
    x=data_barbados.longitude,
    y=data_barbados.latitude,
    fill=data_barbados.bathymetry_m,
    cmap=True,
    style="c0.02c",
    projection="M15c",
    region=region,
    frame=True,
)
fig.colorbar(frame='af+l"bathymetry [m]"')
fig.coast(land="#666666")
fig.show()
../_images/inside_6_0.png

Hint

The order of the coordinates must be the same as the order of elements in the region. If the region is W, E, S, N, then the coordinates must be longitude and then latitude.