.. _tutorial_split:

Splitting points into blocks
============================

Beyond generating values for coordinates, Bordado can also operate on them in a few useful ways. One category of this is splitting points spatially by segmenting them into blocks or windows. Functions :func:`bordado.block_split`, :func:`bordado.expanding_window`, :func:`bordado.rolling_window`, and :func:`bordado.rolling_window_spherical` are designed for this. This tutorial will teach you how to use them.

.. jupyter-execute::

    import bordado as bd
    import numpy as np
    import matplotlib.pyplot as plt
    import pygmt

.. seealso::

   We'll use random data in these examples
   generated by :func:`bordado.random_coordinates` and
   :func:`bordado.random_coordinates_spherical`. See
   ":ref:`tutorial_random`" to learn how to use them.

All of these segmentation functions generate arrays that can be used to index
the coordinate arrays instead of actually doing any of the indexing themselves.
This makes them more flexible to combine with other spatial operations that you
may need.


Non-overlapping spatial blocks
------------------------------

Bordado has function :func:`bordado.block_split` that can provide indices for
segmenting coordinate arrays according to spatial blocks.
For this example, let's make some random coordinates with
:func:`bordado.random_coordinates`:

.. jupyter-execute::

    region = (20, 30, -50, -42)
    coordinates = bd.random_coordinates(region, size=500, random_seed=42)

    fig, ax = plt.subplots(1, 1, figsize=(8, 6), layout="constrained")
    ax.plot(*coordinates, ".")
    ax.set_aspect("equal")
    ax.set_xlim(*region[:2])
    ax.set_ylim(*region[2:])
    ax.set_xlabel("Easting")
    ax.set_ylabel("Northing")
    ax.grid()
    plt.show()

Then we can use :func:`~bordado.block_split` to get the indexers that split
these points into evenly sized blocks:

.. jupyter-execute::

    block_coordinates, labels = bd.block_split(coordinates, block_size=2)
    print(labels)

The labels are the number of the block to which each point belongs. For
example, these are the coordinates of the points that fall inside block 1:

.. jupyter-execute::

    block1 = tuple(c[labels == 1] for c in coordinates)

    print("Coordinates of block 1:")
    for i, c in enumerate(block1):
        print(f"coordinate {i}:\n{c}\n")

The ``block_coordinates`` is a tuple with the coordinates of the center of each
block:

.. jupyter-execute::

    for c in block_coordinates:
        print(f"block coordinate {i}:\n{c}\n")

.. tip::

    The keen-eyed among you might have noticed that the coordinates of the block
    centers are **not spaced by 2** (the block size). This is because by default
    :func:`~bordado.block_split` extracts the region that is divided into blocks
    from the input data, which are random and there are no points exactly on the
    region boundaries. You can pass a ``region`` argument to make this be exact
    or pass ``adjust="region""`` to keep the block size fixed even if the region
    is not passed.

Let's use :mod:`matplotlib` to plot this so we can understand it better.
We'll plot the block centers as stars and a scatter of the points colored
according to which block they belong:

.. jupyter-execute::

    fig, ax = plt.subplots(1, 1, figsize=(8, 8), layout="constrained")
    ax.plot(*block_coordinates, "k*", markersize=10)
    tmp = ax.scatter(*coordinates, c=labels, cmap="tab20")
    cb_region = (-0.5, 0.5 + labels.max())
    fig.colorbar(
        tmp,
        label="Block number",
        orientation="horizontal",
        aspect=30,
        boundaries=bd.line_coordinates(*cb_region, spacing=1),
        values=bd.line_coordinates(*cb_region, spacing=1, pixel_register=True),
        ticks=bd.line_coordinates(*cb_region, spacing=1, pixel_register=True),
    )
    ax.set_aspect("equal")
    ax.set_xlim(*region[:2])
    ax.set_ylim(*region[2:])
    ax.set_xlabel("Easting")
    ax.set_ylabel("Northing")
    plt.show()

The indexing is useful for calculating things like block means and other statistics. For example, we find out the number of points in each block like so:

.. jupyter-execute::

    points_per_block = [coordinates[0][labels == i].size for i in range(labels.max() + 1)]

    print(points_per_block)


Splitting into overlapping windows
----------------------------------

In case we split points into blocks, but we want the blocks to overlap, the
operation is called a **rolling window**. Bordado implements this operation in
function :func:`bordado.rolling_window`.
This is how we use it in the dataset we generated above with windows of size 4
and with a 50% overlap between adjacent windows:

.. jupyter-execute::

    window_coordinates, indices = bd.rolling_window(
        coordinates, window_size=4, overlap=0.5,
    )

The ``window_coordinates`` is a tuple of coordinate arrays corresponding to the
center of each window. It will have the same number of arrays as ``coordinates``
and works in any number of dimensions.

.. jupyter-execute::

    for c in window_coordinates:
        print(f"window coordinate {i}:\n{c}\n")

.. hint::

    Notice that the window size is adjusted here as well to fit the region
    of the points. To prevent this from happening, either pass a ``region`` or
    ``adjust="region"`` to :func:`~bordado.rolling_window`.

Like the ``block_coordinates`` above, the arrays have the number of dimensions
of the input coordinates (in this case, 2D).

The ``indices`` are a bit more complex than the ``labels`` from :func:`~bordado.block_split`. They are an array with the same shape as the ``window_coordinates``:

.. jupyter-execute::

    print(window_coordinates[0].shape)
    print(indices.shape)

Each element of the ``indices`` array corresponds to a window and is an index for selecting points that fall inside that window. For example, this is the index for the first window:

.. jupyter-execute::

    print(indices[0, 0])

.. tip::

    Notice that it's a tuple with one index array. This is because our
    ``coordinates`` are 1D arrays. If they were 2D arrays (for example, if they
    were a regular grid), then ``indices[0, 0]`` would be a tuple with 2 index
    arrays.

We can select the points that fall inside the first window like so:

.. jupyter-execute::

    window = [c[indices[0, 0]] for c in coordinates]

    print("Coordinates of window 0, 0:")
    for i, c in enumerate(window):
        print(f"coordinate {i}:\n{c}\n")

And then we can plot the window centers and the points of the first window:

.. jupyter-execute::

    fig, ax = plt.subplots(1, 1, figsize=(8, 7), layout="constrained")
    ax.plot(*window, ".")
    ax.plot(*window_coordinates, "k*", markersize=10)
    ax.set_aspect("equal")
    ax.set_xlim(*region[:2])
    ax.set_ylim(*region[2:])
    ax.set_xlabel("easting")
    ax.set_ylabel("northing")
    plt.show()

We can do this for other windows as well, like the one to the right of 0, 0:

.. jupyter-execute::

    window = [c[indices[0, 1]] for c in coordinates]

    fig, ax = plt.subplots(1, 1, figsize=(8, 7), layout="constrained")
    ax.plot(*window, ".")
    ax.plot(*window_coordinates, "k*", markersize=10)
    ax.set_aspect("equal")
    ax.set_xlim(*region[:2])
    ax.set_ylim(*region[2:])
    ax.set_xlabel("Easting")
    ax.set_ylabel("Northing")
    plt.show()

Notice that there is overlap and 50% (the chosen ``overlap``) of the points
belong to both windows.


Rolling windows on the sphere
-----------------------------

The rolling window strategy won't work well for geographic data since windows
will get smaller and smaller towards the poles because of the convergence of
meridians.
To demonstrate this, let's first make some random points that are uniformly distributed on the sphere with :func:`bordado.random_coordinates_spherical`:

.. jupyter-execute::

    region = (0, 360, -90, 90)  # W, E, S, N in degrees
    coordinates = bd.random_coordinates_spherical(region, size=3000)

    fig = pygmt.Figure()
    fig.coast(land="#cccccc", region="g", projection="W0/20c", frame="afg")
    fig.plot(x=coordinates[0], y=coordinates[1], style="c0.1c", fill="seagreen")
    fig.show()

Now, we'll split them into 30° windows with 50% overlap using the standard
:func:`~bordado.rolling_window` and plot the window centers:

.. jupyter-execute::

    window_coordinates, indices = bd.rolling_window(
        coordinates, window_size=30, overlap=0.5,
    )

    fig = pygmt.Figure()
    fig.coast(land="#cccccc", region="g", projection="W0/20c", frame="afg")
    fig.plot(
        x=window_coordinates[0].ravel(),
        y=window_coordinates[1].ravel(),
        style="a0.3c",
        fill="orange",
    )
    fig.show()

The window centers get closer as windows get smaller towards the poles. This can
be a problem since windows towards the equator will likely contain more points,
skewing any statistics or other math done on the data inside windows.
There is also a gap at the 360° - 0° boundary since the function doesn't know
that it's a periodic boundary.

We can show this by calculating the number of points inside each window and
plotting that as a function of the window center latitude:

.. jupyter-execute::

    points_per_window = [coordinates[0][i].size for i in indices.ravel()]
    window_latitude = window_coordinates[1].ravel()

    fig, ax = plt.subplots(1, 1, figsize=(8, 5), layout="constrained")
    ax.plot(window_latitude, points_per_window, ".")
    ax.set_xlabel("Window latitude")
    ax.set_ylabel("Points per window")
    ax.grid()
    plt.show()

It's clear from the plot above that there are fewer points inside windows at the
poles, even though the distribution of points is uniform.

To get around this, use :func:`bordado.rolling_window_spherical`:

.. jupyter-execute::

    window_coordinates, indices = bd.rolling_window_spherical(
        coordinates, window_size=30, overlap=0.5,
    )

    fig = pygmt.Figure()
    fig.coast(land="#cccccc", region="g", projection="W0/20c", frame="afg")
    fig.plot(
        x=window_coordinates[0].ravel(),
        y=window_coordinates[1].ravel(),
        style="a0.3c",
        fill="orange",
    )
    ax.set_title("Standard rolling windows")
    fig.show()

.. tip::

    The windows now don't have a fixed 30° x 30° size to compensate for the
    convergence of the meridians. This means that the ``window_size`` argument of
    :func:`~bordado.rolling_window_spherical` refers to the latitudinal size of
    the windows. The longitudinal size will vary depending on the latitude band
    of the window.

In this case, the window centers don't get closer towards the poles. But there
is still the 360° - 0° gap. This happens because the region isn't exactly
360° in longitude and so the code doesn't know that it should wrap around the
globe. To get around this, we can specify the ``region`` argument with the full
range:

.. jupyter-execute::

    window_coordinates, indices = bd.rolling_window_spherical(
        coordinates, window_size=30, overlap=0.5, region=(0, 360, -90, 90),
    )

    fig = pygmt.Figure()
    fig.coast(land="#cccccc", region="g", projection="W0/20c", frame="afg")
    fig.plot(
        x=window_coordinates[0].ravel(),
        y=window_coordinates[1].ravel(),
        style="a0.3c",
        fill="orange",
    )
    fig.show()

This time, since the window centers aren't on a regular grid anymore, both the
``window_coordinates`` and ``indices`` are 1D arrays:

.. jupyter-execute::

    print(f"Coordinate shapes:", end=" ")
    for c in window_coordinates:
        print(c.shape, end=" ")
    print("\nIndices shape:", indices.shape)

Let's plot the points from some of the windows:

.. jupyter-execute::

    window = [c[indices[0]] for c in coordinates]

    fig = pygmt.Figure()
    fig.coast(land="#cccccc", region="g", projection="W0/20c", frame="afg")
    fig.plot(
        x=window[0].ravel(), y=window[1].ravel(), style="c0.1c", fill="seagreen",
    )
    fig.plot(
        x=window_coordinates[0].ravel(),
        y=window_coordinates[1].ravel(),
        style="a0.3c",
        fill="orange",
    )
    fig.show()

.. jupyter-execute::

    window = [c[indices[5]] for c in coordinates]

    fig = pygmt.Figure()
    fig.coast(land="#cccccc", region="g", projection="W0/20c", frame="afg")
    fig.plot(
        x=window[0].ravel(), y=window[1].ravel(), style="c0.1c", fill="seagreen",
    )
    fig.plot(
        x=window_coordinates[0].ravel(),
        y=window_coordinates[1].ravel(),
        style="a0.3c",
        fill="orange",
    )
    fig.show()

.. note::

    Notice that there is an overlap between the two windows plotted above that
    crosses the 360° - 0° line.

Making the same points per window calculation as before, we can check if it
actually works:

.. jupyter-execute::

    points_per_window_spherical = [coordinates[0][i].size for i in indices.ravel()]
    window_latitude_spherical = window_coordinates[1].ravel()

    fig, ax = plt.subplots(1, 1, figsize=(8, 5), layout="constrained")
    ax.plot(window_latitude, points_per_window, ".", label="Regular")
    ax.plot(window_latitude_spherical, points_per_window_spherical, "*", label="Spherical")
    ax.legend()
    ax.set_xlabel("Window latitude")
    ax.set_ylabel("Points per window")
    ax.grid()
    plt.show()

Now the distribution is much more uniform!


What's next
-----------

With this, we've finished our tutorial! To learn more about Bordado, take a look
at the detailed documentation for each function in ":ref:`api`".

Please remember to :ref:`cite Bordado <citing>` if you use it in
a publication, thesis, report, etc. It helps us justify the effort that goes
into making this package.

.. admonition:: Have questions?

    Please ask on any of the `Fatiando a Terra community channels
    <https://www.fatiando.org/contact>`__!