{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Using Weights\n\nOne of the advantages of using a Green's functions approach to interpolation is\nthat we can easily weight the data to give each point more or less influence\nover the results. This is a good way to not let data points with large\nuncertainties bias the interpolation or the data decimation.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import cartopy.crs as ccrs\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pyproj\n\n# The weights vary a lot so it's better to plot them using a logarithmic color\n# scale\nfrom matplotlib.colors import LogNorm\n\nimport verde as vd"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We'll use some sample GPS vertical ground velocity which has some variable\nuncertainties associated with each data point. The data are loaded as a\npandas.DataFrame:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "data = vd.datasets.fetch_california_gps()\nprint(data.head())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's plot our data using Cartopy to see what the vertical velocities and\ntheir uncertainties look like. We'll make a function for this so we can reuse\nit later on.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def plot_data(coordinates, velocity, weights, title_data, title_weights):\n    \"Make two maps of our data, one with the data and one with the weights\"\n    fig, axes = plt.subplots(\n        1, 2, figsize=(9.5, 7), subplot_kw=dict(projection=ccrs.Mercator())\n    )\n    crs = ccrs.PlateCarree()\n    ax = axes[0]\n    ax.set_title(title_data)\n    maxabs = vd.maxabs(velocity)\n    pc = ax.scatter(\n        *coordinates,\n        c=velocity,\n        s=30,\n        cmap=\"seismic\",\n        vmin=-maxabs,\n        vmax=maxabs,\n        transform=crs,\n    )\n    plt.colorbar(pc, ax=ax, orientation=\"horizontal\", pad=0.05).set_label(\"m/yr\")\n    vd.datasets.setup_california_gps_map(ax)\n    ax = axes[1]\n    ax.set_title(title_weights)\n    pc = ax.scatter(\n        *coordinates, c=weights, s=30, cmap=\"magma\", transform=crs, norm=LogNorm()\n    )\n    plt.colorbar(pc, ax=ax, orientation=\"horizontal\", pad=0.05)\n    vd.datasets.setup_california_gps_map(ax)\n    plt.show()\n\n\n# Plot the data and the uncertainties\nplot_data(\n    (data.longitude, data.latitude),\n    data.velocity_up,\n    data.std_up,\n    \"Vertical GPS velocity\",\n    \"Uncertainty (m/yr)\",\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Weights in data decimation\n\n:class:`~verde.BlockReduce` can't output weights for each data point because\nit doesn't know which reduction operation it's using. If you want to do a\nweighted interpolation, like :class:`verde.Spline`,\n:class:`~verde.BlockReduce` won't propagate the weights to the interpolation\nfunction. If your data are relatively smooth, you can use\n:class:`verde.BlockMean` instead to decimated data and produce weights. It\ncan calculate different kinds of weights, depending on configuration options\nand what you give it as input.\n\nLet's explore all of the possibilities.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "mean = vd.BlockMean(spacing=15 / 60)\nprint(mean)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Option 1: No input weights\n\nIn this case, we'll get a standard mean and the output weights will be 1 over\nthe variance of the data in each block:\n\n\\begin{align}\\bar{d} = \\dfrac{\\sum\\limits_{i=1}^N d_i}{N}\n    \\: , \\qquad\n    \\sigma^2 = \\dfrac{\\sum\\limits_{i=1}^N (d_i - \\bar{d})^2}{N}\n    \\: , \\qquad\n    w = \\dfrac{1}{\\sigma^2}\\end{align}\n\nin which $N$ is the number of data points in the block, $d_i$ are\nthe data values in the block, and the output values for the block are the\nmean data $\\bar{d}$ and the weight $w$.\n\nNotice that data points that are more uncertain don't necessarily have\nsmaller weights. Instead, the blocks that contain data with sharper\nvariations end up having smaller weights, like the data points in the south.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "coordinates, velocity, weights = mean.filter(\n    coordinates=(data.longitude, data.latitude), data=data.velocity_up\n)\n\nplot_data(\n    coordinates,\n    velocity,\n    weights,\n    \"Mean vertical GPS velocity\",\n    \"Weights based on data variance\",\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Option 2: Input weights are not related to the uncertainty of the data\n\nThis is the case when data weights are chosen by the user, not based on the\nmeasurement uncertainty. For example, when you need to give less importance\nto a portion of the data and no uncertainties are available. The mean will be\nweighted and the output weights will be 1 over the weighted variance of the\ndata in each block:\n\n\\begin{align}\\bar{d}^* = \\dfrac{\\sum\\limits_{i=1}^N w_i d_i}{\\sum\\limits_{i=1}^N w_i}\n    \\: , \\qquad\n    \\sigma^2_w = \\dfrac{\\sum\\limits_{i=1}^N w_i(d_i - \\bar{d}*)^2}{\n        \\sum\\limits_{i=1}^N w_i}\n    \\: , \\qquad\n    w = \\dfrac{1}{\\sigma^2_w}\\end{align}\n\nin which $w_i$ are the input weights in the block.\n\nThe output will be similar to the one above but points with larger initial\nweights will have a smaller influence on the mean and also on the output\nweights.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# We'll use 1 over the squared data uncertainty as our input weights.\ndata[\"weights\"] = 1 / data.std_up**2\n\n# By default, BlockMean assumes that weights are not related to uncertainties\ncoordinates, velocity, weights = mean.filter(\n    coordinates=(data.longitude, data.latitude),\n    data=data.velocity_up,\n    weights=data.weights,\n)\n\nplot_data(\n    coordinates,\n    velocity,\n    weights,\n    \"Weighted mean vertical GPS velocity\",\n    \"Weights based on weighted data variance\",\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Option 3: Input weights are 1 over the data uncertainty squared\n\nIf input weights are 1 over the data uncertainty squared, we can use\nuncertainty propagation to calculate the uncertainty of the weighted mean and\nuse it to define our output weights. Use option ``uncertainty=True`` to tell\n:class:`~verde.BlockMean` to calculate weights based on the propagated\nuncertainty of the data. The output weights will be 1 over the propagated\nuncertainty squared. In this case, the **input weights must not be\nnormalized**. This is preferable if you know the uncertainty of the data.\n\n\\begin{align}w_i = \\dfrac{1}{\\sigma_i^2}\n    \\: , \\qquad\n    \\sigma_{\\bar{d}^*}^2 = \\dfrac{1}{\\sum\\limits_{i=1}^N w_i}\n    \\: , \\qquad\n    w = \\dfrac{1}{\\sigma_{\\bar{d}^*}^2}\\end{align}\n\nin which $\\sigma_i$ are the input data uncertainties in the block and\n$\\sigma_{\\bar{d}^*}$ is the propagated uncertainty of the weighted mean\nin the block.\n\nNotice that in this case the output weights reflect the input data\nuncertainties. Less weight is given to the data points that had larger\nuncertainties from the start.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Configure BlockMean to assume that the input weights are 1/uncertainty**2\nmean = vd.BlockMean(spacing=15 / 60, uncertainty=True)\n\ncoordinates, velocity, weights = mean.filter(\n    coordinates=(data.longitude, data.latitude),\n    data=data.velocity_up,\n    weights=data.weights,\n)\n\nplot_data(\n    coordinates,\n    velocity,\n    weights,\n    \"Weighted mean vertical GPS velocity\",\n    \"Weights based on data uncertainty\",\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<div class=\"alert alert-info\"><h4>Note</h4><p>Output weights are always normalized to the ]0, 1] range. See\n    :func:`verde.variance_to_weights`.</p></div>\n\n## Interpolation with weights\n\nThe Green's functions based interpolation classes in Verde, like\n:class:`~verde.Spline`, can take input weights if you want to give less\nimportance to some data points. In our case, the points with larger\nuncertainties shouldn't have the same influence in our gridded solution as\nthe points with lower uncertainties.\n\nLet's setup a projection to grid our geographic data using the Cartesian\nspline gridder.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "projection = pyproj.Proj(proj=\"merc\", lat_ts=data.latitude.mean())\nproj_coords = projection(data.longitude.values, data.latitude.values)\n\nregion = vd.get_region(coordinates)\nspacing = 5 / 60"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we can grid our data using a weighted spline. We'll use the block mean\nresults with uncertainty based weights.\n\nNote that the weighted spline solution will only work on a non-exact\ninterpolation. So we'll need to use some damping regularization or not use\nthe data locations for the point forces. Here, we'll apply a bit of damping.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "spline = vd.Chain(\n    [\n        # Convert the spacing to meters because Spline is a Cartesian gridder\n        (\"mean\", vd.BlockMean(spacing=spacing * 111e3, uncertainty=True)),\n        (\"spline\", vd.Spline(damping=1e-10)),\n    ]\n).fit(proj_coords, data.velocity_up, data.weights)\ngrid = spline.grid(\n    region=region,\n    spacing=spacing,\n    projection=projection,\n    dims=[\"latitude\", \"longitude\"],\n    data_names=\"velocity\",\n)\n# Avoid showing interpolation outside of the convex hull of the data points.\ngrid = vd.convexhull_mask(coordinates, grid=grid, projection=projection)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Calculate an unweighted spline as well for comparison.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "spline_unweighted = vd.Chain(\n    [\n        (\"mean\", vd.BlockReduce(np.mean, spacing=spacing * 111e3)),\n        (\"spline\", vd.Spline()),\n    ]\n).fit(proj_coords, data.velocity_up)\ngrid_unweighted = spline_unweighted.grid(\n    region=region,\n    spacing=spacing,\n    projection=projection,\n    dims=[\"latitude\", \"longitude\"],\n    data_names=\"velocity\",\n)\ngrid_unweighted = vd.convexhull_mask(\n    coordinates, grid=grid_unweighted, projection=projection\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, plot the weighted and unweighted grids side by side.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, axes = plt.subplots(\n    1, 2, figsize=(9.5, 7), subplot_kw=dict(projection=ccrs.Mercator())\n)\ncrs = ccrs.PlateCarree()\nax = axes[0]\nax.set_title(\"Spline interpolation with weights\")\nmaxabs = vd.maxabs(data.velocity_up)\npc = grid.velocity.plot.pcolormesh(\n    ax=ax,\n    cmap=\"seismic\",\n    vmin=-maxabs,\n    vmax=maxabs,\n    transform=crs,\n    add_colorbar=False,\n    add_labels=False,\n)\nplt.colorbar(pc, ax=ax, orientation=\"horizontal\", pad=0.05).set_label(\"m/yr\")\nax.plot(data.longitude, data.latitude, \".k\", markersize=0.1, transform=crs)\nax.coastlines()\nvd.datasets.setup_california_gps_map(ax)\nax = axes[1]\nax.set_title(\"Spline interpolation without weights\")\npc = grid_unweighted.velocity.plot.pcolormesh(\n    ax=ax,\n    cmap=\"seismic\",\n    vmin=-maxabs,\n    vmax=maxabs,\n    transform=crs,\n    add_colorbar=False,\n    add_labels=False,\n)\nplt.colorbar(pc, ax=ax, orientation=\"horizontal\", pad=0.05).set_label(\"m/yr\")\nax.plot(data.longitude, data.latitude, \".k\", markersize=0.1, transform=crs)\nax.coastlines()\nvd.datasets.setup_california_gps_map(ax)\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}