{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nChaining Operations\n===================\n\nOften, a data processing pipeline looks like the following:\n\n#. Apply a blocked mean or median to the data\n#. Remove a trend from the blocked data\n#. Fit a spline to the residuals of the trend\n#. Grid using the spline and restore the trend\n\nThe :class:`verde.Chain` class allows us to create gridders that perform multiple\noperations on data. Each step in the chain filters the input and passes the result along\nto the next step. For gridders and trend estimators, filtering means fitting the model\nand passing along the residuals (input data minus predicted data). When predicting data,\nthe predictions of each step are added together.\n\nOther operations, like :class:`verde.BlockReduce` and :class:`verde.BlockMean`, change\nthe input data values and the coordinates but don't impact the predictions because they\ndon't implement the :meth:`~verde.base.BaseGridder.predict` method.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>The :class:`~verde.Chain` class was inspired by the\n    :class:`sklearn.pipeline.Pipeline` class, which doesn't serve our purposes because\n    it only affects the feature matrix, not what we would call *data* (the target\n    vector).</p></div>\n\nFor example, let's create a pipeline to grid our sample bathymetry data.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport matplotlib.pyplot as plt\nimport cartopy.crs as ccrs\nimport pyproj\nimport verde as vd\n\ndata = vd.datasets.fetch_baja_bathymetry()\nregion = vd.get_region((data.longitude, data.latitude))\n# The desired grid spacing in degrees (10 arc-minutes). Where a spacing in meters\n# is needed, we convert it using 1 degree ~ 111 km.\nspacing = 10 / 60\n# Use Mercator projection because Spline is a Cartesian gridder\nprojection = pyproj.Proj(proj=\"merc\", lat_ts=data.latitude.mean())\nproj_coords = projection(data.longitude.values, data.latitude.values)\n\nplt.figure(figsize=(7, 6))\nax = plt.axes(projection=ccrs.Mercator())\nax.set_title(\"Bathymetry from Baja California\")\nplt.scatter(\n    data.longitude,\n    data.latitude,\n    c=data.bathymetry_m,\n    s=0.1,\n    transform=ccrs.PlateCarree(),\n)\nplt.colorbar().set_label(\"meters\")\nvd.datasets.setup_baja_bathymetry_map(ax)\nplt.tight_layout()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We'll create a chain that applies a blocked median to the data, fits a first-degree\npolynomial trend (a plane), and then fits a :class:`verde.Spline` to the trend residuals.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
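        "chain = vd.Chain(\n    [\n        # Block median; the spacing is converted from degrees to meters (1 degree ~ 111 km)\n        (\"reduce\", vd.BlockReduce(np.median, spacing * 111e3)),\n        # First-degree polynomial trend fitted to the blocked data\n        (\"trend\", vd.Trend(degree=1)),\n        # Spline fitted to the residuals of the trend\n        (\"spline\", vd.Spline()),\n    ]\n)\nprint(chain)"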
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Calling :meth:`verde.Chain.fit` will automatically run the data through the chain:\n\n#. Apply the blocked median to the input data\n#. Fit a trend to the blocked data and output the residuals\n#. Fit the spline to the trend residuals\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "chain.fit(proj_coords, data.bathymetry_m)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now that the data has been through the chain, calling :meth:`verde.Chain.predict` will\nsum the results of every step in the chain that has a ``predict`` method. In our case,\nthat will be only the :class:`~verde.Trend` and the :class:`~verde.Spline`.\n\nWe can verify the quality of the fit by inspecting a histogram of the residuals with\nrespect to the original data. Remember that our spline and trend were fit on the\nblock-median (decimated) data, not the original data, so the fit won't be perfect.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Residuals are computed against the original (not block-reduced) data\nresiduals = data.bathymetry_m - chain.predict(proj_coords)\n\nplt.figure()\nplt.title(\"Histogram of fit residuals\")\nplt.hist(residuals, bins=\"auto\", density=True)\nplt.xlabel(\"residuals (m)\")\nplt.xlim(-1500, 1500)\nplt.tight_layout()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Likewise, :meth:`verde.Chain.grid` creates a grid of the combined trend and spline\npredictions. This is equivalent to the *remove-compute-restore* procedure (remove the\ntrend, grid the residuals, then add the trend back) that should be familiar to the\ngeodesists among us.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "grid = chain.grid(\n    region=region,\n    spacing=spacing,\n    projection=projection,\n    dims=[\"latitude\", \"longitude\"],\n    data_names=[\"bathymetry\"],\n)\nprint(grid)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we can plot the resulting grid:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(7, 6))\nax = plt.axes(projection=ccrs.Mercator())\nax.set_title(\"Gridded result of the chain\")\npc = grid.bathymetry.plot.pcolormesh(\n    ax=ax, transform=ccrs.PlateCarree(), vmax=0, zorder=-1, add_colorbar=False\n)\nplt.colorbar(pc).set_label(\"meters\")\nvd.datasets.setup_baja_bathymetry_map(ax)\nplt.tight_layout()\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each component of the chain can be accessed separately using the ``named_steps``\nattribute. It's a dictionary with keys and values matching the inputs given to the\n:class:`~verde.Chain`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(chain.named_steps[\"trend\"])\nprint(chain.named_steps[\"spline\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "All gridders and estimators in the chain have been fitted and can be used to generate\ngrids and predictions. For example, we can get a grid of the estimated trend:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "grid_trend = chain.named_steps[\"trend\"].grid(\n    region=region,\n    spacing=spacing,\n    projection=projection,\n    dims=[\"latitude\", \"longitude\"],\n    data_names=[\"bathymetry\"],\n)\nprint(grid_trend)\n\nplt.figure(figsize=(7, 6))\nax = plt.axes(projection=ccrs.Mercator())\nax.set_title(\"Gridded trend\")\npc = grid_trend.bathymetry.plot.pcolormesh(\n    ax=ax, transform=ccrs.PlateCarree(), zorder=-1, add_colorbar=False\n)\nplt.colorbar(pc).set_label(\"meters\")\nvd.datasets.setup_baja_bathymetry_map(ax)\nplt.tight_layout()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.7"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}