verde.cross_val_score(estimator, coordinates, data, weights=None, cv=None, client=None)

Score an estimator/gridder using cross-validation.

Similar to sklearn.model_selection.cross_val_score but modified to accept spatial multi-component data with weights.

By default, will use sklearn.model_selection.KFold to split the dataset. Another cross-validation class can be passed in through the cv argument.
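As a sketch of what the default splitter does (assuming scikit-learn is installed), sklearn.model_selection.KFold partitions the data indices into k consecutive train/test folds; cross_val_score fits on each training fold and scores on the held-out fold:

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten sample points; KFold splits their indices, not the values.
coordinates = np.arange(10)

# KFold is the default cross-validator; n_splits=5 is scikit-learn's default.
kfold = KFold(n_splits=5)
for train_index, test_index in kfold.split(coordinates):
    # Each of the 5 folds holds out 2 points for testing and trains on the other 8.
    print(train_index.size, test_index.size)
```

Any object with the same `split` interface (e.g. sklearn.model_selection.ShuffleSplit) can be passed through the cv argument instead.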

Can optionally run in parallel using dask. To do this, pass in a dask.distributed.Client as the client argument. Tasks in this function will be submitted to the dask cluster, which can be local. In this case, the resulting scores won’t be the actual values but dask.distributed.Future objects. Call their .result() methods to get back the values or pass them along to other dask computations.

Parameters
estimator : verde gridder

Any verde gridder class that has the fit and score methods.

coordinates : tuple of arrays

Arrays with the coordinates of each data point. Should be in the following order: (easting, northing, vertical, …).

data : array or tuple of arrays

The data values of each data point. If the data has more than one component, data must be a tuple of arrays (one for each component).

weights : None or array or tuple of arrays

If not None, the weights assigned to each data point. If more than one data component is provided, you must provide a weights array for each data component as well (or None for all of them).

cv : None or cross-validation generator

Any scikit-learn cross-validation generator. Defaults to sklearn.model_selection.KFold.

client : None or dask.distributed.Client

If None, then computations are run serially. Otherwise, should be a dask Client object. It will be used to dispatch computations to the dask cluster.

Returns
scores : list

List of scores for each split of the cross-validation generator. If client is not None, then the scores will be futures.
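For multi-component data, the data and weights conventions above amount to parallel tuples of arrays, one per component, each matching the coordinate arrays in shape. A minimal sketch of the expected layout (the array names and values here are illustrative, not part of the API):

```python
import numpy as np

# Coordinate arrays on a small grid, in (easting, northing) order.
easting, northing = np.meshgrid(np.linspace(0, 10, 11), np.linspace(-10, -5, 6))
coordinates = (easting, northing)

# Two data components (e.g. east and north velocities) as a tuple of arrays.
data = (np.ones_like(easting), 2 * np.ones_like(easting))

# One weights array per data component, with the same shape as each data array.
weights = (np.full(easting.shape, 0.5), np.full(easting.shape, 1.0))

# Every data component and weight array matches the coordinate shape.
assert all(arr.shape == easting.shape for arr in data + weights)
```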


Examples
>>> from verde import grid_coordinates, Trend, cross_val_score
>>> coords = grid_coordinates((0, 10, -10, -5), spacing=0.1)
>>> data = 10 - coords[0] + 0.5*coords[1]
>>> # A linear trend should perfectly predict this data
>>> scores = cross_val_score(Trend(degree=1), coords, data)
>>> print(', '.join(['{:.2f}'.format(score) for score in scores]))
1.00, 1.00, 1.00, 1.00, 1.00
>>> # To run in parallel, we need to create a dask.distributed Client. It will
>>> # create a local cluster if no arguments are given so we can run the
>>> # scoring on a single machine.
>>> from dask.distributed import Client
>>> client = Client()
>>> # The scoring will now only submit tasks to our local cluster
>>> scores = cross_val_score(Trend(degree=1), coords, data, client=client)
>>> # The scores are not the actual values but Futures
>>> type(scores[0])
<class 'distributed.client.Future'>
>>> # We need to call .result() to get back the actual value
>>> print('{:.2f}'.format(scores[0].result()))
1.00