mlstream.models package

Submodules

mlstream.models.base_models module

class mlstream.models.base_models.LumpedModel

Bases: object

Model that operates on lumped (daily, basin-averaged) inputs.

load(model_file: pathlib.Path) → None

Loads a trained and pickled model.

Parameters:model_file (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
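
A minimal sketch of how a custom model might implement this interface (dataset access details are assumptions; the toy model below simply predicts a constant):

    import pickle
    from pathlib import Path

    import numpy as np

    from mlstream.models.base_models import LumpedModel


    class ConstantModel(LumpedModel):
        """Toy subclass that illustrates the LumpedModel interface only."""

        def train(self, ds) -> None:
            # A real model would fit its parameters on the LumpedH5 training data.
            self.value = 0.0

        def predict(self, ds) -> np.ndarray:
            # Assumes the LumpedBasin dataset supports len(); a real model
            # would iterate over the basin's samples instead.
            return np.full(len(ds), self.value)

        def load(self, model_file: Path) -> None:
            with model_file.open("rb") as f:
                self.value = pickle.load(f)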

mlstream.models.lstm module

Large parts of this implementation are adapted from https://github.com/kratzert/ealstm_regional_modeling.

class mlstream.models.lstm.EALSTM(input_size_dyn: int, input_size_stat: int, hidden_size: int, batch_first: bool = True, initial_forget_bias: int = 0)

Bases: torch.nn.Module

Implementation of the Entity-Aware-LSTM (EA-LSTM).

Model details: https://arxiv.org/abs/1907.08456

Parameters:
  • input_size_dyn (int) – Number of dynamic features, i.e., those passed to the LSTM at each time step.
  • input_size_stat (int) – Number of static features, i.e., those used to modulate the input gate.
  • hidden_size (int) – Number of hidden/memory cells.
  • batch_first (bool, optional) – If True, expects batch inputs of shape [batch, seq, features]; otherwise, the shape has to be [seq, batch, features]. By default True.
  • initial_forget_bias (int, optional) – Value of the initial forget gate bias, by default 0.
forward(x_d: torch.Tensor, x_s: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Performs a forward pass on the model.

Parameters:
  • x_d (torch.Tensor) – Tensor containing a batch of sequences of the dynamic features. The shape has to match the format specified by batch_first.
  • x_s (torch.Tensor) – Tensor containing a batch of static features.
Returns:

  • h_n (torch.Tensor) – The hidden states of each time step of each sample in the batch.
  • c_n (torch.Tensor) – The cell states of each time step of each sample in the batch.

reset_parameters()

Initializes all learnable parameters of the EA-LSTM.
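
A short usage sketch (sizes are arbitrary; assumes PyTorch is available):

    import torch

    from mlstream.models.lstm import EALSTM

    model = EALSTM(input_size_dyn=5, input_size_stat=10, hidden_size=64)
    x_d = torch.rand(8, 30, 5)  # [batch, seq, features], since batch_first=True
    x_s = torch.rand(8, 10)     # one static feature vector per sample
    h_n, c_n = model(x_d, x_s)  # hidden and cell states for every time step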

class mlstream.models.lstm.LSTM(input_size: int, hidden_size: int, batch_first: bool = True, initial_forget_bias: int = 0)

Bases: torch.nn.Module

Implementation of the standard LSTM.

Parameters:
  • input_size (int) – Number of input features.
  • hidden_size (int) – Number of hidden/memory cells.
  • batch_first (bool, optional) – If True, expects batch inputs of shape [batch, seq, features]; otherwise, the shape has to be [seq, batch, features]. By default True.
  • initial_forget_bias (int, optional) – Value of the initial forget gate bias, by default 0.
forward(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Performs a forward pass on the model.

Parameters:x (torch.Tensor) – Tensor containing a batch of input sequences. The shape must match the format defined by the batch_first argument.
Returns:
  • h_n (torch.Tensor) – The hidden states of each time step of each sample in the batch.
  • c_n (torch.Tensor) – The cell states of each time step of each sample in the batch.
reset_parameters()

Initializes all learnable parameters of the LSTM.
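
Usage mirrors the EA-LSTM above, without static inputs (a sketch with arbitrary sizes):

    import torch

    from mlstream.models.lstm import LSTM

    lstm = LSTM(input_size=5, hidden_size=64)
    x = torch.rand(8, 30, 5)  # [batch, seq, features], since batch_first=True
    h_n, c_n = lstm(x)        # hidden and cell states for every time step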

class mlstream.models.lstm.LumpedLSTM(num_dynamic_vars: int, num_static_vars: int, use_mse: bool = True, no_static: bool = False, concat_static: bool = False, run_dir: pathlib.Path = None, n_jobs: int = 1, hidden_size: int = 256, learning_rate: float = 0.001, learning_rates: Dict = {}, epochs: int = 30, initial_forget_bias: int = 5, dropout: float = 0.0, batch_size: int = 256, clip_norm: bool = True, clip_value: float = 1.0)

Bases: mlstream.models.base_models.LumpedModel

(EA-)LSTM model for lumped data.

load(model_file: pathlib.Path) → None

Loads a trained and pickled model.

Parameters:model_file (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
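
A hypothetical training/prediction workflow (argument values are placeholders; dataset construction is omitted because it depends on the data on disk):

    from pathlib import Path

    from mlstream.models.lstm import LumpedLSTM

    model = LumpedLSTM(
        num_dynamic_vars=5,
        num_static_vars=10,
        run_dir=Path("runs/lstm_example"),
        hidden_size=64,
        epochs=5,
    )
    # model.train(train_ds)               # train_ds: mlstream.datasets.LumpedH5
    # predictions = model.predict(basin)  # basin: mlstream.datasets.LumpedBasin
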
class mlstream.models.lstm.Model(input_size_dyn: int, input_size_stat: int, hidden_size: int, initial_forget_bias: int = 5, dropout: float = 0.0, concat_static: bool = False, no_static: bool = False)

Bases: torch.nn.Module

Wrapper class that connects the LSTM/EA-LSTM with a fully connected layer.

forward(x_d: torch.Tensor, x_s: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Performs a forward pass on the model.

Parameters:
  • x_d (torch.Tensor) – Tensor containing the dynamic input features of shape [batch, seq_length, n_features].
  • x_s (torch.Tensor, optional) – Tensor containing the static catchment characteristics, by default None.

Returns:
  • out (torch.Tensor) – Tensor containing the network predictions
  • h_n (torch.Tensor) – Tensor containing the hidden states of each time step
  • c_n (torch.Tensor) – Tensor containing the cell states of each time step
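
A sketch of a forward pass through the wrapper (sizes are arbitrary):

    import torch

    from mlstream.models.lstm import Model

    model = Model(input_size_dyn=5, input_size_stat=10, hidden_size=64)
    x_d = torch.rand(8, 30, 5)       # [batch, seq_length, n_features]
    x_s = torch.rand(8, 10)          # static catchment characteristics
    out, h_n, c_n = model(x_d, x_s)  # predictions plus hidden/cell states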

mlstream.models.nseloss module

class mlstream.models.nseloss.NSELoss(eps: float = 0.1)

Bases: torch.nn.Module

Calculates the (batch-wise) NSE loss.

Each sample i is weighted by 1 / (std_i + eps)^2, where std_i is the standard deviation of the discharge of the basin to which the sample belongs.

Parameters:eps (float) – Constant added to the weight for numerical stability and smoothing, by default 0.1.
forward(y_pred: torch.Tensor, y_true: torch.Tensor, q_stds: torch.Tensor)

Calculates the batch-wise NSE loss function.

Parameters:
  • y_pred (torch.Tensor) – Tensor containing the network predictions.
  • y_true (torch.Tensor) – Tensor containing the true discharge values.
  • q_stds (torch.Tensor) – Tensor containing the discharge std (calculated over the training period) of each sample.
Returns:The batch-wise NSE loss.
Return type:torch.Tensor
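
The weighting described above amounts to a per-sample weighted mean squared error; a minimal standalone sketch (not the library implementation):

    import torch

    def nse_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                 q_stds: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
        # Basins with low discharge variability receive larger weights.
        weights = 1.0 / (q_stds + eps) ** 2
        return torch.mean(weights * (y_pred - y_true) ** 2)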

class mlstream.models.nseloss.XGBNSEObjective(dummy_target, actual_target, q_stds, eps: float = 0.1)

Bases: object

Custom NSE XGBoost objective.

This is a bit of a hack: We use a unique dummy target value for each sample, allowing us to look up the q_std that corresponds to the sample’s station. When calculating the loss, we replace the dummy with the actual target so the model learns the right thing.
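
A sketch of the dummy-target idea (names and values are illustrative assumptions, not the actual implementation): each sample's unique dummy value acts as a key into per-sample metadata, so a custom objective can recover the true target and basin std:

    import numpy as np

    actual_target = np.array([1.2, 0.4, 3.1])  # true discharge values
    q_stds = np.array([0.5, 0.9, 0.3])         # per-sample basin stds
    dummy_target = np.arange(len(actual_target), dtype=float)  # unique keys

    def nse_grad_hess(y_pred, dummy, eps=0.1):
        # Recover the real target and q_std through the dummy key, then
        # return the gradient and Hessian of the weighted squared error,
        # as an XGBoost custom objective expects.
        idx = dummy.astype(int)
        weights = 1.0 / (q_stds[idx] + eps) ** 2
        grad = 2.0 * weights * (y_pred - actual_target[idx])
        hess = 2.0 * weights
        return grad, hess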

neg_nse_metric_sklearn(estimator, X, y_true)

Negative NSE metric for sklearn.

nse(y_pred, y_true, q_stds)
nse_metric_xgb(y_pred, y_true)

NSE metric for XGBoost.

nse_objective_xgb(y_pred, dtrain)

NSE objective for XGBoost (non-sklearn API).

nse_objective_xgb_sklearn_api(y_true, y_pred)

NSE objective for XGBoost (sklearn API).

mlstream.models.sklearn_models module

class mlstream.models.sklearn_models.LumpedSklearnRegression(model: sklearn.base.BaseEstimator, no_static: bool = False, concat_static: bool = True, run_dir: pathlib.Path = None, n_jobs: int = 1)

Bases: mlstream.models.base_models.LumpedModel

Wrapper for scikit-learn regression models on lumped data.

load(model_file: pathlib.Path) → None

Loads a trained and pickled model.

Parameters:model_file (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
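
A usage sketch (paths are placeholders; any scikit-learn regressor should work as the wrapped estimator):

    from pathlib import Path

    from sklearn.linear_model import LinearRegression

    from mlstream.models.sklearn_models import LumpedSklearnRegression

    model = LumpedSklearnRegression(
        model=LinearRegression(),
        run_dir=Path("runs/sklearn_example"),
    )
    # model.train(train_ds)               # train_ds: mlstream.datasets.LumpedH5
    # predictions = model.predict(basin)  # basin: mlstream.datasets.LumpedBasin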

mlstream.models.xgboost module

class mlstream.models.xgboost.LumpedXGBoost(no_static: bool = False, concat_static: bool = True, use_mse: bool = False, run_dir: pathlib.Path = None, n_jobs: int = 1, seed: int = 0, n_estimators: int = 100, learning_rate: float = 0.01, early_stopping_rounds: int = None, n_cv: int = 5, param_dist: Dict = None, param_search_n_estimators: int = None, param_search_n_iter: int = None, param_search_early_stopping_rounds: int = None, reg_search_param_dist: Dict = None, reg_search_n_iter: int = None, model_path: pathlib.Path = None)

Bases: mlstream.models.base_models.LumpedModel

Wrapper for XGBoost model on lumped data.

load(model_path: pathlib.Path)

Loads a trained and pickled model.

Parameters:model_path (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
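
A usage sketch (hyperparameter values are placeholders, not recommendations):

    from pathlib import Path

    from mlstream.models.xgboost import LumpedXGBoost

    model = LumpedXGBoost(
        run_dir=Path("runs/xgb_example"),
        n_estimators=100,
        learning_rate=0.01,
        early_stopping_rounds=10,
        seed=0,
    )
    # model.train(train_ds)               # train_ds: mlstream.datasets.LumpedH5
    # predictions = model.predict(basin)  # basin: mlstream.datasets.LumpedBasin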

Module contents