mlstream.models package

Submodules

mlstream.models.base_models module

class mlstream.models.base_models.LumpedModel

Bases: object

Model that operates on lumped (daily, basin-averaged) inputs.

load(model_file: pathlib.Path) → None

Loads a trained and pickled model.

Parameters:model_file (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
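
A minimal sketch of how a custom model might implement this interface (dataset access details are assumptions; the toy model below simply predicts a constant):

    import pickle
    from pathlib import Path

    import numpy as np

    from mlstream.models.base_models import LumpedModel


    class ConstantModel(LumpedModel):
        """Toy subclass that illustrates the LumpedModel interface only."""

        def train(self, ds) -> None:
            # A real model would fit its parameters on the LumpedH5 training data.
            self.value = 0.0

        def predict(self, ds) -> np.ndarray:
            # Assumes the LumpedBasin dataset supports len(); a real model
            # would iterate over the basin's samples instead.
            return np.full(len(ds), self.value)

        def load(self, model_file: Path) -> None:
            with model_file.open("rb") as f:
                self.value = pickle.load(f)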

mlstream.models.lstm module

Large parts of this implementation are adapted from https://github.com/kratzert/ealstm_regional_modeling.

class mlstream.models.lstm.EALSTM(input_size_dyn: int, input_size_stat: int, hidden_size: int, batch_first: bool = True, initial_forget_bias: int = 0)

Bases: torch.nn.Module

Implementation of the Entity-Aware-LSTM (EA-LSTM).

Model details: https://arxiv.org/abs/1907.08456

Parameters:
  • input_size_dyn (int) – Number of dynamic features, i.e., those passed to the LSTM at each time step.
  • input_size_stat (int) – Number of static features, i.e., those used to modulate the input gate.
  • hidden_size (int) – Number of hidden/memory cells.
  • batch_first (bool, optional) – If True, expects batch inputs of shape [batch, seq, features]; otherwise, the shape has to be [seq, batch, features]. By default True.
  • initial_forget_bias (int, optional) – Value of the initial forget gate bias, by default 0.
forward(x_d: torch.Tensor, x_s: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Performs a forward pass on the model.

Parameters:
  • x_d (torch.Tensor) – Tensor containing a batch of sequences of the dynamic features. The shape has to match the format specified by batch_first.
  • x_s (torch.Tensor) – Tensor containing a batch of static features.
Returns:

  • h_n (torch.Tensor) – The hidden states of each time step of each sample in the batch.
  • c_n (torch.Tensor) – The cell states of each time step of each sample in the batch.

reset_parameters()

Initializes all learnable parameters of the EA-LSTM.
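
A short usage sketch (sizes are arbitrary; assumes PyTorch is available):

    import torch

    from mlstream.models.lstm import EALSTM

    model = EALSTM(input_size_dyn=5, input_size_stat=10, hidden_size=64)
    x_d = torch.rand(8, 30, 5)  # [batch, seq, features], since batch_first=True
    x_s = torch.rand(8, 10)     # one static feature vector per sample
    h_n, c_n = model(x_d, x_s)  # hidden and cell states for every time step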

class mlstream.models.lstm.LSTM(input_size: int, hidden_size: int, batch_first: bool = True, initial_forget_bias: int = 0)

Bases: torch.nn.Module

Implementation of the standard LSTM.

Parameters:
  • input_size (int) – Number of input features.
  • hidden_size (int) – Number of hidden/memory cells.
  • batch_first (bool, optional) – If True, expects batch inputs of shape [batch, seq, features]; otherwise, the shape has to be [seq, batch, features]. By default True.
  • initial_forget_bias (int, optional) – Value of the initial forget gate bias, by default 0.
forward(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]

Performs a forward pass on the model.

Parameters:x (torch.Tensor) – Tensor containing a batch of input sequences. The shape must match the format defined by the batch_first argument.
Returns:
  • h_n (torch.Tensor) – The hidden states of each time step of each sample in the batch.
  • c_n (torch.Tensor) – The cell states of each time step of each sample in the batch.
reset_parameters()

Initializes all learnable parameters of the LSTM.
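
Usage mirrors the EA-LSTM above, without static inputs (a sketch with arbitrary sizes):

    import torch

    from mlstream.models.lstm import LSTM

    lstm = LSTM(input_size=5, hidden_size=64)
    x = torch.rand(8, 30, 5)  # [batch, seq, features], since batch_first=True
    h_n, c_n = lstm(x)        # hidden and cell states for every time step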

class mlstream.models.lstm.LumpedLSTM(num_dynamic_vars: int, num_static_vars: int, use_mse: bool = True, no_static: bool = False, concat_static: bool = False, run_dir: pathlib.Path = None, n_jobs: int = 1, hidden_size: int = 256, learning_rate: float = 0.001, learning_rates: Dict = {}, epochs: int = 30, initial_forget_bias: int = 5, dropout: float = 0.0, batch_size: int = 256, clip_norm: bool = True, clip_value: float = 1.0)

Bases: mlstream.models.base_models.LumpedModel

(EA-)LSTM model for lumped data.

load(model_file: pathlib.Path) → None

Loads a trained and pickled model.

Parameters:model_file (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
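
A hypothetical training/prediction workflow (argument values are placeholders; dataset construction is omitted because it depends on the data on disk):

    from pathlib import Path

    from mlstream.models.lstm import LumpedLSTM

    model = LumpedLSTM(
        num_dynamic_vars=5,
        num_static_vars=10,
        run_dir=Path("runs/lstm_example"),
        hidden_size=64,
        epochs=5,
    )
    # model.train(train_ds)               # train_ds: mlstream.datasets.LumpedH5
    # predictions = model.predict(basin)  # basin: mlstream.datasets.LumpedBasin
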
class mlstream.models.lstm.Model(input_size_dyn: int, input_size_stat: int, hidden_size: int, initial_forget_bias: int = 5, dropout: float = 0.0, concat_static: bool = False, no_static: bool = False)

Bases: torch.nn.Module

Wrapper class that connects the LSTM/EA-LSTM with a fully connected layer.

forward(x_d: torch.Tensor, x_s: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Performs a forward pass on the model.

Parameters:
  • x_d (torch.Tensor) – Tensor containing the dynamic input features of shape [batch, seq_length, n_features].
  • x_s (torch.Tensor, optional) – Tensor containing the static catchment characteristics, by default None.

Returns:
  • out (torch.Tensor) – Tensor containing the network predictions
  • h_n (torch.Tensor) – Tensor containing the hidden states of each time step
  • c_n (torch.Tensor) – Tensor containing the cell states of each time step
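
A sketch of a forward pass through the wrapper (sizes are arbitrary):

    import torch

    from mlstream.models.lstm import Model

    model = Model(input_size_dyn=5, input_size_stat=10, hidden_size=64)
    x_d = torch.rand(8, 30, 5)       # [batch, seq_length, n_features]
    x_s = torch.rand(8, 10)          # static catchment characteristics
    out, h_n, c_n = model(x_d, x_s)  # predictions plus hidden/cell states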

mlstream.models.nseloss module

class mlstream.models.nseloss.NSELoss(eps: float = 0.1)

Bases: torch.nn.Module

Calculates the (batch-wise) NSE loss.

Each sample i is weighted by 1 / (std_i + eps)^2, where std_i is the standard deviation of the discharge of the basin to which the sample belongs.

Parameters:eps (float) – Constant added to the weight for numerical stability and smoothing, by default 0.1.
forward(y_pred: torch.Tensor, y_true: torch.Tensor, q_stds: torch.Tensor)

Calculates the batch-wise NSE loss function.

Parameters:
  • y_pred (torch.Tensor) – Tensor containing the network predictions.
  • y_true (torch.Tensor) – Tensor containing the true discharge values.
  • q_stds (torch.Tensor) – Tensor containing the discharge std (calculated over the training period) of each sample.
Returns:The batch-wise NSE loss.
Return type:torch.Tensor
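
The weighting described above amounts to a per-sample weighted mean squared error; a minimal standalone sketch (not the library implementation):

    import torch

    def nse_loss(y_pred: torch.Tensor, y_true: torch.Tensor,
                 q_stds: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
        # Basins with low discharge variability receive larger weights.
        weights = 1.0 / (q_stds + eps) ** 2
        return torch.mean(weights * (y_pred - y_true) ** 2)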

class mlstream.models.nseloss.XGBNSEObjective(dummy_target, actual_target, q_stds, eps: float = 0.1)

Bases: object

Custom NSE XGBoost objective.

This is a bit of a hack: We use a unique dummy target value for each sample, allowing us to look up the q_std that corresponds to the sample’s station. When calculating the loss, we replace the dummy with the actual target so the model learns the right thing.
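
A sketch of the dummy-target idea (names and values are illustrative assumptions, not the actual implementation): each sample's unique dummy value acts as a key into per-sample metadata, so a custom objective can recover the true target and basin std:

    import numpy as np

    actual_target = np.array([1.2, 0.4, 3.1])  # true discharge values
    q_stds = np.array([0.5, 0.9, 0.3])         # per-sample basin stds
    dummy_target = np.arange(len(actual_target), dtype=float)  # unique keys

    def nse_grad_hess(y_pred, dummy, eps=0.1):
        # Recover the real target and q_std through the dummy key, then
        # return the gradient and Hessian of the weighted squared error,
        # as an XGBoost custom objective expects.
        idx = dummy.astype(int)
        weights = 1.0 / (q_stds[idx] + eps) ** 2
        grad = 2.0 * weights * (y_pred - actual_target[idx])
        hess = 2.0 * weights
        return grad, hess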

neg_nse_metric_sklearn(estimator, X, y_true)

Negative NSE metric for sklearn.

nse(y_pred, y_true, q_stds)
nse_metric_xgb(y_pred, y_true)

NSE metric for XGBoost.

nse_objective_xgb(y_pred, dtrain)

NSE objective for XGBoost (non-sklearn API).

nse_objective_xgb_sklearn_api(y_true, y_pred)

NSE objective for XGBoost (sklearn API).

mlstream.models.sklearn_models module

class mlstream.models.sklearn_models.LumpedSklearnRegression(model: sklearn.base.BaseEstimator, no_static: bool = False, concat_static: bool = True, run_dir: pathlib.Path = None, n_jobs: int = 1)

Bases: mlstream.models.base_models.LumpedModel

Wrapper for scikit-learn regression models on lumped data.

load(model_file: pathlib.Path) → None

Loads a trained and pickled model.

Parameters:model_file (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
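
A usage sketch (paths are placeholders; any scikit-learn regressor should work as the wrapped estimator):

    from pathlib import Path

    from sklearn.linear_model import LinearRegression

    from mlstream.models.sklearn_models import LumpedSklearnRegression

    model = LumpedSklearnRegression(
        model=LinearRegression(),
        run_dir=Path("runs/sklearn_example"),
    )
    # model.train(train_ds)               # train_ds: mlstream.datasets.LumpedH5
    # predictions = model.predict(basin)  # basin: mlstream.datasets.LumpedBasin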

mlstream.models.xgboost module

class mlstream.models.xgboost.LumpedXGBoost(no_static: bool = False, concat_static: bool = True, use_mse: bool = False, run_dir: pathlib.Path = None, n_jobs: int = 1, seed: int = 0, n_estimators: int = 100, learning_rate: float = 0.01, early_stopping_rounds: int = None, n_cv: int = 5, param_dist: Dict = None, param_search_n_estimators: int = None, param_search_n_iter: int = None, param_search_early_stopping_rounds: int = None, reg_search_param_dist: Dict = None, reg_search_n_iter: int = None, model_path: pathlib.Path = None)

Bases: mlstream.models.base_models.LumpedModel

Wrapper for XGBoost model on lumped data.

load(model_path: pathlib.Path)

Loads a trained and pickled model.

Parameters:model_path (Path) – Path to the stored model.
predict(ds: mlstream.datasets.LumpedBasin) → numpy.ndarray

Generates predictions for a basin.

Parameters:ds (LumpedBasin) – Dataset of the basin to predict.
Returns:Array of predictions.
Return type:np.ndarray
train(ds: mlstream.datasets.LumpedH5) → None

Trains the model.

Parameters:ds (LumpedH5) – Training dataset.
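
A usage sketch (hyperparameter values are placeholders, not recommendations):

    from pathlib import Path

    from mlstream.models.xgboost import LumpedXGBoost

    model = LumpedXGBoost(
        run_dir=Path("runs/xgb_example"),
        n_estimators=100,
        learning_rate=0.01,
        early_stopping_rounds=10,
        seed=0,
    )
    # model.train(train_ds)               # train_ds: mlstream.datasets.LumpedH5
    # predictions = model.predict(basin)  # basin: mlstream.datasets.LumpedBasin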

Module contents