GRUMML

class keras_mml.layers.recurrent.GRUMML

Gated Recurrent Unit (GRU) layer, mostly without matrix multiplications.

The implementation of this layer mostly follows the \(\mathrm{MLGRU}\) implementation in Scalable MatMul-free Language Modeling (see section 3.3.1). Unlike \(\mathrm{MLGRU}\), this layer allows \(\mathbf{g}_t\) and \(\mathbf{o}_t\) to be computed with regular matrix multiplications rather than only with matmul-free ternary weights. The fully_mml attribute controls whether all weights are ternary.

Specifically, we perform the following recurrence steps.

\[\begin{aligned}
\mathbf{f}_t &= \sigma(\mathbf{x}_t\mathbf{W}_f + \mathbf{b}_f)\\
\mathbf{c}_t &= \tau(\mathbf{x}_t\mathbf{W}_c + \mathbf{b}_c)\\
\mathbf{h}_t &= \mathbf{f}_t\odot\mathbf{h}_{t-1} + (1-\mathbf{f}_t)\odot\mathbf{c}_t\\
\mathbf{g}_t &= \sigma(\mathbf{x}_t\mathbf{W}_g + \mathbf{b}_g)\\
\mathbf{o}_t' &= \mathbf{g}_t\odot\mathbf{h}_t\\
\mathbf{o}_t &= \mathbf{o}_t'\mathbf{W}_o + \mathbf{b}_o
\end{aligned}\]

where

\(\mathbf{W}_f\) and \(\mathbf{W}_c\) are ternary weights (and so do not use matrix multiplications during their operation);
\(\mathbf{W}_g\) and \(\mathbf{W}_o\) are either ternary weights (when fully_mml is set) or regular weight matrices;
\(\sigma\) is the recurrent_activation (e.g., sigmoid activation); and
\(\tau\) is the activation (e.g., SiLU activation).
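The recurrence above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the library's implementation: `gru_mml_step` and the `params` dictionary are hypothetical names, the ternary matrices are drawn at random, and the products are written with `@` for readability, whereas a matmul-free kernel would replace them with additions and subtractions.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, standing in for recurrent_activation."""
    return 1.0 / (1.0 + np.exp(-z))


def silu(z):
    """SiLU (z * sigmoid(z)), standing in for activation."""
    return z * sigmoid(z)


def gru_mml_step(x_t, h_prev, p):
    """One recurrence step; `p` is a hypothetical dict of weights and biases."""
    f_t = sigmoid(x_t @ p["W_f"] + p["b_f"])  # forget gate
    c_t = silu(x_t @ p["W_c"] + p["b_c"])     # candidate state
    h_t = f_t * h_prev + (1.0 - f_t) * c_t    # gated state update
    g_t = sigmoid(x_t @ p["W_g"] + p["b_g"])  # output gate
    o_t = (g_t * h_t) @ p["W_o"] + p["b_o"]   # output projection
    return o_t, h_t


rng = np.random.default_rng(0)
features, units, timesteps = 3, 4, 5

params = {
    # Ternary matrices with entries in {-1, 0, 1}, as for W_f and W_c.
    "W_f": rng.integers(-1, 2, size=(features, units)).astype(float),
    "W_c": rng.integers(-1, 2, size=(features, units)).astype(float),
    # Regular dense matrices, as when fully_mml is False.
    "W_g": rng.normal(size=(features, units)),
    "W_o": rng.normal(size=(units, units)),
    "b_f": np.zeros(units),
    "b_c": np.zeros(units),
    "b_g": np.zeros(units),
    "b_o": np.zeros(units),
}

x = rng.normal(size=(timesteps, features))  # one sequence, no batch axis
h = np.zeros(units)                         # zero-filled initial state
outputs = []
for x_t in x:
    o_t, h = gru_mml_step(x_t, h, params)
    outputs.append(o_t)
outputs = np.stack(outputs)                 # shape (timesteps, units)
```

Because \(\mathbf{f}_t\) lies in (0, 1) for a sigmoid gate, each entry of \(\mathbf{h}_t\) is a convex combination of the previous state and the candidate state.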

units: Dimensionality of the output space.

fully_mml: Whether to use matmul-free operations for all the layers.

num_heads: Number of heads to use when performing the recurrent step.

activation: Activation function to use.

recurrent_activation: Activation function to use for the recurrent step.

use_bias: Whether to use a bias vector for the layer.

weights_initializer: Initializer for the gates’ matrices. Used for the linear transformation of the inputs.

bias_initializer: Initializer for the bias vector.

weights_regularizer: Regularizer function applied to the gates’ matrices.

bias_regularizer: Regularizer function applied to the bias vector.

weights_constraint: Constraint function applied to the gates’ matrices.

bias_constraint: Constraint function applied to the bias vector.

__init__(units, fully_mml=False, num_heads=1, activation='silu', recurrent_activation='sigmoid', use_bias=True, weights_initializer='glorot_uniform', bias_initializer='zeros', weights_regularizer=None, bias_regularizer=None, weights_constraint=None, bias_constraint=None, **kwargs)

Initializes a new instance of the layer.

Parameters:

units (int) – Dimensionality of the output space.
fully_mml (bool, default: False) – Whether to use matmul-free operations for all the layers.
num_heads (int, default: 1) – Number of heads to use for the recurrent step. See HGRN2: Gated Linear RNNs with State Expansion, section 3.2, for details on the multi-headed variant.
activation (str, default: 'silu') – Activation function to use.
recurrent_activation (str, default: 'sigmoid') – Activation function to use for the recurrent step.
use_bias (bool, default: True) – Whether to use a bias vector for the layer.
weights_initializer (str, default: 'glorot_uniform') – Initializer for the gates’ matrices. Used for the linear transformation of the inputs.
bias_initializer (str, default: 'zeros') – Initializer for the bias vector.
weights_regularizer (Optional[str], default: None) – Regularizer function applied to the gates’ matrices.
bias_regularizer (Optional[str], default: None) – Regularizer function applied to the bias vector.
weights_constraint (Optional[str], default: None) – Constraint function applied to the gates’ matrices.
bias_constraint (Optional[str], default: None) – Constraint function applied to the bias vector.
**kwargs – Keyword arguments for keras.Layer.

Raises:

ValueError – If units is not a positive integer.
ValueError – If num_heads is not a positive integer.
ValueError – If num_heads does not divide units.
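The three error conditions can be read as the checks below. This is a minimal sketch: `check_units_and_heads` is a hypothetical helper, not part of keras_mml, and the returned per-head size rests on the assumption that each head works on an equal slice of the units.

```python
def check_units_and_heads(units, num_heads=1):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if not isinstance(units, int) or units <= 0:
        raise ValueError(f"units must be a positive integer, got {units!r}")
    if not isinstance(num_heads, int) or num_heads <= 0:
        raise ValueError(f"num_heads must be a positive integer, got {num_heads!r}")
    if units % num_heads != 0:
        raise ValueError(f"num_heads ({num_heads}) must divide units ({units})")
    # Assumed interpretation: each head works on an equal slice of the units.
    return units // num_heads


head_size = check_units_and_heads(units=64, num_heads=4)
print(head_size)  # 16
```

Under these checks, units=10 with num_heads=4 would be rejected, while units=64 with num_heads=4 would be accepted.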

call(sequences, initial_state=None, mask=None, training=False)

Calls the layer on the given input sequences.

Parameters:

sequences (Float[ndarray, 'batch_size timesteps features']) – Inputs into the layer.
initial_state (Optional[List], default: None) – List of initial state tensors to be passed to the first call of the cell. If not provided, zero-filled initial state tensors are created.
mask (Optional[Any], default: None) – Binary tensor indicating whether a given timestep should be masked. A True entry indicates that the corresponding timestep is used, while a False entry indicates that the corresponding timestep is ignored.
training (bool, default: False) – Indicates whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it.

Returns:

Float[ndarray, 'batch_size timesteps'] – Transformed inputs.

classmethod from_config(config)

Creates the layer from the given configuration.

Parameters: config (Dict[str, Any]) – Configuration dictionary.
Returns: GRUMML – Created instance.

get_config()

Gets the configuration for the layer.

Returns: Dict[str, Any] – Layer configuration.
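The relationship between get_config and from_config can be illustrated with a plain-Python round trip. `ConfigurableSketch` is a hypothetical stand-in that keeps only three of the documented arguments; it follows the usual Keras convention of rebuilding an object from its configuration dictionary and is not the keras_mml implementation.

```python
class ConfigurableSketch:
    """Hypothetical stand-in for a layer; not the keras_mml implementation."""

    def __init__(self, units, fully_mml=False, num_heads=1):
        self.units = units
        self.fully_mml = fully_mml
        self.num_heads = num_heads

    def get_config(self):
        # Return every constructor argument needed to rebuild the object.
        return {
            "units": self.units,
            "fully_mml": self.fully_mml,
            "num_heads": self.num_heads,
        }

    @classmethod
    def from_config(cls, config):
        # Rebuild the object by passing the dictionary back to __init__.
        return cls(**config)


original = ConfigurableSketch(units=64, fully_mml=True, num_heads=4)
restored = ConfigurableSketch.from_config(original.get_config())
```

The restored object is a separate instance with the same configuration, which is what makes the pair suitable for saving and reloading.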