GLUMML

class keras_mml.layers.activations.GLUMML[source]

General Gated Linear Unit (GLU) without matrix multiplications.

This is a modified implementation of HGRNBitMLP from the GitHub repository for the paper Scalable MatMul-free Language Modeling. Instead of permitting only the Swish activation, it permits other activations via the activation attribute.

See section 3.3.2 of the Scalable MatMul-free Language Modeling paper for the notation used in the implementation.
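The data flow of a gated linear unit can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the names glu_forward, w_gate, w_up and w_down are hypothetical, and ordinary dense matrix products stand in for the matmul-free layers that the actual layer uses. It shows the shape of the computation, not the library's implementation.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def glu_forward(x, w_gate, w_up, w_down, activation=sigmoid):
    """Illustrative GLU: activation(gate) * up, then project down.

    Dense products are used here for clarity only; the real layer
    is assumed to replace them with matmul-free operations.
    """
    g = x @ w_gate          # gate projection, shape (..., intermediate)
    u = x @ w_up            # up projection, shape (..., intermediate)
    p = activation(g) * u   # elementwise gating
    return p @ w_down       # down projection, shape (..., units)


rng = np.random.default_rng(0)
batch, last_dim, intermediate, units = 2, 8, 16, 4

x = rng.normal(size=(batch, last_dim))
w_gate = rng.normal(size=(last_dim, intermediate))
w_up = rng.normal(size=(last_dim, intermediate))
w_down = rng.normal(size=(intermediate, units))

out = glu_forward(x, w_gate, w_up, w_down)
print(out.shape)
```

Swapping the activation argument for another elementwise function corresponds to what the activation attribute controls in the layer.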

units

Dimensionality of the output space.

hidden_ratio

Ratio adjusting the intermediate size.

intermediate_size

Intermediate size. See the __init__() method for how the intermediate size is determined.

activation

GLU activation function.

__init__(units, hidden_ratio=4, intermediate_size=None, activation='sigmoid', **kwargs)[source]

Initializes a new instance of the layer.

Parameters:
  • units (int) – Dimensionality of the output space.

  • hidden_ratio (int, default: 4) – Ratio adjusting the intermediate size. Ignored if an intermediate size is specified.

  • intermediate_size (Optional[int], default: None) – Intermediate size. If None, the multiple of 256 closest to \(\frac{2}{3} l r\) is chosen, where \(l\) is the hidden size given by the input to the layer and \(r\) is the hidden_ratio.

  • activation (str, default: 'sigmoid') – GLU activation function.

  • **kwargs – Keyword arguments for keras.Layer.

Raises:
  • ValueError – If units is not a positive integer.

  • ValueError – If the specified activation function is not in PERMITTED_ACTIVATIONS.
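As a rough illustration of the default described for intermediate_size, the following sketch computes the multiple of 256 nearest to two thirds of the hidden size times hidden_ratio. The helper name default_intermediate_size is hypothetical, and the tie-breaking rule and the lower bound of one multiple are assumptions made here; the library may round differently (for example, always upward), so consult the source for the exact behaviour.

```python
def default_intermediate_size(hidden_dim, hidden_ratio=4, multiple=256):
    """Nearest multiple of `multiple` to (2/3) * hidden_dim * hidden_ratio.

    Assumptions (not confirmed by the documentation): ties round up,
    and the result is never smaller than one multiple.
    """
    target = hidden_dim * hidden_ratio * 2 / 3
    n_multiples = int(target / multiple + 0.5)  # round half up
    return max(1, n_multiples) * multiple


# 512 * 4 * 2 / 3 is about 1365.3, and the nearest multiple of 256 is 1280
print(default_intermediate_size(512))
# 768 * 4 * 2 / 3 is exactly 2048, which is already a multiple of 256
print(default_intermediate_size(768))
```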

build(input_shape)[source]

Create layer weights.

Parameters:

input_shape (Tuple[int, ...]) – Shape of the input.

call(inputs)[source]

Applies the layer to the inputs.

Parameters:

inputs (Float[ndarray, 'batch_size *dims last_dim']) – Inputs into the layer.

Returns:

Float[ndarray, 'batch_size *dims units'] – Transformed inputs.

compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Parameters:

input_shape (Tuple[int, ...]) – Shape of the input into the layer.

Returns:

Tuple[int, ...] – Shape of the output.
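Given the input and output annotations of call(), the output shape is presumably the input shape with its last dimension replaced by units. A minimal sketch of that rule, using a hypothetical standalone function rather than the layer's own method:

```python
def glu_output_shape(input_shape, units):
    """Replace the last dimension of `input_shape` with `units`.

    Mirrors the 'batch_size *dims last_dim' -> 'batch_size *dims units'
    annotation on call(); this is an illustration, not the library code.
    """
    return tuple(input_shape[:-1]) + (units,)


# A batch of sequences with an unknown batch size
print(glu_output_shape((None, 10, 64), units=32))
```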

get_config()[source]

Gets the configuration for the layer.

Returns:

Dict[str, Any] – Layer configuration.