GLUMML
- class keras_mml.layers.activations.GLUMML
General Gated Linear Unit (GLU) without matrix multiplications.
This is a modified implementation of HGRNBitMLP from the GitHub repository of Scalable MatMul-free Language Modeling where, instead of permitting only the Swish activation, we permit other activations via the activation attribute.
See section 3.3.2 of the aforementioned paper for the notation used in the implementation of the code.
- units
Dimensionality of the output space.
- hidden_ratio
Ratio adjusting the intermediate size.
- intermediate_size
Intermediate size. See the __init__() method for how the intermediate size is determined.
- activation
GLU activation function.
- __init__(units, hidden_ratio=4, intermediate_size=None, activation='sigmoid', **kwargs)
Initializes a new instance of the layer.
- Parameters:
units (int) – Dimensionality of the output space.
hidden_ratio (int, default: 4) – Ratio adjusting the intermediate size. Ignored if an intermediate size is specified.
intermediate_size (Optional[int], default: None) – Intermediate size. If None, the multiple of 256 closest to \(\frac{2}{3} lr\) is chosen, where \(l\) is the hidden size given by the input into the layer and \(r\) is the hidden_ratio.
activation (str, default: 'sigmoid') – GLU activation function.
**kwargs – Keyword arguments for keras.Layer.
- Raises:
ValueError – If the units value provided is not a positive integer.
ValueError – If the activation function specified is not in PERMITTED_ACTIVATIONS.
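The default intermediate size described for __init__() can be illustrated with a short sketch. This is not the library's code: the function name default_intermediate_size, the round-half-up behaviour, and the lower bound of one multiple are assumptions made for illustration. The description only says "a multiple of 256 closest to" the target, so the actual implementation may round differently.

```python
def default_intermediate_size(hidden_size: int, hidden_ratio: int = 4, multiple: int = 256) -> int:
    """Hypothetical helper: pick the multiple of `multiple` closest to (2/3) * l * r."""
    # Target value (2/3) * l * r, with l = hidden_size and r = hidden_ratio.
    target = 2 * hidden_size * hidden_ratio / 3
    # Round to the nearest multiple (half rounds up; this tie rule is an assumption).
    nearest = multiple * int(target / multiple + 0.5)
    # Assumption: never return less than one multiple, so the size stays positive.
    return max(multiple, nearest)


print(default_intermediate_size(512))   # target is about 1365.3
print(default_intermediate_size(768))   # target is exactly 2048
```

When intermediate_size is passed explicitly, hidden_ratio is ignored, so a computation of this kind would only apply to the None case.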
- call(inputs)
Calling method of the layer.
- Parameters:
inputs (Float[ndarray, 'batch_size *dims last_dim']) – Inputs into the layer.
- Returns:
Float[ndarray, 'batch_size *dims units'] – Transformed inputs.
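To give a feel for what a GLU "without matrix multiplications" computes, the following is a minimal pure-Python sketch for a single input vector. It assumes the usual gated form (gate and up projections, an elementwise activation on the gate, then a down projection) and ternary weights in {-1, 0, 1}, so each projection needs only additions and subtractions. The names ternary_matvec, glu_forward, w_g, w_u, and w_d are hypothetical, and the sketch omits any normalisation, quantisation, or scaling that the actual layer may apply.

```python
import math


def ternary_matvec(x, w):
    """Project x (length n) through w (n rows, m columns, entries in {-1, 0, 1}).

    Because every weight is -1, 0, or 1, the projection is computed with
    additions and subtractions only.
    """
    out = [0.0] * len(w[0])
    for xi, row in zip(x, w):
        for j, wij in enumerate(row):
            if wij == 1:
                out[j] += xi
            elif wij == -1:
                out[j] -= xi
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def glu_forward(x, w_g, w_u, w_d, activation=sigmoid):
    """Illustrative GLU: d = (activation(x * W_g) elementwise-times (x * W_u)) * W_d."""
    g = ternary_matvec(x, w_g)                             # gate projection
    u = ternary_matvec(x, w_u)                             # up projection
    p = [activation(gi) * ui for gi, ui in zip(g, u)]      # elementwise gating
    return ternary_matvec(p, w_d)                          # down projection to `units`


# Toy shapes: last_dim = 2, intermediate size = 3, units = 2.
x = [1.0, 2.0]
w_g = [[1, 0, -1], [0, 1, 1]]
w_u = [[1, 1, 0], [-1, 0, 1]]
w_d = [[1, 0], [0, 1], [1, -1]]
d = glu_forward(x, w_g, w_u, w_d)
print(len(d))  # output length equals the number of columns of w_d
```

Swapping the activation argument for another elementwise function mirrors the idea behind the activation attribute, although which functions the layer actually accepts is governed by PERMITTED_ACTIVATIONS.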