AttentionMML
- class keras_mml.layers.transformer.AttentionMML
Multi-headed attention layer that is mostly without matrix multiplications.
Unlike the Keras implementation, this layer does not implement the multi-headed attention from the "Attention Is All You Need" paper. Instead, it follows the description of the token mixer in "Scalable MatMul-free Language Modeling" (see section 3.3.1), using GRUMML as the attention mechanism.
- num_heads
Number of attention heads.
- out_dim
Output dimension.
- fully_mml
Whether to use full matmul-less layers in the attention mechanism.
- __init__(num_heads, out_dim, fully_mml=True, **kwargs)
Initializes a new instance of the layer.
- Parameters:
num_heads (int) – Number of attention heads.
out_dim (int) – Output dimension.
fully_mml (bool, default: True) – Whether to use full matmul-less layers in the attention mechanism.
**kwargs – Keyword arguments for keras.Layer.
- Raises:
ValueError – If the number of heads is not a positive integer.
ValueError – If the output dimension is not a positive integer.
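The two ValueError conditions above can be illustrated with a minimal, standalone sketch. This is not the library's source code: the helper names `check_positive_int` and `validate_attention_args` are hypothetical, and the exact checks and error messages used by AttentionMML may differ.

```python
def check_positive_int(name, value):
    """Return `value` if it is a positive integer, else raise ValueError.

    Hypothetical helper for illustration only; not part of keras_mml.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return value


def validate_attention_args(num_heads, out_dim):
    """Mirror the documented constructor checks for num_heads and out_dim."""
    return (
        check_positive_int("num_heads", num_heads),
        check_positive_int("out_dim", out_dim),
    )


if __name__ == "__main__":
    print(validate_attention_args(4, 64))
    try:
        validate_attention_args(0, 64)
    except ValueError as err:
        print(f"rejected: {err}")
```

Under these assumptions, a call such as `validate_attention_args(4, 64)` passes, while a zero, negative, or non-integer value for either argument raises ValueError, matching the documented behaviour of the constructor.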