TransformerBlockMML
- class keras_mml.layers.transformer.TransformerBlockMML
Transformer block layer that mostly avoids matrix multiplications.
The core flow of the transformer block follows the Attention Is All You Need paper, and its high-level implementation follows the Keras example Text classification with Transformer. However, we use the custom AttentionMML class for the attention mechanism and SwiGLUMML for the feed-forward network (FFN) part.
- embedding_dim
Dimension of the embeddings.
- ffn_dim
Dimension of the intermediate (i.e., hidden) layer of the feed-forward network.
- num_heads
Number of heads to use for multi-headed attention.
- fully_mml
Whether to use full matmul-less layers in the attention mechanism.
- rate
Dropout rate to apply for the attention mechanism and the feed-forward network.
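The flow described in the class summary can be sketched in plain Python. This is only an illustration of the wiring used in the referenced Keras example (sublayer, dropout, residual add, normalization, applied twice); it is an assumption that TransformerBlockMML orders its steps the same way. The function `transformer_block_flow` and its callable arguments are hypothetical stand-ins, not part of keras_mml, and vectors are plain lists rather than tensors.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum used for the residual connections."""
    return [x + y for x, y in zip(a, b)]


def transformer_block_flow(
    inputs: Vector,
    attention: Sublayer,  # stand-in for the attention mechanism
    ffn: Sublayer,        # stand-in for the feed-forward network
    norm1: Sublayer,
    norm2: Sublayer,
    dropout: Sublayer = lambda v: v,  # identity, i.e. inference mode
) -> Vector:
    """Hypothetical sketch of the block's data flow (assumed ordering)."""
    # Attention sub-block: sublayer -> dropout -> residual add -> normalization.
    attn_out = dropout(attention(inputs))
    out1 = norm1(add(inputs, attn_out))
    # Feed-forward sub-block, wired the same way on top of out1.
    ffn_out = dropout(ffn(out1))
    return norm2(add(out1, ffn_out))
```

In the real layer, the `attention` and `ffn` slots would presumably be filled by AttentionMML and SwiGLUMML, with `rate` controlling both dropout steps.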
- __init__(embedding_dim, ffn_dim, num_heads, fully_mml=True, rate=0.1, **kwargs)
Initializes a new instance of the layer.
- Parameters:
embedding_dim (int) – Dimension of the embeddings.
ffn_dim (int) – Dimension of the intermediate (i.e., hidden) layer of the feed-forward network.
num_heads (int) – Number of heads to use for multi-headed attention.
fully_mml (bool, default: True) – Whether to use full matmul-less layers in the attention mechanism.
rate (float, default: 0.1) – Dropout rate to apply for the attention mechanism and the feed-forward network.
**kwargs – Keyword arguments for keras.Layer.
- Raises:
ValueError – If the embedding dimension is not a positive integer.
ValueError – If the dimension of the intermediate layer of the feed-forward network is not a positive integer.
ValueError – If the number of heads is not a positive integer.
ValueError – If the embedding dimension is not divisible by the number of heads.
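The four documented error conditions can be mirrored in a small standalone check. The sketch below is not the library's code: `validate_block_args` is a hypothetical helper, the error messages are invented for illustration, and returning the per-head dimension is an added convenience that assumes the embedding is split evenly across heads.

```python
def validate_block_args(embedding_dim: int, ffn_dim: int, num_heads: int) -> int:
    """Hypothetical helper mirroring the documented ValueError conditions.

    Returns the per-head dimension when every check passes.
    """
    for name, value in (
        ("embedding_dim", embedding_dim),
        ("ffn_dim", ffn_dim),
        ("num_heads", num_heads),
    ):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    if embedding_dim % num_heads != 0:
        raise ValueError(
            f"embedding_dim ({embedding_dim}) must be divisible by "
            f"num_heads ({num_heads})"
        )

    # Each attention head works on an equal slice of the embedding.
    return embedding_dim // num_heads
```

For example, `embedding_dim=64` with `num_heads=4` passes and gives 16 dimensions per head, whereas `num_heads=5` fails the divisibility check.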