TransformerBlockMML

class keras_mml.layers.transformer.TransformerBlockMML

Transformer block layer that mostly avoids matrix multiplications.

The core flow of the transformer block follows the paper "Attention Is All You Need", and the high-level implementation is based on the Keras example "Text classification with Transformer". Unlike those references, we use the custom AttentionMML class for the attention mechanism and SwiGLUMML for the feed-forward network (FFN).
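The flow described above can be illustrated with a minimal, dependency-free sketch. It assumes the post-norm arrangement used in the referenced paper and Keras example (sublayer, residual connection, then normalization). The `attention` and `ffn` arguments are hypothetical stand-ins for AttentionMML and SwiGLUMML, the normalization is plain layer normalization without learned parameters, and dropout is omitted; the library's actual implementation may differ in all of these.

```python
import math


def layer_norm(vec, eps=1e-6):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def add_and_norm(residual, update):
    """Residual connection followed by per-token normalization."""
    return [
        layer_norm([r + u for r, u in zip(r_tok, u_tok)])
        for r_tok, u_tok in zip(residual, update)
    ]


def transformer_block(tokens, attention, ffn):
    """Post-norm flow: attention -> add & norm -> FFN -> add & norm.

    `attention` and `ffn` are assumed callables that map a sequence of
    token vectors to a sequence of the same shape (hypothetical stand-ins
    for AttentionMML and SwiGLUMML). Dropout is left out of this sketch.
    """
    attended = attention(tokens)
    hidden = add_and_norm(tokens, attended)
    transformed = ffn(hidden)
    return add_and_norm(hidden, transformed)


# Identity sublayers only exercise the data flow, not real attention.
sequence = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.5, -1.5]]
output = transformer_block(sequence, attention=lambda x: x, ffn=lambda x: x)
```

The output keeps the sequence length and feature size of the input, which is what allows several such blocks to be stacked.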

embedding_dim

Dimension of the embeddings.

ffn_dim

Dimension of the intermediate (i.e., hidden) layer of the feed-forward network.

num_heads

Number of heads to use for multi-headed attention.

fully_mml

Whether to use fully matmul-less layers in the attention mechanism.

rate

Dropout rate to apply to the attention mechanism and the feed-forward network.

__init__(embedding_dim, ffn_dim, num_heads, fully_mml=True, rate=0.1, **kwargs)

Initializes a new instance of the layer.

Parameters:
  • embedding_dim (int) – Dimension of the embeddings.

  • ffn_dim (int) – Dimension of the intermediate (i.e., hidden) layer of the feed-forward network.

  • num_heads (int) – Number of heads to use for multi-headed attention.

  • fully_mml (bool, default: True) – Whether to use fully matmul-less layers in the attention mechanism.

  • rate (float, default: 0.1) – Dropout rate to apply to the attention mechanism and the feed-forward network.

  • **kwargs – Keyword arguments for keras.Layer.

Raises:
  • ValueError – If the embedding dimension is not a positive integer.

  • ValueError – If the dimension of the intermediate layer of the feed-forward network is not a positive integer.

  • ValueError – If the number of heads is not a positive integer.

  • ValueError – If the embedding dimension is not divisible by the number of heads.
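The four error conditions listed above can be expressed as a small check. This is only a sketch: `validate_block_args` is a hypothetical helper written for illustration, not a function of the library, and the error messages are invented.

```python
def validate_block_args(embedding_dim, ffn_dim, num_heads):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    named_values = (
        ("embedding_dim", embedding_dim),
        ("ffn_dim", ffn_dim),
        ("num_heads", num_heads),
    )
    # Each of the three sizes must be a positive integer.
    for name, value in named_values:
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    # Each head receives an equal slice of the embedding.
    if embedding_dim % num_heads != 0:
        raise ValueError(
            f"embedding_dim ({embedding_dim}) must be divisible by "
            f"num_heads ({num_heads})"
        )


validate_block_args(embedding_dim=64, ffn_dim=256, num_heads=8)  # passes silently
```

For example, `embedding_dim=64` with `num_heads=6` would be rejected, because 64 is not divisible by 6.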

build(input_shape)

Builds the layer.

Parameters:

input_shape (Tuple[int, int, int]) – Shape of the input.

call(inputs)

Runs the layer's forward pass on the inputs.

Parameters:

inputs (Float[ndarray, 'batch_size sequence_length features']) – Inputs into the layer.

Returns:

Float[ndarray, 'batch_size sequence_length embedding_dim'] – Transformed inputs.

compute_output_shape(input_shape)

Computes the output shape of the layer.

Parameters:

input_shape (Tuple[int, int, int]) – Shape of the input into the layer.

Returns:

Tuple[int, int, int] – Shape of the output.
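Based on the return annotation of call, the output presumably keeps the batch and sequence dimensions and has embedding_dim as its last dimension. A minimal sketch of that relationship, using a hypothetical stand-alone function rather than the layer's own method:

```python
def expected_output_shape(input_shape, embedding_dim):
    """Assumed shape rule: (batch, sequence, features) -> (batch, sequence, embedding_dim)."""
    batch_size, sequence_length, _ = input_shape
    return (batch_size, sequence_length, embedding_dim)


shape = expected_output_shape((32, 128, 64), embedding_dim=64)
```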