DenseMML

class keras_mml.layers.core.DenseMML[source]

Dense layer without matrix multiplications.

The core of the layer is the BitLinear layer described in the paper “The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits”. It uses ternary quantization to reduce matrix multiplication operations to simple addition and subtraction.
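To make the "addition and subtraction" claim concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `absmean_ternarize` and `ternary_matvec` are hypothetical, the quantizer follows the absmean scheme as described in the paper, and the rescaling of the output and the quantization of the inputs are left out for brevity.

```python
def absmean_ternarize(w, eps=1e-8):
    """Map a float matrix to entries in {-1, 0, 1} (hypothetical helper).

    Assumption: absmean scheme from the paper, i.e. divide by the mean
    absolute value, round to the nearest integer, then clip to [-1, 1].
    """
    flat = [abs(v) for row in w for v in row]
    gamma = sum(flat) / len(flat)
    return [[max(-1, min(1, round(v / (gamma + eps)))) for v in row] for row in w]


def ternary_matvec(x, w_ternary):
    """Compute x @ W for a ternary W of shape (n_in, n_out) without multiplying."""
    y = [0.0] * len(w_ternary[0])
    for x_i, row in zip(x, w_ternary):
        for j, w_ij in enumerate(row):
            if w_ij == 1:
                y[j] += x_i      # weight +1: add the input
            elif w_ij == -1:
                y[j] -= x_i      # weight -1: subtract the input
            # weight 0: the input is skipped entirely
    return y


w_float = [[0.9, -0.1], [-1.2, 0.4], [0.05, -0.7]]
w_q = absmean_ternarize(w_float)   # [[1, 0], [-1, 1], [0, -1]]
x = [2.0, -1.0, 0.5]
y = ternary_matvec(x, w_q)         # [3.0, -1.5]
```

Because every weight is -1, 0, or 1, the inner loop never needs a multiplication; each input is either added, subtracted, or ignored.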

This implementation differs from BitLinear by allowing an activation function to be specified. More precisely, DenseMML implements the operation

\[\mathbf{y} = \sigma\left(\mathbf{x}\mathbf{W} + \mathbf{b}\right)\]

where \(\mathbf{x}\) is the quantized input vector, \(\mathbf{W}\) is the quantized weights matrix (i.e., the kernel matrix), \(\mathbf{b}\) is the bias vector, and \(\sigma\) is the element-wise activation function.

Important

See the pitfalls when using this layer.

Note

If the input to the layer (say \(\mathbf{x}\)) has a rank greater than 2, then the layer computes the dot product of \(\mathbf{x}\) and \(\mathbf{W}\) along the last axis of \(\mathbf{x}\) and axis 0 of \(\mathbf{W}\).

For example, suppose \(\mathbf{x}\) has shape (batch_size, d0, d1). Then \(\mathbf{W}\) is created to have shape (d1, units) and it operates along axis 2 of \(\mathbf{x}\) on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
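The shape rule in this note can be checked with plain NumPy. The sketch below only illustrates the contraction along the last axis of the input and axis 0 of the kernel; it uses unquantized random arrays and arbitrary example sizes, so it does not reproduce the layer's quantization.

```python
import numpy as np

batch_size, d0, d1, units = 4, 3, 5, 2   # arbitrary example sizes
rng = np.random.default_rng(0)

x = rng.normal(size=(batch_size, d0, d1))   # rank-3 input
W = rng.normal(size=(d1, units))            # kernel of shape (d1, units)

# Contract axis 2 of x with axis 0 of W.
y = np.tensordot(x, W, axes=([2], [0]))

print(y.shape)   # (4, 3, 2), i.e. (batch_size, d0, units)

# The same result is obtained by applying W to each length-d1 slice.
same = all(
    np.allclose(y[i, j], x[i, j] @ W)
    for i in range(batch_size)
    for j in range(d0)
)
```

Only the last axis is consumed; all leading axes pass through unchanged.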

Warning

Once a model that uses this layer is loaded from a file, it cannot be retrained.

units

Dimensionality of the output space.

use_bias

Whether the layer uses a bias vector.

kernel_initializer

Initializer for the kernel matrix.

bias_initializer

Initializer for the bias vector.

kernel_regularizer

Regularizer function applied to the kernel matrix.

bias_regularizer

Regularizer function applied to the bias vector.

kernel_constraint

Constraint function applied to the kernel matrix.

bias_constraint

Constraint function applied to the bias vector.

__init__(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, **kwargs)[source]

Initializes a new DenseMML layer.

Parameters:
  • units (int) – Dimensionality of the output space.

  • activation (Optional[str], default: None) – Activation function to use. If you don’t specify anything, no activation is applied (i.e. “linear” activation: \(\sigma(\mathbf{x}) = \mathbf{x}\)).

  • use_bias (bool, default: True) – Whether the layer uses a bias vector.

  • kernel_initializer (str, default: 'glorot_uniform') – Initializer for the kernel matrix.

  • bias_initializer (str, default: 'zeros') – Initializer for the bias vector.

  • kernel_regularizer (Optional[str], default: None) – Regularizer function applied to the kernel matrix.

  • bias_regularizer (Optional[str], default: None) – Regularizer function applied to the bias vector.

  • activity_regularizer (Optional[str], default: None) – Regularizer function applied to the output of the layer (i.e., its activation).

  • kernel_constraint (Optional[str], default: None) – Constraint function applied to the kernel matrix.

  • bias_constraint (Optional[str], default: None) – Constraint function applied to the bias vector.

  • **kwargs – Keyword arguments for keras.Layer.

Raises:

ValueError – If units is not a positive integer.

build(input_shape)[source]

Create layer weights.

Parameters:

input_shape (Tuple[int, ...]) – Shape of the input.

call(inputs)[source]

Applies the layer to the inputs.

Parameters:

inputs (Float[ndarray, 'batch_size *dims last_dim']) – Inputs into the layer.

Returns:

Float[ndarray, 'batch_size *dims units'] – Transformed inputs.

compute_output_shape(input_shape)[source]

Computes the output shape of the layer for an input of the given shape.

Parameters:

input_shape (Tuple[int, ...]) – Input shape into the layer.

Returns:

Tuple[int, ...] – Output shape after passing through the layer.
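Based on the input and output annotations of call (last_dim is replaced by units while all leading dimensions are kept), the shape computation can be sketched as follows. The function name `dense_output_shape` is hypothetical and stands in for the method; it is an illustration of the rule, not the library's code.

```python
from typing import Tuple


def dense_output_shape(input_shape: Tuple[int, ...], units: int) -> Tuple[int, ...]:
    """Replace the last dimension of input_shape with units (illustrative only)."""
    return tuple(input_shape[:-1]) + (units,)


shape_2d = dense_output_shape((32, 64), units=16)       # (32, 16)
shape_3d = dense_output_shape((32, 10, 64), units=16)   # (32, 10, 16)
```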

load_own_variables(store)[source]

Loads the state of the layer.

Parameters:

store (Dict) – Dictionary from which the state of the layer will be loaded.

Raises:

ValueError – If the layer is missing variables when loading from a file.

save_own_variables(store)[source]

Saves the state of the layer.

Parameters:

store (Dict) – Dictionary where the state of the layer will be saved.