DenseMML
- class keras_mml.layers.core.DenseMML
Dense layer without matrix multiplications.
The core of the layer is the BitLinear layer described in The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. It uses ternary quantization to reduce matrix multiplication operations to simple addition and subtraction.
This implementation differs from BitLinear by allowing an activation function to be specified. More precisely, DenseMML implements the operation
\[\mathbf{y} = \sigma\left(\mathbf{x}\mathbf{W}^\intercal + \mathbf{b}\right)\]
where \(\mathbf{x}\) is the quantized input vector, \(\mathbf{W}\) is the quantized weights matrix (i.e., the kernel matrix), \(\mathbf{b}\) is the bias vector, and \(\sigma\) is the element-wise activation function.
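To illustrate why ternary weights remove the need for multiplications, here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `ternarize` and `ternary_dense` are hypothetical, the absolute-mean scaling rule is an assumption based on the cited paper, the kernel is laid out as (input_dim, units), and input quantization and output rescaling are left out for brevity.

```python
from typing import Callable, List, Optional


def ternarize(weights: List[List[float]], eps: float = 1e-5) -> List[List[int]]:
    """Map each weight to -1, 0, or +1 (assumed absolute-mean scaling)."""
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) + eps
    # Scale, round to the nearest integer, then clip into [-1, 1].
    return [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]


def ternary_dense(
    x: List[float],
    kernel_q: List[List[int]],
    bias: Optional[List[float]] = None,
    activation: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """Apply a dense transform whose kernel product uses only + and -."""
    units = len(kernel_q[0])
    y = [0.0] * units
    for i, x_i in enumerate(x):
        for j in range(units):
            w = kernel_q[i][j]
            if w == 1:
                y[j] += x_i   # +1 weight: add the input
            elif w == -1:
                y[j] -= x_i   # -1 weight: subtract the input
            # a 0 weight contributes nothing
    if bias is not None:
        y = [y_j + b_j for y_j, b_j in zip(y, bias)]
    if activation is not None:
        y = [activation(v) for v in y]
    return y


# Example: 3 input features, 2 output units.
weights = [[0.9, -0.1], [-0.8, 0.05], [0.02, 0.7]]
kernel_q = ternarize(weights)
relu = lambda v: max(0.0, v)
y = ternary_dense([2.0, 3.0, 5.0], kernel_q, bias=[0.5, -1.0], activation=relu)
```

Each output unit is a signed sum of selected inputs, so the inner loop never multiplies.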
Important
See the pitfalls when using this layer.
Note
If the input to the layer (say \(\mathbf{x}\)) has a rank greater than 2, then this computes the dot product of \(\mathbf{x}\) and \(\mathbf{W}\) along the last axis of \(\mathbf{x}\) and axis 0 of \(\mathbf{W}\).
For example, suppose \(\mathbf{x}\) has shape (batch_size, d0, d1). Then \(\mathbf{W}\) is created to have shape (d1, units) and it operates along axis 2 of \(\mathbf{x}\) on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
Warning
Once a model that uses this layer is loaded from a file, it cannot be retrained.
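The handling of inputs with rank greater than 2 can be made concrete with a small shape check. The sketch below is an illustration in plain Python, not the layer's code: `dense_last_axis` and `shape` are hypothetical helpers, and ordinary multiplication is used because the point here is the shape bookkeeping rather than the ternary arithmetic.

```python
def dense_last_axis(x, kernel):
    """Contract the last axis of a rank-3 nested list with axis 0 of kernel."""
    d1 = len(kernel)
    units = len(kernel[0])
    return [
        [
            # Each length-d1 vector is mapped to a length-units vector.
            [sum(vec[k] * kernel[k][j] for k in range(d1)) for j in range(units)]
            for vec in batch_item
        ]
        for batch_item in x
    ]


def shape(t):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


batch_size, d0, d1, units = 2, 3, 4, 5
x = [[[float(b + i + k) for k in range(d1)] for i in range(d0)]
     for b in range(batch_size)]
# Kernel of shape (d1, units): copies the d1 inputs and pads one zero column.
kernel = [[1.0 if k == j else 0.0 for j in range(units)] for k in range(d1)]

y = dense_last_axis(x, kernel)
print(shape(x), "->", shape(y))
```

The same kernel is applied independently to each of the batch_size * d0 vectors of length d1.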
- units
Dimensionality of the output space.
- use_bias
Whether the layer uses a bias vector.
- kernel_initializer
Initializer for the kernel matrix.
- bias_initializer
Initializer for the bias vector.
- kernel_regularizer
Regularizer function applied to the kernel matrix.
- bias_regularizer
Regularizer function applied to the bias vector.
- kernel_constraint
Constraint function applied to the kernel matrix.
- bias_constraint
Constraint function applied to the bias vector.
- __init__(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, **kwargs)
Initializes a new DenseMML layer.
- Parameters:
units (int) – Dimensionality of the output space.
activation (Optional[str], default: None) – Activation function to use. If you don’t specify anything, no activation is applied (i.e., “linear” activation: \(\sigma(\mathbf{x}) = \mathbf{x}\)).
use_bias (bool, default: True) – Whether the layer uses a bias vector.
kernel_initializer (str, default: 'glorot_uniform') – Initializer for the kernel matrix.
bias_initializer (str, default: 'zeros') – Initializer for the bias vector.
kernel_regularizer (Optional[str], default: None) – Regularizer function applied to the kernel matrix.
bias_regularizer (Optional[str], default: None) – Regularizer function applied to the bias vector.
activity_regularizer (Optional[str], default: None) – Regularizer function applied to the output of the layer (i.e., its activation).
kernel_constraint (Optional[str], default: None) – Constraint function applied to the kernel matrix.
bias_constraint (Optional[str], default: None) – Constraint function applied to the bias vector.
**kwargs – Keyword arguments for keras.Layer.
- Raises:
ValueError – If units is not a positive integer.
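The documented error condition can be pictured with a small check like the one below. This is only a sketch: `check_units` is a hypothetical helper, and whether the real constructor also rejects booleans or integer-valued floats is an assumption made here, not something the documentation states.

```python
def check_units(units):
    """Hypothetical check mirroring the documented ValueError condition."""
    # bool is a subclass of int in Python, so it is rejected explicitly
    # (an assumption; the library may behave differently).
    if isinstance(units, bool) or not isinstance(units, int) or units <= 0:
        raise ValueError(f"units must be a positive integer, got {units!r}")
    return units


def is_rejected(value):
    """Return True if check_units raises ValueError for value."""
    try:
        check_units(value)
    except ValueError:
        return True
    return False
```

Under these assumptions, `check_units(8)` returns 8, while values such as 0, -3, or 2.5 raise ValueError.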
- call(inputs)
Calling method of the layer.
- Parameters:
inputs (Float[ndarray, 'batch_size *dims last_dim']) – Inputs into the layer.
- Returns:
Float[ndarray, 'batch_size *dims units'] – Transformed inputs.
- compute_output_shape(input_shape)
Computes the output shape given a tensor of a given shape.
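Based on the shapes documented for this layer, the output shape is the input shape with its last dimension replaced by units. A minimal sketch of that rule follows; `dense_output_shape` is a hypothetical stand-alone function, not the method itself.

```python
from typing import Optional, Sequence, Tuple


def dense_output_shape(
    input_shape: Sequence[Optional[int]], units: int
) -> Tuple[Optional[int], ...]:
    """Replace the last dimension of input_shape with units."""
    # Leading dimensions (including an unknown batch size, None) pass through.
    return tuple(input_shape[:-1]) + (units,)


print(dense_output_shape((None, 10, 32), 64))
```

Only the last axis changes, which matches the (batch_size, d0, d1) to (batch_size, d0, units) example for inputs of rank greater than 2.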
- load_own_variables(store)
Loads the state of the layer.
- Parameters:
store (Dict) – Dictionary from which the state of the model will be loaded.
- Raises:
ValueError – If the layer is missing variables when loading from a file.