Tutorial

Jun 17, 2024

Welcome to Keras-MML! This notebook will introduce you to the basics of working with Keras-MML.

Keras-MML mainly provides layers that replace built-in Keras layers with versions that do not use matrix multiplications. In this notebook, we focus on the matrix-multiplication-free counterpart of the Dense layer, appropriately called DenseMML.

We will demonstrate its use in predicting handwritten digits from the MNIST dataset using a very simple multi-layer perceptron (MLP).
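How can a dense layer avoid matrix multiplications at all? One common approach, assumed here purely for illustration, is to restrict every weight to -1, 0 or +1. Each output is then a sum of some inputs minus a sum of others, so only additions and subtractions are needed. The helper `ternary_dense` below is hypothetical and is not DenseMML’s actual implementation; it only checks that the add/subtract formulation agrees with an ordinary matrix product.

```python
import numpy as np


def ternary_dense(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Hypothetical helper: compute x @ w using only additions and subtractions.

    x has shape (batch, in_features); w has shape (in_features, out_features)
    and is assumed to contain only -1, 0 and +1.
    """
    if not np.isin(w, (-1, 0, 1)).all():
        raise ValueError("w must be ternary")
    out = np.zeros((x.shape[0], w.shape[1]), dtype=x.dtype)
    for j in range(w.shape[1]):
        plus = w[:, j] == 1    # inputs that are added
        minus = w[:, j] == -1  # inputs that are subtracted; zero weights are skipped
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6)).astype("float32")
w = rng.integers(-1, 2, size=(6, 3))  # random ternary weights in {-1, 0, 1}
print(np.allclose(ternary_dense(x, w), x @ w))  # True
```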

First, let’s prepare the imports.

import keras
import numpy as np

Define constants relating to the data. In particular, we know that there are 10 distinct digits in the dataset, and that each entry is a \(28 \times 28\) greyscale image. This means that the input shape into the model is (28, 28).

NUM_CLASSES = 10
INPUT_SHAPE = (28, 28)
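The model below starts with a Flatten layer, which turns each \(28 \times 28\) image into a vector of \(28 \cdot 28 = 784\) values; this is the (None, 784) output shape reported in the model summary. A minimal NumPy sketch of the same reshaping, using an illustrative batch of 32 blank images:

```python
import math

import numpy as np

INPUT_SHAPE = (28, 28)

# Number of features each image contributes once flattened.
flat_size = math.prod(INPUT_SHAPE)
print(flat_size)  # 784

# A dummy batch of 32 images; reshaping keeps the batch axis and flattens the
# remaining axes, which is what keras.layers.Flatten does to each sample.
batch = np.zeros((32, *INPUT_SHAPE), dtype="float32")
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (32, 784)
```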

Let’s now load the data. Keras ships with the MNIST dataset, so we can load it directly with keras.datasets.mnist.load_data(), which returns the training and test splits as NumPy arrays.

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

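We do some simple preprocessing. The raw pixel values are integers from 0 to 255, so we divide by 255 to scale each pixel into the interval \([0, 1]\); keeping the inputs in a small, consistent range generally makes training more stable.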

x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
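To see the effect of this scaling, here is the same transformation applied to a few stand-in pixel values (the four values are illustrative, not taken from the dataset). Casting to float32 first keeps the result in 32-bit precision, which is Keras’s default float type.

```python
import numpy as np

# Stand-in for a few raw MNIST pixels: uint8 values in [0, 255].
raw = np.array([0, 64, 128, 255], dtype=np.uint8)

# Same transformation as above: cast to float32, then scale by 255.
scaled = raw.astype("float32") / 255
print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```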

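Finally, we convert the integer class labels into one-hot encoded vectors (binary class matrices), which is the label format expected by the categorical_crossentropy loss used when compiling the model.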

y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)
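One-hot encoding maps label k to a vector of length NUM_CLASSES with a 1 in position k and 0 everywhere else. The hypothetical `one_hot` helper below reproduces that behaviour for 1-D integer labels by indexing rows of an identity matrix; it is a sketch of the idea, not the implementation of keras.utils.to_categorical.

```python
import numpy as np

NUM_CLASSES = 10


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Hypothetical helper: one-hot encode a 1-D array of integer class labels."""
    # Row k of the identity matrix is exactly the one-hot vector for class k.
    return np.eye(num_classes, dtype="float32")[labels]


y = np.array([5, 0, 4])  # example integer labels
encoded = one_hot(y, NUM_CLASSES)
print(encoded.shape)  # (3, 10)
print(encoded[0])     # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```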

Now we are ready to define the prediction model. First, we need to import keras_mml.

import keras_mml

We are now ready to define the Sequential model. Notice that we swap out the hidden Dense layers for DenseMML layers, but leave the last layer as a standard Dense layer. DenseMML uses quantization internally so that it can avoid matrix multiplications, and this quantization reduces the precision of its outputs. That trade-off is acceptable in the hidden layers, but the final layer produces the model’s class probabilities, where we want the highest precision possible, so we keep the standard Dense layer there.

model = keras.Sequential(
    [
        keras.Input(shape=INPUT_SHAPE),
        keras.layers.Flatten(),
        keras_mml.layers.DenseMML(256),
        keras_mml.layers.DenseMML(256),
        keras_mml.layers.DenseMML(256),
        keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ],
    name="MNIST-Classifier"
)
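To get a feel for how much precision quantization can cost, here is a sketch of one well-known scheme, absmean ternary quantization in the style of BitNet b1.58: weights are divided by their mean absolute value, rounded, and clipped to -1, 0 or +1. Whether DenseMML uses exactly this scheme is an assumption of this example, and `absmean_ternary` is a hypothetical helper; the point is only that the quantized weights approximate the originals with a clearly non-zero error.

```python
import numpy as np


def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Hypothetical helper: quantize weights to {-1, 0, +1} with one per-tensor scale.

    Assumed BitNet b1.58-style absmean scheme, not necessarily DenseMML's.
    Returns the ternary weights and the scale such that w ~= scale * w_q.
    """
    scale = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / scale), -1, 1)
    return w_q, scale


rng = np.random.default_rng(42)
w = rng.normal(scale=0.05, size=(256, 10))  # illustrative float weights
w_q, scale = absmean_ternary(w)

print(np.unique(w_q))  # only the three ternary levels remain
# Relative error of approximating w by scale * w_q.
rel_err = np.linalg.norm(w - scale * w_q) / np.linalg.norm(w)
print(f"relative error: {rel_err:.2f}")
```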

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.summary()

Model: "MNIST-Classifier"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 784)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml (DenseMML)            │ (None, 256)            │       200,960 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml_1 (DenseMML)          │ (None, 256)            │        65,792 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml_2 (DenseMML)          │ (None, 256)            │        65,792 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 10)             │         2,570 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 335,114 (1.28 MB)

 Trainable params: 335,114 (1.28 MB)

 Non-trainable params: 0 (0.00 B)
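The parameter counts in this summary follow the usual fully connected formula: inputs × units weights plus one bias per unit, so the DenseMML layers report the same counts that ordinary Dense layers of the same size would. The reported size also matches 4 bytes per float32 parameter divided by \(1024^2\). A sketch of the arithmetic, using a hypothetical `dense_params` helper:

```python
def dense_params(in_features: int, units: int) -> int:
    """Hypothetical helper: trainable parameters of a fully connected layer."""
    return in_features * units + units  # weight matrix + bias vector


counts = [
    dense_params(784, 256),  # dense_mml
    dense_params(256, 256),  # dense_mml_1
    dense_params(256, 256),  # dense_mml_2
    dense_params(256, 10),   # dense
]
total = sum(counts)
print(counts)  # [200960, 65792, 65792, 2570]
print(total)   # 335114

# 4 bytes per float32 parameter, expressed in units of 1024**2 bytes.
print(f"{total * 4 / 1024**2:.2f}")  # 1.28
```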

We can now train the model.

model.fit(x_train, y_train, batch_size=128, epochs=20, validation_split=0.1)

Epoch 1/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.5650 - loss: 1.6223 - val_accuracy: 0.9083 - val_loss: 0.4047
Epoch 2/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.8851 - loss: 0.4254 - val_accuracy: 0.9168 - val_loss: 0.2996
Epoch 3/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9111 - loss: 0.3166 - val_accuracy: 0.9230 - val_loss: 0.2549
Epoch 4/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9167 - loss: 0.2852 - val_accuracy: 0.9420 - val_loss: 0.2092
Epoch 5/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9247 - loss: 0.2537 - val_accuracy: 0.9387 - val_loss: 0.2084
Epoch 6/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9306 - loss: 0.2377 - val_accuracy: 0.9443 - val_loss: 0.1953
Epoch 7/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9325 - loss: 0.2285 - val_accuracy: 0.9442 - val_loss: 0.1904
Epoch 8/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9361 - loss: 0.2158 - val_accuracy: 0.9485 - val_loss: 0.1853
Epoch 9/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9384 - loss: 0.2026 - val_accuracy: 0.9477 - val_loss: 0.1905
Epoch 10/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9402 - loss: 0.1988 - val_accuracy: 0.9457 - val_loss: 0.1871
Epoch 11/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9411 - loss: 0.1926 - val_accuracy: 0.9422 - val_loss: 0.1989
Epoch 12/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9408 - loss: 0.1974 - val_accuracy: 0.9547 - val_loss: 0.1624
Epoch 13/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9466 - loss: 0.1764 - val_accuracy: 0.9455 - val_loss: 0.1786
Epoch 14/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9459 - loss: 0.1804 - val_accuracy: 0.9463 - val_loss: 0.1803
Epoch 15/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9457 - loss: 0.1785 - val_accuracy: 0.9548 - val_loss: 0.1703
Epoch 16/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9469 - loss: 0.1734 - val_accuracy: 0.9445 - val_loss: 0.1770
Epoch 17/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.9494 - loss: 0.1646 - val_accuracy: 0.9505 - val_loss: 0.1689
Epoch 18/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9494 - loss: 0.1642 - val_accuracy: 0.9517 - val_loss: 0.1614
Epoch 19/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9510 - loss: 0.1578 - val_accuracy: 0.9548 - val_loss: 0.1545
Epoch 20/20
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - accuracy: 0.9546 - loss: 0.1489 - val_accuracy: 0.9507 - val_loss: 0.1685

<keras.src.callbacks.history.History at 0x7ff2529a1ae0>
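The 422 in each progress bar is the number of batches per epoch. With validation_split=0.1, Keras holds out the last 10% of the 60,000 training images for validation, leaving 54,000 for training; at a batch size of 128 that is 421.875 batches, rounded up to 422 because the final, partial batch is still used. A sketch of that arithmetic:

```python
import math

n_samples = 60_000      # size of the MNIST training set
validation_split = 0.1
batch_size = 128

n_train = int(n_samples * (1 - validation_split))  # samples left for training
steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be partial

print(n_train)          # 54000
print(steps_per_epoch)  # 422
```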

Once the model is trained, let’s evaluate it.

score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

Test loss: 0.19867141544818878
Test accuracy: 0.9429000020027161
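model.evaluate returns the loss followed by the metrics passed to compile, which is why score[0] is the categorical cross-entropy and score[1] is the accuracy. For reference, here is a minimal NumPy version of both quantities on a toy batch of one-hot labels and softmax-like outputs. The helpers are hypothetical; Keras’s own implementation differs in details, so treat this as a conceptual sketch.

```python
import numpy as np


def categorical_crossentropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean over samples of -sum(y_true * log(y_pred))."""
    eps = 1e-7  # keep probabilities away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))


def accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))


# Toy example: 3 samples, 3 classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]])

print(accuracy(y_true, y_pred))  # 0.6666666666666666 (third sample is wrong)
print(round(categorical_crossentropy(y_true, y_pred), 4))
```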

Congratulations! You have seen how to use Keras-MML in your Keras models!