Hyperparameter Tuning With KerasTuner

Jul 5, 2024

In this example, we will explore the use of KerasTuner to tune models that use layers from Keras-MML.

Important

You will need to install the KerasTuner package for this example.

%pip install keras-tuner~=1.4.7

Requirement already satisfied: keras-tuner~=1.4.7 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (1.4.7)
Requirement already satisfied: keras in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras-tuner~=1.4.7) (3.3.3)
Requirement already satisfied: packaging in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras-tuner~=1.4.7) (24.1)
Requirement already satisfied: requests in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras-tuner~=1.4.7) (2.32.3)
Requirement already satisfied: kt-legacy in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras-tuner~=1.4.7) (1.0.5)
Requirement already satisfied: absl-py in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (2.1.0)
Requirement already satisfied: numpy in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (1.26.4)
Requirement already satisfied: rich in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (13.7.1)
Requirement already satisfied: namex in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (0.0.8)
Requirement already satisfied: h5py in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (3.11.0)
Requirement already satisfied: optree in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (0.11.0)
Requirement already satisfied: ml-dtypes in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from keras->keras-tuner~=1.4.7) (0.3.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from requests->keras-tuner~=1.4.7) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from requests->keras-tuner~=1.4.7) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from requests->keras-tuner~=1.4.7) (2.2.2)
Requirement already satisfied: certifi>=2017.4.17 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from requests->keras-tuner~=1.4.7) (2024.6.2)
Requirement already satisfied: typing-extensions>=4.0.0 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from optree->keras->keras-tuner~=1.4.7) (4.12.2)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from rich->keras->keras-tuner~=1.4.7) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from rich->keras->keras-tuner~=1.4.7) (2.18.0)
Requirement already satisfied: mdurl~=0.1 in /home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->keras->keras-tuner~=1.4.7) (0.1.2)
Note: you may need to restart the kernel to use updated packages.

Note

We will use the JAX backend for faster execution of the code. Feel free to skip the cell below.

import os
os.environ["KERAS_BACKEND"] = "jax"

We will perform hyperparameter tuning on a simple multi-layer perceptron (MLP) that aims to classify handwritten digits in the MNIST dataset.

Of course, other neural network architectures such as convolutional neural networks (CNNs) are better suited for this task, but for this example we will stick with MLPs.

Setup

First, let’s define some constants relating to the data.

NUM_CLASSES = 10        # 10 distinct classes, 0 to 9
INPUT_SHAPE = (28, 28)  # 28 x 28 greyscale images

Load the data from the MNIST dataset, which is already available in Keras.

import keras

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

Now we perform some simple preprocessing.

x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)
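As a rough illustration of what these two steps do, the sketch below rescales a few sample pixel values and one-hot encodes a single label in plain Python. The one_hot helper is hypothetical: it only mimics the effect of keras.utils.to_categorical on one label and is not how Keras implements it.

```python
NUM_CLASSES = 10  # same constant as in the example above


def one_hot(label, num_classes=NUM_CLASSES):
    """Hypothetical helper: 1.0 at index `label`, 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


# Pixel values are integers in [0, 255]; dividing by 255 maps them to [0, 1]
pixel_values = [0, 128, 255]
scaled = [p / 255 for p in pixel_values]

# The digit 3 becomes a length-10 vector with a single 1.0 at index 3
encoded = one_hot(3)

print(scaled)
print(encoded)
```

The one-hot form is what the categorical_crossentropy loss used later expects for its targets.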

We will further split x_train and y_train into a training set and a validation set, holding out the last 10,000 samples for validation.

x_val = x_train[-10000:]
x_train = x_train[:-10000]

y_val = y_train[-10000:]
y_train = y_train[:-10000]
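The negative-index slicing can be checked on a stand-in list of sample indices. This is only a sketch: it relies on MNIST's standard training split containing 60,000 samples, and the names NUM_TRAIN_TOTAL and VAL_SIZE are introduced here for illustration.

```python
NUM_TRAIN_TOTAL = 60_000  # size of MNIST's standard training split
VAL_SIZE = 10_000         # number of samples held out for validation

indices = list(range(NUM_TRAIN_TOTAL))

val_indices = indices[-VAL_SIZE:]    # the last 10,000 samples
train_indices = indices[:-VAL_SIZE]  # everything before them

print(len(train_indices), len(val_indices))
```

The two slices do not overlap, so no validation sample is seen during training.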

Defining (Our Initial) Tunable Model

To allow KerasTuner to search for the best set of hyperparameters, we need to write a function that takes in the hyperparameters and returns a compiled Keras model. The convention for such a function is to accept an argument hp for the hyperparameters when building the model.

Defining the Search Space

In the following example, we will define a simple MLP with two DenseMML layers and a Dense layer (which acts as the classification head). Suppose we want to tune the number of units in the first DenseMML layer. To do so, we define an integer hyperparameter with hp.Int("units", min_value=32, max_value=512, step=32). This means that the hyperparameter

is named units;
can have a minimum value of 32;
can have a maximum value of 512; and
can take values in intervals of 32.

import keras_tuner
import keras_mml


def build_model(hp: keras_tuner.HyperParameters):
    model = keras.Sequential(
        [
            keras.Input(shape=INPUT_SHAPE),
            keras.layers.Flatten(),
            keras_mml.layers.DenseMML(hp.Int("units", min_value=32, max_value=512, step=32)),
            keras_mml.layers.DenseMML(256),
            keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # The last layer needs to be `Dense` for the output to work
        ]
    )
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
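To make the units search space concrete, the candidate values implied by min_value=32, max_value=512, and step=32 can be listed with a plain range. This sketch does not call KerasTuner; it only reproduces the arithmetic, treating max_value as inclusive.

```python
MIN_VALUE, MAX_VALUE, STEP = 32, 512, 32

# `max_value` is inclusive, so the range must stop one past it
unit_candidates = list(range(MIN_VALUE, MAX_VALUE + 1, STEP))

print(unit_candidates)
print(len(unit_candidates))
```

With 16 candidates and only a handful of trials, a random search samples a small part of this space.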

We can quickly check that the model indeed builds successfully.

build_model(keras_tuner.HyperParameters())

An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.

<Sequential name=sequential, built=True>

Starting the Search

After defining the search space, we need to select a tuner class to run the search. Here we use RandomSearch as an example.

We need to specify several arguments to initialize the RandomSearch tuner.

hypermodel: The model-building function, which is build_model in this example.
objective: The name of the objective to optimize.
- Note that the decision whether to minimize or maximize the objective is automatically inferred for built-in metrics (e.g., loss, acc).
max_trials: The total number of trials to run during the search.
executions_per_trial: The number of models that should be built and fit for each trial.
overwrite: Controls whether to overwrite the previous results in the same directory (True) or resume the previous search instead (False).
directory: A path to a directory for storing the search results.
project_name: The name of the subdirectory in the directory.

What is a “trial”?

To search for the best hyperparameter values, the tuner runs multiple trials, each using a different set of hyperparameter values. Executions within the same trial share the same hyperparameter values. Running several executions per trial reduces the effect of run-to-run variance in training (for example, from random weight initialization) on a trial's score. If you want results faster, you can set executions_per_trial=1.
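The relationship between trials and executions can be sketched as a toy random search in plain Python. This is not KerasTuner's implementation: fake_val_accuracy is a made-up stand-in for training a model, the sketch assumes each trial's score is the mean of its executions, and unlike a real tuner it makes no attempt to avoid sampling the same value twice.

```python
import random
import statistics


def fake_val_accuracy(units, rng):
    """Hypothetical stand-in for fitting a model and reading val_accuracy."""
    # Made-up score: depends on `units`, plus some run-to-run noise
    return 0.90 + 0.0001 * min(units, 256) + rng.uniform(-0.01, 0.01)


def random_search(max_trials, executions_per_trial, seed=0):
    rng = random.Random(seed)
    candidates = list(range(32, 512 + 1, 32))
    results = []
    for trial_id in range(max_trials):
        # One set of hyperparameter values per trial...
        units = rng.choice(candidates)
        # ...shared by every execution within that trial
        scores = [fake_val_accuracy(units, rng) for _ in range(executions_per_trial)]
        results.append({"trial": trial_id, "units": units, "score": statistics.mean(scores)})
    # Best trial first, since the objective is maximized
    return sorted(results, key=lambda r: r["score"], reverse=True)


results = random_search(max_trials=3, executions_per_trial=2)
best = results[0]
print(best)
```

With max_trials=3 and executions_per_trial=2, six models are trained in total.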

tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective="val_accuracy",
    max_trials=3,
    executions_per_trial=2,
    overwrite=True,
    directory="misc/hyperparameter_tuning_example",
    project_name="my_tunable_model_1",
)

Once we have defined the tuner, we can print out a summary of the search space.

tuner.search_space_summary()

Search space summary
Default search space size: 1
units (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}

We can now start the search for the best hyperparameter configuration. All the arguments passed to search are passed on to model.fit() in each execution.

Important

Remember to pass validation_data to evaluate the model!

tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 3 Complete [00h 00m 28s]
val_accuracy: 0.9167500138282776

Best val_accuracy So Far: 0.9240500032901764
Total elapsed time: 00h 01m 07s

Querying the Results

We can now retrieve the best models from the search.

models = tuner.get_best_models(num_models=2)  # Gets the top 2 models
best_model = models[0]
best_model.summary()

/home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages/keras/src/saving/saving_lib.py:415: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 18 variables. 
  saveable.load_own_variables(weights_store.get(inner_path))

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 784)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml (DenseMML)            │ (None, 160)            │       126,384 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml_1 (DenseMML)          │ (None, 256)            │        41,376 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 10)             │         2,570 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 170,330 (665.35 KB)

 Trainable params: 170,330 (665.35 KB)

 Non-trainable params: 0 (0.00 B)
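The parameter counts in this summary can be reproduced with a little arithmetic. The sketch below is an inference from the reported numbers, not something the source states: the counts are consistent with each DenseMML layer holding a kernel, a bias, and one extra parameter per input feature. The function names are introduced here for illustration only.

```python
def dense_mml_params(inputs, units):
    # Assumption inferred from the summary: kernel + bias + one extra
    # parameter per input feature
    return inputs * units + units + inputs


def dense_params(inputs, units):
    # Standard Dense layer: kernel + bias
    return inputs * units + units


counts = [
    dense_mml_params(784, 160),  # dense_mml
    dense_mml_params(160, 256),  # dense_mml_1
    dense_params(256, 10),       # dense (classification head)
]
total = sum(counts)
size_kb = total * 4 / 1024  # 4 bytes per float32 parameter

print(counts, total, round(size_kb, 2))
```

The same formulas also match the layer counts reported for the second model later in this example.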

We can also get a summary of the search results.

tuner.results_summary()

Results summary
Results in misc/hyperparameter_tuning_example/my_tunable_model_1
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 1 summary
Hyperparameters:
units: 160
Score: 0.9240500032901764

Trial 0 summary
Hyperparameters:
units: 96
Score: 0.9227499961853027

Trial 2 summary
Hyperparameters:
units: 352
Score: 0.9167500138282776

Retraining the Model

If you want to train the model on the entire training dataset (including the validation split), you can retrieve the best hyperparameters and retrain the model yourself.

# Get the top 2 sets of hyperparameters
best_hps = tuner.get_best_hyperparameters(2)

# Build the model with the best hyperparameters
model = build_model(best_hps[0])

Combine the training and validation sets into one larger training set.

import numpy as np

x_all = np.concatenate((x_train, x_val))
y_all = np.concatenate((y_train, y_val))

Now fit the model on that set.

model.fit(x=x_all, y=y_all, epochs=2)

Epoch 1/2
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.8602 - loss: 0.4470
Epoch 2/2
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9142 - loss: 0.2894

<keras.src.callbacks.history.History at 0x7f53a811b4f0>
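The 1875 steps per epoch shown in the progress bar follow from the combined dataset size and the batch size. The sketch below assumes the Keras default batch size of 32, which applies here because model.fit was called without a batch_size argument.

```python
import math

num_samples = 50_000 + 10_000  # training split plus validation split
batch_size = 32                # Keras default when `batch_size` is not given

steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)
```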

A More Complicated Tunable Model

Now that we’ve seen an introduction to how KerasTuner works, let’s make a more complex model.

In our new model, we make the tuner

determine the number of hidden layers to use via the num_layers hyperparameter;
determine the number of units for each hidden layer via each individual units_{i} hyperparameter;
determine the common activation for all hidden layers via the activation hyperparameter; and
decide whether to include 25% dropout via the dropout hyperparameter.

def build_model_new(hp: keras_tuner.HyperParameters):
    model = keras.Sequential()
    
    # These layers are the same as the previous model
    model.add(keras.Input(shape=INPUT_SHAPE))
    model.add(keras.layers.Flatten())
    
    # Tune the number of layers
    for i in range(hp.Int("num_layers", 1, 3)):  # 1 to 3 hidden layers
        model.add(
            keras_mml.layers.DenseMML(
                units=hp.Int(f"units_{i}", min_value=32, max_value=512, step=32),
                activation=hp.Choice("activation", ["relu", "tanh", "linear"])
            )
        )
    
    # Add dropout, if specified by the hyperparameters
    if hp.Boolean("dropout"):
        model.add(keras.layers.Dropout(rate=0.25))
    
    # Classification head
    model.add(keras.layers.Dense(NUM_CLASSES, activation="softmax"))  # The last layer needs to be `Dense` for the output to work
    
    # Compile and return the model
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

Again, we will use RandomSearch to find the best hyperparameters. However, we will increase the number of trials to 5.

tuner = keras_tuner.RandomSearch(
    hypermodel=build_model_new,
    objective="val_accuracy",
    max_trials=5,
    executions_per_trial=2,
    overwrite=True,
    directory="misc/hyperparameter_tuning_example",
    project_name="my_tunable_model_2",
)

Let’s look at the search space now.

tuner.search_space_summary()

Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 1, 'max_value': 3, 'step': 1, 'sampling': 'linear'}
units_0 (Int)
{'default': None, 'conditions': [], 'min_value': 32, 'max_value': 512, 'step': 32, 'sampling': 'linear'}
activation (Choice)
{'default': 'relu', 'conditions': [], 'values': ['relu', 'tanh', 'linear'], 'ordered': False}
dropout (Boolean)
{'default': False, 'conditions': []}
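The summary lists only units_0, presumably because the search space is discovered by building the model once with default values, where num_layers takes its minimum of 1; units_1 and units_2 are registered once a trial samples a deeper model. To get a feel for the full space, the sketch below counts the distinct model configurations that build_model_new can produce. It is plain arithmetic rather than a KerasTuner call, and the constant names are introduced here for illustration.

```python
UNIT_CHOICES = len(range(32, 512 + 1, 32))  # 16 possible widths per hidden layer
ACTIVATIONS = 3                             # "relu", "tanh", "linear"
DROPOUT_OPTIONS = 2                         # dropout on or off

# For a model with n hidden layers, each layer picks its own width, while
# the activation and the dropout switch are shared across the whole model
configs_per_depth = {
    n: UNIT_CHOICES**n * ACTIVATIONS * DROPOUT_OPTIONS for n in (1, 2, 3)
}
total_configs = sum(configs_per_depth.values())

print(configs_per_depth)
print(total_configs)
```

With max_trials=5, the search below samples only a tiny fraction of these configurations, so the best result it finds should be read as a reasonable starting point rather than an optimum.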

Start the search.

tuner.search(x_train, y_train, epochs=2, validation_data=(x_val, y_val))

Trial 5 Complete [00h 00m 33s]
val_accuracy: 0.9699999988079071

Best val_accuracy So Far: 0.9699999988079071
Total elapsed time: 00h 02m 13s

Get the best model…

models = tuner.get_best_models(num_models=1)  # Even when `num_models` is 1, `models` returns a list...
best_model = models[0]                        # ...so we still have to do this
best_model.summary()

/home/vscode/.cache/pypoetry/virtualenvs/keras-matmulless-b9IALFmu-py3.10/lib/python3.10/site-packages/keras/src/saving/saving_lib.py:415: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 18 variables. 
  saveable.load_own_variables(weights_store.get(inner_path))

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 784)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml (DenseMML)            │ (None, 448)            │       352,464 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_mml_1 (DenseMML)          │ (None, 128)            │        57,920 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 128)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 10)             │         1,290 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 411,674 (1.57 MB)

 Trainable params: 411,674 (1.57 MB)

 Non-trainable params: 0 (0.00 B)

…and a summary of the results.

tuner.results_summary()

Results summary
Results in misc/hyperparameter_tuning_example/my_tunable_model_2
Showing 10 best trials
Objective(name="val_accuracy", direction="max")

Trial 4 summary
Hyperparameters:
num_layers: 2
units_0: 448
activation: relu
dropout: True
units_1: 128
Score: 0.9699999988079071

Trial 3 summary
Hyperparameters:
num_layers: 2
units_0: 384
activation: relu
dropout: True
units_1: 352
Score: 0.9695000052452087

Trial 2 summary
Hyperparameters:
num_layers: 2
units_0: 224
activation: tanh
dropout: False
units_1: 448
Score: 0.9628500044345856

Trial 0 summary
Hyperparameters:
num_layers: 2
units_0: 256
activation: relu
dropout: False
units_1: 32
Score: 0.9627000093460083

Trial 1 summary
Hyperparameters:
num_layers: 1
units_0: 64
activation: tanh
dropout: True
units_1: 224
Score: 0.9449999928474426

Conclusion

In this code example, we showed how KerasTuner can be used with Keras-MML layers for hyperparameter tuning.