Hyperparameter Tuning in Deep Learning

Across this series we have built networks, chosen activations, trained them with backpropagation, picked optimizers, explored architectures, and reused pretrained models. Through all of it, a handful of decisions kept coming up that the model does not make for you: how fast it should learn, how big it should be, how hard you should push back against overfitting. These are the hyperparameters, and getting them roughly right is usually what separates a model that works from one that doesn't.

Parameters Versus Hyperparameters

It helps to be clear on the distinction. Parameters are the weights and biases inside the network, and the whole point of training is that the model learns these on its own. Hyperparameters are the settings you choose before and around training, the ones the learning process never touches. The learning rate, the number of layers, the number of neurons per layer, the batch size, how much dropout to apply, and how many epochs to train for are all hyperparameters. The model cannot discover them by gradient descent, so the job falls to you.
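To make the distinction concrete, here is a minimal sketch in plain Python rather than a real network. The toy data (generated from y = 2x + 1) and the specific values of learning_rate and epochs are assumptions chosen for illustration. The loop changes w and b, which are the parameters, and never touches the two hyperparameters.

```python
# Hyperparameters: chosen by you before training starts.
learning_rate = 0.05
epochs = 500

# Parameters: start at arbitrary values, then get learned from the data.
w, b = 0.0, 0.0

# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

for _ in range(epochs):
    # Gradients of the mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Gradient descent updates the parameters, never the hyperparameters.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.3f}, b={b:.3f}; learning_rate is still {learning_rate}")
```

Nothing in the loop could have told you that 0.05 was a workable learning rate for this problem. That choice had to come from outside the training process.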

The One That Matters Most

If you only tune one thing, tune the learning rate. As we saw in the chapters on backpropagation and optimizers, a rate that is too high makes training diverge and a rate that is too low makes it crawl or stall. A good starting point with Adam is 0.001, which is also the Keras default. A useful habit is to try a few values spaced a factor of ten apart, like 0.01, 0.001, and 0.0001, and watch how the loss behaves in the first few epochs before committing.
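The three behaviours are easy to reproduce on a toy problem. The sketch below runs plain gradient descent (not Adam) on the loss x squared, and the helper name final_loss and the three rates are assumptions picked to suit this toy loss. They are not recommended values for a real network.

```python
def final_loss(learning_rate, steps=50, start=5.0):
    """Run plain gradient descent on loss(x) = x**2 and return the last loss."""
    x = start
    for _ in range(steps):
        gradient = 2 * x              # derivative of x**2
        x -= learning_rate * gradient
    return x * x

# Rates chosen to suit this toy loss, not as defaults for a real network.
for lr in (1.1, 0.1, 0.0001):
    print(f"lr={lr}: loss after 50 steps = {final_loss(lr):.4g}")
```

The largest rate overshoots the minimum by more on every step, so the loss grows without bound. The middle rate drives the loss to almost zero, and the smallest rate barely moves it from its starting value of 25.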

Overfitting, Underfitting, and Why You Need a Validation Set

Most hyperparameter decisions come down to balancing two failure modes. Underfitting is when the model is too simple or undertrained to capture the patterns in the data, so it performs poorly even on the examples it was trained on. Overfitting is the opposite: the model memorises the training data, including its noise, and then fails on anything new.

You cannot see either problem if you only look at training accuracy, which is why you always hold out a separate validation set that the model never trains on. When training accuracy keeps climbing while validation accuracy stalls or drops, you are overfitting. When both are low, you are underfitting. Watching the gap between the two is the single most informative thing you can do while tuning.
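As a rough sketch, the rule of thumb above can be written as a small function. The name diagnose and the two thresholds are illustrative assumptions, not standard values. What counts as "low" accuracy or a "large" gap depends entirely on the task.

```python
def diagnose(train_acc, val_acc, low=0.70, max_gap=0.10):
    """Label a training run from its accuracies (both between 0 and 1).

    `low` and `max_gap` are illustrative thresholds, not standard values.
    """
    if train_acc < low and val_acc < low:
        return "underfitting"      # poor even on the data it trained on
    if train_acc - val_acc > max_gap:
        return "overfitting"       # good on training data, worse on new data
    return "reasonable fit"

print(diagnose(0.99, 0.78))  # overfitting
print(diagnose(0.62, 0.60))  # underfitting
print(diagnose(0.91, 0.89))  # reasonable fit
```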

The Knobs and What They Do

Model size, meaning the number of layers and neurons, controls capacity. Too small underfits; too large overfits and trains slowly. Start modest and grow only if the model is underfitting.
Batch size affects training speed and stability. Common values are 32, 64, and 128. Larger batches use the hardware more efficiently, so each epoch finishes sooner, but they need more memory and can generalise slightly worse.
Regularization fights overfitting. Dropout randomly switches off a fraction of neurons during training so the network can't lean too hard on any one path. Weight decay pulls every weight a little towards zero at each update, which discourages the large weights that tend to go with memorisation. Both are worth adding once you see a train-validation gap.
Epochs decide how long you train. Too few underfits, too many overfits, and the clean fix is early stopping, which halts training automatically when validation performance stops improving.
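The two regularizers mentioned above are simple enough to sketch in plain Python. The function names, the fixed seed, and the example values are assumptions for illustration. In a real framework you would use the built-in dropout layer and the optimizer's weight decay option rather than code like this.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, then scale
    the survivors by 1 / (1 - rate) so each value's expectation is unchanged."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def weight_decay_step(weights, learning_rate, decay):
    """Shrink every weight slightly towards zero. This is only the decay part
    of an update; the usual gradient term is left out to keep the sketch short."""
    return [w * (1.0 - learning_rate * decay) for w in weights]

rng = random.Random(0)  # fixed seed so the run is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
print(weight_decay_step([2.0, -4.0], learning_rate=0.1, decay=0.01))
```

Dropout is applied only during training. At prediction time every neuron stays on, which is why the surviving activations are scaled up during training to compensate.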

How to Search

You do not need an elaborate method to start. Tuning a few values by hand, guided by the train-validation gap, gets you a long way. When you want to be more systematic, two automatic approaches are common. Grid search tries every combination from lists you specify, which is thorough but expensive, because the number of runs multiplies with every hyperparameter you add. Random search samples combinations at random and, perhaps surprisingly, usually finds good settings faster than grid search for the same budget. Typically only one or two hyperparameters really matter for a given problem, and random search tries a fresh value of each of them on every trial, whereas a grid keeps revisiting the same few values. Beyond those, tools that use smarter strategies can automate the whole process, but they are worth reaching for only after the simpler approaches stop being enough.
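Here is a minimal sketch of how the two strategies generate candidate configurations. The search space, the function names, and the trial count are hypothetical. Each configuration would then be passed to your own training code and scored on the validation set, a step that is left out here.

```python
import itertools
import random

# Hypothetical search space; the values are examples only.
space = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "dropout": [0.0, 0.2, 0.5],
    "batch_size": [32, 64, 128],
}

def grid_search(space):
    """Return every combination of the listed values."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*space.values())]

def random_search(space, n_trials, seed=0):
    """Return n_trials configurations, each value drawn independently."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_trials)]

grid = grid_search(space)
sampled = random_search(space, n_trials=8)
print(len(grid), "grid configurations;", len(sampled), "random configurations")
# prints: 27 grid configurations; 8 random configurations
```

Three hyperparameters with three values each already cost 27 training runs on a grid, while random search lets you fix the budget up front. In practice random search often draws from continuous ranges instead of short lists, which is where its advantage mostly comes from.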

A Practical Workflow

A reliable order of operations looks like this. Get a baseline model training at all, even a mediocre one, so you have something to improve. Tune the learning rate first, since it has the largest effect. If the model is underfitting, increase its capacity or train longer. If it is overfitting, add dropout or weight decay and lean on early stopping. Change one thing at a time so you can tell what actually helped, and let the validation set be your judge throughout.

Useful Tools in Keras

Two callbacks handle a lot of this automatically:

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    # stop when validation loss stops improving, keep the best weights
    EarlyStopping(monitor='val_loss', patience=5,
                  restore_best_weights=True),
    # drop the learning rate when progress plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
]

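# model, X_train, y_train, X_val and y_val are assumed to exist already
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=100,  # an upper limit; early stopping usually ends sooner
          callbacks=callbacks)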

Between early stopping and an automatic learning rate reduction, you can set a generous epoch count and let training find its own sensible stopping point.
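To see what the two callbacks are doing, the sketch below replays a list of validation losses through the same patience logic in plain Python. It is a simplified model of the idea, not Keras's implementation, which has extra options this sketch ignores. The function name and the loss values are made up for illustration.

```python
def simulate_callbacks(val_losses, lr=0.001, stop_patience=5,
                       lr_patience=3, factor=0.5):
    """Replay per-epoch validation losses and report where early stopping
    would halt, which epoch was best, and the final learning rate."""
    best_loss, best_epoch = float("inf"), -1
    since_best = 0       # epochs since the best loss, for early stopping
    since_lr_drop = 0    # epochs without improvement since the last LR cut
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            since_best = since_lr_drop = 0
            continue
        since_best += 1
        since_lr_drop += 1
        if since_lr_drop >= lr_patience:
            lr *= factor             # progress has plateaued, so slow down
            since_lr_drop = 0
        if since_best >= stop_patience:
            return {"stopped_at": epoch, "best_epoch": best_epoch, "lr": lr}
    return {"stopped_at": None, "best_epoch": best_epoch, "lr": lr}

# Made-up validation losses: improvement stops after epoch 2.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64, 0.65, 0.66]
print(simulate_callbacks(losses))
```

With these numbers the learning rate is halved once, training stops at epoch 7 (counting from zero), and the best epoch is epoch 2. Those are the weights restore_best_weights=True would bring back.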

Key Takeaways

Parameters are learned during training; hyperparameters are the choices you make around it.
The learning rate is the most important hyperparameter to get right.
A held-out validation set is what reveals overfitting and underfitting.
Model size, batch size, regularization, and epochs are the main knobs.
Tune by hand first, reach for random search when you need to be systematic, and change one thing at a time.

Where to Go From Here

That completes the journey from a single neuron to training, tuning, and adapting modern deep learning models. From here the path splits by interest. If images are your focus, the ideas here lead straight into computer vision, which builds on the CNN material. If language is yours, they lead into natural language processing, picking up where the Transformer chapter left off. And when you are ready to build real applications on top of large language models, the RAG field manual takes the embeddings and Transformer ideas from this series straight into production systems.
