When building a neural network, it is generally recommended to start with a model simple enough to underfit, then gradually increase its complexity until you reach an appropriate level of fit. Here's a step-by-step approach:
Start with a simple model: Begin by designing a neural network architecture that is relatively simple. This means using fewer layers and neurons than you might ultimately need. A simple model serves as a baseline to understand the problem and establish a starting point for performance evaluation.
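As a minimal sketch of such a baseline (the function and layer sizes here are illustrative assumptions, not a prescribed design), a single-hidden-layer MLP in NumPy might look like:

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, seed=0):
    """Initialize a one-hidden-layer MLP: the simple baseline."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Forward pass: one ReLU hidden layer, linear output."""
    h = np.maximum(0, X @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

# Tiny example: 4 input features, 8 hidden units, 2 output classes.
params = init_mlp(n_in=4, n_hidden=8, n_out=2)
logits = forward(params, np.zeros((5, 4)))
print(logits.shape)  # (5, 2)
```

Keeping the hidden layer small like this makes the baseline cheap to train and easy to reason about before any tuning begins.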
Train and evaluate: Train the simple model on your training data and evaluate its performance on a separate validation dataset. This will give you an initial assessment of how well the model is learning from the data. If the model underfits, which means it performs poorly on both the training and validation data, it's an indicator that your model is too simple to capture the underlying patterns.
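To keep the sketch short, the loop below trains a logistic-regression model (a stand-in for the network) on synthetic data and compares training and validation accuracy; the data generation and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary data: the label is the sign of the first feature.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)

# Hold out the last 50 examples as a validation set.
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w, b, lr = np.zeros(4), 0.0, 0.5
for _ in range(200):  # plain gradient descent on the logistic loss
    p = sigmoid(X_train @ w + b)
    w -= lr * (X_train.T @ (p - y_train)) / len(y_train)
    b -= lr * np.mean(p - y_train)

def accuracy(X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

train_acc, val_acc = accuracy(X_train, y_train), accuracy(X_val, y_val)
```

A large gap between `train_acc` and `val_acc` signals overfitting; both being low signals underfitting.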
Gradually increase complexity: To address underfitting, gradually increase the complexity of your model. You can do this by adding more layers or neurons, or by switching to a more expressive architecture. Experiment with different network architectures and hyperparameters to find the right balance.
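One way to make this incremental growth concrete is to parameterize the architecture by a list of layer widths, so that adding depth or width is a one-line change (the helper names here are assumptions for illustration):

```python
import numpy as np

def build_mlp(layer_sizes, seed=0):
    """Create (weight, bias) pairs for an MLP with the given widths,
    e.g. [4, 8, 2] = 4 inputs, one hidden layer of 8, 2 outputs."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

def n_params(layers):
    """Total number of trainable parameters."""
    return sum(W.size + b.size for W, b in layers)

simple = build_mlp([4, 8, 2])         # the baseline
bigger = build_mlp([4, 64, 64, 2])    # wider and deeper variant
print(n_params(simple), n_params(bigger))  # 58 4610
```

Tracking the parameter count as you grow the model gives a rough sense of how much extra capacity each change adds.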
Regularization: As you increase complexity, be mindful of overfitting. If your model starts to overfit the training data (performing well on training data but poorly on validation data), apply regularization techniques such as dropout, L1/L2 regularization, or early stopping to curb it.
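Two of these techniques are easy to sketch: L2 regularization adds a weight-decay term to the gradient, and early stopping halts training once validation loss stops improving for a set number of epochs. The function names and patience value below are illustrative assumptions:

```python
import numpy as np

def l2_grad(grad_w, w, lam=1e-3):
    """Gradient of loss + (lam/2)*||w||^2 with respect to w:
    the plain gradient plus a weight-decay term."""
    return grad_w + lam * w

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which to stop: the first epoch where the
    validation loss has not improved for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # patience exhausted, stop here
    return len(val_losses) - 1

# A validation-loss curve that improves, then drifts back up
# (the classic overfitting signature).
losses = [1.0, 0.7, 0.5, 0.45, 0.46, 0.48, 0.52, 0.60]
print(early_stop_epoch(losses))  # 6
```

In practice you would also restore the weights saved at the best epoch (epoch 3 here), not just stop training.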
Iterate: Repeat the process of training, evaluating, and adjusting the model's complexity until you achieve a satisfactory level of performance on both the training and validation datasets.
Test on unseen data: Once you have a model that performs well on both training and validation data, evaluate it on a separate test dataset to ensure that it generalizes effectively to new, unseen examples.
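A common way to keep the test set genuinely unseen is to carve out all three splits once, up front, and touch the test portion only for the final evaluation. The split fractions and helper name below are illustrative assumptions:

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve out validation and test sets.
    The test set is used only for the final evaluation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

X = np.arange(100, dtype=float).reshape(100, 1)
y = np.arange(100)
train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 70 15 15
```

Because the test set never influences architecture or hyperparameter choices, its accuracy is an honest estimate of generalization to new data.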
Starting with underfitting and gradually increasing complexity allows you to systematically explore the model's capacity to learn from the data without risking immediate overfitting. This approach helps you find an appropriate level of model complexity that strikes a balance between underfitting and overfitting for your specific task.