Underfitting

SUMMARY

Underfitting occurs when a statistical model or machine learning algorithm is too simple to capture the underlying patterns in the data. This results in poor performance on both training and test datasets, indicating the model lacks sufficient complexity to represent the true relationships between variables.

Understanding underfitting

Underfitting represents one side of the fundamental bias-variance tradeoff in statistical learning. When a model underfits, it exhibits high bias: it makes strong assumptions about the data that may not be valid. This typically occurs when:

- The model is too simple for the complexity of the underlying data
- Important features or variables are omitted
- The chosen model structure cannot represent the true relationship

Mathematical representation

Consider a true relationship $f(x)$ and a model $\hat{f}(x)$. Informally, underfitting occurs when:

$\text{Complexity}(\hat{f}) \ll \text{Complexity}(f)$

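This results in a large approximation error, one that far exceeds the noise level in the data:

$\mathbb{E}[(f(x) - \hat{f}(x))^2] \gg \sigma^2$

where $\sigma^2$ represents the irreducible error, i.e. the variance of the noise that no model can remove.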
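
For reference, the standard bias-variance decomposition shows where this error comes from. As an assumption not stated above, suppose observations follow $y = f(x) + \varepsilon$ with zero-mean noise $\varepsilon$ of variance $\sigma^2$, and take expectations over training sets and noise at a fixed $x$:

$\mathbb{E}[(y - \hat{f}(x))^2] = (\mathbb{E}[\hat{f}(x)] - f(x))^2 + \text{Var}(\hat{f}(x)) + \sigma^2$

The first term is the squared bias, the second is the variance of the fitted model, and the third is the irreducible error. In an underfit model the squared bias is the dominant term, and for a fixed model class it generally does not shrink as more data is collected.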

Detecting underfitting

Common indicators of underfitting include:

- High training error
- High validation error
- Similar error metrics across training and validation sets
- Poor performance even on simple examples
- Residuals that show clear patterns, such as curvature or trend, rather than random scatter
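
These indicators can be checked on a toy problem. The following is a minimal sketch using only the Python standard library, and it assumes synthetic data with a quadratic true relationship and a deliberately too-simple straight-line model. The helper names (`fit_line`, `residuals`, `mean`), the seed, and the noise level are illustrative choices, not part of any particular library.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, intercept, slope):
    """Observed value minus the straight-line prediction, per point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
NOISE_STD = 0.1  # assumed noise level, so the noise variance is 0.01

# Synthetic data: the true relationship is quadratic, y = x^2 + noise.
xs = [rng.uniform(-3, 3) for _ in range(400)]
ys = [x * x + rng.gauss(0, NOISE_STD) for x in xs]
train_x, train_y = xs[:300], ys[:300]
val_x, val_y = xs[300:], ys[300:]

# Deliberately too-simple model: a straight line fitted on the training split.
intercept, slope = fit_line(train_x, train_y)
train_res = residuals(train_x, train_y, intercept, slope)
val_res = residuals(val_x, val_y, intercept, slope)
train_mse = mean([r * r for r in train_res])
val_mse = mean([r * r for r in val_res])

# Residual pattern check: average residual near the centre vs. at the edges.
centre_resid = mean([r for x, r in zip(train_x, train_res) if abs(x) < 1])
edge_resid = mean([r for x, r in zip(train_x, train_res) if abs(x) > 2])

print(f"train MSE={train_mse:.2f}, validation MSE={val_mse:.2f}")
print(f"mean residual: centre={centre_resid:.2f}, edges={edge_resid:.2f}")
```

In this setup both errors sit far above the noise variance and close to each other, and the residuals are systematically negative near the centre and positive at the edges. Together those point to underfitting rather than to noisy data.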

Addressing underfitting in financial models

In financial applications, underfitting can have serious consequences for risk management and trading strategies. Common solutions include:

Increasing model complexity
- Adding relevant features
- Using more sophisticated model architectures
- Incorporating non-linear relationships
Feature engineering
- Creating interaction terms
- Adding polynomial features
- Applying domain-specific transformations
Model selection
- Testing more complex model classes
- Using ensemble methods
- Incorporating domain knowledge
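
As an illustration of the feature engineering remedy, the sketch below fits the same simple linear model twice: once on a raw feature and once on a squared version of it. It assumes synthetic data in which the target depends on $x^2$. The data-generating formula, the seed, and the helper names (`fit_line`, `mse`, `validation_mse`) are hypothetical and chosen only for this example.

```python
import random


def fit_line(zs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * z."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    slope = szy / szz
    return mean_y - slope * mean_z, slope


def mse(zs, ys, intercept, slope):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (intercept + slope * z)) ** 2 for z, y in zip(zs, ys)) / len(zs)


rng = random.Random(1)  # fixed seed so the run is repeatable
NOISE_STD = 0.2  # assumed noise level, so the noise variance is 0.04

# Synthetic data: the target depends on x^2, not on x directly.
xs = [rng.uniform(-3, 3) for _ in range(400)]
ys = [2.0 + 0.5 * x * x + rng.gauss(0, NOISE_STD) for x in xs]
train_x, val_x = xs[:300], xs[300:]
train_y, val_y = ys[:300], ys[300:]


def validation_mse(transform):
    """Fit on a transformed feature using the training split, score on validation."""
    intercept, slope = fit_line([transform(x) for x in train_x], train_y)
    return mse([transform(x) for x in val_x], val_y, intercept, slope)


mse_raw = validation_mse(lambda x: x)          # raw feature only: underfits
mse_squared = validation_mse(lambda x: x * x)  # engineered feature x^2

print(f"validation MSE with x:   {mse_raw:.3f}")
print(f"validation MSE with x^2: {mse_squared:.3f}")
```

The model class is unchanged here. Only the input representation differs, which is often the cheapest way to reduce bias when domain knowledge suggests a specific non-linear relationship.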

Contrast with overfitting

While overfitting occurs when models are too complex and capture noise, underfitting represents the opposite extreme. Finding the right balance is crucial for model performance:

| Aspect | Underfitting | Optimal Fit | Overfitting |
| --- | --- | --- | --- |
| Complexity | Too low | Appropriate | Too high |
| Bias | High | Balanced | Low |
| Variance | Low | Balanced | High |
| Training Error | High | Medium | Low |
| Test Error | High | Low | High |

Applications in time series analysis

In time series modeling, underfitting often manifests when:

- Using linear models for non-linear trends
- Including too few lags in autoregressive models
- Overlooking seasonal patterns
- Ignoring relevant external factors

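For example, in ARIMA models, underfitting can occur when the order parameters $(p, d, q)$, which are the autoregressive order, the degree of differencing, and the moving-average order respectively, are too low to capture the data's temporal dynamics.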
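
The effect of too few autoregressive lags can be sketched without any time-series library. The example below is a simplified illustration, not a full ARIMA implementation: it assumes a simulated AR(2) process (so $d = 0$ and $q = 0$), fits AR(1) and AR(2) models by least squares, and compares one-step-ahead errors. The coefficients, seed, and function names are assumptions made for this sketch.

```python
import random


def simulate_ar2(n, phi1, phi2, rng, burn_in=200):
    """Simulate y_t = phi1*y_{t-1} + phi2*y_{t-2} + unit-variance Gaussian noise."""
    y = [0.0, 0.0]
    for _ in range(n + burn_in):
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0, 1))
    return y[-n:]  # drop the start-up values and the burn-in period


def fit_ar1(y):
    """Least-squares coefficient for y_t = a*y_{t-1} (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return [num / den]


def fit_ar2(y):
    """Least-squares coefficients for y_t = a*y_{t-1} + b*y_{t-2} (no intercept)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(y)):
        lag1, lag2 = y[t - 1], y[t - 2]
        s11 += lag1 * lag1
        s12 += lag1 * lag2
        s22 += lag2 * lag2
        r1 += lag1 * y[t]
        r2 += lag2 * y[t]
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations directly
    return [(r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det]


def one_step_mse(y, coeffs):
    """Mean squared one-step-ahead error of an AR model with the given coefficients."""
    p = len(coeffs)
    errors = [
        y[t] - sum(c * y[t - k - 1] for k, c in enumerate(coeffs))
        for t in range(p, len(y))
    ]
    return sum(e * e for e in errors) / len(errors)


rng = random.Random(7)  # fixed seed so the run is repeatable
series = simulate_ar2(2000, 0.5, -0.7, rng)  # assumed true AR(2) coefficients
train, val = series[:1500], series[1500:]

ar1 = fit_ar1(train)  # too few lags for this process
ar2 = fit_ar2(train)  # matches the true order
phi1_hat, phi2_hat = ar2

train_mse_ar1, train_mse_ar2 = one_step_mse(train, ar1), one_step_mse(train, ar2)
val_mse_ar1, val_mse_ar2 = one_step_mse(val, ar1), one_step_mse(val, ar2)

print(f"AR(1): train MSE={train_mse_ar1:.2f}, validation MSE={val_mse_ar1:.2f}")
print(f"AR(2): train MSE={train_mse_ar2:.2f}, validation MSE={val_mse_ar2:.2f}")
```

With these assumed coefficients the AR(1) model shows clearly higher error on both splits, which is the underfitting signature. The second lag carries information that a single lag cannot represent.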

Best practices to avoid underfitting

Systematic model selection
- Cross-validation
- Information criteria (AIC, BIC)
- Learning curve analysis
Feature analysis
- Correlation studies
- Feature importance rankings
- Domain expert consultation
Iterative refinement
- Start simple and gradually increase complexity
- Monitor both training and validation metrics
- Document model improvements
Regular model evaluation
- Out-of-sample testing
- Backtesting on historical data
- Performance monitoring in production
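
The "start simple and gradually increase complexity" practice can be sketched as a sweep over model complexity that records training error, validation error, and an information criterion at each step. The example below is a minimal standard-library sketch under several assumptions: synthetic data from a cubic relationship, polynomial regression solved through the normal equations, and the Gaussian least-squares form of AIC, $n \ln(\text{RSS}/n) + 2k$, with additive constants dropped. All function and variable names are illustrative.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def mse(xs, ys, coeffs):
    """Mean squared error of the polynomial with the given coefficients."""
    preds = (sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs)
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs)


rng = random.Random(42)  # fixed seed so the run is repeatable
xs = [rng.uniform(-1, 1) for _ in range(300)]
# Assumed true relationship: a cubic plus Gaussian noise.
ys = [0.5 - x + 2 * x ** 2 + 4 * x ** 3 + rng.gauss(0, 0.1) for x in xs]
train_x, train_y = xs[:200], ys[:200]
val_x, val_y = xs[200:], ys[200:]

results = {}
for degree in range(6):  # start simple, then add complexity one step at a time
    coeffs = fit_polynomial(train_x, train_y, degree)
    train_mse = mse(train_x, train_y, coeffs)
    results[degree] = {
        "train_mse": train_mse,
        "val_mse": mse(val_x, val_y, coeffs),
        # AIC for Gaussian least squares, constants dropped; k = degree + 1.
        "aic": len(train_x) * math.log(train_mse) + 2 * (degree + 1),
    }
    print(degree, {name: round(value, 4) for name, value in results[degree].items()})

best_by_val = min(results, key=lambda d: results[d]["val_mse"])
best_by_aic = min(results, key=lambda d: results[d]["aic"])
print(f"lowest validation MSE at degree {best_by_val}, lowest AIC at degree {best_by_aic}")
```

Degrees below the true order leave both errors high, and the large drop at degree 3 marks the point where underfitting ends. Beyond that point the validation error and AIC change little, so further complexity mainly adds variance.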

Conclusion

Underfitting represents a fundamental challenge in statistical modeling and machine learning. Understanding its causes and remedies is crucial for developing effective models, particularly in financial applications where accurate predictions and risk assessments are essential. By following systematic approaches to model development and validation, practitioners can better identify and address underfitting issues.