A new study from Google and DeepMind presents Geometric Complexity (GC) for analyzing neural networks and understanding deep learning models
Understanding how regularization affects the properties of the learned solution is a growing research topic, and a particularly important one in deep learning. Regularization can take many forms: it can be included explicitly as a penalty term in the loss function, or implicitly through the choice of hyperparameters, model architecture, or initialization. In practice, regularization is commonly used to control model complexity, pushing a model toward simple solutions rather than complicated ones, even though the effects of many of these forms of regularization are difficult to analyze mathematically.
A clear definition of model “complexity” for deep neural networks is necessary to understand regularization in deep learning. However, many complexity measures from classical learning theory run into problems when applied to neural networks. The recently observed phenomenon of “double descent” is a good example: as model size grows past the point where the network can fit the training data exactly, test error can fall again, so networks with high model complexity can interpolate the training data while still achieving low test error. This contradicts the classical expectation that interpolating the training data is evidence of overfitting.
A new study by researchers from Google and DeepMind presents Geometric Complexity (GC), a new measure of model complexity with properties well suited to studying deep neural networks. Using theoretical and empirical methods, the researchers show that a wide variety of regularization and training heuristics control geometric complexity through different mechanisms. These standard training heuristics include:
- Initialization of overparameterized models with many layers
- Common initialization schemes
- Larger learning rates, smaller batch sizes, and implicit gradient regularization
- Flatness regularization, label noise, spectral norm regularization, and explicit parameter norm regularization
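The summary does not say how GC is measured. In the underlying paper it is defined as a discrete Dirichlet energy: the average, over a dataset D, of the squared Frobenius norm of the model's Jacobian with respect to its input, GC = (1/|D|) Σ_{x∈D} ||∇_x f(x)||_F². The minimal NumPy sketch below computes this quantity for a small fully connected ReLU network and uses it to illustrate one heuristic from the list, explicit parameter-norm control: with zero biases, multiplying every weight matrix of an L-layer ReLU network by c > 0 leaves the activation pattern unchanged, so GC scales exactly by c^(2L). The layer sizes, He-style initialization, zero biases, and helper names are assumptions for illustration; this is a toy check, not the paper's experiments or proofs.

```python
import numpy as np

def forward(weights, biases, x):
    """Output of a ReLU MLP whose last layer is linear (e.g. logits)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W @ h + b, 0.0)
    return weights[-1] @ h + biases[-1]

def input_jacobian(weights, biases, x):
    """Jacobian df/dx of the ReLU MLP at x.

    The network is piecewise linear, so J = W_L D_{L-1} W_{L-1} ... D_1 W_1,
    where D_l is the 0/1 mask of active ReLUs in hidden layer l.
    """
    h, J = x, np.eye(x.shape[0])
    for W, b in zip(weights[:-1], biases[:-1]):
        pre = W @ h + b
        mask = (pre > 0).astype(float)
        h = pre * mask
        J = mask[:, None] * (W @ J)  # chain rule through one ReLU layer
    return weights[-1] @ J           # final linear layer

def geometric_complexity(weights, biases, X):
    """Discrete Dirichlet energy: mean over rows x of X of ||df/dx||_F^2."""
    return float(np.mean([np.sum(input_jacobian(weights, biases, x) ** 2) for x in X]))

# Illustrative setup (assumed, not from the paper): He-style init, zero biases.
rng = np.random.default_rng(0)
sizes = [5, 64, 64, 2]
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
X = rng.normal(size=(200, 5))

base = geometric_complexity(weights, biases, X)
L = len(weights)
# With zero biases, scaling every weight matrix by c > 0 keeps each ReLU's
# on/off state, scales the Jacobian by c**L, and therefore GC by c**(2L).
for c in (0.5, 0.9, 1.1):
    ratio = geometric_complexity([c * W for W in weights], biases, X) / base
    print(f"c={c}: GC ratio {ratio:.4f} vs c^(2L) = {c ** (2 * L):.4f}")
```

Because weight decay pulls weights toward smaller norms, this toy case hints at why norm penalties tend to lower GC. With nonzero biases, rescaling only the weights can change the activation pattern, so the relationship is no longer an exact power law.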
Their results also show that geometric complexity captures the double descent behavior observed in the test loss as the number of model parameters increases.
During training, the researchers use plain stochastic gradient descent (SGD) without momentum to isolate the effects under study from those of the optimizer. To avoid masking effects, they study the influence of one training heuristic at a time on geometric complexity; for this reason, the main body of the paper does not use data augmentation or learning rate schedules. In the supplementary material (SM), they replicate most of the experiments using SGD with momentum and Adam and find comparable results. They also observe the same geometric complexity behavior in a setting where a learning rate schedule, data augmentation, and explicit regularization are combined to improve model performance.
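To make the optimizer distinction concrete: plain SGD updates the parameters using only the current gradient, while SGD with momentum also carries a running velocity from earlier steps. The sketch below shows both update rules on a toy least-squares problem, using the heavy-ball form v ← μv + g, θ ← θ − lr·v (the form used by common deep-learning libraries). The function names, the toy problem, and the hyperparameters are illustrative assumptions, and full-batch gradients stand in for minibatch ones to keep the example deterministic.

```python
import numpy as np

def sgd_step(theta, grad, lr):
    """Plain SGD (no momentum): theta <- theta - lr * grad."""
    return theta - lr * grad

def sgd_momentum_step(theta, velocity, grad, lr, mu):
    """Heavy-ball momentum: v <- mu * v + grad; theta <- theta - lr * v."""
    velocity = mu * velocity + grad
    return theta - lr * velocity, velocity

# Toy loss 0.5 * ||A theta - y||^2 (an assumption for illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
y = rng.normal(size=20)

def grad_fn(theta):
    """Gradient of the toy least-squares loss."""
    return A.T @ (A @ theta - y)

theta_plain = np.zeros(3)
theta_mom, vel = np.zeros(3), np.zeros(3)
for _ in range(2000):
    theta_plain = sgd_step(theta_plain, grad_fn(theta_plain), lr=0.01)
    theta_mom, vel = sgd_momentum_step(theta_mom, vel, grad_fn(theta_mom), lr=0.01, mu=0.9)

print("plain SGD:", theta_plain)
print("momentum: ", theta_mom)
```

Setting mu=0 reduces the momentum rule to plain SGD, which is why "SGD without momentum" isolates the effect of the gradient steps themselves.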
The theoretical arguments are limited to networks with ReLU activations and to losses given by exponential-family log-likelihoods, such as multidimensional regression with a least-squares loss or multi-class classification with a cross-entropy loss. The experiments use DNN and ResNet architectures on the MNIST and CIFAR image datasets. For now, the authors focus on understanding the effects of widely used training techniques; using that understanding to improve training methods is left for future work.
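The two example losses share a common structure. Squared error is, up to an additive constant, the negative log-likelihood of a unit-variance Gaussian, and softmax cross-entropy is the negative log-likelihood of a categorical distribution; in both cases the gradient with respect to the network output is "predicted mean minus target". The sketch below checks this numerically. It illustrates only what these two losses have in common and is not the paper's derivation; the function names are hypothetical.

```python
import numpy as np

def gaussian_nll(f, y):
    """-log N(y; f, I): squared error plus a constant that does not depend on f."""
    k = y.shape[-1]
    return 0.5 * np.sum((y - f) ** 2) + 0.5 * k * np.log(2 * np.pi)

def softmax(logits):
    """Numerically stable softmax."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def categorical_nll(logits, label):
    """-log softmax(logits)[label], i.e. cross-entropy with a one-hot target."""
    z = logits - logits.max()
    return np.log(np.sum(np.exp(z))) - z[label]

def gaussian_grad(f, y):
    """Gradient of gaussian_nll w.r.t. f: predicted mean minus target."""
    return f - y

def categorical_grad(logits, label):
    """Gradient of categorical_nll w.r.t. the logits: probabilities minus one-hot target."""
    return softmax(logits) - np.eye(len(logits))[label]

rng = np.random.default_rng(0)
f, y = rng.normal(size=3), rng.normal(size=3)
logits = rng.normal(size=4)
print(gaussian_grad(f, y), categorical_grad(logits, label=2))
```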
Overall, geometric complexity is a useful lens for understanding deep learning because it helps explain how neural networks can achieve low test error while being highly expressive. The team hopes the results will spur further investigation into the link between geometric complexity and the training heuristics that regularize it, both shedding light on current state-of-the-art practice and paving the way for even better methods for training models.
This Article is written as a research summary article by Marktechpost Staff based on the research preprint paper 'Why neural networks find simple solutions: the many regularizers of geometric complexity'. All Credit For This Research Goes To Researchers on This Project.
Tanushree Shenwai is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a keen interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new technological advancements and applying them to real life.