Mathematical Guarantees of Near-Optimal Cross-Entropy Loss through Neural Collapse
Author(s)
Pan, Leyan
Abstract
Neural Collapse is a recently observed geometric structure that emerges in the final layer of neural network classifiers. Specifically, Neural Collapse states that in the terminal phase of neural network training, 1) the intra-class variability of last-layer features tends to zero, 2) the class feature means form a simplex Equiangular Tight Frame (ETF) up to scaling, 3) the last-layer class feature means and classifier weights become equal up to scaling, and 4) the classification behavior collapses to the nearest class center (NCC) decision rule. This thesis investigates the emergence of Neural Collapse in unbiased neural network classifiers when the training cross-entropy loss is near-optimal. Theoretically, we show that, within a small neighborhood of the optimal cross-entropy loss for a task with $C$ target classes, neural network classifiers with batch normalization and weight decay achieve intra-class feature cosine similarity near one and inter-class feature cosine similarity near $-\frac{1}{C-1}$, which justifies several aspects of Neural Collapse under realistic conditions guaranteed by neural network training. Our theorems also imply that larger weight decay values lead to more pronounced Neural Collapse, and that batch normalization with an affine transformation, combined with weight decay, is critical to the emergence of Neural Collapse; both implications are supported by experiments. We also empirically investigate how various hyperparameters affect the inter- and intra-class cosine similarities of the learned representations.
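To make property (2) and the $-\frac{1}{C-1}$ target concrete, the sketch below (a hypothetical NumPy illustration, not code from the thesis; the helper names `simplex_etf` and `pairwise_cosine` are ours) constructs the standard $C$-class simplex ETF as $M = \sqrt{C/(C-1)}\,\bigl(I_C - \tfrac{1}{C}\mathbf{1}\mathbf{1}^\top\bigr)$ and checks that every pair of distinct class directions has cosine similarity exactly $-\frac{1}{C-1}$.

```python
import numpy as np

def simplex_etf(C: int) -> np.ndarray:
    """Rows form a C-class simplex ETF in R^C (rank C - 1):
    M = sqrt(C / (C - 1)) * (I_C - (1/C) * ones(C, C))."""
    return np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)

def pairwise_cosine(M: np.ndarray) -> np.ndarray:
    """Cosine similarities between all pairs of rows of M."""
    U = M / np.linalg.norm(M, axis=1, keepdims=True)
    return U @ U.T

C = 10
G = pairwise_cosine(simplex_etf(C))
off_diag = G[~np.eye(C, dtype=bool)]

# Distinct class directions all meet at cosine -1/(C-1).
assert np.allclose(off_diag, -1.0 / (C - 1))
print(f"inter-class cosine similarity: {off_diag.mean():.4f}")  # -0.1111 for C = 10
```

The same `pairwise_cosine` helper, applied to observed class feature means rather than the ideal ETF, gives the inter-class cosine similarity studied empirically in the thesis; intra-class similarity can be measured analogously over features sharing a label.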
Date
2023-05-04
Resource Type
Text
Resource Subtype
Thesis