Organizational Unit:
Machine Learning Center
Date: 2018-03-14
Author: Aghasi, Alireza
We introduce and analyze a new technique for model reduction in deep neural
networks. Our algorithm prunes (sparsifies) a trained network layer-wise, removing
connections at each layer by solving a convex problem. We present both parallel and
cascade versions of the algorithm, along with a mathematical analysis of the consistency
between the initial network and the retrained model. We also discuss an ADMM
implementation of Net-Trim that is easily applicable to large-scale problems. In terms of
sample complexity, we present a general result that holds for any layer within a network
using rectified linear units as the activation. If a layer taking inputs of size N can be
described using at most s non-zero weights per node, then under some mild
assumptions on the input covariance matrix, we show that these weights can be learned
from O(s log(N/s)) samples.
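
As a rough illustration of the layer-wise step described above, the sketch below refits one ReLU layer with an l1-regularized least-squares surrogate solved by a simple proximal-gradient (ISTA) loop, so the pruned layer approximately reproduces the original responses on the samples where each unit is active. This is only an illustrative simplification of the idea, not the paper's exact constrained convex program or its ADMM solver; all function and variable names are hypothetical.

import numpy as np


def soft_threshold(Z, tau):
    """Elementwise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def prune_layer(X, W, lam=1.0, n_iter=500):
    """Refit one ReLU layer with sparse weights (illustrative surrogate).

    X   : (n_in, n_samples) input activations feeding the layer
    W   : (n_in, n_out)     trained dense weights of the layer
    lam : l1 penalty weight; larger values remove more connections
    """
    Y = np.maximum(W.T @ X, 0.0)              # responses of the original layer
    M = (Y > 0).astype(float)                 # fit only where a unit is active; the
                                              # inactive-side constraint of the full
                                              # formulation is omitted in this sketch
    W_hat = W.copy()
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        R = M * (W_hat.T @ X - Y)             # masked residual on the pre-activations
        grad = X @ R.T                        # gradient of 0.5 * ||M * (W^T X - Y)||_F^2
        W_hat = soft_threshold(W_hat - step * grad, step * lam)
    return W_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 200))        # 64 inputs, 200 samples
    W = rng.standard_normal((64, 32))         # dense trained weights, 32 output units
    W_hat = prune_layer(X, W, lam=5.0)
    kept = np.mean(np.abs(W_hat) > 1e-6)
    print(f"fraction of connections kept: {kept:.2f}")

In this simplified form each output node is pruned independently, which mirrors the parallel (per-layer, per-node) structure mentioned in the abstract; the cascade variant would instead feed the retrained responses of one layer into the pruning problem of the next.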