The zero-one loss function is less sensitive to outliers than convex surrogate losses such as the hinge and cross-entropy losses. However, as a non-convex function it has a large number of local minima, and its non-differentiability rules out backpropagation, the method used to train current state-of-the-art neural networks. Applying zero-one loss to deep neural networks therefore makes the entire training process challenging. On the other hand, its many non-unique solutions may yield different decision boundaries each time the loss is optimized, offering a possible defense against transferable adversarial examples, a common weakness of deep neural network models.
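The contrast in outlier sensitivity can be illustrated with a minimal sketch (the function names and the chosen score value are illustrative, not from the dissertation): on a badly mislabeled point with a large-magnitude score, the zero-one loss is capped at 1, while the convex surrogates grow without bound.

```python
import numpy as np

def zero_one_loss(y, score):
    # 1 if the prediction disagrees in sign with the label, else 0
    return float(np.sign(score) != y)

def hinge_loss(y, score):
    # grows linearly with the margin violation
    return max(0.0, 1.0 - y * score)

def cross_entropy_loss(y, score):
    # logistic (binary cross-entropy) loss with labels in {-1, +1}
    return float(np.log(1.0 + np.exp(-y * score)))

# A badly mislabeled outlier: confident score, opposite label
y, score = -1, 10.0
print(zero_one_loss(y, score))      # bounded at 1.0
print(hinge_loss(y, score))         # 11.0, scales with the outlier
print(cross_entropy_loss(y, score)) # roughly 10, also unbounded
```

Because the zero-one penalty is the same for every misclassified point, a single extreme outlier cannot dominate the objective the way it can under the convex surrogates.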
This dissertation introduces a stochastic coordinate descent method to optimize linear classification models under zero-one loss. Variants of this method are then successfully applied to multi-layer neural networks with sign activations and to multi-layer convolutional neural networks to obtain higher image classification performance. On several image benchmarks, the stochastic coordinate descent method achieves accuracy close to that of stochastic gradient descent. In addition, heuristic techniques such as random node optimization, a feature pool, warm starts, step-wise training, and additional backpropagation penetration are used to speed up training and reduce memory usage. Furthermore, the models' adversarial robustness is analyzed by conducting white-box attacks and decision-boundary attacks, and by comparing zero-one loss models with those trained on traditional loss functions such as cross-entropy.
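Because the zero-one loss is piecewise constant, coordinate descent on it proceeds by direct evaluation rather than gradients. The sketch below is a minimal illustration of that idea, not the dissertation's exact algorithm: one randomly chosen weight is perturbed per iteration, and an update is kept only if it lowers the training zero-one loss. The candidate step sizes, iteration count, and toy data are all illustrative assumptions.

```python
import numpy as np

def scd_zero_one(X, y, iters=500, seed=0):
    """Sketch of stochastic coordinate descent on the zero-one loss.

    Per iteration, a random coordinate of w is tried at several offsets
    and the move is accepted only if training error strictly decreases.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.standard_normal(d)
    steps = np.array([-1.0, -0.1, -0.01, 0.01, 0.1, 1.0])  # assumed grid

    def loss(w):
        # fraction of training points whose predicted sign disagrees with y
        return np.mean(np.sign(X @ w) != y)

    best = loss(w)
    for _ in range(iters):
        j = rng.integers(d)          # pick a random coordinate
        old = w[j]
        for s in steps:              # crude line search on that weight
            w[j] = old + s
            cur = loss(w)
            if cur < best:           # accept only improving moves
                best = cur
                old = w[j]
        w[j] = old                   # restore the best value found
    return w, best

# Toy linearly separable data for demonstration
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = np.sign(X @ true_w)
w, err = scd_zero_one(X, y)
print(err)  # training zero-one loss after optimization
```

Since only improving moves are accepted, the training error is monotonically non-increasing; the heuristics listed above (random node optimization, feature pools, warm starts) address the cost of re-evaluating the loss at every candidate update when this scheme is scaled to deep networks.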