[arXiv 2209] Git Re-Basin: Merging Models modulo Permutation Symmetries (git ...

邓志刚 · Posted on 2022-9-21 15:05:50

Git Re-Basin: Merging Models modulo Permutation Symmetries

Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa

The success of deep learning is thanks to our ability to solve certain massive non-convex optimization problems with relative ease. Despite non-convex optimization being NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin, after accounting for all possible permutation symmetries of hidden units. We introduce three algorithms to permute the units of one model to bring them into alignment with units of a reference model. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity across a variety of models and datasets. Finally, we discuss shortcomings of a single basin theory, including a counterexample to the linear mode connectivity hypothesis.

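Deep learning owes its success to our ability to solve certain large-scale non-convex optimization problems with relative ease. Although non-convex optimization is NP-hard, simple algorithms (usually variants of SGD) prove surprisingly effective at fitting large neural networks in practice. We argue that, once all possible permutation symmetries of the hidden units are accounted for, neural network loss landscapes contain (nearly) a single basin. We introduce three algorithms that permute the units of one model to align them with the units of a reference model. This transformation yields a functionally equivalent set of weights lying in an approximately convex basin near the reference model. Experimentally, we demonstrate the single-basin phenomenon across a range of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. We also identify intriguing phenomena relating model width and training time to mode connectivity across a variety of models and datasets. Finally, we discuss shortcomings of the single-basin theory, including a counterexample to the linear mode connectivity hypothesis.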
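To make the permutation symmetry concrete, here is a minimal NumPy sketch. It is a toy illustration only, not one of the three algorithms from the paper. All names (`mlp`, `permute_hidden`, `align_to_reference`) are hypothetical, the network is a tiny one-hidden-layer ReLU MLP, and "model B" is assumed to be an exact hidden-unit shuffle of model A rather than an independently trained network. The alignment step is a brute-force search over permutations, which is only feasible for a handful of hidden units.

```python
import itertools
import numpy as np

def mlp(params, x):
    """One-hidden-layer ReLU MLP: W2 @ relu(W1 @ x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def permute_hidden(params, perm):
    """Reorder hidden units: rows of W1 and b1, and the matching columns of W2."""
    W1, b1, W2, b2 = params
    perm = np.asarray(perm)  # avoid tuple indexing being read as multi-axis indexing
    return W1[perm], b1[perm], W2[:, perm], b2

def align_to_reference(ref, other):
    """Brute-force search for the permutation of `other` closest to `ref` in weight space."""
    hidden = ref[0].shape[0]

    def distance(perm):
        permuted = permute_hidden(other, perm)
        return sum(np.sum((a - b) ** 2) for a, b in zip(ref, permuted))

    return np.array(min(itertools.permutations(range(hidden)), key=distance))

rng = np.random.default_rng(0)
d_in, hidden, d_out = 3, 5, 2
model_a = (rng.normal(size=(hidden, d_in)), rng.normal(size=hidden),
           rng.normal(size=(d_out, hidden)), rng.normal(size=d_out))

# Assumption: model B is just model A with its hidden units shuffled.
shuffle = rng.permutation(hidden)
model_b = permute_hidden(model_a, shuffle)

# The shuffled weights differ, but the function they compute is the same.
x = rng.normal(size=d_in)
same_function = np.allclose(mlp(model_a, x), mlp(model_b, x))

# Permute B's units back into alignment with A, then take the midpoint
# of the straight line between A and the aligned B in weight space.
perm = align_to_reference(model_a, model_b)
aligned_b = permute_hidden(model_b, perm)
recovered = all(np.allclose(a, b) for a, b in zip(model_a, aligned_b))
midpoint = tuple(0.5 * (a + b) for a, b in zip(model_a, aligned_b))
```

In this toy setup the aligned model coincides with the reference, so the midpoint trivially computes the same function. With independently trained networks the match is only approximate, which is why the abstract speaks of an approximately convex basin and evaluates the loss barrier along the interpolation path.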

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2209.04836 [cs.LG]