Discovering and Explaining the Representation Bottleneck of Neural Networks
The study of the representation capacity of neural networks has long been one of the core problems in deep learning. Prior work typically evaluated the expressive power of neural networks with a single quantitative metric (e.g., accuracy, model complexity, or adversarial robustness). This study is the first to explore a common bottleneck in the feature representations of neural networks from the perspective of the complexity of the massive number of interaction concepts they encode. We discover the representation bottleneck phenomenon: neural networks are usually good at modeling very simple and very complex interaction concepts, but poor at modeling interaction concepts of intermediate complexity. This representation bottleneck is widely shared by neural networks trained on different tasks and with different architectures. The study further provides a theoretical explanation of the mechanism behind this bottleneck. In addition, it proposes new methods to guide neural networks to learn interaction concepts of specific complexities, and investigates the relationship between concept complexity and representation capacity, offering many new perspectives for the study of the expressive power of neural networks.
1 . Huiqi Deng∗, Qihan Ren∗, Hao Zhang, Quanshi Zhang†, Shanghai Jiao Tong University. ∗ Equal contribution. † Corresponding author.
2 . Background and Motivation. Deep neural networks (DNNs) outperform traditional models, and the performance gap grows with the amount of data. Why the superior performance? A common hypothesis: DNNs have greater representation capacities.
3 . Background and Motivation • How to study the representation capacities of DNNs? Previous studies each reflect the representation capacity from a certain perspective: parameter complexity, overfitting and generalization ability, adversarial robustness (e.g., a panda image plus an imperceptible perturbation being classified as a gibbon), ......
4 . Motivation • Unlike previous studies, we focus on the following questions: ⚫ Are there any common tendencies of DNNs in representing concepts, i.e., which types of concepts are (un)likely to be encoded in DNNs? ⚫ Does a DNN encode visual concepts similar to those of human beings for image classification?
5 . Conclusions • Types of concepts? We measure them by the complexity of interactions. ⚫ Are there any common tendencies of DNNs in representing concepts, i.e., which types of concepts are (un)likely to be encoded in DNNs? ✓ Simple interactions × Middle-complex interactions ✓ Complex interactions ⚫ Does a DNN encode visual concepts similar to those of human beings for image classification? The visual concepts encoded in a DNN ≠ those used by human beings.
6 . Interactions • Interactions and interaction concepts. The inference of a DNN: × does not treat input variables as working independently; ✓ encodes interactions between input variables, which form interaction concepts (e.g., a head) for inference. The interaction utility quantifies how much the importance of variable i is changed by variable j.
7 . Multi-order interactions • Complexity of interaction concepts: variable i and variable j may interact under only a few contexts S (simple interactions) or under large amounts of contexts S (complex interactions). • Multi-order interactions to represent complexity. The importance of i changed by j equals the importance of i when j is present minus the importance of i when j is absent:
$$\Delta f(i,j,S) = \big[f(S\cup\{i,j\}) - f(S\cup\{j\})\big] - \big[f(S\cup\{i\}) - f(S)\big],$$
and the multi-order interaction averages this utility over contexts of a fixed size, the order m:
$$I^{(m)}(i,j) = \mathbb{E}_{S\subseteq N\setminus\{i,j\},\,|S|=m}\big[\Delta f(i,j,S)\big].$$
8 . Multi-order interactions • Complexity of interaction concepts (as above: interactions under a few contexts S are simple; interactions under large amounts of contexts S are complex). • Multi-order interactions to represent complexity: a small m (low-order) represents a simple collaboration between a few variables; a larger m (high-order) represents a complex collaboration between massive variables. A Monte Carlo sketch of the estimator follows.
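A minimal numpy sketch of how I^(m)(i,j) could be estimated by Monte Carlo sampling of contexts S, following the definition above. `model_fn` (a scalar-output network score, e.g., the target-class logit on a 1-D array of input variables), the baseline-value masking scheme, and the sample count are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def delta_f(model_fn, x, baseline, i, j, S):
    """Interaction utility of (i, j) under context S:
    Delta_f = f(S + {i,j}) - f(S + {i}) - f(S + {j}) + f(S).
    Variables outside the kept set are replaced by baseline values."""
    def f(keep):
        masked = baseline.copy()
        masked[list(keep)] = x[list(keep)]
        return model_fn(masked)
    S = list(S)
    return f(S + [i, j]) - f(S + [i]) - f(S + [j]) + f(S)

def multi_order_interaction(model_fn, x, baseline, i, j, m, n_samples=100, rng=None):
    """Monte Carlo estimate of I^(m)(i,j): average Delta_f over random
    contexts S of size m drawn from the variables other than i and j.
    Requires m <= len(x) - 2."""
    if rng is None:
        rng = np.random.default_rng(0)
    others = [k for k in range(len(x)) if k not in (i, j)]
    vals = [delta_f(model_fn, x, baseline, i, j,
                    rng.choice(others, size=m, replace=False))
            for _ in range(n_samples)]
    return float(np.mean(vals))
```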
9 . Output vs. multi-order interactions • Efficiency axiom of multi-order interactions. Efficiency axiom: the network output can be decomposed into the utilities of multi-order interactions of different orders (i.e., interaction concepts of different complexities): a small m yields low-order (simple) utilities, a medium m middle-order (middle-complex) utilities, and a large m high-order (complex) utilities. ➢ Therefore, interaction concepts can be exactly categorized into concepts of low order (simple), middle order (middle-complex), and high order (complex).
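A LaTeX reconstruction of the decomposition the slide describes. This is a sketch: the independent utilities μ_i and the exact weighting w^(m) follow the multi-order interaction literature and are our assumption, not copied from the paper.

$$f(N) - f(\emptyset) = \sum_{i \in N} \mu_i + \sum_{i \neq j} \sum_{m=0}^{n-2} w^{(m)}\, I^{(m)}(i,j), \qquad w^{(m)} = \frac{n-1-m}{n(n-1)},$$

where n = |N| is the number of input variables. The inner sum at small m collects the low-order (simple) utilities, at medium m the middle-order ones, and at large m the high-order (complex) ones.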
10 . Discovering the bottleneck • The relative interaction strength J(m) of the m-th order. Representation bottleneck: a DNN usually encodes strong low-order and strong high-order interactions, but weak middle-order interactions. (Figure: J(m) curves over the order m.) • The representation bottleneck phenomenon is widely shared by different DNN architectures trained on different datasets.
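A sketch of how J(m) could be computed, reusing `multi_order_interaction` from the block above. Normalizing each order's mean strength by the average over all orders is our reading of "relative strength" and should be checked against the paper.

```python
import numpy as np

def relative_interaction_strength(model_fn, xs, baselines, orders,
                                  n_pairs=20, n_samples=50):
    """J(m): mean |I^(m)(i,j)| over inputs and random variable pairs,
    normalized by the average strength across all orders in `orders`.
    Each m must satisfy m <= len(x) - 2."""
    rng = np.random.default_rng(0)
    raw = {}
    for m in orders:
        vals = []
        for x, base in zip(xs, baselines):
            for _ in range(n_pairs):
                i, j = rng.choice(len(x), size=2, replace=False)
                vals.append(abs(multi_order_interaction(
                    model_fn, x, base, int(i), int(j), m, n_samples, rng)))
        raw[m] = float(np.mean(vals))
    norm = np.mean(list(raw.values()))
    return {m: v / norm for m, v in raw.items()}
```

A bottleneck then shows up as J(m) being large at the extreme orders and small at medium orders.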
11 . Bottleneck → Cognition gap. Cognition gap: DNNs and humans encode different types of interaction patterns for inference. • Whether humans/DNNs can extract new information from an additional new patch under contexts of different sizes. (Figure: examples with a context of a few patches, a middle number of patches, and massive patches.)
12 . Explaining the bottleneck • Proof: the change of network weights can be decomposed into a sum of gradients of multi-order interactions w.r.t. the weights; the magnitude of the order-m term measures the strength of learning m-order interactions. The training strength is much higher when the order m is small or large, and much lower when m is medium, which explains the bottleneck.
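A schematic of the proof step the slide names, derived from the efficiency decomposition above (our paraphrase, not the paper's exact Theorem 1):

$$\frac{\partial f}{\partial W} = \sum_{i \in N} \frac{\partial \mu_i}{\partial W} + \sum_{m=0}^{n-2} \underbrace{\frac{\partial}{\partial W} \Big[ w^{(m)} \sum_{i \neq j} I^{(m)}(i,j) \Big]}_{\text{training strength of order } m}.$$

The weight update thus splits into per-order terms, and the slide reports that the expected magnitude of the order-m term is much higher for small or large m and much lower for medium m, which produces the bottleneck.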
13 . Explaining the bottleneck • Verification of the theory: we simulate the theoretical training strength (in Theorem 1) and compare it with the empirical interaction strength (in real applications). (Figure: simulations of the distributions fitted to the J(m) curves on ImageNet.)
14 . Train DNNs encoding specific orders of interactions • Can we force a DNN to encode interactions of specific orders? We prove that a DNN trained with the proposed loss mainly encodes interactions of the specified orders.
15 . Train DNNs encoding specific orders of interactions. Encourage specific orders of interactions with a reward on their strength; penalize specific orders of interactions with a penalty on their strength; the total loss combines the task loss with these interaction terms. In experiments, we found that the two losses usually could encourage/penalize interactions of the target orders. A hedged sketch of such losses follows.
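A hedged PyTorch sketch of how such losses might look: estimate the strength of interactions whose context sizes lie in the target orders, then subtract it from the task loss to encourage those orders, or add it to penalize them. The function name, the sampling scheme, and the assumption that `model` outputs a scalar score (e.g., the target-class logit) are ours, not the paper's released code.

```python
import torch

def interaction_strength(model, x, baseline, orders, n_pairs=8, n_ctx=4):
    """Average |Delta_f(i, j, S)| over sampled pairs (i, j) and contexts S
    whose sizes are drawn from `orders` (each m must be <= n_vars - 2).
    x, baseline: (n_vars, ...) tensors of input variables (e.g., patch embeddings);
    model is assumed to map a (1, n_vars, ...) batch to a scalar score."""
    n = x.shape[0]
    strength = x.new_zeros(())
    for _ in range(n_pairs):
        perm = torch.randperm(n)
        i, j = perm[0], perm[1]
        for _ in range(n_ctx):
            m = orders[torch.randint(len(orders), (1,)).item()]
            S = perm[2:2 + m]  # context of size m, excluding i and j

            def f(extra):
                keep = torch.cat([S, extra]) if len(extra) else S
                masked = baseline.clone()
                masked[keep] = x[keep]  # keep these variables, mask the rest
                return model(masked.unsqueeze(0)).squeeze()

            dij = (f(torch.stack([i, j])) - f(i.unsqueeze(0))
                   - f(j.unsqueeze(0)) + f(torch.empty(0, dtype=torch.long)))
            strength = strength + dij.abs()
    return strength / (n_pairs * n_ctx)

# Total loss (lambda_ > 0):
#   encourage: loss = task_loss - lambda_ * interaction_strength(model, x, baseline, orders)
#   penalize:  loss = task_loss + lambda_ * interaction_strength(model, x, baseline, orders)
```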
16 . Investigating representation capacities We investigate the representation capacities of four types of DNNs • Normal DNN: normally trained DNN • Low-order DNN: penalize high-order interactions. • Middle-order DNN: encourage middle-order interactions. • High-order DNN: penalize low-order interactions.
17 . Investigating representation capacities Part I: Classification accuracy ➢ The four types of DNNs achieved similar accuracies ➢ Middle-order interactions can also provide discriminative information
18 . Investigating representation capacities Part II: Adversarial robustness ➢ High-order interactions are vulnerable to adversarial attacks ➢ Low-order interactions are more robust to adversarial attacks (Figure: comparison across low-order, middle-order, and high-order interactions.)
19 . Investigating representation capacities Part III: Bag-of-words vs. structural representations. (Figure: random masking vs. surrounding masking of input patches.) ➢ The high-order DNN encodes more structural information.
20 . Investigating representation capacities Part III: Bag-of-words vs. structural representations (cont.). ➢ The low-order DNN prefers a bag-of-words representation. A sketch of the two masking schemes follows.
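A small numpy sketch of the two masking schemes as we read them from the slides (the 14x14 patch grid and keep ratio are assumptions): random masking keeps scattered patches, while surrounding masking keeps one contiguous region and masks the patches around it. Comparing a DNN's outputs under the two schemes probes whether it relies on isolated patches (bag-of-words) or on their spatial arrangement (structural).

```python
import numpy as np

def random_mask(grid=14, keep_ratio=0.25, rng=None):
    """Keep a random subset of patches, scattered over the grid (True = kept)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = np.zeros((grid, grid), dtype=bool)
    idx = rng.choice(grid * grid, size=int(grid * grid * keep_ratio), replace=False)
    mask.flat[idx] = True
    return mask

def surrounding_mask(grid=14, keep_ratio=0.25, rng=None):
    """Keep one contiguous square of patches and mask everything around it."""
    rng = rng if rng is not None else np.random.default_rng(0)
    side = max(1, int(round((grid * grid * keep_ratio) ** 0.5)))
    r = rng.integers(0, grid - side + 1)
    c = rng.integers(0, grid - side + 1)
    mask = np.zeros((grid, grid), dtype=bool)
    mask[r:r + side, c:c + side] = True
    return mask
```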
21 . Conclusions • Discover a representation bottleneck phenomenon of DNNs. • Theoretically explain the representation bottleneck. • Propose losses to force DNNs to encode interactions of specific orders. • Investigate the representation capacities of low-order, middle-order, and high-order DNNs.