The related code is open-sourced at https://github.com/Thinklab-SJTU/robustMatch
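1 . Exploring the Robustness of Deep Visual Graph Matching, and an Outlook from a Data-Centric Perspective
Qibing Ren (任麒冰), Shanghai Jiao Tong University (上海交通大学), May 11, 2022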
2 . Background: Visual graph matching
Visual graph matching finds keypoint correspondences between images by matching graphs built on the keypoints.
Figure credit to Jiaxin Lu, SJTU
3 . Background: Deep visual graph matching
❏ Current state-of-the-art [1][2]
• Input: images with keypoints
• Keypoint feature extractor: VGG16 with SplineConv
• Graph construction: Delaunay triangulation
• Graph matching solver: GNN solver or black-box solver
• Output: correspondence
[1] Runzhong Wang, J. Yan, X. Yang. "Neural Graph Matching Network: Learning Lawler Quadratic Assignment Problem with Extensions to Multi-Graph Matching and Hyper-Graph Matching Learning." TPAMI 2021.
[2] Rolínek et al. "Deep Graph Matching via Blackbox Differentiation of Combinatorial Solvers." ECCV 2020.
Figure credit to Runzhong Wang, SJTU
4 . Background: Adversary threat model
❏ Adversary goal: evasion attack
• Natural risk with standard training (ST): $R_{nat}(f, D) = \mathbb{E}_{(x,y) \sim D}[L(f(x), y)]$
• Evasion attack at testing time: $\max_{D'_{test} \in \mathcal{B}(D_{test}, \epsilon)} R_{nat}(f_{D_{train}}, D'_{test})$ s.t. $f_{D_{train}} = \arg\min_{f} R_{nat}(f, D_{train})$
❏ Adversary capabilities:
• Similarity metric: $\mathcal{B}(D, \epsilon) = \{x' : d(x, x') \le \epsilon, \forall x \in D\}$
• Common practice for vision data: perturb image pixels, with $d(x, x') = \| x - x' \|_p$
• Common practice for graph data: node injection, edge manipulation (addition or deletion), etc.
5 . Background: Adversary threat model
❏ Adversary knowledge: white-box attack
• Fast Gradient Sign Method (FGSM): $x' = x + \alpha \cdot \mathrm{sign}(\nabla_x L(f(x), y))$ s.t. $x' \in \mathcal{B}(x, \epsilon)$
• Projected Gradient Descent (PGD): $x'_{t+1} = \Pi_{\epsilon}(x'_t + \alpha \cdot \mathrm{sign}(\nabla_{x'_t} L(f(x'_t), y)))$
Figure. Left: natural image. Middle: adversarial perturbation found by PGD attack. Right: adversarial example.
Figure. Trajectory visualization of PGD attack on the loss surface. Image from https://towardsdatascience.com/know-your-enemy-7f7c5038bdf3
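The two update rules above can be sketched on a toy problem. This is a minimal illustration, assuming a linear classifier with logistic loss so that the input gradient is analytic; the names `pgd_attack` and `logistic_loss`, the weights, and the step sizes are all made up for the example and are not taken from the talk's code.

```python
import math

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def logistic_loss(w, b, x, y):
    # L(f(x), y) = log(1 + exp(-y * (w.x + b))), with label y in {-1, +1}.
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))

def input_gradient(w, b, x, y):
    # Analytic gradient of the logistic loss with respect to the input x.
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]

def pgd_attack(w, b, x, y, eps, alpha, steps):
    # Signed-gradient ascent, projected back onto the L-inf ball B(x, eps).
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

w, b = [2.0, -1.0], 0.0          # toy model (assumption)
x, y = [1.0, 0.5], 1             # toy sample (assumption)
x_adv = pgd_attack(w, b, x, y, eps=0.5, alpha=0.2, steps=5)
print(logistic_loss(w, b, x, y), logistic_loss(w, b, x_adv, y))
```

FGSM is the special case `steps=1` with `alpha=eps`.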
6 . Background: Adversarial defense
❏ Proactive defense:
• Adversarial risk by adversarial training (AT): $R_{adv}(f, D_{train}) = \mathbb{E}_{(x,y) \sim D_{train}}[\max_{x' \in \mathcal{B}(x, \epsilon)} L(f(x'), y)]$
• Adversarial risk by TRADES: $R_{adv}(f, D_{train}) = \mathbb{E}_{(x,y) \sim D_{train}}[L(f(x), y) + \max_{x' \in \mathcal{B}(x, \epsilon)} KL(f(x'), f(x)) / \lambda]$
❏ A closer look at the decision boundary (DB):
Figure. Left: a set of separable points. Middle: DB under ST. Right: DB under AT.
• Current limitations:
1. robustness-accuracy trade-off
2. robust generalization
3. robustness overestimation
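To make the two risks concrete, here is a small sketch that evaluates them exactly for a binary linear classifier. It relies on a standard fact: under an L-inf budget, the worst case for a linear model shifts the logit by at most `eps * ||w||_1`, so both inner maximizations are attained at an endpoint of that interval. The function name `risks`, the toy data, and the parameter values are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_loss(margin):
    return math.log1p(math.exp(-margin))

def kl_bernoulli(q, p):
    # KL divergence between Bernoulli(q) and Bernoulli(p).
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def risks(w, b, data, eps, lam):
    # Returns (natural risk, AT risk, TRADES risk) averaged over data.
    shift = eps * sum(abs(wi) for wi in w)   # largest logit change in the ball
    nat = at = trades = 0.0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        nat_loss = logistic_loss(y * z)
        at_loss = logistic_loss(y * z - shift)          # worst-case margin
        p = sigmoid(z)
        worst_kl = max(kl_bernoulli(sigmoid(z + d), p) for d in (-shift, shift))
        nat += nat_loss
        at += at_loss
        trades += nat_loss + worst_kl / lam
    n = len(data)
    return nat / n, at / n, trades / n

toy = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([0.5, 0.5], 1)]   # toy data (assumption)
print(risks([1.0, -2.0], 0.5, toy, eps=0.1, lam=1.0))
```

With `eps=0` all three values coincide, which is a quick sanity check on the definitions.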
7 . Vulnerabilities of deep visual GM
❏ Challenges: edge manipulation and node injection attacks are NOT feasible.
• Our solution: attack the hidden graph structure G by perturbing the keypoint locality (coordinates) z.
• The attack objective: $\max_{c', z'} \max_{G'} L(f(c', z', G'), y)$ s.t. $d_\infty(c', c) \le \epsilon_c$, $d_\infty(z', z) \le \epsilon_z$
Figure labels: pixel attack (epsilon = 8/255), Cat: 9 / 11; locality attack (epsilon = 8), Cat: 2 / 11. Legend: attack direction, newly added edge, deleted edge.
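A small sketch of why perturbing z changes G: when the graph is built by Delaunay triangulation, moving one keypoint within the budget can flip an edge. The code below is an illustration only, using a brute-force empty-circumcircle test that is practical just for a handful of points; the coordinates and the helper names `circumcircle` and `delaunay_edges` are made up for the example.

```python
from itertools import combinations

def circumcircle(a, b, c):
    # Returns (center_x, center_y, radius_squared), or None for collinear points.
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return ux, uy, (a[0] - ux) ** 2 + (a[1] - uy) ** 2

def delaunay_edges(points):
    # A triangle is kept if no other point lies inside or on its circumcircle.
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circ = circumcircle(points[i], points[j], points[k])
        if circ is None:
            continue
        ux, uy, r2 = circ
        others = (p for m, p in enumerate(points) if m not in (i, j, k))
        if all((px - ux) ** 2 + (py - uy) ** 2 > r2 + 1e-9 for px, py in others):
            edges.update({(i, j), (i, k), (j, k)})
    return edges

z = [(0, 0), (10, 0), (11, 11), (0, 10)]       # toy keypoints (assumption)
z_adv = [(0, 0), (10, 0), (9, 9), (0, 10)]     # keypoint 2 moved by 2 in L-inf
before, after = delaunay_edges(z), delaunay_edges(z_adv)
print("newly added edges:", after - before)
print("deleted edges:", before - after)
```

In this toy case one diagonal of the quadrilateral is replaced by the other, which is the kind of added/deleted edge the figure legend refers to.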
8 . Towards robustness of deep visual GM
❏ Challenges: defenses designed for a single graph are NOT feasible for two-graph (or multi-graph) matching.
• Key observation: appearance-similar keypoints are easily mismatched with each other.
Figure. Sample-level visualization of matching results before and after attack.
Figure. Statistic-level visualization of matching results before and after attack.
9 . Appearance aware regularizer (AAR)
❏ Key insight: appearance-similar keypoints can be discovered by attack priors.
• The working pipeline:
Figure. AAR pipeline in four steps (Step 1 to Step 4). Labels: attack, Hungarian, appearance-similar group, appearance aware matrix, AAR matrix. Example keypoints: p1, p2, p3 and a, b, c, d, e.
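The slide's figure does not survive extraction, so the following is only one plausible reading of the idea, not the authors' exact procedure: discretize the attacked matching scores with a Hungarian-style assignment, then treat keypoints that get confused with each other as one appearance-similar group. The brute-force assignment (fine for tiny inputs only), the score matrix, and the names `assign` and `similar_groups` are all assumptions.

```python
from itertools import permutations

def assign(scores):
    # Brute-force stand-in for the Hungarian algorithm: best one-to-one matching.
    n = len(scores)
    return max(permutations(range(n)),
               key=lambda p: sum(scores[i][p[i]] for i in range(n)))

def similar_groups(pred, truth):
    # Union keypoints that the attacked prediction confuses with the true match.
    parent = list(range(len(truth)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for p, t in zip(pred, truth):
        if p != t:
            parent[find(p)] = find(t)
    groups = {}
    for k in range(len(truth)):
        groups.setdefault(find(k), []).append(k)
    return sorted(g for g in groups.values() if len(g) > 1)

# Toy matching scores after an attack (assumption): keypoints 1 and 2 get swapped.
attacked_scores = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.30, 0.50, 0.10],
    [0.05, 0.60, 0.30, 0.05],
    [0.02, 0.03, 0.05, 0.90],
]
truth = (0, 1, 2, 3)
pred = assign(attacked_scores)
print(pred, similar_groups(pred, truth))
```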
10 . Proposed framework: ASAR-GM
❏ ASAR-GM: our proposed AAR is orthogonal to adversarial training.
• Min-max optimization framework: $\min_{\theta} L(f(c', z', G'), y) + \beta \cdot \mathrm{AAR}(f(c', z', G'), y)$ s.t. $(c', z', G') = \arg\max_{c', z'} \max_{G'} L(f(c', z', G'), y)$
• Burn-in period for a better trade-off between accuracy and robustness.
Figure. ASAR-GM pipeline. Labels: adversarial example generation, feature extractor, affinity learning, Deep GM correspondence solver, doubly-stochastic matrix, ground-truth matching, cross entropy loss, Appearance Aware Regularizer. Example panels: pixel attack (epsilon = 8/255), Cat: 9 / 11; locality attack (epsilon = 4), Cat: 2 / 11.
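The min-max structure and the burn-in period can be sketched on a toy model. This is a hedged illustration, not the ASAR-GM implementation: it uses a linear classifier whose inner maximization has a closed form under an L-inf budget, it reads "burn-in" as training on clean inputs for the first few epochs, and it leaves out the AAR term. The names `train`, `worst_case`, `margin`, and all hyperparameters are assumptions.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def margin(w, b, x, y):
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def worst_case(w, x, y, eps):
    # Inner max for a linear model: push every coordinate against the label.
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]

def train(data, eps, epochs, burn_in, lr=0.1):
    w, b = [0.0] * len(data[0][0]), 0.0
    for epoch in range(epochs):
        for x, y in data:
            # Burn-in: clean inputs first, adversarial inputs afterwards.
            x_in = x if epoch < burn_in else worst_case(w, x, y, eps)
            coeff = -y / (1.0 + math.exp(margin(w, b, x_in, y)))
            # Outer min: one SGD step on the logistic loss.
            w = [wi - lr * coeff * xi for wi, xi in zip(w, x_in)]
            b -= lr * coeff
    return w, b

toy = [([2.0, 2.0], 1), ([3.0, 1.0], 1), ([-2.0, -2.0], -1), ([-3.0, -1.0], -1)]
w, b = train(toy, eps=0.5, epochs=20, burn_in=5)
print(w, b)
print([margin(w, b, worst_case(w, x, y, 0.5), y) > 0 for x, y in toy])
```

In the setting of the talk, the inner step would instead perturb image pixels and keypoint localities, and the outer loss would add the regularizer weighted by β.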
11 . Experiments on Pascal VOC Datasets
❑ Evaluation:
• Clean accuracy: evaluation on the clean test set, without any attack.
• Robust accuracy: evaluation on the worst-case test set under attack.
❑ Attack Baselines:
• White-box attack: pixel, our locality, and combo attacks with varying attack iterations.
• Black-box attack: query-based square attack and transfer-based MI-FGSM attack.
• Adaptive attack: generate adversarial examples by maximizing our defense loss.
❑ Defense Baselines:
• Different GM standard training baselines: PCA-GM, CIE-H, BBGM, etc.
• Adversarial training with variants of inner maximization.
Table. White-box robust accuracy (%) under various attacks. Slide annotation: Obfuscated gradients!
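The two metrics can be sketched as follows. This is a minimal illustration with made-up predictions; it assumes accuracy is the fraction of keypoints matched to their ground-truth partner and, as one reading of "worst-case", takes the minimum over the attacks considered. The function names are hypothetical.

```python
def matching_accuracy(pred, truth):
    # Fraction of keypoints whose predicted match equals the ground truth.
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def robust_accuracy(preds_under_attack, truth):
    # Worst case over a dictionary {attack name: predicted matching}.
    return min(matching_accuracy(p, truth) for p in preds_under_attack.values())

truth = [0, 1, 2, 3, 4]
clean_pred = [0, 1, 2, 3, 4]                 # toy predictions (assumption)
attacked = {
    "pixel": [0, 1, 2, 4, 3],
    "locality": [1, 0, 2, 4, 3],
}
print("clean accuracy:", matching_accuracy(clean_pred, truth))
print("robust accuracy:", robust_accuracy(attacked, truth))
```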
12 . Experiments on Pascal VOC Datasets
Table. Black-box robust accuracy (%) under various attacks. Slide annotation: Obfuscated gradients!
Table. White-box robust accuracy (%) under various attacks for ablation study.
Results:
1) Our AAR brings 2% higher clean accuracy and 7% higher robust accuracy than AT.
2) Our locality attack, used as data augmentation, sets a new state of the art with 81.82% accuracy.
3) Our locality attack is much stronger than the vanilla pixel attack against Pixel AT, causing a 17.32% accuracy drop.
4) ASAR-GM (ours) outperforms the baselines in all cases, with an average improvement of 25.8%.
13 . Experiments on Pascal VOC Datasets
❏ Visualizations of more matching results.
Figure. Matching results of the baseline and our robust model under attacks.
14 . A data-centric view towards robustness
❑ Data quality matters for robustness:
• Human-annotated keypoint locality is sub-optimal. Our locality attack induces more discriminative features by perturbing locality.
• Graphs constructed by Delaunay triangulation are vulnerable to small noise. Our locality attack improves graph structure diversity by perturbing locality.
❑ Good priors for graph construction:
• Delaunay triangulation delivers good locality. Locality helps the GM solver aggregate neighbor bias for feature updating.
• Approximating the isomorphic topology structure may be the next step.
❑ From coarse to fine-grained graph matching:
• Current methods overlook intra-graph keypoint intersections. Each keypoint has its unique semantic label.
• Our attack reveals semantic similarity by identifying appearance-similar keypoint groups.
Thanks for listening!







