Deep learning has been widely applied in computer vision, natural language processing, data mining, and other fields. Although deep learning handles problems in these domains efficiently, it is also vulnerable to adversarial attacks, which can destabilize its applications and even introduce security vulnerabilities. Improving the transferability of adversarial examples across different models helps to improve network robustness, and transfer-based attacks are also more efficient than other attack methods. Focusing on image classification, this thesis builds on a review of existing adversarial attack algorithms and concentrates on improving the transferability of adversarial examples in three settings: iterative attacks, intermediate-layer attacks, and attacks on ViT models, proposing corresponding improvements and solutions. In summary, the main contributions of this thesis cover the following three aspects:
(1) To improve the transferability of iterative attack algorithms, this thesis analyzes the relationship between overfitting and transferability from a geometric perspective and proposes an angle-based transferable attack algorithm (ANI-FGSM). The algorithm uses, as a regularization term, the angle between the gradient directions of the loss function at the adversarial example and at random samples within its neighborhood, which smooths the model's loss function and thereby yields adversarial examples with high transferability. Furthermore, a theoretical proof that ANI-FGSM improves transferability is provided. In addition, ANI-FGSM outperforms conventional algorithms in practice on normally trained models, adversarially trained models, and defense models, achieving higher black-box attack success rates and stronger practicality.
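As a rough illustration of the idea, the PyTorch sketch below augments an I-FGSM-style update with a cosine-based penalty on the angle between the gradient at the current adversarial example and the gradient at a random neighbor. The neighborhood radius `zeta`, the weight `lam`, the single-neighbor sampling, and the function name are illustrative assumptions rather than the exact ANI-FGSM formulation.

```python
import torch
import torch.nn.functional as F

def ani_fgsm_sketch(model, x, y, eps=16/255, alpha=2/255, steps=10,
                    zeta=8/255, lam=1.0):
    """Angle-regularized I-FGSM-style update (illustrative sketch only)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)

        # Gradient of the loss at the current adversarial example.
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv, create_graph=True)[0]

        # Gradient at a random neighbor inside an L_inf ball of radius zeta.
        x_nb = x_adv + torch.empty_like(x_adv).uniform_(-zeta, zeta)
        g_nb = torch.autograd.grad(F.cross_entropy(model(x_nb), y), x_nb,
                                   create_graph=True)[0]

        # Penalize the angle between the two gradient directions via cosine
        # similarity, encouraging a smoother region of the loss surface.
        cos = F.cosine_similarity(g.flatten(1), g_nb.flatten(1), dim=1).mean()
        total = loss - lam * (1.0 - cos)   # maximize loss, keep the angle small

        grad = torch.autograd.grad(total, x_adv)[0]
        x_adv = (x + (x_adv.detach() + alpha * grad.sign() - x)
                 .clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()
```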
(2) To make full use of the gradient information from the enhancement stage of two-stage methods, this thesis proposes an intermediate-layer attack algorithm based on enhancement-stage gradients (UILA). The algorithm uses snapshot points to store the latest guidance-direction information and adds an outer loop to update these snapshot points, mitigating the high input sensitivity of existing algorithms. In addition, this thesis introduces a procedure guided by empirical observations that selects the best layer index using the source model alone, avoiding repeated evaluation of the target model during hyperparameter optimization and reducing experimental cost. Finally, experiments on datasets of different scales demonstrate the effectiveness of the proposed UILA algorithm in white-box attacks, and the transferability of the adversarial examples it generates is verified with different source models.
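As a rough illustration of a snapshot-based two-loop structure, the PyTorch sketch below refreshes a guidance direction at a chosen intermediate layer in an outer loop and maximizes an ILA-style projection onto it in an inner loop. The function name `uila_sketch`, the `feature_layer` hook, the step sizes, and the way the snapshot is built are illustrative assumptions rather than the exact UILA procedure.

```python
import torch
import torch.nn.functional as F

def uila_sketch(model, feature_layer, x, y, eps=16/255, alpha=2/255,
                outer_steps=5, inner_steps=10):
    """Two-loop intermediate-layer attack with snapshot guidance (sketch only)."""
    feats = {}
    handle = feature_layer.register_forward_hook(
        lambda mod, inp, out: feats.update(out=out))

    with torch.no_grad():
        model(x)
    f_clean = feats["out"].detach()

    x_adv = x.clone().detach()
    for _ in range(outer_steps):
        # Outer loop: take one step on the task loss and record the resulting
        # intermediate-layer shift as the new snapshot guidance direction.
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            model((x_adv + alpha * g.sign()).clamp(0, 1))
            snapshot = (feats["out"] - f_clean).detach()

        # Inner loop: push the current intermediate-layer feature shift
        # along the snapshot direction (ILA-style projection objective).
        for _ in range(inner_steps):
            x_adv = x_adv.detach().requires_grad_(True)
            model(x_adv)
            proj = ((feats["out"] - f_clean).flatten(1)
                    * snapshot.flatten(1)).sum()
            g_in = torch.autograd.grad(proj, x_adv)[0]
            x_adv = (x + (x_adv.detach() + alpha * g_in.sign() - x)
                     .clamp(-eps, eps)).clamp(0, 1)

    handle.remove()
    return x_adv.detach()
```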
(3) To improve the transferability of adversarial examples against ViT models, this thesis proposes a patch-wise and sparse alternating attack algorithm (PSAA). The thesis first reviews the basic structure of the ViT model and compares the feature-extraction mechanisms of ViT and CNN models. Exploiting the fact that the self-attention mechanism extracts global and local features simultaneously, PSAA alternates between patch-wise and sparse attacks to generate adversarial examples, effectively disrupting the ViT model's extraction of inter-patch interaction information and of image features. Moreover, introducing the PNA mechanism into the sparse attack further improves the transferability of PSAA. Experimental results show that PSAA not only attacks ViT and CNN models effectively in the white-box setting, but also performs well in the black-box setting.
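As a rough illustration of such an alternation, the PyTorch sketch below switches between a patch-wise step, which aggregates the gradient within each ViT patch so the patch is perturbed coherently, and a sparse step, which only perturbs the pixels with the largest gradient magnitude. The patch size, sparsity ratio, and top-k sparsification are illustrative assumptions, and the PNA gradient bypass is only indicated by a comment rather than implemented.

```python
import torch
import torch.nn.functional as F

def psaa_sketch(model, x, y, eps=16/255, alpha=2/255, steps=10,
                patch=16, sparse_ratio=0.1):
    """Alternating patch-wise and sparse gradient steps (illustrative sketch)."""
    x_adv = x.clone().detach()
    for step in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv)[0]

        if step % 2 == 0:
            # Patch-wise step: average the gradient inside each ViT patch and
            # broadcast it back, perturbing whole patches coherently.
            pooled = F.avg_pool2d(g, patch)
            g_step = F.interpolate(pooled, scale_factor=patch, mode="nearest")
        else:
            # Sparse step: keep only the pixels with the largest gradient
            # magnitude (a PNA-style attention-gradient bypass could be
            # applied to the backward pass here).
            flat = g.abs().flatten(1)
            k = max(1, int(sparse_ratio * flat.shape[1]))
            thresh = flat.topk(k, dim=1).values[:, -1].view(-1, 1, 1, 1)
            g_step = g * (g.abs() >= thresh).float()

        x_adv = (x + (x_adv.detach() + alpha * g_step.sign() - x)
                 .clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()
```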
In summary, this thesis proposes three effective algorithms for generating adversarial examples with high transferability; they perform well on a variety of models and can be widely applied in practical scenarios.