Existing works have shown that fine-tuned textual transformer models achieve state-of-the-art prediction performance but are also vulnerable to adversarial text perturbations. Traditional adversarial evaluation is often done \textit{only after} fine-tuning the models, ignoring the training data. In this paper, we show that there is also a strong correlation between training data and model robustness. To this end, we extract 13 different features representing a wide range of properties of the input fine-tuning corpora and use them to predict the adversarial robustness of the fine-tuned models. Focusing mostly on the encoder-only transformer models BERT and RoBERTa, with additional results for BART, ELECTRA, and GPT2, we provide diverse evidence to support our argument. First, empirical analyses show that (a) the extracted features can be used with a lightweight classifier such as Random Forest to effectively predict the attack success rate, and (b) the features with the most influence on model robustness have a clear correlation with robustness. Second, our framework can be used as a fast and effective additional tool for robustness evaluation, since it (a) saves 30x-193x runtime compared to the traditional technique, (b) is transferable across models, (c) can be used under adversarial training, and (d) is robust to statistical randomness. Our code will be publicly available.
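To make the pipeline concrete, below is a minimal sketch of the idea of predicting robustness from corpus features. Everything in it is illustrative: the three features are stand-ins (the abstract does not enumerate the paper's 13 features), the corpora and attack success rates are synthetic placeholders, and scikit-learn's RandomForestRegressor substitutes for the paper's lightweight Random Forest predictor, since attack success rate is a continuous target.

```python
# A minimal, self-contained sketch, not the paper's implementation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def corpus_features(texts):
    """Compute a few toy corpus-level features (placeholders for the
    paper's 13 features, which the abstract does not enumerate)."""
    lengths = np.array([len(t.split()) for t in texts])
    vocab = {w for t in texts for w in t.split()}
    return np.array([
        lengths.mean(),                      # average example length
        lengths.std(),                       # length variability
        len(vocab) / max(lengths.sum(), 1),  # type-token ratio
    ])

# Synthetic stand-in data: 40 "fine-tuning corpora" of random word
# sequences, each paired with a made-up attack success rate. In practice
# these rates would come from actual adversarial attacks on the
# fine-tuned models.
rng = np.random.default_rng(0)
words = [f"w{i}" for i in range(500)]
corpora = [[" ".join(rng.choice(words, size=int(rng.integers(5, 30))))
            for _ in range(100)] for _ in range(40)]
attack_success_rates = rng.uniform(0.2, 0.9, size=40)

X = np.stack([corpus_features(c) for c in corpora])  # one row per corpus
y = np.array(attack_success_rates)

# Fit the lightweight predictor and inspect which corpus features
# matter most, mirroring the paper's feature-influence analysis.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out MAE:", mean_absolute_error(y_te, rf.predict(X_te)))
print("feature importances:", rf.feature_importances_)
```

The appeal of this setup, as the abstract argues, is that the expensive step (running adversarial attacks) is needed only to build the training set of attack success rates; afterwards, robustness for a new corpus can be estimated from its features alone, which is where the reported 30x-193x runtime saving comes from.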