Student: Jia-Yang Pan (潘家洋)
Thesis Title: Adversarial Attacks Against Opcode-based Malware Detectors Using Transformer at the Assembly Level
Original Title (Chinese): 針對基於操作碼的惡意軟體檢測器在組合語言層級使用 Transformer 之對抗式攻擊
Advisors: Hahn-Ming Lee (李漢銘), Shin-Ming Cheng (鄭欣明)
Oral Defense Committee: Yuh-Jye Lee (李育杰), Yi-Ting Huang (黃意婷)
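Oral Defense Date: 2023-07-26
Degree: Master's
Institution: National Taiwan University of Science and Technology
Department: Department of Computer Science and Information Engineering
Field: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis Type: Academic thesis
Publication Year: 2023
Academic Year of Graduation: 111 (ROC calendar)
Language: English
Number of Pages: 58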
Keywords: Adversarial Attack, Machine Learning, Transformer, Malware Detection, Static Analysis
With the rapid growth of the digital world, malware has become a significant threat to cybersecurity, and machine learning plays a crucial role in detecting it. However, attackers persistently seek ways to craft adversarial examples that evade these detectors, making the robustness of malware detectors a critical concern. In this study, we target opcode-based malware detectors and employ a Transformer to generate benign-looking payloads. These payloads are inserted into the target binaries without compromising their executability or functionality, thereby misleading the detector into misclassifying them. The Transformer is a model built on self-attention, a mechanism that lets it capture long-range dependencies in long texts and other sequences while supporting parallel computation. We therefore train the Transformer on legitimate, meaningful benign opcode sequences. Because opcode sequences accurately describe a program's behavior, payloads generated from them can effectively shift the detector's predictions. Moreover, to reduce the number of iterations needed to generate adversarial samples, this study adopts the Weighted Sampling Optimization Algorithm (WSOA), which aims to improve injection efficiency and reduce the amount of injected code. We evaluate detectors under four different opcode feature settings, and the experimental results show that, compared with existing methods, our approach reduces the attack cost by more than half. In conclusion, we propose a novel approach for testing the robustness of malware detectors that uses a Transformer to generate diverse payloads, offering defenders valuable insights for strengthening their detectors.
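As a rough illustration of the attack loop described in the abstract, the sketch below injects benign-looking opcode payloads into a target opcode sequence until a detector's score falls below a threshold, choosing payloads by weighted sampling. Everything here is a hypothetical stand-in: the toy linear detector and its per-opcode weights (`DETECTOR_WEIGHTS`), the fixed candidate payloads (standing in for Transformer outputs), and the reward/penalty rule in `weighted_sampling_attack` are assumptions for illustration only, not the thesis's detectors or its WSOA.

```python
import random
from collections import Counter

# HYPOTHETICAL toy detector weights: positive values push toward "malicious",
# negative values toward "benign". These are not taken from the thesis.
DETECTOR_WEIGHTS = {
    "syscall": 2.0, "xor": 1.5, "int": 1.5, "jmp": 0.5,
    "mov": -0.3, "push": -0.4, "pop": -0.4, "call": -0.2, "ret": -0.5, "nop": -0.1,
}

# HYPOTHETICAL target program (as an opcode sequence) and candidate payloads.
# In the thesis the payloads come from a Transformer; here they are fixed lists.
TARGET = ["xor", "syscall", "int", "mov", "jmp", "syscall"]
CANDIDATES = [
    ["push", "mov", "call", "ret", "pop"],
    ["nop", "nop"],
    ["mov", "mov", "ret"],
]


def malicious_score(opcodes):
    """Toy opcode-based detector: weighted sum of normalized opcode frequencies."""
    counts = Counter(opcodes)
    total = len(opcodes)
    return sum(DETECTOR_WEIGHTS.get(op, 0.0) * n / total for op, n in counts.items())


def weighted_sampling_attack(target, candidates, threshold=0.0, max_iters=50, seed=0):
    """Inject payloads chosen by weighted sampling until the score drops below threshold.

    Appending to the opcode list only models what the detector sees; a real attack
    must place the payload so the binary's original behavior is preserved.
    """
    rng = random.Random(seed)
    weights = [1.0] * len(candidates)  # every payload starts equally likely
    injected = []
    score = malicious_score(target)
    for _ in range(max_iters):
        if score < threshold:
            break  # detector now classifies the sample as benign
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        new_score = malicious_score(target + injected + candidates[idx])
        if new_score < score:
            injected += candidates[idx]          # keep the injection
            weights[idx] += score - new_score    # reward payloads that lowered the score
            score = new_score
        else:
            weights[idx] = max(weights[idx] * 0.5, 1e-3)  # penalize ineffective payloads
    # The original opcodes are kept intact as a prefix of the adversarial sequence.
    return target + injected, injected, score


if __name__ == "__main__":
    adversarial, injected, score = weighted_sampling_attack(TARGET, CANDIDATES)
    print(f"original score={malicious_score(TARGET):.2f}, "
          f"final score={score:.2f}, injected opcodes={len(injected)}")
```

Weighting payloads by how much they previously lowered the score is one simple way to spend fewer iterations on ineffective candidates; the thesis's WSOA may weight or sample differently.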
Chinese Abstract
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Challenges and Goals
  1.3 Contributions
  1.4 Outline of the Thesis
2 Background and Related Work
  2.1 Static Malware Detection
  2.2 ELF File Format
  2.3 Adversarial Attack
    2.3.1 Challenge
    2.3.2 Functionality-preserving
  2.4 Sequential Model
    2.4.1 Recurrent Neural Networks (RNNs)
    2.4.2 Transformer
3 Methodology
  3.1 System Model
    3.1.1 Threat Model
    3.1.2 Problem Definition
  3.2 Code Injection
  3.3 Training Transformer
  3.4 Payload Generation
  3.5 Attack Algorithm
4 Experimental Results and Robustness Analysis
  4.1 Dataset and Experiment Setting
  4.2 Malware Detector
  4.3 Analyzing Attack Results
5 Limitations and Future Work
  5.1 Limitations
  5.2 Future Work
6 Conclusions
