Abstract:
Many deep learning detection models now perform well on standard metrics, but security administrators do not understand the basis for their decisions. As a result, they cannot trust the models' verdicts, nor can they readily diagnose and trace model errors, which greatly limits the practical application of deep learning models in this field. To address this problem, this paper proposes an explainable anomaly traffic detection model based on a sparse autoencoder (Sparse Autoencoder Based Anomaly Traffic Detection, SAE-ATD). The model uses a sparse autoencoder to learn the characteristics of normal traffic and, on this basis, introduces an iterative threshold search that selects the best threshold to improve the detection rate. After prediction, the samples flagged as anomalous are fed into an explainer. The explainer iteratively updates a reference value and returns, for each feature, the difference between the reference value and the anomalous value; these differences are then analyzed together with the original data to explain the prediction. Experiments on the CICIDS2017 and CIRA-CIC-DoHBrw-2020 datasets show that SAE-ATD reaches 99% precision and recall for most attack types on both datasets while also providing explainability for the model.
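The iterative threshold selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which criterion the search optimizes or how candidate thresholds are generated, so the F1 score, the evenly spaced grid, and the names `f1_at_threshold` and `select_threshold` are assumptions made here for illustration only; the error values are toy numbers, not results from the paper.

```python
from typing import Sequence, Tuple


def f1_at_threshold(errors: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 score when samples with reconstruction error > threshold are flagged as anomalous (label 1)."""
    tp = sum(1 for e, y in zip(errors, labels) if e > threshold and y == 1)
    fp = sum(1 for e, y in zip(errors, labels) if e > threshold and y == 0)
    fn = sum(1 for e, y in zip(errors, labels) if e <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(errors: Sequence[float], labels: Sequence[int], steps: int = 100) -> Tuple[float, float]:
    """Scan evenly spaced candidates between the min and max error; keep the first one with the best F1.

    Assumption: F1 on a labeled validation set is the selection criterion.
    """
    lo, hi = min(errors), max(errors)
    best_t, best_f1 = lo, -1.0
    for i in range(steps + 1):
        t = lo + (hi - lo) * i / steps
        score = f1_at_threshold(errors, labels, t)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation data: autoencoder reconstruction errors and ground-truth labels (1 = attack).
errors = [0.02, 0.03, 0.05, 0.04, 0.40, 0.55, 0.06, 0.70]
labels = [0, 0, 0, 0, 1, 1, 0, 1]
best_threshold, best_f1 = select_threshold(errors, labels)
print(f"threshold={best_threshold:.4f}, F1={best_f1:.2f}")
```

In this toy data the normal and attack errors are cleanly separated, so any threshold between the largest normal error and the smallest attack error gives a perfect score; on real traffic the two distributions overlap and the choice of criterion matters.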
Key words: anomaly traffic detection, autoencoder, deep learning, explainability
Feature ranking produced by information gain ratio

No. | Feature name | Information gain ratio
1 | min_seg_size_forward | 41.9%
2 | Init_Win_bytes_backward | 41.2%
3 | Init_Win_bytes_forward | 41.1%
4 | Bwd Packet Length Min | 40.4%
5 | Total Length of Bwd Packets | 40.4%
6 | Subflow Bwd Bytes | 39.9%
7 | Bwd Header Length | 39.5%
8 | Fwd Header Length | 39.2%
9 | Fwd Header Length.1 | 38.2%
10 | Fwd PSH Flags | 35.6%
11 | SYN Flag Count | 35.6%
12 | Max Packet Length | 34.8%
13 | Bwd Packet Length Mean | 34.6%
14 | Avg Bwd Segment Size | 34.4%
15 | Bwd Packet Length Max | 33.8%
16 | FIN Flag Count | 33.5%
17 | Total Backward Packets | 32.0%
18 | Subflow Bwd Packets | 32.0%
19 | ACK Flag Count | 31.7%
20 | Destination Port | 29.9%
21 | Total Fwd Packets | 29.7%
22 | Subflow Fwd Packets | 29.1%
23 | act_data_pkt_fwd | 25.4%
24 | Min Packet Length | 25.4%
25 | Fwd Packet Length Min | 25.2%
26 | Fwd Packet Length Max | 25.1%
27 | Total Length of Fwd Packets | 24.7%
28 | Subflow Fwd Bytes | 23.4%
29 | PSH Flag Count | 23.3%
30 | Down/Up Ratio | 23.1%
31 | Bwd Packet Length Std | 21.9%
32 | Average Packet Size | 21.8%
33 | Packet Length Mean | 21.5%
34 | Packet Length Std | 21.4%
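The ranking above scores each feature by its information gain ratio. As a minimal sketch of the standard definition (information gain divided by the split information of the feature), the code below computes the ratio for features that have already been discretized. How the paper binned continuous flow features is not stated here, so the pre-binned inputs, the function names, and the toy feature names are assumptions for illustration; the values are not taken from either dataset.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of `feature` about `labels`, divided by the entropy of the feature itself."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy H(labels | feature): weighted entropy of each feature-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information and carries no signal.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy, already-binned features (hypothetical names) and labels (1 = attack).
labels = [0, 0, 1, 1]
features = {
    "feature_a": ["low", "low", "high", "high"],  # separates the classes perfectly
    "feature_b": ["low", "high", "low", "high"],  # independent of the label
    "feature_c": ["low", "low", "low", "low"],    # constant
}
scores = {name: gain_ratio(values, labels) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Dividing by the split information penalizes features with many distinct values, which would otherwise score a high raw information gain simply by fragmenting the data.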