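The listed companies in the dataset come from 19 industries. Manufacturing is the largest, with 2667 companies; each of the other industries has comparatively few companies, spread fairly evenly. We therefore split the data into manufacturing and all other industries, because models trained on this split are expected to perform better.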

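First, to build the financial risk assessment indicator system, machine learning algorithms are trained on SMOTE-resampled data to find the features with the greatest influence on financial data falsification, and a voting-based scoring mechanism selects the 30 top-ranked features as the key indicators for assessing the financial data of each type of listed company. Second, to build the fraud prediction model, this paper replaces traditional machine learning models with deep learning: a DCRN network model is constructed from a multi-layer perceptron, a multi-layer residual network, and a Cross network as sub-networks, and Bagging is applied to improve its generalization ability. The final model predicts well and stably, and it produces a list of companies at risk of financial fraud in the following year.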

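This paper uses machine learning algorithms to select features and fuses several deep learning models into a Bagging+DCRN ensemble learning model, which has high reference value and practical significance.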

Falsified financial data from listed companies can cause huge losses to shareholders and to the securities market. Investors who can identify companies with falsified data before investing can avoid them, which reduces investment risk and protects their capital.

The listed companies in the data set come from 19 industries. Manufacturing is the largest, with 2667 companies; each of the other industries has comparatively few companies, spread fairly evenly. We therefore divide the data into manufacturing and all other industries, because models trained on this split are expected to perform better.

First, to construct the financial risk assessment indicator system, this paper trains classical machine learning algorithms on SMOTE-resampled data to find the features with the greatest influence on financial data falsification, and uses a voting-based scoring mechanism to select the 30 top-ranked features as the key indicators for assessing the financial data of each type of listed company. Second, to construct the fraud prediction model, this paper replaces traditional machine learning models with deep learning: it builds a DCRN network model from a multi-layer perceptron, a multi-layer residual network, and a Cross network as sub-networks, and applies Bagging to improve the model's generalization ability. The final model predicts well and stably, and it produces a list of companies at risk of financial fraud in the following year.
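The voting-based feature selection can be illustrated with a minimal sketch. The abstract does not specify the exact scoring rule, so the rank-based (Borda-style) count below is an assumption, and the function name `vote_top_features`, the model names, and the feature names are hypothetical. Each model is assumed to supply one importance value per feature after being trained on the resampled data.

```python
from collections import defaultdict


def vote_top_features(importances_by_model, k=30):
    """Select k features by a rank-based vote across several models.

    importances_by_model maps a model name to a dict of
    {feature_name: importance}. Within each model, the most important
    of n features earns n points, the next n - 1, and so on.
    This scoring rule is an assumption, not the paper's stated rule.
    """
    scores = defaultdict(int)
    for importances in importances_by_model.values():
        ranked = sorted(importances, key=importances.get, reverse=True)
        n = len(ranked)
        for rank, feature in enumerate(ranked):
            scores[feature] += n - rank
    # Highest total score first; ties are broken by feature name.
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:k]


# Hypothetical importances from three models (illustrative values only).
importances = {
    "random_forest": {"roa": 0.5, "debt_ratio": 0.3, "cash_flow": 0.2},
    "gbdt": {"roa": 0.2, "debt_ratio": 0.5, "cash_flow": 0.3},
    "logistic": {"roa": 0.6, "debt_ratio": 0.1, "cash_flow": 0.3},
}
selected = vote_top_features(importances, k=2)
print(selected)  # ['roa', 'debt_ratio']
```

In the setting described above, `k` would be 30 and the selection would be run separately for the manufacturing group and for the other industries.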

This paper uses machine learning algorithms to select features and fuses several deep learning models into a Bagging+DCRN ensemble learning model, which has high reference value and practical significance.
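As a rough illustration of the Bagging step, the sketch below trains several base models on bootstrap resamples and averages their predicted fraud probabilities. It is a generic sketch under stated assumptions, not the paper's implementation: the base learner here is a toy stand-in (`fit_rate`, hypothetical) where the paper uses a DCRN network, and the averaging rule, function names, and sample data are likewise assumptions.

```python
import random


def bagging_fit(rows, labels, fit_base, n_models=10, seed=0):
    """Train n_models base models, each on a bootstrap resample.

    fit_base(rows, labels) must return a callable row -> probability.
    """
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        # Draw n indices with replacement (a bootstrap sample).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_base([rows[i] for i in idx],
                               [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Average the fraud probabilities predicted by all base models."""
    return sum(model(row) for model in models) / len(models)


def fit_rate(rows, labels):
    """Toy base learner: always predicts the fraud rate of its sample.

    A stand-in for a trained DCRN network, used only to keep the
    sketch self-contained.
    """
    rate = sum(labels) / len(labels)
    return lambda row: rate


# Illustrative data: one feature per company, label 1 = falsified.
rows = [[0.1], [0.4], [0.35], [0.8], [0.9], [0.2]]
labels = [0, 0, 0, 1, 1, 0]
models = bagging_fit(rows, labels, fit_rate, n_models=5, seed=1)
probability = bagging_predict(models, rows[0])
```

Because each base model sees a different resample, averaging their outputs tends to reduce variance, which is the usual motivation for Bagging as a way to improve generalization.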

