7.4.5. Analysis Tool User Guide - Horizon Open Explorer

7.4.5.1. Overview

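The table below summarizes each tool's interface and typical use case. Every tool except the model visualization tool is part of the horizon-plugin-profiler package. You can install it with the command pip3 install horizon-plugin-profiler.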
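| Tool | Interface / usage | Use case |
|---|---|---|
| Integrated interface | model_profiler | Calls the other debug tools and collects their results in a single html page. It currently calls the similarity, statistics, shared op check, fuse check, weight comparison and quantization configuration check tools. |
| Fuse check | check_unfused_operations | Checks whether the floating-point model contains op patterns that could be fused but are not. |
| Shared op check | get_module_called_count | Checks whether the model contains shared ops. |
| Quantization configuration check | check_qconfig | Checks whether the quantization configuration of a QAT model is as expected. |
| Model visualization | export_to_onnx / export_quantized_onnx | Exports an onnx model so you can view the model structure. Running the exported onnx is not supported. |
| Similarity comparison | featuremap_similarity | Locates the op that causes a drop in quantized model accuracy. |
| Statistics | get_raw_features / profile_featuremap | Outputs numerical statistics of each layer's output. Use them to judge whether the current data distribution and quantization precision are suitable for quantization. |
| Model weight comparison | compare_weights | Compares the similarity of each layer's weights. |
| | qconfig=None | When a QAT model is hard to train, sets part of the model to floating point to find the bottleneck of the accuracy loss. |
| Single-op conversion precision debugging | set_preserve_qat_mode | When accuracy drops after a QAT model is converted to fixed point, replaces some ops of the fixed-point model with their QAT form to find the bottleneck of the accuracy loss. |
| Heterogeneous model deployment device check | check_deploy_device | Checks whether each op of a heterogeneous model runs on the BPU or the CPU as expected at deployment. |
| torchscript vs. hbdk result comparison | script_profile | Checks whether each op in the fixed-point pt generated by horizon_plugin_pytorch matches the hbdk parsing result. |
| Result comparison across torchscript versions | compare_script_models | Compares, op by op, the fixed-point pt files generated from the same model by different versions of horizon_plugin_pytorch. |
| Model GPU memory analysis | show_cuda_memory_consumption | Analyzes the model's GPU memory usage and locates memory bottlenecks. |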
7.4.5.2. Integrated Interface

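For convenience, horizon_plugin_profiler provides an integrated interface, model_profiler. It calls the other debug tools and collects their results in a single html page. The results of each individual debug tool are saved as well. It currently calls the similarity, statistics, shared op check, fuse check, weight comparison and quantization configuration check tools.
This interface compares two models. In fx mode, model conversion is in place by default. Before using this tool, manually deepcopy the original model before converting it. Otherwise the tool will wrongly compare two identical models after the conversion.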
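You can reproduce the in-place pitfall without horizon_plugin_pytorch. The minimal sketch below uses a hypothetical `ToyModel` class to stand in for a model and a hypothetical `convert_inplace` function to stand in for an in-place conversion such as `convert_fx`. It shows two things:

- A plain reference taken before conversion ends up identical to the converted model.
- A `deepcopy` keeps the earlier stage.

```python
from copy import deepcopy


class ToyModel:
    """Hypothetical stand-in for a model; it only tracks its stage."""

    def __init__(self) -> None:
        self.stage = "qat"


def convert_inplace(model: ToyModel) -> ToyModel:
    """Hypothetical in-place conversion: mutates and returns the same object."""
    model.stage = "quantized"
    return model


qat_model = ToyModel()
alias = qat_model               # plain reference: shares state with qat_model
snapshot = deepcopy(qat_model)  # independent copy taken before conversion
quantized_model = convert_inplace(qat_model)

print(alias is quantized_model, alias.stage)        # True quantized
print(snapshot is quantized_model, snapshot.stage)  # False qat
```

Comparing `alias` with `quantized_model` would compare a model with itself. Comparing `snapshot` with it compares two different stages.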
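# from horizon_plugin_profiler import model_profiler
def model_profiler(
    model1: torch.nn.Module,
    model2: torch.nn.Module,
    example_inputs: Any,
    mode: str,
    out_dir: Optional[str] = None,
    kwargs_dict: Optional[dict] = None,
):
    """Run the check and analysis tools and show their results in one html page.

    This function:
    1) compares the similarity, statistics and weight similarity of each op
       in the two models, and checks the models for shared ops;
    2) checks the floating-point model for unfused patterns, and checks the
       quantization configuration of the QAT model.
    All results are shown in `profiler.html`.

    Note:
        1) Only two adjacent stages of the same model are supported, and they
        must be passed in conversion order, e.g. `float vs QAT` or
        `QAT vs quantized`. Comparing a floating-point model directly with a
        quantized model is not supported. The input order
        `QAT model vs floating-point model` is not supported either.
        2) The html page does not include the onnx visualization of the model
        structure or the per-layer featuremap histograms. To get them, call
        `export_to_onnx/export_quantized_onnx` and
        `profile_featuremap(with_tensorboard=True)` manually. You can also pass
        custom arguments for each debug tool through `kwargs_dict`.

    Args:
        model1: float/calibration/QAT model
        model2: calibration/QAT/quantized model
        example_inputs: model inputs
        mode: which two models are compared. Only these three modes are
            supported:
            - `FvsQ`: float model vs qat/calibration model
            - `QvsQ`: qat model vs quantized model
            - `CvsQ`: calibration model vs qat model
        out_dir: directory for the result file `profiler.html` and for the
            outputs of all debug tools. Defaults to `None`. In that case a
            `profiler` directory is created under the directory given by
            `ckpt_dir` or under the current directory, and all results are
            stored there.
        kwargs_dict: arguments for the other debug tools, given as a `dict`.
            **For the specific arguments, see the description of each tool
            below.** 7 keys are supported:
            1) `featuremap_similarity`: similarity
            2) `get_raw_features`: computes features of the input and output
               of each op
            3) `profile_featuremap`: statistics; outputs the max, min, mean,
               variance, etc. of each layer's output
            4) `get_module_called_count`: checks whether the model has shared
               ops
            5) `check_unfused_operations`: checks whether the model has
               unfused patterns
            6) `compare_weights`: compares the weight similarity of the two
               models
            7) `check_qconfig`: checks the Qconfig settings of the QAT model
            Note:
                1) `model` and `example_inputs` are already defined by
                `model_profiler`. kwargs_dict must not define them.
                2) The `out_dir` argument of `model_profiler` replaces any
                `out_dir` given in kwargs_dict.
    """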
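You may want to catch argument mistakes before starting a long profiling run. The sketch below is a hypothetical pre-flight helper; `validate_profiler_args` is not part of horizon_plugin_profiler. It encodes only the rules stated in the docstring above:

- the three supported modes;
- the seven supported `kwargs_dict` keys;
- no per-tool `model` or `example_inputs`;
- replacement of any per-tool `out_dir`.

```python
from typing import Dict, Optional

# Supported modes and kwargs_dict keys, as listed in the model_profiler docstring.
SUPPORTED_MODES = {"FvsQ", "QvsQ", "CvsQ"}
SUPPORTED_TOOL_KEYS = {
    "featuremap_similarity",
    "get_raw_features",
    "profile_featuremap",
    "get_module_called_count",
    "check_unfused_operations",
    "compare_weights",
    "check_qconfig",
}
# Arguments that model_profiler supplies itself; they must not appear per tool.
RESERVED_ARGS = {"model", "example_inputs"}


def validate_profiler_args(
    mode: str,
    kwargs_dict: Optional[Dict[str, dict]] = None,
    out_dir: Optional[str] = None,
) -> Dict[str, dict]:
    """Hypothetical pre-flight check; returns a cleaned copy of kwargs_dict."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"mode must be one of {sorted(SUPPORTED_MODES)}, got {mode!r}")
    checked = {}
    for tool, tool_kwargs in (kwargs_dict or {}).items():
        if tool not in SUPPORTED_TOOL_KEYS:
            raise KeyError(f"unsupported kwargs_dict key: {tool!r}")
        clash = RESERVED_ARGS & set(tool_kwargs)
        if clash:
            raise ValueError(f"{tool}: {sorted(clash)} are set by model_profiler itself")
        tool_kwargs = dict(tool_kwargs)
        # A per-tool out_dir is overridden by the top-level out_dir.
        if "out_dir" in tool_kwargs:
            tool_kwargs["out_dir"] = out_dir
        checked[tool] = tool_kwargs
    return checked
```

Some example calls:

- `validate_profiler_args("QvsQ", {"featuremap_similarity": {"similarity_func": "L1"}})` passes.
- `mode="QvsF"` raises a ValueError.
- A per-tool `"model"` key also raises a ValueError.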
Example:
from copy import deepcopy
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_profiler import model_profiler
class Conv2dModule(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size=1,
        stride=1,
        padding=0,
        dilation=1,
        groups=1,
        bias=True,
        padding_mode="zeros",
    ):
        super().__init__()
        self.conv2d = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride,
            padding,
            dilation,
            groups,
            bias,
            padding_mode,
        )
        self.add = FloatFunctional()
        self.bn_mod = nn.BatchNorm2d(out_channels)
        self.relu_mod = nn.ReLU()
    def forward(self, x, y):
        x = self.conv2d(x)
        x = self.bn_mod(x)
        x = self.add.add(x, y)
        x = self.relu_mod(x)
        return x
class TestFuseNet(nn.Module):
    def __init__(self, channels) -> None:
        super().__init__()
        self.quantx = QuantStub()
        self.quanty = QuantStub()
        self.convmod1 = Conv2dModule(channels, channels)
        self.convmod2 = Conv2dModule(channels, channels)
        self.convmod3 = Conv2dModule(channels, channels)
        self.shared_conv = nn.Conv2d(channels, channels, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.sub = FloatFunctional()
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()
    def forward(self, x, y):
        x = self.quantx(x)
        y = self.quanty(y)
        x = self.convmod1(x, y)
        x = self.convmod2(y, x)
        x = self.convmod3(x, y)
        x = self.shared_conv(x)
        x = self.bn1(x)
        y = self.shared_conv(y)
        y = self.bn2(y)
        x = self.sub.sub(x, y)
        x = self.relu(x)
        return self.dequant(x)
set_march(March.BAYES)
device = torch.device("cpu")
data = torch.arange(1 * 3 * 4 * 4) / 100 + 1
data = data.reshape((1, 3, 4, 4))
data = data.to(torch.float32).to(device)
float_net = TestFuseNet(3).to(device)
float_net(data, data)
qat_net = prepare_qat_fx(float_net, {"": default_qat_8bit_fake_quant_qconfig})
qat_net = qat_net.to(device)
qat_net(data, data)
# In fx mode, deepcopy the model before conversion
qat_net2 = deepcopy(qat_net)
quantized_net = convert_fx(qat_net2)
model_profiler(qat_net, quantized_net, (data, data), mode="QvsQ")
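If out_dir is not specified, a horizon_quant_debug folder is created in the current directory. profiler.html and the outputs of all debug tools are saved in that folder. For details of each debug tool's output, see the description of that tool below.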
7.4.5.3. Fuse Check
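The correctness of model fusion has two aspects:
1. Whether every operator that can be fused has been fused.
2. Whether the fused operators are correct.
This interface can only check the first aspect. For the second, use the similarity comparison tool to compare feature similarity before and after fusion. If every feature after a certain operator has poor similarity, that operator's fusion may be wrong. Low similarity at Identity positions can be normal, because fusion merges several ops into one and replaces the others with Identity.
This interface only accepts a floating-point model as input.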
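The rule of thumb above can be sketched in plain Python: find the first op after which every similarity stays low, and ignore the Identity placeholders left by fusion. The rows and the 0.99 threshold below are hypothetical. In practice the names, types and cosine values would come from the `featuremap_similarity` output.

```python
from typing import List, Optional, Tuple

# (module name, module type, cosine similarity); the values below are hypothetical.
Row = Tuple[str, str, float]


def first_suspect_fused_op(rows: List[Row], threshold: float = 0.99) -> Optional[str]:
    """Return the earliest op after which every non-Identity similarity stays below threshold."""
    # Identity rows are placeholders left by fusion; low similarity there can be normal.
    kept = [row for row in rows if row[1] != "Identity"]
    suspect = None
    # Walk backwards and extend the trailing run of low-similarity ops as far as it goes.
    for name, _, similarity in reversed(kept):
        if similarity >= threshold:
            break
        suspect = name
    return suspect


rows = [
    ("convmod1.conv2d", "ConvAddReLU2d", 0.999),
    ("convmod1.bn_mod", "Identity", 0.20),  # fused away: ignored
    ("convmod2.conv2d", "ConvAddReLU2d", 0.95),
    ("shared_conv", "Conv2d", 0.93),
    ("relu", "ReLU", 0.90),
]
print(first_suspect_fused_op(rows))  # convmod2.conv2d
```

The function returns None when no trailing run of low similarity exists, i.e. when the last non-Identity layer is already above the threshold.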
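# from horizon_plugin_profiler import check_unfused_operations
def check_unfused_operations(
    model: torch.nn.Module,
    example_inputs,
    print_tabulate=True,
):
    """Check whether the model has ops that can be fused but are not.

    This interface only detects unfused ops. It cannot check whether a fusion
    is correct. To verify that ops are fused correctly, use the
    `featuremap_similarity` interface to compare the model before and after
    fusion.

    Args:
        model: input model
        example_inputs: model inputs
        print_tabulate: whether to print the result. Defaults to True.

    Returns:
        List[List[str]]: list of fusable op patterns
    """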
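Example:
This example uses eager mode: the fuse patterns are defined manually and the fuse function is called explicitly. If you quantize with fx, every fusable pattern in the model is fused automatically.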
import horizon_plugin_pytorch as horizon
import numpy as np
import torch
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_profiler import check_unfused_operations
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
class Conv2dModule(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size=1,
        stride=1,
        padding=0,
        dilation=1,
        groups=1,
        bias=True,
        padding_mode="zeros",
    ):
        super().__init__()
        self.conv2d = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride,
            padding,
            dilation,
            groups,
            bias,
            padding_mode,
        )
        self.add = FloatFunctional()
        self.bn_mod = nn.BatchNorm2d(out_channels)
        self.relu_mod = nn.ReLU()
    def forward(self, x, y):
        x = self.conv2d(x)
        x = self.bn_mod(x)
        x = self.add.add(x, y)
        x = self.relu_mod(x)
        return x
    def fuse_model(self):
        from horizon_plugin_pytorch.quantization import fuse_modules
        fuse_list = ["conv2d", "bn_mod", "add", "relu_mod"]
        fuse_modules(
            self,
            fuse_list,
            inplace=True,
        )
class TestFuseNet(nn.Module):
    def __init__(self, channels) -> None:
        super().__init__()
        self.convmod1 = Conv2dModule(channels, channels)
        self.convmod2 = Conv2dModule(channels, channels)
        self.convmod3 = Conv2dModule(channels, channels)
        self.shared_conv = nn.Conv2d(channels, channels, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.sub = FloatFunctional()
        self.relu = nn.ReLU()
    def forward(self, x, y):
        x = self.convmod1(x, y)
        x = self.convmod2(y, x)
        x = self.convmod3(x, y)
        x = self.shared_conv(x)
        x = self.bn1(x)
        y = self.shared_conv(y)
        y = self.bn2(y)
        x = self.sub.sub(x, y)
        x = self.relu(x)
        return x
    def fuse_model(self):
        self.convmod1.fuse_model()
        self.convmod3.fuse_model()
shape = np.random.randint(10, 20, size=4).tolist()
data0 = torch.rand(size=shape)
data1 = torch.rand(size=shape)
float_net = TestFuseNet(shape[1])
float_net.fuse_model()
check_unfused_operations(float_net, (data0, data1))
The output is as follows:
name                 type
-------------------  ------------------------------------------------
shared_conv(shared)  <class 'torch.nn.modules.conv.Conv2d'>
bn1                  <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
name                 type
-------------------  ------------------------------------------------
shared_conv(shared)  <class 'torch.nn.modules.conv.Conv2d'>
bn2                  <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
name               type
-----------------  --------------------------------------------------------------------------------
convmod2.conv2d    <class 'torch.nn.modules.conv.Conv2d'>
convmod2.bn_mod    <class 'torch.nn.modules.batchnorm.BatchNorm2d'>
convmod2.add       <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.FloatFunctional'>
convmod2.relu_mod  <class 'torch.nn.modules.activation.ReLU'>
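Each group of ops that could be fused but is not is printed as a table. The first column is the name under which the module is defined in the model, and the second column is the module type.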
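To make the idea of an "unfused pattern" concrete, the sketch below greedily matches a few hypothetical patterns against an ordered list of (module name, op kind) calls. It is a simplified illustration only:

- `find_unfused` and `PATTERNS` are not part of horizon_plugin_profiler.
- The real tool works on the traced model rather than on a flat call list, which is why it can also report shared modules such as shared_conv.

```python
from typing import List, Tuple

# Hypothetical fusable patterns, longest first so the greedy match prefers them.
PATTERNS = [
    ("Conv2d", "BatchNorm2d", "add", "ReLU"),
    ("Conv2d", "BatchNorm2d", "ReLU"),
    ("Conv2d", "BatchNorm2d"),
]


def find_unfused(calls: List[Tuple[str, str]]) -> List[List[str]]:
    """Greedily match PATTERNS against consecutive (module name, op kind) calls."""
    found: List[List[str]] = []
    i = 0
    while i < len(calls):
        for pattern in PATTERNS:
            window = calls[i : i + len(pattern)]
            if tuple(kind for _, kind in window) == pattern:
                found.append([name for name, _ in window])
                i += len(pattern)
                break
        else:
            # No pattern starts at this call; move on to the next one.
            i += 1
    return found


calls = [
    ("convmod1", "ConvAddReLU2d"),  # already fused
    ("convmod2.conv2d", "Conv2d"),
    ("convmod2.bn_mod", "BatchNorm2d"),
    ("convmod2.add", "add"),
    ("convmod2.relu_mod", "ReLU"),
    ("shared_conv", "Conv2d"),
    ("bn1", "BatchNorm2d"),
]
for group in find_unfused(calls):
    print(group)
```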
7.4.5.4. Shared Op Check
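This interface counts and prints how many times each op is called during one forward pass, which tells you whether the model contains shared ops. A module instance may appear several times in the model under different names. In that case the function uses the first name, attributes all calls to it, and shows a related warning.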
# from horizon_plugin_profiler import get_module_called_count
def get_module_called_count(
    model: torch.nn.Module,
    example_inputs,
    check_leaf_module: callable = None,
    print_tabulate: bool = True,
) -> Dict[str, int]:
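    """Count how many times each leaf module in the model is called.

    Args:
        model: model
        example_inputs: model inputs
        check_leaf_module: function that decides whether a module is a leaf.
            Defaults to None, which uses the predefined is_leaf_module. That
            function treats every op defined in horizon_plugin_pytorch and
            every unsupported floating-point op as a leaf.
        print_tabulate: whether to print the result. Defaults to True.

    Returns:
        Dict[str, int]: the name of each layer in the model and its call count.
    """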
Example:
import numpy as np
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
import horizon_plugin_pytorch as horizon
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_profiler import get_module_called_count
class Net(nn.Module):
    def __init__(self, quant=False, share_op=True):
        super(Net, self).__init__()
        self.quant_stubx = QuantStub()
        self.quant_stuby = QuantStub()
        # defined but never called in forward(), so it is reported with 0 calls
        self.unused = nn.ReLU()
        self.mul_op = FloatFunctional()
        self.cat_op = FloatFunctional()
        self.quantized_ops = nn.Sequential(
            nn.ReLU(),
            nn.Sigmoid(),
            nn.Softmax(),
            nn.SiLU(),
            horizon_nn.Interpolate(
                scale_factor=2, recompute_scale_factor=True
            ),
            horizon_nn.Interpolate(
                scale_factor=2.3, recompute_scale_factor=True
            ),
            nn.AvgPool2d(kernel_size=4),
            nn.Upsample(scale_factor=1.3, mode="bilinear"),
            nn.UpsamplingBilinear2d(scale_factor=0.7),
        )
        self.dequant_stub = DeQuantStub()
        self.float_ops = nn.Sequential(
            nn.Tanh(),
            nn.LeakyReLU(),
            nn.PReLU(),
            nn.UpsamplingNearest2d(scale_factor=0.7),
        )
        self.quant = quant
        self.share_op = share_op
    def forward(self, x, y):
        x = self.quant_stubx(x)
        y = self.quant_stuby(y)
        z = self.mul_op.mul(x, y)
        x = self.cat_op.cat((x, y), dim=1)
        if self.share_op:
            x = self.cat_op.cat((x, y), dim=1)
        x = self.quantized_ops(x)
        x = self.dequant_stub(x)
        if not self.quant:
            x = self.float_ops(x)
        return x
shape = np.random.randint(10, 20, size=4).tolist()
data0 = torch.rand(size=shape)
data1 = torch.rand(size=shape)
float_net = Net()
get_module_called_count(float_net, (data0, data1))
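The output is a table that records how many times each module in the model is called. Normally each module is called once. A count of 0 means the module is defined but never used. A count greater than 1 means the module is shared and used several times: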
name               called times
---------------  --------------
quant_stubx                   1
quant_stuby                   1
unused                        0
mul_op                        1
cat_op                        2
quantized_ops.0               1
quantized_ops.1               1
quantized_ops.2               1
quantized_ops.3               1
quantized_ops.4               1
quantized_ops.5               1
quantized_ops.6               1
quantized_ops.7               1
quantized_ops.8               1
dequant_stub                  1
float_ops.0                   1
float_ops.1                   1
float_ops.2                   1
float_ops.3                   1
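get_module_called_count returns a Dict[str, int], so you can post-process its result directly. The helper below is a hypothetical sketch; `split_by_call_count` is not part of the profiler. Following the interpretation above, it separates unused modules (0 calls) from shared ones (more than 1 call).

```python
from typing import Dict, List, Tuple


def split_by_call_count(counts: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split {layer name: call count} into (unused layers, shared layers)."""
    unused = [name for name, n in counts.items() if n == 0]  # defined but never called
    shared = [name for name, n in counts.items() if n > 1]   # called more than once
    return unused, shared


# In practice: counts = get_module_called_count(float_net, (data0, data1), print_tabulate=False)
counts = {"quant_stubx": 1, "quant_stuby": 1, "unused": 0, "mul_op": 1, "cat_op": 2}
print(split_by_call_count(counts))  # (['unused'], ['cat_op'])
```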
7.4.5.5. Quantization Configuration Check
Checks the quantization configuration of each op in a calibration/QAT model. The input must be a QAT or calibration model. The result is saved to the qconfig_info.txt file.
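# from horizon_plugin_profiler import check_qconfig
def check_qconfig(
    model: torch.nn.Module,
    example_inputs: Any,
    prefixes: Tuple = (),
    types: Tuple = (),
    custom_check_func: Optional[Callable] = None,
    out_dir: Optional[str] = None,
):
    """Check the quantization configuration of a calibration/QAT model.

    1) Checks the quantization configuration of each layer's output activation
    and weight. The configuration is saved in `qconfig_info.txt`.
    2) Checks the input and output types of each layer.

    By default, the function prints a message when it finds any of the following:
    1) the activation of the output layer is not quantized
    2) a fixed scale
    3) a weight that is not int8-quantized (only int8 weight quantization is
    currently supported)
    4) input and output types that differ

    To check more information, pass your own check function through
    `custom_check_func`.

    Args:
        model: input model; must be a qat model
        example_inputs: model inputs
        prefixes: layer names of the ops whose quantization configuration
            should be checked (layers whose names start with one of the
            prefixes)
        types: types of the ops whose quantization configuration should be
            checked
        custom_check_func: custom function for checking other information.
            It is called inside a module hook, so it must have the form:
                func(module, input, output) -> None
        out_dir: directory for the result file `qconfig_info.txt`. If None,
            the file is saved in the current directory.
    """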
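A custom_check_func only needs the hook signature func(module, input, output) -> None. The sketch below is a hypothetical example: the name `report_missing_qconfig` and the list it fills are illustrative. It records modules that run without a `qconfig` attribute. This assumes modules carry `qconfig` as in PyTorch eager-mode quantization. It relies only on duck typing, so it can be exercised with fake modules.

```python
from types import SimpleNamespace
from typing import Any, List

# Hypothetical collector filled by the check function below.
missing_qconfig: List[str] = []


def report_missing_qconfig(module: Any, inputs: Any, output: Any) -> None:
    """Hook-style check: record modules that run without a qconfig attribute.

    Assumes modules expose `qconfig` as in PyTorch eager-mode quantization.
    """
    if getattr(module, "qconfig", None) is None:
        missing_qconfig.append(type(module).__name__)


# Stand-alone demo with duck-typed fake modules:
report_missing_qconfig(SimpleNamespace(qconfig=None), (), None)
report_missing_qconfig(SimpleNamespace(qconfig="configured"), (), None)
print(missing_qconfig)  # ['SimpleNamespace']
# With a real model: check_qconfig(qat_net, (data0, data1), custom_check_func=report_missing_qconfig)
```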
Example:
import numpy as np
import torch
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.dtype import qint16
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization import get_default_qconfig
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.observer import FixedScaleObserver
from horizon_plugin_profiler import check_qconfig
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
class Conv2dModule(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size=1,
        stride=1,
        padding=0,
        dilation=1,
        groups=1,
        bias=True,
        padding_mode="zeros",
    ):
        super().__init__()
        self.conv2d = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride,
            padding,
            dilation,
            groups,
            bias,
            padding_mode,
        )
        self.add = FloatFunctional()
        self.bn_mod = nn.BatchNorm2d(out_channels)
        self.relu_mod = nn.ReLU()
    def forward(self, x, y):
        x = self.conv2d(x)
        x = self.bn_mod(x)
        x = self.add.add(x, y)
        x = self.relu_mod(x)
        return x
class TestFuseNet(nn.Module):
    def __init__(self, channels) -> None:
        super().__init__()
        self.quantx = QuantStub()
        self.quanty = QuantStub()
        self.convmod1 = Conv2dModule(channels, channels)
        self.convmod2 = Conv2dModule(channels, channels)
        self.convmod3 = Conv2dModule(channels, channels)
        self.shared_conv = nn.Conv2d(channels, channels, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.sub = FloatFunctional()
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()
    def forward(self, x, y):
        x = self.quantx(x)
        y = self.quanty(y)
        x = self.convmod1(x, y)
        x = self.convmod2(y, x)
        x = self.convmod3(x, y)
        x = self.shared_conv(x)
        x = self.bn1(x)
        y = self.shared_conv(y)
        y = self.bn2(y)
        x = self.sub.sub(x, y)
        x = self.relu(x)
        return self.dequant(x)
float_net = TestFuseNet(3)
set_march(March.BAYES)
# Manually construct unsupported or special cases
sub_qconfig = get_default_qconfig(
    # fix the output scale of sub
    activation_qkwargs={
        "observer": FixedScaleObserver,
        "scale": 1 / 2 ** 15,
        "dtype": qint16,
    }
)
qat_net = prepare_qat_fx(
    float_net,
    {
        "": get_default_qconfig(
            weight_qkwargs={
                "qscheme": torch.per_channel_symmetric,
                "ch_axis": 0,
                # int16 weight quantization is not supported
                "dtype": qint16,
            }
        ),
        "module_name": [("sub", sub_qconfig)],
    },
)
shape = np.random.randint(10, 20, size=4).tolist()
shape[1] = 3
data0 = torch.rand(size=shape)
data1 = torch.rand(size=shape)
check_qconfig(qat_net, (data0, data1))
Output (contents of qconfig_info.txt):
Each layer out qconfig:
+-------------------+----------------------------------------------------------------------------+--------------------+-------------+----------------+
| Module Name       | Module Type                                                                | Input dtype        | out dtype   | ch_axis        |
|-------------------+----------------------------------------------------------------------------+--------------------+-------------+----------------|
| quantx            | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>                    | torch.float32      | qint8       | -1             |
| quanty            | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>                    | torch.float32      | qint8       | -1             |
| convmod1.add      | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | ['qint8', 'qint8'] | qint8       | -1             |
| convmod2.conv2d   | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint8              | qint8       | -1             |
| convmod2.bn_mod   | <class 'horizon_plugin_pytorch.nn.qat.batchnorm.BatchNorm2d'>              | qint8              | qint8       | -1             |
| convmod2.add[add] | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | ['qint8', 'qint8'] | qint8       | -1             |
| convmod2.relu_mod | <class 'horizon_plugin_pytorch.nn.qat.relu.ReLU'>                          | qint8              | qint8       | qconfig = None |
| convmod3.add      | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | ['qint8', 'qint8'] | qint8       | -1             |
| shared_conv       | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint8              | qint8       | -1             |
| bn1               | <class 'horizon_plugin_pytorch.nn.qat.batchnorm.BatchNorm2d'>              | qint8              | qint8       | -1             |
| shared_conv(1)    | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint8              | qint8       | -1             |
| bn2               | <class 'horizon_plugin_pytorch.nn.qat.batchnorm.BatchNorm2d'>              | qint8              | qint8       | -1             |
| sub[sub]          | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | ['qint8', 'qint8'] | qint16      | -1             |
| relu              | <class 'horizon_plugin_pytorch.nn.qat.relu.ReLU'>                          | qint16             | qint16      | qconfig = None |
+-------------------+----------------------------------------------------------------------------+--------------------+-------------+----------------+
Weight qconfig:
+-----------------+--------------------------------------------------------------+----------------+-----------+
| Module Name     | Module Type                                                  | weight dtype   |   ch_axis |
|-----------------+--------------------------------------------------------------+----------------+-----------|
| convmod1.add    | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'> | qint16         |         0 |
| convmod2.conv2d | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>        | qint16         |         0 |
| convmod3.add    | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'> | qint16         |         0 |
| shared_conv     | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>        | qint16         |         0 |
| shared_conv(1)  | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>        | qint16         |         0 |
+-----------------+--------------------------------------------------------------+----------------+-----------+
Please check if these OPs qconfigs are expected..
+-----------------+----------------------------------------------------------------------------+------------------------------------------------------------------+
| Module Name     | Module Type                                                                | Msg                                                              |
|-----------------+----------------------------------------------------------------------------+------------------------------------------------------------------|
| convmod1.add    | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | qint16 weight!!!                                                 |
| convmod2.conv2d | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint16 weight!!!                                                 |
| convmod3.add    | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | qint16 weight!!!                                                 |
| shared_conv     | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint16 weight!!!                                                 |
| shared_conv(1)  | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint16 weight!!!                                                 |
| sub[sub]        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | input dtype ['qint8', 'qint8'] is not same with out dtype qint16 |
| sub[sub]        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | Fixed scale 3.0517578125e-05                                     |
+-----------------+----------------------------------------------------------------------------+------------------------------------------------------------------+
The txt file contains three tables. From top to bottom, their meanings are:
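1. Quantization information of each layer's output. The columns, from left to right, are:
   - Module Name: the name under which the module is defined in the model
   - Module Type: the actual type of the module
   - Input dtype: the input type of the module
   - out dtype: the output type of the module
   - ch_axis: the dimension along which quantization is performed. -1 means per-tensor quantization. If qconfig=None is shown, the module has no qconfig and is not quantized.
2. Quantization information of each layer's weight. The columns, from left to right, are:
   - Module Name: the name under which the module is defined in the model
   - Module Type: the actual type of the module
   - weight dtype: the quantization precision used for the weight; currently only qint8 quantization is supported
   - ch_axis: the dimension along which quantization is performed. -1 means per-tensor quantization. By default, weights are quantized along dimension 0. If qconfig=None is shown, the module's weight has no qconfig and is not quantized.
3. Modules with special quantization configurations. This does not necessarily mean the configuration is wrong; check them one by one. This table is also printed on screen. The columns are:
   - Module Name: the name under which the module is defined in the model
   - Module Type: the actual type of the module
   - Msg: the special quantization configuration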
Example of the on-screen output:
Please check if these OPs qconfigs are expected..
+---------------+----------------------------------------------------------------------------+------------------------------------------------------------------+
| Module Name   | Module Type                                                                | Msg                                                              |
|---------------+----------------------------------------------------------------------------+------------------------------------------------------------------|
| convmod1.add  | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | qint16 weight!!!                                                 |
| convmod2.add  | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | qint16 weight!!!                                                 |
| convmod3.add  | <class 'horizon_plugin_pytorch.nn.qat.conv2d.ConvAddReLU2d'>               | qint16 weight!!!                                                 |
| bn1           | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint16 weight!!!                                                 |
| shared_conv   | <class 'horizon_plugin_pytorch.nn.qat.conv2d.Conv2d'>                      | qint16 weight!!!                                                 |
| sub           | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | input dtype ['qint8', 'qint8'] is not same with out dtype qint16 |
| sub           | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | Fixed scale 3.0517578125e-05                                     |
+---------------+----------------------------------------------------------------------------+------------------------------------------------------------------+
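The ch_axis column distinguishes per-tensor quantization (-1) from per-channel quantization. For illustration only, the sketch below assumes a generic symmetric scheme with scale = max|x| / 127 for qint8. This is not necessarily the observer horizon_plugin_pytorch uses.

The sketch also works out the fixed scale reported for sub. With a scale of 1 / 2 ** 15 and a qint16 output, the representable range is [-1, 1).

```python
from typing import List

QINT8_MAX = 127  # assumption: symmetric qint8 scale is derived from [-127, 127]
QINT16_MIN, QINT16_MAX = -(2 ** 15), 2 ** 15 - 1


def per_tensor_scale(weight: List[List[float]]) -> float:
    """ch_axis = -1: a single scale for the whole tensor."""
    return max(abs(v) for row in weight for v in row) / QINT8_MAX


def per_channel_scales(weight: List[List[float]]) -> List[float]:
    """ch_axis = 0: one scale per row (output channel)."""
    return [max(abs(v) for v in row) / QINT8_MAX for row in weight]


w = [[0.5, -1.27], [0.01, 0.02]]
print(per_tensor_scale(w))    # one scale shared by both channels
print(per_channel_scales(w))  # the small second channel gets a much finer scale

# Fixed scale reported for "sub" above: 1 / 2 ** 15 with a qint16 output.
fixed_scale = 1 / 2 ** 15
print(fixed_scale)  # 3.0517578125e-05
print(QINT16_MIN * fixed_scale, QINT16_MAX * fixed_scale)  # -1.0 0.999969482421875
```

With a single per-tensor scale, the second channel's values (at most 0.02) would use only a couple of the 127 quantization levels. Per-channel scales avoid that.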
7.4.5.6. Visualization: ONNX Model Visualization
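horizon_plugin_pytorch currently supports visualizing a model at any stage. Here, visualization means visualizing the model structure. The model is exported to onnx by default, and you can view the result with netron. The exported onnx does not support inference; it is only for viewing the model structure.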
# from horizon_plugin_pytorch.utils.onnx_helper import (
#     export_to_onnx,
#     export_quantized_onnx,
# )
export_to_onnx(
    model,
    args,
    f,
    export_params=True,
    verbose=False,
    training=TrainingMode.EVAL,
    input_names=None,
    output_names=None,
    operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH,
    do_constant_folding=True,
    example_outputs=None,
    dynamic_axes=None,
    enable_onnx_checker=False,
)
export_quantized_onnx(
    model,
    args,
    f,
    export_params=True,
    verbose=False,
    training=TrainingMode.EVAL,
    input_names=None,
    output_names=None,
    operator_export_type=OperatorExportTypes.ONNX_FALLTHROUGH,
    opset_version=None,
    do_constant_folding=True,
    example_outputs=None,
    dynamic_axes=None,
    keep_initializers_as_inputs=None,
    custom_opsets=None,
)
The parameters have the same meaning as in torch.onnx.export. The only difference is the operator_export_type argument, which is set to OperatorExportTypes.ONNX_FALLTHROUGH here.
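Notes:
- Use export_to_onnx to export floating-point and QAT models to onnx.
- Use export_quantized_onnx to export fixed-point models to onnx.
- The visualization granularity is:
  - Custom ops in horizon_plugin_pytorch, both floating-point and fixed-point, are shown as single ops. Their internal implementation is not visualized.
  - For community ops used in a floating-point model, the community (PyTorch) implementation determines the granularity.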
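The stage-to-exporter rule above is small enough to state as code. `pick_onnx_exporter` is a hypothetical helper. It only returns the name of the function to call and does not import horizon_plugin_pytorch.

```python
def pick_onnx_exporter(stage: str) -> str:
    """Return the name of the documented export helper for a model stage."""
    stage = stage.lower()
    if stage in ("float", "qat"):
        return "export_to_onnx"         # floating-point and QAT models
    if stage in ("quantized", "fixed-point"):
        return "export_quantized_onnx"  # fixed-point models
    raise ValueError(f"unknown model stage: {stage!r}")


for stage in ("float", "QAT", "quantized"):
    print(stage, "->", pick_onnx_exporter(stage))
```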
Example:
from copy import deepcopy
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
import horizon_plugin_pytorch as horizon
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.utils.onnx_helper import (
    export_to_onnx,
    export_quantized_onnx,
)
class Net(nn.Module):
    def __init__(self, quant=False, share_op=True):
        super(Net, self).__init__()
        self.quant_stubx = QuantStub()
        self.quant_stuby = QuantStub()
        self.mul_op = FloatFunctional()
        self.cat_op = FloatFunctional()
        self.quantized_ops = nn.Sequential(
            nn.ReLU(),
            nn.Sigmoid(),
            nn.Softmax(),
            nn.SiLU(),
            horizon_nn.Interpolate(
                scale_factor=2, recompute_scale_factor=True
            ),
            horizon_nn.Interpolate(
                scale_factor=2.3, recompute_scale_factor=True
            ),
            nn.AvgPool2d(kernel_size=4),
            nn.Upsample(scale_factor=1.3, mode="bilinear"),
            nn.UpsamplingBilinear2d(scale_factor=0.7),
        )
        self.dequant_stub = DeQuantStub()
        self.float_ops = nn.Sequential(
            nn.Tanh(),
            nn.LeakyReLU(),
            nn.PReLU(),
            nn.UpsamplingNearest2d(scale_factor=0.7),
        )
        self.quant = quant
        self.share_op = share_op
    def forward(self, x, y):
        x = self.quant_stubx(x)
        y = self.quant_stuby(y)
        z = self.mul_op.mul(x, y)
        x = self.cat_op.cat((x, y), dim=1)
        if self.share_op:
            x = self.cat_op.cat((x, y), dim=1)
        x = self.quantized_ops(x)
        x = self.dequant_stub(x)
        if not self.quant:
            x = self.float_ops(x)
        return x
set_march(March.BAYES)
# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# build the input before it is used by the QAT forward pass below
data = torch.arange(1 * 3 * 4 * 4) / 100 + 1
data = data.reshape((1, 3, 4, 4))
data = data.to(torch.float32).to(device)
float_net = Net(quant=True, share_op=True).to(device)
# in fx mode, deepcopy the model before conversion
float_net2 = deepcopy(float_net)
qat_net = prepare_qat_fx(
    float_net2, {"": default_qat_8bit_fake_quant_qconfig}
)
qat_net(data, data)
qat_net2 = deepcopy(qat_net)
quantized_net = convert_fx(qat_net2)
export_to_onnx(float_net, (data, data), "float_test.onnx")
export_to_onnx(qat_net, (data, data), "qat_test.onnx")
export_quantized_onnx(quantized_net, (data, data), "quantized_test.onnx")
7.4.5.7. Similarity Comparison
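When a fixed-point model is much less accurate than its QAT model, use the similarity comparison tool to compare each layer's output in the two models. This quickly locates the op that causes the accuracy drop.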
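If every output of a layer is 0, the cosine similarity is also 0. In that case, check whether the layer's output is all zeros, or use the printed metrics such as atol to confirm whether the outputs match. If a layer's outputs are exactly identical, the similarity computed with the signal-to-noise ratio (SQNR) is inf.
If devices=None, the tool does not move the models or the input data. You must make sure the models and their inputs are on the same device.
The tool can compare models at any two stages, in any input order, on any two devices. The recommended order is float/qat/quantized, e.g. (float, qat) or (qat, quantized). The order (qat, float) does not affect the similarity or the single-op error. However, the single-op error under the same input may be biased, because the tool cannot generate inputs for the QAT model that exactly correspond to those of the float model. Also, QAT training changes the model parameters, so comparing the float model directly with a trained QAT model is of little value. Instead, compare the float model with a QAT model that has been calibrated but not yet trained.
In fx mode, model conversion is in place by default. Before using the similarity tool, manually deepcopy the original model before converting it. Otherwise the tool will wrongly compute the similarity between two identical models after the conversion.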
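The two edge cases above follow from the textbook definitions of cosine similarity and SQNR. The sketch below implements those definitions in plain Python. The profiler's own implementation may differ in details such as epsilon handling. The sketch reproduces both cases:

- An all-zero pair gives a cosine similarity of 0 even though the outputs match.
- Identical outputs give an infinite SQNR.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined here as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def sqnr_db(reference: Sequence[float], test: Sequence[float]) -> float:
    """Signal-to-quantization-noise ratio in dB; inf when the outputs are identical."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, test))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


zeros = [0.0, 0.0, 0.0]
print(cosine_similarity(zeros, zeros))  # 0.0 although the two outputs are equal
print(sqnr_db([1.0, 2.0], [1.0, 2.0]))  # inf
```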
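# from horizon_plugin_profiler import featuremap_similarity
def featuremap_similarity(
    model1: torch.nn.Module,
    model2: torch.nn.Module,
    inputs: Any,
    similarity_func: Union[str, Callable] = "Cosine",
    threshold: Optional[Real] = None,
    devices: Union[torch.device, tuple, None] = None,
    out_dir: Optional[str] = None,
):
    """
    Similarity comparison function: computes and compares the similarity of the
    output features of every layer in the two input models. An input model can be
    a float model, a fused model, a calibration model, a QAT model, or a
    fixed-point model.

    Args:
        model1: a float model, fused model, calibration model, QAT model, or
            fixed-point model
        model2: a float model, fused model, calibration model, QAT model, or
            fixed-point model
        inputs: model inputs
        similarity_func: the similarity metric. Defaults to cosine similarity
            ("Cosine"). Supports Cosine/MSE/L1/KL/SQNR or a custom similarity
            function. A custom function should preferably return a constant or a
            tensor holding a single value; otherwise the displayed results may not
            be as expected.
        threshold: threshold. Defaults to None, in which case a different default
            is used for each similarity metric. If you pass a number, the ops whose
            values exceed or fall below it (depending on the metric) are printed to
            the screen together with their similarity information.
        devices: the device(s) on which the models run forward while computing
            similarity. If None, the models run forward on the device where the
            inputs already are; if a single value such as torch.device("cpu"), both
            models are moved to that device; if two values such as
            (torch.device("cpu"), torch.device("cuda")), each model is moved to the
            corresponding device. The latter is typically used to compare the
            intermediate CPU/GPU results of the same model at the same stage.
        out_dir: directory for the output result files and images. Defaults to
            None, which saves them to the current directory.

    Returns:
        A list in which each item is a sub-list holding the similarity information
        of one layer, in the format [index, module name, module type, similarity,
        output scale, maximum error, single-op error (N scale), single-op error
        with the same input (N scale)]
    """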
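How a threshold is applied depends on the metric: for Cosine and SQNR a larger value means more similar, while for MSE, L1, and KL a smaller value does. The sketch below post-processes a result list in the documented [index, module name, module type, similarity, ...] format under that assumption; the helper `flag_layers`, the `HIGHER_IS_BETTER` table, and the sample rows are hypothetical and only illustrate the idea.

```python
# Hypothetical table: which metrics treat a larger value as "more similar"
HIGHER_IS_BETTER = {"Cosine": True, "SQNR": True, "MSE": False, "L1": False, "KL": False}


def flag_layers(results, metric, threshold):
    """Return (index, name, similarity) for layers on the "bad" side of threshold.

    Each item of results is assumed to follow the documented format
    [index, module name, module type, similarity, scale, atol, N scale, N scale same input].
    """
    flagged = []
    for index, name, _type, similarity, *_rest in results:
        if HIGHER_IS_BETTER[metric]:
            bad = similarity < threshold
        else:
            bad = similarity > threshold
        if bad:
            flagged.append((index, name, similarity))
    return flagged


# Hypothetical rows in the documented format
rows = [
    [0, "quant_stubx", "QuantStub", 1.0, 0.0115, 0.0, 0, 0],
    [1, "conv", "Conv2d", 0.91, 0.02, 0.2, 10, 8],
    [2, "relu", "ReLU", 0.99, 0.02, 0.04, 2, 0],
]
print(flag_layers(rows, "Cosine", 0.95))  # [(1, 'conv', 0.91)]
```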
Usage example:
from copy import deepcopy
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
import horizon_plugin_pytorch as horizon
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_profiler import featuremap_similarity
class Net(nn.Module):
    def __init__(self, quant=False, share_op=True):
        super(Net, self).__init__()
        self.quant_stubx = QuantStub()
        self.quant_stuby = QuantStub()
        self.mul_op = FloatFunctional()
        self.cat_op = FloatFunctional()
        self.quantized_ops = nn.Sequential(
            nn.ReLU(),
            nn.Sigmoid(),
            nn.Softmax(),
            nn.SiLU(),
            horizon_nn.Interpolate(
                scale_factor=2, recompute_scale_factor=True
            ),
            horizon_nn.Interpolate(
                scale_factor=2.3, recompute_scale_factor=True
            ),
            nn.AvgPool2d(kernel_size=4),
            nn.Upsample(scale_factor=1.3, mode="bilinear"),
            nn.UpsamplingBilinear2d(scale_factor=0.7),
        )
        self.dequant_stub = DeQuantStub()
        self.float_ops = nn.Sequential(
            nn.Tanh(),
            nn.LeakyReLU(),
            nn.PReLU(),
            nn.UpsamplingNearest2d(scale_factor=0.7),
        )
        self.quant = quant
        self.share_op = share_op
    def forward(self, x, y):
        x = self.quant_stubx(x)
        y = self.quant_stuby(y)
        z = self.mul_op.mul(x, y)
        x = self.cat_op.cat((x, z), dim=1)
        if self.share_op:
            x = self.cat_op.cat((x, y), dim=1)
        x = self.quantized_ops(x)
        x = self.dequant_stub(x)
        if not self.quant:
            x = self.float_ops(x)
        return x
set_march(March.BAYES)
device = torch.device("cuda")
float_net = Net(quant=True, share_op=True).to(device)
# In fx mode every conversion modifies the model in place; to compare similarity,
# manually deepcopy the model before converting it
float_net2 = deepcopy(float_net)
qat_net = prepare_qat_fx(
    float_net2, {"": default_qat_8bit_fake_quant_qconfig}
)
data = torch.arange(1 * 3 * 4 * 4) / 100 + 1
data = data.reshape((1, 3, 4, 4))
data = data.to(torch.float32).to(device)
# Run a forward pass so the fake-quant observers record activation ranges
qat_net(data, data)
qat_net2 = deepcopy(qat_net)
bpu_net = convert_fx(qat_net2)
featuremap_similarity(qat_net, bpu_net, (data, data))
After it runs, the following files are generated in the current directory or in the directory given by the out_dir parameter:

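similarity.txt: prints, as a table and in model forward order, the similarity, single-op error, and related results of every layer. From left to right the columns are:
Index: an index numbering every op in the model from 0 in forward order. It carries no meaning of its own and serves as the x-axis label in the similarity plot;
Module Name: the name under which the op is defined and used in the model, e.g. backbone.mod1.conv. Suffixes in different formats carry different meanings:
If the module name has the suffix '(I)', the op is an Identity in one of the models;
If the module name has the suffix '(I vs I)', the op is an Identity in both compared models;
If the module name has the suffix '(i)' (i >= 1), the layer is a shared op that has been shared i times, and this row is its (i+1)-th call. On its first call a shared op carries no suffix, like any other op.
Module Type: the op type, e.g. torch.nn.Conv2d, horizon_plugin_pytorch.nn.qat.stubs.QuantStub;
Similarity: the similarity between the outputs of the corresponding op in the two models. In general, if the similarity drops sharply at some layer and does not recover afterwards, that layer is very likely the cause of the accuracy drop; analyze it further with tools such as the statistics tool;
qscale: the scale of this op in the quantized model; it is not printed for per-channel quantization;
Acc Error(float atol): the maximum difference between the outputs of the corresponding op in the two models, Acc Error = N * qscale;
Acc Error(N out_qscale): the maximum difference between the outputs of the corresponding op in the two models, expressed as a number of scales;
Op Error with Same Input (N out_qscale): the maximum output difference, in number of scales, when the corresponding op in both models receives exactly the same input (which excludes accumulated error). In theory the single-op error with the same input should stay within a few scales; a much larger value suggests that the op's conversion may be faulty.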
---------------------------------------------------------------
Note:
* Suffix '(I)' means this layer is Identity in one model
* Suffix '(I vs I)' means this layer is Identity in both models
* Suffix '(i)'(i >= 1) means this op is shared i times
---------------------------------------------------------------
+---------+----------------------------+----------------------------------------------------------------------------+--------------+-----------+----------------+------------------+------------------------+
| Index   | Module Name                | Module Type                                                                | Similarity   | qscale    | Acc Error      | Acc Error        | Op Error with Same     |
|         |                            |                                                                            |              |           | (float atol)   | (N out_qscale)   | Input (N out_qscale)   |
|---------+----------------------------+----------------------------------------------------------------------------+--------------+-----------+----------------+------------------+------------------------|
| 0       | quant_stubx                | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>                    | 1.0000000    | 0.0115294 | 0.0000000      | 0                | 0                      |
| 1       | quant_stuby                | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>                    | 1.0000000    | 0.0115294 | 0.0000000      | 0                | 0                      |
| 2       | mul_op                     | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999989    | 0.0168156 | 0.0168156      | 1                | 1                      |
| 3       | cat_op                     | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999971    | 0.0167490 | 0.0334979      | 2                | 0                      |
| 4       | cat_op(1)                  | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999980    | 0.0167490 | 0.0334979      | 2                | 0                      |
| 5       | quantized_ops.0            | <class 'horizon_plugin_pytorch.nn.qat.relu.ReLU'>                          | 0.9999980    | 0.0167490 | 0.0334979      | 2                | 0                      |
| 6       | quantized_ops.1            | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000000    | 0.0070079 | 0.0000000      | 0                | 0                      |
| 7       | quantized_ops.2.sub        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999999    | 0.0000041 | 0.0000041      | 1                | 1                      |
| 8       | quantized_ops.2.exp        | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000000    | 0.0000305 | 0.0000305      | 1                | 1                      |
| 9       | quantized_ops.2.sum        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 1.0000000    | 0.0002541 | 0.0005081      | 2                | 2                      |
| 10      | quantized_ops.2.reciprocal | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000001    | 0.0000037 | 0.0000186      | 5                | 5                      |
| 11      | quantized_ops.2.mul        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 1.0000000    | 0.0009545 | 0.0000000      | 0                | 0                      |
| 12      | quantized_ops.3            | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000000    | 0.0005042 | 0.0000000      | 0                | 0                      |
| 13      | quantized_ops.4            | <class 'horizon_plugin_pytorch.nn.qat.interpolate.Interpolate'>            | 1.0000000    | 0.0005042 | 0.0005042      | 1                | 1                      |
| 14      | quantized_ops.5            | <class 'horizon_plugin_pytorch.nn.qat.interpolate.Interpolate'>            | 0.9999999    | 0.0005042 | 0.0005042      | 1                | 0                      |
| 15      | quantized_ops.6            | <class 'horizon_plugin_pytorch.nn.qat.avg_pool2d.AvgPool2d'>               | 0.9999995    | 0.0005022 | 0.0005022      | 1                | 1                      |
| 16      | quantized_ops.7            | <class 'horizon_plugin_pytorch.nn.qat.upsampling.Upsample'>                | 0.9999998    | 0.0005022 | 0.0005022      | 1                | 0                      |
| 17      | quantized_ops.8            | <class 'horizon_plugin_pytorch.nn.qat.upsampling.UpsamplingBilinear2d'>    | 1.0000000    | 0.0005022 | 0.0000000      | 0                | 0                      |
| 18      | dequant_stub               | <class 'horizon_plugin_pytorch.nn.qat.stubs.DeQuantStub'>                  | 1.0000000    |           | 0.0000000      | 0                | 0                      |
+---------+----------------------------+----------------------------------------------------------------------------+--------------+-----------+----------------+------------------+------------------------+
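As a sanity check when reading the table, the two Acc Error columns are related by Acc Error(float atol) = N * qscale. The sketch below recovers N from the printed atol and qscale of a few rows copied from the sample above; the helper `n_scales` is hypothetical and rounds to the nearest integer because the printed values are truncated to 7 decimals.

```python
def n_scales(atol, qscale):
    """Number of output scales corresponding to an absolute error: N = atol / qscale."""
    return round(atol / qscale)


# (module name, qscale, Acc Error (float atol), Acc Error (N out_qscale)) from the table above
rows = [
    ("mul_op", 0.0168156, 0.0168156, 1),
    ("cat_op", 0.0167490, 0.0334979, 2),
    ("quantized_ops.2.sum", 0.0002541, 0.0005081, 2),
    ("quantized_ops.2.reciprocal", 0.0000037, 0.0000186, 5),
]
for name, qscale, atol, expected in rows:
    print(name, n_scales(atol, qscale), expected)
```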
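ordered_op_error_similarity.txt: the same kind of table, sorted by single-op error with the same input in descending order, so you can quickly find which op has a large convert error. The columns have the same meaning as in similarity.txt.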
---------------------------------------------------------------
Note:
* Suffix '(I)' means this layer is Identity in one model
* Suffix '(I vs I)' means this layer is Identity in both models
* Suffix '(i)'(i >= 1) means this op is shared i times
---------------------------------------------------------------
+---------+----------------------------+----------------------------------------------------------------------------+--------------+-----------+----------------+------------------+------------------------+
| Index   | Module Name                | Module Type                                                                | Similarity   | qscale    | Acc Error      | Acc Error        | Op Error with Same     |
|         |                            |                                                                            |              |           | (float atol)   | (N out_qscale)   | Input (N out_qscale)   |
|---------+----------------------------+----------------------------------------------------------------------------+--------------+-----------+----------------+------------------+------------------------|
| 10      | quantized_ops.2.reciprocal | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000001    | 0.0000037 | 0.0000186      | 5                | 5                      |
| 9       | quantized_ops.2.sum        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 1.0000000    | 0.0002541 | 0.0005081      | 2                | 2                      |
| 2       | mul_op                     | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999989    | 0.0168156 | 0.0168156      | 1                | 1                      |
| 7       | quantized_ops.2.sub        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999999    | 0.0000041 | 0.0000041      | 1                | 1                      |
| 8       | quantized_ops.2.exp        | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000000    | 0.0000305 | 0.0000305      | 1                | 1                      |
| 13      | quantized_ops.4            | <class 'horizon_plugin_pytorch.nn.qat.interpolate.Interpolate'>            | 1.0000000    | 0.0005042 | 0.0005042      | 1                | 1                      |
| 15      | quantized_ops.6            | <class 'horizon_plugin_pytorch.nn.qat.avg_pool2d.AvgPool2d'>               | 0.9999995    | 0.0005022 | 0.0005022      | 1                | 1                      |
| 0       | quant_stubx                | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>                    | 1.0000000    | 0.0115294 | 0.0000000      | 0                | 0                      |
| 1       | quant_stuby                | <class 'horizon_plugin_pytorch.nn.qat.stubs.QuantStub'>                    | 1.0000000    | 0.0115294 | 0.0000000      | 0                | 0                      |
| 3       | cat_op                     | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999971    | 0.0167490 | 0.0334979      | 2                | 0                      |
| 4       | cat_op(1)                  | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 0.9999980    | 0.0167490 | 0.0334979      | 2                | 0                      |
| 5       | quantized_ops.0            | <class 'horizon_plugin_pytorch.nn.qat.relu.ReLU'>                          | 0.9999980    | 0.0167490 | 0.0334979      | 2                | 0                      |
| 6       | quantized_ops.1            | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000000    | 0.0070079 | 0.0000000      | 0                | 0                      |
| 11      | quantized_ops.2.mul        | <class 'horizon_plugin_pytorch.nn.qat.functional_modules.FloatFunctional'> | 1.0000000    | 0.0009545 | 0.0000000      | 0                | 0                      |
| 12      | quantized_ops.3            | <class 'horizon_plugin_pytorch.nn.qat.segment_lut.SegmentLUT'>             | 1.0000000    | 0.0005042 | 0.0000000      | 0                | 0                      |
| 14      | quantized_ops.5            | <class 'horizon_plugin_pytorch.nn.qat.interpolate.Interpolate'>            | 0.9999999    | 0.0005042 | 0.0005042      | 1                | 0                      |
| 16      | quantized_ops.7            | <class 'horizon_plugin_pytorch.nn.qat.upsampling.Upsample'>                | 0.9999998    | 0.0005022 | 0.0005022      | 1                | 0                      |
| 17      | quantized_ops.8            | <class 'horizon_plugin_pytorch.nn.qat.upsampling.UpsamplingBilinear2d'>    | 1.0000000    | 0.0005022 | 0.0000000      | 0                | 0                      |
| 18      | dequant_stub               | <class 'horizon_plugin_pytorch.nn.qat.stubs.DeQuantStub'>                  | 1.0000000    |           | 0.0000000      | 0                | 0                      |
+---------+----------------------------+----------------------------------------------------------------------------+--------------+-----------+----------------+------------------+------------------------+
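The row order of ordered_op_error_similarity.txt can be reproduced with a stable descending sort on the last column: ties keep their forward order, which matches the sample above (indices 2, 7, 8, 13, and 15 all have an error of 1 and stay in forward order). This sketch only illustrates the ordering, using (Index, Op Error with Same Input) pairs copied from the table; it is not the profiler's own code.

```python
# (Index, Op Error with Same Input) pairs in forward order, copied from similarity.txt above
forward_order = [
    (0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (5, 0), (6, 0), (7, 1), (8, 1), (9, 2),
    (10, 5), (11, 0), (12, 0), (13, 1), (14, 0), (15, 1), (16, 0), (17, 0), (18, 0),
]

# Python's sort stays stable even with reverse=True, so equal errors keep forward order
ordered = sorted(forward_order, key=lambda row: row[1], reverse=True)
print([index for index, _ in ordered])
# [10, 9, 2, 7, 8, 13, 15, 0, 1, 3, 4, 5, 6, 11, 12, 14, 16, 17, 18]
```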
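similarity.html: an interactive plot showing how the similarity of each layer changes along the model forward pass. You can zoom in and out, and moving the cursor onto a point shows its exact similarity value (the image shown on this page is a static screenshot of the html page and has no interactive features).
7.4.5.8. Statistics
Computes numerical features (min/max/mean/var/scale) of the inputs and outputs of every layer in the model. Statistics help you observe the data distribution in the current model and decide which quantization precision to use (int8/int16 quantization). The tool also checks whether any layer contains abnormal values such as NaN or inf.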
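# from horizon_plugin_profiler import get_raw_features, profile_featuremap
def get_raw_features(
    model: torch.nn.Module,
    example_inputs: Any,
    prefixes: Tuple = (),
    types: Tuple = (),
    device: torch.device = None,
    preserve_int: bool = False,
    use_class_name: bool = False,
    skip_identity: bool = False,
):
    """
    Args:
        model: the model whose statistics are output
        example_inputs: inputs of model
        prefixes: layer names in the model of the ops whose statistics are output
            (layers whose names start with one of the prefixes)
        types: types of the ops whose statistics are output
        device: whether the model runs forward on CPU or GPU
        preserve_int: whether to output values in fixed-point form. Float values
            are output by default. Only takes effect for QAT and fixed-point
            models, and only when the layer output has a scale (e.g. the output of
            a dequant layer is float, so the parameter has no effect there)
        use_class_name: whether to print the name of each op; by default the op
            type is printed
        skip_identity: whether to skip statistics for Identity ops. By default
            statistics are output for ops of every type

    Returns:
        list(dict): a list in which each element is a dict describing the
        input/output values and some parameter values of one layer, in the
        following format
        - "module_name": (str) name of the module in the original model
        - "attr": (str) module attribute. Can be input/output/weight/bias and so
          on. input/output are the layer's input/output; anything else is a
          parameter of the module
        - "data": (Tensor) values of the corresponding attribute. If the data is
          a QTensor, the dequantized values are recorded here
        - "scale": (Tensor | None) if data is a QTensor, the corresponding scale,
          which may be a per-tensor or a per-channel scale; otherwise None
        - "ch_axis": (int) if data is per-channel quantized, the quantization
          axis; otherwise -1
        - "ff_method": (str) if the current module is a FloatFunctional/QFunctional,
          the method actually called (add/sub/mul/...); otherwise None
    """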
def profile_featuremap(
    featuremap: List[Dict],
    with_tensorboard: bool = False,
    tensorboard_dir: Optional[str] = None,
    print_per_channel_scale: bool = False,
    show_per_channel: bool = False,
    out_dir: Optional[str] = None,
    file_name: Optional[str] = None,
):
    """
    Args:
        featuremap: output of get_raw_features
        with_tensorboard: whether to show the data distribution in tensorboard.
            Defaults to False
        tensorboard_dir: path of the tensorboard log files. Defaults to None. Only
            takes effect when with_tensorboard=True
        print_per_channel_scale: whether to print per-channel quantization
            scales. Defaults to False
        show_per_channel: whether tensorboard shows a data histogram for each
            channel of the feature. Defaults to False
        out_dir: directory for the output result files and images. If not
            specified, results are saved to the current directory
        file_name: name of the saved file and image. If not specified, defaults
            to "statistic.txt" and an interactive "statistic.html"
    """
The two interfaces are normally used together: profile_featuremap(get_raw_features(model, example_inputs), with_tensorboard=True).

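By default the statistics are saved to statistic.txt and plotted into statistic.html, which you can open in a browser.
If you need other statistics, you can write your own featuremap processing function that processes the data returned by get_raw_features.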
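A custom processing function only needs to walk the list of dicts described in the get_raw_features docstring. The sketch below assumes that format ("module_name", "attr", "data") and computes min/max/mean/variance per entry while flagging NaN/inf; the name `summarize_featuremap` and the sample entries are hypothetical. To stay dependency-free it converts `data` with `.flatten().tolist()` when available (as on a torch.Tensor) and otherwise treats it as a plain sequence. Statistics are computed over finite values only, and the variance is the unbiased (N-1) estimate, so a single value yields NaN, in line with the Var column description of statistic.txt.

```python
import math


def _to_floats(data):
    """Flatten a tensor-like object (anything with .flatten().tolist()) or a sequence."""
    if hasattr(data, "flatten"):
        data = data.flatten().tolist()
    return [float(v) for v in data]


def summarize_featuremap(featuremap):
    """Compute min/max/mean/var for each get_raw_features entry and flag NaN/inf."""
    summary = []
    for item in featuremap:
        values = _to_floats(item["data"])
        finite = [v for v in values if math.isfinite(v)]
        n = len(finite)
        mean = sum(finite) / n if n else math.nan
        # Unbiased variance: undefined (NaN) when there is only one value
        var = sum((v - mean) ** 2 for v in finite) / (n - 1) if n > 1 else math.nan
        summary.append({
            "module_name": item["module_name"],
            "attr": item["attr"],
            "min": min(finite) if finite else math.nan,
            "max": max(finite) if finite else math.nan,
            "mean": mean,
            "var": var,
            "has_nan_or_inf": len(finite) != len(values),
        })
    return summary


# Hypothetical entries in the get_raw_features format (other keys omitted)
example = [
    {"module_name": "quant_stubx", "attr": "input", "data": [1.0, 2.0, 3.0]},
    {"module_name": "quantized_ops.1", "attr": "output", "data": [0.5, float("nan")]},
]
for row in summarize_featuremap(example):
    print(row)
```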
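get_raw_features records the inputs and outputs of every layer by inserting hooks. However, community (PyTorch) hooks currently do not support kwargs, which leads to two situations:
cat((x, y), 1): with this form, the argument dim=1 is filtered out and only the two tensors x and y are recorded, which is as expected;
cat(x=(x, y), dim=1): with this form, neither keyword argument is visible to the hooks when they run. There is currently no way to handle this case, so make sure that tensor data is not passed as keyword arguments in the model's forward.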
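By default a PyTorch forward hook is called with only the positional arguments of the module call; keyword arguments are not passed to it. The self-contained sketch below mimics that default with a hypothetical `HookedModule` class instead of torch, so the effect is visible without a GPU: a positional call exposes the inputs to the hook, while a keyword call leaves the hook with an empty argument tuple.

```python
class HookedModule:
    """Toy stand-in for nn.Module: forward hooks see positional args only (hypothetical)."""

    def __init__(self, forward):
        self.forward = forward
        self.hooks = []

    def register_forward_hook(self, hook):
        self.hooks.append(hook)

    def __call__(self, *args, **kwargs):
        output = self.forward(*args, **kwargs)
        for hook in self.hooks:
            hook(self, args, output)  # kwargs are not forwarded, mirroring the default behavior
        return output


def cat(x, dim=0):
    # Stand-in for a concatenation op: joins lists; dim is ignored in this toy version
    return [v for part in x for v in part]


recorded = []
cat_op = HookedModule(cat)
cat_op.register_forward_hook(lambda mod, inputs, output: recorded.append(inputs))

cat_op(([1], [2]), 1)        # positional: the hook sees both inputs (the profiler then keeps the tensors)
cat_op(x=([1], [2]), dim=1)  # keyword: the hook sees an empty args tuple
print(recorded)  # [(([1], [2]), 1), ()]
```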
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
import horizon_plugin_pytorch as horizon
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_profiler import get_raw_features, profile_featuremap
class Net(nn.Module):
    def __init__(self, quant=False, share_op=True):
        super(Net, self).__init__()
        self.quant_stubx = QuantStub()
        self.quant_stuby = QuantStub()
        self.mul_op = FloatFunctional()
        self.cat_op = FloatFunctional()
        self.quantized_ops = nn.Sequential(
            nn.ReLU(),
            nn.Sigmoid(),
            nn.Softmax(),
            nn.SiLU(),
            horizon_nn.Interpolate(
                scale_factor=2, recompute_scale_factor=True
            ),
            horizon_nn.Interpolate(
                scale_factor=2.3, recompute_scale_factor=True
            ),
            nn.AvgPool2d(kernel_size=4),
            nn.Upsample(scale_factor=1.3, mode="bilinear"),
            nn.UpsamplingBilinear2d(scale_factor=0.7),
        )
        self.dequant_stub = DeQuantStub()
        self.float_ops = nn.Sequential(
            nn.Tanh(),
            nn.LeakyReLU(),
            nn.PReLU(),
            nn.UpsamplingNearest2d(scale_factor=0.7),
        )
        self.quant = quant
        self.share_op = share_op
    def forward(self, x, y):
        x = self.quant_stubx(x)
        y = self.quant_stuby(y)
        z = self.mul_op.mul(x, y)
        x = self.cat_op.cat((x, z), dim=1)
        if self.share_op:
            x = self.cat_op.cat((x, y), dim=1)
        x = self.quantized_ops(x)
        x = self.dequant_stub(x)
        if not self.quant:
            x = self.float_ops(x)
        return x
set_march(March.BAYES)
device = torch.device("cuda")
float_net = Net(quant=True, share_op=True).to(device)
qat_net = prepare_qat_fx(
    float_net, {"": default_qat_8bit_fake_quant_qconfig}
)
qat_net = qat_net.to(device)
data = torch.arange(1 * 3 * 4 * 4) / 100 + 1
data = data.reshape((1, 3, 4, 4))
data = data.to(torch.float32).to(device)
profile_featuremap(get_raw_features(qat_net, (data, data)), with_tensorboard=True)
After it runs, the following files are generated in the current directory or in the directory given by the out_dir parameter:

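statistic.txt: outputs the statistics of every layer's inputs and outputs as a table. From left to right the columns are:
Module Index: an index numbering every op in the model from 0 in forward order. It carries no meaning of its own and serves as the x-axis label in the plot;
Module Name: the name under which the op is defined and used in the model, e.g. backbone.mod1.conv. Suffixes in different formats carry different meanings:
If the module name has the suffix '(i)' (i >= 1), the layer is a shared op that has been shared i times, and this row is its (i+1)-th call. On its first call a shared op carries no suffix, like any other op.
Module Type: the op type, e.g. torch.nn.Conv2d, horizon_plugin_pytorch.nn.qat.stubs.QuantStub;
Input/Output/Attr: which attribute of the module the row prints, e.g. input, output, weight, bias;
Min: the minimum of the data;
Max: the maximum of the data. Min and max give the current data range; combined with the scale, they let you judge whether the current quantization precision (int8/int16) meets the accuracy requirement;
Mean: the mean of the data;
Var: the variance of the data. If the variance is NaN and min = max = mean, the data contains only a single value; a very large variance means the data is unevenly distributed and may not be suitable for quantization;
Scale: the quantization scale of the data; if empty, the data is either per-channel quantized or not quantized;
Dtype: the quantization dtype of the current layer, e.g. qint8/qint16. If the layer is not quantized, the float data type is printed directly.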
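To judge from Min/Max whether int8 or int16 is precise enough, it helps to estimate the scale (the smallest representable step) each dtype would need. The sketch below assumes symmetric per-tensor quantization with scale = max(|min|, |max|) / qmax, where qmax is 127 for qint8 and 32767 for qint16; the plugin's observers may compute scales differently (for example with moving averages), so treat the result as an estimate. For the quantized_ops.2.sum output in the sample statistic.txt (max 5.9043884, qint16), the estimate 5.9043884 / 32767 is about 0.0001802, matching the printed Scale.

```python
# Assumed symmetric quantization limits for each dtype
QMAX = {"qint8": 127, "qint16": 32767}


def estimate_scale(min_val, max_val, dtype):
    """Symmetric per-tensor scale estimate: max(|min|, |max|) / qmax (an assumption)."""
    return max(abs(min_val), abs(max_val)) / QMAX[dtype]


def compare_dtypes(min_val, max_val):
    """Print the estimated scale for each dtype so the resolutions can be compared."""
    for dtype in QMAX:
        print(f"{dtype}: scale ~= {estimate_scale(min_val, max_val, dtype):.7f}")


# quantized_ops.2.sum output from the sample statistic.txt: min 4.6700654, max 5.9043884
compare_dtypes(4.6700654, 5.9043884)
```

A qint8 step of roughly 0.046 is coarse relative to a range of about 4.67 to 5.90, which is one reason such layers are often kept in qint16.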
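Normally statistic.txt contains two tables in the format above: one lists the statistics of every layer in model forward order, the other lists them sorted by the range of the quantized data in descending order, so you can quickly locate layers with a very large value range. If some layers contain NaN or inf, statistic.txt also includes an additional table listing those layers; this table is also printed to the screen to prompt you to check them.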
+----------------+----------------------------+-------------------------------------------------------------------------------+---------------------+------------+-----------+------------+-----------+-----------+---------------+
| Module Index   | Module Name                | Module Type                                                                   | Input/Output/Attr   | Min        | Max       | Mean       | Var       | Scale     | Dtype         |
|----------------+----------------------------+-------------------------------------------------------------------------------+---------------------+------------+-----------+------------+-----------+-----------+---------------|
| 0              | quant_stubx                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | input               | -2.9943717 | 2.9613159 | -0.0791836 | 2.7670853 |           | torch.float32 |
| 0              | quant_stubx                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | output              | -2.9826291 | 2.9591436 | -0.0786467 | 2.7688842 | 0.0234853 | qint8         |
| 1              | quant_stuby                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | input               | 0.5011058  | 0.9995295 | 0.7525039  | 0.0210502 |           | torch.float32 |
| 1              | quant_stuby                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | output              | 0.5017246  | 0.9956098 | 0.7525385  | 0.0210164 | 0.0078394 | qint8         |
| 2              | mul_op[mul]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | -2.9826291 | 2.9591436 | -0.0786467 | 2.7688842 | 0.0234853 | qint8         |
| 2              | mul_op[mul]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.5017246  | 0.9956098 | 0.7525385  | 0.0210164 | 0.0078394 | qint8         |
| 2              | mul_op[mul]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -2.9577060 | 2.5648856 | -0.0374420 | 1.5830494 | 0.0231071 | qint8         |
| 3              | cat_op[cat]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0-0           | -2.9826291 | 2.9591436 | -0.0786467 | 2.7688842 | 0.0234853 | qint8         |
| 3              | cat_op[cat]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0-1           | -2.9577060 | 2.5648856 | -0.0374420 | 1.5830494 | 0.0231071 | qint8         |
| 3              | cat_op[cat]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -2.9942081 | 2.9474237 | -0.0580113 | 2.1627743 | 0.0233923 | qint8         |
| 4              | cat_op[cat](1)             | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | -2.9942081 | 2.9474237 | -0.0580113 | 2.1627743 | 0.0233923 | qint8         |
| 4              | cat_op[cat](1)             | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.5017246  | 0.9956098 | 0.7525385  | 0.0210164 | 0.0078394 | qint8         |
| 4              | cat_op[cat](1)             | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -2.9942081 | 2.9474237 | 0.2123352  | 1.5946714 | 0.0233923 | qint8         |
| 5              | quantized_ops.0            | <class 'horizon_plugin_pytorch.nn.quantized.relu.ReLU'>                       | input               | -2.9942081 | 2.9474237 | 0.2123352  | 1.5946714 | 0.0233923 | qint8         |
| 5              | quantized_ops.0            | <class 'horizon_plugin_pytorch.nn.quantized.relu.ReLU'>                       | output              | 0.0000000  | 2.9474237 | 0.6510122  | 0.4357365 | 0.0233923 | qint8         |
| 6              | quantized_ops.1            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | 0.0000000  | 2.9474237 | 0.6510122  | 0.4357365 | 0.0233923 | qint8         |
| 6              | quantized_ops.1            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.4992901  | 0.9464155 | 0.6408262  | 0.0163976 | 0.0074521 | qint8         |
| 7              | quantized_ops.2.sub[sub]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | 0.4992901  | 0.9464155 | 0.6408262  | 0.0163976 | 0.0074521 | qint8         |
| 7              | quantized_ops.2.sub[sub]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.6334277  | 0.9464155 | 0.7888176  | 0.0090090 | 0.0074521 | qint8         |
| 7              | quantized_ops.2.sub[sub]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -0.4471186 | 0.0000000 | -0.1479909 | 0.0140247 | 0.0000136 | qint16        |
| 8              | quantized_ops.2.exp        | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | -0.4471186 | 0.0000000 | -0.1479909 | 0.0140247 | 0.0000136 | qint16        |
| 8              | quantized_ops.2.exp        | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.6394446  | 0.9999847 | 0.8683713  | 0.0100195 | 0.0000305 | qint16        |
| 9              | quantized_ops.2.sum[sum]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input               | 0.6394446  | 0.9999847 | 0.8683713  | 0.0100195 | 0.0000305 | qint16        |
| 9              | quantized_ops.2.sum[sum]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | 4.6700654  | 5.9043884 | 5.2101822  | 0.0529649 | 0.0001802 | qint16        |
| 10             | quantized_ops.2.reciprocal | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | 4.6700654  | 5.9043884 | 5.2101822  | 0.0529649 | 0.0001802 | qint16        |
| 10             | quantized_ops.2.reciprocal | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.1693695  | 0.2141069 | 0.1923085  | 0.0000730 | 0.0000065 | qint16        |
| 11             | quantized_ops.2.mul[mul]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | 0.6394446  | 0.9999847 | 0.8683713  | 0.0100195 | 0.0000305 | qint16        |
| 11             | quantized_ops.2.mul[mul]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.1693695  | 0.2141069 | 0.1923085  | 0.0000730 | 0.0000065 | qint16        |
| 11             | quantized_ops.2.mul[mul]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | 0.1326724  | 0.2132835 | 0.1666716  | 0.0003308 | 0.0016794 | qint8         |
| 12             | quantized_ops.3            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | 0.1326724  | 0.2132835 | 0.1666716  | 0.0003308 | 0.0016794 | qint8         |
| 12             | quantized_ops.3            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.0703202  | 0.1175087 | 0.0903590  | 0.0001112 | 0.0009253 | qint8         |
| 13             | quantized_ops.4            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | input               | 0.0703202  | 0.1175087 | 0.0903590  | 0.0001112 | 0.0009253 | qint8         |
| 13             | quantized_ops.4            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | output              | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000526 | 0.0009253 | qint8         |
| 14             | quantized_ops.5            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | input               | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000526 | 0.0009253 | qint8         |
| 14             | quantized_ops.5            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | output              | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000461 | 0.0009253 | qint8         |
| 15             | quantized_ops.6            | <class 'horizon_plugin_pytorch.nn.quantized.avg_pool2d.AvgPool2d'>            | input               | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000461 | 0.0009253 | qint8         |
| 15             | quantized_ops.6            | <class 'horizon_plugin_pytorch.nn.quantized.avg_pool2d.AvgPool2d'>            | output              | 0.0747764  | 0.1091563 | 0.0903856  | 0.0000372 | 0.0008595 | qint8         |
| 16             | quantized_ops.7            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.Upsample'>             | input               | 0.0747764  | 0.1091563 | 0.0903856  | 0.0000372 | 0.0008595 | qint8         |
| 16             | quantized_ops.7            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.Upsample'>             | output              | 0.0756359  | 0.1074373 | 0.0903877  | 0.0000286 | 0.0008595 | qint8         |
| 17             | quantized_ops.8            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.UpsamplingBilinear2d'> | input               | 0.0756359  | 0.1074373 | 0.0903877  | 0.0000286 | 0.0008595 | qint8         |
| 17             | quantized_ops.8            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.UpsamplingBilinear2d'> | output              | 0.0773549  | 0.1048589 | 0.0903853  | 0.0000251 | 0.0008595 | qint8         |
| 18             | dequant_stub               | <class 'horizon_plugin_pytorch.nn.quantized.quantize.DeQuantize'>             | input               | 0.0773549  | 0.1048589 | 0.0903853  | 0.0000251 | 0.0008595 | qint8         |
| 18             | dequant_stub               | <class 'horizon_plugin_pytorch.nn.quantized.quantize.DeQuantize'>             | output              | 0.0773549  | 0.1048589 | 0.0903853  | 0.0000251 |           | torch.float32 |
+----------------+----------------------------+-------------------------------------------------------------------------------+---------------------+------------+-----------+------------+-----------+-----------+---------------+
Statistics with quant range in descending order...
+----------------+----------------------------+-------------------------------------------------------------------------------+---------------------+------------+-----------+------------+-----------+-----------+---------------+
| Module Index   | Module Name                | Module Type                                                                   | Input/Output/Attr   | Min        | Max       | Mean       | Var       | Scale     | Dtype         |
|----------------+----------------------------+-------------------------------------------------------------------------------+---------------------+------------+-----------+------------+-----------+-----------+---------------|
| 9              | quantized_ops.2.sum[sum]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | 4.6700654  | 5.9043884 | 5.2101822  | 0.0529649 | 0.0001802 | qint16        |
| 10             | quantized_ops.2.reciprocal | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | 4.6700654  | 5.9043884 | 5.2101822  | 0.0529649 | 0.0001802 | qint16        |
| 0              | quant_stubx                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | input               | -2.9943717 | 2.9613159 | -0.0791836 | 2.7670853 |           | torch.float32 |
| 3              | cat_op[cat]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -2.9942081 | 2.9474237 | -0.0580113 | 2.1627743 | 0.0233923 | qint8         |
| 4              | cat_op[cat](1)             | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | -2.9942081 | 2.9474237 | -0.0580113 | 2.1627743 | 0.0233923 | qint8         |
| 4              | cat_op[cat](1)             | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -2.9942081 | 2.9474237 | 0.2123352  | 1.5946714 | 0.0233923 | qint8         |
| 5              | quantized_ops.0            | <class 'horizon_plugin_pytorch.nn.quantized.relu.ReLU'>                       | input               | -2.9942081 | 2.9474237 | 0.2123352  | 1.5946714 | 0.0233923 | qint8         |
| 0              | quant_stubx                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | output              | -2.9826291 | 2.9591436 | -0.0786467 | 2.7688842 | 0.0234853 | qint8         |
| 2              | mul_op[mul]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | -2.9826291 | 2.9591436 | -0.0786467 | 2.7688842 | 0.0234853 | qint8         |
| 3              | cat_op[cat]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0-0           | -2.9826291 | 2.9591436 | -0.0786467 | 2.7688842 | 0.0234853 | qint8         |
| 2              | mul_op[mul]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -2.9577060 | 2.5648856 | -0.0374420 | 1.5830494 | 0.0231071 | qint8         |
| 3              | cat_op[cat]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0-1           | -2.9577060 | 2.5648856 | -0.0374420 | 1.5830494 | 0.0231071 | qint8         |
| 5              | quantized_ops.0            | <class 'horizon_plugin_pytorch.nn.quantized.relu.ReLU'>                       | output              | 0.0000000  | 2.9474237 | 0.6510122  | 0.4357365 | 0.0233923 | qint8         |
| 6              | quantized_ops.1            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | 0.0000000  | 2.9474237 | 0.6510122  | 0.4357365 | 0.0233923 | qint8         |
| 8              | quantized_ops.2.exp        | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.6394446  | 0.9999847 | 0.8683713  | 0.0100195 | 0.0000305 | qint16        |
| 9              | quantized_ops.2.sum[sum]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input               | 0.6394446  | 0.9999847 | 0.8683713  | 0.0100195 | 0.0000305 | qint16        |
| 11             | quantized_ops.2.mul[mul]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | 0.6394446  | 0.9999847 | 0.8683713  | 0.0100195 | 0.0000305 | qint16        |
| 1              | quant_stuby                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | input               | 0.5011058  | 0.9995295 | 0.7525039  | 0.0210502 |           | torch.float32 |
| 1              | quant_stuby                | <class 'horizon_plugin_pytorch.nn.quantized.quantize.Quantize'>               | output              | 0.5017246  | 0.9956098 | 0.7525385  | 0.0210164 | 0.0078394 | qint8         |
| 2              | mul_op[mul]                | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.5017246  | 0.9956098 | 0.7525385  | 0.0210164 | 0.0078394 | qint8         |
| 4              | cat_op[cat](1)             | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.5017246  | 0.9956098 | 0.7525385  | 0.0210164 | 0.0078394 | qint8         |
| 6              | quantized_ops.1            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.4992901  | 0.9464155 | 0.6408262  | 0.0163976 | 0.0074521 | qint8         |
| 7              | quantized_ops.2.sub[sub]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-0             | 0.4992901  | 0.9464155 | 0.6408262  | 0.0163976 | 0.0074521 | qint8         |
| 7              | quantized_ops.2.sub[sub]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.6334277  | 0.9464155 | 0.7888176  | 0.0090090 | 0.0074521 | qint8         |
| 7              | quantized_ops.2.sub[sub]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | -0.4471186 | 0.0000000 | -0.1479909 | 0.0140247 | 0.0000136 | qint16        |
| 8              | quantized_ops.2.exp        | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | -0.4471186 | 0.0000000 | -0.1479909 | 0.0140247 | 0.0000136 | qint16        |
| 10             | quantized_ops.2.reciprocal | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.1693695  | 0.2141069 | 0.1923085  | 0.0000730 | 0.0000065 | qint16        |
| 11             | quantized_ops.2.mul[mul]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | input-1             | 0.1693695  | 0.2141069 | 0.1923085  | 0.0000730 | 0.0000065 | qint16        |
| 11             | quantized_ops.2.mul[mul]   | <class 'horizon_plugin_pytorch.nn.quantized.functional_modules.QFunctional'>  | output              | 0.1326724  | 0.2132835 | 0.1666716  | 0.0003308 | 0.0016794 | qint8         |
| 12             | quantized_ops.3            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | input               | 0.1326724  | 0.2132835 | 0.1666716  | 0.0003308 | 0.0016794 | qint8         |
| 12             | quantized_ops.3            | <class 'horizon_plugin_pytorch.nn.quantized.segment_lut.SegmentLUT'>          | output              | 0.0703202  | 0.1175087 | 0.0903590  | 0.0001112 | 0.0009253 | qint8         |
| 13             | quantized_ops.4            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | input               | 0.0703202  | 0.1175087 | 0.0903590  | 0.0001112 | 0.0009253 | qint8         |
| 13             | quantized_ops.4            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | output              | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000526 | 0.0009253 | qint8         |
| 14             | quantized_ops.5            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | input               | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000526 | 0.0009253 | qint8         |
| 14             | quantized_ops.5            | <class 'horizon_plugin_pytorch.nn.quantized.interpolate.Interpolate'>         | output              | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000461 | 0.0009253 | qint8         |
| 15             | quantized_ops.6            | <class 'horizon_plugin_pytorch.nn.quantized.avg_pool2d.AvgPool2d'>            | input               | 0.0712454  | 0.1147329 | 0.0903947  | 0.0000461 | 0.0009253 | qint8         |
| 15             | quantized_ops.6            | <class 'horizon_plugin_pytorch.nn.quantized.avg_pool2d.AvgPool2d'>            | output              | 0.0747764  | 0.1091563 | 0.0903856  | 0.0000372 | 0.0008595 | qint8         |
| 16             | quantized_ops.7            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.Upsample'>             | input               | 0.0747764  | 0.1091563 | 0.0903856  | 0.0000372 | 0.0008595 | qint8         |
| 16             | quantized_ops.7            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.Upsample'>             | output              | 0.0756359  | 0.1074373 | 0.0903877  | 0.0000286 | 0.0008595 | qint8         |
| 17             | quantized_ops.8            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.UpsamplingBilinear2d'> | input               | 0.0756359  | 0.1074373 | 0.0903877  | 0.0000286 | 0.0008595 | qint8         |
| 17             | quantized_ops.8            | <class 'horizon_plugin_pytorch.nn.quantized.upsampling.UpsamplingBilinear2d'> | output              | 0.0773549  | 0.1048589 | 0.0903853  | 0.0000251 | 0.0008595 | qint8         |
| 18             | dequant_stub               | <class 'horizon_plugin_pytorch.nn.quantized.quantize.DeQuantize'>             | input               | 0.0773549  | 0.1048589 | 0.0903853  | 0.0000251 | 0.0008595 | qint8         |
| 18             | dequant_stub               | <class 'horizon_plugin_pytorch.nn.quantized.quantize.DeQuantize'>             | output              | 0.0773549  | 0.1048589 | 0.0903853  | 0.0000251 |           | torch.float32 |
+----------------+----------------------------+-------------------------------------------------------------------------------+---------------------+------------+-----------+------------+-----------+-----------+---------------+
[Figure: statistic.html]
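If with_tensorboard=True is set, tensorboard log files are generated in the specified directory. You can open them with tensorboard to view the distribution histogram of each group of data.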
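7.4.5.9. Model weight comparison¶
By default, this tool computes the similarity of each layer's weight in the model (if the layer has one), prints the result to the screen, and saves it to a file. You can also set with_tensorboard=True to plot weight histograms, which makes the comparison easier to read.
If you quantize in fx mode, note the following:
Model conversion is inplace by default, so manually deepcopy the original model before conversion. Otherwise, after conversion, the tool will wrongly compare the weights of two identical models;
If the comparison involves the weights of the float model, manually call fuse_fx to fuse the original float model. Otherwise, the tool will wrongly compare the weights of the unfused float model with those of the fused qat or quantized model.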
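The deepcopy requirement comes from aliasing, and the problem appears even without the plugin. The sketch below uses a plain dict as a stand-in for a model and a hypothetical `convert_inplace` function that mutates its argument, which is how fx conversion behaves by default. It illustrates the pitfall and does not reproduce the plugin's conversion logic.

```python
from copy import deepcopy


def convert_inplace(model):
    """Hypothetical stand-in for an inplace conversion: mutates and returns its argument."""
    for name, weight in model.items():
        # Round every weight to a grid of 0.1 to mimic quantization error.
        model[name] = [round(w * 10) / 10 for w in weight]
    return model


def max_abs_diff(model_a, model_b):
    """Largest element-wise weight difference between two models with the same keys."""
    return max(
        abs(a - b)
        for name in model_a
        for a, b in zip(model_a[name], model_b[name])
    )


# Wrong: without a copy, "float" and "converted" are the same (already converted) object.
float_model = {"conv.weight": [0.123, -0.456, 0.789]}
converted = convert_inplace(float_model)
diff_without = max_abs_diff(float_model, converted)

# Right: convert a deep copy so that the original float weights are preserved.
float_model = {"conv.weight": [0.123, -0.456, 0.789]}
converted = convert_inplace(deepcopy(float_model))
diff_with = max_abs_diff(float_model, converted)

print("without deepcopy:", diff_without)  # always 0: the model is compared with itself
print("with deepcopy:", diff_with)
```

In the plugin workflow, the equivalent step is calling `copy.deepcopy` on the float model before `prepare_qat_fx`, as the usage example in this section does.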
# from horizon_plugin_profiler import compare_weights
def compare_weights(
    float_model: torch.nn.Module,
    qat_quantized_model: torch.nn.Module,
    similarity_func="Cosine",
    with_tensorboard: bool = False,
    tensorboard_dir: Optional[str] = None,
    out_dir: Optional[str] = None,
) -> Dict[str, Dict[str, torch.Tensor]]:
    """Compare the weights of float/qat/quantized models.

    This function uses torch.quantization._numeric_suite.compare_weights to compare
    the weight of each layer in the models. The weight similarity and atol are printed
    to the screen and saved to "weight_comparison.txt".
    You can also set with_tensorboard=True to plot the weight histograms with tensorboard.

    Args:
        float_model: float model
        qat_quantized_model: qat/quantized model
        similarity_func: similarity function. Supports Cosine/MSE/L1/KL/SQNR and any
            custom similarity function. A custom function must return a scalar or a
            tensor containing a single number; otherwise the displayed results may not
            be as expected. Defaults to Cosine.
        with_tensorboard: whether to use tensorboard. Defaults to False.
        tensorboard_dir: path of the tensorboard log files. Defaults to None.
        out_dir: path for saving the txt result. Defaults to None, i.e. the current path.

    Returns:
        A dict recording the weights of the two models, in the following format:
            * KEY (str): module name (e.g. layer1.0.conv.weight)
            * VALUE (dict): weights of the corresponding layer in the two models:
                "float": weight in the float model
                "quantized": weight in the qat/quantized model
    """
Usage example:
from copy import deepcopy
import horizon_plugin_pytorch as horizon
import numpy as np
import torch
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization import (
    convert,
    get_default_qat_qconfig,
    prepare_qat,
    fuse_modules,
)
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    fuse_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_profiler import compare_weights
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
# The definition of Resnet18 is omitted here
device = torch.device("cpu")
# Example input; the shape (1, 3, 224, 224) is an assumption, use your model's real input
data = torch.rand(1, 3, 224, 224, device=device)
float_net = Resnet18().to(device)
set_march(March.BAYES)
float_net.qconfig = get_default_qat_qconfig()
# fx conversion is inplace, so copy the float model before preparing the QAT model
float_net2 = deepcopy(float_net)
qat_net = prepare_qat_fx(float_net2, {"": default_qat_8bit_fake_quant_qconfig})
qat_net(data)
# Required! Otherwise the unfused float model is compared with the fused QAT model,
# and the weights of the two models may not correspond
float_net = fuse_fx(float_net)
compare_weights(float_net, qat_net)
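The results are shown as a table, printed to the screen and saved to weight_comparison.txt. From left to right, the columns are:
Weight Name: the layer of the model that the weight belongs to
Similarity: the similarity between the weights of the corresponding layers in the two models
Atol: the difference between the weights of the corresponding layers in the two models, measured in number of scales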
+-------------------------------------+--------------+-----------+
| Weight Name                         | Similarity   | Atol      |
|-------------------------------------+--------------+-----------|
| conv1.conv.weight                   | 1.0000000    | 0.0000000 |
| layer1.0.conv_cell1.conv.weight     | 1.0000000    | 0.0000000 |
| layer1.0.shortcut.conv.weight       | 1.0000000    | 0.0000000 |
| layer1.0.conv_cell2.skip_add.weight | 1.0000000    | 0.0000000 |
| layer1.1.conv_cell1.conv.weight     | 1.0000000    | 0.0000000 |
| layer1.1.conv_cell2.conv.weight     | 1.0000000    | 0.0000000 |
| layer2.0.conv_cell1.conv.weight     | 1.0000000    | 0.0000000 |
| layer2.0.shortcut.conv.weight       | 1.0000000    | 0.0000000 |
| layer2.0.conv_cell2.skip_add.weight | 1.0000000    | 0.0000000 |
| layer2.1.conv_cell1.conv.weight     | 1.0000000    | 0.0000001 |
| layer2.1.conv_cell2.conv.weight     | 1.0000000    | 0.0000001 |
| layer3.0.conv_cell1.conv.weight     | 1.0000000    | 0.0000001 |
| layer3.0.shortcut.conv.weight       | 1.0000000    | 0.0000001 |
| layer3.0.conv_cell2.skip_add.weight | 1.0000000    | 0.0000002 |
| layer3.1.conv_cell1.conv.weight     | 1.0000000    | 0.0000005 |
| layer3.1.conv_cell2.conv.weight     | 1.0000001    | 0.0000008 |
| conv2.conv.weight                   | 1.0000001    | 0.0000010 |
| pool.conv.weight                    | 0.9999999    | 0.0000024 |
| fc.weight                           | 1.0000000    | 0.0000172 |
+-------------------------------------+--------------+-----------+
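To build intuition for the Similarity and Atol columns, and for what a custom similarity_func has to return, the sketch below recomputes both values for one made-up layer in plain Python. It assumes symmetric per-tensor qint8 fake quantization with scale = max|w| / 127, and it takes Atol to be the largest absolute weight difference divided by that scale. These are illustrative assumptions. The plugin's own definitions, such as per-channel scales or its built-in SQNR, may differ. `cosine` and `sqnr_db` each return a single float, which is the kind of result the similarity_func parameter expects from a custom function.

```python
import math


def fake_quant_qint8(weights):
    """Symmetric per-tensor qint8 fake quantization; returns (dequantized weights, scale).

    Assumes at least one non-zero weight, otherwise the scale would be 0.
    """
    scale = max(abs(w) for w in weights) / 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return [v * scale for v in q], scale


def cosine(a, b):
    """Cosine similarity; returns a single float, as a custom similarity_func must."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sqnr_db(reference, test):
    """Signal-to-quantization-noise ratio in dB (textbook definition, also a scalar)."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def atol_in_scales(a, b, scale):
    """Largest absolute difference, expressed as a number of quantization steps (scales)."""
    return max(abs(x - y) for x, y in zip(a, b)) / scale


float_w = [0.31, -0.42, 0.05, 0.27, -0.12]
quant_w, scale = fake_quant_qint8(float_w)
print(f"Similarity={cosine(float_w, quant_w):.7f}")
print(f"SQNR={sqnr_db(float_w, quant_w):.2f} dB")
print(f"Atol={atol_in_scales(float_w, quant_w, scale):.7f}")
```

Round-to-nearest keeps every element within half a step, so under these assumptions Atol stays at or below 0.5. In the real tool the inputs are torch tensors, so a custom function would do the same arithmetic with tensor operations and return a Python float or a one-element tensor.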
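7.4.5.10. Step-by-step quantization¶
When a QAT model is hard to train and its metrics do not improve, you may need step-by-step quantization to find the accuracy bottleneck. To do this, set part of the model to float with qconfig=None.
If you quantize with fx, refer directly to prepare_qat_fx in the API documentation and enable step-by-step quantization through the hybrid and hybrid_dict parameters.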
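# from horizon_plugin_pytorch.quantization import prepare_qat
def prepare_qat(
    model: torch.nn.Module,
    mapping: Optional[Dict[torch.nn.Module, torch.nn.Module]] = None,
    inplace: bool = False,
    optimize_graph: bool = False,
    hybrid: bool = False,
):
    """Enable step-by-step quantization in prepare_qat through the hybrid parameter.

    Args:
        hybrid: generate a hybrid model in which some intermediate ops are computed in
            floating point. The limitations are:
            1. The hybrid model cannot pass check_model and cannot be compiled
            2. Some quantized ops cannot accept float inputs directly; you need to
               insert QuantStub manually
    """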
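Quantized op → float op: a quantized op outputs a QTensor, and by default a QTensor cannot be used directly as the input of a float op. Doing so raises a NotImplementedError during forward. To solve this, use the interface above to lift the restriction.
Float op → quantized op: during QAT, a quantized op is usually implemented as float op + FakeQuant, so in most cases it can take a Tensor as input directly. However, a few ops need the input scale during QAT so that they align with the fixed-point model, and these ops must receive a QTensor. We have added checks for this case. If you encounter a related error, manually insert a QuantStub between the float op and the quantized op.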
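These two rules determine where conversion nodes go in a hybrid model. The sketch below walks a hypothetical linear chain of ops, each tagged as quantized or float. It adds a dequantize node before each float op that follows a quantized op. It adds a QuantStub only before a quantized op that needs scale information (the needs_qtensor flag) when that op's input comes from a float op. The Op class, the flag, and the generated node names are made up for illustration. The plugin performs its own checks internally.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    quantized: bool               # True: quantized (QAT) op, False: float op
    needs_qtensor: bool = False   # quantized op that needs the input scale (QTensor input)


def insert_boundaries(ops):
    """Return op names with conversion nodes inserted at quantized/float boundaries.

    Assumes the chain starts from a quantized tensor (i.e. right after a QuantStub).
    """
    result = []
    prev_quantized = True
    for op in ops:
        if prev_quantized and not op.quantized:
            # Quantized -> float: a float op cannot take a QTensor directly.
            result.append(f"{op.name}_input_dequant")
        elif not prev_quantized and op.quantized and op.needs_qtensor:
            # Float -> quantized op that needs scale information: insert a QuantStub.
            result.append(f"{op.name}_quant_stub")
        result.append(op.name)
        prev_quantized = op.quantized
    return result


chain = [
    Op("conv0", quantized=True),
    Op("conv1", quantized=False),
    Op("conv2", quantized=True),          # plain QAT conv: accepts a float Tensor
    Op("lut", quantized=True, needs_qtensor=True),
    Op("selu", quantized=False),
    Op("add", quantized=True, needs_qtensor=True),
]
print(insert_boundaries(chain))
```

In this sketch, conv2 gets no QuantStub even though it follows a float op. That matches the rule above: most QAT ops accept a plain Tensor.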
Usage example:
import numpy as np
import torch
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn import qat
from horizon_plugin_pytorch.quantization import (
    get_default_qat_qconfig,
    prepare_qat,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_pytorch.quantization.quantize_fx import prepare_qat_fx
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
class HyperQuantModel(nn.Module):
    def __init__(self, channels=3) -> None:
        super().__init__()
        self.quant = QuantStub()
        self.conv0 = nn.Conv2d(channels, channels, 1)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.conv2 = nn.Conv2d(channels, channels, 1)
        self.dequant = DeQuantStub()
    def forward(self, input):
        x = self.quant(input)
        x = self.conv0(x)
        x = self.conv1(x)
        x = self.conv2(x)
        return self.dequant(x)
    def set_qconfig(self):
        self.qconfig = default_qat_8bit_fake_quant_qconfig
        self.conv1.qconfig = None
shape = np.random.randint(10, 20, size=4).tolist()
data = torch.rand(size=shape)
set_march(March.BAYES)
model = HyperQuantModel(shape[1])
# In eager mode, call prepare_qat(hybrid=True) after setting the qconfig
# model.set_qconfig()
# qat_model = prepare_qat(model, hybrid=True)
# In fx mode, set it directly through the prepare_qat_fx interface
qat_model = prepare_qat_fx(
    model,
    qconfig_dict={"": default_qat_8bit_fake_quant_qconfig},
    hybrid=True,
    hybrid_dict={"module_name": ["conv1"]},
)
assert isinstance(qat_model.conv0, qat.Conv2d)
# conv1 in the qat model is still a float conv
assert isinstance(qat_model.conv1, nn.Conv2d)
assert isinstance(qat_model.conv2, qat.Conv2d)
qat_model(data)
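7.4.5.11. Single-op conversion accuracy debugging¶
When accuracy drops after a QAT model is converted to a quantized (fixed-point) model, you may need to find the op responsible for the drop. To do this, replace some key ops in the quantized model with their QAT form and check whether accuracy recovers.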
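The debugging loop described above is essentially an ablation. Keep one candidate op in QAT mode at a time, re-evaluate, and see which one recovers the most accuracy. This is a minimal sketch. `evaluate` is a hypothetical callback that stands in for "set preserve_qat_mode on these ops, convert with convert_fx, and run your evaluation". The toy accuracy numbers are invented only to exercise the loop.

```python
def rank_ops_by_recovery(candidates, evaluate):
    """Rank candidate ops by the accuracy recovered when each one alone stays in QAT mode.

    evaluate(preserved) -> accuracy of the converted model when the ops in
    `preserved` keep preserve_qat_mode=True (hypothetical callback).
    """
    baseline = evaluate(())
    gains = {op: evaluate((op,)) - baseline for op in candidates}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy stand-in for a real evaluation: made-up accuracy loss contributed by each op.
_TOY_LOSS = {"convmod1": 0.001, "convmod2": 0.030, "shared_conv": 0.004}


def toy_evaluate(preserved):
    return 0.80 - sum(loss for op, loss in _TOY_LOSS.items() if op not in preserved)


ranking = rank_ops_by_recovery(list(_TOY_LOSS), toy_evaluate)
print(ranking[0][0])  # the op whose QAT form recovers the most accuracy
```

Testing one op at a time keeps the attribution simple. If several ops interact, you can pass larger `preserved` tuples to the same callback.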
# from horizon_plugin_profiler import set_preserve_qat_mode
def set_preserve_qat_mode(model: nn.Module, prefixes=(), types=(), value=True):
    """Set mod.preserve_qat_mode=True so that mod stays in qat state in the converted quantized model.

    This function can be called on either a float model or a qat model.
    Note the following two points:
    1) For fused modules, the fused module gets preserve_qat_mode = True only if the conv
    has preserve_qat_mode = True. Therefore, you can set fused.preserve_qat_mode = True
    by setting conv.preserve_qat_mode = True. Example:
        class Model(torch.nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.conv = torch.nn.Conv2d()
                self.bn = torch.nn.BatchNorm2d()
                self.add = FloatFunctional()
                self.relu = torch.nn.ReLU()
        float_model = Model()
        # Set the float conv: correct
        set_preserve_qat_mode(float_model, types=(torch.nn.Conv2d,))
        # Set the float bn: wrong
        set_preserve_qat_mode(float_model, types=(torch.nn.BatchNorm2d,))
        float_model.fuse_modules()
        float_model.qconfig = get_default_qat_qconfig()
        qat_model = prepare_qat(float_model)
        # Set the float conv after fusing and converting to a qat model: correct. In this
        # way, all convs and fused modules (convbn, convbnadd, ...) in the model get
        # preserve_qat_mode = True
        set_preserve_qat_mode(qat_model, types=(torch.nn.Conv2d,))
        # Use the prefixes parameter to specify a particular fused module. convbnaddrelu
        # is fused into the position of add
        set_preserve_qat_mode(qat_model, prefixes=("add",))
    2) If the float model uses torch functions (such as torch.add, torch.pow) and is
    converted with fx, these functions are automatically replaced with horizon ops. To
    set preserve_qat_mode = True for these functions, set preserve_qat_mode = True on
    the corresponding horizon ops in the qat model. Example:
        class Model(torch.nn.Module):
            def __init__(self):
                super(Model, self).__init__()
                self.add = torch.add
        float_model = Model()
        # Convert to a qat model with fx
        qat_model = prepare_qat_fx(float_model)
        # Set by types: correct. All FloatFunctional modules in the qat model get
        # preserve_qat_mode = True
        set_preserve_qat_mode(qat_model, types=(FloatFunctional,))
        # Use the prefixes parameter to specify a particular function (such as add).
        # "add_generated_add_0" is the name of the automatically generated add module
        set_preserve_qat_mode(qat_model, prefixes=("add_generated_add_0",))

    Args:
        model: the model to configure
        prefixes: layer names in the model of the ops to configure (layers whose names
            start with one of the prefixes)
        types: types of the ops to configure. If the input is a float model, types must
            be float op types; if the input is a QAT model, types can be float or qat op types
        value: set preserve_qat_mode=value. Defaults to True
    """
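prefixes selects the layers whose names start with one of the given strings. The hypothetical matcher below shows one plausible implementation, and it also shows why prefixes is best written as a tuple such as ("convmod1",). If an implementation loops over prefixes, a bare string "convmod1" is iterated character by character, so "c" would match every layer whose name starts with c. This text does not state whether the plugin behaves like this. Passing a tuple is safe either way. The layer names and the string type labels are illustrative.

```python
def select_layers(names, prefixes=(), types=(), type_of=None):
    """Hypothetical matcher: return names selected by name prefix or by type.

    type_of maps a layer name to its type; it is only consulted when given.
    """
    selected = []
    for name in names:
        # Looping over prefixes: a bare string is iterated character by character.
        by_prefix = any(name.startswith(p) for p in prefixes)
        by_type = type_of is not None and type_of(name) in types
        if by_prefix or by_type:
            selected.append(name)
    return selected


layers = ["convmod1.conv2d", "convmod1.add", "convmod2.conv2d", "shared_conv", "sub"]
kinds = {
    "convmod1.conv2d": "Conv2d",
    "convmod1.add": "FloatFunctional",
    "convmod2.conv2d": "Conv2d",
    "shared_conv": "Conv2d",
    "sub": "FloatFunctional",
}

print(select_layers(layers, prefixes=("convmod1",)))  # only the convmod1 layers
print(select_layers(layers, prefixes="convmod1"))     # also matches convmod2.conv2d via "c"
print(select_layers(layers, types=("Conv2d",), type_of=kinds.get))
```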
Usage example:
import horizon_plugin_pytorch as horizon
import numpy as np
import torch
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_profiler import set_preserve_qat_mode
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
class Conv2dModule(nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        kernel_size=1,
        stride=1,
        padding=0,
        dilation=1,
        groups=1,
        bias=True,
        padding_mode="zeros",
    ):
        super().__init__()
        self.conv2d = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size,
            stride,
            padding,
            dilation,
            groups,
            bias,
            padding_mode,
        )
        self.add = FloatFunctional()
        self.bn_mod = nn.BatchNorm2d(out_channels)
        self.relu_mod = nn.ReLU()
    def forward(self, x, y):
        x = self.conv2d(x)
        x = self.bn_mod(x)
        x = self.add.add(x, y)
        x = self.relu_mod(x)
        return x
class TestFuseNet(nn.Module):
    def __init__(self, channels) -> None:
        super().__init__()
        self.convmod1 = Conv2dModule(channels, channels)
        self.convmod2 = Conv2dModule(channels, channels)
        self.convmod3 = Conv2dModule(channels, channels)
        self.shared_conv = nn.Conv2d(channels, channels, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.sub = FloatFunctional()
        self.relu = nn.ReLU()
    def forward(self, x, y):
        x = self.convmod1(x, y)
        x = self.convmod2(y, x)
        x = self.convmod3(x, y)
        x = self.shared_conv(x)
        x = self.bn1(x)
        y = self.shared_conv(y)
        y = self.bn2(y)
        x = self.sub.sub(x, y)
        x = self.relu(x)
        return x
model = TestFuseNet(3)
# You can set it by calling the interface, or set preserve_qat_mode=True manually
set_preserve_qat_mode(model, prefixes=("convmod1",))
model.convmod1.preserve_qat_mode = True
set_march(March.BAYES)
qat_net = prepare_qat_fx(model, {"": default_qat_8bit_fake_quant_qconfig})
quant_model = convert_fx(qat_net)
# In the quantized model, convmod1.add is still qat.ConvAddReLU2d
assert isinstance(quant_model.convmod1.add, horizon_nn.qat.ConvAddReLU2d)
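7.4.5.12. Device check for heterogeneous model deployment¶
horizon_plugin_pytorch supports building and deploying heterogeneous models through fx. The heterogeneous model device check tool reports whether each op in the model runs on the BPU or the CPU at final deployment.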
# from horizon_plugin_profiler import check_deploy_device
def check_deploy_device(
    model: torch.fx.GraphModule,
    print_tabulate: bool = True,
    out_dir: Optional[str] = None,
) -> Dict[str, Tuple[str, str]]:
    """Check whether each op runs on the CPU or the BPU when a heterogeneous model is deployed.

    Args:
        model: QAT model or quantized model. Must be obtained through the `prepare_qat_fx` interface.
        print_tabulate: whether to print the result. Defaults to True.
        out_dir: path for saving deploy_device.txt. Defaults to None, i.e. the current path.

    Returns:
        A dict recording the device on which each op runs, in the following format:
            * KEY (str): module name (e.g. layer1.0.conv.weight)
            * VALUE (Tuple): (deploy device (BPU/CPU), module type)
    """
Usage example:
import numpy as np
import torch
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn import qat
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization import (
    prepare_qat_fx,
    convert_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
    default_qat_out_8bit_fake_quant_qconfig,
)
from horizon_plugin_profiler import check_deploy_device
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
class _ConvBlock(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 1)
        self.prelu = torch.nn.PReLU()
    def forward(self, input):
        x = self.conv(input)
        x = self.prelu(x)
        return torch.nn.functional.selu(x)
class _SeluModule(nn.Module):
    def forward(self, input):
        return torch.nn.functional.selu(input)
class HybridModel(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.quant = QuantStub()
        self.conv0 = nn.Conv2d(channels, channels, 1)
        self.prelu = torch.nn.PReLU()
        self.conv1 = _ConvBlock(channels)
        self.conv2 = nn.Conv2d(channels, channels, 1)
        self.conv3 = nn.Conv2d(channels, channels, 1)
        self.conv4 = nn.Conv2d(channels, channels, 1)
        self.selu = _SeluModule()
        self.dequant = DeQuantStub()
        self.identity = torch.nn.Identity()
        self.add = FloatFunctional()
    def forward(self, input):
        x = self.quant(input)
        x = self.conv0(x)
        x = self.identity(x)
        x = self.prelu(x)
        x = torch.nn.functional.selu(x)
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.identity(x)
        y = self.conv4(x)
        x = self.add.add(x, y)
        x = self.selu(x)
        return self.dequant(x)
set_march(March.BAYES)
shape = np.random.randint(10, 20, size=4).tolist()
infer_shape = [1] + shape[1:]
infer_data = torch.rand(size=infer_shape)
model = HybridModel(shape[1])
model(infer_data)
# Build the heterogeneous model with the fx interface
qat_model = prepare_qat_fx(
    model,
    {
        "": default_qat_8bit_fake_quant_qconfig,
        "module_name": [("conv4", default_qat_out_8bit_fake_quant_qconfig)],
    },
    hybrid=True,
    hybrid_dict={
        "module_name": ["conv1.conv", "conv3"],
        "module_type": [_SeluModule],
    },
)
qat_model(infer_data)
check_deploy_device(qat_model)
quantize_model = convert_fx(qat_model)
check_deploy_device(quantize_model)
The following results are printed to the screen as a table and saved to deploy_device.txt. From left to right, the columns are:
name: the name of the op as defined in the model
deploy device: the device on which the op actually runs at deployment, CPU or BPU
type: how the op is invoked in the model: module, function, or method
name                            deploy device    type
------------------------------  ---------------  --------
quant                           CPU              module
conv0                           BPU              module
prelu_input_dequant             CPU              module
prelu                           CPU              module
selu                            CPU              function
conv1.conv                      CPU              module
conv1.prelu                     CPU              module
selu_1                          CPU              function
selu_1_activation_post_process  CPU              module
conv2                           BPU              module
conv3_input_dequant             CPU              module
conv3                           CPU              module
conv3_activation_post_process   CPU              module
add_1                           BPU              method
selu_2_input_dequant            CPU              module
selu_2                          CPU              function
dequant                         CPU              module
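check_deploy_device also returns its result as a dict of name -> (deploy device, type), so the result is easy to post-process. The small sketch below counts ops per device and lists the ops that will run on the CPU. The dict literal was copied by hand from part of the table above rather than produced by the tool. It also assumes that the second tuple element holds the same value as the table's type column.

```python
def cpu_ops(deploy_result):
    """Names of ops that will run on the CPU, in the order of the result dict."""
    return [name for name, (device, _kind) in deploy_result.items() if device == "CPU"]


def device_counts(deploy_result):
    """Number of ops assigned to each device."""
    counts = {}
    for device, _kind in deploy_result.values():
        counts[device] = counts.get(device, 0) + 1
    return counts


# Part of the table above, written in the name -> (device, type) format of the return value.
result = {
    "quant": ("CPU", "module"),
    "conv0": ("BPU", "module"),
    "prelu_input_dequant": ("CPU", "module"),
    "prelu": ("CPU", "module"),
    "conv2": ("BPU", "module"),
    "add_1": ("BPU", "method"),
    "dequant": ("CPU", "module"),
}
print(device_counts(result))
print(cpu_ops(result))
```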
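7.4.5.13. Comparing torchscript and hbdk results¶
If the inference result of the quantized pt generated by horizon_plugin_pytorch differs from the inference result of the compiled hbm, you can use this tool to check whether the pt's inference result matches the result of hbdk parsing the pt. The tool compares the result of each op in the pt with the result of the corresponding op after hbdk parsing.
If the quantized pt inference result differs from the hbm result or the on-board result, first make sure that the pre- and post-processing are identical. Note also that hbdk parsing of the pt is only one step of compilation. The hbm inference result and the final on-board result are determined by hbdk, the runtime, and other components. Even if this tool confirms that the quantized pt's inference result matches hbdk's parsing result, the final on-board result may still differ. For verification of the later stages, contact the hbdk or runtime development team.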
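# from horizon_plugin_profiler import script_profile
def script_profile(
    model: Union[torch.nn.Module, torch.jit.ScriptModule],
    example_inputs: Any,
    out_dir: Optional[str] = None,
    march: Optional[str] = None,
    mark_node_func: Optional[Callable] = None,
    compare_with_hbdk_parser: bool = True,
):
    """Get the result of each op in the ScriptModel and compare it with the hbdk parsing result.

    This function obtains the result of each op in the ScriptModel, stores the results in
    the "horizon_script_result.pt" file with torch.save, and also returns them as a dict.

    Args:
        model: the model to check. Must be a quantized model or a traced ScriptModule
        example_inputs: model inputs
        out_dir: path for saving the results. If None, they are saved in the current
            path. Defaults to None
        march: BPU architecture to use. If None, get_march() is used to obtain the
            currently set architecture. Defaults to None.
        mark_node_func: function that marks which nodes in the ScriptModule should have
            their results saved. If None, the default marking function is used.
            Defaults to None.
        compare_with_hbdk_parser: whether to compare the result of each op in the
            ScriptModule with the hbdk parsing result. Defaults to True, in which case the
            results are compared with hbdk's parsing result and the comparison is printed
            to the screen.

    Returns:
        output(dict<str, tensor>): a dict recording the result of each op in the pt, in the format:
            * KEY (str): op name, identical to the name of the op after hbdk parsing
            * VALUE (tensor): op result
    """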
Usage example:
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_profiler import script_profile
class Net(nn.Module):
    def __init__(self, share_op=True):
        super(Net, self).__init__()
        self.quant_stubx = QuantStub()
        self.quant_stuby = QuantStub()
        self.unused = nn.ReLU()
        self.mul_op = FloatFunctional()
        self.cat_op = FloatFunctional()
        self.add_op = FloatFunctional()
        self.quantized_ops = nn.Sequential(
            nn.ReLU(),
            nn.Sigmoid(),
            nn.Softmax(),
            nn.SiLU(),
            horizon_nn.Interpolate(
                scale_factor=2, recompute_scale_factor=True
            ),
            horizon_nn.Interpolate(
                scale_factor=2.3, recompute_scale_factor=True
            ),
            nn.AvgPool2d(kernel_size=4),
            nn.Upsample(scale_factor=1.3, mode="bilinear"),
            nn.UpsamplingBilinear2d(scale_factor=0.7),
        )
        self.dequant_stub = DeQuantStub()
        self.share_op = share_op
    def forward(self, x, y):
        x = self.quant_stubx(x)
        y = self.quant_stuby(y)
        y = self.add_op.add(x, y)
        x = self.cat_op.cat((x, y), 1)
        if self.share_op:
            x = self.cat_op.cat((x, y), dim=1)
        a, b = x.split(15, dim=1)
        x = self.mul_op.mul(a, b)
        x = self.quantized_ops(x)
        x = self.dequant_stub(x)
        return x
set_march(March.BAYES)
device = torch.device("cpu")
data = torch.rand((1, 10, 5, 5), device=device)
data = (data, data)
float_net = Net().to(device)
float_net(*data)
qat_net = prepare_qat_fx(float_net, {"": default_qat_8bit_fake_quant_qconfig})
qat_net = qat_net.to(device)
qat_net(*data)
bpu_net = convert_fx(qat_net)
script_module = torch.jit.trace(bpu_net.eval(), data)
script_profile(bpu_net, data, march=March.BAYES)
The following comparison result is printed to the screen:
name                                        if equal
------------------------------------------  ----------
arg0                                        True
arg1                                        True
_hz_cat                                     True
_hz_cat_1                                   True
_aten_split.0                               True
_aten_split.1                               True
_hz_mul                                     True
_quantized_ops_0_aten_relu                  True
_quantized_ops_1_hz_lut                     True
_quantized_ops_2_aten_max_val               True
_quantized_ops_2_aten_max_arg               True
_quantized_ops_2_hz_sub                     True
_quantized_ops_2_exp_hz_segment_lut         True
_quantized_ops_2_hz_sum                     True
_quantized_ops_2_reciprocal_hz_segment_lut  True
_quantized_ops_2_hz_mul                     True
_quantized_ops_3_hz_lut                     True
_quantized_ops_4_hz_interpolate             True
_quantized_ops_5_hz_interpolate             True
_quantized_ops_6_hz_avg_pool2d              True
_quantized_ops_7_hz_interpolate             True
_quantized_ops_8_hz_interpolate             True
Torch run pt output is same with hbdk parser.
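The table above is an op-by-op equality check between two sets of named results. The minimal sketch below performs the same comparison on plain dicts of lists, which stand in for the dict<str, tensor> that script_profile returns. The op names and values are hypothetical. With real tensors, you would use torch.equal instead of ==.

```python
def compare_results(expected, actual):
    """Return (rows, all_equal), where rows are (op name, whether the outputs are equal)."""
    rows = [(name, name in actual and expected[name] == actual[name]) for name in expected]
    return rows, all(equal for _, equal in rows)


def format_table(rows):
    """Render the rows in the same two-column layout as the table above."""
    width = max(len(name) for name, _ in rows)
    lines = [f"{'name':<{width}}  if equal", f"{'-' * width}  ----------"]
    lines += [f"{name:<{width}}  {equal}" for name, equal in rows]
    return "\n".join(lines)


torch_run = {"arg0": [1, 2], "_hz_cat": [3, 4], "_hz_mul": [5, 6]}
hbdk_parse = {"arg0": [1, 2], "_hz_cat": [3, 4], "_hz_mul": [5, 7]}

rows, all_equal = compare_results(torch_run, hbdk_parse)
print(format_table(rows))
print("all equal:", all_equal)
```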
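7.4.5.14. Comparing the results of torchscript models from different versions¶
If the quantized pt inference results of the same model change after a horizon_plugin_pytorch version upgrade, first confirm that the pre- and post-processing are identical across versions. Then use this tool to compare the result of each op in the pts generated by the different versions.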
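A mismatch propagates to every op downstream, so the most useful result of such a comparison is usually the first op, in execution order, whose output differs. The minimal sketch below works under that assumption. It compares two ordered lists of (op name, output) pairs with an absolute tolerance. The names and values are hypothetical, and the report format of compare_script_models may differ.

```python
def first_divergence(results_a, results_b, atol=0.0):
    """Return the first op name whose outputs differ by more than atol, or None.

    results_a / results_b: lists of (op name, list of numbers) in execution order.
    """
    for (name_a, out_a), (name_b, out_b) in zip(results_a, results_b):
        if name_a != name_b:
            return name_a  # the two graphs differ structurally from this op on
        if len(out_a) != len(out_b) or any(abs(x - y) > atol for x, y in zip(out_a, out_b)):
            return name_a
    if len(results_a) != len(results_b):
        # One model has extra ops: report the first op that has no counterpart.
        shorter = min(len(results_a), len(results_b))
        longer = results_a if len(results_a) > len(results_b) else results_b
        return longer[shorter][0]
    return None


old_version = [("_hz_cat", [3, 4]), ("_hz_mul", [5, 6]), ("_quantized_ops_0_aten_relu", [5, 6])]
new_version = [("_hz_cat", [3, 4]), ("_hz_mul", [5, 7]), ("_quantized_ops_0_aten_relu", [5, 7])]
print(first_divergence(old_version, new_version))
```

Quantized outputs are integer-valued, so the default atol of 0 asks for exact equality. A looser tolerance only makes sense for dequantized float outputs.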
# from horizon_plugin_profiler import compare_script_models
def compare_script_models(
    model1: torch.jit.ScriptModule,
    model2: torch.jit.ScriptModule,
    example_inputs: Any,
    march: Optional[str] = None,
):
    """Compare the results of two ScriptModules.

    Checks whether every op produces the same result in the ScriptModules
    that different versions of horizon_plugin_pytorch generate from the
    same model.

    Args:
        model1: ScriptModule generated with one version of horizon_plugin_pytorch
        model2: ScriptModule generated with another version of horizon_plugin_pytorch
        example_inputs: model inputs
        march: BPU architecture to use. If None, get_march() is called
            automatically to get the currently specified architecture.
            Defaults to None.
    """
Usage example:
import torch
from torch import nn
from torch.quantization import DeQuantStub, QuantStub
from horizon_plugin_pytorch import nn as horizon_nn
from horizon_plugin_pytorch.march import March, set_march
from horizon_plugin_pytorch.nn.quantized import FloatFunctional
from horizon_plugin_pytorch.quantization.quantize_fx import (
    convert_fx,
    prepare_qat_fx,
)
from horizon_plugin_pytorch.quantization.qconfig import (
    default_qat_8bit_fake_quant_qconfig,
)
from horizon_plugin_profiler import compare_script_models
class Net(nn.Module):
    def __init__(self, share_op=True):
        super(Net, self).__init__()
        self.quant_stubx = QuantStub()
        self.quant_stuby = QuantStub()
        self.unused = nn.ReLU()
        self.mul_op = FloatFunctional()
        self.cat_op = FloatFunctional()
        self.add_op = FloatFunctional()
        self.quantized_ops = nn.Sequential(
            nn.ReLU(),
            nn.Sigmoid(),
            nn.Softmax(),
            nn.SiLU(),
            horizon_nn.Interpolate(
                scale_factor=2, recompute_scale_factor=True
            ),
            horizon_nn.Interpolate(
                scale_factor=2.3, recompute_scale_factor=True
            ),
            nn.AvgPool2d(kernel_size=4),
            nn.Upsample(scale_factor=1.3, mode="bilinear"),
            nn.UpsamplingBilinear2d(scale_factor=0.7),
        )
        self.dequant_stub = DeQuantStub()
        self.share_op = share_op
    def forward(self, x, y):
        x = self.quant_stubx(x)
        y = self.quant_stuby(y)
        y = self.add_op.add(x, y)
        x = self.cat_op.cat((x, y), 1)
        if self.share_op:
            x = self.cat_op.cat((x, y), dim=1)
        a, b = x.split(15, dim=1)
        x = self.mul_op.mul(a, b)
        x = self.quantized_ops(x)
        x = self.dequant_stub(x)
        return x
set_march(March.BAYES)
device = torch.device("cpu")
data = torch.rand((1, 10, 5, 5), device=device)
data = (data, data)
float_net = Net().to(device)
float_net(*data)
qat_net = prepare_qat_fx(float_net, {"": default_qat_8bit_fake_quant_qconfig})
qat_net = qat_net.to(device)
qat_net(*data)
bpu_net = convert_fx(qat_net)
script_module = torch.jit.trace(bpu_net.eval(), data)
# In practice, pass two ScriptModules generated with different plugin versions
compare_script_models(script_module, script_module, data)
The following result is printed to the screen:
name                                        if equal
------------------------------------------  ----------
arg0                                        True
arg1                                        True
_hz_add                                     True
_hz_cat                                     True
_hz_cat_1                                   True
_aten_split.0                               True
_aten_split.1                               True
_hz_mul                                     True
_quantized_ops_0_aten_relu                  True
_quantized_ops_1_hz_lut                     True
_quantized_ops_2_aten_max_arg               True
_quantized_ops_2_aten_max_val               True
_quantized_ops_2_hz_sub                     True
_quantized_ops_2_exp_hz_segment_lut         True
_quantized_ops_2_hz_sum                     True
_quantized_ops_2_reciprocal_hz_segment_lut  True
_quantized_ops_2_hz_mul                     True
_quantized_ops_3_hz_lut                     True
_quantized_ops_4_hz_interpolate             True
_quantized_ops_5_hz_interpolate             True
_quantized_ops_6_hz_avg_pool2d              True
_quantized_ops_7_hz_interpolate             True
_quantized_ops_8_hz_interpolate             True
All ops in two ScriptModules are same.
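7.4.5.15. Model GPU memory usage analysis tool
The plugin provides a tool that analyzes a model's GPU memory usage. Use it to locate memory bottlenecks and decide where techniques such as checkpoint and saved tensor will actually save memory.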
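You can sanity-check the figures such a tool reports by hand. A dense tensor occupies its number of elements times its element size. During training, the activations saved for backward add up across layers. The stdlib-only sketch below shows this arithmetic. The helper names and the dtype table are illustrative and not part of the plugin. Real usage is higher, because gradients, framework workspace and the caching allocator also consume memory.

```python
import math
from typing import Iterable, Sequence

# Bytes per element for a few common dtypes
DTYPE_BYTES = {"float32": 4, "float16": 2, "int8": 1, "int32": 4}


def tensor_bytes(shape: Sequence[int], dtype: str = "float32") -> int:
    """Memory of one dense tensor: number of elements times element size."""
    return math.prod(shape) * DTYPE_BYTES[dtype]


def saved_activation_bytes(
    shapes: Iterable[Sequence[int]], dtype: str = "float32"
) -> int:
    """Rough estimate of memory held for backward: sum over saved tensors."""
    return sum(tensor_bytes(s, dtype) for s in shapes)


# The (1, 10, 5, 5) float32 input used in the examples on this page
print(tensor_bytes((1, 10, 5, 5)))  # 1000
# A batch-32, 64-channel, 112x112 float32 feature map, in MiB
print(tensor_bytes((32, 64, 112, 112)) / 2**20)  # 98.0
```

The second line shows why a few large, high-resolution feature maps kept for backward often dominate memory. These tensors are the usual targets for checkpointing.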
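# from horizon_plugin_profiler import show_cuda_memory_consumption
def show_cuda_memory_consumption(
    model: torch.nn.Module,
    example_inputs: Any,
    device: torch.device,
    check_leaf_module=None,
    out_dir: Optional[str] = None,
    file_name: Optional[str] = None,
    custom_backward=None,
):
    """Evaluate the GPU memory usage of the model during forward and backward.

    The result is saved as an html file.

    Known issue: when the model uses checkpoint, some backward entries are
    labeled as forward, because checkpoint causes forward hooks to be
    called during backward.

    Args:
        model: the model to evaluate
        example_inputs (Any[Tensor]): model inputs
        device: the device used for the evaluation
        check_leaf_module: checks whether a module is a leaf node. Defaults
            to None, in which case the predefined is_leaf_module is used,
            which treats all ops defined in horizon_plugin_pytorch and all
            unsupported float ops as leaf nodes.
        out_dir: directory in which the html result is saved. Defaults to
            None, meaning the current directory.
        file_name: name of the saved html file. Defaults to mem_info if not
            specified.
        custom_backward: runs backward using the model outputs and must set
            retain_graph=False. Defaults to None, in which case the model
            output must be a single Tensor.
    """
Usage example: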
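import torch
from horizon_plugin_profiler import show_cuda_memory_consumption
# The definition of MobilenetV1 is omitted here
float_net = MobilenetV1()
data = torch.rand((1, 3, 224, 224))  # example input; shape assumed for illustration
show_cuda_memory_consumption(float_net, data, torch.device("cuda"))
The following result is generated in the current directory, or in the directory specified by the out_dir parameter.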
mem_info.html