
We are using OpenVINO 2021 and 2022 to examine the inference response, checking the output features of the inference results. The results show that the output features differ between the two versions.

In OpenVINO 2022, the ONNX model seems to be optimized(?) during the inference process inside the OpenVINO library. Does this mean that the operation layers of the ONNX model are fused inside the OpenVINO library, as if the conversion from the ONNX model to the IR model had been done with mo.py?

This is a simple question, thank you in advance.

Thank you for reaching out to us.

We'll get back to you with thorough details as soon as possible.

We appreciate your patience.

Cordially,

Generally, OpenVINO can read ONNX models directly, and the optimization is done by OpenVINO runtime. But this was already possible in OpenVINO 2021, and mo.py is still available in 2022 (with pip install openvino-dev you get an MO executable).

Model Optimizer now uses the ONNX Frontend, so you get the same graph optimizations when you load an ONNX model directly, or when you use MO to convert to IR and then load the model.
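
A minimal sketch of the two loading paths with the 2022.x Python API (the file names and input shape below are placeholders, not the models from this thread):

    import numpy as np
    from openvino.runtime import Core  # OpenVINO 2022.x Python API

    core = Core()

    # Path A: read the ONNX model directly; the ONNX Frontend applies the same
    # graph optimizations that Model Optimizer would apply.
    compiled_onnx = core.compile_model(core.read_model("model.onnx"), "CPU")

    # Path B: read the IR produced by `mo --input_model model.onnx`.
    compiled_ir = core.compile_model(core.read_model("model.xml"), "CPU")

    # With the same input, both paths should produce (near-)identical outputs.
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input shape
    out_onnx = next(iter(compiled_onnx.infer_new_request({0: x}).values()))
    out_ir = next(iter(compiled_ir.infer_new_request({0: x}).values()))
    print("max abs diff:", np.max(np.abs(out_onnx - out_ir)))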

Actually, it is not expected that the output of ONNX models is different between 2021 and 2022.


It will be helpful if you could provide:


  1. Models used or custom models (share if possible)
  2. OpenVINO sample application used, or the custom inferencing code if applicable (share if possible)
  3. Methods that you use to evaluate


Cordially,

Iffa



>Actually, it is not expected that the output of ONNX models is different between 2021 and 2022.

Thanks for the kind comment; it is already helpful.

I'd like to ask another question relating to the above comment.

Can the outputs of both versions be completely the same?

Or could they be slightly different due to, for instance, the data type (such as FP16 or FP32) or rounding error?

Best regards,

There are a few factors that influence inferencing performance (and the result is closely related to performance); as you mentioned, precision (FP32/FP16, etc.) is indeed one of them.

There are 4 key elements to measure a successful deep learning inference:

  • Throughput
  • Latency
  • Value
  • Efficiency

    You may refer to this documentation, since it has a thorough explanation of those four elements.

    If you were comparing FP32 with FP16 precision, some differences in results are expected, especially in FPS and accuracy: FP16 is expected to be slightly less accurate than FP32, since the weight size is halved. However, the FP16 model should infer faster than the FP32 one.

    The slight difference in accuracy may be the reason you were experiencing different results.
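
    To quantify that kind of precision-related difference, one option is to run the same input through an FP32 IR and an FP16 IR and compare the raw outputs within a tolerance; a minimal sketch with the 2022.x Python API (IR file names and input shape are placeholders):

        import numpy as np
        from openvino.runtime import Core

        core = Core()
        x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input shape

        outputs = {}
        for tag, xml in [("fp32", "model_fp32.xml"), ("fp16", "model_fp16.xml")]:  # placeholder IR paths
            compiled = core.compile_model(core.read_model(xml), "CPU")
            outputs[tag] = next(iter(compiled.infer_new_request({0: x}).values()))

        # FP16 weights halve the model size and usually infer faster, at the cost
        # of small numerical deviations from the FP32 result.
        print("max abs diff:", np.max(np.abs(outputs["fp32"] - outputs["fp16"])))
        print("close within 1e-2:", np.allclose(outputs["fp32"], outputs["fp16"], atol=1e-2))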

    Cordially,

    Following your kind offer below:

    Actually, it is not expected that the output of ONNX models is different between 2021 and 2022.

    It will be helpful if you could provide:

  • Models used or custom models (share if possible)
  • OpenVINO sample application used, or the custom inferencing code if applicable (share if possible)
  • Methods that you use to evaluate

    Different responses (different outputs) between OpenVINO 2021 and 2022 seem to appear when I use QuantizeLinear/DequantizeLinear layers (though the reason is not clear; it does not appear when using the FakeQuantize layer).

    To reproduce the problem, I posted simple INT8 Alex models, which are quantized via NNCF (FakeQuantize layer) and ONNX Runtime (QuantizeLinear/DequantizeLinear layers). The models are cut at the beginning so that we can focus on the difference in calculation, not the structure. You can see the structure with Netron.

    I suspect there is some bug in the calculation with QuantizeLinear/DequantizeLinear layers.
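
    One way to narrow this down is to compare OpenVINO's output for the QuantizeLinear/DequantizeLinear model against ONNX Runtime on the same input, in each environment; a sketch (the input shape below is a placeholder):

        import numpy as np
        import onnxruntime as ort
        from openvino.runtime import Core

        onnx_path = "model.test-int8-Alex.onnx-rt-int8.cut.onnx"  # the QDQ model from this thread
        x = np.random.rand(1, 3, 224, 224).astype(np.float32)     # placeholder input shape

        # Reference result from ONNX Runtime.
        sess = ort.InferenceSession(onnx_path)
        ort_out = sess.run(None, {sess.get_inputs()[0].name: x})[0]

        # OpenVINO result (2022.x API shown; the 2021.4 Python API uses IECore instead).
        core = Core()
        compiled = core.compile_model(core.read_model(onnx_path), "CPU")
        ov_out = next(iter(compiled.infer_new_request({0: x}).values()))

        print("max abs diff vs ONNX Runtime:", np.max(np.abs(ort_out - ov_out)))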

    Thanks in advance.

    Thanks for sharing the model.

    I need your help to provide information on Question 2 as it would help me reproduce your result:

    2. OpenVINO sample application used, or the custom inferencing code if applicable (share if possible)

    For example, the Hello Classification Python Sample.

    Cordially,

    Thanks for the reply.

    It's possible to provide sample scripts and figures. Please wait a bit.

    I would actually prefer to send my data via email rather than post it directly.

    Is it possible? If not, I'll post it here.

    I did a few tests on the models that you shared through email.

    I'm using 2 environments, OpenVINO 2022.1 and 2021.4.

    These are the results within 2022.1 environment:

    1. Benchmark_app on original ONNX model

    2. Benchmark_app on model labeled 2022.1

    3. Benchmark_app on model labeled 2021.4

    4. Test model labeled 2022.1 on the official OpenVINO sample app (hello_classification.py)

    5. Test model labeled 2021.4 on the official OpenVINO sample app (hello_classification.py)

    6. Test original ONNX model on the official OpenVINO sample app (hello_classification.py)

    7. Model labeled 2022.1 on customer's code

    8. Model labeled 2021.4 on customer's code
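
    For runs like 4-6 above, a small helper along these lines can save the raw output features instead of only the top predictions, so the runs can be compared numerically (a sketch using the 2022.x API; the fixed input and file names are placeholders, and a 2021.4 equivalent would use the IECore API instead):

        import sys
        import numpy as np
        from openvino.runtime import Core

        # Usage: python dump_output.py <model.onnx or model.xml> <output.npy>
        model_path, out_path = sys.argv[1], sys.argv[2]

        core = Core()
        compiled = core.compile_model(core.read_model(model_path), "CPU")
        x = np.ones((1, 3, 224, 224), dtype=np.float32)  # fixed placeholder input
        result = compiled.infer_new_request({0: x})
        np.save(out_path, next(iter(result.values())))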


    My finding is that both the 2022.1 and 2021.4 models produce the same result if they are run within the same OpenVINO environment. However, if we compare the results across the two different OpenVINO environments, they are different:

    1. Both models labeled as 2022.1 and 2021.4 in OpenVINO 2022.1 environment:

    2. Both models labeled as 2022.1 and 2021.4 in OpenVINO 2021.4 environment:

    We will further investigate this for a definite answer on the differences.
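
    One way to make the comparison above explicit is to save the raw output tensor from each environment (for example with a helper like the one sketched earlier) and diff the files offline; the file names here are placeholders:

        import numpy as np

        out_2021 = np.load("output_2021.4.npy")  # placeholder file names
        out_2022 = np.load("output_2022.1.npy")

        print("identical:        ", np.array_equal(out_2021, out_2022))
        print("max abs diff:     ", np.max(np.abs(out_2021 - out_2022)))
        print("close within 1e-5:", np.allclose(out_2021, out_2022, atol=1e-5))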

    Cordially,

    Thank you for your help.

    Just for confirmation (to avoid miscommunication):

    I have again listed the models below, and the issue I encountered.

    model.test-int8-Alex.onnx-rt-int8.cut.onnx
    => Quant/DeQuant Linear layers are used

    model.test-int8-Alex.ovino-nncrf-int8.onnx
    (=> I can't upload the above original model, which was used to generate the IR models below)

    model.test-int8-Alex.ovino-nncrf-int8.ovino21.4_mo.cut.bin
    model.test-int8-Alex.ovino-nncrf-int8.ovino21.4_mo.cut.mapping
    model.test-int8-Alex.ovino-nncrf-int8.ovino21.4_mo.cut.xml
    model.test-int8-Alex.ovino-nncrf-int8.ovino22.1_mo.cut.bin
    model.test-int8-Alex.ovino-nncrf-int8.ovino22.1_mo.cut.mapping
    model.test-int8-Alex.ovino-nncrf-int8.ovino22.1_mo.cut.xml
    => FakeQuant layer is used.

    The labels "ovino21.4_mo" and "ovino22.1_mo" indicate which mo.py (or mo.exe) was used to convert the model to IR.

    In other words, I thought I had to use the mo of OpenVINO 21.4 to convert the ONNX model to IR in order to run it in the OpenVINO 21.4 framework, and similarly the mo of OpenVINO 22.1 to convert it for the OpenVINO 22.1 framework.

    That's why I labeled them "ovino21.4_mo" and "ovino22.1_mo".
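
    For reference, each label corresponds to running that environment's converter on the original ONNX model; a sketch of the 21.4-style invocation (the output directory is a placeholder, and the 22.1 label means the same conversion using that environment's mo entry point instead of mo.py):

        import subprocess

        # Run inside the OpenVINO 2021.4 environment, from the model_optimizer directory.
        subprocess.run([
            "python", "mo.py",
            "--input_model", "model.test-int8-Alex.ovino-nncrf-int8.onnx",
            "--output_dir", "ir_ovino21.4_mo",  # placeholder output directory
        ], check=True)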

    The issue appears when using Quant/DeQuant Linear layers across different OpenVINO versions: different output features are produced depending on which OpenVINO version is used.

    The above issue does not appear when using the FakeQuant layer.

    Best regards

    Our finding is that the differences you see are because the accuracy within the newer version is improved, and this is expected.

    This can be seen in the results that I attached before: the best performance is 4.73 FPS, which uses the converted IR (shared by you) and runs within the 2022.1 environment. It is best to use the newer version, since it has upgrades that the previous version doesn't, and to be consistent in processing (converting, etc.) the model within one environment.

    Cordially,

    So this means:

    Under the OpenVINO 2022.1 environment:
    the model quantized with onnx-runtime (Quant/DeQuant Linear layers)
    the model converted to IR via mo.py (21.4 ver.) (FakeQuant layer)
    the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer)

    the output features from each model above are the same.

    Under the OpenVINO 2021.4 environment:

    the model quantized with onnx-runtime (Quant/DeQuant Linear layers)
    the model converted to IR via mo.py (21.4 ver.) (FakeQuant layer)
    the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer)

    the output features from each model above are the same (but performance is a bit worse compared to the 22.1 environment).

    Is this what you mean?

    Accuracy influences your inferencing results; for example, an older version can detect apples, while the newer version (with better accuracy and performance) detects apples with their colours.


    Definitely, you would see differences in the results obtained, whether in numerical or graphical representation.


    Cordially,

    Iffa


    But, umm, I still don't understand well...

    A model having the FakeQuant layer provides the same output features/values in both OpenVINO versions 21.4 and 22.1, and
    a model having Quant/DeQuant Linear layers provides different output features/values between OpenVINO versions 21.4 and 22.1.

    The above is my problem.

    If I follow your comment, a model having the FakeQuant layer should also provide different features/values between the two versions.

    If the situation were as below, I could understand, but it's not:

    both models provide the "same" output features/values in both OpenVINO versions, or

    both models provide "different" output features/values in both OpenVINO versions.

    Maybe I should check some different parts/aspects...


    I'd like to confirm your observation again: is the below what you observed?

    Under the OpenVINO 2022.1 environment:
    the model quantized with onnx-runtime (Quant/DeQuant Linear layers)
    the model converted to IR via mo.py (21.4 ver.) (FakeQuant layer)
    the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer)
    the output features from each model above are the same.

    Best regards

    Only models that were converted into IR (Intermediate Representation) format produce the same result within one OpenVINO version.

    That result was tested using the official OpenVINO sample app (hello_classification.py); I hope you have carefully observed the results that I attached before.


    Result 1:

    The model converted to IR via mo.py (21.4 ver.) (FakeQuant layer) and the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer), inferred in OpenVINO 2022.1, produce the same result.


    Result 2:

    The model converted to IR via mo.py (21.4 ver.) (FakeQuant layer) and the model converted to IR via mo.py (22.1 ver.) (FakeQuant layer), inferred in OpenVINO 2021.4, produce the same result.



    If you compare results 1 and 2 (both models inferred in a different OpenVINO runtime version), they have different results. Referring back to your issue, the differences are expected due to improvements that were made within the newer version of OpenVINO.



    I hope this clarifies your questions


    Cordially,

    Iffa


