I am interested to know how fast some of my models run on the CPUs of a Pixel 3 phone. I am a moderately experienced pytorch programmer and linux user, but I have zero experience with android. I am not looking to build an app right now; I just want to know how fast my model runs on this particular phone.

The TensorFlow repo has this barebones android test thingy for timing the latency of a neural net of your choice on an android phone in TensorFlow: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/benchmark/android/README.md

Has anyone made anything similar for pytorch?

We have a binary to do this that can run on your android phone using adb.

To build,

./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')

To run the binary, push it to the device using adb and run the following command:
./speed_benchmark_torch --model=model.pt --input_dims="1,3,224,224" --input_type=float --warmup=10 --iter 10 --report_pep true

It should be possible; we already output the total network latency in a format that FAI-PEP accepts.
The flow should be similar to the existing caffe2 mobile flow, but using the speed_benchmark_torch binary instead.

Thanks! Is there a certain NDK version that is preferred? I know in TensorFlow, they like using old NDK versions for some reason.

Also, do we need the Android SDK to be visible to PyTorch anywhere?

The instructions for speed_benchmark_torch worked for me on the first try!

If anyone else wants to try this on a Pixel 3 android phone, here is the setup that worked for me:

# in bash shell
cd pytorch  # where I have my `git clone` of pytorch
export ANDROID_ABI=arm64-v8a
export ANDROID_NDK=/path/to/Android/Sdk/ndk/21.0.6113669/
./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
# speed_benchmark_torch appears in pytorch/build_android/install/bin/speed_benchmark_torch

Next, I followed these instructions to export a resnet18 torchscript model:

# in python
import torch
import torchvision
model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
traced_script_module.save("resnet18.pt")
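
Optionally, before pushing the file to the phone, it's worth a quick sanity check that the traced module matches the eager model on the example input. A minimal check, continuing from the snippet above (the 1e-4 tolerance is just an arbitrary choice):

# in python
with torch.no_grad():
    eager_out = model(example)
    traced_out = traced_script_module(example)
max_diff = (eager_out - traced_out).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
assert max_diff < 1e-4, "traced model diverges from eager model"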

Then, I put the files onto the android device

# in bash shell on linux host computer that's plugged into Pixel 3 phone
adb shell mkdir /data/local/tmp/pt
adb push build_android/install/bin/speed_benchmark_torch /data/local/tmp/pt
adb push resnet18.pt /data/local/tmp/pt

And finally, I run it on the android device:

# in bash shell on linux host computer that's plugged into Pixel 3 phone
adb shell  /data/local/tmp/pt/speed_benchmark_torch \
--model  /data/local/tmp/pt/resnet18.pt --input_dims="1,3,224,224" \
--input_type=float --warmup=5 --iter 20

It prints:

Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 188.382. Iters per second: 5.30836

Pretty good! I believe resnet18 is about 4 gflop (that is, 2 gmac) per frame, so (4 gflop) / (188 ms) ≈ 21 gflop/s. Not bad for ARM CPUs! (At least I assume it’s executing on the ARM CPUs and not any GPUs or other accelerators.)
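
For anyone who wants to redo that back-of-the-envelope arithmetic, here is a tiny sketch; note the 4-gflop-per-frame figure is my rough estimate, not something the benchmark reports:

# in python
flops_per_frame = 4e9      # rough estimate for resnet18 at 224x224 (assumption)
ms_per_iter = 188.382      # "Milliseconds per iter" reported by speed_benchmark_torch
gflops_per_sec = flops_per_frame / (ms_per_iter / 1000) / 1e9
print(f"{gflops_per_sec:.1f} gflop/s")  # ~21.2 gflop/s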

Also, this whole process took me about 25 minutes, and everything worked on the first try. I use pytorch day-to-day, but I have very little experience with android, and this was also my first time using torchscript, so I’m surprised and impressed that it was so straightforward.

This thread is very useful and I’m trying to get this working. I can’t get past the build_android.sh step without hitting a bunch of errors. You can view my CMakeError.log here. Does anyone know what’s going on? Alternatively, if someone could link me their speed_benchmark_torch executable, that might also work.

@solvingPuzzles I’m on Ubuntu 18.04. I actually just got it working. I had to run
git submodule update --init --recursive
within the pytorch clone, and then run the

./scripts/build_android.sh \
-DBUILD_BINARY=ON \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')

command with sudo -E. I also had to run the python commands with sudo so it could actually write the .pt file.

If anyone else runs into this error

abort_message: assertion "terminating with uncaught exception of type c10::Error: PytorchStreamReader failed locating file bytecode.pkl: file not found ()

just save the model using _save_for_lite_interpreter like this:
traced_script_module._save_for_lite_interpreter("resnet18.pt")
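
For reference, a self-contained version of that export looks roughly like this; the optimize_for_mobile pass is optional and just my assumption of a typical mobile export, not something the fix above strictly requires:

# in python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torchvision.models.resnet18(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
# optional: apply mobile-oriented graph optimizations before saving
optimized = optimize_for_mobile(traced_script_module)
# writes the bytecode format the lite interpreter expects,
# which avoids the bytecode.pkl error quoted above
optimized._save_for_lite_interpreter("resnet18.pt")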

It worked for me ^^

Hi, I am trying to run a quantized model trained via QAT with the qconfig set to qnnpack, but this seems to give an error.

terminating with uncaught exception of type c10::NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::linear' is only available for these backends: [QuantizedCPU, BackendSelect, Functionalize, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta].

But if I run a quantized model without the qnnpack setting, it runs fine. How is this happening, given that mobile chips are arm64, which should require the qnnpack configuration?
p.s.: using a Pixel 7 Pro device with a Tensor G2 processor

This error means that the input tensor to the op is somehow coming from unquantized ops. You will need to provide a repro.
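
For comparison, a minimal QAT-to-mobile flow that keeps the inputs to the quantized ops quantized looks roughly like this; the module and shapes are made up for illustration, and the key parts are the QuantStub/DeQuantStub pair plus the qnnpack qconfig and engine:

# in python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # QuantStub/DeQuantStub mark where tensors enter/leave the quantized domain,
        # so quantized::linear never sees a plain float CPU tensor
        self.quant = torch.ao.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)   # float -> quantized
        x = self.fc(x)      # becomes quantized::linear after convert()
        return self.dequant(x)

torch.backends.quantized.engine = "qnnpack"  # use the qnnpack kernels (arm64)
model = TinyNet()
model.train()
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("qnnpack")
prepared = torch.ao.quantization.prepare_qat(model)

# ... QAT fine-tuning loop would go here ...

prepared.eval()
quantized = torch.ao.quantization.convert(prepared)

example = torch.rand(1, 16)
traced = torch.jit.trace(quantized, example)
traced._save_for_lite_interpreter("tiny_qat_qnnpack.pt")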

cc @jerryzh168