As you can see, all the directories are still empty even after running the command above. The videos in raw_data/throw are fine, but they are not being preprocessed.
Thank you
I copied the code from GitHub into “preprocess_HMDB_RGB.sh” and ran it in the terminal, but it still doesn’t split the videos into frames; it gives the same output as the notebook.
I checked that the videos in raw_data/throw are not corrupted.
What is the error? I set the config to run 18 epochs, but it seems to fail after finishing the 17th epoch.
Can I check: if I create multiple classes such as ‘throw’ and ‘snatch’, where ‘throw’ has 100 clips split 70/30 and ‘snatch’ has 50 clips split 35/15, can I train them together in one run of the notebook, or must I run one job for ‘throw’ and one for ‘snatch’, which would produce two exported rgb_resnet18_3.etlt files?
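For reference, the per-class 70/30 split described above can be sketched as follows. This is my own minimal illustration, not code from the TAO notebook; the clip filenames and the `split_class` helper are assumptions.

```python
import random

def split_class(clips, train_frac=0.7, seed=0):
    """Shuffle one class's clip list and split it into train/test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    clips = list(clips)
    rng.shuffle(clips)
    n_train = round(len(clips) * train_frac)
    return clips[:n_train], clips[n_train:]

# 100 'throw' clips -> 70/30, 50 'snatch' clips -> 35/15
throw_train, throw_test = split_class([f"throw_{i:03d}.avi" for i in range(100)])
snatch_train, snatch_test = split_class([f"snatch_{i:03d}.avi" for i in range(50)])
```

Both classes can then live side by side in the train/test directories of a single dataset, so one training run covers both labels.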
Facing an error when trying to export the RGB model
# Export the RGB model to encrypted ONNX model
!tao action_recognition export \
-e $SPECS_DIR/export_rgb.yaml \
-k $KEY \
model=$RESULTS_DIR/rgb_3d_ptm/rgb_only_model.tlt \
output_file=$RESULTS_DIR/export/rgb_resnet18_3.etlt
Error:
2023-05-29 09:39:15,346 [INFO] root: Registry: ['nvcr.io']
2023-05-29 09:39:15,400 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
[NeMo W 2023-05-29 09:39:28 nemo_logging:349] <frozen cv.action_recognition.scripts.export>:155: UserWarning:
'export_rgb.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Created a temporary directory at /tmp/tmp3sbl48t1
Writing /tmp/tmp3sbl48t1/_remote_module_non_scriptable.py
ResNet3d(
(conv1): Conv3d(3, 64, kernel_size=(5, 7, 7), stride=(2, 2, 2), padding=(2, 3, 3), bias=False)
(bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool3d(kernel_size=(1, 3, 3), stride=2, padding=(0, 1, 1), dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock3d(
(conv1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BasicBlock3d(
Telemetry data couldn't be sent, but the command ran successfully.
[Error]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
2023-05-29 09:39:37,728 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
The telemetry error can be ignored. Please check whether rgb_resnet18_3.etlt was generated. If yes, the export was successful.
Here is the full log:
2023-05-31 04:36:45,995 [INFO] root: Registry: ['nvcr.io']
2023-05-31 04:36:46,048 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
[NeMo W 2023-05-31 04:36:59 nemo_logging:349] <frozen cv.action_recognition.scripts.export>:155: UserWarning:
'export_rgb.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
Created a temporary directory at /tmp/tmp87lljxfr
Writing /tmp/tmp87lljxfr/_remote_module_non_scriptable.py
ResNet3d(
(conv1): Conv3d(3, 64, kernel_size=(5, 7, 7), stride=(2, 2, 2), padding=(2, 3, 3), bias=False)
(bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool3d(kernel_size=(1, 3, 3), stride=2, padding=(0, 1, 1), dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock3d(
(conv1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BasicBlock3d(
(conv1): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layer2): Sequential(
(0): BasicBlock3d(
(conv1): Conv3d(64, 128, kernel_size=(3, 3, 3), stride=(1, 2, 2), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv3d(64, 128, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BasicBlock3d(
(conv1): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(128, 128, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layer3): Sequential(
(0): BasicBlock3d(
(conv1): Conv3d(128, 256, kernel_size=(3, 3, 3), stride=(1, 2, 2), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv3d(128, 256, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BasicBlock3d(
(conv1): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(256, 256, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layer4): Sequential(
(0): BasicBlock3d(
(conv1): Conv3d(256, 512, kernel_size=(3, 3, 3), stride=(1, 2, 2), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(downsample): Sequential(
(0): Conv3d(256, 512, kernel_size=(1, 1, 1), stride=(1, 2, 2), bias=False)
(1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): BasicBlock3d(
(conv1): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn1): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(conv2): Conv3d(512, 512, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
(bn2): BatchNorm3d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(avg_pool): AdaptiveAvgPool3d(output_size=(1, 1, 1))
(fc_cls): Linear(in_features=512, out_features=2, bias=True)
Error executing job with overrides: ['encryption_key=nvidia_tao', 'model=/results/rgb_3d_ptm/rgb_only_model.tlt', 'output_file=/results/export/rgb_resnet18_3.etlt']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 252, in run_and_report
assert mdl is not None
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "</opt/conda/lib/python3.8/site-packages/nvidia_tao_pytorch/cv/action_recognition/scripts/export.py>", line 3, in <module>
File "<frozen cv.action_recognition.scripts.export>", line 155, in <module>
File "/opt/NeMo/nemo/core/config/hydra_runner.py", line 104, in wrapper
_run_hydra(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
run_and_report(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 294, in run_and_report
raise ex
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
lambda: hydra.run(
File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
_ = ret.return_value
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "<frozen cv.action_recognition.scripts.export>", line 33, in main
File "<frozen cv.action_recognition.scripts.export>", line 73, in run_export
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 161, in load_from_checkpoint
model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 209, in _load_model_state
keys = model.load_state_dict(checkpoint["state_dict"], strict=strict)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1660, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ActionRecognitionModel:
size mismatch for model.fc_cls.weight: copying a param with shape torch.Size([9, 512]) from checkpoint, the shape in current model is torch.Size([2, 512]).
size mismatch for model.fc_cls.bias: copying a param with shape torch.Size([9]) from checkpoint, the shape in current model is torch.Size([2]).
Telemetry data couldn't be sent, but the command ran successfully.
[Error]: <urlopen error [Errno -2] Name or service not known>
Execution status: FAIL
2023-05-31 04:37:08,536 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
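Reading the traceback, the actual failure is the `size mismatch for model.fc_cls.weight` lines: the checkpoint being loaded was trained with a 9-class head, while the export run built a 2-class model, so `load_state_dict` refuses the weights. A toy sketch of that comparison (plain tuples stand in for tensor shapes, so no PyTorch or TAO is needed; the shapes are copied from the traceback above):

```python
# Head shapes from the traceback: checkpoint vs. freshly built export model.
ckpt_head = {"model.fc_cls.weight": (9, 512), "model.fc_cls.bias": (9,)}
model_head = {"model.fc_cls.weight": (2, 512), "model.fc_cls.bias": (2,)}

# load_state_dict fails on every key whose shape differs.
mismatched = [k for k in ckpt_head if ckpt_head[k] != model_head[k]]

n_ckpt_classes = ckpt_head["model.fc_cls.weight"][0]   # 9 classes at training time
n_model_classes = model_head["model.fc_cls.weight"][0]  # 2 classes in the export run
print(mismatched, n_ckpt_classes, n_model_classes)
```

Inferring from these shapes (not from the TAO docs), the export spec's class list needs to describe the same number of classes the checkpoint was trained with, i.e. 9 here rather than 2.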