Getting "view size is not compatible with input tensor's size and stride" due to missing ".contiguous()" in megatron-core dependency
  File "/disks/yaroslav/envs/nemo2/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 123, in decorate_bwd
    return bwd(*args, **kwargs)
  File "/disks/yaroslav/envs/nemo2/lib/python3.9/site-packages/megatron/core/tensor_parallel/layers.py", line 284, in backward
    total_input = total_input.view(total_input.shape[0] * total_input.shape[1],
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
The workaround is to open /disks/yaroslav/envs/nemo2/lib/python3.9/site-packages/megatron/core/tensor_parallel/layers.py and change line 284 (the line reported in the traceback) from
    total_input = total_input.view(total_input.shape[0] * total_input.shape[1],
                                   total_input.shape[2])
to instead read
    total_input = total_input.contiguous()
    total_input = total_input.view(total_input.shape[0] * total_input.shape[1],
                                   total_input.shape[2])
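For anyone who wants to confirm the cause without megatron-core, here is a minimal sketch that triggers the same RuntimeError in plain PyTorch. The shapes and the use of transpose() to get a non-contiguous tensor are assumptions for illustration only; I have not verified how total_input ends up non-contiguous inside megatron-core.

```python
import torch

# Hypothetical shapes for illustration: [sequence, batch, hidden].
seq, batch, hidden = 4, 2, 3

# transpose() returns a strided view of the same storage, so the result
# has shape [seq, batch, hidden] but is NOT contiguous in memory.
total_input = torch.randn(batch, seq, hidden).transpose(0, 1)
print(total_input.is_contiguous())  # False


def flatten_first_two_dims(t):
    # Same pattern as the failing line in layers.py.
    return t.view(t.shape[0] * t.shape[1], t.shape[2])


# Calling view() directly on the non-contiguous tensor raises.
try:
    flatten_first_two_dims(total_input)
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# Workaround from this issue: make the tensor contiguous first.
flat = flatten_first_two_dims(total_input.contiguous())
print(flat.shape)  # torch.Size([8, 3])

# Alternative suggested by the error message itself: reshape() copies
# only when a view is not possible, and yields the same values here.
flat_reshape = total_input.reshape(seq * batch, hidden)
print(torch.equal(flat, flat_reshape))  # True
```

Either .contiguous() followed by .view(...) or a single .reshape(...) avoids the error. Both copy the data when the input is non-contiguous, so I would not expect a meaningful difference in cost between them.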
This file is part of the following package:
megatron-core 0.2.0 pypi_0 pypi
Jul 18, 2023