I’m running this piece of code:
def forward(self, inputs, core_state=()):
    x = inputs["frame"]
    # time x batch x 64 x 64 x 3
    T, B, *_ = x.shape
    # merge time and batch
    # [T*B x 64 x 64 x 3]
    x = torch.flatten(x, 0, 1)
    x = x.float()
    # [T*B x 3 x 64 x 64]
    x = x.transpose(1, 3)
    # x = checkpoint_sequential(self.feat_extract, 2, x)
    x = self.feat_extract(x)
    x = x.view(T * B, -1)
    # core_input = checkpoint_sequential(self.fc, 2, x)
    core_input = self.fc(x)
    core_output = core_input
    core_state = tuple()
    policy_logits = self.policy(core_output)
    baseline = self.baseline(core_output)
    if self.training:
        action = torch.multinomial(F.softmax(policy_logits, dim=1), num_samples=1)
    else:
        action = torch.argmax(policy_logits, dim=1)
    policy_logits = policy_logits.view(T, B, self.num_actions)
    baseline = baseline.view(T, B)
    action = action.view(T, B)
    return dict(policy_logits=policy_logits, baseline=baseline,
                action=action), core_state
And got this error:
Traceback (most recent call last):
  File "/usr/local/easybuild-2019/easybuild/software/compiler/gcccore/8.3.0/python/3.7.4/lib/python3$
    self.run()
  File "/usr/local/easybuild-2019/easybuild/software/compiler/gcccore/8.3.0/python/3.7.4/lib/python3$
    self._target(*self._args, **self._kwargs)
  File "/data/gpfs/projects/punim1126/CL-RIDE-master-project/src/utils.py", line 335, in act
    raise e
  File "/data/gpfs/projects/punim1126/CL-RIDE-master-project/src/utils.py", line 287, in act
    agent_output, agent_state = model(env_output, agent_state)
  File "/home/shizhec/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _$
    result = self.forward(*input, **kwargs)
  File "/data/gpfs/projects/punim1126/CL-RIDE-master-project/src/models.py", line 675, in forward
    action = torch.multinomial(F.softmax(policy_logits, dim=1), num_samples=1)
RuntimeError: probability tensor contains either inf, nan or element < 0
I’m using torch 1.6.0; however, when I switch to torch 1.1.0, I don’t get the error anymore and I can train the model correctly. Does anyone know why this is happening?
# One output before the error:
tensor([[ 4.0024, -22.5107, 8.6548, -26.3529, 199.5710, -23.9216, -40.0046,
3.1537, -19.8343, -39.2633, 34.4721, -7.2311, -56.9415, -5.0400,
15.7165]])
# Where the error occurs:
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]])
Your code seems OK; something might have changed between these two versions, so you can either:
- Modify your framework and track down the origin of the first NaN value (see the sketch below), or
- Switch back to 1.1.0 for now.
I would recommend the second solution, given the complex situation you described in your previous question.
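If you do decide to hunt the NaN down, here is a minimal sketch (the helper name `safe_sample` is made up, and it assumes an otherwise standard PyTorch setup): turn on autograd anomaly detection and fail fast on bad logits before calling `multinomial`:

```python
import torch
import torch.nn.functional as F

# Surfaces the first backward operation that produces NaN/Inf,
# which usually points at the layer where things go wrong.
torch.autograd.set_detect_anomaly(True)

def safe_sample(policy_logits):
    # Fail with a readable message instead of the opaque multinomial RuntimeError
    if torch.isnan(policy_logits).any() or torch.isinf(policy_logits).any():
        raise ValueError(f"Bad policy logits detected: {policy_logits}")
    probs = F.softmax(policy_logits, dim=1)
    return torch.multinomial(probs, num_samples=1)
```

You could call `safe_sample(policy_logits)` in place of the `torch.multinomial(...)` line in `forward` to see exactly when the logits first go bad.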
The usual suspects for a NaN are:
- division by 0
- something involving the annoying log/exp calculations, like a log probability. I just located a NaN problem of my own this morning, e.g.:
a = Normal(1, 1e-23)
a.log_prob(a.sample())  # -> NaN because sigma is too small
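For illustration, a tiny standalone snippet reproducing that too-small-sigma case, plus one common workaround (the clamp threshold of 1e-6 is just an example, not a tuned value):

```python
import torch
from torch.distributions import Normal

# A scale this small underflows inside log_prob in float32
a = Normal(torch.tensor(1.0), torch.tensor(1e-23))
print(a.log_prob(a.sample()))  # nan or -inf, depending on the sample

# Workaround: clamp the scale away from zero before building the distribution
sigma = torch.tensor(1e-23).clamp(min=1e-6)
b = Normal(torch.tensor(1.0), sigma)
print(b.log_prob(b.sample()))  # finite value
```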
Try increasing the temperature value. I had a very low temperature value, along with other parameters such as top_k and top_p, which made the next-token distribution too steep. Beam search needs multiple candidate tokens to be available, and with such a low temperature I couldn’t get them (because we know how temperature works, right?). So I increased the temperature and it worked.
Try increasing the temperature value and it should just work, if there is no other complexity involved.
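As a rough illustration of that effect (the logits below are made-up numbers), temperature scaling divides the logits before the softmax, so a very low temperature collapses almost all probability mass onto a single token:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.5, 0.1])

for temperature in (1.0, 0.1, 0.01):
    probs = F.softmax(logits / temperature, dim=0)
    print(temperature, probs)

# At T=1.0 several tokens keep noticeable probability;
# at T=0.01 effectively only the top token survives, which starves
# beam search / top-k / top-p of alternative candidates.
```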