When training PPO on a custom-made environment, I got the following weird error:
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/common.py",
line 144, in train
result = self._train()
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/ppo.py",
line 120, in _train
self.model.reward_filter)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/rollout.
py", line 125, in collect_samples
trajectory, rewards, lengths, obs_f, rew_f = ray.get(next_trajectory)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/worker.py", line 2
058, in get
raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(c8742107b043fb31798884ee5ea564b234c6bb86). It was
created by remote function compute_steps which failed with:
Remote function compute_steps failed with:
Traceback (most recent call last):
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/worker.py", line 7
27, in _process_task
self.actors[task.actor_id().id()], *arguments)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/runner.p
y", line 243, in compute_steps
trajectory = self.compute_trajectory(gamma, lam, horizon)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/runner.p
y", line 206, in compute_trajectory
self.env, horizon, self.observation_filter, self.reward_filter)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/rollout.
py", line 31, in rollouts
observation = observation_filter(env.reset())
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/filter.p
y", line 120, in __call__
x = x / (self.rs.std + 1e-8)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/filter.p
y", line 81, in std
return np.sqrt(self.var)
AttributeError: 'float' object has no attribute 'sqrt'
It is hard to debug since it pops up after an hour of training, but I have found that numpy raises this weird error in the following case:
python ray/python/ray/rllib/train.py --alg=PPO --env=CartPole-v0
It looks like the dtype of self.var in ray/python/ray/rllib/ppo/filter.py, line 81 (commit 6ecc899), is the problem.
Interestingly, when the problem occurs both self._M and self._S are
ndarrays with dtype='object' and self._n = 23908550229251600946. Will
investigate further.
On 4 October 2017 at 19:24, Robert Nishihara wrote:
> Actually, I may have been mistaken and may not have successfully reproduced the problem; I need to look into it more.
It is hard to debug for me. I was trying to investigate why self._n is so large. It seems to grow too quickly, and I am pretty sure that it has to do with the number of workers. When I print self._n in the push function, strange things start to show up from iteration 1 on (one way this could happen is sketched after the logs below). For 2 workers:
===> iteration 1
push 1035
push 9
push 1036
push 10
push 11
push 1037
push 12
push 1038
push 13
push 1039
For 5 workers:
===> iteration 1
push 20
push 993
push 1036
push 1037
push 1005
push 21
push 994
push 1037
push 1038
push 1038
push 22
push 995
push 1006
push 23
whereas when I use only 1 worker the output is:
===> iteration 1
push 9
push 10
push 11
push 12
push 13
push 14
push 15
push 16
push 17
push 18
push 19
push 20
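For context, here is a generic sketch of the Welford/Chan running-statistics scheme that MeanStdFilter-style observation filters typically build on. This is not the actual ray.rllib.ppo.filter code, and the merge method name (update) is my assumption; it only illustrates why the counts above are suspicious: self._n should grow by exactly one per pushed observation, so the jumps in the multi-worker logs suggest that per-worker statistics are being merged into the filter more than once.

import numpy as np

class RunningStat:
    """Sketch of a Welford-style running mean/variance tracker
    (illustrative only, not the actual RLlib implementation)."""

    def __init__(self, shape=()):
        self._n = 0                # number of samples seen
        self._M = np.zeros(shape)  # running mean
        self._S = np.zeros(shape)  # sum of squared deviations from the mean

    def push(self, x):
        """Add one observation; _n increases by exactly one per call."""
        x = np.asarray(x, dtype=np.float64)
        self._n += 1
        if self._n == 1:
            self._M[...] = x
        else:
            old_M = self._M.copy()
            self._M[...] = old_M + (x - old_M) / self._n
            self._S[...] = self._S + (x - old_M) * (x - self._M)

    def update(self, other):
        """Merge statistics collected by another worker (Chan et al.
        parallel formula).  If a worker's samples get merged more than
        once -- e.g. workers keep their local counts while also receiving
        the already-merged global filter back -- _n is double counted and
        jumps far ahead of the number of samples actually pushed, which is
        the kind of behaviour the logs above show."""
        n1, n2 = self._n, other._n
        n = n1 + n2
        if n == 0:
            return
        delta = other._M - self._M
        self._M = (n1 * self._M + n2 * other._M) / n
        self._S = self._S + other._S + np.square(delta) * n1 * n2 / n
        self._n = n

    @property
    def var(self):
        return self._S / (self._n - 1) if self._n > 1 else np.square(self._M)

    @property
    def std(self):
        return np.sqrt(self.var)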
I wonder whether it has anything to do with the warnings I get when executing the training:
WARNING: Serializing objects of type <class 'ray.rllib.ppo.filter.MeanStdFilter'> by expanding them as dictionaries of their fields. This behavior may be incorrect in some cases.
WARNING: Serializing objects of type <class 'ray.rllib.ppo.filter.RunningStat'> by expanding them as dictionaries of their fields. This behavior may be incorrect in some cases.
Strangely, it does not seem to affect Pong-v0...
Hopefully, it will also shed some light on the dtype='object' problem.
I can reproduce the problem by running PPO on CartPole-v0, e.g.,
python python/ray/rllib/train.py --alg=PPO --env=CartPole-v0 --config='{"num_sgd_iter": 5, "num_workers": 2, "model": {"fcnet_hiddens": [10, 10]}}'
The changes in #1082 seem to fix the problem for me (it runs on CartPole without the error appearing).
Some of the values in the observation filter and reward filter were growing at an exponential rate due to the bug described in #1081. I haven't tracked down exactly how this led to a numpy array with dtype=object, but note that
>>> import numpy as np
>>> np.array(1 << 100)
array(1267650600228229401496703205376, dtype=object)
So it's plausible that the problems are related.
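One speculative way to connect the two observations (not a confirmed chain of events): once a count like self._n overflows int64 into an arbitrary-precision Python int, numpy falls back to dtype=object, and that dtype silently propagates through arithmetic with the float arrays, after which np.sqrt fails exactly as in the traceback above:
>>> import numpy as np
>>> n = np.array(1 << 100)   # too large for int64, so numpy stores it as dtype=object
>>> x = np.ones(3) * 2.0     # ordinary float64 array
>>> y = x * n / n            # the object dtype propagates through the arithmetic
>>> y.dtype
dtype('O')
>>> np.sqrt(y)
AttributeError: 'float' object has no attribute 'sqrt'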
Do you want to try the changes in #1082 and see if that fixes the problem?
On 5 October 2017 at 04:57, Robert Nishihara wrote:
> Do you want to try the changes in #1082 and see if that fixes the problem?
This filter is crucial for my application. Without it I will not be able to
learn anything.