When training PPO on a custom-made environment, I got the following weird error:
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/common.py",
line 144, in train
result = self._train()
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/ppo.py",
line 120, in _train
self.model.reward_filter)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/rollout.
py", line 125, in collect_samples
trajectory, rewards, lengths, obs_f, rew_f = ray.get(next_trajectory)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/worker.py", line 2
058, in get
raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(c8742107b043fb31798884ee5ea564b234c6bb86). It was
created by remote function compute_steps which failed with:
Remote function compute_steps failed with:
Traceback (most recent call last):
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/worker.py", line 7
27, in _process_task
self.actors[task.actor_id().id()], *arguments)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/runner.p
y", line 243, in compute_steps
trajectory = self.compute_trajectory(gamma, lam, horizon)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/runner.p
y", line 206, in compute_trajectory
self.env, horizon, self.observation_filter, self.reward_filter)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/rollout.
py", line 31, in rollouts
observation = observation_filter(env.reset())
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/filter.p
y", line 120, in __call__
x = x / (self.rs.std + 1e-8)
File "/home/wojciech/miniconda3/envs/learning_to_run/lib/python3.6/site-packages/ray/rllib/ppo/filter.p
y", line 81, in std
return np.sqrt(self.var)
AttributeError: 'float' object has no attribute 'sqrt'
It is hard to debug since it pops up after an hour of training, but I have found that numpy raises this weird error in the following case:
python ray/python/ray/rllib/train.py --alg=PPO --env=CartPole-v0
It looks like the dtype of self.var in ray/python/ray/rllib/ppo/filter.py, line 81 (commit 6ecc899), is the problem.
Interestingly, when the problem occurs both self._M and self._S are
ndarrays with dtype='object' and self._n = 23908550229251600946. Will
investigate further.
On 4 October 2017 at 19:24, Robert Nishihara wrote:
> Actually, I may have been mistaken and may not have successfully reproduced the problem; I need to look into it more.
It is hard to debug for me. I was trying to investigate why self._n is so large. It seems to grow too quickly, and I am pretty sure that it has to do with the number of workers. When I print self._n in the push function, strange things start to show up from iteration 1 on (one way this could happen is sketched after the logs below). For 2 workers:
===> iteration 1
push 1035
push 9
push 1036
push 10
push 11
push 1037
push 12
push 1038
push 13
push 1039
For 5 workers:
===> iteration 1
push 20
push 993
push 1036
push 1037
push 1005
push 21
push 994
push 1037
push 1038
push 1038
push 22
push 995
push 1006
push 23
whereas when I use only 1 worker the output is:
===> iteration 1
push 9
push 10
push 11
push 12
push 13
push 14
push 15
push 16
push 17
push 18
push 19
push 20
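For context, here is a generic sketch of the Welford/Chan running-statistics scheme that MeanStdFilter-style observation filters typically build on. This is not the actual ray.rllib.ppo.filter code, and the merge method name (update) is my assumption; it only illustrates why the counts above are suspicious: self._n should grow by exactly one per pushed observation, so the jumps in the multi-worker logs suggest that per-worker statistics are being merged into the filter more than once.

import numpy as np

class RunningStat:
    """Sketch of a Welford-style running mean/variance tracker
    (illustrative only, not the actual RLlib implementation)."""

    def __init__(self, shape=()):
        self._n = 0                # number of samples seen
        self._M = np.zeros(shape)  # running mean
        self._S = np.zeros(shape)  # sum of squared deviations from the mean

    def push(self, x):
        """Add one observation; _n increases by exactly one per call."""
        x = np.asarray(x, dtype=np.float64)
        self._n += 1
        if self._n == 1:
            self._M[...] = x
        else:
            old_M = self._M.copy()
            self._M[...] = old_M + (x - old_M) / self._n
            self._S[...] = self._S + (x - old_M) * (x - self._M)

    def update(self, other):
        """Merge statistics collected by another worker (Chan et al.
        parallel formula).  If a worker's samples get merged more than
        once -- e.g. workers keep their local counts while also receiving
        the already-merged global filter back -- _n is double counted and
        jumps far ahead of the number of samples actually pushed, which is
        the kind of behaviour the logs above show."""
        n1, n2 = self._n, other._n
        n = n1 + n2
        if n == 0:
            return
        delta = other._M - self._M
        self._M = (n1 * self._M + n2 * other._M) / n
        self._S = self._S + other._S + np.square(delta) * n1 * n2 / n
        self._n = n

    @property
    def var(self):
        return self._S / (self._n - 1) if self._n > 1 else np.square(self._M)

    @property
    def std(self):
        return np.sqrt(self.var)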
I wonder whether it has anything to do with the warnings I get when executing the training:
WARNING: Serializing objects of type <class 'ray.rllib.ppo.filter.MeanStdFilter'> by expanding them as dictionaries of their fields. This behavior may be incorrect in some cases.
WARNING: Serializing objects of type <class 'ray.rllib.ppo.filter.RunningStat'> by expanding them as dictionaries of their fields. This behavior may be incorrect in some cases.
Strangely, it does not seem to affect Pong-v0...
Hopefully, it will also shed some light on the dtype='object' problem.
I can reproduce the problem by running PPO on CartPole-v0, e.g.,
python python/ray/rllib/train.py --alg=PPO --env=CartPole-v0 --config='{"num_sgd_iter": 5, "num_workers": 2, "model": {"fcnet_hiddens": [10, 10]}}'
The changes in #1082 seem to fix the problem for me (it runs on CartPole without the error appearing).
Some of the values in the observation filter and reward filter were growing at an exponential rate due to the bug described in #1081. I haven't tracked down exactly how this led to a numpy array with dtype=object, but note that
>>> import numpy as np
>>> np.array(1 << 100)
array(1267650600228229401496703205376, dtype=object)
So it's plausible that the problems are related.
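One speculative way to connect the two observations (not a confirmed chain of events): once a count like self._n overflows int64 into an arbitrary-precision Python int, numpy falls back to dtype=object, and that dtype silently propagates through arithmetic with the float arrays, after which np.sqrt fails exactly as in the traceback above:
>>> import numpy as np
>>> n = np.array(1 << 100)   # too large for int64, so numpy stores it as dtype=object
>>> x = np.ones(3) * 2.0     # ordinary float64 array
>>> y = x * n / n            # the object dtype propagates through the arithmetic
>>> y.dtype
dtype('O')
>>> np.sqrt(y)
AttributeError: 'float' object has no attribute 'sqrt'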
Do you want to try the changes in #1082 and see if that fixes the problem?
On 5 October 2017 at 04:57, Robert Nishihara wrote:
> Do you want to try the changes in #1082 and see if that fixes the problem?
This filter is crucial for my application. Without it I will not be able to
learn anything.