ray: [RLlib] Bugs in PyTorch version of DDPG

Hello,

I spotted some bugs in the implementation of DDPG in PyTorch.

  • Gradient clipping is not implemented correctly: it reads the ‘grad_norm_clipping’ parameter instead of ‘grad_clip’ and relies on ray.rllib.utils.torch_ops.minimize_and_clip, which does not do the right thing here. All the other algorithms seem to use ray.rllib.agents.a3c.a3c_torch_policy.apply_grad_clipping, so I propose replacing minimize_and_clip with the A3C function in torch_ops (see the sketch after this list).

  • In the PyTorch DDPG model, the tensors used to bound the action space (range and minimum) are not registered with PyTorch as parameters, so they are not tracked by the module. As a result they are not converted to CUDA tensors, which leads to an error (see the register_buffer sketch below).

  • The target model is placed on the GPU even if Ray was not configured to use the GPU.
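
For reference, here is a minimal sketch of what the norm-based clipping looks like, modelled on the A3C helper (the exact signature and return value may differ between Ray versions):

    import torch.nn as nn

    def apply_grad_clipping(policy, optimizer, loss):
        """Clip gradients by global norm, reading 'grad_clip' from the config."""
        info = {}
        if policy.config["grad_clip"]:
            for param_group in optimizer.param_groups:
                # Only clip parameters that actually received a gradient.
                params = [p for p in param_group["params"] if p.grad is not None]
                if params:
                    info["grad_gnorm"] = nn.utils.clip_grad_norm_(
                        params, policy.config["grad_clip"])
        return info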

I will make a PR with everything, but I am not sure whether I should replace minimize_and_clip.
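
On the second bullet, a minimal sketch of how the action-space bounds could be registered as buffers so that Module.to(device) / .cuda() moves them along with the learnable parameters (the attribute names low_action and action_range are placeholders, not the actual RLlib ones):

    import numpy as np
    import torch
    import torch.nn as nn

    class BoundedPolicyModel(nn.Module):
        def __init__(self, action_space):
            super().__init__()
            # Buffers are part of the module state: .to(device)/.cuda() converts
            # them together with the parameters, without making them trainable.
            self.register_buffer(
                "low_action",
                torch.from_numpy(action_space.low.astype(np.float32)))
            self.register_buffer(
                "action_range",
                torch.from_numpy(
                    (action_space.high - action_space.low).astype(np.float32)))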

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 23 (23 by maintainers)

Most upvoted comments

@sven1977 would it be possible to keep the issue open until #9683 is merged?

@sven1977 @raphaelavalos Another bug: the PyTorch state_dict method is used to get a dict of the model parameters, which is wrong. state_dict returns something much richer than that, including non-trainable buffers. named_parameters should be used instead. Moreover, it sometimes returns a plain dict instead of the OrderedDict it should. The issue occurs here for DDPG. Instead it should be:

    def policy_variables(self, as_dict=False):
        """Return the list of variables for the policy net."""
        if as_dict:
            return OrderedDict(self.policy_model.named_parameters())
        return list(self.policy_model.parameters())

    def q_variables(self, as_dict=False):
        """Return the list of variables for Q / twin Q nets."""
        if as_dict:
            return OrderedDict((
                *self.q_model.named_parameters(),
                *(self.twin_q_model.named_parameters() if self.twin_q_model else [])
            ))
        return [*self.q_model.parameters(),
                *(self.twin_q_model.parameters() if self.twin_q_model else [])]
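
For context on why named_parameters is the better fit here: state_dict also contains non-trainable buffers and, with the default keep_vars=False, returns detached tensors, whereas named_parameters yields the actual nn.Parameter objects, which is what code that syncs or updates these variables expects.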

@duburcqa @raphaelavalos Btw, we are very close to setting up daily automatic GPU + “heavy” regression tests (Atari, MuJoCo) to catch these things much earlier than we do right now. Hopefully, this will eliminate issues like the GPU one here altogether.

On the GPU: yes, we no longer check whether a GPU is physically available, but whether the config actually says num_gpus > 0, and only then place the model on the GPU.
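
In code, the device selection then boils down to something like this (a sketch; config, policy_model and target_model are stand-ins for the actual objects):

    import torch

    # Pick the device from the config rather than from GPU availability, so
    # that num_gpus=0 keeps everything on the CPU even on a machine with GPUs.
    device = torch.device("cuda" if config["num_gpus"] > 0 else "cpu")
    policy_model = policy_model.to(device)
    target_model = target_model.to(device)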

“The target model is placed on the GPU even if Ray was not configured to use the GPU.” This should be fixed in the current master.

I would add another issue: clamp is used to clip the action at several places here and here, while an element-wise min/max combination should be used instead. Indeed, clamp does not support element-wise (per-dimension) bounds, and this limitation is currently circumvented by clamping with respect to the first low/high bound of the action space, which does not make any sense.
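
A minimal sketch of the element-wise alternative, with low and high being per-dimension bound tensors (at the time, torch.clamp only accepted scalar min/max):

    import torch

    def clip_action(action, low, high):
        # Element-wise clipping against per-dimension bounds; torch.min/torch.max
        # broadcast over the action tensor, unlike scalar-only torch.clamp.
        return torch.min(torch.max(action, low), high)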