ray: Windows: rllib RNN assert seq_lens is not None

What is the problem?

Using the option `"use_lstm": True` results in an assertion error. It appears that the model is always called with `seq_lens=None`. I'm not sure whether this is a bug, but according to the documentation this option should simply wrap the model with an LSTM cell. Oddly enough, it also happens with the CartPole environment. Is this intended behaviour?

```
Traceback (most recent call last):
  File "D:/Seafile/Programming projects/rl_trading/test.py", line 44, in <module>
    trainer = sac.SACTrainer(config=config, env="CartPole-v0")
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer_template.py", line 88, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 479, in __init__
    super().__init__(config, logger_creator)
  File "C:\Python38\lib\site-packages\ray\tune\trainable.py", line 245, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 643, in setup
    self._init(self.config, self.env_creator)
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer_template.py", line 101, in _init
    self.workers = self._make_workers(
  File "C:\Python38\lib\site-packages\ray\rllib\agents\trainer.py", line 708, in _make_workers
    return WorkerSet(
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 66, in __init__
    self._local_worker = self._make_worker(
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 259, in _make_worker
    worker = cls(
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 403, in __init__
    self._build_policy_map(policy_dict, policy_config)
  File "C:\Python38\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 986, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "C:\Python38\lib\site-packages\ray\rllib\policy\tf_policy_template.py", line 132, in __init__
    DynamicTFPolicy.__init__(
  File "C:\Python38\lib\site-packages\ray\rllib\policy\dynamic_tf_policy.py", line 236, in __init__
    action_distribution_fn(
  File "C:\Python38\lib\site-packages\ray\rllib\agents\sac\sac_tf_policy.py", line 108, in get_distribution_inputs_and_class
    model_out, state_out = model({
  File "C:\Python38\lib\site-packages\ray\rllib\models\modelv2.py", line 202, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "C:\Python38\lib\site-packages\ray\rllib\models\tf\recurrent_net.py", line 157, in forward
    assert seq_lens is not None
AssertionError

Process finished with exit code 1
```

Ray version and other system information (Python version, TensorFlow version, OS): Python 3.8.5, TensorFlow 2.3, Windows 10

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

```python
import ray
from ray.rllib.agents import sac

config = sac.DEFAULT_CONFIG.copy()
config["num_gpus"] = 1
config["num_workers"] = 1
config["framework"] = "tf"
config["model"]["use_lstm"] = True

ray.init(include_dashboard=False)

trainer = sac.SACTrainer(config=config, env="CartPole-v0")
for i in range(10):
    result = trainer.train()
```

  • [x] I have verified my script runs in a clean environment and reproduces the issue.
  • [x] I have verified the issue also occurs with the latest wheels.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 19 (1 by maintainers)

Most upvoted comments

Hello! I have the same problem with PPOTrainer with use_lstm=True.

Hello, I have exactly the same thing at inference time with a PPO trainer: it works fine during training but fails at inference… Thank you @jobeid1 for your code snippet; I can't seem to make it work on my side. Is it mandatory to access the agent's policy before calling the step method? I usually call agent.compute_single_action directly. I have tried the Ray release candidate (ray==2.0.0rc0) but the problem still remains 😦 Does the Ray team have this on the radar? All the best, and thank you for your contributions!
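For anyone hitting this at inference time, a minimal sketch of carrying the LSTM state through `compute_single_action` by hand is shown below. The CartPole environment, TF framework, and default PPO config here are illustrative assumptions, and on older Ray releases the trainer-level call is `compute_action` rather than `compute_single_action`.

```python
# Minimal sketch (assumptions: CartPole env, TF framework, default PPO config)
# of manually carrying the LSTM state outside of tune / rollout workers.
import gym
import ray
from ray.rllib.agents import ppo

ray.init(include_dashboard=False)

config = ppo.DEFAULT_CONFIG.copy()
config["framework"] = "tf"
config["model"]["use_lstm"] = True

trainer = ppo.PPOTrainer(config=config, env="CartPole-v0")
trainer.train()  # train at least one iteration before acting

env = gym.make("CartPole-v0")
obs = env.reset()

# The recurrent state is not managed for you here: fetch the initial
# [h, c] state from the policy and pass it back in on every call.
state = trainer.get_policy().get_initial_state()
done = False
while not done:
    # When a state is passed in, compute_single_action returns
    # (action, state_out, extra_fetches).
    action, state, _ = trainer.compute_single_action(obs, state=state)
    obs, reward, done, info = env.step(action)
```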

@evo11x when using attention, something along these lines should work:

import numpy as np

# policy_config is the (policy_cls, obs_space, act_space, config) tuple for the
# policy in use, e.g. config["multiagent"]["policies"][some_policy_key];
# index [3] is its config dict.
transformer_attention_size = policy_config[3]["model"]["attention_dim"]
transformer_memory_size = policy_config[3]["model"]["attention_memory_inference"]
# One zero-filled memory of shape (memory, attention_dim) per transformer unit.
transformer_layer_size = np.zeros([transformer_memory_size, transformer_attention_size])
transformer_length = policy_config[3]["model"]["attention_num_transformer_units"]
state_list = transformer_length * [transformer_layer_size]
initial_state_list = state_list

for agent in env.agent_iter():
    policy_key = agent_id_to_policy_key(agent)  # map agent id -> policy id
    observation, reward, done, info = env.last()

    if done:
        # Done agents still need to be stepped (with a None action);
        # reset the recurrent state for the next episode.
        action = None
        state_list = initial_state_list
        env.step(action)
        continue

    policy_config = config["multiagent"]["policies"][policy_key]
    policy = PPOagent.get_policy(policy_key)
    action, next_state, _ = policy.compute_single_action(obs=observation, state=state_list)
    # Shift the attention memory: drop the oldest timestep, append the new state.
    state_list = [
        np.concatenate((state_list[i], [next_state[i]]))[1:]
        for i in range(transformer_length)
    ]

    env.step(action)

The code may require minor adjustments to work and obviously will if you want to use LSTM. Hope this helps!

@evo11x My colleague discovered that RLlib doesn’t automatically initialize or update the recurrent state when using a trainer with attention or LSTM, as it does when using tune. It really ought to, and I think they have plans to make this easier in version 2.0. You will need to initialize the state yourself and then update it at each step.
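For the LSTM case the manual zero-state setup from the attention snippet above is not needed, since the policy can provide its own initial state. A minimal sketch, reusing the `PPOagent`, `policy_key`, and `observation` names from that snippet:

```python
# LSTM case: let the policy build its own initial state, then carry the
# returned state forward and reset it when the episode ends.
policy = PPOagent.get_policy(policy_key)
state = policy.get_initial_state()  # [h, c] zero tensors for the LSTM cell

action, state, _ = policy.compute_single_action(obs=observation, state=state)
# Pass `state` back in on the next call; call get_initial_state() again
# after a terminal step.
```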

Same problem here with PPOTrainer and use_lstm=True.

@evo11x I’m having the same issue with the attention net using a PPOTrainer. Did you find a solution?