ray: [RLlib] Using a Dict state space throws an exception (not supported yet?)

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Ray installed from (source or binary): Source
  • Ray version: 0.5.3
  • Python version: 3.6.6
  • Exact command to reproduce:

Describe the problem

I am trying to use a Dict state space with PPO, but it throws an exception. More details below.

Source code / logs

# Imports used by this snippet (RLlib module paths as of Ray 0.5.x;
# img_height, img_width, SERVER_ADDRESS, SERVER_PORT and register_my_model()
# are defined elsewhere in my script).
import numpy as np
from gym import spaces

import ray
from ray.rllib.agents.ppo import PPOAgent
from ray.rllib.env.serving_env import ServingEnv
from ray.rllib.utils.policy_server import PolicyServer
from ray.tune.registry import register_env


class MyServing(ServingEnv):
    def __init__(self):
        # Dict observation space: an RGB camera image plus a scalar speed.
        ServingEnv.__init__(
            self, spaces.Box(-1.0, 1.0, (1,), dtype=np.float32),
            spaces.Dict({
                "image": spaces.Box(0.0, 1.0, (img_height, img_width, 3),
                                    dtype=np.float32),
                "speed": spaces.Box(0.0, 1.0, (1,), dtype=np.float32)}))

    def run(self):
        print("Starting policy server at {}:{}".format(SERVER_ADDRESS,
                                                       SERVER_PORT))
        server = PolicyServer(self, SERVER_ADDRESS, SERVER_PORT)
        server.serve_forever()

if __name__ == "__main__":
    register_my_model()
    ray.init(num_gpus=1)
    register_env("srv", lambda _: MyServing())

    # Adapted from the DQN serving example; PPO is used here, but you can
    # choose and configure any agent.
    ppo = PPOAgent(
        env="srv",
        config={
            # Use a single process to avoid needing to set up a load balancer
            "num_workers": 0,
            "num_gpus": 1,
            "batch_mode": "complete_episodes",
            "train_batch_size": 2000,
            "model": {
                "custom_model": "mymodel"
            },
            #"num_gpus":1,
            # Configure the agent to run short iterations for debugging
            #"exploration_fraction": 0.01,
            #"learning_starts": 100,
            #"timesteps_per_iteration": 200,
            #"schedule_max_timesteps": 100000,
            #"gamma": 0.8,
            "tf_session_args": {
                "gpu_options": {"allow_growth": True},
            },
        })

  File "/RL/ray-master/ray/python/ray/rllib/my_scripts/ppo/udacity_server_ppo.py", line 142, in <module>
    "gpu_options": {"allow_growth": True},
  File "/RL/ray-master/ray/python/ray/rllib/agents/agent.py", line 216, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/RL/ray-master/ray/python/ray/tune/trainable.py", line 86, in __init__
    self._setup()
  File "/RL/ray-master/ray/python/ray/rllib/agents/agent.py", line 258, in _setup
    self._init()
  File "/RL/ray-master/ray/python/ray/rllib/agents/ppo/ppo.py", line 85, in _init
    self.env_creator, self._policy_graph)
  File "/RL/ray-master/ray/python/ray/rllib/agents/agent.py", line 131, in make_local_evaluator
    "inter_op_parallelism_threads": None,
  File "/RL/ray-master/ray/python/ray/rllib/agents/agent.py", line 171, in _make_evaluator
    monitor_path=self.logdir if config["monitor"] else None)
  File "/RL/ray-master/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 228, in __init__
    policy_dict, policy_config)
  File "/RL/ray-master/ray/python/ray/rllib/evaluation/policy_evaluator.py", line 286, in _build_policy_map
    policy_map[name] = cls(obs_space, act_space, merged_conf)
  File "/RL/ray-master/ray/python/ray/rllib/agents/ppo/ppo_policy_graph.py", line 123, in __init__
    shape=(None, ) + observation_space.shape)
TypeError: can only concatenate tuple (not "NoneType") to tuple
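For context, the failure happens because gym.spaces.Dict does not define a flat shape attribute (it is None), so the placeholder construction in ppo_policy_graph.py cannot evaluate (None, ) + observation_space.shape. A minimal sketch reproducing just that step, assuming only gym and numpy are installed (the image size here is arbitrary):

# Root-cause sketch: a Dict space has .shape == None, and concatenating
# None to a tuple raises exactly the TypeError above.
import numpy as np
from gym import spaces

obs_space = spaces.Dict({
    "image": spaces.Box(0.0, 1.0, (84, 84, 3), dtype=np.float32),
    "speed": spaces.Box(0.0, 1.0, (1,), dtype=np.float32),
})
print(obs_space.shape)              # None
shape = (None, ) + obs_space.shape  # TypeError: can only concatenate tuple (not "NoneType") to tuple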

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 15 (1 by maintainers)

Most upvoted comments

@ericl I can confirm that the Dict state space works now. Thanks for the fix.
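For later readers: as I understand the fix, RLlib flattens Dict observations for its built-in preprocessors and restores the original structure before a custom model sees them. Below is a purely illustrative sketch of what a custom model such as "mymodel" could look like, assuming a post-fix RLlib (around 0.6.x) where custom models subclass Model and receive the restored dict in input_dict["obs"]; the class name and layer sizes are hypothetical, not from this issue.

import tensorflow as tf
from ray.rllib.models import Model, ModelCatalog

class MyDictModel(Model):
    def _build_layers_v2(self, input_dict, num_outputs, options):
        # input_dict["obs"] is the restored Dict observation.
        image = input_dict["obs"]["image"]  # (batch, H, W, 3)
        speed = input_dict["obs"]["speed"]  # (batch, 1)
        conv = tf.layers.conv2d(image, 16, 4, strides=2,
                                activation=tf.nn.relu)
        # Fuse the image features with the scalar speed input.
        fused = tf.concat([tf.layers.flatten(conv), speed], axis=1)
        last_layer = tf.layers.dense(fused, 256, activation=tf.nn.relu)
        output = tf.layers.dense(last_layer, num_outputs, activation=None)
        return output, last_layer

ModelCatalog.register_custom_model("mymodel", MyDictModel)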