habitat-lab: Error while running ObjectNav task

❓ Help

I am trying to run ObjectNav using the script habitat_baselines/rl/ddppo/single_node.sh on a 2-GPU machine. I have changed the --exp-config flag to habitat_baselines/config/objectnav/ddppo_objectnav.yaml. I am getting the following error log:

CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
  SPLIT: val
  USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
  ANGLE_TH: 0.2617993877991494
  BETA: 100
  CAMERA_HEIGHT: 1.25
  DEPTH_DENORM: 10.0
  DIST_REACHED_TH: 0.15
  DIST_TO_STOP: 0.05
  D_OBSTACLE_MAX: 4.0
  D_OBSTACLE_MIN: 0.1
  H_OBSTACLE_MAX: 1.25
  H_OBSTACLE_MIN: 0.375
  MAP_CELL_SIZE: 0.1
  MAP_SIZE: 40
  MIN_PTS_IN_OBSTACLE: 320.0
  NEXT_WAYPOINT_TH: 0.5
  NUM_ACTIONS: 3
  PLANNER_MAX_STEPS: 500
  PREPROCESS_MAP: True
  SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
  SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
  DDPPO:
    backbone: resnet50
    distrib_backend: NCCL
    num_recurrent_layers: 2
    pretrained: False
    pretrained_encoder: False
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    reset_critic: True
    rnn_type: LSTM
    sync_frac: 0.6
    train_encoder: True
  PPO:
    clip_param: 0.2
    entropy_coef: 0.01
    eps: 1e-05
    gamma: 0.99
    hidden_size: 512
    lr: 2.5e-06
    max_grad_norm: 0.2
    num_mini_batch: 2
    num_steps: 128
    ppo_epoch: 2
    reward_window_size: 50
    tau: 0.95
    use_gae: True
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    use_normalized_advantage: False
    value_loss_coef: 0.5
  REWARD_MEASURE: distance_to_goal
  SLACK_REWARD: -0.01
  SUCCESS_MEASURE: spl
  SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
  DATASET:
    CONTENT_SCENES: []
    DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
    SCENES_DIR: data/scene_datasets/
    SPLIT: val
    TYPE: ObjectNav-v1
  ENVIRONMENT:
    ITERATOR_OPTIONS:
      CYCLE: True
      GROUP_BY_SCENE: True
      MAX_SCENE_REPEAT_EPISODES: -1
      MAX_SCENE_REPEAT_STEPS: 10000
      NUM_EPISODE_SAMPLE: -1
      SHUFFLE: True
      STEP_REPETITION_RANGE: 0.2
    MAX_EPISODE_SECONDS: 10000000
    MAX_EPISODE_STEPS: 500
  PYROBOT:
    BASE_CONTROLLER: proportional
    BASE_PLANNER: none
    BUMP_SENSOR:
      TYPE: PyRobotBumpSensor
    DEPTH_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.0
      NORMALIZE_DEPTH: True
      TYPE: PyRobotDepthSensor
      WIDTH: 640
    LOCOBOT:
      ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
      BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
      CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
    RGB_SENSOR:
      CENTER_CROP: False
      HEIGHT: 480
      TYPE: PyRobotRGBSensor
      WIDTH: 640
    ROBOT: locobot
    ROBOTS: ['locobot']
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
  SEED: 100
  SIMULATOR:
    ACTION_SPACE_CONFIG: v1
    AGENTS: ['AGENT_0']
    AGENT_0:
      ANGULAR_ACCELERATION: 12.56
      ANGULAR_FRICTION: 1.0
      COEFFICIENT_OF_RESTITUTION: 0.0
      HEIGHT: 0.88
      IS_SET_START_STATE: False
      LINEAR_ACCELERATION: 20.0
      LINEAR_FRICTION: 0.5
      MASS: 32.0
      RADIUS: 0.2
      SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
      START_POSITION: [0, 0, 0]
      START_ROTATION: [0, 0, 0, 1]
    DEFAULT_AGENT_ID: 0
    DEPTH_SENSOR:
      HEIGHT: 480
      HFOV: 79
      MAX_DEPTH: 5.0
      MIN_DEPTH: 0.5
      NORMALIZE_DEPTH: True
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimDepthSensor
      WIDTH: 640
    FORWARD_STEP_SIZE: 0.25
    HABITAT_SIM_V0:
      ALLOW_SLIDING: True
      ENABLE_PHYSICS: False
      GPU_DEVICE_ID: 0
      GPU_GPU: False
      PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
    RGB_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimRGBSensor
      WIDTH: 640
    SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
    SEED: 100
    SEMANTIC_SENSOR:
      HEIGHT: 480
      HFOV: 79
      POSITION: [0, 0.88, 0]
      TYPE: HabitatSimSemanticSensor
      WIDTH: 640
    TILT_ANGLE: 30
    TURN_ANGLE: 30
    TYPE: Sim-v0
  TASK:
    ACTIONS:
      ANSWER:
        TYPE: AnswerAction
      LOOK_DOWN:
        TYPE: LookDownAction
      LOOK_UP:
        TYPE: LookUpAction
      MOVE_FORWARD:
        TYPE: MoveForwardAction
      STOP:
        TYPE: StopAction
      TELEPORT:
        TYPE: TeleportAction
      TURN_LEFT:
        TYPE: TurnLeftAction
      TURN_RIGHT:
        TYPE: TurnRightAction
    ANSWER_ACCURACY:
      TYPE: AnswerAccuracy
    COLLISIONS:
      TYPE: Collisions
    COMPASS_SENSOR:
      TYPE: CompassSensor
    CORRECT_ANSWER:
      TYPE: CorrectAnswer
    DISTANCE_TO_GOAL:
      DISTANCE_TO: VIEW_POINTS
      TYPE: DistanceToGoal
    EPISODE_INFO:
      TYPE: EpisodeInfo
    GOAL_SENSOR_UUID: objectgoal
    GPS_SENSOR:
      DIMENSIONALITY: 2
      TYPE: GPSSensor
    HEADING_SENSOR:
      TYPE: HeadingSensor
    INSTRUCTION_SENSOR:
      TYPE: InstructionSensor
    INSTRUCTION_SENSOR_UUID: instruction
    MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
    OBJECTGOAL_SENSOR:
      GOAL_SPEC: TASK_CATEGORY_ID
      GOAL_SPEC_MAX_VAL: 50
      TYPE: ObjectGoalSensor
    POINTGOAL_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalSensor
    POINTGOAL_WITH_GPS_COMPASS_SENSOR:
      DIMENSIONALITY: 2
      GOAL_FORMAT: POLAR
      TYPE: PointGoalWithGPSCompassSensor
    POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
    PROXIMITY_SENSOR:
      MAX_DETECTION_RADIUS: 2.0
      TYPE: ProximitySensor
    QUESTION_SENSOR:
      TYPE: QuestionSensor
    SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
    SPL:
      DISTANCE_TO: VIEW_POINTS
      SUCCESS_DISTANCE: 0.2
      TYPE: SPL
    SUCCESS_DISTANCE: 0.1
    TOP_DOWN_MAP:
      DRAW_BORDER: True
      DRAW_GOAL_AABBS: True
      DRAW_GOAL_POSITIONS: True
      DRAW_SHORTEST_PATH: True
      DRAW_SOURCE: True
      DRAW_VIEW_POINTS: True
      FOG_OF_WAR:
        DRAW: True
        FOV: 90
        VISIBILITY_DIST: 5.0
      MAP_PADDING: 3
      MAP_RESOLUTION: 1250
      MAX_EPISODE_STEPS: 1000
      NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
      TYPE: TopDownMap
    TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 1
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-20 13:40:17,978 Initializing dataset ObjectNav-v1
2020-02-20 13:40:38,077 Initializing dataset ObjectNav-v1
2020-02-20 13:40:56,217 initializing sim Sim-v0
2020-02-20 13:41:09,272 Initializing task ObjectNav-v1
2020-02-20 13:41:11,333 agent number of parameters: 71371399
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 68, in <module>
    main()
  File "habitat_baselines/run.py", line 38, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 62, in run_exp
    trainer.train()
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
    episode_counts,
  File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
    return self.wait_step()
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
    observations.append(read_fn())
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7efc2f8b1668>>
Traceback (most recent call last):
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
  File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
Traceback (most recent call last):
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/local-scratch/anaconda3/envs/habitat/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train']' returned non-zero exit status 1.

Thanks in advance

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 29 (19 by maintainers)

Most upvoted comments

@erikwijmans, thank you for bringing it up. The PointNav v1 MP3D dataset was updated to remove those faulty episodes: https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/mp3d/v1/pointnav_mp3d_v1.zip. cc @srama2512

We fixed a bug in the navmeshes a while back that made a few MP3D PointNav episodes invalid (they should never have been valid, and I have no idea how they ever were). This is the script I made for finding and removing them: https://gist.github.com/erikwijmans/e4410f0e12facb87890e919aa264e3fe. Fortunately, they are all train episodes.
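
For readers who want the general shape of such a cleanup, here is a minimal sketch. It is not the contents of the linked gist. It assumes episodes are plain dicts and that a caller-supplied `distance_fn` (a hypothetical callback; in practice it would ask the simulator for the geodesic distance between start and goal) returns NaN or infinity when no path exists on the navmesh.

```python
import math


def split_valid_episodes(episodes, distance_fn):
    """Partition episodes into (valid, invalid) lists.

    distance_fn is a hypothetical callback that returns the start-to-goal
    distance for one episode; a non-finite result marks the episode invalid.
    """
    valid, invalid = [], []
    for episode in episodes:
        distance = distance_fn(episode)
        if math.isfinite(distance):
            valid.append(episode)
        else:
            invalid.append(episode)
    return valid, invalid
```

Keeping the invalid list, rather than silently dropping it, makes it easy to log which episode IDs were removed.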

@mathfac did the cleaned up versions ever get re-uploaded?

I’m getting NaN for the DistanceToGoal metric during PointNav training on MP3D. The episode ID might differ from the one in the standard dataset since I made some changes, so I’ve also given the scene ID, agent position, and goal position.

Episode id:  67230
Scene id: data/scene_datasets/mp3d/dhjEzFoUFzH/dhjEzFoUFzH.glb
Agent position: [ -1.4486076   -0.35115004 -34.537006  ]
Goal position: [0.5637829899787903, -0.15369552373886108, -42.55187225341797]

I haven’t had the time to investigate why this is happening. As a temporary workaround, I’m replacing the geodesic distance with the Euclidean distance whenever this happens so that I can continue to train my agents.
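
The workaround described above could look roughly like the sketch below. The function name and signature are assumptions made for illustration, not part of habitat-lab, and it presumes the geodesic distance has already been computed elsewhere and is passed in as a float.

```python
import math


def distance_with_fallback(geodesic, agent_position, goal_position):
    """Return the geodesic distance, or the Euclidean distance if it is NaN/inf.

    Hypothetical helper: positions are any sequences of three floats.
    """
    if math.isfinite(geodesic):
        return geodesic
    # Straight-line distance; this ignores walls and obstacles, so it is
    # only a lower bound on the true path length.
    return math.sqrt(
        sum((a - g) ** 2 for a, g in zip(agent_position, goal_position))
    )


# Positions reported for the failing episode above.
agent = [-1.4486076, -0.35115004, -34.537006]
goal = [0.5637829899787903, -0.15369552373886108, -42.55187225341797]
fallback = distance_with_fallback(float("nan"), agent, goal)
```

Because the Euclidean distance underestimates the path length, rewards and SPL computed from it are not comparable with those from valid episodes; removing the faulty episodes is the cleaner fix.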

I used watch free -g to monitor memory. Before I run the code, free memory is 41GB as shown here:

                 total        used        free      shared  buff/cache   available
Mem:             62           7          41           0          13          54
Swap:             0           0           0

During the entire runtime of the code, free memory doesn’t fall below 36GB.
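
To log these numbers alongside training output instead of watching them by eye, the table can be parsed with a few lines of Python. This is a sketch that assumes the column layout shown above (integer values, as printed by free -g); `parse_free_output` is a name made up for this example.

```python
def parse_free_output(text):
    """Parse the text printed by `free -g` into {row: {column: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()  # total, used, free, shared, buff/cache, available
    report = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip stops at the shorter sequence.
        report[label.rstrip(":")] = dict(zip(columns, (int(v) for v in values)))
    return report


sample = """\
                 total        used        free      shared  buff/cache   available
Mem:             62           7          41           0          13          54
Swap:             0           0           0
"""
memory = parse_free_output(sample)
```

The text to parse can be captured with the standard `subprocess` module by running `free -g` periodically from the training script.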