habitat-lab: Error while running ObjectNav task
❓ Help
I am trying to run ObjectNav using the script habitat_baselines/rl/ddppo/single_node.sh on a 2-GPU machine. I have edited the --exp-config flag to point to habitat_baselines/config/objectnav/ddppo_objectnav.yaml.
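For reference, the launch line inside single_node.sh then looks roughly like this (a sketch: the environment-variable exports and the --nproc_per_node value are my assumptions, but the run.py arguments match the failing command shown in the traceback further down):

```bash
# Rough shape of the edited single_node.sh launch (not verbatim).
# GLOG_minloglevel / MAGNUM_LOG and --nproc_per_node are assumptions;
# the habitat_baselines/run.py arguments are taken from the traceback below.
export GLOG_minloglevel=2
export MAGNUM_LOG=quiet

python -u -m torch.distributed.launch \
    --nproc_per_node 2 \
    habitat_baselines/run.py \
    --exp-config habitat_baselines/config/objectnav/ddppo_objectnav.yaml \
    --run-type train
```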
When I run it, I get the following error log:
CHECKPOINT_FOLDER: new_checkpoints
CHECKPOINT_INTERVAL: 50
CMD_TRAILING_OPTS: []
ENV_NAME: NavRLEnv
EVAL:
SPLIT: val
USE_CKPT_CONFIG: True
EVAL_CKPT_PATH_DIR: new_checkpoints
LOG_FILE: train.log
LOG_INTERVAL: 10
NUM_PROCESSES: 1
NUM_UPDATES: 10000
ORBSLAM2:
ANGLE_TH: 0.2617993877991494
BETA: 100
CAMERA_HEIGHT: 1.25
DEPTH_DENORM: 10.0
DIST_REACHED_TH: 0.15
DIST_TO_STOP: 0.05
D_OBSTACLE_MAX: 4.0
D_OBSTACLE_MIN: 0.1
H_OBSTACLE_MAX: 1.25
H_OBSTACLE_MIN: 0.375
MAP_CELL_SIZE: 0.1
MAP_SIZE: 40
MIN_PTS_IN_OBSTACLE: 320.0
NEXT_WAYPOINT_TH: 0.5
NUM_ACTIONS: 3
PLANNER_MAX_STEPS: 500
PREPROCESS_MAP: True
SLAM_SETTINGS_PATH: habitat_baselines/slambased/data/mp3d3_small1k.yaml
SLAM_VOCAB_PATH: habitat_baselines/slambased/data/ORBvoc.txt
RL:
DDPPO:
backbone: resnet50
distrib_backend: NCCL
num_recurrent_layers: 2
pretrained: False
pretrained_encoder: False
pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
reset_critic: True
rnn_type: LSTM
sync_frac: 0.6
train_encoder: True
PPO:
clip_param: 0.2
entropy_coef: 0.01
eps: 1e-05
gamma: 0.99
hidden_size: 512
lr: 2.5e-06
max_grad_norm: 0.2
num_mini_batch: 2
num_steps: 128
ppo_epoch: 2
reward_window_size: 50
tau: 0.95
use_gae: True
use_linear_clip_decay: False
use_linear_lr_decay: False
use_normalized_advantage: False
value_loss_coef: 0.5
REWARD_MEASURE: distance_to_goal
SLACK_REWARD: -0.01
SUCCESS_MEASURE: spl
SUCCESS_REWARD: 2.5
SENSORS: ['DEPTH_SENSOR', 'RGB_SENSOR']
SIMULATOR_GPU_ID: 0
TASK_CONFIG:
DATASET:
CONTENT_SCENES: []
DATA_PATH: data/datasets/objectnav/mp3d/v0/{split}/{split}.json.gz
SCENES_DIR: data/scene_datasets/
SPLIT: val
TYPE: ObjectNav-v1
ENVIRONMENT:
ITERATOR_OPTIONS:
CYCLE: True
GROUP_BY_SCENE: True
MAX_SCENE_REPEAT_EPISODES: -1
MAX_SCENE_REPEAT_STEPS: 10000
NUM_EPISODE_SAMPLE: -1
SHUFFLE: True
STEP_REPETITION_RANGE: 0.2
MAX_EPISODE_SECONDS: 10000000
MAX_EPISODE_STEPS: 500
PYROBOT:
BASE_CONTROLLER: proportional
BASE_PLANNER: none
BUMP_SENSOR:
TYPE: PyRobotBumpSensor
DEPTH_SENSOR:
CENTER_CROP: False
HEIGHT: 480
MAX_DEPTH: 5.0
MIN_DEPTH: 0.0
NORMALIZE_DEPTH: True
TYPE: PyRobotDepthSensor
WIDTH: 640
LOCOBOT:
ACTIONS: ['BASE_ACTIONS', 'CAMERA_ACTIONS']
BASE_ACTIONS: ['go_to_relative', 'go_to_absolute']
CAMERA_ACTIONS: ['set_pan', 'set_tilt', 'set_pan_tilt']
RGB_SENSOR:
CENTER_CROP: False
HEIGHT: 480
TYPE: PyRobotRGBSensor
WIDTH: 640
ROBOT: locobot
ROBOTS: ['locobot']
SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR', 'BUMP_SENSOR']
SEED: 100
SIMULATOR:
ACTION_SPACE_CONFIG: v1
AGENTS: ['AGENT_0']
AGENT_0:
ANGULAR_ACCELERATION: 12.56
ANGULAR_FRICTION: 1.0
COEFFICIENT_OF_RESTITUTION: 0.0
HEIGHT: 0.88
IS_SET_START_STATE: False
LINEAR_ACCELERATION: 20.0
LINEAR_FRICTION: 0.5
MASS: 32.0
RADIUS: 0.2
SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
START_POSITION: [0, 0, 0]
START_ROTATION: [0, 0, 0, 1]
DEFAULT_AGENT_ID: 0
DEPTH_SENSOR:
HEIGHT: 480
HFOV: 79
MAX_DEPTH: 5.0
MIN_DEPTH: 0.5
NORMALIZE_DEPTH: True
POSITION: [0, 0.88, 0]
TYPE: HabitatSimDepthSensor
WIDTH: 640
FORWARD_STEP_SIZE: 0.25
HABITAT_SIM_V0:
ALLOW_SLIDING: True
ENABLE_PHYSICS: False
GPU_DEVICE_ID: 0
GPU_GPU: False
PHYSICS_CONFIG_FILE: ./data/default.phys_scene_config.json
RGB_SENSOR:
HEIGHT: 480
HFOV: 79
POSITION: [0, 0.88, 0]
TYPE: HabitatSimRGBSensor
WIDTH: 640
SCENE: data/scene_datasets/habitat-test-scenes/van-gogh-room.glb
SEED: 100
SEMANTIC_SENSOR:
HEIGHT: 480
HFOV: 79
POSITION: [0, 0.88, 0]
TYPE: HabitatSimSemanticSensor
WIDTH: 640
TILT_ANGLE: 30
TURN_ANGLE: 30
TYPE: Sim-v0
TASK:
ACTIONS:
ANSWER:
TYPE: AnswerAction
LOOK_DOWN:
TYPE: LookDownAction
LOOK_UP:
TYPE: LookUpAction
MOVE_FORWARD:
TYPE: MoveForwardAction
STOP:
TYPE: StopAction
TELEPORT:
TYPE: TeleportAction
TURN_LEFT:
TYPE: TurnLeftAction
TURN_RIGHT:
TYPE: TurnRightAction
ANSWER_ACCURACY:
TYPE: AnswerAccuracy
COLLISIONS:
TYPE: Collisions
COMPASS_SENSOR:
TYPE: CompassSensor
CORRECT_ANSWER:
TYPE: CorrectAnswer
DISTANCE_TO_GOAL:
DISTANCE_TO: VIEW_POINTS
TYPE: DistanceToGoal
EPISODE_INFO:
TYPE: EpisodeInfo
GOAL_SENSOR_UUID: objectgoal
GPS_SENSOR:
DIMENSIONALITY: 2
TYPE: GPSSensor
HEADING_SENSOR:
TYPE: HeadingSensor
INSTRUCTION_SENSOR:
TYPE: InstructionSensor
INSTRUCTION_SENSOR_UUID: instruction
MEASUREMENTS: ['DISTANCE_TO_GOAL', 'SPL']
OBJECTGOAL_SENSOR:
GOAL_SPEC: TASK_CATEGORY_ID
GOAL_SPEC_MAX_VAL: 50
TYPE: ObjectGoalSensor
POINTGOAL_SENSOR:
DIMENSIONALITY: 2
GOAL_FORMAT: POLAR
TYPE: PointGoalSensor
POINTGOAL_WITH_GPS_COMPASS_SENSOR:
DIMENSIONALITY: 2
GOAL_FORMAT: POLAR
TYPE: PointGoalWithGPSCompassSensor
POSSIBLE_ACTIONS: ['STOP', 'MOVE_FORWARD', 'TURN_LEFT', 'TURN_RIGHT', 'LOOK_UP', 'LOOK_DOWN']
PROXIMITY_SENSOR:
MAX_DETECTION_RADIUS: 2.0
TYPE: ProximitySensor
QUESTION_SENSOR:
TYPE: QuestionSensor
SENSORS: ['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']
SPL:
DISTANCE_TO: VIEW_POINTS
SUCCESS_DISTANCE: 0.2
TYPE: SPL
SUCCESS_DISTANCE: 0.1
TOP_DOWN_MAP:
DRAW_BORDER: True
DRAW_GOAL_AABBS: True
DRAW_GOAL_POSITIONS: True
DRAW_SHORTEST_PATH: True
DRAW_SOURCE: True
DRAW_VIEW_POINTS: True
FOG_OF_WAR:
DRAW: True
FOV: 90
VISIBILITY_DIST: 5.0
MAP_PADDING: 3
MAP_RESOLUTION: 1250
MAX_EPISODE_STEPS: 1000
NUM_TOPDOWN_MAP_SAMPLE_POINTS: 20000
TYPE: TopDownMap
TYPE: ObjectNav-v1
TENSORBOARD_DIR: tb1
TEST_EPISODE_COUNT: 2184
TORCH_GPU_ID: 1
TRAINER_NAME: ppo
VIDEO_DIR: video_dir
VIDEO_OPTION: ['disk', 'tensorboard']
2020-02-20 13:40:17,978 Initializing dataset ObjectNav-v1
2020-02-20 13:40:38,077 Initializing dataset ObjectNav-v1
2020-02-20 13:40:56,217 initializing sim Sim-v0
2020-02-20 13:41:09,272 Initializing task ObjectNav-v1
2020-02-20 13:41:11,333 agent number of parameters: 71371399
Traceback (most recent call last):
File "habitat_baselines/run.py", line 68, in <module>
main()
File "habitat_baselines/run.py", line 38, in main
run_exp(**vars(args))
File "habitat_baselines/run.py", line 62, in run_exp
trainer.train()
File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 300, in train
episode_counts,
File "/local-scratch/habitat-api1/habitat_baselines/rl/ppo/ppo_trainer.py", line 146, in _collect_rollout_step
outputs = self.envs.step([a[0].item() for a in actions])
File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 339, in step
return self.wait_step()
File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 326, in wait_step
observations.append(read_fn())
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7efc2f8b1668>>
Traceback (most recent call last):
File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 468, in __del__
File "/local-scratch/habitat-api1/habitat/core/vector_env.py", line 347, in close
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 250, in recv
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
Traceback (most recent call last):
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 263, in <module>
main()
File "/local-scratch/anaconda3/envs/habitat/lib/python3.6/site-packages/torch-1.4.0-py3.6-linux-x86_64.egg/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/local-scratch/anaconda3/envs/habitat/bin/python', '-u', 'habitat_baselines/run.py', '--exp-config', 'habitat_baselines/config/objectnav/ddppo_objectnav.yaml', '--run-type', 'train']' returned non-zero exit status 1.
Thanks in advance
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 29 (19 by maintainers)
Commits related to this issue
- Trigger event listener for keys at capturing phase (#308) - This will help when a user has some extension overriding keys — committed to facebookresearch/habitat-lab by apsdehal 5 years ago
@erikwijmans, thank you for bringing it up. The PointNav v1 MP3D dataset was updated with those faulty episodes removed: https://dl.fbaipublicfiles.com/habitat/data/datasets/pointnav/mp3d/v1/pointnav_mp3d_v1.zip. cc @srama2512
We fixed a bug in the navmeshes a while back that made a few MP3D PointNav episodes invalid (they should never have been valid, and I have no idea how they ever were). This is the script I made for finding/removing them: https://gist.github.com/erikwijmans/e4410f0e12facb87890e919aa264e3fe. Fortunately, they are only train episodes.
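Conceptually the script just recomputes each episode's start-to-goal geodesic distance on the fixed navmesh and drops anything that is no longer finite. A minimal sketch of that idea (not the actual gist; the config path and the per-scene loop here are assumptions):

```python
import numpy as np

import habitat
from habitat.sims import make_sim

# Sketch: keep only episodes whose start->goal geodesic distance is finite
# under the (fixed) navmesh. Config path and split are assumptions.
config = habitat.get_config("configs/tasks/pointnav_mp3d.yaml")
config.defrost()
config.DATASET.SPLIT = "train"
config.freeze()

dataset = habitat.make_dataset(config.DATASET.TYPE, config=config.DATASET)

valid_episodes = []
for scene_id in sorted({ep.scene_id for ep in dataset.episodes}):
    # One simulator instance per scene, pointed at that scene's mesh.
    sim_config = config.SIMULATOR.clone()
    sim_config.defrost()
    sim_config.SCENE = scene_id
    sim_config.freeze()
    sim = make_sim(sim_config.TYPE, config=sim_config)

    for ep in dataset.episodes:
        if ep.scene_id != scene_id:
            continue
        dist = sim.geodesic_distance(ep.start_position, ep.goals[0].position)
        if np.isfinite(dist):
            valid_episodes.append(ep)

    sim.close()

dataset.episodes = valid_episodes
# dataset.to_json() can then be gzipped back into {split}.json.gz
```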
@mathfac, did the cleaned-up versions ever get re-uploaded?
I'm getting the DistanceToGoal metric as NaN during PointNav training on MP3D. The episode IDs might differ from the standard dataset's because I made some changes, so I've given the scene ID, agent position, and goal position instead.
I haven't had time to investigate why this is happening. As a temporary workaround, whenever this happens I replace the geodesic distance with the Euclidean distance so that I can continue training my agents.
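The workaround is essentially the following (a sketch of the idea, not my exact code; sim.geodesic_distance is habitat's simulator API, the helper name and everything else is illustrative):

```python
import numpy as np

def distance_to_goal(sim, agent_position, goal_position):
    """Geodesic distance to the goal, falling back to straight-line
    (Euclidean) distance when the navmesh returns NaN/inf."""
    d = sim.geodesic_distance(agent_position, goal_position)
    if not np.isfinite(d):
        # Navmesh could not connect the two points: use Euclidean distance
        # instead so training can continue.
        d = float(np.linalg.norm(np.asarray(goal_position) - np.asarray(agent_position)))
    return d
```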
I used watch free -g to monitor memory. Before I run the code, free memory is 41 GB (screenshot omitted); during the entire runtime of the code, free memory doesn't fall below 36 GB.