carla: Carla segfaults within 127 Episodes
Environment
This has been tested on Ubuntu with Carla 0.9.9.4, started with `CarlaUE4.sh -opengl -quality-level=Low`.
Trigger Code
Consider the following proof-of-concept code. It does not make much sense logically, because it was stripped down from an old version of our reinforcement learning infrastructure into a minimal reproducer that reliably makes Carla misbehave and crash:
```python
import carla

# Most recent sensor event. The original callback was declared as
# update_data(self, data) but registered as a plain function. Sensor callbacks
# receive a single argument, so here it takes only `data`.
latest_data = None


def update_data(data):
    global latest_data
    latest_data = data


def spawn_and_destroy():
    # A new client is created on every call, as in the original RL setup.
    client = carla.Client("localhost", 2000)
    client.set_timeout(3)
    world = client.load_world("Town04")
    blueprints = world.get_blueprint_library()

    # Synchronous mode with a fixed time step and rendering disabled.
    settings = world.get_settings()
    settings.fixed_delta_seconds = 0.05
    settings.synchronous_mode = True
    settings.no_rendering_mode = True
    world.apply_settings(settings)

    # Spawn a vehicle with a collision sensor and a lane-invasion sensor attached.
    car = blueprints.find("vehicle.tesla.model3")
    position = carla.Transform(carla.Location(x=300, y=13.5, z=2), carla.Rotation(yaw=180))
    vehicle = world.spawn_actor(car, position)
    collision_actor = world.spawn_actor(blueprints.find("sensor.other.collision"),
                                        carla.Transform(carla.Location()),
                                        attach_to=vehicle)
    collision_actor.listen(update_data)
    lane_actor = world.spawn_actor(blueprints.find("sensor.other.lane_invasion"),
                                   carla.Transform(),
                                   attach_to=vehicle)
    vehicle.apply_control(carla.VehicleControl(hand_brake=True))
    lane_actor.listen(update_data)

    # Advance the simulation a few steps; tick(1) waits at most 1 s per tick.
    for _ in range(10):
        try:
            world.tick(1)
        except RuntimeError:  # CARLA reports tick timeouts as RuntimeError
            print("WARNING: tick timed out, continuing ...")

    vehicle.destroy()
    collision_actor.destroy()
    lane_actor.destroy()


for i in range(150):
    print("Episode", i)
    spawn_and_destroy()
```
Starting Carla and then running this script reliably produces failures within 127 episodes on my machine, although the exact behavior varies between runs, as described in the following sections.
Crash variant 1
After 127 episodes the script exits:
```
Episode 127
Traceback (most recent call last):
  File "carla-segfault-min.py", line 87, in <module>
    spawn_and_destroy()
  File "carla-segfault-min.py", line 51, in spawn_and_destroy
    client = carla.Client("localhost", 2000)
RuntimeError: resolve: Device or resource busy
```
and Carla crashes with the following message:
```
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=140864 LargeMemoryPoolOffset=272000
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)
```
Crash variant 2
After episode 127 the script stops:
```
Episode 127
Traceback (most recent call last):
  File "carla-segfault-min.py", line 55, in <module>
    spawn_and_destroy()
  File "carla-segfault-min.py", line 19, in spawn_and_destroy
    client = carla.Client("localhost", 2000)
RuntimeError: resolve: Device or resource busy
```
and Carla crashes with the following message:
```
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
LowLevelFatalError [File:Unknown] [Line: 102]
Exception thrown: close: Bad file descriptor
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=140864 LargeMemoryPoolOffset=272000
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)
```
Crash variant 3
I ran the script several times, and in one run Carla crashed as early as episode 27:
```
Episode 27
Traceback (most recent call last):
  File "carla-segfault-min.py", line 87, in <module>
    spawn_and_destroy()
  File "carla-segfault-min.py", line 53, in spawn_and_destroy
    world = client.load_world("Town04")
RuntimeError: failed to connect to newly created map
```
with the following crash message:
```
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
terminating with uncaught exception of type std::__1::bad_weak_ptr: bad_weak_ptrCommonUnixCrashHandler: Signal=11
Signal 6 caught.
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=140864 LargeMemoryPoolOffset=272000
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)
```
Crash variant 4
In another run, the script failed in episode 38:
```
Episode 38
Traceback (most recent call last):
  File "carla-segfault-min.py", line 55, in <module>
    spawn_and_destroy()
  File "carla-segfault-min.py", line 21, in spawn_and_destroy
    world = client.load_world("Town04")
RuntimeError: failed to connect to newly created map
```
and Carla crashes with the following message:
```
4.24.3-0+++UE4+Release-4.24 518 0
Disabling core dumps.
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=65535 LargeMemoryPoolOffset=131119
Malloc Size=140864 LargeMemoryPoolOffset=272000
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)
```
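In variants 1 and 2 the script fails inside the `carla.Client(...)` constructor, which the reproducer calls once per episode. For comparison, here is a minimal sketch of an episode loop that creates the client once and reuses it. This is an untested assumption, not a confirmed workaround: nothing in this report shows that reusing the client avoids the crash. `run_episodes`, `client_factory`, and `run_episode` are hypothetical names; in practice `client_factory` might construct `carla.Client("localhost", 2000)` and call `set_timeout` on it, and `run_episode` would hold the body of `spawn_and_destroy()` without the client construction.

```python
from typing import Callable, TypeVar

ClientT = TypeVar("ClientT")


def run_episodes(client_factory: Callable[[], ClientT],
                 run_episode: Callable[[ClientT, int], None],
                 n_episodes: int) -> ClientT:
    """Create one client up front and hand the same object to every episode.

    Hypothetical helper: nothing here talks to CARLA directly, so the
    factory decides how the client is built and configured.
    """
    client = client_factory()      # constructed exactly once
    for i in range(n_episodes):
        print("Episode", i)
        run_episode(client, i)     # every episode reuses the same client
    return client
```

Injecting the factory keeps the loop independent of the `carla` module, so it can be exercised with a stand-in object before pointing it at a real server.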
About this issue
- State: open
- Created 4 years ago
- Comments: 24 (5 by maintainers)
Hi, I’m looking into it. I have detected two problems so far.
At some point the operating system runs out of resources to allocate, and it returns an error.
I will create a PR with the fixes for point 1 as soon as I can, but I'm still investigating point 2.
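The comment above does not say which operating-system resource runs out. Assuming it is file descriptors (on Linux, sockets count as file descriptors, and variant 2 reports `close: Bad file descriptor`), a minimal Linux-only sketch for watching the count is to list `/proc/<pid>/fd` once per episode. `count_open_fds` and `log_fd_growth` are hypothetical helpers; pointing them at the CarlaUE4 server's PID (which may require matching user permissions) would show whether descriptors accumulate across episodes.

```python
import os
from typing import Optional


def count_open_fds(pid: Optional[int] = None) -> int:
    """Return the number of open file descriptors of a process (Linux only).

    Reads /proc/<pid>/fd, or /proc/self/fd when pid is None. Sockets and
    pipes are listed there too. The directory handle opened by listdir is
    itself counted, but it is counted consistently on every call.
    """
    target = "self" if pid is None else str(pid)
    return len(os.listdir(f"/proc/{target}/fd"))


def log_fd_growth(pid: Optional[int], episode: int, baseline: int) -> int:
    """Print the current descriptor count and its change since `baseline`."""
    current = count_open_fds(pid)
    print(f"Episode {episode}: {current} open fds ({current - baseline:+d})")
    return current
```

A count that grows by a fixed amount every episode and never drops would support the resource-exhaustion reading; a flat count would point elsewhere.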
I may have a client-side workaround that avoids crashing the server. I'm using Carla 0.9.13.
I started using Carla by following the client implementation in https://github.com/cjy1992/gym-carla/blob/master/gym_carla/envs/carla_env.py.
As my code evolved, I made some changes that may have made the server crash more often (I can't recall which change, though). I suspected that switching the server between asynchronous and synchronous mode was causing the crashes, because the client-side timeout exception was usually raised when applying the settings that switch between the two modes. I switched modes inside my reset function, so the mode changed back and forth every episode. I had been doing this since the beginning, though, and until then it had only rarely crashed the server.
So I tried to solve the problem by running Carla only in synchronous mode and ended up with code like this.
In summary, I called `world.tick` a lot, because I thought this was the right way to do it (previously, spawning happened while the server was in async mode). Unfortunately, it crashed even faster than before: instead of a crash every few hundred episodes, I now got a crash every few tens of episodes. This happened consistently, and I noticed (from the timeout exception) that it happened when calling one of the methods that tick the world.
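The idea of committing to synchronous mode once, instead of toggling it in every reset, can be sketched as below. It uses the same `World.get_settings()` / `World.apply_settings()` calls and `synchronous_mode` / `fixed_delta_seconds` fields as the reproducer at the top; `configure_sync_once` is a hypothetical helper name and `world` is assumed to be a `carla.World`. This is an illustration of the approach, not the commenter's code.

```python
def configure_sync_once(world, fixed_delta_seconds=0.05):
    """Put the world into synchronous mode, skipping the call if already set.

    Meant to be called once after connecting / loading the map, rather than
    switching async <-> sync inside every reset(). Returns True if settings
    were applied, False if they already matched.
    """
    settings = world.get_settings()
    if (settings.synchronous_mode
            and settings.fixed_delta_seconds == fixed_delta_seconds):
        return False  # already configured; avoid a redundant apply_settings()
    settings.synchronous_mode = True
    settings.fixed_delta_seconds = fixed_delta_seconds
    world.apply_settings(settings)
    return True
```

Checking the current settings first makes repeated calls harmless, so the helper can also be called at the start of each episode without reintroducing a mode switch.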
I then tried to make the Carla server crash using code from the Carla team, by experimenting with the example file `manual_control.py`. I spawned new vehicles repeatedly, but no matter how many I spawned, the Carla server just wouldn't crash.

I noticed that `manual_control.py` calls `world.tick` only once per iteration of its main loop, regardless of how many actors it spawns or what else it does. So I removed almost all of my `world.tick` calls and ended up with code like this. I have now been running the simulation for hours, and it is still running without crashing.
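The "one tick per loop iteration" structure described above can be sketched roughly like this. `run_episode` and `step_fn` are hypothetical names: `step_fn(step)` stands for whatever the agent does per step (apply controls, read sensor data, spawn actors), must not tick the world itself, and returns `True` when the episode should end. `world` is assumed to be a `carla.World` already in synchronous mode. This shows the loop shape only, not the commenter's actual code.

```python
def run_episode(world, step_fn, max_steps):
    """Run one episode with exactly one world.tick() per loop iteration.

    All per-step work happens in step_fn; the loop issues the only tick.
    Returns the number of ticks issued.
    """
    ticks = 0
    for step in range(max_steps):
        done = step_fn(step)  # controls, sensors, spawning: no ticking here
        world.tick()          # the single tick for this iteration
        ticks += 1
        if done:
            break
    return ticks
```

Keeping the tick in one place also makes it easy to wrap that single call in the timeout handling used by the reproducer.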
In summary, this is what I think helps avoid crashing the server.
I hope my experience helps anyone who is having this problem.
Hi, we are releasing a new version next week, and it will include the fix for the 127-episode problem. Here is the pending PR:
https://github.com/carla-simulator/carla/pull/5611