tensorforce: Quickstart example gets stuck [GPU]
Hi,
I just installed tensorforce (from pip) with tensorflow-gpu 1.7 and tried to run example/quickstart.py.
The training starts but then gets stuck after n episodes, where n is the minimum of the batch_size and frequency values in the update_mode argument of PPOAgent.
update_mode=dict(
    unit='episodes',
    # 20 episodes per update
    batch_size=20,
    # Update every 20 episodes
    frequency=20
),
No error message is displayed, it just hangs forever. Has anyone experienced something similar?
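To make the reported hang point concrete, here is a minimal sketch that computes the episode count after which the training was observed to stall. The helper name `first_stall_episode` is hypothetical, and the formula only restates the observation above (the minimum of batch_size and frequency); it is not taken from tensorforce's internals.

```python
def first_stall_episode(batch_size: int, frequency: int) -> int:
    """Episode count after which the hang was observed in this report.

    Assumption: this mirrors the reporter's observation (min of the two
    values), not tensorforce's actual update scheduling code.
    """
    return min(batch_size, frequency)


# Same values as in the update_mode snippet above.
update_mode = dict(unit='episodes', batch_size=20, frequency=20)

print(first_stall_episode(update_mode['batch_size'], update_mode['frequency']))
```

With the values above, this suggests the hang coincides with the first update, i.e. the first time the agent actually runs an optimization step.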
Thanks,
About this issue
- State: closed
- Created 6 years ago
- Comments: 28 (13 by maintainers)
Oh, I probably have the same issue, then. The timeout didn’t help; it is still stuck. If I press Ctrl+C, it is simply ignored. I first hit the problem with another training run, but the behavior is the same in the quickstart.
I tried attaching gdb to it, but the output is kind of hard to interpret. It looks like some kind of deadlock, possibly not a bug in tensorforce.
(omitted the middle of the stack)
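As a possible complement to gdb, Python's standard `faulthandler` module can periodically dump the Python-level stack of every thread, and it does so from a watchdog thread, so it may still produce output when the main thread is blocked and Ctrl+C is ignored. The sketch below is only an assumption about how one might instrument the script; the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical. It shows Python frames only, not the native TensorFlow frames that gdb exposes.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s: float = 60.0, file=None) -> None:
    """Every timeout_s seconds, dump the Python traceback of all threads.

    The dump is written by faulthandler's watchdog thread, so it does not
    depend on the main thread being responsive. `file` must stay open
    until the watchdog is disarmed; it defaults to stderr.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def disarm_hang_watchdog() -> None:
    """Stop the periodic dumps (safe to call even if none are scheduled)."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog()` once before training starts would print where each Python thread is waiting, which may help narrow down which call never returns.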
I found this recent TensorFlow issue, which may be related: https://github.com/tensorflow/tensorflow/issues/18737. It looks like it should only happen in distributed mode though, which isn’t the case here.
Using tensorflow 1.8.
Same issue on tensorflow-gpu 1.7. By downgrading to 1.5 as @gian1312 said, it works again.
I had the same issue. After downgrading to tensorflow-gpu 1.5, it started working again.
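The version reports in this thread can be summarized in a small lookup. This is only a sketch of what commenters here observed (1.5 works; 1.7 and 1.8 hang), not an official compatibility table; the names `THREAD_REPORTS` and `thread_report` are hypothetical, and versions nobody mentioned are reported as unknown.

```python
# Outcomes reported by commenters in this thread, keyed by (major, minor).
THREAD_REPORTS = {
    (1, 5): "works",
    (1, 7): "hangs",
    (1, 8): "hangs",
}


def thread_report(version: str) -> str:
    """Return what this thread reported for a version string such as '1.7.0'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return THREAD_REPORTS.get((major, minor), "unknown")


print(thread_report("1.7.0"))
print(thread_report("1.5.0"))
```

In a real environment, the installed version string could be passed in, for example `thread_report(tf.__version__)` after `import tensorflow as tf`.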