reverb: segmentation fault
Hello,
I am using reverb 0.4.0 in tf-agents 0.9.0 through the ReverbReplayBuffer and ReverbAddTrajectoryObserver. Ray actors push experience to the reverb server. Currently, though, it is configured with just 1 actor, and experience pushing completes in a blocking manner before agent training in the main loop.
I am seeing segmentation faults happening at random times, always within the main process that samples from reverb to train the agent.
I was wondering if you have some hints about why this might be happening, or where I can start debugging?
*** SIGSEGV received at time=1629367150 on cpu 4 ***
PC: @ 0x7f328092219f (unknown) deepmind::reverb::(anonymous namespace)::LocalSamplerWorker::FetchSamples()
@ 0x7f32d0810980 2320 (unknown)
@ 0x7f328091a14f 144 deepmind::reverb::Sampler::RunWorker()
@ 0x7f32cd7cd039 (unknown) execute_native_thread_routine
@ 0x7f2be000ec70 (unknown) (unknown)
@ 0x7f3280965ca0 (unknown) (unknown)
@ 0x75058b4808ec8348 (unknown) (unknown)
Segmentation fault (core dumped)
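One possible starting point for debugging, offered as a sketch rather than a confirmed fix: Python's standard-library `faulthandler` module can dump the Python-level traceback of every thread when SIGSEGV is received. The crash above is in a native Reverb sampler thread, so the crashing frame itself will not carry Python frames, but the dump should still show what the main training loop was doing at that moment. The helper name `enable_crash_tracebacks` and the log path are hypothetical and not part of Reverb or tf-agents.

```python
import faulthandler


def enable_crash_tracebacks(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal.

    Hypothetical helper for illustration. Returns the open file object;
    the caller must keep it open for as long as faulthandler is enabled,
    because faulthandler writes to the underlying file descriptor.
    """
    log_file = open(log_path, "w")
    # all_threads=True also reports threads other than the crashing one,
    # e.g. the main thread that is sampling from the replay buffer.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Usage (assumption: called once at the top of the training script,
# before the replay buffer and sampling iterator are created):
#   crash_log = enable_crash_tracebacks("reverb_crash.log")
```

Setting the environment variable `PYTHONFAULTHANDLER=1` before launching the process has a similar effect, with output going to stderr instead of a file.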
About this issue
- State: open
- Created 3 years ago
- Comments: 23
That note doesn’t apply to Reverb, since the iterator for Reverb is aware of updates to the service. Basically it should just work 😃
On Tue, Sep 7, 2021 at 11:05 AM Samarth Brahmbhatt wrote:
I’ll leave this open until we can figure out what’s going on or move everyone over to TrajectoryWriter. Thanks for the report and the repro, and for the additional details about parallel write/read (that should work just fine).