deeplearning4j: RL4J: VizDoom Training crash

From @schrum2 on September 14, 2017 14:24

This issue is a follow-up to the discussion in https://github.com/deeplearning4j/rl4j/issues/61, where all version details were worked out (thank you, @saudet).

Thanks to the fixes from the previous issue, the code can now save movies and load saved models, and it trains successfully for a while. However, it eventually crashes with the following exception:

09:15:48.896 [main] ERROR org.deeplearning4j.rl4j.learning.sync.SyncLearning - Training failed.
java.lang.NullPointerException: null
	at org.deeplearning4j.rl4j.learning.sync.Transition.dup(Transition.java:50)
	at org.deeplearning4j.rl4j.learning.sync.Transition.dup(Transition.java:38)
	at org.deeplearning4j.rl4j.learning.sync.ExpReplay.getBatch(ExpReplay.java:45)
	at org.deeplearning4j.rl4j.learning.sync.ExpReplay.getBatch(ExpReplay.java:52)
	at org.deeplearning4j.rl4j.learning.sync.qlearning.discrete.QLearningDiscrete.trainStep(QLearningDiscrete.java:159)
	at org.deeplearning4j.rl4j.learning.sync.qlearning.QLearning.trainEpoch(QLearning.java:91)
	at org.deeplearning4j.rl4j.learning.sync.SyncLearning.train(SyncLearning.java:38)
	at org.deeplearning4j.examples.rl4j.Doom.doomLearn(Doom.java:100)
	at org.deeplearning4j.examples.rl4j.Doom.main(Doom.java:78)
java.lang.NullPointerException
	at org.deeplearning4j.rl4j.learning.sync.Transition.dup(Transition.java:50)
	at org.deeplearning4j.rl4j.learning.sync.Transition.dup(Transition.java:38)
	at org.deeplearning4j.rl4j.learning.sync.ExpReplay.getBatch(ExpReplay.java:45)
	at org.deeplearning4j.rl4j.learning.sync.ExpReplay.getBatch(ExpReplay.java:52)
	at org.deeplearning4j.rl4j.learning.sync.qlearning.discrete.QLearningDiscrete.trainStep(QLearningDiscrete.java:159)
	at org.deeplearning4j.rl4j.learning.sync.qlearning.QLearning.trainEpoch(QLearning.java:91)
	at org.deeplearning4j.rl4j.learning.sync.SyncLearning.train(SyncLearning.java:38)
	at org.deeplearning4j.examples.rl4j.Doom.doomLearn(Doom.java:100)
	at org.deeplearning4j.examples.rl4j.Doom.main(Doom.java:78)
High, level 3.1
[libx264 @ 00000000435bb9e0] 264 - core 148 - H.264/MPEG-4 AVC codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=30.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'C:\Users\he_de\rl4j-data\1/video/video-1528-130755.mp4':
  Metadata:
    encoder         : Lavf57.56.100
    Stream #0:0: Video: h264 (Constrained Baseline) ([33][0][0][0] / 0x0021), yuv420p, 800x600, q=2-31, 400 kb/s, 15360 tbn
[libx264 @ 00000000435bb9e0] frame I:1     Avg QP:26.61  size: 29808
[libx264 @ 00000000435bb9e0] frame P:36    Avg QP:32.52  size:  4993
[libx264 @ 00000000435bb9e0] frame B:78    Avg QP:34.95  size:  1270
[libx264 @ 00000000435bb9e0] consecutive B-frames:  5.2% 10.4%  7.8% 76.5%
[libx264 @ 00000000435bb9e0] mb I  I16..4:  1.5% 76.5% 21.9%
[libx264 @ 00000000435bb9e0] mb P  I16..4:  1.1%  7.0%  1.4%  P16..4: 45.7%  5.0%  2.5%  0.0%  0.0%    skip:37.3%
[libx264 @ 00000000435bb9e0] mb B  I16..4:  0.2%  0.2%  0.0%  B16..8: 35.8%  1.3%  0.1%  direct: 0.4%  skip:62.0%  L0:50.7% L1:49.1% BI: 0.2%
[libx264 @ 00000000435bb9e0] 8x8 transform intra:73.2% inter:87.6%
[libx264 @ 00000000435bb9e0] coded y,uvDC,uvAC intra: 65.3% 56.1% 30.3% inter: 6.3% 3.6% 0.6%
[libx264 @ 00000000435bb9e0] i16 v,h,dc,p: 46% 30% 13% 12%
[libx264 @ 00000000435bb9e0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 15% 24%  6%  8%  7%  9%  6%  7%
[libx264 @ 00000000435bb9e0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 32% 25% 18%  5%  5%  4%  5%  3%  4%
[libx264 @ 00000000435bb9e0] i8c dc,h,v,p: 61% 15% 21%  3%
[libx264 @ 00000000435bb9e0] Weighted P-Frames: Y:16.7% UV:13.9%
[libx264 @ 00000000435bb9e0] ref P L0: 59.6% 16.1% 14.3%  6.1%  3.9%
[libx264 @ 00000000435bb9e0] ref B L0: 90.8%  7.6%  1.6%
[libx264 @ 00000000435bb9e0] ref B L1: 96.2%  3.8%
[libx264 @ 00000000435bb9e0] kb/s:643.97
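
The trace above shows only that the NullPointerException is thrown inside Transition.dup while ExpReplay.getBatch is sampling a batch; it does not say which reference was null. As a minimal sketch of one possible defensive pattern, assuming (unconfirmed) that a stored transition had a missing frame, the Java below uses hypothetical stand-ins (ReplayNullGuard, Step, sampleBatch), not RL4J's actual classes. It filters out incomplete entries before copying them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names come from RL4J.
final class ReplayNullGuard {

    /** Stand-in for a replay transition holding a stack of frames. */
    static final class Step {
        final double[][] observation; // assumption: a frame slot can be null
        final int action;
        final double reward;

        Step(double[][] observation, int action, double reward) {
            this.observation = observation;
            this.action = action;
            this.reward = reward;
        }

        /** Naive deep copy: throws NullPointerException if any frame is null. */
        Step dup() {
            double[][] copy = new double[observation.length][];
            for (int i = 0; i < observation.length; i++) {
                copy[i] = observation[i].clone();
            }
            return new Step(copy, action, reward);
        }

        /** True only if the frame array and every frame in it are non-null. */
        boolean isComplete() {
            if (observation == null) {
                return false;
            }
            for (double[] frame : observation) {
                if (frame == null) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Returns `size` copies sampled with replacement from the complete
     * steps in `storage`, or an empty list if no complete step exists.
     */
    static List<Step> sampleBatch(List<Step> storage, int size, Random rnd) {
        List<Step> complete = new ArrayList<>();
        for (Step s : storage) {
            if (s != null && s.isComplete()) {
                complete.add(s);
            }
        }
        List<Step> batch = new ArrayList<>();
        if (complete.isEmpty()) {
            return batch;
        }
        for (int i = 0; i < size; i++) {
            batch.add(complete.get(rnd.nextInt(complete.size())).dup());
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Step> storage = new ArrayList<>();
        storage.add(new Step(new double[][] {{1.0}, {2.0}}, 0, 1.0));
        storage.add(new Step(new double[][] {{1.0}, null}, 1, 0.0)); // incomplete
        List<Step> batch = sampleBatch(storage, 4, new Random(0));
        System.out.println("Sampled " + batch.size() + " complete steps");
    }
}
```

Skipping bad entries would only hide the symptom; if this assumption turned out to be right, the real fix would be wherever the incomplete transition gets stored in the first place.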

Copied from original issue: deeplearning4j/rl4j#64

About this issue

State: closed
Created 6 years ago
Comments: 18 (10 by maintainers)

Most upvoted comments

Loading the Doom model and training further seems to work. I suppose it will crash again, but I should eventually be able to get good behavior. I’ll follow up once I know how this turns out.

Still need to try ALE though.

schrum2 on May 31, 2018

I changed the pom again (https://github.com/schrum2/MM-NEAT/blob/dev/pom.xml) and everything runs now! Of course, the original problem was that the code would run for a long time (overnight) and then eventually crash. I’m running VizDoom right now and will report back in this issue tomorrow.

However, making all of the DL4J upgrades did cause some additional unrelated problems with ImageNet, which I’ve created a new issue for here: #5402

schrum2 on May 31, 2018