deeplearning4j: java.lang.RuntimeException: execIndexReduce failed; Error code: [77] (beta5)
I train an LSTM network with a genetic algorithm (GA). With beta4, I used a UBYTE ndarray with shape = [16, netParamsCount] to represent the individuals of the GA; with beta5, I use UINT16 with shape = [1, netParamsCount]. With beta4, my app ran more and more slowly, but it could run for more than 3000 generations. With beta5, it also slows down progressively, and it throws an exception within about 300 generations. The beta4 problem was reported at https://github.com/eclipse/deeplearning4j/issues/8165. With beta5, the running time of the two runs below is quite different. The difference between the runs is how many times the following methods are called:
{
    // Per-call mutation probability
    float prob = random.nextFloat();
    // 0/1 Bernoulli draw per network parameter, cast to UINT16
    INDArray params = Nd4j.getExecutioner().exec(new BernoulliDistribution(Nd4j.createUninitialized(1, netParamsCount), prob), random).castTo(DataType.UINT16);
    // Move each drawn 1 to bit position genIndex
    int bitMask = 1 << genIndex;
    params.muli(Nd4j.scalar(DataType.UINT16, bitMask));
    // Flip the selected bits in place
    INDArray netParamPart = mutatingIndiv.getAllNetParamParts();
    CqNd4jUtil.bitwiseXor(netParamPart, params, netParamPart);
}
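import java.util.HashMap;
import java.util.Map;

import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.api.ops.DynamicCustomOp;
import org.nd4j.linalg.factory.Nd4j;

public class CqNd4jUtil {
    // One all-bits-set (65535) UINT16 mask per GPU, created lazily.
    private static final Map<Integer, INDArray> mask65535Map = new HashMap<>();

    // The cache is keyed by gpuId only, so the shape passed on the first call for a GPU is the one that is kept.
    private static INDArray getMask65535(int gpuId, long[] shape) {
        INDArray maskA = mask65535Map.get(gpuId);
        if (maskA == null) {
            synchronized (mask65535Map) {
                maskA = mask65535Map.get(gpuId);
                if (maskA == null) {
                    int mask = 65535;
                    maskA = Nd4j.ones(DataType.UINT16, shape);
                    maskA.muli(Nd4j.scalar(DataType.UINT16, mask));
                    mask65535Map.put(gpuId, maskA);
                }
            }
        }
        return maskA;
    }

    // Bitwise NOT of a UINT16 array, computed as x XOR 65535 into a new array.
    public static INDArray bitwiseNot(INDArray x, int gpuId) {
        INDArray maskA = getMask65535(gpuId, x.shape());
        INDArray out = Nd4j.create(DataType.UINT16, x.shape());
        bitwiseXor(x, maskA, out);
        return out;
    }

    // Element-wise XOR of x and y, written to out (out may be the same array as x).
    public static void bitwiseXor(INDArray x, INDArray y, INDArray out) {
        Nd4j.exec(DynamicCustomOp.builder("bitwise_xor")
                .addInputs(x, y)
                .addOutputs(out)
                .build());
    }
}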
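To make the intent of the two snippets above easier to check, here is a minimal plain-Java sketch of the per-element arithmetic they appear to perform on 16-bit values: XOR with a shifted Bernoulli mask for mutation, and XOR with 65535 for bitwise NOT. It is not ND4J code and says nothing about what the CUDA kernels do; the class name `GeneBitOps` and its methods are hypothetical names introduced only for this illustration.

```java
import java.util.Random;

/** Plain-Java illustration of the per-element bit arithmetic; hypothetical helper, not ND4J. */
final class GeneBitOps {
    static final int MASK_16 = 0xFFFF; // same value as the 65535 mask in CqNd4jUtil

    /** XOR of two unsigned 16-bit genes, kept inside the 16-bit range. */
    static int xor16(int x, int y) {
        return (x ^ y) & MASK_16;
    }

    /** Bitwise NOT of an unsigned 16-bit gene, computed as XOR with 0xFFFF. */
    static int not16(int x) {
        return xor16(x, MASK_16);
    }

    /** Flips bit genIndex of each gene independently with probability prob. */
    static int[] mutate(int[] genes, int genIndex, float prob, Random random) {
        int bitMask = 1 << genIndex;
        int[] out = new int[genes.length];
        for (int i = 0; i < genes.length; i++) {
            // Bernoulli draw (0 or 1) multiplied by bitMask, as in the mutation snippet
            int flip = random.nextFloat() < prob ? bitMask : 0;
            out[i] = xor16(genes[i], flip);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(not16(0x00FF));
        int[] mutated = mutate(new int[] {0, 0xFFFF}, 3, 1.0f, new Random(42));
        System.out.println(mutated[0] + " " + mutated[1]);
    }
}
```

Comparing a small UINT16 array processed by `CqNd4jUtil` against this reference on the CPU backend could help separate a logic error from a GPU-side failure.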
The first run result:
No.2 pop, No.0 gen. fitness=,1.509737015851003,time=2019-09-09T12:43:23.713
No.2 pop, No.10 gen. fitness=,5.08371495862429, time=2019-09-09T12:44:49.877
No.2 pop, No.20 gen. fitness=,5.08371495862429, time=2019-09-09T12:46:20.835
No.2 pop, No.30 gen. fitness=,5.08371495862429,time=2019-09-09T12:47:56.433
No.2 pop, No.40 gen. fitness=,5.316829509000931,time=2019-09-09T12:49:37.218
(about 1min 30 sec)
No.2 pop, No.200 gen. fitness=,10.012225160059621,time=2019-09-09T13:28:20.169
No.2 pop, No.210 gen. fitness=,13.028256426830792,time=2019-09-09T13:31:22.344
No.2 pop, No.220 gen. fitness=,13.028256426830792, time=2019-09-09T13:34:29.195
No.2 pop, No.230 gen. fitness=,13.069600351268019, time=2019-09-09T13:37:37.489
No.2 pop, No.240 gen. fitness=,13.069600351268019, time=2019-09-09T13:40:57.743
(about 3min 8sec)
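The slowdown in the first run can be checked directly from the logged timestamps. The sketch below uses only `java.time` and takes the interval between two logged generations; the class name `GenerationTiming` is a hypothetical helper, and the timestamps are copied from the first-run log.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical helper that measures the time between two logged generations. */
final class GenerationTiming {
    /** Milliseconds elapsed between two ISO-8601 local timestamps from the log. */
    static long millisBetween(String from, String to) {
        return Duration.between(LocalDateTime.parse(from), LocalDateTime.parse(to)).toMillis();
    }

    public static void main(String[] args) {
        // Generations 0 -> 10 of the first run
        long early = millisBetween("2019-09-09T12:43:23.713", "2019-09-09T12:44:49.877");
        // Generations 230 -> 240 of the first run
        long late = millisBetween("2019-09-09T13:37:37.489", "2019-09-09T13:40:57.743");
        System.out.println("gen 0-10 ms: " + early);
        System.out.println("gen 230-240 ms: " + late);
        System.out.println("slowdown factor: " + (double) late / early);
    }
}
```

For these two pairs the intervals come to 86.164 s and 200.254 s, so ten generations take more than twice as long by generation 240 as they do at the start.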
The second:
No.4 pop, No.0 gen. fitness=,2.2699775250279206, time=2019-09-09T15:26:01.567
No.4 pop, No.10 gen. fitness=,3.8596067507619107, time=2019-09-09T15:27:25.504
No.4 pop, No.20 gen. fitness=,4.614095333102667, time=2019-09-09T15:28:51.888
No.4 pop, No.30 gen. fitness=,7.941406174141714,time=2019-09-09T15:30:21.768
No.4 pop, No.40 gen. fitness=,10.921808660009578, time=2019-09-09T15:31:54.501
(about 1min 30 sec)
No.4 pop, No.270 gen. fitness=,25.710134689037734, time=2019-09-09T16:22:06.392
No.4 pop, No.280 gen. fitness=,25.77070918213987, time=2019-09-09T16:24:58.065
No.4 pop, No.290 gen. fitness=,25.85695933717356,time=2019-09-09T16:27:47.438
No.4 pop, No.300 gen. fitness=,26.377852100381727, time=2019-09-09T16:30:38.906
(about 2min 50sec)

Exception in thread "pool-1-thread-3"
Exception in thread "pool-1-thread-5"
Exception in thread "pool-1-thread-2"
Exception in thread "pool-1-thread-4"
java.lang.RuntimeException: execIndexReduce failed; Error code: [77]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:659)
    at org.nd4j.linalg.factory.Nd4j.argMax(Nd4j.java:535)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.argMax(BaseNDArray.java:5306)
    at com.cq.deepGaTrader4j.rnn.trade.RnnNeuroTradeSimulator.simulateTrade(RnnNeuroTradeSimulator.java:159)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTradeWithMultiLayerNet(NeuroTradeIndiv.java:84)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTrade(NeuroTradeIndiv.java:78)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.computeFitness(NeuroTradeIndiv.java:52)
    at com.cq.deepGa4j.Population.computeFitness(Population.java:339)
    at com.cq.deepGa4j.Population.evolute(Population.java:237)
    at com.cq.deepGa4j.Population.evolute(Population.java:203)
    at com.cq.deepGa4j.operator.PopulationEvolver.run(PopulationEvolver.java:25)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: execIndexReduce failed; Error code: [77]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:659)
    at org.nd4j.linalg.factory.Nd4j.argMax(Nd4j.java:535)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.argMax(BaseNDArray.java:5306)
    at com.cq.deepGaTrader4j.rnn.trade.RnnNeuroTradeSimulator.simulateTrade(RnnNeuroTradeSimulator.java:159)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTradeWithMultiLayerNet(NeuroTradeIndiv.java:84)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTrade(NeuroTradeIndiv.java:78)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.computeFitness(NeuroTradeIndiv.java:52)
    at com.cq.deepGa4j.Population.computeFitness(Population.java:339)
    at com.cq.deepGa4j.Population.evolute(Population.java:237)
    at com.cq.deepGa4j.Population.evolute(Population.java:203)
    at com.cq.deepGa4j.operator.PopulationEvolver.run(PopulationEvolver.java:25)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "pool-1-thread-1" java.lang.RuntimeException: execIndexReduce failed; Error code: [77]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:659)
    at org.nd4j.linalg.factory.Nd4j.argMax(Nd4j.java:535)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.argMax(BaseNDArray.java:5306)
    at com.cq.deepGaTrader4j.rnn.trade.RnnNeuroTradeSimulator.simulateTrade(RnnNeuroTradeSimulator.java:159)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTradeWithMultiLayerNet(NeuroTradeIndiv.java:84)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTrade(NeuroTradeIndiv.java:78)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.computeFitness(NeuroTradeIndiv.java:52)
    at com.cq.deepGa4j.Population.computeFitness(Population.java:339)
    at com.cq.deepGa4j.Population.evolute(Population.java:237)
    at com.cq.deepGa4j.Population.evolute(Population.java:203)
    at com.cq.deepGa4j.operator.PopulationEvolver.run(PopulationEvolver.java:25)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
java.lang.RuntimeException: cudaEventRecord failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.register(cudaEvent_t.java:85)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.registerAction(SynchronousFlowController.java:278)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.registerAction(CudaZeroHandler.java:1248)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.registerAction(AtomicAllocator.java:1078)
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.naiveExec(CudaExecutioner.java:386)
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:572)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.equalsWithEps(BaseNDArray.java:4491)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.equals(BaseNDArray.java:4547)
    at org.nd4j.linalg.jcublas.JCublasNDArray.equals(JCublasNDArray.java:507)
    at com.cq.deepGa4j.operator.CrossOverOperator.cross(CrossOverOperator.java:118)
    at com.cq.deepGa4j.operator.CrossOverOperator.crossOver(CrossOverOperator.java:65)
    at com.cq.deepGa4j.Population.cross(Population.java:370)
    at com.cq.deepGa4j.Population.evolute(Population.java:218)
    at com.cq.deepGa4j.Population.evolute(Population.java:203)
    at com.cq.deepGa4j.operator.PopulationEvolver.run(PopulationEvolver.java:25)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "pool-1-thread-6" java.lang.RuntimeException: Op [bitwise_xor] execution failed
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2307)
    at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6606)
    at com.cq.deepGa4j.util.CqNd4jUtil.bitwiseXor(CqNd4jUtil.java:80)
    at com.cq.deepGa4j.util.CqNd4jUtil.bitwiseNot(CqNd4jUtil.java:76)
    at com.cq.deepGa4j.operator.CrossOverOperator.cross(CrossOverOperator.java:87)
    at com.cq.deepGa4j.operator.CrossOverOperator.crossOver(CrossOverOperator.java:65)
    at com.cq.deepGa4j.Population.cross(Population.java:370)
    at com.cq.deepGa4j.Population.evolute(Population.java:218)
    at com.cq.deepGa4j.Population.evolute(Population.java:203)
    at com.cq.deepGa4j.operator.PopulationEvolver.run(PopulationEvolver.java:25)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: execPairwiseIntTransform failed; Error code: [77]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2491)
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2297)
    ... 12 more
java.lang.RuntimeException: Op [softmax] execution failed
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2307)
    at org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner.execAndReturn(DefaultOpExecutioner.java:736)
    at org.nd4j.linalg.activations.impl.ActivationSoftmax.getActivation(ActivationSoftmax.java:40)
    at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.activate(RnnOutputLayer.java:137)
    at org.deeplearning4j.nn.layers.BaseOutputLayer.activate(BaseOutputLayer.java:195)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.outputOfLayerDetached(MultiLayerNetwork.java:1309)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.rnnTimeStep(MultiLayerNetwork.java:3137)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.rnnTimeStep(MultiLayerNetwork.java:3118)
    at com.cq.deepGaTrader4j.rnn.trade.RnnNeuroTradeSimulator.simulateTrade(RnnNeuroTradeSimulator.java:158)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTradeWithMultiLayerNet(NeuroTradeIndiv.java:84)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.simulateTrade(NeuroTradeIndiv.java:78)
    at com.cq.deepGaTrader4j.rnn.NeuroTradeIndiv.computeFitness(NeuroTradeIndiv.java:52)
    at com.cq.deepGa4j.Population.computeFitness(Population.java:339)
    at com.cq.deepGa4j.Population.evolute(Population.java:237)
    at com.cq.deepGa4j.Population.evolute(Population.java:203)
    at com.cq.deepGa4j.operator.PopulationEvolver.run(PopulationEvolver.java:25)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: helpers::softmax: cuda stream synchronization failed !; Error code: [77]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2491)
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2297)
    ... 18 more
Exception in thread "DeallocatorServiceThread_0"
Exception in thread "DeallocatorServiceThread_8"
Exception in thread "DeallocatorServiceThread_6"
Exception in thread "DeallocatorServiceThread_1"
java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1094)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:574)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:62)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:118)
Exception in thread "DeallocatorServiceThread_7"
Exception in thread "DeallocatorServiceThread_4"
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:47)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
Exception in thread "DeallocatorServiceThread_10" java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:47)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:47)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:47)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
Exception in thread "DeallocatorServiceThread_2"
Exception in thread "DeallocatorServiceThread_11"
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1094)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:574)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:62)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
Exception in thread "DeallocatorServiceThread_3"
Exception in thread "DeallocatorServiceThread_5"
Exception in thread "DeallocatorServiceThread_9"
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1094)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:574)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:62)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1094)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:574)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:62)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1094)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:574)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:62)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:233)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:47)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.purgeDeviceObject(CudaZeroHandler.java:1094)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.purgeDeviceObject(AtomicAllocator.java:574)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:62)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
java.lang.RuntimeException: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:133)
Caused by: java.lang.RuntimeException: cudaEventSynchronize failed; Error code: 77
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:75)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:131)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:230)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.CudaDeallocator.deallocate(CudaDeallocator.java:47)
    at org.nd4j.linalg.memory.deallocation.DeallocatorService$DeallocatorServiceThread.run(DeallocatorService.java:128)
Version Information
- Deeplearning4j 1.0.0-beta5
- Platform information: Ubuntu 18.04
- CUDA version: 10.0 (nd4j-cuda-10.0-platform)
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (7 by maintainers)
Also, rather than using a map, you might preallocate a single INDArray of shape [1M, 512, 1024] and operate on views of it (e.g., point(x), all(), all()), or use something like gather / pullRows. That way you avoid having millions of objects in memory.
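The idea of one preallocated block plus views can be illustrated without ND4J. The sketch below is a plain-Java analogy using `java.nio.FloatBuffer` slices over a single `float[]`; it is not the ND4J API, and the class `PreallocatedPool` and its methods are hypothetical names. In ND4J the corresponding view would come from the `point(x), all(), all()` indexing mentioned above.

```java
import java.nio.FloatBuffer;

/** Hypothetical plain-Java analogy: one backing allocation, rows exposed as views. */
final class PreallocatedPool {
    private final float[] backing;
    private final int rowLength;

    PreallocatedPool(int rows, int rowLength) {
        // Single allocation for every row (sizes here must fit in an int index)
        this.backing = new float[rows * rowLength];
        this.rowLength = rowLength;
    }

    /** Returns a view of one row; writes through it land in the shared backing array. */
    FloatBuffer rowView(int row) {
        return FloatBuffer.wrap(backing, row * rowLength, rowLength).slice();
    }

    /** Reads one element directly from the backing array. */
    float get(int row, int col) {
        return backing[row * rowLength + col];
    }

    public static void main(String[] args) {
        PreallocatedPool pool = new PreallocatedPool(4, 3);
        FloatBuffer view = pool.rowView(2);
        view.put(1, 7.5f); // modifies row 2, column 1 of the pool, no copy made
        System.out.println(pool.get(2, 1));
    }
}
```

The design point is that each view is a lightweight window onto the same memory, so the number of large allocations stays at one no matter how many rows are handed out.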