parser: CUDA error: device-side assert triggered

I run into this error when training a model with encoder=bert on the Italian UD training set:

python -u -m supar.cmds.biaffine_dep train -p=exp/it_isdt.dbmdz-electra-xxl/model
-c=config.ini --bert=dbmdz/electra-base-italian-xxl-cased-discriminator
–train=…/train-dev/UD_Italian-ISDT/it_isdt-ud-train.conllu
–dev=…/train-dev/UD_Italian-ISDT/it_isdt-ud-dev.conllu --feat=bert --encoder=bert

/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [3,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [6,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [8,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [9,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [10,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [11,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [12,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [13,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [14,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [16,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [18,0,0] Assertion t >= 0 && t < n_classes failed. /pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [19,0,0] Assertion t >= 0 && t < n_classes failed. Traceback (most recent call last): File “/usr/lib/python3.8/runpy.py”, line 194, in _run_module_as_main return _run_code(code, main_globals, None, File “/usr/lib/python3.8/runpy.py”, line 87, in _run_code exec(code, run_globals) File “/project/piqasso/Experiment/IWPT21/parser/supar/cmds/biaffine_dep.py”, line 46, in <module> main() File “/project/piqasso/Experiment/IWPT21/parser/supar/cmds/biaffine_dep.py”, line 42, in main parse(parser) File “/project/piqasso/Experiment/IWPT21/parser/supar/cmds/cmd.py”, line 29, in parse parser.train(**args) File “/project/piqasso/Experiment/IWPT21/parser/supar/parsers/dep.py”, line 62, in train return super().train(**Config().update(locals())) File “/project/piqasso/Experiment/IWPT21/parser/supar/parsers/parser.py”, line 73, in train loss, test_metric = self._evaluate(test.loader) File “/usr/lib/python3.8/dist-packages/torch/autograd/grad_mode.py”, line 27, in decorate_context return func(*args, **kwargs) File “/project/piqasso/Experiment/IWPT21/parser/supar/parsers/dep.py”, line 197, in _evaluate arc_preds, rel_preds = self.model.decode(s_arc, s_rel, mask, self.args.tree, self.args.proj) File “/project/piqasso/Experiment/IWPT21/parser/supar/models/dep.py”, line 220, in decode bad = [not CoNLL.istree(seq[1:i+1], proj) for i, seq in zip(lens.tolist(), arc_preds.tolist())] RuntimeError: CUDA error: device-side assert triggered

About this issue

Original URL
State: closed
Created 3 years ago
Comments: 16 (7 by maintainers)

Most upvoted comments

@MinionAttack You don’t need any embeddings since --encoder bert. The model consists of only transformer layers, MLPs and Biaffines.

yzhangcs on May 5, 2021