apex: Error in FusedLayerNorm
After installing apex with the CUDA extensions and running pytorch-pretrained-BERT, I get the following error in FusedLayerNormAffineFunction, apex/normalization/fused_layer_norm.py (line 21).
RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)
Here are the shapes of my tensors:
input_ - [32, 101, 768]
bias_ - [768]
weight_ - [768]
self.normalized_shape - [768]
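As a quick sanity check on the shapes above: the element count in the error message equals the size of input_ (32 x 101 x 768 = 2482176). That suggests the whole input tensor, rather than one of the 768-element parameters, is what ends up being converted to a scalar. A minimal sketch in plain Python, where the helper name numel and the shapes dictionary are only illustrative:

```python
from functools import reduce
from operator import mul


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


# Shapes as listed in the report above.
shapes = {
    "input_": (32, 101, 768),
    "bias_": (768,),
    "weight_": (768,),
}

reported = 2482176  # element count quoted in the RuntimeError

# Find which tensor has exactly the reported number of elements.
matches = [name for name, shape in shapes.items() if numel(shape) == reported]
print(matches)  # ['input_']
```

This only identifies which argument has the matching size; it does not by itself say why the extension treats it as a scalar.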
I’m not sure if it’s a problem with pytorch-pretrained-BERT calling it incorrectly or a bug in apex. Any idea? I’ve also created an issue here.
I’m running Ubuntu with CUDA 9, PyTorch 0.4.1.
Full stacktrace below.
File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 710, in forward
embedding_output = self.embeddings(input_ids, token_type_ids)
File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/pytorch_pretrained_bert/modeling.py", line 261, in forward
embeddings = self.LayerNorm(embeddings)
File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 149, in forward
input, self.weight, self.bias)
File "/home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/normalization/fused_layer_norm.py", line 21, in forward
input_, self.normalized_shape, weight_, bias_, self.eps)
RuntimeError: a Tensor with 2482176 elements cannot be converted to Scalar (item at /pytorch/aten/src/ATen/native/Scalar.cpp:9)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f1aa5da3021 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f1aa5da28ea in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: at::native::item(at::Tensor const&) + 0x12c3 (0x7f1aa690d5b3 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: at::TypeDefault::item(at::Tensor const&) const + 0x55 (0x7f1aa6b1c905 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #4: torch::autograd::VariableType::eye_out(at::Tensor&, long, long) const + 0x184 (0x7f1aa4faeec4 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x89ca (0x7f1a82e739ca in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #6: layer_norm_affine(at::Tensor, c10::ArrayRef<long>, at::Tensor, at::Tensor, double) + 0x185 (0x7f1a82e762a5 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x18d44 (0x7f1a82e83d44 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x16495 (0x7f1a82e81495 in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/fused_layer_norm_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #9: _PyCFunction_FastCallDict + 0x154 (0x55a8f9925744 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #10: <unknown function> + 0x198610 (0x55a8f99ac610 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #12: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #13: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #14: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #15: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #16: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #17: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f1ae02e21ec in /home/hyper/Documents/anaconda3/envs/allennlp/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #18: PyCFunction_Call + 0x5f (0x55a8f992863f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #19: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #20: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #21: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #22: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #24: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #25: _PyFunction_FastCallDict + 0x11b (0x55a8f99a6bab in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #26: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #27: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #28: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #30: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #31: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #32: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #33: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #34: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #35: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #36: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #37: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #38: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #40: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #41: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #42: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #43: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #44: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #45: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #46: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #47: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #48: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #49: _PyFunction_FastCallDict + 0x1bc (0x55a8f99a6c4c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #50: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #51: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #52: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #53: <unknown function> + 0x16ba91 (0x55a8f997fa91 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #54: _PyObject_FastCallDict + 0x8b (0x55a8f992592b in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #55: <unknown function> + 0x19857e (0x55a8f99ac57e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #56: _PyEval_EvalFrameDefault + 0x30a (0x55a8f99d138a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #57: <unknown function> + 0x71e1 (0x7f1af51ee1e1 in /home/hyper/.PyCharm2018.1/system/cythonExtensions/_pydevd_frame_eval_ext/pydevd_frame_evaluator.cpython-36m-x86_64-linux-gnu.so)
frame #58: <unknown function> + 0x1918e4 (0x55a8f99a58e4 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #59: _PyFunction_FastCallDict + 0x3da (0x55a8f99a6e6a in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #60: _PyObject_FastCallDict + 0x26f (0x55a8f9925b0f in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #61: _PyObject_Call_Prepend + 0x63 (0x55a8f992a6a3 in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #62: PyObject_Call + 0x3e (0x55a8f992554e in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
frame #63: _PyEval_EvalFrameDefault + 0x19ec (0x55a8f99d2a6c in /home/hyper/Documents/anaconda3/envs/allennlp/bin/python)
About this issue
- State: closed
- Created 5 years ago
- Comments: 32 (5 by maintainers)
@Hyperparticle @thomwolf @geniki While I wait for the results of Thor’s runs, one thing that occurs to me is that your segfault may be because, when you upgraded PyTorch, the existing (installed) Apex binaries were no longer compatible somehow. Try a full pip uninstall apex, then cd apex_repo_dir; rm -rf build; pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . and see if the segfault persists.
I solved the problem: it was the GCC version. It should be 4.9+, but Ubuntu 14.04 has 4.8.
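For anyone who wants to check the compiler before rebuilding, here is a rough sketch that compares the local GCC version against the 4.9 minimum mentioned above. It assumes gcc is on the PATH and that `gcc -dumpversion` prints a dotted version number; the function names are made up for illustration and are not part of apex.

```python
import re
import subprocess

MIN_GCC = (4, 9)  # minimum suggested in the comment above


def parse_version(text):
    """Turn a dotted version string such as '4.8.4' into a tuple (4, 8, 4)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def gcc_is_new_enough(version_text, minimum=MIN_GCC):
    """True if the given version string is at least `minimum`."""
    return parse_version(version_text) >= minimum


def installed_gcc_version():
    """Return the output of `gcc -dumpversion`, or None if gcc cannot be run."""
    try:
        result = subprocess.run(
            ["gcc", "-dumpversion"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


version = installed_gcc_version()
if version is None:
    print("gcc not found on PATH")
else:
    verdict = "ok" if gcc_is_new_enough(version) else "older than 4.9"
    print("gcc", version, verdict)
```

Note that this checks the default gcc only; the build may pick up a different compiler if CC or CXX is set in the environment.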
@mrdbourke I think you may have compiled apex without cuda support. You need to compile it with python setup.py install --cpp_ext --cuda_ext.
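One way to see whether the CUDA extension was built at all is to check that its compiled module can be found. The sketch below assumes the module is named fused_layer_norm_cuda, which is the name that appears in the stack trace above; the helper extension_available is just an illustrative name.

```python
import importlib.util


def extension_available(module_name):
    """True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Module name taken from the .so file shown in the stack trace.
print(
    "fused_layer_norm_cuda found:",
    extension_available("fused_layer_norm_cuda"),
)
```

This only confirms that the file is installed. It does not load the extension, so it will not catch a binary that was built against a different PyTorch version.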
Me too - PyTorch 1.0.1, CUDA 10. It’s not specific to pytorch-pretrained-BERT; the script below is enough for me:
Whew, this is a useful gotcha to know about. Good old emergency repair procedure number one: turn it off and on again. Glad people seem to be happy, especially since, as I said, I don’t have the bandwidth to do a deep-dive debug right this second.
Note to self: make the setup.py smarter to avoid such cases in the future.
I did this too, but I still get the segmentation fault.
Upgraded to CUDA 10.0 and PyTorch 1.0.1, now I get a segmentation fault with Apex enabled.