apex: SyncBatchNorm doesn't support 2-dimensional input?
Hi, I'm facing an issue where the program crashes when the input to SyncBatchNorm is two-dimensional. Here's the code:
import torch
import apex
model = apex.parallel.SyncBatchNorm(4).cuda()
data = torch.rand((8,4)).cuda()
output = model(data)
When running the code, an error is raised like this:
Traceback (most recent call last):
File "syncbn_test.by", line 7, in <module>
output = model(data)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/apex/parallel/optimized_sync_batchnorm.py", line 81, in forward
return SyncBatchnormFunction.apply(input, self.weight, self.bias, self.running_mean, self.running_var, self.eps, self.training or not self.track_running_stats, exponential_average_factor, self.process_group, self.channel_last)
File "/usr/local/lib/python3.5/dist-packages/apex/parallel/optimized_sync_batchnorm_kernel.py", line 27, in forward
mean, var_biased = syncbn.welford_mean_var(input)
RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2) (maybe_wrap_dim at /pytorch/aten/src/ATen/core/WrapDimMinimal.h:18)
And everything runs OK when data is a 4-dimensional tensor.
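Until 2D input is supported, a possible workaround (a minimal sketch building only on the observation above that 4D input works) is to add two dummy spatial dimensions before the layer and drop them afterwards:

import torch
import apex

model = apex.parallel.SyncBatchNorm(4).cuda()
data = torch.rand((8, 4)).cuda()
# Wrap the 2D batch as (N, C, 1, 1) so the supported 4D path is used,
# then drop the dummy spatial dimensions again; the per-channel statistics
# are unchanged because H = W = 1.
output = model(data.unsqueeze(-1).unsqueeze(-1)).squeeze(-1).squeeze(-1)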
Here is my environment:
Ubuntu 16.04
Python 3.5.2
PyTorch 1.0.1, installed with "pip install torch"
apex installed with the command:
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
CUDA 10.0
NVIDIA driver 410.72
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 19 (5 by maintainers)
Commits related to this issue
- [SyncBatchNorm] supporting 2 dimensional input, resolving issue #194 Implementation: for 2d input, switching channel_last flag to true for better memory access pattern in the kernel. — committed to NVIDIA/apex by jjsjann123 5 years ago
- [SyncBatchNorm] (#206) supporting 2 dimensional input, resolving issue #194 Implementation: for 2d input, switching channel_last flag to true for better memory access pattern in the kernel. — committed to NVIDIA/apex by jjsjann123 5 years ago
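For context on those commits: in a plain 2D batch of shape (N, C) the channel is already the innermost dimension, which is the same layout the channel_last kernel assumes for flattened (N*H*W, C) data, so switching the channel_last flag to true lets the existing per-channel Welford reduction handle 2D input. As a rough illustration of the statistics involved (plain PyTorch ops, not apex's fused syncbn.welford_mean_var kernel):

import torch

# Per-channel biased mean and variance over a 2D (N, C) batch; this is the
# kind of reduction the fused welford_mean_var kernel performs per channel.
x = torch.rand(8, 4)
mean = x.mean(dim=0)                       # shape (4,)
var_biased = x.var(dim=0, unbiased=False)  # shape (4,), biased estimator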
Hello, I'm running into the same problem. apex was installed on 16 Sep.
import torch
import apex
model = apex.parallel.SyncBatchNorm(4).cuda()
data = torch.rand((8,4)).cuda()
output = model(data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/gpfs01/user_home/zhuying/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/gpfs01/user_home/zhuying/.conda/envs/py36/lib/python3.6/site-packages/apex/parallel/optimized_sync_batchnorm.py", line 85, in forward
return SyncBatchnormFunction.apply(input, z, self.weight, self.bias, self.running_mean, self.running_var, self.eps, self.training or not self.track_running_stats, exponential_average_factor, self.process_group, self.channel_last, self.fuse_relu)
File "/gpfs01/user_home/zhuying/.conda/envs/py36/lib/python3.6/site-packages/apex/parallel/optimized_sync_batchnorm_kernel.py", line 27, in forward
mean, var_biased = syncbn.welford_mean_var(input)
RuntimeError: Dimension out of range (expected to be in range of [-2, 1], but got 2) (maybe_wrap_dim at /tmp/pip-req-build-p5q91txh/c10/core/WrapDimMinimal.h:20)
I’m facing the same issue, hoping you guys can fix this soon… many thanks!