nni: EmptyLayerError() or UnBalancedGroupError() when pruning a depthwise separable convolution

Describe the issue: Errors when pruning a depthwise separable convolution

Depending on the value of 'total_sparsity', speedup may fail with one of the following errors (a sketch of the constraint behind them follows the list):

  1. EmptyLayerError
    raise EmptyLayerError()
nni.compression.pytorch.speedup.error_code.EmptyLayerError: Pruning a Layer to empty is not legal
  2. UnBalancedGroupError
    raise UnBalancedGroupError()
nni.compression.pytorch.speedup.error_code.UnBalancedGroupError: The number remained filters in each group is different
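
For context: both errors come from the group-balance invariant that ModelSpeedup enforces when rebuilding a grouped Conv2d: every group must keep the same, non-zero number of filters. A depthwise convolution has groups == out_channels, i.e. exactly one filter per group, so masking any filter empties its group. Below is a minimal sketch of that invariant (my own illustration, not NNI's actual code; check_group_balance is a hypothetical helper):

import torch

def check_group_balance(weight_mask: torch.Tensor, groups: int) -> None:
    # weight_mask has Conv2d weight shape: (out_channels, in_channels // groups, kH, kW)
    out_channels = weight_mask.shape[0]
    kept = weight_mask.flatten(1).sum(dim=1) > 0                   # which output filters survive
    kept_per_group = kept.view(groups, out_channels // groups).sum(dim=1)
    if (kept_per_group == 0).any():
        raise RuntimeError('a group was pruned empty')             # analogue of EmptyLayerError
    if kept_per_group.unique().numel() > 1:
        raise RuntimeError('groups kept different filter counts')  # analogue of UnBalancedGroupError

mask = torch.ones(128, 1, 3, 3)
mask[torch.randperm(128)[:77]] = 0      # zero out ~60% of the 128 depthwise filters
check_group_balance(mask, groups=128)   # raises: with one filter per group, some groups are now empty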

#4648 and #4796 seem to address this error, but they do not work in my code.

How can I solve it? Any help would be greatly appreciated, thanks!
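
Until this is fixed, one possible workaround is to exclude depthwise convolutions from the pruning config so the group constraint is never hit. This is a hedged sketch assuming the NNI v2 config_list schema, where an entry with 'exclude': True removes the matched layers from compression; model is the network from the repro below:

from torch import nn

# Names of depthwise convs (groups == in_channels > 1) in `model`
depthwise = [name for name, m in model.named_modules()
             if isinstance(m, nn.Conv2d) and m.groups == m.in_channels and m.groups > 1]
config_list = [
    {'total_sparsity': 0.6, 'op_types': ['Conv2d']},
    {'exclude': True, 'op_names': depthwise},
]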

Environment:

  • NNI version: 2.8
  • Training service (local|remote|pai|aml|etc):
  • Client OS: ubuntu 18.04
  • Server OS (for remote mode only):
  • Python version: 3.8
  • PyTorch/TensorFlow version: pytorch=1.10.1
  • Is conda/virtualenv/venv used?: conda
  • Is running in Docker?: no

How to reproduce it?:

import torch
from torch import nn
from nni.compression.pytorch.pruning import L2NormPruner
from nni.compression.pytorch.speedup import ModelSpeedup

# Depthwise convolution: groups == in_channels == out_channels,
# i.e. exactly one filter per group.
model = nn.Sequential(
    nn.Conv2d(128, 128, (3, 3), (1, 1), 1, groups=128, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
)

config_list = [{'total_sparsity': 0.6, 'op_types': ['Conv2d']}]

dummy_input = torch.rand(5, 128, 256, 256)
pruner = L2NormPruner(model, config_list, mode='dependency_aware', dummy_input=dummy_input)
_, masks = pruner.compress()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input, masks).speedup_model()  # raises UnBalancedGroupError

print(model)

Log message:

[2022-07-08 11:41:49] start to speedup the model
[2022-07-08 11:41:53] infer module masks...
[2022-07-08 11:41:53] Update mask for 0
[2022-07-08 11:41:55] Update mask for 1
[2022-07-08 11:41:57] Update mask for 2
[2022-07-08 11:41:59] Update the indirect sparsity for the 2
/home/sf/anaconda3/envs/nni_py38/lib/python3.8/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1639180523671/work/build/aten/src/ATen/core/TensorBody.h:417.)
  return self._grad
[2022-07-08 11:41:59] Update the indirect sparsity for the 1
[2022-07-08 11:42:00] Update the indirect sparsity for the 0
[2022-07-08 11:42:01] resolve the mask conflict
[2022-07-08 11:42:01] replace compressed modules...
[2022-07-08 11:42:01] replace module (name: 0, op_type: Conv2d)
Traceback (most recent call last):
  File "ttttt.py", line 35, in <module>
    ModelSpeedup(model, dummy_input, masks).speedup_model()
  File "/home/sf/anaconda3/envs/nni_py38/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py", line 543, in speedup_model
    self.replace_compressed_modules()
  File "/home/sf/anaconda3/envs/nni_py38/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py", line 402, in replace_compressed_modules
    self.replace_submodule(unique_name)
  File "/home/sf/anaconda3/envs/nni_py38/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compressor.py", line 473, in replace_submodule
    compressed_module = replace_function(
  File "/home/sf/anaconda3/envs/nni_py38/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compress_modules.py", line 16, in <lambda>
    'Conv2d': lambda module, masks: replace_conv2d(module, masks),
  File "/home/sf/anaconda3/envs/nni_py38/lib/python3.8/site-packages/nni/compression/pytorch/speedup/compress_modules.py", line 424, in replace_conv2d
    raise UnBalancedGroupError()
nni.compression.pytorch.speedup.error_code.UnBalancedGroupError: The number remained filters in each group is different
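
The traceback ends in replace_conv2d, which cannot rebuild the pruned depthwise layer. For reference, pruning channels of a depthwise conv has to shrink groups together with the channel counts; the sketch below shows, under that assumption, the layer a successful speedup would need to produce (a hand-written illustration, not NNI code):

import torch
from torch import nn

old = nn.Conv2d(128, 128, (3, 3), (1, 1), 1, groups=128, bias=False)
keep = torch.arange(128)[:51]            # 51 ~= 128 * (1 - 0.6) channels kept
new = nn.Conv2d(51, 51, (3, 3), (1, 1), 1, groups=51, bias=False)
with torch.no_grad():
    new.weight.copy_(old.weight[keep])   # depthwise weight shape is (channels, 1, 3, 3)
print(new)                               # Conv2d(51, 51, kernel_size=(3, 3), ..., groups=51)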

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

Sorry, no progress so far. It's hard to fix, and we are now refactoring ModelSpeedup in 3.0; we will try to fix it after that.

Reproduced the bug. Need some time to fix it.