examples: DCGAN example doesn't work with different image sizes

I’m trying to use this code as a starting point for building GANs from my own image data: 512x512 grayscale images. If I change any of the default arguments (e.g. --imageSize 512) I get the following error:

Traceback (most recent call last):
  File "main.py", line 209, in <module>
    errD_real = criterion(output, label)
  File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 36, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/opt/python/lib/python3.6/site-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
    assert input.nelement() == target.nelement()
AssertionError

I’m still learning my way around PyTorch, so the network architectures printed out before the error above don’t yet give me much intuition. I appreciate any pointers you can give!


Most upvoted comments

As @apaszke mentions, the G and D networks are generated with the 64x64 size hardcoded. The DCGAN implementation here is very similar to the dcgan.torch implementation, and someone else asked about this limitation there and got this answer: https://github.com/soumith/dcgan.torch/issues/2#issuecomment-164862299

By following the changes suggested in that comment, you can expand the network to 128x128. For the generator:

import torch.nn as nn

class _netG(nn.Module):
    def __init__(self, ngpu):
        super(_netG, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(     nz, ngf * 16, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 16),
            nn.ReLU(True),
            # state size. (ngf*16) x 4 x 4
            nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 8 x 8
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 16 x 16 
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 32 x 32
            nn.ConvTranspose2d(ngf * 2,     ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 64 x 64
            nn.ConvTranspose2d(    ngf,      nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 128 x 128
        )

    def forward(self, input):
        # Same forward as the stock example: optionally split the batch
        # across GPUs via data_parallel, otherwise run the stack directly.
        if input.is_cuda and self.ngpu > 1:
            return nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        return self.main(input)

And for the discriminator:

import torch.nn as nn

class _netD(nn.Module):
    def __init__(self, ngpu):
        super(_netD, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 128 x 128
            nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False), 
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 64 x 64
            nn.Conv2d(ndf, ndf * 2, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 32 x 32
            nn.Conv2d(ndf * 2, ndf * 4, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 16 x 16 
            nn.Conv2d(ndf * 4, ndf * 8, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 8 x 8
            nn.Conv2d(ndf * 8, ndf * 16, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ndf * 16),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*16) x 4 x 4
            nn.Conv2d(ndf * 16, 1, 4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
            # state size. 1
        )

    def forward(self, input):
        # As in the stock example: run the stack, then flatten the 1x1
        # score map to one probability per sample.
        if input.is_cuda and self.ngpu > 1:
            output = nn.parallel.data_parallel(self.main, input, range(self.ngpu))
        else:
            output = self.main(input)
        return output.view(-1, 1).squeeze(1)
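
If it helps, here is a quick shape check for these expanded models. This is a sketch, assuming the two classes above plus the example’s defaults (nz=100, ngf=ndf=64, nc=3) defined as globals, and a recent PyTorch:

import torch

nz, ngf, ndf, nc, ngpu = 100, 64, 64, 3, 1  # assumed defaults from main.py

netG = _netG(ngpu)
netD = _netD(ngpu)

fake = netG(torch.randn(16, nz, 1, 1))  # -> torch.Size([16, 3, 128, 128])
scores = netD(fake)                     # -> torch.Size([16]), one score per sample
print(fake.shape, scores.shape)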

However, as you can also see in that thread, it is harder to keep the game between the generator and discriminator stable for this larger problem. To address this, I think you’ll have to look at the improvements used in https://github.com/openai/improved-gan (paper: https://arxiv.org/abs/1606.03498). That repository includes a model for 128x128 ImageNet generation.

Yes, the learning is unstable. There are some interesting new suggestions in the dcgan.torch thread: https://github.com/soumith/dcgan.torch/issues/2#issuecomment-283982237

  • Set ndf to ngf/4; this changes the relative sizes of the G and D models in order to balance the training
  • Add white noise to the discriminator input (a trick also mentioned in ganhacks); see the sketch below
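
A minimal sketch of that white-noise trick, assuming a recent PyTorch; the GaussianNoise module, its name, and the sigma value are my own illustrative choices, not code from the example:

import torch
import torch.nn as nn

class GaussianNoise(nn.Module):
    # Hypothetical helper: adds zero-mean Gaussian noise during training
    # only; at eval time the input passes through unchanged.
    def __init__(self, sigma=0.1):
        super(GaussianNoise, self).__init__()
        self.sigma = sigma

    def forward(self, x):
        if self.training and self.sigma > 0:
            x = x + self.sigma * torch.randn_like(x)
        return x

# Usage: prepend it to the discriminator stack, e.g.
#   self.main = nn.Sequential(
#       GaussianNoise(0.1),
#       nn.Conv2d(nc, ndf, 4, stride=2, padding=1, bias=False),
#       ...)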

The error tells you that the number of inputs to the loss function differs from the number of targets; it is raised at line 209 of main.py. The problem is that the generator and discriminator architectures are fixed to the default image size (see the annotations in the model). Adding a pooling layer at the end of the discriminator that squeezes every batch element down to a 1x1 map would help; I think appending nn.MaxPool2d(opt.imageSize // 64) after the Sigmoid would fix that.
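
For illustration, here is a sketch of that idea. I’ve swapped in nn.AdaptiveMaxPool2d(1) for the fixed-size pool, since it guarantees a 1x1 score map whatever opt.imageSize is; ndf and the 8x8 input below assume the example’s defaults with --imageSize 128:

import torch
import torch.nn as nn

ndf = 64  # assumed default from main.py

# Tail of the stock 64x64 discriminator plus the suggested pooling.
# With a 128x128 input the stock stack reaches this last conv with an
# 8x8 map, which it turns into 5x5 scores; the adaptive pool squeezes
# that to 1x1 so output.nelement() matches label.nelement() again.
tail = nn.Sequential(
    nn.Conv2d(ndf * 8, 1, 4, stride=1, padding=0, bias=False),
    nn.Sigmoid(),
    nn.AdaptiveMaxPool2d(1),
)

print(tail(torch.randn(16, ndf * 8, 8, 8)).view(-1).shape)  # torch.Size([16])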

I was able to get DCGAN working at 128x128 by adding the convolutional layers described above and then running with ngf=128 and ndf=32. When I attempted to go to 512x512 I was not able to get a stable result. I’m now trying to add white noise to the discriminator to see if that helps.

**I ended up abandoning DCGAN and am now using BMSG-GAN, which is a variation on progressive GANs. It’s handling higher resolutions much better.**

Changing the kernel size to 2 and the stride to 4 in the last Conv2d of the discriminator seems to fix that error, but I just want to make sure I’m not crazy here.
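
For what it’s worth, the arithmetic is easy to sanity-check: a conv layer emits floor((in + 2*pad - kernel)/stride) + 1 pixels per side, and the error disappears exactly when the discriminator emits one score per sample. Whether kernel 2 / stride 4 lands on 1x1 therefore depends on the map size reaching that last layer; a quick check with a few plausible sizes (ndf assumed from the example):

import torch
import torch.nn as nn

ndf = 64  # assumed
last = nn.Conv2d(ndf * 8, 1, kernel_size=2, stride=4, bias=False)
for size in (4, 5, 8):  # plausible map sizes reaching the last layer
    out = last(torch.randn(1, ndf * 8, size, size))
    print(size, '->', tuple(out.shape[2:]))  # 4 -> (1, 1), 5 -> (1, 1), 8 -> (2, 2)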

Yes, you can do that @xjdeng; you simply have to ensure that the output of your FC layer has the aspect ratio you want in the final output. That is one way. So in your case the dense layer output should be (batch_size, channels, 4, 1), or some multiple of that. If your network then consists of transposed convolutions that double the spatial size each layer, you would need 5 transposed conv layers to get images of size (batch_size, channels, 128, 32); see the sketch below.
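
Here’s a minimal sketch of that recipe; RectG, the layer widths, and nz are illustrative assumptions, not code from the example:

import torch
import torch.nn as nn

nz, ngf, nc = 100, 64, 3  # assumed example-style sizes

class RectG(nn.Module):
    # Hypothetical generator: a dense layer reshaped to (batch, ngf*16, 4, 1),
    # then five stride-2 transposed convs, each doubling both spatial dims:
    # 4x1 -> 8x2 -> 16x4 -> 32x8 -> 64x16 -> 128x32.
    def __init__(self):
        super(RectG, self).__init__()
        self.fc = nn.Linear(nz, ngf * 16 * 4 * 1)
        blocks, ch = [], ngf * 16
        for _ in range(4):
            blocks += [nn.ConvTranspose2d(ch, ch // 2, 4, 2, 1, bias=False),
                       nn.BatchNorm2d(ch // 2),
                       nn.ReLU(True)]
            ch //= 2
        blocks += [nn.ConvTranspose2d(ch, nc, 4, 2, 1, bias=False), nn.Tanh()]
        self.main = nn.Sequential(*blocks)

    def forward(self, z):
        x = self.fc(z).view(z.size(0), -1, 4, 1)
        return self.main(x)

print(RectG()(torch.randn(2, nz)).shape)  # torch.Size([2, 3, 128, 32])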

Any luck with the above ~tricks~ heuristics?

@magsol I would suggest first trying your dataset at the standard 64x64. Next, run it at 128x128 with either the extra convolutions or the pooling layer listed above. After that you can try 512x512; I am no expert, but I have not seen pictures that large generated by a DCGAN. You could also consider generating 128x128 images and then using a separate super-resolution network to reach 512x512.

64x64 and 128x128 are easy to try (the model includes the preprocessing, i.e. the rescaling of the images) and should be easier to generate. Did you already get good results with your data on the 64x64 scale? Please share your experience so far. 😄

DCGAN is quite old. Check the latest papers on GANs and you will find many large resolution models/examples. You need a dataset with high resolution images as well (of course).

On Thu, 21 Mar 2019 at 16:56, Chi Nok Enoch K notifications@github.com wrote:

Not sure if this thread is still active, but did anyone try to generate 128x128 images and upscale to 512x512 per @bartolsthoorn’s suggestion?


Ha! Isn’t all of practical ML just tricks? 😃 Haven’t had a chance yet; behind on my teaching. Hoping to get back to this within the next week though, and will absolutely update when I do.

On Tue, Mar 14, 2017 at 04:40, Lukas Mosser notifications@github.com wrote:

Any luck with the above ~tricks~ heuristics?

