CellBender: Batch size, CUDA out of memory

Hi,

Great package! I am currently using cellbender V2.1 and ran into an issue caused by too much memory being allocated on the GPU.

[....]
cellbender:remove-background: [epoch 198]  average training loss: 1790.0774
cellbender:remove-background: [epoch 199]  average training loss: 1787.5904
cellbender:remove-background: [epoch 200]  average training loss: 1792.2732
cellbender:remove-background: [epoch 200] average test loss: 1773.5361
cellbender:remove-background: Inference procedure complete.
cellbender:remove-background: 2020-08-06 23:06:51
cellbender:remove-background: Preparing to write outputs to file...
cell counts tensor([ 8096.,  6134.,  1805.,  2324.,  5410.,  5546.,  5092.,  1724.,  5301.,
         1329.,  3143.,  5382.,   618.,  3833.,  6279.,  5066.,  2166.,  7982.,
         7920.,  3160.,  3907., 12285.,  3919.,  7285.,  1576.,  2011.,  1805.,
         5842.,  2688.,  8696.,  7202.,  7752.,  6153.,  4572.,  2058.,  7318.,
         3196.,  3786.,  7375.,  2877.,  2555.,  4179.,  1650.,  1776.,  4262.,
         4624.,  5314.,  5727.,  5470.,   693.,  4088.,  2078.,  1429.,  2127.,
         5265.,   649.,  4733.,  9864., 19365.,  7845.,  5621.,   699.,  3006.,
         3918.,  1308.,  6071.,  5948.,  1816.,  7495.,  3055.,  2016., 11080.,
         1845.,  1077., 14801.,  8278.,  2293.,  1718.,  1436.,  7260.,  1655.,
        13636.,  8505.,  1307.,  2211.,  7010.,  4465.,  1496.,  3346.,  8285.,
         1948.,  1978.,  2007.,  1693., 16839.,  6170.,  4675., 12212.,  1955.,
         1499.], device='cuda:0')
Traceback (most recent call last):
  File "path/to/bin/cellbender", line 33, in <module>
    sys.exit(load_entry_point('cellbender', 'console_scripts', 'cellbender')())
  File "path/to/CellBender/cellbender/base_cli.py", line 101, in main
    cli_dict[args.tool].run(args)
  File "path/to/cellbender/remove_background/cli.py", line 103, in run
    main(args)
  File "path/to/cellbender/remove_background/cli.py", line 196, in main
    run_remove_background(args)
  File "path/to/cellbender/remove_background/cli.py", line 166, in run_remove_background
    save_plots=True)
  File "path/to/cellbender/remove_background/data/dataset.py", line 524, in save_to_output_file
    inferred_count_matrix = self.posterior.mean
  File "path/to/cellbender/remove_background/infer.py", line 56, in mean
    self._get_mean()
  File "path/to/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "path/to/cellbender/remove_background/infer.py", line 402, in _get_mean
    alpha_est=map_est['alpha'])
  File "path/to/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "path/to/cellbender/remove_background/infer.py", line 1005, in _lambda_binary_search_given_fpr
    alpha_est=alpha_est)
  File "path/to/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "path/to/cellbender/remove_background/infer.py", line 809, in _calculate_expected_fpr_given_lambda_mult
    alpha_est=alpha_est)
  File "path/to/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "path/to/cellbender/remove_background/infer.py", line 604, in _true_counts_from_params
    .log_prob(noise_count_tensor)
  File "path/to/lib/python3.7/site-packages/torch/distributions/poisson.py", line 63, in log_prob
    return (rate.log() * value) - rate - (value + 1).lgamma()
RuntimeError: CUDA out of memory. Tried to allocate 1016.00 MiB (GPU 0; 3.97 GiB total capacity; 2.48 GiB already allocated; 378.79 MiB free; 2.58 GiB reserved in total by PyTorch)

Do you suggest changing environment settings, or adjusting the batch size? Changing “empty-drop-training-fraction” did not solve the issue. Thanks for your thoughts!
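
For reference, here is a minimal sketch of how to check total and currently used GPU memory from Python using torch (already a CellBender dependency); GPU index 0 is assumed:

import torch

# Minimal sketch: report total / allocated / reserved memory on GPU 0,
# to see how much headroom is left before adjusting batch sizes.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    gib = 1024 ** 3
    print(f"{props.name}: {props.total_memory / gib:.2f} GiB total")
    print(f"allocated by PyTorch: {torch.cuda.memory_allocated(0) / gib:.2f} GiB")
    print(f"reserved by PyTorch:  {torch.cuda.memory_reserved(0) / gib:.2f} GiB")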

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 31 (15 by maintainers)

Most upvoted comments

Love this hack idea. I’ve also run into the same problem on some GPUs. It would be cool to have a parameter users can set based on the VRAM of the GPUs they have access to, so they can still run 😃

Thanks!

I have a couple questions for you:

  1. How much GPU memory do you have? I typically work with a Tesla K80 that has (I think) 15GB of GPU memory. It looks like maybe you are running on a GPU with less memory?
  2. How many genes does your dataset have? (It should be around line 8 in the CellBender log file)

Interesting, I see that the inference procedure has completed. It looks like it’s during posterior generation that this memory error occurs. Unfortunately that is a place where I’ve currently got something hard-coded that you can’t change by modifying an input parameter.

If you want to reach into the code to try to make this work, I would suggest that you do the following with your own local copy of the cellbender code:

https://github.com/broadinstitute/CellBender/blob/0feb5e0f8867332d3c55711e4390dd4f5b03fa18/cellbender/remove_background/infer.py#L413

I would change this line to batch_size=5. It may create the posterior a bit more slowly, but at least it should help you get an output.
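
To illustrate why a smaller batch size helps here: the traceback shows the out-of-memory error inside a Poisson log_prob call over a large tensor. Below is a minimal sketch of the same idea (not CellBender’s actual code): computing the log-probability in small batches of cells so the peak GPU allocation stays bounded.

import torch
from torch.distributions import Poisson

def batched_poisson_log_prob(rates, counts, batch_size=5):
    # Compute Poisson log-probabilities chunk by chunk along the cell axis,
    # so the largest temporary tensor is only batch_size x n_features.
    chunks = []
    for start in range(0, counts.shape[0], batch_size):
        r = rates[start:start + batch_size]
        c = counts[start:start + batch_size]
        chunks.append(Poisson(r).log_prob(c))
    return torch.cat(chunks, dim=0)

# Usage on synthetic data (GPU if available):
device = "cuda" if torch.cuda.is_available() else "cpu"
rates = torch.rand(200, 30000, device=device) + 0.1   # roughly cells x features
counts = torch.poisson(rates)
log_p = batched_poisson_log_prob(rates, counts, batch_size=5)
print(log_p.shape)

The trade-off is runtime: smaller batches mean more, smaller kernel launches, but each intermediate tensor shrinks proportionally.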

@sjfleming Indeed, I was running cellbender in the cellbender conda environment, and I do not have a conda environment called r4.0.2. When I ran the code you suggested on my.ipynb, I got a my.nbconvert.ipynb file.

I found out the issue was that nbconvert was looking for python3: a long time ago I had registered python3 from the r4.0.2 environment in nbconvert. I tried cellbender again after removing ‘r4.0.2’ from nbconvert, and now everything runs smoothly! Thanks a lot for your help!

Best, Zhijun

Hi @zhijunyuu , sorry, I should have been more clear. Those things are now command-line arguments for v0.3.0, which was released just last week.

You can get v0.3.0 by doing

pip install cellbender

(Make sure you are using python 3.7)

So try installing v0.3.0 and then try running cellbender like this

$ cellbender remove-background \
    --cuda \
    --input my_input_file.h5 \
    --output my_output_file.h5 \
    --projected-ambient-count-threshold 2 \
    --posterior-batch-size 64
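
Once installed, a quick way to confirm which version is active (a small check using pkg_resources, which ships with setuptools and works on Python 3.7):

import pkg_resources

# Confirm the installed CellBender version (expecting 0.3.0 after the upgrade).
print(pkg_resources.get_distribution("cellbender").version)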

Hi @zhijunyuu , based on the log file, it looks like you may already be using the sf_dev_0.3.0_postreg branch. You can make sure you have the latest version of the changes on that branch by following this:

https://github.com/broadinstitute/CellBender/issues/225#issuecomment-1667976108

I would suggest two potential fixes:

  1. try setting --projected-ambient-count-threshold 2… this will greatly limit the number of ATAC features included in the analysis
  2. try setting --posterior-batch-size 64 (the default is 128, and it looks like you just barely overflowed your GPU memory)

Multiome RNA + ATAC data is still a challenge for cellbender, since there are typically close to 200k features!
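
Related to the earlier question about feature counts, here is a minimal sketch for counting features in the input .h5 directly; it assumes the CellRanger v3 HDF5 layout with a top-level "matrix" group, so adjust the keys if your file was produced by a different pipeline version:

import collections
import h5py

# Count features, and features per type, in a CellRanger-style .h5 input.
with h5py.File("my_input_file.h5", "r") as f:
    names = f["matrix/features/name"]
    types = [t.decode() for t in f["matrix/features/feature_type"][:]]
    print(f"{names.shape[0]} total features")
    print(collections.Counter(types))  # e.g. Gene Expression vs. Peaks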

So I tried changing the batch size to 5 as suggested above, but I got the same error. However, after looking at #98, I tried changing the n_cells parameter to 10 rather than 100 (not sure if this is too low?), and with that it was able to finish running.

The batch_size=5 change suggested above works in my case.