spaCy: Memory leak with en_core_web_trf model
There is a memory leak when using `nlp.pipe` with the en_core_web_trf model. I run the model on a GPU with 16 GB of RAM. Here is a sample of the code:
```python
!python -m spacy download en_core_web_trf

import en_core_web_trf

nlp = en_core_web_trf.load()

# data is just an array of 100K sentences
data = dataload()

for index, review in enumerate(nlp.pipe(data, batch_size=100)):
    # doing some processing here
    if index % 1000 == 0:
        print(index)
```
This code crashes when it reaches around 31K docs and raises an OOM error:
CUDA out of memory. Tried to allocate 46.00 MiB (GPU 0; 11.17 GiB total capacity; 10.44 GiB already allocated; 832.00 KiB free; 10.72 GiB reserved in total by PyTorch)
I only use the pipeline to predict, not to train or anything else. I tried different batch sizes, but nothing changed; it still crashes.
Your Environment
- spaCy version: 3.0.5
- Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- Pipelines: en_core_web_trf (3.0.0)
About this issue
- State: closed
- Created 3 years ago
- Reactions: 7
- Comments: 19 (3 by maintainers)
Focusing on inference (not training) for this particular issue:
I can’t find any behavior that looks like a memory leak, and the only way I can reproduce an out-of-memory error with en_core_web_trf is with a batch or doc* that is too long. I also checked on CPU with valgrind and couldn’t find any memory leaks.

*Correction based on #7268: it looks like long docs are more likely to cause problems than long batches of shorter texts. The same text split up into shorter texts in a single batch does not cause OOM errors even when the same text does as a single doc. Very long batches may still cause issues, of course.

In the original report, the batch size of 100 may be too large given the text lengths. If a batch is too long, you get a CUDA out-of-memory error like the one quoted in the original report.
In the comment above, I think the issue is that you’re saving a list of docs, each of which contains saved tensors in `doc._.trf_data`, and those tensors are stored on the GPU. In contrast, in a streaming loop that doesn’t keep the docs around, the tensor data in `doc._.trf_data` is garbage collected at some point after the end of each iteration. A sketch of both patterns follows.
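A minimal sketch of the two patterns (assuming `nlp` is the loaded en_core_web_trf pipeline and `data` is a list of texts; the entity extraction is just a placeholder for whatever processing you need):

```python
# Pattern that runs out of GPU memory: every saved doc keeps its
# doc._.trf_data tensors alive on the GPU.
docs = []
for doc in nlp.pipe(data, batch_size=100):
    docs.append(doc)

# Pattern that stays within memory: extract only what you need and let each
# doc (and its GPU tensors) be garbage collected after the iteration.
results = []
for doc in nlp.pipe(data, batch_size=100):
    results.append([(ent.text, ent.label_) for ent in doc.ents])
```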
The data saved in `doc._.trf_data` is required while the pipeline is running (the components that listen to the transformer reference these tensors), but after all the listening components have run, you don’t need to keep it unless you need it for further processing. One simple workaround is to add a final custom component that sets `doc._.trf_data = None`, which means the tensors will be garbage collected and freed (see the sketch below). See: https://github.com/explosion/spaCy/discussions/7486#discussioncomment-512106
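A minimal sketch of such a component (the component name `clear_trf_data` is just an example):

```python
import spacy
from spacy.language import Language

@Language.component("clear_trf_data")
def clear_trf_data(doc):
    # Drop the transformer output once all listening components have run,
    # so the GPU tensors can be garbage collected.
    doc._.trf_data = None
    return doc

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe("clear_trf_data", last=True)
```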
If you do want to store all the docs with the `TransformerData`, I think you could convert the tensors to numpy arrays on the CPU instead. I think the simplest way is something like the `.get()` snippet below. You can also add a final custom component that does this step if you want it to run as part of the pipeline.
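A rough sketch of that conversion, assuming `doc._.trf_data.tensors` holds CuPy arrays on the GPU in this version of spacy-transformers (`.get()` is CuPy’s device-to-host copy):

```python
docs = []
for doc in nlp.pipe(data, batch_size=100):
    trf_data = doc._.trf_data
    # Replace each GPU array with a numpy copy on the CPU.
    trf_data.tensors = [
        t.get() if hasattr(t, "get") else t for t in trf_data.tensors
    ]
    docs.append(doc)
```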
For reference, I tested with:
Notes:
- You can call `torch.cuda.empty_cache()` and `cupy.get_default_memory_pool().free_all_blocks()` to free memory manually at an earlier point than it would be freed automatically, but it shouldn’t be necessary (see the snippet after this comment).

If you’re still running into this problem, could you include additional details about the versions of the libraries where you see this problem (CUDA, cupy, torch, transformers, thinc, spacy) and the exact code that you’re running?
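For reference, a minimal sketch of those manual cache-flushing calls (assuming a CUDA GPU with both torch and cupy installed):

```python
import cupy
import torch

# Release cached GPU memory held by the PyTorch and CuPy allocators.
torch.cuda.empty_cache()
cupy.get_default_memory_pool().free_all_blocks()
```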
Thanks, that’s helpful to see.
Can you try configuring the model to periodically flush the PyTorch cache? That’s the most obvious built-in option that might help. It’s not enabled by default, the comments in the code say it shouldn’t be necessary, and it doesn’t look like we need to do this while training en_core_web_trf, so I’m very uncertain about whether it will help. But just to see, try setting this option to something between 0 and 1 (see the sketch below). 0.1 could well be too high, but I’m not sure what value makes sense; it means that randomly 10% of the time in the forward pass there’s an additional call to `torch.cuda.empty_cache()`. This setting isn’t saved with the model, so if it does help there’s some room for improvement here, but it would be interesting to hear whether it makes a difference.
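If I’m remembering the spacy-transformers internals correctly, the knob is a `flush_cache_chance` attribute on the transformer’s Thinc model; treat the attribute name as an assumption and double-check it against your installed spacy-transformers version:

```python
# Assumption: spacy-transformers checks model.attrs["flush_cache_chance"] in the
# transformer's forward pass and calls torch.cuda.empty_cache() with that
# probability. Verify the attribute name for your spacy-transformers version.
nlp.get_pipe("transformer").model.attrs["flush_cache_chance"] = 0.1
```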
It might also be helpful to see if a simpler pipeline with only `['transformer', 'tagger']` runs into the same problem? I’ve replicated the problem with long documents locally but haven’t tried to replicate this yet myself…
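One way to try that, assuming the standard en_core_web_trf component names (check `nlp.pipe_names` for your pipeline):

```python
import spacy

# Disable everything except the transformer and the tagger; the listed
# component names are assumptions, so verify them with nlp.pipe_names.
nlp = spacy.load(
    "en_core_web_trf",
    disable=["parser", "attribute_ruler", "lemmatizer", "ner"],
)
print(nlp.pipe_names)  # expected: ['transformer', 'tagger']
```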
@nikjohn7: Memory usage while training is a separate issue. This issue is focused on prediction/inference only.
Could you open a new discussion thread with all the details about your training setup? Unfortunately I don’t see an easy way for me to convert your original comment into a discussion thread…
@adrianeboyd I’m also having the same issue with the en_core_web_trf model. It works fine on my smaller dataset (12k) but gives the OOM error when I try my 60k dataset. I’m using spaCy 3 with config files and have set the example size and the batch size to 50, but it is still not working. See my config file below. I am running the model on a GPU with 16 GiB of memory.
Thank you for your suggestion @adrianeboyd. I tried `torch.cuda.empty_cache()`, but I found that GPU memory wasn’t affected; some things still occupied memory, which doesn’t make sense, because the model is loaded in memory and the pipeline is used only for prediction. I tried 1K of my data and it succeeded, but GPU memory wasn’t freed after deleting the model and clearing the cache; you have to restart the interpreter to get free GPU memory.
This behavior seems to come from having one very long doc. The batch size currently sets the number of docs to process in a batch, but individual docs aren’t split up in any way if they’re very long. Can you check whether there’s a particularly long doc at some point in your data?
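A quick way to check, assuming `data` is the list of input texts (character length is a rough proxy for doc length):

```python
# Print the five longest texts' lengths and their indices.
lengths = sorted((len(text), i) for i, text in enumerate(data))
print(lengths[-5:])
```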