audiolm-pytorch: Audio generation failing at FineTransformer
I tried training a model back when the repo was at commit 95e0669dde9c177b807fa6f0a52e4d2e685c47fd and successfully got checkpoints, but it crashed when I tried to test the generations. The error was a hard-to-interpret CUDA assertion:
```
generating fine:   0%|          | 0/512 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [480,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [480,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
... many repetitions ...
  File "/fsx/itsleonwu/audiolm-pytorch-training/audiolm_pytorch/audiolm_pytorch.py", line 1617, in generate
    _, fine_logits = self.transformer.forward_with_cond_scale(
... more stuff ...
```
I suspect the problem is a bug in the coarse transformer's eos handling, because generation crashes right when the fine transformer is about to get started. I printed the state and found a -1 among the coarse token ids, which I think is the result of applying `mask_out_after_eos_id`. But it turns out the first -1 appears at index (0, 121, 2), i.e. batch 0, timestep 121, quantizer 2, which is not on a "full" quantizer-step boundary; I'd expect the first -1 to appear at some (batch_index, timestep, 0). This seems consistent with the CUDA issue: a -1 where the kernel expects small-ish non-negative indices could plausibly cause an out-of-bounds access.
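For reference, the check I'm using to find that first masked token is roughly the following sketch; it assumes `coarse_token_ids` has shape `(batch, timesteps, num_quantizers)` and that -1 is the mask value, matching what I printed above.

```python
import torch

def first_masked_position(coarse_token_ids: torch.Tensor, mask_value: int = -1):
    """Return the (batch, timestep, quantizer) index of the first masked token, or None.

    coarse_token_ids is assumed to have shape (batch, timesteps, num_quantizers).
    """
    positions = (coarse_token_ids == mask_value).nonzero(as_tuple=False)
    if positions.numel() == 0:
        return None
    return tuple(positions[0].tolist())

# A result like (0, 121, 2) means the masking kicked in mid quantizer step,
# rather than at quantizer index 0 as I'd expect.
```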
I'm going to use this issue to track updates and what I've tried. I'll be using the script at https://github.com/LWprogramming/audiolm-pytorch-training/blob/main/audiolm_pytorch_demo_laion.py (which I set up to eliminate non-determinism).
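For reference, the determinism setup in that script amounts to the usual seeding boilerplate, roughly like this (the seed value itself is arbitrary):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 0):
    """Seed every RNG involved and prefer deterministic CUDA kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```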
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 15 (15 by maintainers)
Commits related to this issue
- address https://github.com/lucidrains/audiolm-pytorch/issues/199 — committed to lucidrains/audiolm-pytorch by lucidrains a year ago
- address https://github.com/lucidrains/audiolm-pytorch/issues/199 — committed to lucidrains/audiolm-pytorch by lucidrains a year ago
- change back to new slurm jobs, seems like 280087 might've had an issue (see https://github.com/lucidrains/audiolm-pytorch/issues/199#issuecomment-1593944300) — committed to LWprogramming/audiolm-pytorch-training by LWprogramming a year ago
Hm, it seems to work now when I try a different dataset. I'd originally tried to train the model on a tiny dataset (intentionally overfitting to see if it can do that) with samples trimmed to exactly `data_max_length`, and that's when the unaligned eos started showing up. It still does, but I can just try input data that's a bit larger and that should probably be ok.

edit: hang on, I didn't do the trimming properly. Now I'm not sure what's causing the issue 🙃
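For context, the trimming I was attempting was roughly this (a sketch of my own preprocessing, not anything inside the library; `data_max_length` here is whatever value the trainer is configured with):

```python
import torchaudio

def trim_to_length(in_path: str, out_path: str, data_max_length: int):
    """Write a copy of the wav file truncated to exactly data_max_length samples.

    Files shorter than data_max_length are skipped rather than padded.
    """
    wave, sample_rate = torchaudio.load(in_path)
    if wave.shape[-1] < data_max_length:
        return
    torchaudio.save(out_path, wave[..., :data_max_length], sample_rate)
```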
should be resolved, feel free to reopen if any new error pops up!
@LWprogramming that’s true! well, it wouldn’t hurt to keep it in there for now 😄
thanks, it was nice!
Right, everything after eos should disappear based on that masking logic, although I'm a bit confused about the relation between this masking and the fine transformer logic you implemented in the `FineTransformer` change. I think the original issue was that there was an eos token in `coarse_token_ids` even though it should've been masked out by the code you link. The issue only shows up when we actually try to use these `coarse_token_ids` in `FineTransformer`'s `generate()`, which (iiuc) expects the eos to have been masked out correctly.

Oh, in the process of writing this I think I get what you did here? So if previously the `FineTransformerWrapper` had logic to avoid doing anything with eos coarse tokens that weren't properly masked, you moved that to `FineTransformer` so it always works. But if the eos should already be masked out in `CoarseTransformer`, I'm not sure why we still see eos by the time we get to `FineTransformer`.

Hope the trip was nice! 😃
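For my own understanding, the masking I have in mind is roughly the following sketch (not the library's actual `mask_out_after_eos_id`; it assumes flattened token ids of shape `(batch, seq_len)` and masks everything from the first eos onward with -1):

```python
import torch

def mask_after_eos(token_ids: torch.Tensor, eos_id: int, mask_value: int = -1) -> torch.Tensor:
    """Replace the first eos and everything after it in each row with mask_value.

    token_ids is assumed to be (batch, seq_len); rows without an eos are left untouched.
    """
    at_or_after_eos = (token_ids == eos_id).long().cumsum(dim=-1) > 0
    return token_ids.masked_fill(at_or_after_eos, mask_value)
```

If the `coarse_token_ids` handed to the fine stage had gone through something like this, no raw eos id would survive, only mask values, which is why seeing an eos there confused me.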
Just submitted the job to try (pending, so no results yet), but while we wait, just to check my understanding: this change masks out anything that isn't an actual coarse index, so the transformer doesn't learn anything that relies on those special tokens. However, why does this prevent eos from appearing in the wrong spot (i.e. not aligned with the end of a quantizer step)? Or is the goal just to make that outcome very low-probability, because during training attention never sees eos?
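To make sure I'm picturing the same thing, here's a sketch of the kind of mask I mean (my own illustration, not the repo's code; `codebook_size` is the quantizer's codebook size, and anything outside [0, codebook_size) is treated as a special token like eos or a -1 mask value):

```python
import torch

def real_token_mask(token_ids: torch.Tensor, codebook_size: int) -> torch.Tensor:
    """Boolean mask that is True only for genuine codebook indices.

    Special tokens (eos, pad, -1 mask values, anything >= codebook_size or < 0)
    come back False so they can be dropped from attention and the loss.
    """
    return (token_ids >= 0) & (token_ids < codebook_size)

# Used as a key-padding / loss mask during training, the model never attends to
# or gets scored on special tokens, which should make a mid-step eos very unlikely,
# though (per my question above) maybe not strictly impossible.
```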
Here is the script I'm trying on a small dataset, but you can use an artificial one by uncommenting `make_placeholder_dataset()`, switching to `dataset_folder = f"{prefix}/placeholder_dataset"`, and setting all the train steps etc. to be super low so you can get to the error quickly.

And I confirmed that the eos is the problem, because the assertion here triggered when my job ran last night.
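The check I'm talking about is a guard roughly of this shape right before the fine stage (a sketch of the kind of assertion involved, not the exact code; `eos_id` is whatever eos id the coarse transformer uses):

```python
import torch

def assert_no_eos(coarse_token_ids: torch.Tensor, eos_id: int) -> None:
    """Fail loudly if any unmasked eos survived coarse generation.

    coarse_token_ids is assumed to be the tensor handed to the fine
    transformer's generate().
    """
    assert not (coarse_token_ids == eos_id).any(), \
        "found unmasked eos in coarse_token_ids before fine generation"
```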