SoundStorm-pytorch: Problems with SoundStorm
I have trained the update_v2 branch on:
- Semantic tokens extracted from HuBERT Large layer 16, quantized with 1024-cluster k-means (50 tok/sec).
- Acoustic tokens extracted from EnCodec at a 24 kHz sample rate, 240 hop length, with the 8-codebook config from here (100 tok/sec).
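The numbers in this setup can be cross-checked: EnCodec at 24 kHz with hop length 240 gives 24000 / 240 = 100 frames per second, and HuBERT produces 50 tokens per second, so each semantic token spans two acoustic frames, and 8 codebooks yield 800 acoustic tokens per second. The sketch below makes that arithmetic explicit and aligns the two streams by simple repetition. The repetition-based alignment and the helper name `align_semantic_to_acoustic` are assumptions for illustration, not necessarily what the update_v2 branch does.

```python
import numpy as np

# Values from the configuration described in this issue.
ENCODEC_SAMPLE_RATE = 24_000  # Hz
ENCODEC_HOP_LENGTH = 240      # samples per acoustic frame
NUM_CODEBOOKS = 8             # RVQ levels used from EnCodec
SEMANTIC_RATE = 50            # HuBERT tokens/sec (16 kHz audio, 320-sample stride)

acoustic_rate = ENCODEC_SAMPLE_RATE // ENCODEC_HOP_LENGTH  # 100 frames/sec
upsample = acoustic_rate // SEMANTIC_RATE                   # 2 frames per semantic token


def align_semantic_to_acoustic(semantic: np.ndarray, num_frames: int) -> np.ndarray:
    """Repeat each semantic token `upsample` times, then trim or pad to `num_frames`.

    Assumption: plain repetition; the branch may align the streams differently.
    """
    repeated = np.repeat(semantic, upsample)
    if repeated.shape[0] >= num_frames:
        return repeated[:num_frames]
    # Pad with the last token when the semantic sequence comes up a frame or two short.
    pad = np.full(num_frames - repeated.shape[0], repeated[-1], dtype=repeated.dtype)
    return np.concatenate([repeated, pad])


print(acoustic_rate * NUM_CODEBOOKS)  # 800 acoustic tokens per second
print(align_semantic_to_acoustic(np.array([5, 9, 9, 1]), num_frames=8))
# [5 5 9 9 9 9 1 1]
```

Four HuBERT tokens cover 80 ms, which is exactly eight EnCodec frames, so a mismatch between the two lengths in real data usually points to padding or resampling differences in the extraction scripts.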
Results: The output is not as desired. Here is the sample; the first 6 seconds are the prompt.
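With this configuration, a 6-second prompt fixes 6 × 100 = 600 EnCodec frames (4,800 acoustic tokens across 8 codebooks) and covers 300 HuBERT tokens. A minimal sketch of the starting grid for generation follows, assuming prompting works by keeping the prompt's acoustic tokens unmasked while every later frame starts masked. `MASK_ID`, the 10-second output length, and `build_prompted_grid` are hypothetical choices for illustration, not taken from the repo.

```python
import numpy as np

ACOUSTIC_RATE = 100      # EnCodec frames/sec at 24 kHz with hop length 240
NUM_CODEBOOKS = 8
CODEBOOK_SIZE = 1024
MASK_ID = CODEBOOK_SIZE  # hypothetical: an extra id reserved for "masked"


def build_prompted_grid(prompt_codes: np.ndarray, total_seconds: float) -> np.ndarray:
    """Return a (frames, codebooks) grid: prompt frames copied in, the rest masked.

    prompt_codes: int array of shape (prompt_frames, NUM_CODEBOOKS).
    """
    total_frames = int(round(total_seconds * ACOUSTIC_RATE))
    if prompt_codes.shape[0] > total_frames:
        raise ValueError("prompt is longer than the requested output")
    grid = np.full((total_frames, NUM_CODEBOOKS), MASK_ID, dtype=np.int64)
    grid[: prompt_codes.shape[0]] = prompt_codes
    return grid


rng = np.random.default_rng(0)
# Random stand-in for the real 6 s prompt codes.
prompt = rng.integers(0, CODEBOOK_SIZE, size=(6 * ACOUSTIC_RATE, NUM_CODEBOOKS))
grid = build_prompted_grid(prompt, total_seconds=10.0)  # hypothetical 10 s output
print(grid.shape)                    # (1000, 8)
print(int((grid == MASK_ID).sum()))  # 3200 tokens to generate (400 frames x 8)
```

If the generated part sounds wrong while the prompt region is reproduced correctly, checking that the prompt frames really stay fixed (and that the mask id never collides with a real code) is a cheap first step.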
This thread serves as a potential issue tracker and solution log.
About this issue
- State: open
- Created a year ago
- Comments: 49 (26 by maintainers)
Hi Rishikksh20, your training code is working well with my data pipeline (I modified it slightly to fit my data). For inference, I made a new version that combines your inference code with lucidrains', and it gives samples slightly better than what I already had. Code for reference: https://github.com/feng-yufei/shared_debugging_code/blob/main/soundstorm2.py
Below I provide the core code, which I think is very close to the paper's description, along with one sample. Hope it is useful: https://github.com/feng-yufei/shared_debugging_code/blob/main/soundstorm.py
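For comparing inference scripts against the paper: SoundStorm decodes one RVQ level at a time with MaskGIT-style confidence-based parallel decoding. The sketch below is a generic illustration of that idea, not the code from either linked file. `logits_fn` is a hypothetical hook standing in for the model, and the cosine schedule with plain sampling (no temperature or noise added to the confidences) is a simplifying assumption.

```python
import math

import numpy as np


def cosine_mask_count(step: int, num_steps: int, total: int) -> int:
    """Number of tokens still masked after iteration `step` (0-based) of `num_steps`."""
    return int(math.floor(math.cos(math.pi / 2 * (step + 1) / num_steps) * total))


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def decode_level(logits_fn, grid, level, mask_id, num_steps, rng):
    """Fill masked positions of one RVQ level of `grid` (frames, levels) in place.

    `logits_fn(grid, level)` is a hypothetical model hook returning (frames, vocab) logits.
    Frames that are already set (e.g. the prompt) are never touched.
    """
    total = int((grid[:, level] == mask_id).sum())
    for step in range(num_steps):
        masked = np.flatnonzero(grid[:, level] == mask_id)
        if masked.size == 0:
            break
        probs = softmax(logits_fn(grid, level)[masked])
        # Sample one token per masked frame; its probability serves as the confidence.
        sampled = np.array([rng.choice(p.size, p=p) for p in probs])
        confidence = probs[np.arange(masked.size), sampled]
        # Commit the most confident samples and re-mask the rest per the cosine schedule.
        n_commit = max(0, masked.size - cosine_mask_count(step, num_steps, total))
        commit = np.argsort(-confidence)[:n_commit]
        grid[masked[commit], level] = sampled[commit]
    return grid


rng = np.random.default_rng(0)
MASK_ID, VOCAB, FRAMES, LEVELS = 1024, 1024, 50, 8
grid = np.full((FRAMES, LEVELS), MASK_ID, dtype=np.int64)
prompt = rng.integers(0, VOCAB, size=(20, LEVELS))
grid[:20] = prompt  # prompt frames stay fixed


def dummy_logits(g, level):
    # Stand-in for the model: random logits over the codebook vocabulary.
    return rng.normal(size=(g.shape[0], VOCAB))


# Illustrative step counts: several iterations for level 0, one for finer levels.
for level, steps in enumerate([8] + [1] * (LEVELS - 1)):
    decode_level(dummy_logits, grid, level, MASK_ID, steps, rng)
print(bool((grid == MASK_ID).any()))  # False
```

The paper spends more iterations on the coarsest level than on the finer ones; the step counts above only mirror that shape and are not the paper's settings. Differences in the schedule, in how confidences are computed, or in whether prompt frames are ever re-masked are natural places to look when two inference scripts produce different quality from the same checkpoint.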