pflowtts_pytorch: Crash in MAS

We are experiencing a strange issue. With one of our big datasets (about 300 hours), MAS crashes randomly. The core dump shows the following:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007feaf9884576 in __pyx_f_5pflow_5utils_15monotonic_align_4core_maximum_path_each (__pyx_v_path=..., __pyx_v_value=..., __pyx_v_path=..., __pyx_v_value=..., __pyx_optional_args=0x0, __pyx_v_t_x=289, __pyx_v_t_y=279)
    at pflow/utils/monotonic_align/core.c:17615
17615       if (__pyx_t_7) {
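
The frame above reports t_x=289 and t_y=279. Assuming t_x is the number of text tokens and t_y the number of mel frames (an assumption about this build, not something the dump confirms), a monotonic alignment cannot exist when the text is longer than the mel, so one thing that may be worth ruling out is whether the dataset contains such utterances. A minimal sketch of that check follows; `find_unalignable` and the `(text_len, mel_len)` pairs are hypothetical names, not part of pflowtts_pytorch.

```python
from typing import Iterable, List, Tuple


def find_unalignable(lengths: Iterable[Tuple[int, int]]) -> List[int]:
    """Return indices of samples whose text is longer than their mel.

    `lengths` yields (text_len, mel_len) pairs. Both names are hypothetical
    and stand in for whatever the data loader reports per utterance.
    """
    return [i for i, (text_len, mel_len) in enumerate(lengths) if text_len > mel_len]


# Toy lengths; the second pair mirrors the values seen in the core dump.
samples = [(120, 480), (289, 279), (64, 64)]
print(find_unalignable(samples))
```

If the list comes back non-empty on the real data, filtering or re-segmenting those utterances would be a cheap experiment before changing the aligner.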

We have tried everything, but nothing helped. The only thing that worked was replacing MAS with AlignerNet, but that led to another issue: a crash at inference. Maybe the synthesis method requires some changes too?
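
For debugging, it may help to temporarily swap the compiled MAS for a slow, bounds-checked version, so that a bad input produces a Python exception instead of a segfault. The sketch below follows the standard MAS dynamic program as described for Glow-TTS; whether it matches the compiled core in this repository line for line is an assumption, and `maximum_path_reference` is a hypothetical name.

```python
from typing import List

NEG_INF = -1e9


def maximum_path_reference(value: List[List[float]]) -> List[List[int]]:
    """Bounds-checked monotonic alignment search over a t_x-by-t_y score grid.

    value[x][y] is the score of aligning text token x with mel frame y.
    Returns a 0/1 path of the same shape and raises ValueError when no
    monotonic path can exist, instead of reading out of bounds.
    """
    t_x = len(value)
    t_y = len(value[0]) if t_x else 0
    if t_x == 0 or t_y == 0:
        raise ValueError("empty score grid")
    if any(len(row) != t_y for row in value):
        raise ValueError("ragged score grid")
    if t_x > t_y:
        raise ValueError(f"t_x={t_x} exceeds t_y={t_y}: no monotonic path exists")

    # Forward pass: q[x][y] is the best cumulative score of a path ending at (x, y).
    q = [[NEG_INF] * t_y for _ in range(t_x)]
    for y in range(t_y):
        for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
            # Stay on the same token, or advance from the previous token.
            stay = q[x][y - 1] if x < y else NEG_INF
            if x == 0:
                step = 0.0 if y == 0 else NEG_INF
            else:
                step = q[x - 1][y - 1]
            q[x][y] = max(stay, step) + value[x][y]

    # Backtrack from the last token at the last frame.
    path = [[0] * t_y for _ in range(t_x)]
    index = t_x - 1
    for y in range(t_y - 1, -1, -1):
        path[index][y] = 1
        if index != 0 and (index == y or q[index][y - 1] < q[index - 1][y - 1]):
            index -= 1
    return path


scores = [[1.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
print(maximum_path_reference(scores))
```

This is far too slow for training, but running it on the batch that triggers the crash can show whether the input itself is invalid.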

I have successfully trained pflowtts on a single-speaker dataset, which is a subset of this bigger dataset, and it sounds great. A demo is here: https://tts.patriotyk.name

I have also built a Docker image that reproduces this issue and pushed it to a registry; you just need to pull and run it. I can share the URL in a private message if you need it.

About this issue

  • State: open
  • Created 5 months ago
  • Comments: 22 (8 by maintainers)

Most upvoted comments

Now I’m wondering whether the problem is that we feed the text encoder outputs into AlignerNet. Those outputs have already been passed through a convolution (to match the mel frame dimensions). When I was testing the pitch predictor, it didn’t work when conditioned on the text encoder output, but it worked when I used x_emb directly.

I will test this and, if it works, I will create a PR.

I will try clamping the aligner output later.