- When I try to load the pretrained model `output/LJSpeech/ckpt/900000.pth.tar`, I get the following error:

  ```
  size mismatch for encoder.src_word_emb.weight: copying a param with shape torch.Size([361, 256]) from checkpoint, the shape in current model is torch.Size([151, 256]).
  ```
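  The first dimension of `encoder.src_word_emb.weight` is the size of the text symbol vocabulary, so the checkpoint (361 symbols) and the current code (151 symbols) are building different symbol sets, e.g. from different `text_cleaners` settings in `preprocess.yaml` or a modified `text/symbols.py`. A quick check of what the checkpoint actually stores, assuming it keeps the weights under the `"model"` key as in the loading code below:

  ```python
  import torch

  # Inspect the embedding table shipped in the checkpoint.
  ckpt = torch.load("output/LJSpeech/ckpt/900000.pth.tar", map_location="cpu")
  print(ckpt["model"]["encoder.src_word_emb.weight"].shape)  # torch.Size([361, 256])
  ```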
- The code that loads the model (adapted from the repo):
```python
import torch
import yaml

from model import FastSpeech2  # repo-local import

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

base_config_path = "config/LJSpeech"
prepr_path = f"{base_config_path}/preprocess.yaml"
model_path = f"{base_config_path}/model.yaml"
train_path = f"{base_config_path}/train.yaml"

with open(prepr_path, "r") as f:
    prepr_config = yaml.load(f, Loader=yaml.FullLoader)
with open(model_path, "r") as f:
    model_config = yaml.load(f, Loader=yaml.FullLoader)
with open(train_path, "r") as f:
    train_config = yaml.load(f, Loader=yaml.FullLoader)
configs = (prepr_config, model_config, train_config)

ckpt_path = "output/LJSpeech/ckpt/900000.pth.tar"


def get_model(ckpt_path, configs):
    preprocess_config, model_config, train_config = configs
    model = FastSpeech2(preprocess_config, model_config).to(device)
    ckpt = torch.load(ckpt_path, map_location=device)
    model.load_state_dict(ckpt["model"])
    model.eval()
    model.requires_grad_(False)  # method call; assigning False only created a dead attribute
    return model
```
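For completeness, the call site (where `load_state_dict` raises the size mismatch), plus a check of the vocabulary size the current code builds; a sketch assuming the objects defined above:

```python
# Vocabulary size the current code builds; per the error this is 151.
model = FastSpeech2(prepr_config, model_config).to(device)
print(model.encoder.src_word_emb.weight.shape)  # torch.Size([151, 256])

# Raises the size mismatch until the two symbol sets agree.
model = get_model(ckpt_path, configs)
```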
@leminhnguyen That is a good suggestion. I think using ground-truth pitch, duration, and energy as the inputs of FastSpeech2 is somewhat similar to using teacher forcing mode in Tacotron 2 (but still different). Looking forward to your result!
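For reference, a minimal sketch of that idea, assuming the batch layout produced by this repo's dataset/collate function (`ids, raw_texts, speakers, texts, src_lens, max_src_len, mels, mel_lens, max_mel_len, pitches, energies, durations`); `train_loader` is a hypothetical training dataloader:

```python
import torch

model.eval()
with torch.no_grad():
    for batch in train_loader:  # a training loader, so ground-truth targets exist
        # Passing the ground-truth pitch/energy/duration makes the variance
        # adaptor use them instead of its own predictions -- the "teacher
        # forcing"-like mode discussed above.
        output = model(*batch[2:])
```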