transformers: SpeechT5 cannot read numbers

System Info

  • transformers == 4.29.0
  • environment = Colab
  • Python == 3.10.11
  • tensorflow == 2.12.0
  • torch == 2.0.1+cu118
  • torchaudio == 2.0.2+cu118

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

  1. Initialize a Transformers agent
  2. Define a text that contains numbers. For example: text = "More than 10 people have been killed by Covid."
  3. Call the agent for text-to-speech (SpeechT5). For example: audio_translated = agent.run("Read out loud the text", text=text)
  4. Play the generated audio

The generated audio skips all the numbers/digits.

I suspect SpeechT5 is at fault, since the code generated by the agent appears to be correct.

Good luck 😃

Expected behavior

The audio file should contain numbers/digits indicated in the text.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

Hey @heytanay - thanks for jumping on here, it’s all yours! Feel free to open a PR and tag me - happy to assist with the integration! I think how we can do this is more or less covered in this thread, but let me know if you have any questions.

I have created a draft PR @sanchit-gandhi: #25447

Thanks for the reply. It should not be too difficult to ask the LLM to preprocess the text and replace all numbers with their spelled-out equivalents. I will look at the agent code to propose a fix.
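As an alternative to asking the LLM, the replacement could also be done with a small deterministic preprocessing step before the text reaches SpeechT5. The following is only a rough sketch of that idea, not the fix adopted in the linked PR. The helper names `int_to_words` and `spell_out_numbers` are hypothetical and not part of transformers. It handles non-negative integers written as plain digit runs; decimals, ordinals, dates, and thousands separators would need extra handling. Words are joined with spaces rather than hyphens, on the assumption that plain letters are the safest input for the model.

```python
import re

# Hypothetical helpers for illustration only; not part of transformers.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Largest scale first so the first match is the leading group.
SCALES = [(1_000_000_000, "billion"), (1_000_000, "million"), (1_000, "thousand")]


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            return int_to_words(head) + " " + name + (" " + int_to_words(rest) if rest else "")


def spell_out_numbers(text: str) -> str:
    """Replace every run of digits in the text with its spelled-out form."""
    return re.sub(r"\d+", lambda m: int_to_words(int(m.group())), text)


text = "More than 10 people have been killed by Covid."
print(spell_out_numbers(text))
```

The normalized string could then be passed to the agent in place of the original, for example agent.run("Read out loud the text", text=spell_out_numbers(text)).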