transformers: SpeechT5 cannot read numbers
System Info
- transformers == 4.29.0
- environment = Colab
- Python == 3.10.11
- tensorflow == 2.12.0
- torch == 2.0.1+cu118
- torchaudio == 2.0.2+cu118
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- Initialize a Transformers agent.
- Define a text that contains numbers, for example text = "More than 10 people have been killed by Covid."
- Call the agent for text-to-speech (SpeechT5), for example audio_translated = agent.run("Read out loud the text", text=text)
- Play the generated audio.
The generated audio skips all the numbers/digits in the text.
I suspect that SpeechT5 itself is misbehaving, since the code generated by the agent appears to be correct.
Good luck 😃
Expected behavior
The audio file should contain numbers/digits indicated in the text.
About this issue
- State: closed
- Created a year ago
- Comments: 15 (11 by maintainers)
Hey @heytanay - thanks for jumping on here, it’s all yours! Feel free to open a PR and tag me - happy to assist with the integration! Think the details of how we can do this are more or less detailed in this thread, but let me know if you have any questions
I have created a draft PR @sanchit-gandhi: #25447
Thanks for the reply. It should not be too difficult to ask the LLM to preprocess the text and replace all numbers with their literal (spelled-out) equivalents. I will look at the agent code to propose a fix.
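As a rough illustration of the workaround discussed in this thread, the digits can also be spelled out with plain Python before the text reaches the text-to-speech step, without involving an LLM. This is only a minimal sketch under stated assumptions: the helper names `int_to_words` and `spell_out_numbers` are hypothetical, they are not part of transformers and are not taken from the PR mentioned above, and the sketch only covers non-negative English integers (no decimals, ordinals, dates, or currency).

```python
import re

# Hypothetical helpers for illustration only; not part of transformers.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999_999 in English."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("only integers from 0 to 999999 are supported")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + int_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return int_to_words(thousands) + " thousand" + (" " + int_to_words(rest) if rest else "")


def spell_out_numbers(text: str) -> str:
    """Replace every run of digits in `text` with English words."""
    def replace(match: re.Match) -> str:
        digits = match.group()
        value = int(digits)
        if value < 1_000_000:
            return int_to_words(value)
        # Fallback for very large numbers: read them digit by digit.
        return " ".join(ONES[int(d)] for d in digits)

    return re.sub(r"\d+", replace, text)


text = "More than 10 people have been killed by Covid."
print(spell_out_numbers(text))
```

The normalized string could then be passed to the agent in place of the raw text, for example as `text=spell_out_numbers(text)` in the `agent.run` call from the reproduction steps. Doing the replacement deterministically keeps the result reproducible, at the cost of handling fewer number formats than an LLM-based rewrite might.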