cognitive-services-speech-sdk: First Bookmark's audio_offset Incorrectly Includes Following Break Time Duration

Describe the bug

When a bookmark is placed at the very beginning of the SSML string and is followed by a <break> tag, the evt.audio_offset value of the bookmark incorrectly includes the break time. This does not occur when the bookmark is placed elsewhere in the SSML string.

To Reproduce

Steps to reproduce the behavior:

1. Generate SSML that starts with a bookmark followed by a break, e.g., <bookmark mark="1"/> <break time="2s"/>.
2. Process the SSML with the Azure Text to Speech service.
3. In the bookmark_reached callback, log the evt.audio_offset value.
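
The steps above can be sketched in Python roughly as follows. This is a minimal reproduction sketch, not the reporter's original script: the helper names (ticks_to_ms, run_repro) and the way the key and region are passed in are assumptions made for illustration. The SDK import is kept inside run_repro so the helpers can be inspected without the package installed.

```python
# SSML from this report: a bookmark at the very start, followed by a 2 s break.
SSML = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="es-ES">'
    '<voice name="es-ES-AlvaroNeural"><prosody rate="1">'
    '<bookmark mark="1"/> <break time="2s"></break> ¡Hola! ¿Qué tal? '
    '<bookmark mark="2"/> <break time="2s"></break>'
    '</prosody></voice></speak>'
)

TICKS_PER_MS = 10_000  # audio_offset is reported in 100-nanosecond ticks


def ticks_to_ms(ticks):
    """Convert an audio_offset value in ticks to milliseconds (hypothetical helper)."""
    return ticks / TICKS_PER_MS


def run_repro(key, region):
    """Synthesize SSML and return {bookmark name: audio_offset in ticks}.

    Hypothetical helper; key and region are your own Speech resource values.
    """
    # Imported here so the helpers above work without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    offsets = {}

    def on_bookmark(evt):
        # evt.text is the bookmark's mark attribute; evt.audio_offset is in ticks.
        offsets[evt.text] = evt.audio_offset
        print(f"bookmark {evt.text!r}: audio_offset={evt.audio_offset} ticks "
              f"({ticks_to_ms(evt.audio_offset):.0f} ms)")

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None keeps the synthesized audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    synthesizer.bookmark_reached.connect(on_bookmark)

    result = synthesizer.speak_ssml_async(SSML).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"synthesis did not complete: {result.reason}")
    return offsets
```

Calling run_repro with a valid key and region prints one line per bookmark; the offset logged for bookmark "1" is the value this issue is about.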

Example SSML:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="es-ES"><voice name="es-ES-AlvaroNeural"><prosody rate="1"><bookmark mark="1"/> <break time="2s"></break> ¡Hola! ¿Qué tal? <bookmark mark="2"/> <break time="2s"></break></prosody></voice></speak>

Expected behavior

The evt.audio_offset value of the first bookmark should not include the subsequent break time. Because the bookmark is placed at the very start of the SSML, its evt.audio_offset should be 0.
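
To make the expectation concrete, a small check like the one below could be applied to the logged value. It is only a sketch: the function name, the 50 ms tolerance, and the assumption that a fully included 2 s break shows up as roughly 20,000,000 ticks are illustrative choices, not figures taken from this report.

```python
TICKS_PER_SECOND = 10_000_000  # audio_offset uses 100-nanosecond ticks


def classify_first_bookmark(offset_ticks, break_seconds=2.0, tolerance_ms=50.0):
    """Classify the first bookmark's offset (hypothetical helper).

    Returns "expected" if the offset is about 0, "includes_break" if it is
    at least about as long as the following break, and "unclear" otherwise.
    """
    offset_ms = offset_ticks / TICKS_PER_SECOND * 1000
    break_ms = break_seconds * 1000
    if offset_ms <= tolerance_ms:
        return "expected"
    if offset_ms >= break_ms - tolerance_ms:
        return "includes_break"
    return "unclear"
```

For example, classify_first_bookmark(0) corresponds to the expected behavior, while an offset near 20,000,000 ticks for the SSML above would indicate that the break duration was counted.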

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech 1.29.0

Platform, Operating System, and Programming Language

OS: Any
Programming language: Python

About this issue

State: open
Created a year ago
Comments: 15 (2 by maintainers)

Most upvoted comments

Tracked as a long-term fix; we have put it into the backlog for future planning and will continue to follow up on it.

Kerry-LinZhang on Jul 27, 2023