amazon-transcribe-streaming-sdk: amazon_transcribe.exceptions.BadRequestException: Signature expired Exception

Hi,

I used this sample to test with a local WAV file; the only change I made was pointing the parameter at my own audio file. The script worked perfectly and I got the transcript text for the first 5 minutes.
But after 5 minutes, I got the exception below:

  File "test_amazon_transcribe.py", line 82, in basic_transcribe
    await asyncio.gather(write_chunks(), handler.handle_events())
  File "/root/apps/build/python36/lib/python3.6/site-packages/amazon_transcribe/handlers.py", line 26, in handle_events
    async for event in self._transcript_result_stream:
  File "/root/apps/build/python36/lib/python3.6/site-packages/amazon_transcribe/eventstream.py", line 666, in __aiter__
    parsed_event = self._parser.parse(event)
  File "/root/apps/build/python36/lib/python3.6/site-packages/amazon_transcribe/deserialize.py", line 147, in parse
    raise self._parse_event_exception(raw_event)
amazon_transcribe.exceptions.BadRequestException: Signature expired: 20201123T084949Z is now earlier than 20201123T084949Z (20201123T085449Z - 5 min.)
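The timestamps in the message can be checked directly. This is a minimal sketch, assuming the first timestamp is when the event was signed and the one in parentheses is the service's clock when it rejected the event. The variable names are illustrative and do not come from the SDK.

```python
from datetime import datetime, timedelta

# AWS-style compact UTC timestamps, copied from the error message above.
FMT = "%Y%m%dT%H%M%SZ"
signed_at = datetime.strptime("20201123T084949Z", FMT)
server_now = datetime.strptime("20201123T085449Z", FMT)

# Assumption: signatures are accepted for 5 minutes, per the "- 5 min." hint.
validity_window = timedelta(minutes=5)

signature_age = server_now - signed_at
print(signature_age)                     # 0:05:00
print(signature_age >= validity_window)  # True
```

Under that reading, the rejected event was signed five minutes before the service looked at it, which is consistent with the failure showing up about five minutes into the run.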

I have rerun the script a few times, but the issue persists. Could you please help me?

About this issue

State: open
Created 4 years ago
Comments: 20 (6 by maintainers)

Most upvoted comments

Hi @lalogonzalez

For the first issue, I used the sample code from this example. All parameters are the same (chunk_size, sample_rate, …); the only difference is that my local audio file is larger than 100 MB and longer than 1 hour. The original example did not reproduce the issue because its audio is too short. To fix the first issue, I added one line that sleeps for 0.5 seconds right after each audio chunk is sent:

await stream.input_stream.send_audio_event(audio_chunk=chunk)
time.sleep(0.5)

I am not sure whether this is a proper fix or just a workaround, but I definitely no longer get the Signature expired exception.
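One caveat with the snippet above: `time.sleep` blocks the whole event loop, so the coroutine that reads transcript events cannot run while the sender sleeps. Below is a minimal sketch of the same pacing with `asyncio.sleep`. `send_paced` and `fake_send` are hypothetical names; `fake_send` stands in for `stream.input_stream.send_audio_event` so the example runs without AWS credentials. It uses `asyncio.run`, which needs Python 3.7 or newer.

```python
import asyncio


async def send_paced(chunks, send, delay_s):
    """Send each chunk, then yield to the event loop for delay_s seconds."""
    for chunk in chunks:
        await send(chunk)
        # Unlike time.sleep, this lets other coroutines (e.g. the result
        # handler) keep running while the sender waits.
        await asyncio.sleep(delay_s)


async def demo():
    sent = []

    async def fake_send(chunk):  # hypothetical stand-in for the real sender
        sent.append(chunk)

    await send_paced([b"a", b"b", b"c"], fake_send, delay_s=0.01)
    return sent


sent_chunks = asyncio.run(demo())
print(sent_chunks)  # [b'a', b'b', b'c']
```

With the real stream, the `send` argument could be a small wrapper that calls `stream.input_stream.send_audio_event(audio_chunk=chunk)`. The right delay depends on how much audio each chunk holds, so 0.5 seconds is not a universal value.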

But I still have a second issue: "The request signature we calculated does not match the signature" is raised every time after roughly 30 minutes or more. For this one, I am thinking of retrying with a new TranscribeStreamingClient and reusing the session_id. Even if that works, it feels like a workaround, since I would have to retry every time.
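The retry idea could be structured roughly as follows. This is only a sketch of a generic retry loop: `run_with_retries` and `flaky` are hypothetical names, and `flaky` simulates the failure so the example runs offline. Nothing in this thread confirms that the service lets you resume with the same session_id, or where in the audio a new attempt should start.

```python
import asyncio


async def run_with_retries(attempt, max_attempts=3, retry_on=(Exception,)):
    """Await attempt(n) until it succeeds or max_attempts is reached."""
    for n in range(1, max_attempts + 1):
        try:
            return await attempt(n)
        except retry_on:
            if n == max_attempts:
                raise  # out of attempts: surface the last error


async def demo():
    calls = []

    async def flaky(n):  # hypothetical: fails twice, then succeeds
        calls.append(n)
        if n < 3:
            raise RuntimeError("simulated signature mismatch")
        return "done"

    result = await run_with_retries(flaky, max_attempts=3, retry_on=(RuntimeError,))
    return result, calls


result, calls = asyncio.run(demo())
print(result, calls)  # done [1, 2, 3]
```

In a real script, each attempt would build a fresh TranscribeStreamingClient and start a new stream, and `retry_on` would be narrowed to the exception type actually raised.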

Has anyone successfully tested an audio file longer than 1 hour?

bangnguyen on Nov 26, 2020

@bangnguyen

Here’s a quick example of what I explained in my previous message, using a modified version of the file-based example.

Big disclaimer: I have not thoroughly tested the following code beyond a couple of test files. I have no clue how robust it is, but it should be close enough to get long-running streams working. In particular, the parsing code makes some assumptions but should work for basic PCM WAV files.
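If the hand-written header parsing turns out to be too strict for a given file, Python's standard `wave` module is one alternative for reading the same metadata. This is a sketch rather than a drop-in replacement: `read_wav_params` is a hypothetical helper, `wave` is synchronous, and it only understands PCM WAV files. The in-memory file exists only to make the example runnable.

```python
import io
import wave


def read_wav_params(source):
    """Return sample rate, channel count, and byte rate of a PCM WAV."""
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
    return {
        "SampleRate": sample_rate,
        "Channels": channels,
        "ByteRate": sample_rate * channels * sample_width,
    }


# Build a tiny 16 kHz, 16-bit mono WAV in memory so the example is runnable.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)  # one second of silence
buf.seek(0)

params = read_wav_params(buf)
print(params)  # {'SampleRate': 16000, 'Channels': 1, 'ByteRate': 32000}
```

`wave.open` also accepts a file path, so `read_wav_params("audio.wav")` would work for a file on disk; the path here is a placeholder.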

import asyncio
import aiofile

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent


async def parse_int(file, byte_length=4):
    chunk = await file.read(byte_length)
    return int.from_bytes(chunk, 'little')


async def parse_wav_metadata(file):
    riff = await file.read(4)
    assert riff == b'RIFF'

    overall_size = await parse_int(file)

    wave = await file.read(4)
    assert wave == b'WAVE'

    fmt = await file.read(4)
    assert fmt == b'fmt '

    fmt_data_len = await parse_int(file)
    fmt_type = await parse_int(file, byte_length=2)
    num_channels = await parse_int(file, byte_length=2)
    sample_rate = await parse_int(file)
    byte_rate = await parse_int(file)
    block_align = await parse_int(file, byte_length=2)
    bits_per_sample = await parse_int(file, byte_length=2)

    # Byte rate should equal (Sample Rate * BitsPerSample * Channels) / 8
    assert (sample_rate * bits_per_sample * num_channels) / 8 == byte_rate

    data_header = await file.read(4)
    assert data_header == b'data'

    data_len = await parse_int(file)

    wav_metadata = {
        'OverallSize': overall_size,
        'FormatLength': fmt_data_len,
        'FormatType': fmt_type,
        'Channels': num_channels,
        'SampleRate': sample_rate,
        'ByteRate': byte_rate,
        'BlockAlign': block_align,
        'BitsPerSample': bits_per_sample,
        'DataLength': data_len,
    }

    return wav_metadata


async def rate_limit(file, byte_rate):
    chunk = await file.read(byte_rate)
    loop = asyncio.get_event_loop()
    last_yield_time = -1.0 # -1 to allow the first yield immediately
    while chunk:
        time_since_last_yield = loop.time() - last_yield_time
        if time_since_last_yield < 1.0:
            # Yield at most once per second: sleep only for whatever is
            # left of that second since the last yield
            await asyncio.sleep(1.0 - time_since_last_yield)
        last_yield_time = loop.time()
        yield chunk
        chunk = await file.read(byte_rate)


class MyEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


async def write_chunks(stream, f, wav_metadata):
    async for chunk in rate_limit(f, wav_metadata['ByteRate']):
        await stream.input_stream.send_audio_event(audio_chunk=chunk)
    await stream.input_stream.end_stream()


async def basic_transcribe(filepath):
    # Set up our client with our chosen AWS region
    client = TranscribeStreamingClient(region="us-west-2")

    async with aiofile.async_open(filepath, 'rb') as f:
        wav_metadata = await parse_wav_metadata(f)

        # Start transcription to generate our async stream
        stream = await client.start_stream_transcription(
            language_code="en-US",
            media_sample_rate_hz=wav_metadata['SampleRate'],
            media_encoding="pcm",
        )

        # Instantiate our handler and start processing events
        await asyncio.gather(
            write_chunks(stream, f, wav_metadata),
            MyEventHandler(stream.output_stream).handle_events(),
        )


loop = asyncio.get_event_loop()
loop.run_until_complete(basic_transcribe('tests/integration/assets/test.wav'))
loop.close()

joguSD on Dec 2, 2020
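To see why sending one `ByteRate`-sized chunk per second approximates real time, here is the arithmetic as a short sketch. It assumes 16 kHz, 16-bit, mono PCM; the thread does not state the actual format of the file in question.

```python
sample_rate = 16_000   # samples per second (assumed)
bits_per_sample = 16   # assumed
channels = 1           # assumed

# Same formula the header check in the sample code uses.
byte_rate = sample_rate * bits_per_sample * channels // 8
print(byte_rate)                   # 32000 bytes of audio per second

one_hour_bytes = byte_rate * 3600
print(one_hour_bytes / 1_000_000)  # 115.2 (MB for one hour of audio)
```

At that format, an hour of audio is a bit over 100 MB, which is consistent with the file described earlier in the thread. Reading such a file from disk without pacing hands the audio to the service far faster than real time.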