cognitive-services-speech-sdk: Error for MULAW push stream on Windows

Hello,

I am trying to stream Twilio audio (MULAW) to the Speech to Text service, similar to #446, but I’m running C# on Windows.

I have GStreamer installed and am using version 1.11.0 of Microsoft.CognitiveServices.Speech.

  • OS: Windows
  • Hardware: x64
  • Programming language: C#

private SpeechRecognizer _recognizer;
private PushAudioInputStream _inputStream;
private AudioConfig _audioInput;

public async Task Start()
{
    var config = SpeechConfig.FromSubscription(
        _projectSettings.AzureSpeechServiceSubscriptionKey,
        _projectSettings.AzureSpeechServiceRegionName);

    // Declare the input as compressed MULAW audio.
    var audioFormat = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.MULAW);

    _inputStream = AudioInputStream.CreatePushStream(audioFormat);
    _audioInput = AudioConfig.FromStreamInput(_inputStream);

    _recognizer = new SpeechRecognizer(config, _audioInput);
    _recognizer.SessionStarted += RecognizerStarted;
    _recognizer.Recognized += RecognizerRecognized;
    _recognizer.Canceled += RecognizerCancelled;

    await _recognizer.StartContinuousRecognitionAsync();
}

I am taking the raw Twilio stream (media.payload, see here), base64-decoding it, and feeding it directly into the push stream with no buffering:

public Task Transcribe(byte[] audioBytes)
{
    _inputStream.Write(audioBytes);
    return Task.CompletedTask;
}
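
For context, here is a minimal sketch of the base64-decoding step described above. It assumes the Twilio Media Streams message is JSON with an "event" field equal to "media" and the audio under media.payload; the class and method names (TwilioMediaSketch, TryGetAudio) are hypothetical and not part of any Twilio or Microsoft library.

```csharp
using System;
using System.Text.Json;

public static class TwilioMediaSketch
{
    // Extracts the raw audio bytes from one WebSocket text message.
    // Returns false for anything that is not a "media" event with a string payload.
    // Assumption: the message shape is {"event":"media","media":{"payload":"<base64>"}}.
    // Throws JsonException on malformed JSON and FormatException on malformed base64.
    public static bool TryGetAudio(string json, out byte[] audio)
    {
        audio = Array.Empty<byte>();

        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object)
            return false;

        // Only "media" events carry audio; other events are skipped.
        if (!root.TryGetProperty("event", out JsonElement ev) ||
            ev.ValueKind != JsonValueKind.String ||
            ev.GetString() != "media")
            return false;

        if (!root.TryGetProperty("media", out JsonElement media) ||
            media.ValueKind != JsonValueKind.Object ||
            !media.TryGetProperty("payload", out JsonElement payload) ||
            payload.ValueKind != JsonValueKind.String)
            return false;

        string encoded = payload.GetString() ?? string.Empty;

        // The decoded bytes are headerless mu-law samples, one byte per sample.
        audio = Convert.FromBase64String(encoded);
        return true;
    }
}
```

The bytes returned here are what would be passed to Transcribe. They carry no RIFF/WAV header, which is consistent with the "Stream is no RIFF stream" error reported below.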

I get the following error:

Message: The stream is of a different type than handled by this element.
DebugInfo: riff-read.c(262): gst_riff_parse_file_header (): /GstPipeline:pipeline/GstWavParse:wavparse: Stream is no RIFF stream: 0x7a6f7afe
SessionId: 3aeeac302e6f410a9411751dd25512c8

Please let me know if my setup is incorrect, as I could find no examples of a stream-to-push-stream scenario.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 21 (8 by maintainers)

Most upvoted comments

@twilio-jyoung - no problem, I've updated the repo here

Hi @twilio-jyoung

I was able to get it working (credit to Joris Kalz from Microsoft, who gave me the solution) using the following two files:

https://www.codeproject.com/Articles/14237/Using-the-G711-standard

  • MuLawDecoder.cs
  • MuLawEncoder.cs

Let me know if you need me to upload my feature branch with a full example.

It's a lot of code, and I'm not even sure what it does 😃, so it would really be better if Microsoft supported the MULAW/8000 format or Twilio was able to support streaming in WAV/PCM (I was told this was on the long-term roadmap).
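
For readers wondering what such a decoder does, the following is a minimal sketch of standard G.711 mu-law expansion to 16-bit linear PCM. It is not the CodeProject code referenced above; the names MuLawSketch, DecodeSample, and DecodeToPcm16 are hypothetical and chosen only for illustration.

```csharp
using System;

public static class MuLawSketch
{
    // Bias added by the G.711 mu-law encoder before segment selection.
    private const int Bias = 0x84;

    // Expands one 8-bit mu-law code word to a signed 16-bit linear sample.
    public static short DecodeSample(byte muLaw)
    {
        // Mu-law bytes are transmitted bit-inverted.
        int u = ~muLaw & 0xFF;

        int mantissa = u & 0x0F;        // low 4 bits: position inside the segment
        int exponent = (u & 0x70) >> 4; // next 3 bits: segment number
        bool negative = (u & 0x80) != 0; // top bit: sign

        int magnitude = ((mantissa << 3) + Bias) << exponent;
        return (short)(negative ? Bias - magnitude : magnitude - Bias);
    }

    // Converts a buffer of mu-law bytes to little-endian 16-bit PCM bytes
    // (twice as long as the input).
    public static byte[] DecodeToPcm16(byte[] muLawBytes)
    {
        if (muLawBytes == null) throw new ArgumentNullException(nameof(muLawBytes));

        var pcm = new byte[muLawBytes.Length * 2];
        for (int i = 0; i < muLawBytes.Length; i++)
        {
            short sample = DecodeSample(muLawBytes[i]);
            pcm[2 * i] = (byte)(sample & 0xFF);            // low byte first
            pcm[2 * i + 1] = (byte)((sample >> 8) & 0xFF); // then high byte
        }
        return pcm;
    }
}
```

With a decoder like this, the push stream would presumably be created with AudioStreamFormat.GetWaveFormatPCM(8000, 16, 1) instead of the compressed MULAW format, and the output of DecodeToPcm16 written to it. That assumes the service accepts 8 kHz, 16-bit mono PCM input, which is worth confirming against the current SDK documentation.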

@garethkelly, I cannot tell you how much I appreciate you sharing this with me. Drop me an email and I’ll ship you some Twilio swag!

@twilio-jyoung As I mentioned, we do not currently support Twilio, but we would be really happy to add support for it. Could you provide sample code (C#/C++) that converts a Twilio stream to a wave file using GStreamer? Also, if possible, could you attach a sample Twilio stream for our testing?

I too ran into this problem and only found this thread after working all weekend trying to get it to work.

@amitkumarshukla, I work for Twilio. Let me know if there is anything I can do to provide more details about how our streams work, or to help you get set up with an account and unblock your testing. We have many customers who would like this integration to work (hence my PoC efforts over the weekend).

Our media payloads are base64-encoded mulaw/8000. We’ve got pretty much every other major player built out and working except this one:

  • Amazon Transcribe - https://github.com/TwilioDevEd/talkin-cedric-node
  • Google Cloud Speech - https://github.com/twilio/media-streams/tree/master/node/realtime-transcriptions, https://github.com/twilio/media-streams/tree/master/java/realtime-transcriptions
  • IBM Watson - https://github.com/twilio/media-streams/tree/master/node/keyword-detection
  • Google Dialogflow - https://github.com/twilio/media-streams/tree/master/node/dialogflow-integration

and quite a few others…