BotBuilder-Samples: experimental nodejs for Direct Line Speech does not respond to streaming audio

I implemented the experimental code here (I understand it is experimental, but I think it should at least work fully, correct?)

https://github.com/microsoft/BotBuilder-Samples/tree/master/experimental/directline-speech/javascript_nodejs/02.echo-bot

I connect to the WebSocketConnector from a Python websocket client, capturing audio with pyaudio. Python code:

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
for i in range(0, (RATE // CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK, exception_on_overflow=False)
    frames.append(data)
    r = requests.get(url = WEBSOCKET_HOST, stream=True)
    s.send(pickle.dumps(frames), opcode=websocket.ABNF.OPCODE_BINARY)

Logging output from the nodejs code:

restify listening to http://[::]:3978

Get Bot Framework Emulator: https://aka.ms/botframework-emulator

To talk to your bot, open the emulator select "Open Bot"
websocket
Creating socket for WebSocket connection.
Creating server for WebSocket connection.
Listening on WebSocket server.

The "Creating socket… " message appears when I open the stream and send the audio to the websocket. Nothing happens beyond this: the bot's onMessage handler is never triggered by the audio stream. What else does the user need to do?
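One detail in the client loop above can be checked independently of the bot: `frames` keeps growing, so `pickle.dumps(frames)` re-sends every earlier chunk on each iteration, and the payload is a Python-specific pickle rather than raw audio bytes. A minimal sketch of the difference, using silent dummy chunks in place of the pyaudio stream (the `read_chunk` helper and the three-iteration loop are assumptions made for illustration):

```python
import pickle

CHUNK = 512            # frames per buffer, as in the snippet above
BYTES_PER_SAMPLE = 2   # paInt16 is 16-bit; the snippet records mono

def read_chunk():
    # Hypothetical stand-in for stream.read(CHUNK): one buffer of silence.
    return b"\x00" * (CHUNK * BYTES_PER_SAMPLE)

frames = []
pickled_sizes = []
raw_sizes = []

for _ in range(3):
    data = read_chunk()
    frames.append(data)
    # What the loop above sends: a pickle of ALL chunks recorded so far.
    pickled_sizes.append(len(pickle.dumps(frames)))
    # Sending only the newest chunk keeps every message the same size.
    raw_sizes.append(len(data))

print(pickled_sizes)  # grows on every iteration
print(raw_sizes)      # stays constant
```

Changing this alone would not necessarily make the bot respond, but it does keep each websocket message down to one chunk of plain PCM bytes.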

About this issue

State: closed
Created 5 years ago
Comments: 15 (6 by maintainers)

Most upvoted comments

ok, thanks for the information and clarification @DDEfromOR. I think the term ‘streaming’ can indeed be misleading: it can mean streaming audio from an application directly to the Node.js example echo bot, or setting up the DLS channel separately and sending the audio that way. I’d appreciate clearer examples that distinguish the two. I’m not sure it makes sense to close this, because I would think you’d want to make sure the docs are updated before you do so…

cindyloo on Sep 17, 2019

I’m pretty late to this thread, but FWIW @mdrichardson is correct that the audio stream needs to be sent to the Direct Line Speech channel which then converts the audio to text and sends the result to the bot’s endpoint. @cindyloo is correct that the websocket connection to the bot technically allows a bot to receive streamed audio, but the current samples don’t support this and the intent with Direct Line Speech is for bots to continue to operate on text.
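The flow described above can be sketched as a toy model. Everything in it is hypothetical: the function names, the fake transcript table, and the reply format are illustrative assumptions, not Direct Line Speech or Bot Framework APIs. The point is only that the bot's handler sees text, never audio bytes:

```python
# Toy model: client audio -> channel (speech-to-text) -> bot (text only).

# Hypothetical stand-in for the channel's speech recognition step.
FAKE_TRANSCRIPTS = {b"<audio: hello>": "hello"}

def channel_transcribe(audio: bytes) -> str:
    # Look up a canned transcript; unknown audio yields an empty string.
    return FAKE_TRANSCRIPTS.get(audio, "")

def bot_on_message(activity: dict) -> dict:
    # The bot only ever inspects the text of a message activity.
    return {"type": "message", "text": f"You said: {activity['text']}"}

def channel_handle_audio(audio: bytes) -> dict:
    # The channel, not the bot, turns audio into a text activity.
    text = channel_transcribe(audio)
    activity = {"type": "message", "text": text}
    return bot_on_message(activity)

reply = channel_handle_audio(b"<audio: hello>")
print(reply["text"])
```

In this model, audio sent straight to the bot's own websocket endpoint never passes through the transcription step, which would be consistent with onMessage never firing in the original report.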

All of that being said, the bigger issue is the experimental samples are outdated and need to be fixed. There is now a public preview build of the Node.js library that can be added to bots to enable the web socket connection used by Direct Line Speech.

TLDR: The updated sample will look something like this:

Modify the call to create a Restify server to have it also handle websocket upgrades:

let server = restify.createServer({ handleUpgrades: true });

Then add the following code to connect an incoming web socket request with the “streaming” adapter:

server.get('/api/messages', function upgradeRoute(req, res) {
    const adapter = new BotFrameworkStreamingAdapter(bot);
    adapter.connectWebSocket(req, res, {
        appId: process.env.MicrosoftAppId,
        appPassword: process.env.MicrosoftAppPassword,
        channelService: process.env.ChannelService
    });
});

From here the bot endpoint is able to accept web socket GET requests from the channel and establish the “streaming” connection.

One of the main sources of confusion we’re trying to clear up with better documentation and samples is around the term ‘streaming’.

The bot upgrade to speak with Direct Line Speech and similar channels is “streaming” in the sense that a single connection is established and used for all communication between the channel and the bot. This eliminates the overhead of the normal REST setup, with the goal of reducing latency so the bot can respond to user interactions without pauses. The main driving factor was letting existing bots that were built around text interactions upgrade seamlessly to working with Direct Line Speech.

Meanwhile, the client connection to the channel is “streaming” in the sense that it really does send audio (though AFAIK the audio is actually sent in clips and not a live stream here, either).

In a nutshell, the protocol used by the bot hasn’t changed; we’ve only added support for new transports and opened the door to developing a real streaming protocol (or, much more likely, adopting an existing one) in the future.

Thank you for trying out our experimental tech! You have no idea how much we appreciate developers willing to get their hands dirty and put up with all of the pain points involved with works in progress. I’ll get in touch with the samples team about getting the experimental bot designs updated.

DDEfromOR on Sep 11, 2019

thanks Jessica - here is my calling code in python which obtains the audio stream from my client

import pyaudio
import pickle
import websocket
#record
CHUNK = 512
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS = 10

HOST = ''    # The remote host
PORT = 3978              # The same port as used by the server
WEBSOCKET_HOST = 'ws://127.0.0.1:3978/api/messages'

p = pyaudio.PyAudio()
s = websocket.create_connection(WEBSOCKET_HOST)

s.send('Hii')

for i in range(0, p.get_device_count()):
    print(i, p.get_device_info_by_index(i)['name'])



print("open stream...")

frames = []

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print((RATE / CHUNK) * RECORD_SECONDS)

for i in range(0, (RATE // CHUNK * RECORD_SECONDS)):

    data = stream.read(CHUNK, exception_on_overflow=False)
    frames.append(data)
    #r = requests.get(url = WEBSOCKET_HOST, stream=True)
    s.send(pickle.dumps(frames), opcode=websocket.ABNF.OPCODE_BINARY)


print("---done recording---")

###
#waveFile = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
#waveFile.setnchannels(CHANNELS)
#waveFile.setsampwidth(p.get_sample_size(FORMAT))
#waveFile.setframerate(RATE)
#waveFile.writeframes(b''.join(frames))
#waveFile.close()

####


stream.stop_stream()
stream.close()
p.terminate()
s.close()

print("*closed")

The JavaScript code is just the example code from this repo.

cindyloo on Aug 27, 2019