openai-node: Non-ASCII tokens are corrupted sometimes when using the streaming API

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • This is an issue with the Node library

Describe the bug

When using the streaming API, tokens are sometimes corrupted: a multi-byte character is replaced by two or more \uFFFD replacement characters. For example:

{
  choices: [ { text: ' из��естни' } ],
}

when the token received is actually supposed to be ' известни'.

The issue occurs because LineDecoder does not handle multi-byte UTF-8 characters that straddle chunk boundaries. Instead of using a separate TextDecoder instance per buffer, perhaps it should use a single stateful decoder (such as a TextDecoderStream) for the entire stream.
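A minimal sketch of the failure mode, independent of the library internals: decoding each chunk with a fresh TextDecoder cannot carry a partial UTF-8 sequence over to the next chunk, whereas a single decoder called with { stream: true } buffers the incomplete bytes. (The split point below is chosen by hand to land inside a two-byte Cyrillic character; it is an illustration, not the library's actual chunking.)

```typescript
// ' известни' encodes to UTF-8 with 2 bytes per Cyrillic letter.
const bytes = new TextEncoder().encode(" известни");

// Cut inside the second letter so one character straddles the boundary.
const chunk1 = bytes.slice(0, 4);
const chunk2 = bytes.slice(4);

// Buggy pattern: a new decoder per chunk. The dangling lead byte at the
// end of chunk1 and the orphaned continuation byte at the start of
// chunk2 each decode to U+FFFD.
const broken =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);
console.log(broken); // ' и\uFFFD\uFFFDвестни'

// Correct pattern: one stateful decoder; { stream: true } tells it more
// bytes may follow, so the partial sequence is held until the next call.
const decoder = new TextDecoder();
const fixed =
  decoder.decode(chunk1, { stream: true }) + decoder.decode(chunk2);
console.log(fixed); // ' известни'
```

This is also exactly what TextDecoderStream does internally for every chunk that passes through it, which is why wiring one decoder across the whole stream fixes the corruption.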

To Reproduce

  1. Send a streaming completion request that will get non-ASCII tokens as a response.
  2. Observe the output. With some probability, some of the tokens will be corrupted.

Code snippets

No response

OS

Linux

Node version

Node v18.19.1

Library version

openai v4.14.2

About this issue

  • Original URL
  • State: closed
  • Created 4 months ago
  • Comments: 18

Most upvoted comments

Sorry, we’ve been a bit delayed here - we hope to take another crack tomorrow.