openai-node: Non-ASCII tokens are corrupted sometimes when using the streaming API

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • This is an issue with the Node library

Describe the bug

When using the streaming API, tokens are sometimes corrupted: a multi-byte character is replaced by two or more \uFFFD replacement characters. For example:

{
  choices: [ { text: ' из��естни' } ],
}

when the token received is actually supposed to be ' известни'.

The issue occurs because LineDecoder does not handle multi-byte UTF-8 characters that straddle chunk boundaries. Instead of using a separate TextDecoder instance per buffer, perhaps it should use a single stateful decoder (such as a TextDecoderStream) for the entire stream.
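A minimal sketch of the failure mode, independent of the library internals: decoding each chunk with a fresh TextDecoder cannot carry a partial UTF-8 sequence over to the next chunk, whereas a single decoder called with { stream: true } buffers the incomplete bytes. (The split point below is chosen by hand to land inside a two-byte Cyrillic character; it is an illustration, not the library's actual chunking.)

```typescript
// ' известни' encodes to UTF-8 with 2 bytes per Cyrillic letter.
const bytes = new TextEncoder().encode(" известни");

// Cut inside the second letter so one character straddles the boundary.
const chunk1 = bytes.slice(0, 4);
const chunk2 = bytes.slice(4);

// Buggy pattern: a new decoder per chunk. The dangling lead byte at the
// end of chunk1 and the orphaned continuation byte at the start of
// chunk2 each decode to U+FFFD.
const broken =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);
console.log(broken); // ' и\uFFFD\uFFFDвестни'

// Correct pattern: one stateful decoder; { stream: true } tells it more
// bytes may follow, so the partial sequence is held until the next call.
const decoder = new TextDecoder();
const fixed =
  decoder.decode(chunk1, { stream: true }) + decoder.decode(chunk2);
console.log(fixed); // ' известни'
```

This is also exactly what TextDecoderStream does internally for every chunk that passes through it, which is why wiring one decoder across the whole stream fixes the corruption.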

To Reproduce

  1. Send a streaming completion request that will get non-ASCII tokens as a response.
  2. Observe the output. With some probability, some of the tokens will be corrupted.

Code snippets

No response

OS

Linux

Node version

Node v18.19.1

Library version

openai v4.14.2

About this issue

  • Original URL
  • State: closed
  • Created 4 months ago
  • Comments: 18

Most upvoted comments

Sorry, we’ve been a bit delayed here - we hope to take another crack tomorrow.