openai-node: Non-ASCII tokens are sometimes corrupted when using the streaming API
Confirm this is a Node library issue and not an underlying OpenAI API issue
- This is an issue with the Node library
Describe the bug
When using the streaming API, tokens sometimes get corrupted: characters are replaced by two or more `\uFFFD` replacement characters. For example:
```
{
  choices: [ { text: ' из��естни' } ],
}
```
when the token received is actually supposed to be ' известни'.
The issue occurs because `LineDecoder` does not handle multi-byte characters that span chunk boundaries. Instead of using a separate `TextDecoder` instance per buffer, perhaps it should use a single `TextDecoderStream` (or a single `TextDecoder` invoked with `{ stream: true }`) for the entire stream.
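A minimal sketch of the failure mode (not the library's actual code): the UTF-8 bytes for ' известни' are split mid-character across two chunks, and per-chunk decoding produces replacement characters where a single streaming decoder does not.

```typescript
// UTF-8 encoding of the example token from this issue.
const bytes = new TextEncoder().encode(" известни");

// Split so the boundary falls inside a 2-byte Cyrillic character.
const chunk1 = bytes.slice(0, 4); // ends with the first byte of 'з'
const chunk2 = bytes.slice(4);    // starts with a lone continuation byte

// A fresh TextDecoder per chunk (roughly the buggy behavior):
// each incomplete sequence becomes U+FFFD.
const broken =
  new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2);
// broken === " и\uFFFD\uFFFDвестни"

// One decoder with { stream: true } buffers the partial character
// until the rest of its bytes arrive.
const decoder = new TextDecoder();
const fixed =
  decoder.decode(chunk1, { stream: true }) + decoder.decode(chunk2);
// fixed === " известни"
```

The `stream: true` option tells the decoder that more input follows, so a trailing partial sequence is held back rather than emitted as `\uFFFD`; a final `decode()` call without the option flushes any remainder.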
To Reproduce
- Send a streaming completion request that will get non-ASCII tokens as a response.
- Observe the output. With some probability, some of the tokens will be corrupted.
Code snippets
No response
OS
Linux
Node version
Node v18.19.1
Library version
openai v4.14.2
About this issue
- Original URL
- State: closed
- Created 4 months ago
- Comments: 18
Sorry, we’ve been a bit delayed here - we hope to take another crack tomorrow.