vocode-python: [EPD-458] Openai completions stopping

After making minimal changes to the chat.py example to tailor it for a golf booking chatbot flow, the OpenAI completions consistently stop partway through a response.

The transcript below has been reproduced many times. When the conversation reaches this point, the system responds "Thank you for letting me know." but doesn't send the following sentence asking the user the next question.

AI: Hello, I'm Tom from the golf course. How may I help you?
Human: hey i want to book comp
Human: DEBUG:__main__:Responding to transcription
AI: Sure, I can help you with that.
AI: Are you a member of our club?
yep
Human: DEBUG:__main__:Responding to transcription
AI: Great!
AI: Could you please provide me with your member number?
12345
Human: DEBUG:__main__:Responding to transcription
AI: Thank you for providing your member number.
AI: May I have your name, please?
cam
Human: DEBUG:__main__:Responding to transcription
AI: Thank you, Cam.
AI: How many players will be participating in the competition?
2
Human: DEBUG:__main__:Responding to transcription
AI: Thank you for letting me know.

Human: DEBUG:__main__:Responding to transcription
ERROR:asyncio:Unclosed connection
client_connection: Connection<ConnectionKey(host='api.openai.com', port=443, is_ssl=True, ssl=None, proxy=None, proxy_auth=None, proxy_headers_hash=None)>

The response stops there; after I send an empty message to the system, I receive the asyncio error above and the conversation continues as normal.

EDIT: I should state that the completion stops at "Thank you for letting me know." 100% of the time, but the asyncio error only happens occasionally.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (6 by maintainers)

Most upvoted comments

I have identified the problem. Vocode splits the OpenAI response into sentences in order to synthesize them as fast as possible, and after each sentence is spoken, it adds that utterance to the transcript associated with the ChatGPT Agent. As a result, OpenAI's response ends up in the transcript split apart by sentence. So, when the user sends another message and this transcript is reformatted and sent back to OpenAI to generate the next message, the previous assistant message arrives split into multiple assistant messages.

For example, when recreating @cammoore54’s example with temperature=0 and the gpt-3.5-turbo-16k-0613 model, this is what is sent to the OpenAI API when the user says “yep”:

{'role': 'assistant', 'content': "Hello, I'm Tom from the golf course. How may I help you?"},
{'role': 'user', 'content': 'hey i want to book comp'},
{'role': 'assistant', 'content': 'Sure, I can help you with that.'},
{'role': 'assistant', 'content': 'Are you a member of our golf club?'},
{'role': 'user', 'content': 'yep'},

This is what should be sent (i.e., what you would put into the OpenAI playground):

{'role': 'assistant', 'content': "Hello, I'm Tom from the golf course. How may I help you?"},
{'role': 'user', 'content': 'hey i want to book comp'},
{'role': 'assistant', 'content': 'Sure, I can help you with that. Are you a member of our golf club?'},
{'role': 'user', 'content': 'yep'},

This difference is the source of the problem: if the previous chat history contains only one-sentence assistant responses, then future assistant messages will also be only one sentence long.
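
As a minimal sketch of the kind of fix this implies (my own illustration, not Vocode's actual code), consecutive assistant messages could be merged back into single messages before the transcript is sent to the API:

def merge_consecutive_assistant_messages(messages):
    # Collapse runs of consecutive assistant messages into one message,
    # rejoining the split sentences with a space.
    merged = []
    for message in messages:
        if merged and message['role'] == 'assistant' and merged[-1]['role'] == 'assistant':
            merged[-1]['content'] += ' ' + message['content']
        else:
            merged.append(dict(message))
    return merged

Applied to the split transcript above, this produces the properly formatted version.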

Good catch finding this bug! The messages should definitely not be split apart when they are sent back to the OpenAI API.

Here are the differences in the OpenAI playground (both with temperature=0 and the gpt-3.5-turbo-16k-0613 model):

[Screenshot: formatted properly, the second sentence is generated.]

[Screenshot: formatted how Vocode currently does it, only one sentence is generated.]

So, it seems that the OpenAI playground and the OpenAI Python library produce exactly the same response (testing with temperature=0), and setting the stream option or using the async vs. sync API doesn't make a difference either. The OpenAI Python issue https://github.com/openai/openai-python/issues/555 is probably not an issue after all: depending on how the messages are formatted, the second sentence is simply not generated.
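
For reference, the non-streaming sync side of that comparison might look like the following (a sketch, assuming the pre-1.0 openai Python library and the split transcript from the example above):

import openai

# The split-apart transcript from the example above.
messages = [
    {'role': 'assistant', 'content': "Hello, I'm Tom from the golf course. How may I help you?"},
    {'role': 'user', 'content': 'hey i want to book comp'},
    {'role': 'assistant', 'content': 'Sure, I can help you with that.'},
    {'role': 'assistant', 'content': 'Are you a member of our golf club?'},
    {'role': 'user', 'content': 'yep'},
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k-0613",
    messages=messages,
    temperature=0,
)
print(response['choices'][0]['message']['content'])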

We are currently working on a fix for this! Thanks 😄!

Thanks @bjquinn.

I have tested in isolation with an async implementation using the code below, and I get the desired responses 100% of the time (the same as the playground). Therefore it has to be the implementation in Vocode.

@ajar98 @Kian1354 Do you have the capacity to look into this? I am happy to support but am still familiarising myself with the codebase.

import asyncio

import openai

messages = []  # conversation history; the original snippet assumed this was defined elsewhere

async def generate_response(messages):
    # Stream the completion, yielding each delta chunk as it arrives.
    async for chunk in await openai.ChatCompletion.acreate(
            model="gpt-3.5-turbo-16k-0613",
            messages=messages,
            stream=True,
            # functions=functions,  # the original test also passed function definitions here
    ):
        chunk_message = chunk['choices'][0]['delta']
        yield chunk_message

async def handle_convo():
    while True:
        message = input("User : ")
        if message:
            messages.append({"role": "user", "content": message})
            collected_messages = []
            async for item in generate_response(messages):
                print(item)
                collected_messages.append(item)

            # Join the streamed deltas and append the full reply to the
            # history as a single assistant message.
            full_reply_content = ''.join(m.get('content') or '' for m in collected_messages)
            print(f"Full conversation received: {full_reply_content}")
            messages.append({"role": "assistant", "content": full_reply_content})

asyncio.run(handle_convo())
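
Note that this snippet joins all the streamed deltas and appends the reply to the history as a single assistant message, mirroring the properly formatted transcript above, which is consistent with the diagnosis that Vocode's per-sentence splitting of the transcript is what truncates the completion.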