llama_index: [Bug]: The Gemini model streaming interface doesn't work as expected.
Bug Description
When using the new Gemini LLM interface with a query_engine or a chat_engine, the response is always None if it is streamed.
For the query engine, None is returned on all queries; for the chat engine, a parsing error is thrown (see the traceback below).
Version
0.9.15.post2
Steps to Reproduce
Run a query engine or a chat engine with Gemini as the LLM, i.e. `llm = Gemini()` followed by `service_context = ServiceContext.from_defaults(llm=llm)`, and ensure that the engine is set to stream. A minimal reproduction sketch follows.
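A reproduction sketch, assuming the 0.9.x import paths, a local `data/` directory of documents, and an illustrative query string (all of these are assumptions, not taken from the report):

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import Gemini

llm = Gemini()
service_context = ServiceContext.from_defaults(llm=llm)

# Any small document set works; "data/" is a placeholder path
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Streaming query engine: per the report, the streamed response is None
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What is this document about?")
response.print_response_stream()  # prints None instead of the answer
```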
Relevant Logs/Tracebacks
## Query engine response for Gemini with openai_embedding as the embed_model:
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Time taken: 14.811168895015726 seconds
A response of None is printed to the terminal.
## Chat engine:
/lib/python3.10/site-packages/llama_index/chat_engine/types.py", line 157, in response_gen
self._unformatted_response += delta
TypeError: can only concatenate str (not "NoneType") to str
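For context, the traceback boils down to concatenating a None delta onto a string buffer; a minimal illustration with hypothetical values, not the engine code itself:

```python
# The chat engine accumulates streamed deltas into a string buffer.
# If the Gemini stream yields a None delta, the concatenation fails:
unformatted_response = ""
delta = None  # what the Gemini stream apparently yields here
unformatted_response += delta  # TypeError: can only concatenate str (not "NoneType") to str
```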
About this issue
- State: closed
- Created 7 months ago
- Comments: 16 (7 by maintainers)
🤖
Hey @gich2009! Good to see you again. Hope you’re having a great day!
Thank you for reporting this issue. Based on the information provided, it seems like the Gemini model streaming interface is returning None when used with a query_engine or a chat_engine in LlamaIndex v0.9.15.post2 due to the way the `stream_complete` and `stream_chat` methods are implemented in the Gemini class.
These methods use the `yield from` statement to return a generator of responses from the `generate_content` and `send_message` methods of the `_model` object, respectively. If these methods do not yield any values, the generator will be empty, and when converted to a list or iterated over, it will return None. This could happen if the prompt or messages do not match any patterns that the model has been trained on, or if the model's generation parameters (like temperature and max_tokens) are set such that it does not generate any output.
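For illustration, here is a hedged sketch of that streaming pattern, written directly against the google-generativeai SDK rather than the actual llama_index implementation (the model name and the function name `stream_complete` here are assumptions for the sketch):

```python
import google.generativeai as genai

model = genai.GenerativeModel("gemini-pro")  # assumed model name

def stream_complete(prompt: str):
    # Streams chunks from the underlying model; this mirrors the
    # yield-from pattern described above, not the exact library code
    response = model.generate_content(prompt, stream=True)
    for chunk in response:
        # If a chunk carries no generated text, downstream consumers
        # that expect a string delta may receive None and fail
        yield chunk.text
```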
As for the parsing error in the chat engine, it could be due to the `chat_message_to_gemini` and `chat_from_gemini_response` utility functions expecting a certain format for the chat messages and responses. If the format of the messages or responses does not match this expected format, a parsing error could occur.
I found two similar issues in the LlamaIndex repository that were solved:
- [Bug]: Chat engine astream chat error (ValueError: achat_stream is None. Cannot asynchronously write to history without achat_stream.), last updated on August 03, 2023. This issue was closed by a pull request: fix astream_chat for chat engines, on August 03, 2023.
- [Bug]: response_gen doesn't generate anything from chat with as_chat_engine in the 0.7.0 version, last updated on July 11, 2023. This issue was closed without any comments or solution.
You might want to check the format of your prompts and messages, adjust the model's generation parameters, or debug the utility functions and the `stream_complete` and `stream_chat` methods to ensure they are working as expected. If you need further assistance, please provide more details about the prompts and messages you are using, and the generation parameters you have set.
For more information, you can refer to the source code of the Gemini class in the LlamaIndex codebase: llama_index/llms/gemini.py
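One way to do that debugging is to call the LLM's streaming method directly, bypassing the engine layers; a sketch assuming the 0.9.x import path and an arbitrary prompt:

```python
from llama_index.llms import Gemini

llm = Gemini()
# Each item is a CompletionResponse; a None `delta` here would
# reproduce the TypeError seen in the chat engine
for chunk in llm.stream_complete("Say hello in one sentence."):
    print(repr(chunk.delta))
```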
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.