langchain: Token usage calculation is not working for ChatOpenAI
Token usage calculation is not working for ChatOpenAI.
How to reproduce
```python
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage,
)

chat = ChatOpenAI(model_name="gpt-3.5-turbo")

with get_openai_callback() as cb:
    result = chat([HumanMessage(content="Tell me a joke")])
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Successful Requests: {cb.successful_requests}")
    print(f"Total Cost (USD): ${cb.total_cost}")
```
Output:
```
Total Tokens: 0
Prompt Tokens: 0
Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0
```
Possible fix
The following patch fixes the issue, but breaks the linter.
```diff
From f60afc48c9082fc6b09d69b8c8375353acc9fc0b Mon Sep 17 00:00:00 2001
From: Fabio Perez <fabioperez@users.noreply.github.com>
Date: Mon, 3 Apr 2023 19:06:34 -0300
Subject: [PATCH] Fix token usage in ChatOpenAI

---
 langchain/chat_models/openai.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/langchain/chat_models/openai.py b/langchain/chat_models/openai.py
index c7ee4bd..a8d5fbd 100644
--- a/langchain/chat_models/openai.py
+++ b/langchain/chat_models/openai.py
@@ -274,7 +274,9 @@ class ChatOpenAI(BaseChatModel, BaseModel):
             gen = ChatGeneration(message=message)
             generations.append(gen)
         llm_output = {"token_usage": response["usage"], "model_name": self.model_name}
-        return ChatResult(generations=generations, llm_output=llm_output)
+        result = ChatResult(generations=generations, llm_output=llm_output)
+        self.callback_manager.on_llm_end(result, verbose=self.verbose)
+        return result
 
     async def _agenerate(
         self, messages: List[BaseMessage], stop: Optional[List[str]] = None
--
2.39.2 (Apple Git-143)
```
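For context on why firing `on_llm_end` fixes the counters: the handler only updates its totals when the callback receives a result whose `llm_output` contains `token_usage`. A simplified sketch of that flow (paraphrased from the field names used above, not the library's verbatim implementation):

```python
# Paraphrased sketch of how an OpenAI callback handler consumes llm_output.
# Attribute names match the cb.* fields printed above; the body is assumed.
def on_llm_end(self, response, **kwargs):
    if response.llm_output and "token_usage" in response.llm_output:
        usage = response.llm_output["token_usage"]
        # The OpenAI API reports these three counts in its `usage` field.
        self.total_tokens += usage.get("total_tokens", 0)
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.successful_requests += 1
```

Without the patch, `ChatOpenAI._generate` never triggers this callback, so every counter stays at zero.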
I tried to change the signature of `on_llm_end` (langchain/callbacks/base.py) to:

```python
async def on_llm_end(
    self, response: Union[LLMResult, ChatResult], **kwargs: Any
) -> None:
```
but this will break many places, so I’m not sure if that’s the best way to fix this issue.
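One way to avoid changing the signature everywhere would be to adapt the `ChatResult` into the `LLMResult` that `on_llm_end` already expects, rather than widening the type. A hedged sketch of that idea (the helper name is mine, not langchain's):

```python
from langchain.schema import ChatResult, LLMResult

def _to_llm_result(chat_result: ChatResult) -> LLMResult:
    # LLMResult.generations is a list of lists (one inner list per prompt),
    # and ChatGeneration subclasses Generation, so wrapping is enough.
    return LLMResult(
        generations=[chat_result.generations],
        llm_output=chat_result.llm_output,
    )
```

The chat model could then call `self.callback_manager.on_llm_end(_to_llm_result(result), verbose=self.verbose)` without touching the callback interface.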
About this issue
- State: closed
- Created a year ago
- Reactions: 12
- Comments: 15 (1 by maintainers)
Still an issue today for me.
@hinthornw this doesn't work for streaming responses, though. Is there any way to make `OpenAICallbackHandler` work with `ChatOpenAI(streaming=True)`? The issue is that `on_llm_end` is entered before the response is complete, which leads to usage being 0.

Having the same issue… following thread.
Related to https://github.com/hwchase17/langchain/pull/1924; please take a look at the discussion there.
@liowalex I guess we really want the count that OpenAI is returning. Note that langchain will retry failed calls, which will also count towards the token rate limit. So input and output tokens are not the complete picture.