langchain: Token usage calculation is not working for ChatOpenAI

When calling ChatOpenAI inside get_openai_callback, the callback always reports zero tokens, zero requests, and zero cost.

How to reproduce

from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
chat = ChatOpenAI(model_name="gpt-3.5-turbo")
with get_openai_callback() as cb:
    result = chat([HumanMessage(content="Tell me a joke")])
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Successful Requests: {cb.successful_requests}")
    print(f"Total Cost (USD): ${cb.total_cost}")

Output:

Total Tokens: 0
Prompt Tokens: 0
Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0

Possible fix

The root cause appears to be that ChatOpenAI._generate never calls the callback manager's on_llm_end hook, so the OpenAICallbackHandler never receives the usage data. The following patch fixes the issue, but it breaks the linter.

From f60afc48c9082fc6b09d69b8c8375353acc9fc0b Mon Sep 17 00:00:00 2001
From: Fabio Perez <fabioperez@users.noreply.github.com>
Date: Mon, 3 Apr 2023 19:06:34 -0300
Subject: [PATCH] Fix token usage in ChatOpenAI

---
 langchain/chat_models/openai.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/langchain/chat_models/openai.py b/langchain/chat_models/openai.py
index c7ee4bd..a8d5fbd 100644
--- a/langchain/chat_models/openai.py
+++ b/langchain/chat_models/openai.py
@@ -274,7 +274,9 @@ class ChatOpenAI(BaseChatModel, BaseModel):
             gen = ChatGeneration(message=message)
             generations.append(gen)
         llm_output = {"token_usage": response["usage"], "model_name": self.model_name}
-        return ChatResult(generations=generations, llm_output=llm_output)
+        result = ChatResult(generations=generations, llm_output=llm_output)
+        self.callback_manager.on_llm_end(result, verbose=self.verbose)
+        return result
 
     async def _agenerate(
         self, messages: List[BaseMessage], stop: Optional[List[str]] = None
-- 
2.39.2 (Apple Git-143)
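
For context, the lint failure is presumably a type mismatch: the patch passes a ChatResult to a hook that is annotated for LLMResult. In langchain 0.0.x, BaseCallbackHandler declares the hook roughly like this (paraphrased; exact annotations vary between versions):

    # Simplified from langchain/callbacks/base.py (0.0.x era).
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when LLM ends running."""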

I tried to change the signature of on_llm_end (langchain/callbacks/base.py) to:

    async def on_llm_end(
        self, response: Union[LLMResult, ChatResult], **kwargs: Any
    ) -> None:

but this breaks many call sites, so I’m not sure that’s the best way to fix the issue.
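
One lint-friendly alternative, sketched here under the assumption that the 0.0.x schema exposes LLMResult alongside ChatResult and that ChatGeneration subclasses Generation, is to leave on_llm_end's signature alone and adapt the chat output at the end of ChatOpenAI._generate instead:

# Hedged sketch, not the actual upstream fix: keep on_llm_end annotated
# for LLMResult and wrap the chat output rather than widening the hook.
from langchain.schema import ChatResult, LLMResult

result = ChatResult(generations=generations, llm_output=llm_output)
# LLMResult.generations is List[List[Generation]] (one inner list per
# prompt); ChatGeneration subclasses Generation, so wrap the single batch.
llm_result = LLMResult(generations=[result.generations], llm_output=result.llm_output)
self.callback_manager.on_llm_end(llm_result, verbose=self.verbose)
return result

This keeps the handler contract intact, at the cost of handing callbacks an LLMResult instead of the richer ChatResult.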

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 12
  • Comments: 15 (1 by maintainers)

Most upvoted comments

Still an issue today for me. Code to reproduce:

from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI

model_name = 'gpt-4'

with get_openai_callback() as cb:
    chat4 = ChatOpenAI(
        temperature=0.1,
        model=model_name,
    )
    response = chat4(chat_prompt)  # chat_prompt: a messages list built elsewhere
    print(cb)

Results:

Tokens Used: 0
	Prompt Tokens: 0
	Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0

@hinthornw this doesn’t work for streaming responses, though. Is there any way to make OpenAICallbackHandler work with ChatOpenAI(streaming=True)? The issue is that on_llm_end is entered before the response is complete, which leads to usage being 0.
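
Until the streaming path reports usage, a common client-side workaround is to count tokens with tiktoken. The sketch below assumes the streamed chunks end up concatenated in result.content; the counts are approximate because the chat format adds a few overhead tokens per message:

import tiktoken

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Approximate client-side token counting for streaming, where the API
# response carries no "usage" field. The chat format adds a few overhead
# tokens per message, so these counts slightly undershoot the billed total.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Tell me a joke"
chat = ChatOpenAI(model_name="gpt-3.5-turbo", streaming=True)
result = chat([HumanMessage(content=prompt)])  # result is an AIMessage

prompt_tokens = len(enc.encode(prompt))
completion_tokens = len(enc.encode(result.content))
print(f"~{prompt_tokens} prompt tokens, ~{completion_tokens} completion tokens")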

Having the same issue… following this thread.

Related to https://github.com/hwchase17/langchain/pull/1924; please take a look at the discussion there.

@liowalex I guess we really want the count that OpenAI returns. Note that langchain will retry failed calls, and those retries also count toward the token rate limit, so input and output tokens alone are not the complete picture.
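
A small, hedged aside on the retry point, assuming ChatOpenAI in 0.0.x still exposes its max_retries field (default 6): disabling retries at least guarantees that every request counting against the rate limit is one you issued yourself.

from langchain.chat_models import ChatOpenAI

# max_retries was a field on ChatOpenAI in langchain 0.0.x (default 6).
# With retries off, no hidden retried calls consume the token rate limit
# beyond the requests you see.
chat = ChatOpenAI(model_name="gpt-3.5-turbo", max_retries=0)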