langchain: Token usage calculation is not working for ChatOpenAI

When calling ChatOpenAI inside get_openai_callback, the callback always reports zero tokens, zero requests, and zero cost.

How to reproduce

from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
chat = ChatOpenAI(model_name="gpt-3.5-turbo")
with get_openai_callback() as cb:
    result = chat([HumanMessage(content="Tell me a joke")])
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Successful Requests: {cb.successful_requests}")
    print(f"Total Cost (USD): ${cb.total_cost}")

Output:

Total Tokens: 0
Prompt Tokens: 0
Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0

Possible fix

The root cause appears to be that ChatOpenAI._generate never calls the callback manager's on_llm_end hook, so the OpenAICallbackHandler never receives the usage data. The following patch fixes the issue, but it breaks the linter.

From f60afc48c9082fc6b09d69b8c8375353acc9fc0b Mon Sep 17 00:00:00 2001
From: Fabio Perez <fabioperez@users.noreply.github.com>
Date: Mon, 3 Apr 2023 19:06:34 -0300
Subject: [PATCH] Fix token usage in ChatOpenAI

---
 langchain/chat_models/openai.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/langchain/chat_models/openai.py b/langchain/chat_models/openai.py
index c7ee4bd..a8d5fbd 100644
--- a/langchain/chat_models/openai.py
+++ b/langchain/chat_models/openai.py
@@ -274,7 +274,9 @@ class ChatOpenAI(BaseChatModel, BaseModel):
             gen = ChatGeneration(message=message)
             generations.append(gen)
         llm_output = {"token_usage": response["usage"], "model_name": self.model_name}
-        return ChatResult(generations=generations, llm_output=llm_output)
+        result = ChatResult(generations=generations, llm_output=llm_output)
+        self.callback_manager.on_llm_end(result, verbose=self.verbose)
+        return result
 
     async def _agenerate(
         self, messages: List[BaseMessage], stop: Optional[List[str]] = None
-- 
2.39.2 (Apple Git-143)
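
For context, the lint failure is presumably a type mismatch: the patch passes a ChatResult to a hook that is annotated for LLMResult. In langchain 0.0.x, BaseCallbackHandler declares the hook roughly like this (paraphrased; exact annotations vary between versions):

    # Simplified from langchain/callbacks/base.py (0.0.x era).
    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when LLM ends running."""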

I tried to change the signature of on_llm_end (langchain/callbacks/base.py) to:

    async def on_llm_end(
        self, response: Union[LLMResult, ChatResult], **kwargs: Any
    ) -> None:

but this breaks many call sites, so I’m not sure that’s the best way to fix the issue.
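
One lint-friendly alternative, sketched here under the assumption that the 0.0.x schema exposes LLMResult alongside ChatResult and that ChatGeneration subclasses Generation, is to leave on_llm_end's signature alone and adapt the chat output at the end of ChatOpenAI._generate instead:

# Hedged sketch, not the actual upstream fix: keep on_llm_end annotated
# for LLMResult and wrap the chat output rather than widening the hook.
from langchain.schema import ChatResult, LLMResult

result = ChatResult(generations=generations, llm_output=llm_output)
# LLMResult.generations is List[List[Generation]] (one inner list per
# prompt); ChatGeneration subclasses Generation, so wrap the single batch.
llm_result = LLMResult(generations=[result.generations], llm_output=result.llm_output)
self.callback_manager.on_llm_end(llm_result, verbose=self.verbose)
return result

This keeps the handler contract intact, at the cost of handing callbacks an LLMResult instead of the richer ChatResult.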

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 12
  • Comments: 15 (1 by maintainers)

Most upvoted comments

Still an issue today for me. Code to reproduce:

from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI

model_name = 'gpt-4'

with get_openai_callback() as cb:
    chat4 = ChatOpenAI(
        temperature=0.1,
        model=model_name,
    )
    response = chat4(chat_prompt)  # chat_prompt: a messages list built elsewhere
    print(cb)

Results:

Tokens Used: 0
	Prompt Tokens: 0
	Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0

@hinthornw this doesn’t work for streaming responses, though. Is there any way to make OpenAICallbackHandler work with ChatOpenAI(streaming=True)? The issue is that on_llm_end is entered before the response is complete, which leads to usage being 0.
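
Until the streaming path reports usage, a common client-side workaround is to count tokens with tiktoken. The sketch below assumes the streamed chunks end up concatenated in result.content; the counts are approximate because the chat format adds a few overhead tokens per message:

import tiktoken

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Approximate client-side token counting for streaming, where the API
# response carries no "usage" field. The chat format adds a few overhead
# tokens per message, so these counts slightly undershoot the billed total.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Tell me a joke"
chat = ChatOpenAI(model_name="gpt-3.5-turbo", streaming=True)
result = chat([HumanMessage(content=prompt)])  # result is an AIMessage

prompt_tokens = len(enc.encode(prompt))
completion_tokens = len(enc.encode(result.content))
print(f"~{prompt_tokens} prompt tokens, ~{completion_tokens} completion tokens")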

Having the same issue… following this thread.

Related to https://github.com/hwchase17/langchain/pull/1924; please take a look at the discussion there.

@liowalex I guess we really want the count that OpenAI returns. Note that langchain will retry failed calls, and those retries also count toward the token rate limit, so input and output tokens alone are not the complete picture.
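
A small, hedged aside on the retry point, assuming ChatOpenAI in 0.0.x still exposes its max_retries field (default 6): disabling retries at least guarantees that every request counting against the rate limit is one you issued yourself.

from langchain.chat_models import ChatOpenAI

# max_retries was a field on ChatOpenAI in langchain 0.0.x (default 6).
# With retries off, no hidden retried calls consume the token rate limit
# beyond the requests you see.
chat = ChatOpenAI(model_name="gpt-3.5-turbo", max_retries=0)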