langchain: get_openai_callback doesn't return the credits for ChatGPT chain

I run the following code:

from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate, LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

from langchain.callbacks.base import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler


gpt_4 = ChatOpenAI(model_name="gpt-4", streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]), verbose=True, temperature=0)

template="You are ChatGPT, a large language model trained by OpenAI. Follow the user's instructions carefully. Respond using markdown."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)

human_template="{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chain = LLMChain(llm=gpt_4, prompt=prompt)

from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:
    text = "How are you?"
    res = chain.run(text=text)

print(cb)

However, when I print the callback value, it reports zero usage, even though I know the request consumed tokens:

I'm an AI language model, so I don't have feelings or emotions like humans do. However, I'm here to help you with any questions or information you need. What can I help you with today?

Tokens Used: 0
	Prompt Tokens: 0
	Completion Tokens: 0
Successful Requests: 0
Total Cost (USD): $0.0

Am I doing something wrong, or is this an issue?

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 8
  • Comments: 22

Most upvoted comments

Hi,

To work around this bug I created my own async cost-calculator handler (it requires the tiktoken dependency):

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.schema import LLMResult
from typing import Any, Dict, List
import tiktoken

MODEL_COST_PER_1K_TOKENS = {
    "gpt-4": 0.03,
    "gpt-4-0314": 0.03,
    "gpt-4-completion": 0.06,
    "gpt-4-0314-completion": 0.06,
    "gpt-4-32k": 0.06,
    "gpt-4-32k-0314": 0.06,
    "gpt-4-32k-completion": 0.12,
    "gpt-4-32k-0314-completion": 0.12,
    "gpt-3.5-turbo": 0.002,
    "gpt-3.5-turbo-0301": 0.002,
    "text-ada-001": 0.0004,
    "ada": 0.0004,
    "text-babbage-001": 0.0005,
    "babbage": 0.0005,
    "text-curie-001": 0.002,
    "curie": 0.002,
    "text-davinci-003": 0.02,
    "text-davinci-002": 0.02,
    "code-davinci-002": 0.02,
}

class TokenCostProcess:
    total_tokens: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0

    def sum_prompt_tokens( self, tokens: int ):
      self.prompt_tokens = self.prompt_tokens + tokens
      self.total_tokens = self.total_tokens + tokens

    def sum_completion_tokens( self, tokens: int ):
      self.completion_tokens = self.completion_tokens + tokens
      self.total_tokens = self.total_tokens + tokens

    def sum_successful_requests( self, requests: int ):
      self.successful_requests = self.successful_requests + requests

    def get_openai_total_cost_for_model( self, model: str ) -> float:
        # Bill prompt and completion tokens at their respective rates; fall back
        # to the prompt rate when no separate "-completion" price is listed.
        prompt_cost = MODEL_COST_PER_1K_TOKENS[model] * self.prompt_tokens / 1000
        completion_rate = MODEL_COST_PER_1K_TOKENS.get(
            model + "-completion", MODEL_COST_PER_1K_TOKENS[model]
        )
        return prompt_cost + completion_rate * self.completion_tokens / 1000
    
    def get_cost_summary(self, model:str) -> str:
        cost = self.get_openai_total_cost_for_model(model)

        return (
            f"Tokens Used: {self.total_tokens}\n"
            f"\tPrompt Tokens: {self.prompt_tokens}\n"
            f"\tCompletion Tokens: {self.completion_tokens}\n"
            f"Successful Requests: {self.successful_requests}\n"
            f"Total Cost (USD): {cost}"
        )

class CostCalcAsyncHandler(AsyncCallbackHandler):
    model: str = ""
    socketprint = None
    websocketaction: str = "appendtext"
    token_cost_process: TokenCostProcess

    def __init__( self, model, token_cost_process ):
       self.model = model
       self.token_cost_process = token_cost_process

    def on_llm_start( self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        if self.token_cost_process is None:
            return

        encoding = tiktoken.encoding_for_model(self.model)

        for prompt in prompts:
            self.token_cost_process.sum_prompt_tokens(len(encoding.encode(prompt)))

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
      print( token )

      self.token_cost_process.sum_completion_tokens( 1 )

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
      self.token_cost_process.sum_successful_requests( 1 )
     

I use it like this:

token_cost_process = TokenCostProcess()

chat = ChatOpenAI(
        streaming=True,
        callbacks=[CostCalcAsyncHandler("gpt-3.5-turbo", token_cost_process)],
        temperature=0,
        model_name="gpt-3.5-turbo",
)


...

print( token_cost_process.get_cost_summary( "gpt-3.5-turbo" ) )
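As a quick sanity check of the arithmetic, here is a minimal pure-Python reimplementation of the counting logic (no API calls, langchain, or tiktoken needed; gpt-3.5-turbo priced at $0.002 per 1K tokens, as in the table above):

```python
# Minimal standalone version of the TokenCostProcess arithmetic, so the
# numbers can be verified without any external dependencies.
class TokenCounter:
    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def cost(self, rate_per_1k: float) -> float:
        return rate_per_1k * self.total_tokens / 1000

tc = TokenCounter()
tc.prompt_tokens += 100     # as counted with tiktoken in on_llm_start
tc.completion_tokens += 50  # one per streamed token in on_llm_new_token
print(tc.total_tokens)  # 150
print(tc.cost(0.002))   # roughly 0.0003
```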

I hope this helps someone.

Thanks,

The underlying reason there is no cost information when streaming is enabled is that, in the OpenAI API, the usage field is only present in non-streaming responses. As things currently stand, OpenAICallbackHandler simply relays the usage it receives from the OpenAI API, if any.
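That mechanism can be illustrated with a self-contained mock (a simplified sketch of the behaviour, not LangChain's actual OpenAICallbackHandler implementation):

```python
# Simplified mock of a usage-reporting callback: it can only count what
# the API response carries. This is a sketch of the mechanism, not
# LangChain's actual OpenAICallbackHandler code.
class MockUsageCallback:
    def __init__(self):
        self.total_tokens = 0

    def on_llm_end(self, llm_output: dict) -> None:
        # Streaming responses carry no "token_usage" field, so there is
        # nothing to add and the counter stays at zero.
        usage = (llm_output or {}).get("token_usage", {})
        self.total_tokens += usage.get("total_tokens", 0)

# Non-streaming: the response includes a usage block, so it is counted.
non_streaming = MockUsageCallback()
non_streaming.on_llm_end({"token_usage": {"total_tokens": 42}})
print(non_streaming.total_tokens)  # 42

# Streaming: no usage block in the final response, hence the zeros above.
streaming = MockUsageCallback()
streaming.on_llm_end({})
print(streaming.total_tokens)  # 0
```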

Sorry Manuel, it was a misunderstanding.

For me the custom cost-calculator handler above works too, but using BaseCallbackHandler instead of AsyncCallbackHandler. I'm using streaming with Flask with a thread and a queue.

It would be nice to do something like this also for indexing

I think this has been fixed in the meantime. Callbacks are now implemented differently, and BaseChatModel already calls them correctly (current implementation here). I have since updated my customisation to the newest releases and got rid of that implementation.