llama_index: [Bug]: Asking for OpenAI key after loading index from disk.

Bug Description

I am using AzureOpenAI for both the embedding model and the LLM, configured via a ServiceContext. Once the index is generated, it is persisted to a local folder. When the index is loaded back from storage, it asks for an OpenAI key, even though I have already configured AzureOpenAI as the LLM. The error is raised on line 47 (see the traceback below). Everything works fine when the index is created for the first time.

[Screenshot of the reproduction script, taken 2023-10-31]

Version

0.8.56

Steps to Reproduce

Run the file. The index is created fine using the AzureOpenAI LLM and embeddings. When the script is run again and the index is loaded from storage, the issue occurs.

Relevant Logs/Tracebacks

Error Log:
Traceback (most recent call last):
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/utils.py", line 20, in resolve_llm
    llm = OpenAI()
          ^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/openai.py", line 92, in __init__
    api_key, api_type, api_base, api_version = resolve_openai_credentials(
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/openai_utils.py", line 297, in resolve_openai_credentials
    raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/anurag/Anurag Work/Coding/everything-ai/llmaindex-pdf/main.py", line 47, in <module>
    index = load_index_from_storage(storage_context)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/loading.py", line 33, in load_index_from_storage
    indices = load_indices_from_storage(storage_context, index_ids=index_ids, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/loading.py", line 78, in load_indices_from_storage
    index = index_cls(
            ^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
    super().__init__(
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/base.py", line 61, in __init__
    self._service_context = service_context or ServiceContext.from_defaults()
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/service_context.py", line 157, in from_defaults
    llm_predictor = llm_predictor or LLMPredictor(
                                     ^^^^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llm_predictor/base.py", line 92, in __init__
    self._llm = resolve_llm(llm)
                ^^^^^^^^^^^^^^^^
  File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/utils.py", line 22, in resolve_llm
    raise ValueError(
ValueError: ******
Could not load OpenAI model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

To disable the LLM entirely, set llm=None.
******

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Comments: 15

Most upvoted comments

🤖

Based on the error log you provided, it seems that the OpenAI API key is not being found. However, since you’re using AzureOpenAI, you need to ensure that the following environment variables are set correctly:

  1. OPENAI_API_TYPE: set this to azure, azure_ad, or azuread
  2. OPENAI_API_VERSION: set this to 2023-05-15 (This may change in the future.)
  3. OPENAI_API_BASE: your endpoint, which should look like https://YOUR_RESOURCE_NAME.openai.azure.com/
  4. OPENAI_API_KEY: your API key

You can set these environment variables using the export command in your terminal:

export OPENAI_API_TYPE='azure'
export OPENAI_API_VERSION='2023-05-15'
export OPENAI_API_BASE='https://YOUR_RESOURCE_NAME.openai.azure.com/'
export OPENAI_API_KEY='your-api-key'

Please replace 'your-api-key' and 'YOUR_RESOURCE_NAME' with your actual Azure OpenAI API key and resource name respectively.

If you’re sure that all these environment variables are set correctly, then the issue might be elsewhere. In that case, please provide more details about the error and the function call that leads to it.

In the context shared, it’s also important to note that the OpenAI class in the LlamaIndex library uses the resolve_openai_credentials function to set the API key, API type, API base, and API version. If you’re using this class directly, you can also pass these values as arguments when creating an instance of the class.
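
For illustration, here is a minimal sketch of constructing the Azure LLM directly with explicit credentials (the deployment, model, and resource names are placeholders, and the parameter names assume the 0.8.x AzureOpenAI signature):

from llama_index.llms import AzureOpenAI

llm = AzureOpenAI(
    engine="my-gpt35-deployment",  # placeholder: your Azure deployment name
    model="gpt-35-turbo",
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_version="2023-05-15",
)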

The resolve_openai_credentials function in the LlamaIndex codebase determines the OpenAI credentials. It takes four optional parameters: api_key, api_type, api_base, and api_version, and resolves each in a specific order of precedence: values passed directly to the function, then the environment variables, then values set on the openai module, and finally the defaults. If no api_key is found after these steps, it raises a ValueError.
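
In other words, the key resolution behaves roughly like the following simplified sketch (an illustration of the precedence order described above, not the actual library code):

import os
import openai

def resolve_api_key(api_key=None):
    # Precedence: explicit argument, then environment variable,
    # then the attribute set on the openai module.
    key = api_key or os.environ.get("OPENAI_API_KEY") or getattr(openai, "api_key", None)
    if not key:
        raise ValueError("No API key found for OpenAI.")
    return key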

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

That error just means you have to enable nested asyncio:

import nest_asyncio

nest_asyncio.apply()

Going to close this issue for now 😃

🤖

Hello @AnuragSa,

I’m here to assist you with your issue. I’m an AI developed by OpenAI and I’m here to help you with your questions and issues related to the LlamaIndex repository.

From the error logs you've shared, it seems that LlamaIndex asks for the OpenAI key again when loading the index from storage because the key is not stored with the index. Instead, it is resolved at runtime using the resolve_openai_credentials function. This function checks for the OpenAI key in a specific order: as a parameter to the function, in the environment variable OPENAI_API_KEY, or in the openai module. If none of these are set, it raises a ValueError.

The AzureOpenAI class, which is a subclass of LLM, also requires the OpenAI key to be provided at runtime. It validates the necessary credentials, including the OpenAI key, in the __init__ method. If the required environment variables are not set, it raises a ValueError.

So, even if LlamaIndex is configured to use AzureOpenAI, the key and other AzureOpenAI-specific parameters still need to be provided at runtime, either as a parameter, as an environment variable, or set in the openai module.

I found some similar issues in the LlamaIndex repository that might be helpful:

  1. [Question]: When customize the LLM, using VectorStoreIndex, it reports error
  2. [Bug]: Exception condition leads to download of an llm
  3. [Bug]: Setting Palm LLM Abstraction to ServiceContext still looks for OpenAI API Key

These issues suggest that both the LLM and the embedding model need to be set in the service context to avoid using the OpenAI model. A custom embedding model class may need to be implemented to handle tokenization.
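
For example, here is a minimal, self-contained sketch of setting both models on the service context (the deployment and resource names are placeholders, and the parameter names assume the 0.8.x AzureOpenAI and OpenAIEmbedding signatures):

from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import AzureOpenAI

llm = AzureOpenAI(
    engine="my-gpt35-deployment",  # placeholder Azure deployment name
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_version="2023-05-15",
)

embed_model = OpenAIEmbedding(
    deployment_name="my-embedding-deployment",  # placeholder Azure deployment name
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_type="azure",
    api_version="2023-05-15",
)

# Set both the LLM and the embedding model so that neither falls back to the
# OpenAI defaults, which require OPENAI_API_KEY.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)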

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

Best, Dosu

Having looked into the details of the implementation, @Disiok almost had the correct answer to this problem:

try index_from_storage = load_index_from_storage(storage_context, service_context)

TL;DR: pass service_context as a keyword argument, since it is not one of load_index_from_storage's positional parameters, i.e.:

index_from_storage = load_index_from_storage(
    storage_context=storage_context,
    service_context=service_context
)

This should load the index from disk without needing any OpenAI key. Note, however, that you must set up the ServiceContext with the correct embedding model; otherwise you won't be able to look things up correctly.

Detailed Explanation: Looking specifically at the code for load_index_from_storage, storage_context and index_id are the only two arguments you can pass without keywords. The problem with the earlier code was that, since service_context was not explicitly named, the function assumed it was the index_id. A service context is not hashable, so it cannot be used to look up the index in a dictionary, hence the error @scorpio0101 was seeing.

load_index_from_storage actually invokes load_indices_from_storage with all of the same kwargs and only returns the first result. Digging a little deeper, at some point the persisted index is loaded into a new instance of the index class, and this is where the **kwargs are applied. Otherwise the constructor assumes the defaults, which is why, if you don't pass an explicit ServiceContext with the correct embed_model, it goes looking for OPENAI_API_KEY in your environment.

This could be mitigated in future versions of llama_index if the ServiceContext were saved during the index-persisting step, but there are serious obstacles to doing that (supporting custom index types and custom LLMs makes it a tough problem). If you are seriously trying to persist this, my recommendation is to save a JSON file containing the embedding model and other configuration, so that you can pass those values in programmatically in the future, and keep it as a log.
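
As a rough illustration of that recommendation, here is a minimal sketch of writing such a side-car config next to the persisted index (the file name and keys are hypothetical, not part of llama_index):

import json

# Hypothetical config describing how the index was built, saved next to it.
config = {
    "embed_deployment": "my-embedding-deployment",
    "embed_model": "text-embedding-ada-002",
    "api_base": "https://YOUR_RESOURCE_NAME.openai.azure.com/",
    "api_version": "2023-05-15",
}

with open("./storage/service_context_config.json", "w") as f:
    json.dump(config, f, indent=2)

# On later runs, read this file back and rebuild the ServiceContext from it
# before calling load_index_from_storage.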

I found that adding the following line after creating a ServiceContext solves the problem. In my original screenshot, that would be line #39.

set_global_service_context(service_context)

https://gpt-index.readthedocs.io/en/v0.6.28/how_to/customization/service_context.html

Unfortunately, it's not intuitive and there aren't many examples, but it seems to be working now. It would have been great if there were a way to provide the service context in load_index_from_storage (or one of its overloads) itself.
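
For reference, a minimal sketch of that global-context pattern (the Azure configuration values are placeholders, as in the earlier sketches; set_global_service_context is importable from the top-level llama_index package in 0.8.x):

from llama_index import (
    ServiceContext,
    StorageContext,
    load_index_from_storage,
    set_global_service_context,
)
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import AzureOpenAI

llm = AzureOpenAI(
    engine="my-gpt35-deployment",  # placeholder Azure deployment name
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_version="2023-05-15",
)
embed_model = OpenAIEmbedding(
    deployment_name="my-embedding-deployment",  # placeholder Azure deployment name
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_type="azure",
    api_version="2023-05-15",
)

# Register the context globally so that any code path that would otherwise
# call ServiceContext.from_defaults() picks up the Azure models instead.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)  # no longer asks for an OpenAI key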