llama_index: [Bug]: Asking for OpenAI key after loading index from disk.
Bug Description
I am using AzureOpenAI for both the embedding model and the LLM, configured via ServiceContext. Once the index is generated, it is persisted to a local folder. When loading the index from storage, it asks for an OpenAI key even though I have already configured it to use AzureOpenAI as the LLM. The issue occurs on line 47 of my script. It works fine when the index is created for the first time.
Version
0.8.56
Steps to Reproduce
Run the file. The index is created fine using the AzureOpenAI LLM and embeddings. When run again, the issue occurs after loading the index from storage.
Relevant Logs/Tracebacks
Error Log:
Traceback (most recent call last):
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/utils.py", line 20, in resolve_llm
llm = OpenAI()
^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/openai.py", line 92, in __init__
api_key, api_type, api_base, api_version = resolve_openai_credentials(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/openai_utils.py", line 297, in resolve_openai_credentials
raise ValueError(MISSING_API_KEY_ERROR_MESSAGE)
ValueError: No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/anurag/Anurag Work/Coding/everything-ai/llmaindex-pdf/main.py", line 47, in <module>
index = load_index_from_storage(storage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/loading.py", line 33, in load_index_from_storage
indices = load_indices_from_storage(storage_context, index_ids=index_ids, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/loading.py", line 78, in load_indices_from_storage
index = index_cls(
^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/vector_store/base.py", line 49, in __init__
super().__init__(
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/base.py", line 61, in __init__
self._service_context = service_context or ServiceContext.from_defaults()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/indices/service_context.py", line 157, in from_defaults
llm_predictor = llm_predictor or LLMPredictor(
^^^^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llm_predictor/base.py", line 92, in __init__
self._llm = resolve_llm(llm)
^^^^^^^^^^^^^^^^
File "/Users/anurag/Anurag Work/Coding/everything-ai/.venv/lib/python3.11/site-packages/llama_index/llms/utils.py", line 22, in resolve_llm
raise ValueError(
ValueError: ******
Could not load OpenAI model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys
To disable the LLM entirely, set llm=None.
******
🤖
Based on the error log you provided, it seems that the OpenAI API key is not being found. However, since you’re using AzureOpenAI, you need to ensure that the following environment variables are set correctly:
- `OPENAI_API_TYPE`: set this to `azure`, `azure_ad`, or `azuread`
- `OPENAI_API_VERSION`: set this to `2023-05-15` (this may change in the future)
- `OPENAI_API_BASE`: your endpoint should look like `https://YOUR_RESOURCE_NAME.openai.azure.com/`
- `OPENAI_API_KEY`: your API key

You can set these environment variables using the `export` command in your terminal.
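For example, an equivalent sketch in Python, setting the variables before any llama_index call resolves credentials (all values here are placeholders):

```python
import os

# Placeholder values: substitute your actual Azure OpenAI key and resource name.
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_API_BASE"] = "https://YOUR_RESOURCE_NAME.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "your-api-key"
```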
Please replace `your-api-key` and `YOUR_RESOURCE_NAME` with your actual Azure OpenAI API key and resource name, respectively. If you're sure that all of these environment variables are set correctly, then the issue might be elsewhere. In that case, please provide more details about the error and the function call that leads to it.
In the context shared, it's also important to note that the `OpenAI` class in the LlamaIndex library uses the `resolve_openai_credentials` function to set the API key, API type, API base, and API version. If you're using this class directly, you can also pass these values as arguments when creating an instance of the class.
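A minimal sketch of that, assuming the 0.8.x `OpenAI` constructor accepts these keyword arguments (the traceback above shows it forwarding them to `resolve_openai_credentials`); the values are illustrative:

```python
from llama_index.llms import OpenAI

# Illustrative values; these map onto the four optional parameters
# that resolve_openai_credentials accepts.
llm = OpenAI(
    api_key="your-api-key",
    api_type="azure",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_version="2023-05-15",
)
```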
The `resolve_openai_credentials` function in the LlamaIndex codebase is used to determine the OpenAI credentials. It takes four optional parameters: `api_key`, `api_type`, `api_base`, and `api_version`. The function follows a specific order of precedence to resolve these parameters: it first checks whether they are provided directly to the function, then checks the environment variables, then checks the `openai` module, and finally uses default values if they are not found in any of the previous steps. If no `api_key` is found by following these steps, it raises a `ValueError`.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.
That error just means you have to enable nesting async
Going to close this issue for now 😃
🤖
Hello @AnuragSa,
I'm here to assist you with your questions and issues related to the LlamaIndex repository.
From the error logs you've shared, it seems that LlamaIndex is asking for the OpenAI key again when loading the index from storage because the key is not stored in the index. Instead, it is resolved at runtime using the `resolve_openai_credentials` function. This function checks for the OpenAI key in a specific order: as a parameter to the function, in the environment variable `OPENAI_API_KEY`, or in the `openai` module. If none of these are set, it raises a `ValueError`.

The `AzureOpenAI` class, which is a subclass of `LLM`, also requires the key to be provided at runtime. It validates the necessary credentials, including the key, in its `__init__` method. If the required environment variables are not set, it raises a `ValueError`.

So, even if LlamaIndex is configured to use AzureOpenAI, it still needs the key and other AzureOpenAI-specific parameters to be provided at runtime, either as a parameter, as an environment variable, or set in the `openai` module.

I found some similar issues in the LlamaIndex repository that might be helpful:
These issues suggest that both the LLM and the embedding model need to be set in the service context to avoid using the OpenAI model. A custom embedding model class may need to be implemented to handle tokenization.
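A sketch of a fully specified service context, assuming the 0.8.x APIs (`AzureOpenAI` and `OpenAIEmbedding` with Azure credential kwargs); the deployment names are hypothetical:

```python
from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import AzureOpenAI

# Hypothetical deployment names; both models must point at your Azure resource.
llm = AzureOpenAI(
    engine="my-gpt35-deployment",
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_type="azure",
    api_version="2023-05-15",
)
embed_model = OpenAIEmbedding(
    deployment_name="my-embedding-deployment",
    api_key="your-api-key",
    api_base="https://YOUR_RESOURCE_NAME.openai.azure.com/",
    api_type="azure",
    api_version="2023-05-15",
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
```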
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
Best, Dosu
Having looked into the details of the implementation, @Disiok almost had the correct answer to this problem:
TL;DR: put the keyword in front of the `service_context` argument, because it's not a positional parameter of `load_index_from_storage` (see the sketch below). This should load the index from disk without needing any OpenAI key; notably, though, make sure you set up the ServiceContext with the correct embedding model, otherwise you won't be able to look things up correctly.
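The original snippet was lost in formatting; a minimal reconstruction, assuming a `service_context` built as in the sketch above and a `./storage` persist directory:

```python
from llama_index import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
# service_context must be passed by keyword; passed positionally it would
# be interpreted as index_id.
index = load_index_from_storage(storage_context, service_context=service_context)
```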
Detailed Explanation: Looking specifically at the code for `load_index_from_storage`, `storage_context` and `index_id` are the two arguments you can pass in without keywords. So the issue with the earlier code was that, since it didn't explicitly state that the service_context was the `service_context`, the function assumed it was the `index_id`, which is not hashable, meaning you can't use it to look up the index in a dictionary, ergo the error @scorpio0101 was seeing. The code for `load_index_from_storage` actually invokes `load_indices_from_storage` with all of the same kwargs but only returns the first result. Digging a little deeper, it's clear that at one point the persisted index is loaded into a new instance of the index class; this is where the `**kwargs` come in. Otherwise the constructor assumes the defaults, which is why, if you don't pass an explicit ServiceContext instance with the correct `embed_model`, it will go looking for the `OPENAI_API_KEY` in your environment. This could be mitigated in future versions of `llama_index` if the ServiceContext were saved along with the index during the persist step, but there are a lot of issues with seriously trying to do this (allowing for custom index types and custom LLMs makes it tough). My recommendation, if you are seriously trying to persist this, is to save a JSON file with all of the information regarding the embedding model and other settings, so that you can pass those in programmatically in the future, and keep it as a log.
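A minimal sketch of that recommendation (entirely illustrative; the file name and fields are hypothetical, not part of llama_index):

```python
import json

# Record whatever you need to rebuild the matching ServiceContext later.
settings = {
    "llm_deployment": "my-gpt35-deployment",
    "embed_model_deployment": "my-embedding-deployment",
    "api_base": "https://YOUR_RESOURCE_NAME.openai.azure.com/",
    "api_version": "2023-05-15",
}
with open("./storage/service_context_settings.json", "w") as f:
    json.dump(settings, f, indent=2)
```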
I found out that adding the following line after creating a ServiceContext solves the problem. In my original screenshot, that would be line #39:

```python
set_global_service_context(service_context)
```

https://gpt-index.readthedocs.io/en/v0.6.28/how_to/customization/service_context.html
Unfortunately, it's not intuitive and there aren't many examples, but it seems to be working now. It would have been great if there were a way to provide the service context in `load_index_from_storage` (or one of its overloads) itself.
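For completeness, a sketch of that approach, assuming llama_index 0.8.x where `set_global_service_context` is importable from the top-level package (`llm` and `embed_model` as configured in the Azure sketch above):

```python
from llama_index import (
    ServiceContext,
    StorageContext,
    load_index_from_storage,
    set_global_service_context,
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# Register the context globally so later calls, including
# load_index_from_storage, use it instead of building a default
# ServiceContext that falls back to OpenAI.
set_global_service_context(service_context)

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```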