openai-python: Uploading JSON to Files API returns invalid file format

Upload to the files endpoint with a JSON file throws an error

Code:

from openai import OpenAI

client = OpenAI()
file = client.files.create(
    file=open("example_1.json", "rb"),
    # Can either be fine-tuned or assistant
    purpose="assistants",
)

Stacktrace:

ile [~/anaconda3/lib/python3.10/site-packages/openai/resources/files.py:88](https://file+.vscode-resource.vscode-cdn.net/Users/vashishtmadhavan/Documents/playground/~/anaconda3/lib/python3.10/site-packages/openai/resources/files.py:88), in Files.create(self, file, purpose, extra_headers, extra_query, extra_body, timeout)
     82 if files:
     83     # It should be noted that the actual Content-Type header that will be
     84     # sent to the server will contain a `boundary` parameter, e.g.
     85     # multipart/form-data; boundary=---abc--
     86     extra_headers = {"Content-Type": "multipart/form-data", **(extra_headers or {})}
---> 88 return self._post(
     89     "[/files](https://file+.vscode-resource.vscode-cdn.net/files)",
     90     body=maybe_transform(body, file_create_params.FileCreateParams),
     91     files=files,
     92     options=make_request_options(
     93         extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
     94     ),
     95     cast_to=FileObject,
     96 )

File [~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:1055](https://file+.vscode-resource.vscode-cdn.net/Users/vashishtmadhavan/Documents/playground/~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:1055), in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1041 def post(
   1042     self,
   1043     path: str,
   (...)
   1050     stream_cls: type[_StreamT] | None = None,
   1051 ) -> ResponseT | _StreamT:
   1052     opts = FinalRequestOptions.construct(
   1053         method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1054     )
-> 1055     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File [~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:834](https://file+.vscode-resource.vscode-cdn.net/Users/vashishtmadhavan/Documents/playground/~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:834), in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    825 def request(
    826     self,
    827     cast_to: Type[ResponseT],
   (...)
    832     stream_cls: type[_StreamT] | None = None,
    833 ) -> ResponseT | _StreamT:
--> 834     return self._request(
    835         cast_to=cast_to,
    836         options=options,
    837         stream=stream,
    838         stream_cls=stream_cls,
    839         remaining_retries=remaining_retries,
    840     )

File [~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:877](https://file+.vscode-resource.vscode-cdn.net/Users/vashishtmadhavan/Documents/playground/~/anaconda3/lib/python3.10/site-packages/openai/_base_client.py:877), in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
    874     # If the response is streamed then we need to explicitly read the response
    875     # to completion before attempting to access the response text.
    876     err.response.read()
--> 877     raise self._make_status_error_from_response(err.response) from None
    878 except httpx.TimeoutException as err:
    879     if retries > 0:

BadRequestError: Error code: 400 - {'error': {'message': "Invalid file format. Supported formats: ['c', 'cpp', 'csv', 'docx', 'html', 'java', 'json', 'md', 'pdf', 'php', 'pptx', 'py', 'rb', 'tex', 'txt', 'css', 'jpeg', 'jpg', 'js', 'gif', 'png', 'tar', 'ts', 'xlsx', 'xml', 'zip']", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Here is the example file: example_1.json

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Reactions: 2
  • Comments: 15 (2 by maintainers)

Most upvoted comments

@kennymatic , I believe as you do that in this case size matters. @albertaleksieiev also seems to be correct (i.e. send an invalid JSON formatted file and it will work.)

I found to be routinely successful uploading a 10kb json file. In a small file (i.e. < 1kb) I replaced the open and close square brackets with curly brackets, then saved. File uploaded successfully.

It’s one of those, “Are you kidding me?” bugs.

BTW, ChatGPT can’t help solve this bug. I asked for help repeatedly even after providing documentation from: https://github.com/openai/openai-python https://github.com/openai/openai-python/tree/main#file-uploads

OK I think I finally found a pattern, at least for our case. Valid JSON files upload fine so long as they are over 1025 bytes in size. If they are 1025 bytes or under, they will also work if you make them invalid by making them invalid as @albertaleksieiev suggested.

The same issue, just try to send incorrect JSON file, and it will work 😉

I was really hoping you were kidding about this. I’ve been banging my head over this for the past 5 hours. I added garbage to the beginning of my file and now it works. 🤦🏻‍♂️

I don’t even understand how this can be possible on their back end.

BTW, a stupid but successful work-around is to create an initial entry in the JSON file with enough content that will make the file size exceed the 1kb that @kennymatic mentioned. I put in the following: [ { “id”: “msg_0001”, “role”: “user”, “content”: "\n This messgage is to create and initialize the JSON file with enough file size that OpenAI will upload it.\n A JSON (JavaScript Object Notation) file is a lightweight data interchange format that is easy for humans to read and write,\n and easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language Standard ECMA-262 3rd Edition - December 1999.\n JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages,\n including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.\n The structure of a JSON file is simple yet flexible. It represents data in a text format consisting of key-value pairs,\n making it analogous to a dictionary in Python or an object in JavaScript.\n These key-value pairs are enclosed in curly braces, with the key being a string and the value being a valid JSON data type such as\n a string, number, array, or even another JSON object.\n This hierarchical structure allows for the representation of complex data in an organized and hierarchical manner,\n which is particularly useful in web applications for data exchange between a client and a server, as well as in many other\n programming contexts where data needs to be stored or transmitted in a structured format.\n " }, { “id”: “msg_0002”, “role”: “user”, “content”: "\n We need to have a meeting of everyone within the company. Our strategic goal is to raise revenue by 10%.\n Your mission is to discuss amongst yourself to provide a consensus suggestion of the top five initiatives the company should take.\n When the mission has been completed provide the five initiatives and your reasonsing for each.\n " } ]

This is basically what we did except we got the difference in length to get up to the min characters. Then we added a field to the JSON and filled it with spaces to get us up to the min limit. 😬