ludwig: upload_to_hf_hub model path mismatch with model.save

Describe the bug

There seems to be a mismatch between how model.save() and LudwigModel.upload_to_hf_hub() handle model paths. A model saved into a directory with a custom name using model.save cannot be uploaded to HF Hub using the same directory name.

To Reproduce Steps to reproduce the behavior:

  1. Fine-tune a LLM (I used the Zephyr fine-tuning example from Predibase)
  2. Save the fine-tuned model into a directory using model.save("finetuned-model")
  3. Try to upload the model to HF Hub using LudwigModel.upload_to_hf_hub(MY_HF_MODEL_NAME, "finetuned-model")
  4. See error:
File .../ludwig-finetune-llm/venv/lib/python3.11/site-packages/ludwig/utils/upload_utils.py:101, in BaseModelUpload._validate_upload_parameters(repo_id, model_path, repo_type, private, commit_message, commit_description)
     99 trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")
    100 if not os.path.exists(trained_model_artifacts_path):
--> 101     raise Exception(
    102         f"Model artifacts not found at {trained_model_artifacts_path}. "
    103         f"It is possible that model at '{model_path}' hasn't been trained yet, or something went"
    104         "wrong during training where the model's weights were not saved."
    105     )

Exception: Model artifacts not found at finetuned-model/model/model_weights. It is possible that model at 'finetuned-model' hasn't been trained yet, or something wentwrong during training where the model's weights were not saved.

Expected behavior

Expected the upload to succeed, since I gave the same path name, "finetuned-model", as a parameter both to model.save and to LudwigModel.upload_to_hf_hub.

Screenshots

n/a

Environment (please complete the following information):

  • OS: Linux
  • Version: AlmaLinux release 8.7 (Stone Smilodon)
  • Python version 3.11.5
  • Ludwig version 0.9.2

Additional context

After model.save, this is the file and directory structure created under the finetuned-model directory:

model_hyperparameters.json
training_set_metadata.json
model_weights/adapter_config.json
model_weights/adapter_model.safetensors
model_weights/README.md

But according to the error message, upload_to_hf_hub is checking for the existence of finetuned-model/model/model_weights. It doesn’t exist (there is no intermediate directory called model) so this fails with the above error. Below is the relevant code. Note that model is always added to the path on line 99. https://github.com/ludwig-ai/ludwig/blob/51f38c5cdac0139fc6d297f8e93752f7edc6d8ab/ludwig/utils/upload_utils.py#L98-L105

About this issue

  • Original URL
  • State: closed
  • Created 5 months ago
  • Comments: 16 (7 by maintainers)

Most upvoted comments

@osma Your criticism is absolutely valid. Last night (Pacific Time), @sanjaydasgupta and I had a call to brainstorm how this can be improved. Your filing this issue and continued comments have been very helpful, and so your participation going forward is welcome. One idea that @sanjaydasgupta is currently evaluating is: What if we just modify _validate_upload_parameters() in ludwig/utils/upload_utils.py, such that we do not hard code the path in line 102 (trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")) but instead only use model_path – and then enhance the validation so as to make sure that the files we need from the subdirectories model and model_weights are there. In fact, we already have the validation for the model_weights directory contents; we would just need to add the validation for the directory one above (i.e., model) for things that we care about, such as the hyperparameter configuration, which @sanjaydasgupta added last week. Then any model_path would work as long as its contents pass this validation; in this case, modifying tests would be acceptable. What do you both think? Thank you.

@sanjaydasgupta and @alexsherstinsky met to discuss this and agreed that there is a design issue that needs to be studied deeper, potentially affecting “ludwig/utils/upload_utils.py” and modules that use it.

After poking around a bit in the code-base here is what I learned:

Changing the implementation of Ludwig.api.LudwigModel.save() so that it first suffixes a literal /model to the provided save_path (an intuitive but naive solution) breaks 7 or 8 tests. Inspection of those tests reveals that save() has a twin – predictably named load(). Also, save() and load() are designed to play well together as is.

Therefore, changing save() so that it works more intuitively with upload_to_hf_hub() will entail changes to load() and possibly other things – probably be a lot of work. One alternative to save that effort is to improve the documentation of save() to clearly state that the save_path provided must end in a model. That same save_path can then be passed to upload_to_hf_hub() successfully.

@alexsherstinsky please let me know what you think.

@sanjaydasgupta Has graciously agreed to work on this issue.

Thanks for reporting the issue @osma and I’m glad you were able to find a workaround. @alexsherstinsky Will be taking a look at it over the next few days

FWIF, I was able to work around the problem by creating the directory finetuned-model/model and moving all files below that directory.