ludwig: upload_to_hf_hub model path mismatch with model.save
Describe the bug
There seems to be a mismatch between how model.save() and LudwigModel.upload_to_hf_hub() handle model paths. A model saved into a directory with a custom name using model.save cannot be uploaded to HF Hub using the same directory name.
To Reproduce Steps to reproduce the behavior:
- Fine-tune a LLM (I used the Zephyr fine-tuning example from Predibase)
- Save the fine-tuned model into a directory using
model.save("finetuned-model") - Try to upload the model to HF Hub using
LudwigModel.upload_to_hf_hub(MY_HF_MODEL_NAME, "finetuned-model") - See error:
File .../ludwig-finetune-llm/venv/lib/python3.11/site-packages/ludwig/utils/upload_utils.py:101, in BaseModelUpload._validate_upload_parameters(repo_id, model_path, repo_type, private, commit_message, commit_description)
99 trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")
100 if not os.path.exists(trained_model_artifacts_path):
--> 101 raise Exception(
102 f"Model artifacts not found at {trained_model_artifacts_path}. "
103 f"It is possible that model at '{model_path}' hasn't been trained yet, or something went"
104 "wrong during training where the model's weights were not saved."
105 )
Exception: Model artifacts not found at finetuned-model/model/model_weights. It is possible that model at 'finetuned-model' hasn't been trained yet, or something wentwrong during training where the model's weights were not saved.
Expected behavior
Expected the upload to succeed, since I gave the same path name, "finetuned-model", as a parameter both to model.save and to LudwigModel.upload_to_hf_hub.
Screenshots
n/a
Environment (please complete the following information):
- OS: Linux
- Version: AlmaLinux release 8.7 (Stone Smilodon)
- Python version 3.11.5
- Ludwig version 0.9.2
Additional context
After model.save, this is the file and directory structure created under the finetuned-model directory:
model_hyperparameters.json
training_set_metadata.json
model_weights/adapter_config.json
model_weights/adapter_model.safetensors
model_weights/README.md
But according to the error message, upload_to_hf_hub is checking for the existence of finetuned-model/model/model_weights. It doesn’t exist (there is no intermediate directory called model) so this fails with the above error. Below is the relevant code. Note that model is always added to the path on line 99. https://github.com/ludwig-ai/ludwig/blob/51f38c5cdac0139fc6d297f8e93752f7edc6d8ab/ludwig/utils/upload_utils.py#L98-L105
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 16 (7 by maintainers)
@osma Your criticism is absolutely valid. Last night (Pacific Time), @sanjaydasgupta and I had a call to brainstorm how this can be improved. Your filing this issue and continued comments have been very helpful, and so your participation going forward is welcome. One idea that @sanjaydasgupta is currently evaluating is: What if we just modify
_validate_upload_parameters()inludwig/utils/upload_utils.py, such that we do not hard code the path in line 102 (trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")) but instead only usemodel_path– and then enhance the validation so as to make sure that the files we need from the subdirectories model and model_weights are there. In fact, we already have the validation for themodel_weightsdirectory contents; we would just need to add the validation for the directory one above (i.e.,model) for things that we care about, such as the hyperparameter configuration, which @sanjaydasgupta added last week. Then anymodel_pathwould work as long as its contents pass this validation; in this case, modifying tests would be acceptable. What do you both think? Thank you.@sanjaydasgupta and @alexsherstinsky met to discuss this and agreed that there is a design issue that needs to be studied deeper, potentially affecting “ludwig/utils/upload_utils.py” and modules that use it.
After poking around a bit in the code-base here is what I learned:
Changing the implementation of Ludwig.api.LudwigModel.save() so that it first suffixes a literal
/modelto the providedsave_path(an intuitive but naive solution) breaks 7 or 8 tests. Inspection of those tests reveals thatsave()has a twin – predictably namedload(). Also,save()andload()are designed to play well together as is.Therefore, changing
save()so that it works more intuitively withupload_to_hf_hub()will entail changes toload()and possibly other things – probably be a lot of work. One alternative to save that effort is to improve the documentation of save() to clearly state that thesave_pathprovided must end in amodel. That same save_path can then be passed toupload_to_hf_hub()successfully.@alexsherstinsky please let me know what you think.
@sanjaydasgupta Has graciously agreed to work on this issue.
Thanks for reporting the issue @osma and I’m glad you were able to find a workaround. @alexsherstinsky Will be taking a look at it over the next few days
FWIF, I was able to work around the problem by creating the directory
finetuned-model/modeland moving all files below that directory.