ludwig: upload_to_hf_hub model path mismatch with model.save
Describe the bug
There seems to be a mismatch between how model.save()
and LudwigModel.upload_to_hf_hub()
handle model paths. A model saved into a directory with a custom name using model.save
cannot be uploaded to HF Hub using the same directory name.
To Reproduce Steps to reproduce the behavior:
- Fine-tune a LLM (I used the Zephyr fine-tuning example from Predibase)
- Save the fine-tuned model into a directory using
model.save("finetuned-model")
- Try to upload the model to HF Hub using
LudwigModel.upload_to_hf_hub(MY_HF_MODEL_NAME, "finetuned-model")
- See error:
File .../ludwig-finetune-llm/venv/lib/python3.11/site-packages/ludwig/utils/upload_utils.py:101, in BaseModelUpload._validate_upload_parameters(repo_id, model_path, repo_type, private, commit_message, commit_description)
99 trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")
100 if not os.path.exists(trained_model_artifacts_path):
--> 101 raise Exception(
102 f"Model artifacts not found at {trained_model_artifacts_path}. "
103 f"It is possible that model at '{model_path}' hasn't been trained yet, or something went"
104 "wrong during training where the model's weights were not saved."
105 )
Exception: Model artifacts not found at finetuned-model/model/model_weights. It is possible that model at 'finetuned-model' hasn't been trained yet, or something wentwrong during training where the model's weights were not saved.
Expected behavior
Expected the upload to succeed, since I gave the same path name, "finetuned-model"
, as a parameter both to model.save
and to LudwigModel.upload_to_hf_hub
.
Screenshots
n/a
Environment (please complete the following information):
- OS: Linux
- Version: AlmaLinux release 8.7 (Stone Smilodon)
- Python version 3.11.5
- Ludwig version 0.9.2
Additional context
After model.save
, this is the file and directory structure created under the finetuned-model
directory:
model_hyperparameters.json
training_set_metadata.json
model_weights/adapter_config.json
model_weights/adapter_model.safetensors
model_weights/README.md
But according to the error message, upload_to_hf_hub
is checking for the existence of finetuned-model/model/model_weights
. It doesn’t exist (there is no intermediate directory called model
) so this fails with the above error. Below is the relevant code. Note that model
is always added to the path on line 99. https://github.com/ludwig-ai/ludwig/blob/51f38c5cdac0139fc6d297f8e93752f7edc6d8ab/ludwig/utils/upload_utils.py#L98-L105
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 16 (7 by maintainers)
@osma Your criticism is absolutely valid. Last night (Pacific Time), @sanjaydasgupta and I had a call to brainstorm how this can be improved. Your filing this issue and continued comments have been very helpful, and so your participation going forward is welcome. One idea that @sanjaydasgupta is currently evaluating is: What if we just modify
_validate_upload_parameters()
inludwig/utils/upload_utils.py
, such that we do not hard code the path in line 102 (trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")
) but instead only usemodel_path
– and then enhance the validation so as to make sure that the files we need from the subdirectories model and model_weights are there. In fact, we already have the validation for themodel_weights
directory contents; we would just need to add the validation for the directory one above (i.e.,model
) for things that we care about, such as the hyperparameter configuration, which @sanjaydasgupta added last week. Then anymodel_path
would work as long as its contents pass this validation; in this case, modifying tests would be acceptable. What do you both think? Thank you.@sanjaydasgupta and @alexsherstinsky met to discuss this and agreed that there is a design issue that needs to be studied deeper, potentially affecting “ludwig/utils/upload_utils.py” and modules that use it.
After poking around a bit in the code-base here is what I learned:
Changing the implementation of Ludwig.api.LudwigModel.save() so that it first suffixes a literal
/model
to the providedsave_path
(an intuitive but naive solution) breaks 7 or 8 tests. Inspection of those tests reveals thatsave()
has a twin – predictably namedload()
. Also,save()
andload()
are designed to play well together as is.Therefore, changing
save()
so that it works more intuitively withupload_to_hf_hub()
will entail changes toload()
and possibly other things – probably be a lot of work. One alternative to save that effort is to improve the documentation of save() to clearly state that thesave_path
provided must end in amodel
. That same save_path can then be passed toupload_to_hf_hub()
successfully.@alexsherstinsky please let me know what you think.
@sanjaydasgupta Has graciously agreed to work on this issue.
Thanks for reporting the issue @osma and I’m glad you were able to find a workaround. @alexsherstinsky Will be taking a look at it over the next few days
FWIF, I was able to work around the problem by creating the directory
finetuned-model/model
and moving all files below that directory.