autogluon: Tabular: Fit on Windows, load on MacOS/Linux causes exception

Hi, I am using the AutoGluon TabularPredictor. The predictor is set up as follows:

# Imports inferred from the names used below
import autogluon.core as ag
from autogluon.features.generators import IdentityFeatureGenerator
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label="label1",
    verbosity=4,
    problem_type="regression",
    path="agModels-predicttest1",
    eval_metric='mean_absolute_error'
  ).fit(
    train_data=train_data_transformed.drop(
      ["label2", "label3", "label4", "label5"], axis=1),
    feature_generator=IdentityFeatureGenerator(),
    time_limit=3200,
    # visualizer='tensorboard',
    presets='medium_quality',   # 'optimize_for_deployment', 'best_quality'
    hyperparameters={"GBM": {
      'num_boost_round': ag.space.Int(lower=50, upper=2000, default=100),
      'num_leaves': ag.space.Int(lower=64, upper=1024, default=256),
      'learning_rate': ag.space.Real(3e-2, 2e-1, default=1e-1, log=True),
      'max_depth': ag.space.Int(lower=6, upper=10, default=8),
      'early_stopping_round': 10,
      'min_data_in_leaf': ag.space.Int(lower=10, upper=100, default=20)}},
    hyperparameter_tune_kwargs={'num_trials': 10, 'searcher': 'auto', 'scheduler': 'local'}
  )

With the medium_quality preset and hyperparameter tuning, loading the model after training with predictor_test1 = TabularPredictor.load("/home/hadoop/agModels-predicttest1/") raises the following error:

Traceback (most recent call last):
 File "/tmp/1665687139040-0/zeppelin_python.py", line 158, in <module>
  exec(code, _zcUserQueryNameSpace)
 File "<stdin>", line 7, in <module>
 File "/usr/local/lib/python3.7/site-packages/autogluon/tabular/predictor/predictor.py", line 2901, in load
  predictor = cls._load(path=path)
 File "/usr/local/lib/python3.7/site-packages/autogluon/tabular/predictor/predictor.py", line 2833, in _load
  predictor._set_post_fit_vars(learner=learner)
 File "/usr/local/lib/python3.7/site-packages/autogluon/tabular/predictor/predictor.py", line 2787, in _set_post_fit_vars
  self._learner.persist_trainer(low_memory=True)
 File "/usr/local/lib/python3.7/site-packages/autogluon/tabular/learner/abstract_learner.py", line 704, in persist_trainer
  self.trainer = self.load_trainer()
 File "/usr/local/lib/python3.7/site-packages/autogluon/core/learner/abstract_learner.py", line 121, in load_trainer
  path=self.trainer_path, reset_paths=self.reset_paths
 File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 2315, in load
  obj.set_contexts(path)
 File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 214, in set_contexts
  self.path, model_paths = self.create_contexts(path_context)
 File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 224, in create_contexts
  model_local_path = prev_path.split(abs_path, 1)[1]
IndexError: list index out of range

However, loading works fine if I use presets='best_quality', where auto_stack=True. I am not sure why the medium_quality preset with hyperparameter tuning gives this error when loading the weights.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 19 (1 by maintainers)

Most upvoted comments

@Innixma you bet, I will follow these steps and follow up with a reply. Thanks!

Thanks for providing the additional information @Alex-Wenner-FHR and @rsj123!

I have this on my radar and will see if it can be fixed for the upcoming v0.7 release.

You are doing some pretty advanced things here without providing the original data / code, so it is hard to assist. Please provide a reproducible example on Colab so we can help.

I also ran into this problem. I think the cause may be in core\trainer\abstract_trainer.py, line 223:

    def create_contexts(self, path_context: str) -> (str, dict):
        path = path_context
        model_paths = self.get_models_attribute_dict(attribute='path')
        for model, prev_path in model_paths.items():
            prev_path = os.path.abspath(prev_path) + os.path.sep
            abs_path = os.path.abspath(self.path) + os.path.sep
            model_local_path = prev_path.split(abs_path, 1)[1]
            new_path = path + model_local_path
            model_paths[model] = new_path

self.path is a relative path; in my case it is AutogluonModels/ag-20221128_161031/models/. In some cases the working directory is not the code directory, so abspath returns the wrong path. When abs_path does not occur in prev_path, prev_path.split(abs_path, 1) returns a one-element list, so indexing it with [1] raises the IndexError.
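
The failure mode described above can be reproduced without AutoGluon. The following is a minimal sketch, not the library's code: local_model_path is a hypothetical helper that copies only the split logic from create_contexts, posixpath stands in for os.path on Linux, and the example paths are made up for illustration (the Windows-style one is an assumption based on the issue title).

```python
import posixpath


def local_model_path(prev_path: str, trainer_path: str) -> str:
    """Hypothetical helper mirroring the split done in create_contexts."""
    prev_abs = posixpath.abspath(prev_path) + posixpath.sep
    trainer_abs = posixpath.abspath(trainer_path) + posixpath.sep
    # str.split returns a single-element list when the separator is absent,
    # so [1] only exists if trainer_abs really occurs in prev_abs.
    return prev_abs.split(trainer_abs, 1)[1]


trainer_dir = "/home/hadoop/agModels/models/"  # illustrative path

# Stored model path lies under the trainer directory: the split succeeds.
ok = local_model_path("/home/hadoop/agModels/models/LightGBM/T1/", trainer_dir)
print(ok)

# Stored model path was saved with Windows separators (assumed scenario):
# on Linux it is treated as a relative name, the prefix never matches,
# and indexing the split result fails.
try:
    local_model_path("agModels\\models\\LightGBM\\T1\\", trainer_dir)
except IndexError as err:
    print("IndexError:", err)
```

Any stored model path that does not begin with the resolved trainer path fails the same way, whether the mismatch comes from path separators or from resolving a relative path against a different working directory.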