wandb: [CLI]: sweep agent fails when running sweeps with 'None' values for sweep parameters

Describe the bug

It looks like recent versions of wandb, definitely wandb==0.12.19, fail when one of the sweep parameters is None.

Note that None values are very common, e.g. for hyperparameters of many models in scikit-learn. For example, this sweep uses None for the value of the max_features hyperparameter in [sklearn.ensemble.GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html).

The affected sweeps all work fine with wandb==0.12.15. The workaround for me was to downgrade to that, for now.

Sorry, can’t provide a sweep to reproduce. Create any grid search sweep where one of the hyperparameter values takes a value of ‘None’.


2022-07-29 19:24:15,782 - wandb.wandb_agent - ERROR - Exception while processing command: {'run_id': '<redacted>', 'program': 'scripts/train.py', 'type': 'run', 'args': <...> 'max_features': {'value': None},  

......

Traceback (most recent call last):
  File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 299, in _process_command
    result = self._command_run(command)
  File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 409, in _command_run
    sweep_vars: Dict[str, Any] = Agent._create_command_args(command)
  File "/home/<redacted>/python3.6/site-packages/wandb/wandb_agent.py", line 342, in _create_command_args
    raise ValueError('No "value" found for command["args"]["%s"]' % param)
ValueError: No "value" found for command["args"]["max_features"]
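The traceback suggests the agent raises whenever the looked-up value is None, which would treat an explicit None the same as a missing "value" key. A minimal sketch of how the two cases could be told apart with a sentinel follows; `create_command_args` and `_MISSING` are hypothetical names for illustration, not wandb's actual code or its actual fix.

```python
from typing import Any, Dict, List

# Sentinel object: lets us tell "key absent" apart from "value is None".
_MISSING = object()


def create_command_args(command: Dict[str, Any]) -> List[str]:
    """Hypothetical stand-in for the agent's argument builder."""
    flags: List[str] = []
    for param, config in command["args"].items():
        value = config.get("value", _MISSING)
        if value is _MISSING:
            # Only a genuinely missing "value" key is an error.
            raise ValueError('No "value" found for command["args"]["%s"]' % param)
        flags.append(f"{param}={value}")
    return flags


command = {
    "args": {
        "max_features": {"value": None},  # explicit None, as in the report
        "n_estimators": {"value": 100},   # made-up second parameter
    }
}
print(create_command_args(command))
```

Note that this only avoids the ValueError; the None still ends up formatted as the text "None" in the resulting flag.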

Environment

WandB version: 0.12.19

OS: linux

Python version: 3.6.8

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 7
  • Comments: 23 (6 by maintainers)

Most upvoted comments

Thanks! That’s exactly what I needed. I’m going to ticket this out internally right now to be resolved by our engineering team and I will keep you updated on the status of this issue.

Thanks, Ramit

I was able to identify the issue. The problem stems from the fact that the sweep agent is sending the parameter None over the command line. This turns None into a string “None”.
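The conversion described here can be reproduced without wandb. The sketch below assumes the flag is built by string formatting and that the training script reads it with argparse; the `--rnn_type` flag name and the parser setup are illustrative assumptions, not taken from the agent's code.

```python
import argparse

param, value = "rnn_type", None

# Formatting None into a command-line flag yields the text "None".
flag = f"--{param}={value}"
print(flag)

# A script that parses the flag receives a str, not the None object.
parser = argparse.ArgumentParser()
parser.add_argument("--rnn_type", default=None)
args = parser.parse_args([flag])

print(repr(args.rnn_type))
print(args.rnn_type is None)
```

Because the parsed value is the string "None", a check such as `args.rnn_type is None` is false, and any code that branches on None takes the wrong path.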

Bump. One of my RNN type parameters is ‘None’. I have trained 70 models with it manually, and if I now want to start a sweep I have to set the parameter in the config to ‘null’, which is counterintuitive if the sweep is supposed to recognize the models that were trained with ‘None’.

sweep config:
method: bayes
metric:
  goal: maximize
  name: val_auc
parameters:
  rnn_type:
    distribution: categorical
    values:
      - null
      - LSTM
      - GRU

Code example:

if self.params["rnn_type"] != None:
    if self.params["rnn_type"] == "LSTM":
        rnn = LSTM
    elif self.params["rnn_type"] == "GRU":
        rnn = GRU
    else:
        raise ValueError("rnn_type '{}' not supported.".format(self.params["rnn_type"]))
    for x in range(self.params["rnn_num"]-1):
        self._add_rnn_layer(rnn, True, x)
    self._add_rnn_layer(rnn, False, self.params["rnn_num"]-1)
else:
    self.cnn = Flatten()(self.cnn)

I turned off the ‘None’ check in wandb_agent.py to see what is going on (because this None check gets triggered when running the sweep):

for param, config in command["args"].items():
    _value: Any = config.get("value", None)
    # if _value is None:
    #     raise ValueError('No "value" found for command["args"]["%s"]' % param)
    _flag: str = f"{param}={_value}"

Now running a sweep produces this error:

Traceback (most recent call last):
  File "/home/profts/P09/scripts/MYOD/train.py", line 77, in <module>
    model, summary = searcher.train(data, verbose=False)
  File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Grid_Search.py", line 82, in train
    model = Model(candidate, data)
  File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Model.py", line 139, in __init__
    self._prepare_model()
  File "/home/profts/.conda/envs/pysster/lib/python3.9/site-packages/pysster/Model.py", line 630, in _prepare_model
    raise ValueError("rnn_type '{}' not supported.".format(self.params["rnn_type"]))
ValueError: rnn_type 'None' not supported. 

This shows that the ‘None’ sent by wandb is not the same as the None used in Python, since it passes the self.params["rnn_type"] != None check.

This makes it almost impossible to use ‘None’ as a parameter in python models.

What is really confusing is the fact that somewhere between checking the parameters in wandb and handing the parameters to the training script the ‘None’ gets changed and I have yet to understand why. Any help would be appreciated.
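Until a fix is released, one possible workaround is to convert the text back into a real None inside the training script. This is a minimal sketch assuming the script receives its hyperparameters as command-line flags parsed with argparse; `none_or_str` is a hypothetical helper, not part of wandb, and it is lossy if the literal string "None" or "null" is ever a legitimate value.

```python
import argparse
from typing import Optional


def none_or_str(text: str) -> Optional[str]:
    """Map the literal text 'None' or 'null' back to Python's None."""
    return None if text in ("None", "null") else text


parser = argparse.ArgumentParser()
parser.add_argument("--rnn_type", type=none_or_str, default=None)

# Simulates what the script would receive once None has been stringified.
args = parser.parse_args(["--rnn_type=None"])
params = {"rnn_type": args.rnn_type}

if params["rnn_type"] is not None:
    print("building RNN layers of type", params["rnn_type"])
else:
    print("no RNN layers; flattening instead")
```

With the converter in place, the None branch is taken again, while values such as LSTM or GRU pass through unchanged.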

Bump. Please solve this ASAP, as this makes the sweeps quite impractical.

Hey all, thanks for your patience! This was fixed here and this will be available in our next SDK release