transformers: [example scripts] inconsistency around eval vs val
- `val` == validation set (split)
- `eval` == evaluation (mode)
those two are orthogonal to each other - one is a split, the other is a model's run mode.
the trainer args and the example scripts are inconsistent about when it's `val` and when it's `eval` in variable names and metric keys.
examples:
- `eval_dataset` but `--validation_file`
- `eval_*` metric keys for the validation dataset - why are the prediction metric keys then `test_*`?
- `data_args.max_val_samples` vs `eval_dataset` in the same line
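To make the mismatch concrete, here is an illustrative paraphrase of the kind of code the example scripts contain (not a verbatim quote from any one script; `raw_datasets`, `data_args` and `trainer` are assumed to come from the surrounding script), where the same split is named three different ways:

```python
# Illustrative paraphrase (not a verbatim quote from any one script):
# the CLI argument says "validation", the dataset variable says "eval",
# and the sample cap says "val" - all referring to the same split.
eval_dataset = raw_datasets["validation"]        # loaded from --validation_file
if data_args.max_val_samples is not None:        # but the cap is named max_val_samples
    eval_dataset = eval_dataset.select(range(data_args.max_val_samples))

metrics = trainer.evaluate(eval_dataset=eval_dataset)   # keys come back as eval_*
# ...while trainer.predict() on the test split is reported under test_* keys.
```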
the 3 parallels:
- `train` is easy - it's both the process and the split
- `prediction` is almost never used in the scripts; it's all `test` - var names, metrics and cl args
- `eval` vs `val` vs `validation` is very inconsistent. When writing tests I'm never sure whether I'm looking up an `eval_*` or `val_*` key. And one could run evaluation on the test dataset.
Perhaps asking a question would help, and then a consistent answer becomes obvious:
Are metrics reporting stats on a split or a mode?
A. split - rename all metrics keys to be train|val|test
B. mode - rename all metrics keys to be train|eval|predict
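As a concrete illustration of the two options (the metric names and values below are made up, not taken from any script's output):

```python
# Option A - keys named after the data split:
metrics_split = {"train_loss": 0.31, "val_loss": 0.42, "test_accuracy": 0.87}

# Option B - keys named after the run mode (consistent with do_train / do_eval / do_predict):
metrics_mode = {"train_loss": 0.31, "eval_loss": 0.42, "predict_accuracy": 0.87}
```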
Thank you.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (16 by maintainers)
No, the key in the dataset dictionary is "validation", so it should be `validation_file`.

Awesome! Thank you, @bhadreshpsavani!
So the changes we need are:
- `eval` instead of `val`
- `predict` instead of `test`

in cl args and variable names in example scripts (only the active ones, please ignore legacy/research subdirs).
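A few hypothetical before/after examples of what such a rename could look like (not a verbatim diff from any particular script; the `--train_file`/`--validation_file`/`--test_file` arguments keep their split-based names, per the comment above):

```python
# Hypothetical illustration of the requested renames (not a verbatim diff):
#   max_val_samples       -> max_eval_samples        # variable / cl arg
#   test_dataset          -> predict_dataset         # variable
#   test_* metric keys    -> predict_* metric keys   # reported metrics
```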
I hope this will be the last rename for a while.
@bhadreshpsavani, would this be something you’d like to work on by chance? If you haven’t tired of examples yet.
I vote for B, for consistency with `do_train`, `do_eval`, `do_predict`.

For examples: switching an arg name can be done without taking precautions for BC as long as the README is updated at the same time, but for `TrainingArguments` (if any is concerned), a proper deprecation cycle has to be made.

ok, so I had to disable the Privacy Badger firefox extension and colab started working.

First, make a habit to start colab with:
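The snippet itself isn't preserved in this archive; a RAM check along these lines is presumably what is meant (assuming `psutil`, which is available in Colab by default):

```python
# Report how much RAM this Colab session was allocated - sessions vary,
# and as noted below, ~12GB is often too low for these runs.
import psutil

total_gb = psutil.virtual_memory().total / 2**30
print(f"Total RAM: {total_gb:.1f} GB")
```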
sometimes I get 12GB RAM, other times 25GB, 12GB is typically too low for much.
So `run_clm` works just fine even on 12GB. I had to use a small bs so edited your cmd lines to limit bs:

this worked too:

and so did:
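The actual command lines aren't preserved in this archive; the invocation being described has roughly this shape, with the batch size capped (the model and dataset choices here are placeholders, not the originals):

```bash
# Illustrative only - not the original command; batch size is reduced so the
# run fits into a ~12GB Colab session.
python run_clm.py \
  --model_name_or_path gpt2 \
  --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
  --do_train --do_eval \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 2 \
  --output_dir /tmp/test-clm
```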
Ya, I think it silently aborted the run w/o any traceback. Might be because it is occupying the entire RAM somehow. I observed similar behavior when running a really big docker image locally.
I will definitely try this command and dig more!
Thanks a lot for your input. This is really insightful! I will note down this as well 😃
Hi @stas00, ya, I will be happy to work more. Actually, I was looking for some issues to work on!
Not really my area of expertise here, but I do agree with @stas00 -> I think we should keep the liberty of quickly adapting the examples