transformers: [example scripts] inconsistency around eval vs val
`val` == validation set (split)
`eval` == evaluation (mode)

those two are orthogonal to each other - one is a split, the other is a model’s run mode.
the trainer args and the scripts are inconsistent around when it’s `val` and when it’s `eval` in variable names and metrics.
examples:

- `eval_dataset` but `--validation_file`
- `eval_*` metrics keys for the validation dataset - why are the prediction metric keys then `test_*`?
- `data_args.max_val_samples` vs `eval_dataset` in the same line
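For illustration, a condensed sketch of the kind of line meant here - `data_args`, the dataset, and the metric values are simplified stand-ins for what the real scripts use, not a verbatim excerpt:

```python
from types import SimpleNamespace

# the data/cl arg is spelled with "val" ...
data_args = SimpleNamespace(max_val_samples=100)
# ... while the variable holding the validation split is spelled with "eval"
eval_dataset = [{"text": f"example {i}"} for i in range(1000)]

# both spellings meet on the same line:
if data_args.max_val_samples is not None:
    eval_dataset = eval_dataset[: data_args.max_val_samples]

# evaluation on the validation split reports eval_* metric keys,
# while prediction on the test split reports test_* metric keys:
eval_metrics = {"eval_loss": 2.45}
test_metrics = {"test_loss": 2.50}
```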
the 3 parallels:

- `train` is easy - it’s both the process and the split
- `prediction` is almost never used in the scripts, it’s all `test` - var names and metrics and cl args
- `eval` vs `val` vs `validation` is very inconsistent. when writing tests I’m never sure whether I’m looking up an `eval_*` or a `val_*` key. And one could run evaluation on the test dataset.
Perhaps asking a question would help and then a consistent answer is obvious:
Are metrics reporting stats on a split or a mode?
- A. split - rename all metrics keys to be train|val|test
- B. mode - rename all metrics keys to be train|eval|predict
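To make the two options concrete, a small illustration with made-up numbers - only the key names are the point:

```python
# A. split-based keys - named after the dataset split the stats were computed on
metrics_split = {"train_loss": 2.31, "val_loss": 2.45, "test_loss": 2.50}

# B. mode-based keys - named after the run mode that produced the stats
metrics_mode = {"train_loss": 2.31, "eval_loss": 2.45, "predict_loss": 2.50}
```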
Thank you.
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (16 by maintainers)
No, the key in the dataset dictionary is “validation”, so it should be `validation_file`.
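For context, the split names as they appear in a `DatasetDict` - `glue`/`mrpc` is used here only as an arbitrary example dataset:

```python
from datasets import load_dataset

# the dataset dictionary keys are "train" / "validation" / "test",
# which is why the cl arg stays --validation_file rather than --val_file
raw_datasets = load_dataset("glue", "mrpc")
print(list(raw_datasets.keys()))  # ['train', 'validation', 'test']
```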
Awesome! Thank you, @bhadreshpsavani!
So the changes we need are:

- `eval` instead of `val`
- `predict` instead of `test`

in cl args and variable names in example scripts (only the active ones, please ignore legacy/research subdirs).
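A hedged sketch of what the rename amounts to for arg and variable names - the pairs below are illustrative, and the exact set of names touched may differ per script:

```python
# old name (split-flavored) -> new name (mode-flavored)
renames = {
    "max_val_samples": "max_eval_samples",      # data/cl arg
    "max_test_samples": "max_predict_samples",  # data/cl arg
    "val_dataset": "eval_dataset",              # variable name
    "test_dataset": "predict_dataset",          # variable name
}
```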
I hope this will be the last rename for a while.
@bhadreshpsavani, would this be something you’d like to work on by chance? If you haven’t tired of examples yet.
I vote for B, for consistency with `do_train`, `do_eval`, `do_predict`.

For examples: switching an arg name can be done without taking precautions for BC as long as the README is updated at the same time, but for `TrainingArguments` (if any is concerned), a proper deprecation cycle has to be made.
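A minimal sketch of what such a deprecation cycle could look like, assuming a hypothetical arg rename in a `TrainingArguments`-style dataclass (the field names here are made up for illustration, not the actual fields):

```python
import warnings
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MyTrainingArguments:
    # new, preferred arg name
    max_eval_samples: Optional[int] = field(default=None)
    # deprecated alias, kept around for a full deprecation cycle
    max_val_samples: Optional[int] = field(default=None)

    def __post_init__(self):
        if self.max_val_samples is not None:
            warnings.warn(
                "`max_val_samples` is deprecated and will be removed in a future "
                "version; use `max_eval_samples` instead.",
                FutureWarning,
            )
            if self.max_eval_samples is None:
                self.max_eval_samples = self.max_val_samples
```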
ok, so I had to disable the Privacy Badger firefox extension and colab started working.

First, make a habit to start colab with:
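The snippet referenced here didn’t survive the copy; a minimal stand-in that reports how much RAM the session actually got (psutil comes pre-installed on Colab):

```python
import psutil

# 12GB vs 25GB makes a big difference for the runs below
print(f"Total RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")
```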
sometimes I get 12GB RAM, other times 25GB - 12GB is typically too low for much.
So `run_clm` works just fine even on 12GB. I had to use a small bs so edited your cmd lines to limit bs; this worked, and so did a couple of variants.
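The exact command lines didn’t survive the copy either; a representative invocation with a reduced batch size might look like the following (model, dataset, and paths are placeholders - the `--per_device_*_batch_size` flags are the point):

```bash
python examples/pytorch/language-modeling/run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
    --do_train --do_eval \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --output_dir /tmp/test-clm
```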
Ya, I think it silently aborted the run w/o any traceback. Might be because it was occupying the entire RAM somehow. I observed similar behavior when I ran a really big docker image locally.
I will definitely try this command and dig more!
Thanks a lot for your input. This is really insightful! I will note this down as well 😃
Hi @stas00, ya I will be happy to work more. Actually, I was looking for some issues to work on!
Not really my area of expertise here, but I do agree with @stas00 -> I think we should keep the liberty of quickly adapting the examples