joeynmt: Regression: custom batches no longer supported
Describe the bug
In joeynmt<2.0.0, the TrainManager used to receive a batch_class to handle custom batches.
In joeynmt=2.0.0, it is no longer needed, as there is the make_data_iter which calls a collate_fn responsible to return whatever.
This created a regression - it used to be possible to use this repository as a library - but now one has to make hard-coded changes in the code in order to have a custom make_data_iter.
It is hard coded in training and prediction
Possible Solutions: For training:
- to have
data_itercallable passed to theTrainManager - in the train manager, have a method so we can extend it:
def make_data_iter(self, **kwargs):
return make_data_iter(**kwargs)
For prediction:
- pass this as an argument to the
predictmethod?
What solutions would you prefer/approve if any?
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (8 by maintainers)
I moved it to two files as @may- and @juliakreutzer both thought “three classes for data-related functions might be confusing”.
I’d argue for merging and releasing as is (to support the functionality), and if down the line some restructuring needs to be done, that’s fine.
@AmitMY
Do we need to separate the
load_data()func fromdata.py? I wonder if we really need bothdata.pyanddata_loader.py. If it’s just because of the circular import problem, we could movemake_data_iter()fromdata.pytodatasets.py, maybe?make_data_iter()is now actually a member of the dataset class. And other sampler funcs + collate func belong to thismake_data_iter(). What do you think about this?I did some nasty hack to avoid circular import before, and I regret it… 🤕 https://github.com/joeynmt/joeynmt/blob/38fcd3a2b19ee657f355cd9c6833c19cf29fb703/joeynmt/helpers.py#L29-L31 I should have put the
log_data_info()indata.pyfile, instead.Don’t worry about it, I could have probably made myself clearer.
PR is ready to be reviewed. @juliakreutzer could you give it a look? I’d recommend reviewing it one-commit-at-a-time as each has a specific change. (https://github.com/joeynmt/joeynmt/pull/189/commits/7a03d297373bdcb8a154e42955ab08ed0660e119 then https://github.com/joeynmt/joeynmt/pull/189/commits/0fb08b8d5e276d026528c2c36c49217618119854)