joeynmt: Regression: custom batches no longer supported

Describe the bug In joeynmt<2.0.0, the TrainManager used to receive a batch_class to handle custom batches. In joeynmt=2.0.0, it is no longer needed, as there is the make_data_iter which calls a collate_fn responsible to return whatever.

This created a regression - it used to be possible to use this repository as a library - but now one has to make hard-coded changes in the code in order to have a custom make_data_iter.

It is hard coded in training and prediction

Possible Solutions: For training:

  1. to have data_iter callable passed to the TrainManager
  2. in the train manager, have a method so we can extend it:
def make_data_iter(self, **kwargs):
  return make_data_iter(**kwargs)

For prediction:

  1. pass this as an argument to the predict method?

What solutions would you prefer/approve if any?

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

I moved it to two files as @may- and @juliakreutzer both thought “three classes for data-related functions might be confusing”.

I’d argue for merging and releasing as is (to support the functionality), and if down the line some restructuring needs to be done, that’s fine.

@AmitMY

Do we need to separate the load_data() func from data.py? I wonder if we really need both data.py and data_loader.py. If it’s just because of the circular import problem, we could move make_data_iter() from data.py to datasets.py, maybe? make_data_iter() is now actually a member of the dataset class. And other sampler funcs + collate func belong to this make_data_iter(). What do you think about this?


I did some nasty hack to avoid circular import before, and I regret it… 🤕 https://github.com/joeynmt/joeynmt/blob/38fcd3a2b19ee657f355cd9c6833c19cf29fb703/joeynmt/helpers.py#L29-L31 I should have put the log_data_info() in data.py file, instead.

Don’t worry about it, I could have probably made myself clearer.

PR is ready to be reviewed. @juliakreutzer could you give it a look? I’d recommend reviewing it one-commit-at-a-time as each has a specific change. (https://github.com/joeynmt/joeynmt/pull/189/commits/7a03d297373bdcb8a154e42955ab08ed0660e119 then https://github.com/joeynmt/joeynmt/pull/189/commits/0fb08b8d5e276d026528c2c36c49217618119854)