data: Graph traversal is broken for custom iter datapipes

from torch.utils.data.graph import traverse
from torchdata.datapipes.iter import IterDataPipe, IterableWrapper


class CustomIterDataPipe(IterDataPipe):
    def noop(self, x):
        return x

    def __init__(self):
        self._dp = IterableWrapper([]).map(self.noop)

    def __iter__(self):
        yield from self._dp


traverse(CustomIterDataPipe())
RecursionError: maximum recursion depth exceeded

Without the .map() call it works fine. I don’t think this is specific to .map() though. From trying a few datapipes, this always happens if self._dp is composed in some way.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 24 (24 by maintainers)

Commits related to this issue

Most upvoted comments

Thanks @NivekT

This should mean that your code snippet above will work if dill is not installed.

This means that the cyclic reference issue is still happening for users that have dill installed, right? Unfortunately I’m not sure we can expect torchvision’s users to not have installed dill.

Do you think there is a way to fix the dill serialization to properly handle the cyclic ref issue?

Yep, the PR is meant to be a quick fix to unblock your work and I marked that as a TODO. I agree with your point and am looking into it.

@ejguan @NivekT , going back to @pmeier 's https://github.com/pytorch/data/issues/237#issuecomment-1080651807:

  • Does torchdata have to support dill?
  • If not, would we consider merging Philip’s proposed diff for fixing the cyclic reference issue?
  • If yes, would you mind providing some details on why dill is needed? Hopefully we can still find a fix for the cyclic reference issue while still supporting dill?

Fixing the cyclic reference issue would allow us to move forward with our preferred design for torchvision new datasets.

So that we can plan on how to best move forward on our side, how long do you think a fix would take @ejguan ?

It will be my top priority next week. I need to prepare all release-related stuff this week.

I think adding dill as a hard dependency is even more BC-breaking. I’m adding a PR to only use the custom serialization when dill is available. This should mean that your code snippet above will work if dill is not installed.