nltk: Infinite loop in Wordnet closure and tree

When I try to compute the hyponymic closure of restrain.v.01 I get stuck in an infinite loop. Example code:

>>> import nltk
>>> nltk.__version__
'3.4.3'
>>> from nltk.corpus import wordnet as wn
>>> ss = wn.synset('restrain.v.01')
>>> list(ss.closure(lambda s:s.hyponyms()))

And from here it just hangs. I guess that the reason is that the method is based on breadth_first, which explicitly states in the documentation that is not designed to handle loops.

The same happens when running

>>> list(ss.tree(lambda s:s.hyponyms()))

Which fails with a RecursionError.

This seems to be closely related to #52, which is now closed, but the bug seems to be still present nonetheless.

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Comments: 15 (10 by maintainers)

Commits related to this issue

Most upvoted comments

If you go with proposed solution, don’t use mutable default argument values, change

def tree(synset, rel, traversed=set()):

To

def tree(synset, rel, traversed=None): if traversed is None: traversed = set()

Best, Dima

On Sat, May 9, 2020, 08:43 Zhuopsticks notifications@github.com wrote:

Just tested this issue. It still exists. Can I have a try to fix it then?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/nltk/nltk/issues/2314#issuecomment-626194703, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGYH52WHILET3SMUGFKKBDRQV2ZNANCNFSM4HW6THBQ .

Please note however that although @ekaf’s proposed solution above for “closure” solves the endless cycles (by checking if a synset was already traversed), it introduces a bug that is worse than the original problem, since it truncates trees that contain non-cyclic duplicate branches.

For example, please consider the hypernyms tree of ‘dog.n.01’ (listed in the comments of the original source code in wordnet.py). This tree starts with two distinct hypernym branches (‘canine.n.02’ and ‘domestic_animal.n.01’), which merge again at ‘animal.n.01’. Then, the hypernyms of ‘animal.n.01’ only need to be collected once, but would need to be copied to the second branch in order to display the full tree.

On the contrary, @basaldella solution for tree does not have that problem because, for each child, it recurses with a different copy of the “traversed” set.

Meanwhile, using only one global traversed set, “tree” does not display the full tree, but truncates the second branch:

[Synset(‘dog.n.01’), [Synset(‘canine.n.02’), [Synset(‘carnivore.n.01’), [Synset(‘placental.n.01’), [Synset(‘mammal.n.01’), [Synset(‘vertebrate.n.01’), [Synset(‘chordate.n.01’), [Synset(‘animal.n.01’), [Synset(‘organism.n.01’), [Synset(‘living_thing.n.01’), [Synset(‘whole.n.02’), [Synset(‘object.n.01’), [Synset(‘physical_entity.n.01’), [Synset(‘entity.n.01’)]]]]]]]]]]]]], [Synset(‘domestic_animal.n.01’), [Synset(‘animal.n.01’), ‘(Already Traversed…)’]]]

However, using only one global “traversed” set, is the only solution that can terminate when calling a tree that is too big to otherwise fit in memory, which happens when relations produce a big Strongly Connected Component, like f.ex the also_sees() relation.