nltk: Infinite loop in Wordnet closure and tree
When I try to compute the hyponymic closure of restrain.v.01
I get stuck in an infinite loop. Example code:
>>> import nltk
>>> nltk.__version__
'3.4.3'
>>> from nltk.corpus import wordnet as wn
>>> ss = wn.synset('restrain.v.01')
>>> list(ss.closure(lambda s:s.hyponyms()))
And from here it just hangs. I guess that the reason is that the method is based on breadth_first
, which explicitly states in the documentation that is not designed to handle loops.
The same happens when running
>>> list(ss.tree(lambda s:s.hyponyms()))
Which fails with a RecursionError
.
This seems to be closely related to #52, which is now closed, but the bug seems to be still present nonetheless.
About this issue
- Original URL
- State: open
- Created 5 years ago
- Comments: 15 (10 by maintainers)
Commits related to this issue
- Fixes issue #2314 (Infinite loop in Wordnet closure and tree) — committed to ekaf/nltk by ekaf 4 years ago
- Fixes issue #2314 (Infinite loop in Wordnet closure and tree) (#2612) * Fixes issue #2314 (Infinite loop in Wordnet closure and tree) * Edited comments for closure and tree * use pop(0) instead... — committed to nltk/nltk by ekaf 4 years ago
If you go with proposed solution, don’t use mutable default argument values, change
def tree(synset, rel, traversed=set()):
To
def tree(synset, rel, traversed=None): if traversed is None: traversed = set()
Best, Dima
On Sat, May 9, 2020, 08:43 Zhuopsticks notifications@github.com wrote:
Please note however that although @ekaf’s proposed solution above for “closure” solves the endless cycles (by checking if a synset was already traversed), it introduces a bug that is worse than the original problem, since it truncates trees that contain non-cyclic duplicate branches.
For example, please consider the hypernyms tree of ‘dog.n.01’ (listed in the comments of the original source code in wordnet.py). This tree starts with two distinct hypernym branches (‘canine.n.02’ and ‘domestic_animal.n.01’), which merge again at ‘animal.n.01’. Then, the hypernyms of ‘animal.n.01’ only need to be collected once, but would need to be copied to the second branch in order to display the full tree.
On the contrary, @basaldella solution for tree does not have that problem because, for each child, it recurses with a different copy of the “traversed” set.
Meanwhile, using only one global traversed set, “tree” does not display the full tree, but truncates the second branch:
[Synset(‘dog.n.01’), [Synset(‘canine.n.02’), [Synset(‘carnivore.n.01’), [Synset(‘placental.n.01’), [Synset(‘mammal.n.01’), [Synset(‘vertebrate.n.01’), [Synset(‘chordate.n.01’), [Synset(‘animal.n.01’), [Synset(‘organism.n.01’), [Synset(‘living_thing.n.01’), [Synset(‘whole.n.02’), [Synset(‘object.n.01’), [Synset(‘physical_entity.n.01’), [Synset(‘entity.n.01’)]]]]]]]]]]]]], [Synset(‘domestic_animal.n.01’), [Synset(‘animal.n.01’), ‘(Already Traversed…)’]]]
However, using only one global “traversed” set, is the only solution that can terminate when calling a tree that is too big to otherwise fit in memory, which happens when relations produce a big Strongly Connected Component, like f.ex the also_sees() relation.