umap: RecursionError
Hi, thanks for the great package.
I have a dataset with 200,000 rows and 15 columns. I tried to apply UMAP as follows:
import umap
embedding = umap.UMAP(n_neighbors=5, min_dist=0.3, metric='correlation').fit_transform(data)
After about 10 seconds, I got the following exceptions:
- RecursionError: maximum recursion depth exceeded while calling a Python object
- SystemError: CPUDispatcher(<function angular_random_projection_split at 0x000001C8260D6378>) returned a result with an error set (raised at: return make_angular_tree(data, indices, rng_state, leaf_size))
I raised the recursion limit to 10000 as shown below and tried again, but then Python exited with a code like -143537645, meaning it terminated with an error.
sys.setrecursionlimit(10000)
Is there any solution, workaround or anything I can do for this problem?
Thanks.
About this issue
- State: closed
- Created 6 years ago
- Comments: 23 (11 by maintainers)
Commits related to this issue
- #99: port recursion error fix from pynndescent — committed to jlmelville/umap by jlmelville 5 years ago
- Merge pull request #220 from jlmelville/master #99: port recursion error fix from pynndescent — committed to lmcinnes/umap by lmcinnes 5 years ago
@lmcinnes I've attached a 5000x100 NumPy array that always makes UMAP crash. I'm reproducing the error with:
from umap import UMAP
embedding = UMAP(metric="cosine", n_components=2, min_dist=0.3).fit_transform(data)
data.npy.zip
I think some fixes to other issues may actually resolve this, so I’ll try to roll out a patch release in the next few days that will hopefully solve this.
That’s great news. I’ll try to put out a release soon.
On Sun, Apr 28, 2019 at 2:17 AM Radamés Ajna notifications@github.com wrote:
I’m also having this problem. Adding an uncomfortably large amount of noise as a hack also works for me.
This is an odd issue that I never fully tracked down – it seems to depend on an odd data distribution (often involving duplicate points). What is happening is that the random projection tree recursively splits the data into smaller and smaller pieces. Apparently we hit the recursion limit. In practice we should expect the data to be split approximately in half each time, so the tree depth should be expected to be around log_2(200000) ~ 18. Somehow, instead we have a tree depth that has exceeded 10000, so the splitting is working very strangely.
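To put numbers on the depth estimate above, here is a minimal sketch that computes the depth of a perfectly balanced binary split and counts exact duplicate rows, which is the data pattern suspected here. The helper names balanced_depth and count_duplicate_rows are made up for illustration and are not part of UMAP. Duplicates are only the suspected trigger, so a non-zero count is a hint, not a diagnosis.

```python
import math


def balanced_depth(n_samples):
    """Depth of a binary tree that halves n_samples until one point is left."""
    return math.ceil(math.log2(n_samples))


def count_duplicate_rows(rows):
    """Number of rows that are exact copies of an earlier row.

    Works on a list of lists or a 2D NumPy array (hypothetical helper).
    """
    return len(rows) - len({tuple(row) for row in rows})


print(balanced_depth(200_000))  # 18
print(count_duplicate_rows([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]))  # 1
```

A depth near 18 is far below Python's default recursion limit, so hitting the limit suggests the splits are not dividing the data at all for some subset of points.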
One potential solution is to add a small amount of noise to the data (smaller than the smallest distances between non-identical samples). This may work around the problem for you.
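The noise workaround could be sketched roughly as follows. This is only an illustration: add_jitter is a hypothetical helper, not a UMAP function, and the default scale of 1e-6 is an assumption that should be chosen smaller than the smallest distance between non-identical samples in your own data.

```python
import numpy as np


def add_jitter(data, scale=1e-6, seed=0):
    """Return a float64 copy of data with small Gaussian noise added.

    scale is the noise standard deviation (an assumed default; pick it
    below the smallest distance between distinct samples).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=np.float64)
    return data + rng.normal(0.0, scale, size=data.shape)


# Example: 100 identical rows become 100 distinct rows after jittering.
points = np.ones((100, 5))
jittered = add_jitter(points)
print(len(np.unique(jittered, axis=0)))
```

The jittered array would then be passed to fit_transform in place of the original data.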
On Thu, Aug 2, 2018 at 3:59 AM Samet Dumankaya notifications@github.com wrote: