scipy: Massive performance regression in cKDTree.query with L_inf distance in scipy 0.17

Hi,

While running some of my own code, I noticed a major slowdown in scipy 0.17 which I tracked to the cKDtree.query call with L_inf distance. I did a git bisect and tracked the commit that caused the issue – the commit is : 58a10c2688398ab440de8d6d7b5fba29671df182 Here is the example program that finishes in ~ 10 seconds before this commit and pretty much never finishes after it

import scipy.spatial.ckdtree
import numpy as np
arr1 = np.random.uniform(size=(6000000, 3))
tree2 = scipy.spatial.ckdtree.cKDTree(arr1)
distances, ind = tree2.query(arr1, 1, p=np.inf)

I also checked that if I change the distance from L_inf to L_2, then program finishes pretty quickly. I also tried to stop the program with gdb a few times to see where the code is mostly spending the time on and it seems to be here scipy/spatial/ckdtree/src/query.cxx:311. I also tried to dig into the problematic commit but I don’t know enough about ckdtree internals to suggest a patch.

Thanks, Sergey

If relevant, I’m seeing this issue on a both systems that I tried 64bit RHEL6.7 with gcc 4.4 and Debian 7.9 with gcc-4.7. In all the cases it is python 2.7

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 26 (26 by maintainers)

Commits related to this issue

Most upvoted comments

I will just add a couple of const members to the template structs and use a conditional branch on those instead of p and raw_boxsize_data. Then the compiler will optimize it away.