galois: Matrix row reduction is very slow

Hi,

I’ve tried to use this library to row reduce a (quite) big matrix over a finite field. It succeeded and took quite a huge time in comparison with a naïve implementation.

As an example I copy paste the timing from the jupyter notebook.

with: https://www.nayuki.io/page/gauss-jordan-elimination-over-any-field

mat.reduced_row_echelon_form()

CPU times: user 4min 15s, sys: 18.7 ms, total: 4min 15s
Wall time: 4min 15s

with galois:

MM_reduced = MM.row_reduce()

CPU times: user 4h 21min 24s, sys: 3.06 s, total: 4h 21min 27s
Wall time: 4h 21min 28s

the matrix, for reference is sparse and defined over a 128-bit prime, the shape is: 513x1025

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 18 (7 by maintainers)

Most upvoted comments

@ddddavidee I pushed another performance improvement to that branch. You’ll likely see another speed up.

@nayuki, what I meant is that the same computation took a very different time (~60x times) changing from your code to the one in galois… and I was thinking that there was a problem somewhere…

@mhostetter I pulled the branch and re-tried the computation (with some other small changes, I used lru_cache from functools, and shortcut the ‘division-by-one’ case, and now the computation for the very same matrix takes less than two minutes (Wall time: 1min 43s)

@mhostetter Thanks. My comment reflected a misunderstanding of the problem @ddddavidee was facing.

My opinion now is that probably both my algorithm and your galois library algorithm are O(n3), but supposedly yours has a constant factor 100× bigger.

@nayuki I saw a comment from you, but don’t see it anymore. I love your website and definitely used it when developing this library. Would love to collaborate in some way.

@ddddavidee I think potentially the issue arises in how the FieldArray is verified in the constructor. Since it uses dtype=object for such large fields, the __new__() method makes sure the objects in the array are Python ints, in the valid range, and so forth. Potentially that overhead is getting called far too often when all the row reduction operations are being performed.

I need to verify this, but that’s my hunch.

yes, I can provide a file with the matrix. I’ll do later today or tomorrow morning (europe time)

Hey, thanks for pointing out this issue.

I’ll need to run some tests and dig into this. Since .row_reduce() is not JIT compiled, as referenced in #136, it is not as fast as it could be. However, your comparison is against a pure Python, naive implementation (or so it seems), and the performance delta is quite surprising to me. I thought I had a naive implementation as well. Perhaps I overlooked something.

Were the reduced row echelon forms the same?