rust-phf: Complexity of codegen algorithm
I’ve got a list of 14_000_000 passwords that I want to create a hash from. I’m finding that up to say 500_000 it completes more or less instantly, but at 1_000_000 it takes a long time (20mins+). I would guess anecdotally that the build step is O(n^2) or thereabouts. Has anyone else done perf work on the generation algorithm?
Some timings:
100_000- 1 sec500_000- 40 sec1_000_000- abort after 20 mins
Rough complexity
Using the first 2 results Gives time = k * size ^ n where n = 2.3, k = 3e-12, so for my 14_000_000 passwords I’m looking at about 24 hours 😢 (modulo horrendous assumptions)
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 27 (1 by maintainers)
Comparing the performance between these two commits:
In the former, both Siphasher and FNV produce a solution in ~1.5s for 500,000. For 1M, FNV solves in ~4s and Siphasher solves in ~2. The only things I changed is it now generates 3 hashes instead of 1, and takes the top 32 bits of the result (which this page suggests is higher entropy than low bits). For both, I am running
cargo run --releaseinsidephf_bench/. (I also changed the RNG seed just to see what happened.)However, FNV still fails to solve when running
cargo testfor the whole repo so Siphasher is probably the better choice overall. The only tests that are failing right now are the compile-fail tests forphf_macros.I have found that performance is heavily dependent on the quality of the hash function. I tried switching to FNV a while ago and even small key sets failed to solve in a reasonable amount of time.