rust-phf: Complexity of codegen algorithm

I’ve got a list of 14_000_000 passwords that I want to create a hash from. I’m finding that up to say 500_000 it completes more or less instantly, but at 1_000_000 it takes a long time (20mins+). I would guess anecdotally that the build step is O(n^2) or thereabouts. Has anyone else done perf work on the generation algorithm?

Some timings:

100_000 - 1 sec
500_000 - 40 sec
1_000_000 - abort after 20 mins

Rough complexity

Using the first 2 results Gives time = k * size ^ n where n = 2.3, k = 3e-12, so for my 14_000_000 passwords I’m looking at about 24 hours 😢 (modulo horrendous assumptions)

About this issue

Original URL
State: closed
Created 6 years ago
Comments: 27 (1 by maintainers)

Most upvoted comments

Comparing the performance between these two commits:

https://github.com/abonander/rust-phf/commit/c2888012c2d202a9582eb56f3ee305d3c45c650d
https://github.com/abonander/rust-phf/commit/1919661307592aedbb4c3ea5c6604fc8586f7824 (@derekdreery’s branch rebased on latest master)

In the former, both Siphasher and FNV produce a solution in ~1.5s for 500,000. For 1M, FNV solves in ~4s and Siphasher solves in ~2. The only things I changed is it now generates 3 hashes instead of 1, and takes the top 32 bits of the result (which this page suggests is higher entropy than low bits). For both, I am running cargo run --release inside phf_bench/. (I also changed the RNG seed just to see what happened.)

However, FNV still fails to solve when running cargo test for the whole repo so Siphasher is probably the better choice overall. The only tests that are failing right now are the compile-fail tests for phf_macros.

abonander on Jul 3, 2019

I have found that performance is heavily dependent on the quality of the hash function. I tried switching to FNV a while ago and even small key sets failed to solve in a reasonable amount of time.

sfackler on Nov 27, 2018