regex: Blocking behavior when run in loop on multiple threads
What version of regex are you using?
latest (1.8.4)
Describe the bug at a high level.
When a string is repeatedly replaced in a loop on multiple threads (up to the CPU count minus one), the threads appear to block each other and run in sequence instead of in parallel. Something seems to be serializing them (maybe shared memory?). Tested on 2 different multi-core machines, both running Windows 10 64-bit.
What are the steps to reproduce the behavior?
Compare `cargo run --release -- 1` (one thread, total runtime about 1 s) with `cargo run --release -- 6` (six threads, total runtime about 6 s):
```rust
use std::time::Instant;

use regex::Regex;

fn main() {
    // The first CLI argument is the number of threads to spawn.
    let mut args = std::env::args();
    let _ = args.next(); // skip the program name
    let n: usize = args
        .next()
        .expect("arg1 missing!")
        .parse()
        .expect("arg1 must be a thread count");
    let ts = Instant::now();
    let mut handles = Vec::with_capacity(n);
    for _ in 0..n {
        handles.push(std::thread::spawn(|| {
            // Each thread builds its own Regex and String; nothing is shared between threads.
            let mut subject = "#".to_string();
            let search = Regex::new(&regex::escape("#")).unwrap();
            let replace = "benchmark#";
            let ts = Instant::now();
            for _ in 0..38000 {
                // `replace_all` builds a new String when it matches, and `.to_string()`
                // copies it again, so every iteration allocates.
                subject = search.replace_all(&subject, replace).to_string();
            }
            println!("thread {}", ts.elapsed().as_secs_f32());
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    println!("total {}", ts.elapsed().as_secs_f32());
}
```
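A hedged way to check whether the regex crate is involved at all is to keep the same thread layout but replace `replace_all` with plain `String` building, so each iteration still allocates and copies a growing string. This is a sketch, not part of the original report; `allocation_workload` and `run` are hypothetical names. If `-- 6` is still roughly six times slower than `-- 1` with this version, regex itself is probably not the culprit.

```rust
use std::thread;
use std::time::Instant;

/// Mimics the repro's loop without regex. Replacing the single trailing '#'
/// with "benchmark#" is the same as prepending "benchmark", and like the
/// original `.to_string()` this allocates a fresh String every iteration.
fn allocation_workload(iterations: usize) -> usize {
    let mut subject = String::from("#");
    for _ in 0..iterations {
        let mut next = String::with_capacity(subject.len() + "benchmark".len());
        next.push_str("benchmark");
        next.push_str(&subject);
        subject = next;
    }
    subject.len()
}

/// Spawns `threads` independent workers and returns the total wall-clock time in seconds.
fn run(threads: usize, iterations: usize) -> f32 {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || allocation_workload(iterations)))
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    start.elapsed().as_secs_f32()
}

fn main() {
    // Same CLI shape as the repro: the first argument is the thread count (defaults to 1 here).
    let threads: usize = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(1);
    println!("total {}", run(threads, 38_000));
}
```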
What is the actual behavior?
The total runtime equals the sequential runtime: it grows linearly with the number of threads (about 1 s per thread).
What is the expected behavior?
The total runtime should be at most the runtime of the slowest thread on a multi-core machine.
About this issue
- State: closed
- Created a year ago
- Comments: 30 (8 by maintainers)
Did anyone try to rule out the ~CPU~ OS thread scheduler? Apparently I missed a lot in the 2 hours since this was posted, but I threw together a slight alteration of the benchmark that pins the threads to individual CPUs and measures work over time. Based on my experience with low-latency systems in my professional life, I suspect this is a scheduler issue.
Demo repo if anyone is interested: https://github.com/code-ape/regex-bench-rs
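The demo repo isn't reproduced here, but a std-only sketch of the "measure work over time" part might look like this: each thread records when it started and finished relative to a shared `Instant`, and `any_overlap` reports whether any two threads actually ran at the same time. Pinning is left out because the standard library has no portable CPU-affinity API; `Span`, `timed_threads`, and `any_overlap` are hypothetical names.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Start and end of one thread's work, measured from a shared origin.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: Duration,
    end: Duration,
}

/// Runs `work` on `n` threads and returns each thread's time span.
fn timed_threads<F>(n: usize, work: F) -> Vec<Span>
where
    F: Fn() + Send + Clone + 'static,
{
    let origin = Instant::now();
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let work = work.clone();
            thread::spawn(move || {
                let start = origin.elapsed();
                work();
                Span { start, end: origin.elapsed() }
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

/// True if at least two spans intersect, i.e. some threads really ran concurrently.
/// After sorting by start time, any overlap implies an overlap between neighbours.
fn any_overlap(spans: &[Span]) -> bool {
    let mut sorted = spans.to_vec();
    sorted.sort_by_key(|s| s.start);
    sorted.windows(2).any(|w| w[1].start < w[0].end)
}

fn main() {
    let spans = timed_threads(4, || {
        // CPU-bound placeholder; swap in the regex loop from the issue to test it.
        let mut x = 0u64;
        for i in 0..50_000_000u64 {
            x = x.wrapping_add(i);
        }
        std::hint::black_box(x);
    });
    for (i, s) in spans.iter().enumerate() {
        println!("thread {i}: {:?} .. {:?}", s.start, s.end);
    }
    println!("overlapping: {}", any_overlap(&spans));
}
```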
@V0ldek Rust itself just uses the system allocator, which is usually maintained by different folks. In this case, it's the allocator that (I'm guessing) comes with MSVC, so probably the Windows folks.
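For anyone who wants to see how much traffic the benchmark sends to that system allocator, a minimal sketch is to wrap `std::alloc::System` (the default) in a counting `#[global_allocator]`. `CountingAlloc` and `allocations` are hypothetical names; the wrapper only counts calls and still forwards every request to the same system allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards every request to `System` and counts allocations and reallocations.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of alloc/realloc calls made so far by the whole process.
fn allocations() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocations();
    let mut subject = String::from("#");
    for _ in 0..1_000 {
        // Same shape as the issue's loop: a fresh String every iteration.
        subject = subject.replace('#', "benchmark#");
    }
    let used = allocations() - before;
    println!("len {} after {} allocations", subject.len(), used);
}
```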
Oh, sure, I missed this consideration. It's worth mentioning that my results are from Linux, but via WSL. Here are the results run natively on Windows:
and the memchr bench:
I think that's confirmation enough: there are clearly some shenanigans with the allocator, specifically on Windows.
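If the allocator is the bottleneck, one mitigation (an assumption here, not something benchmarked in this thread) is to allocate less per iteration: write each result into a reused buffer and swap buffers instead of calling `.to_string()`. This std-only sketch handles the literal `#` pattern from the repro; with the regex crate, `Regex::find_iter` could drive the same loop. `replace_into` and `run` are hypothetical names.

```rust
/// Writes `haystack` with every `needle` replaced by `rep` into `out`,
/// reusing `out`'s existing capacity instead of allocating a new String.
fn replace_into(haystack: &str, needle: &str, rep: &str, out: &mut String) {
    out.clear();
    let mut last = 0;
    for (idx, m) in haystack.match_indices(needle) {
        out.push_str(&haystack[last..idx]);
        out.push_str(rep);
        last = idx + m.len();
    }
    out.push_str(&haystack[last..]);
}

/// The repro's loop, but with two buffers that are swapped each iteration.
fn run(iterations: usize) -> String {
    let mut subject = String::from("#");
    let mut scratch = String::new();
    for _ in 0..iterations {
        replace_into(&subject, "#", "benchmark#", &mut scratch);
        // The old subject's buffer becomes the next iteration's scratch space.
        std::mem::swap(&mut subject, &mut scratch);
    }
    subject
}

fn main() {
    let s = run(38_000);
    println!("len {}", s.len());
}
```

The buffers still grow along with the string, but because `String` grows its capacity geometrically, each buffer is reallocated only a logarithmic number of times instead of the loop allocating at least once per iteration.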