regex: Blocking behavior when run in loop on multiple threads

What version of regex are you using?

latest (1.8.4)

Describe the bug at a high level.

When repeatedly replacing a string in a loop on multiple threads (CPU count - 1), the threads seem to block (run in sequence) instead of running in parallel. It seems like something is blocking the threads (maybe shared memory?). Tested on two different multi-core machines, both running 64-bit Windows 10.

What are the steps to reproduce the behavior?

cargo run --release -- 1 (runtime 1s) vs cargo run --release -- 6 (runtime 6s)

use std::time::Instant;

use regex::Regex;

fn main() {
    // The first CLI argument is the number of threads to spawn.
    let n: usize = std::env::args()
        .nth(1)
        .expect("arg1 missing!")
        .parse()
        .expect("arg1 must be a thread count");
    let ts = Instant::now();
    let mut handles = Vec::with_capacity(n);
    for _ in 0..n {
        handles.push(std::thread::spawn(|| {
            let mut subject = "#".to_string();
            let search = Regex::new(&regex::escape("#")).unwrap();
            let replace = "benchmark#";

            let ts = Instant::now();
            for _ in 0..38000 {
                subject = search.replace_all(&subject, replace).to_string();
            }
            println!("thread {}", ts.elapsed().as_secs_f32());
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }
    println!("total {}", ts.elapsed().as_secs_f32());
}

What is the actual behavior?

The total runtime is the sequential runtime.

What is the expected behavior?

The total runtime should be at most the runtime of the slowest thread on a multi-core machine.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 30 (8 by maintainers)

Most upvoted comments

Did anyone try to rule out the OS thread scheduler? Apparently I missed a lot in the 2 hours since this was posted, but I threw together a slight alteration of the benchmark to pin the threads to individual CPUs and measure work over time. I suspect this is a scheduler issue based on my experience with low latency stuff in my professional life.

Demo repo if anyone is interested: https://github.com/code-ape/regex-bench-rs

@V0ldek Rust itself just uses the system allocator, which is usually maintained by different folks. In this case, I’m guessing the allocator in use is the one from MSVC, so probably the Windows folks.
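
For anyone who wants to see how much the loop leans on the system allocator, here is a minimal std-only sketch. It is not from the original report: `CountingAlloc`, `grow` and `allocations_during` are hypothetical names, and `str::replace` stands in for the regex call so that no external crate is needed. The wrapper forwards every request to `std::alloc::System` and only counts the calls.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts allocations.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Forward to the platform allocator (the one Rust uses by default).
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Same shape as the loop in the report: every pass builds a new, longer String.
/// Assumption: `str::replace` is used here instead of `Regex::replace_all`.
fn grow(iterations: usize) -> String {
    let mut subject = "#".to_string();
    for _ in 0..iterations {
        subject = subject.replace('#', "benchmark#");
    }
    subject
}

/// Runs `f` and returns its result plus the number of allocations made meanwhile.
/// The counter is process-wide, so other threads can add to the count.
fn allocations_during<T>(f: impl FnOnce() -> T) -> (T, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (out, after - before)
}

fn main() {
    let (subject, count) = allocations_during(|| grow(1_000));
    println!("final length {}, allocations {}", subject.len(), count);
}
```

Each pass adds 9 bytes to the string and allocates at least one fresh buffer for it, so the workload is dominated by allocating and copying ever larger strings. Changing what the `#[global_allocator]` wrapper forwards to is also the hook for trying a different allocator.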

I think the key here is that neither of us is reproducing the much more significant slowdown being observed by @c-antin on his Windows machine. I wonder if other Windows users can reproduce this.

Oh, sure, I missed this consideration. It might be worth saying that my results are from Linux but via WSL. Here are the results from running natively on Windows:

.\target\release\regex-bench.exe 1
total 0.14726941

.\target\release\regex-bench.exe 4
total 0.1904817

.\target\release\regex-bench.exe 8
total 0.2505401

.\target\release\regex-bench.exe 16
total 1.5162753

.\target\release\regex-bench.exe 32
total 3.470512

and the memchr bench:

.\target\release\regex-bench-memchr.exe 1
total 0.3882568

.\target\release\regex-bench-memchr.exe 4
total 0.98220855

.\target\release\regex-bench-memchr.exe 8
total 1.9249403

.\target\release\regex-bench-memchr.exe 16
total 7.0616555

.\target\release\regex-bench-memchr.exe 32
total 16.557264

I think that’s confirmation enough: clearly there are some shenanigans with the allocator, specifically on Windows.
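
A quick way to check this on another machine without regex or memchr is a std-only variant of the loop, sketched below. It is an illustration, not the benchmark used above: `grow` and `run` are hypothetical helpers, `str::replace` replaces the regex call, and the iteration count is kept low so it finishes quickly. If the total time climbs with the thread count here too, the slowdown is coming from allocation and copying rather than from the regex crate.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Allocation-heavy loop with the same shape as the one in the report.
/// Assumption: `str::replace` stands in for `Regex::replace_all`.
fn grow(iterations: usize) -> String {
    let mut subject = "#".to_string();
    for _ in 0..iterations {
        subject = subject.replace('#', "benchmark#");
    }
    subject
}

/// Spawns `threads` workers that each run `grow`, then returns the wall-clock
/// time for all of them plus the final string length seen by each worker.
fn run(threads: usize, iterations: usize) -> (Duration, Vec<usize>) {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(move || grow(iterations).len()))
        .collect();
    let lens: Vec<usize> = handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .collect();
    (start.elapsed(), lens)
}

fn main() {
    // With independent threads on a multi-core machine, the total should stay
    // close to the single-thread time as the thread count grows.
    for threads in [1, 2, 4] {
        let (elapsed, _) = run(threads, 5_000);
        println!("{} thread(s): total {:.3}s", threads, elapsed.as_secs_f32());
    }
}
```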