ripgrep much slower than (rip)grep

What version of ripgrep are you using?

ripgrep 11.0.1 -SIMD -AVX (compiled) +SIMD -AVX (runtime)

How did you install ripgrep?

pacman -S ripgrep

What operating system are you using ripgrep on?

Arch Linux

Describe your question, feature request, or bug.

ripgrep is 3–5 times slower than GNU grep on a simple string pattern.

If this is a bug, what is the actual behavior?

grep file\.php -r .  0.19s user 0.10s system 98% cpu 0.298 total
rg (--no-ignore) file\.php  3.68s user 0.45s system 365% cpu 1.129 total
rg --debug --hidden --no-ignore file\.php  6.12s user 0.42s system 383% cpu 1.706 total
DEBUG|grep_regex::literal|grep-regex/src/literal.rs:115: required literal found: "file"
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes
DEBUG|globset|globset/src/lib.rs:435: built glob set; 0 literals, 0 basenames, 11 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 0 regexes

I tried to find, one by one, which top-level file (or directory) causes the slowdown, and came to the conclusion: none of them does. Ripgrep should be faster… So I made the table below. (Sorry for hiding the names; I'm not sure I'm allowed to show them.)

$ echo GREPWHAT=$GREPWHAT RGFLAGS=$RGFLAGS GREPFLAGS=$GREPFLAGS; find -maxdepth 1 | xargs -I{} sh -c 'echo $(echo {} | tr a-z x):$(du -sh {} | cut -f1):$(find {} | wc -l):$( ( time rg "$GREPWHAT" $RGFLAGS {} &>/dev/null ) 2>&1 | tr \\t\\n \  ):$( ( time find -mindepth 1 -maxdepth 1 -not -path {} | xargs rg "$GREPWHAT" $RGFLAGS  &>/dev/null ) 2>&1 | tr \\t\\n \  ):$( ( time grep "$GREPWHAT" $GREPFLAGS -r {} &>/dev/null ) 2>&1 | tr \\t\\n \ ):$( ( time find -mindepth 1 -maxdepth 1 -not -path {} | xargs grep "$GREPWHAT" $GREPFLAGS -r &>/dev/null ) 2>&1 | tr \\t\\n \  )' | column -ts: -Rsize,files,rgtime,rgwothistime,greptime,grepwothistime -Nname,size,files,rgtime,rgwothistime,greptime,grepwothistime

GREPWHAT=xxxxxxxxxxxxxxxxxxx.xxx RGFLAGS=-j1 GREPFLAGS=
name                  size  files                                      rgtime                                rgwothistime                                    greptime                             grepwothistime
.                     127M  12961   real 0m0.964s user 0m3.295s sys 0m0.348s    real 0m0.971s user 0m3.285s sys 0m0.388s    real 0m0.146s user 0m0.039s sys 0m0.100s    real 0m0.148s user 0m0.069s sys 0m0.078s
./xx                  4.0K      1   real 0m0.009s user 0m0.006s sys 0m0.003s    real 0m0.919s user 0m3.097s sys 0m0.406s    real 0m0.003s user 0m0.000s sys 0m0.003s    real 0m0.146s user 0m0.055s sys 0m0.089s
./_xxx_xxxxxx.xxx     428K      1   real 0m0.011s user 0m0.008s sys 0m0.004s    real 0m0.936s user 0m3.123s sys 0m0.388s    real 0m0.004s user 0m0.004s sys 0m0.000s    real 0m0.144s user 0m0.043s sys 0m0.100s
./xxxxxx               89M  11741   real 0m0.217s user 0m0.576s sys 0m0.211s    real 0m0.221s user 0m0.688s sys 0m0.109s    real 0m0.106s user 0m0.036s sys 0m0.064s    real 0m0.051s user 0m0.023s sys 0m0.029s
./xxxxx                16K      7   real 0m0.010s user 0m0.009s sys 0m0.004s    real 0m0.942s user 0m3.101s sys 0m0.418s    real 0m0.003s user 0m0.003s sys 0m0.000s    real 0m0.147s user 0m0.049s sys 0m0.098s
./xxxxxxx              52K     28   real 0m0.014s user 0m0.015s sys 0m0.004s    real 0m0.946s user 0m3.148s sys 0m0.418s    real 0m0.003s user 0m0.003s sys 0m0.000s    real 0m0.152s user 0m0.048s sys 0m0.099s
./xxxxxx.xxx          4.0K      1   real 0m0.008s user 0m0.006s sys 0m0.003s    real 0m0.929s user 0m3.113s sys 0m0.400s    real 0m0.003s user 0m0.000s sys 0m0.003s    real 0m0.143s user 0m0.051s sys 0m0.091s
./xxxxxx               20K      5   real 0m0.012s user 0m0.015s sys 0m0.000s    real 0m0.926s user 0m3.114s sys 0m0.421s    real 0m0.003s user 0m0.000s sys 0m0.003s    real 0m0.144s user 0m0.038s sys 0m0.105s
./xxxxxxxxx           248K     60   real 0m0.013s user 0m0.015s sys 0m0.005s    real 0m0.939s user 0m3.139s sys 0m0.384s    real 0m0.004s user 0m0.000s sys 0m0.004s    real 0m0.151s user 0m0.056s sys 0m0.095s
./xxxxxx.xx           4.0K      1   real 0m0.006s user 0m0.000s sys 0m0.006s    real 0m0.937s user 0m3.176s sys 0m0.413s    real 0m0.004s user 0m0.004s sys 0m0.000s    real 0m0.145s user 0m0.055s sys 0m0.089s
./xxxxxx               35M    522   real 0m0.068s user 0m0.138s sys 0m0.077s    real 0m0.227s user 0m0.639s sys 0m0.211s    real 0m0.036s user 0m0.025s sys 0m0.011s    real 0m0.114s user 0m0.031s sys 0m0.082s
./xxxx.xxx            4.0K      1   real 0m0.006s user 0m0.006s sys 0m0.001s    real 0m0.933s user 0m3.123s sys 0m0.408s    real 0m0.003s user 0m0.000s sys 0m0.003s    real 0m0.150s user 0m0.045s sys 0m0.104s
./xxxxxxx.xxx         4.0K      1   real 0m0.007s user 0m0.001s sys 0m0.006s    real 0m0.941s user 0m3.139s sys 0m0.409s    real 0m0.003s user 0m0.003s sys 0m0.000s    real 0m0.145s user 0m0.072s sys 0m0.072s
./xxxxx.xxx           4.0K      1   real 0m0.008s user 0m0.004s sys 0m0.004s    real 0m0.957s user 0m3.127s sys 0m0.419s    real 0m0.004s user 0m0.004s sys 0m0.000s    real 0m0.154s user 0m0.048s sys 0m0.101s
./xxxxxxxx            276K     52   real 0m0.013s user 0m0.016s sys 0m0.000s    real 0m1.037s user 0m2.956s sys 0m0.385s    real 0m0.004s user 0m0.004s sys 0m0.000s    real 0m0.147s user 0m0.036s sys 0m0.109s
./xxxxxx              128K     34   real 0m0.012s user 0m0.010s sys 0m0.005s    real 0m0.934s user 0m3.142s sys 0m0.398s    real 0m0.003s user 0m0.000s sys 0m0.003s    real 0m0.146s user 0m0.062s sys 0m0.083s
./xxxxxxxx.xxxx       212K      1   real 0m0.009s user 0m0.005s sys 0m0.004s    real 0m0.924s user 0m3.123s sys 0m0.362s    real 0m0.004s user 0m0.000s sys 0m0.003s    real 0m0.147s user 0m0.063s sys 0m0.083s
./xxxxxxxx.xxxx       4.0K      1   real 0m0.008s user 0m0.003s sys 0m0.005s    real 0m0.928s user 0m3.067s sys 0m0.422s    real 0m0.004s user 0m0.004s sys 0m0.000s    real 0m0.151s user 0m0.057s sys 0m0.088s
./xxxxxx              1.2M    294   real 0m0.018s user 0m0.027s sys 0m0.008s    real 0m0.852s user 0m2.761s sys 0m0.476s    real 0m0.007s user 0m0.000s sys 0m0.007s    real 0m0.141s user 0m0.033s sys 0m0.108s
./xxxxxxxxx            12K      5   real 0m0.012s user 0m0.016s sys 0m0.000s    real 0m0.987s user 0m2.797s sys 0m0.365s    real 0m0.003s user 0m0.000s sys 0m0.003s    real 0m0.148s user 0m0.038s sys 0m0.109s
./xxxxxxx             4.0K      1   real 0m0.005s user 0m0.000s sys 0m0.005s    real 0m1.074s user 0m2.989s sys 0m0.386s    real 0m0.005s user 0m0.004s sys 0m0.000s    real 0m0.143s user 0m0.057s sys 0m0.085s
./xxx                 764K    198   real 0m0.014s user 0m0.011s sys 0m0.012s    real 0m0.970s user 0m2.792s sys 0m0.481s    real 0m0.005s user 0m0.005s sys 0m0.000s    real 0m0.201s user 0m0.065s sys 0m0.133s
./.xxxxxxxx.xxxx.xxx   44K      1   real 0m0.007s user 0m0.003s sys 0m0.003s    real 0m1.393s user 0m2.965s sys 0m0.569s    real 0m0.004s user 0m0.000s sys 0m0.003s    real 0m0.205s user 0m0.079s sys 0m0.119s
./.xxxxxxxxx          4.0K      1   real 0m0.007s user 0m0.001s sys 0m0.006s    real 0m1.140s user 0m2.975s sys 0m0.556s    real 0m0.004s user 0m0.000s sys 0m0.004s    real 0m0.156s user 0m0.051s sys 0m0.099s
./.xxxxxxxxxxxxx      4.0K      1   real 0m0.006s user 0m0.003s sys 0m0.003s    real 0m0.954s user 0m3.055s sys 0m0.484s    real 0m0.003s user 0m0.003s sys 0m0.000s    real 0m0.149s user 0m0.070s sys 0m0.078s
./.xxx.xxxxxxx        4.0K      1   real 0m0.007s user 0m0.004s sys 0m0.003s    real 0m0.956s user 0m3.130s sys 0m0.402s    real 0m0.003s user 0m0.003s sys 0m0.000s    real 0m0.149s user 0m0.068s sys 0m0.076s
$ # Note that the sum of the per-entry ripgrep times doesn't match the "." time.
$ # "wothis" means: grepping the directory with that entry excluded.
$ time tokei
-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
...
-------------------------------------------------------------------------------
 Total                9256      1221713       827992       262436       131285
-------------------------------------------------------------------------------

real    0m1.061s
user    0m3.384s
sys     0m0.295s
$ # Note that tokei has a similar running time.
$ stat -f . | head -3
  File: "."
    ID: 0        Namelen: 255     Type: tmpfs
Block size: 4096       Fundamental block size: 4096
$ # Fast-as-hell tmpfs. Does that matter?

Tomorrow I will gather line-length statistics if that helps, but I think the issue is somewhere around directory listing. Anyway, I would like to help, so tell me what to do or what to share. --debug didn't help too much.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 38 (36 by maintainers)

Most upvoted comments

Wonderful, thank you. I can finally repro it. This is a very interesting bug. To exaggerate it a bit, I copied your long-line file so that there was 120MB on a single line:

$ wc longline
        1         1 120365121 longline

To establish a baseline and get my bearings, I just tried poking at that single file to see how ripgrep behaves compared to grep:

$ time rg -a zzzzz longline

real    0.031
user    0.023
sys     0.007
maxmem  117 MB
faults  0

$ time rg -a zzzzz longline --no-mmap

real    0.154
user    0.067
sys     0.087
maxmem  159 MB
faults  0

$ time grep -a zzzzz longline

real    0.345
user    0.124
sys     0.221
maxmem  231 MB
faults  0

The first command is a bit deceptive, because ripgrep will open the file using memory maps, which avoids the whole buffering strategy completely. So ripgrep zips through it without needing to care about line terminators and such. My guess is that this probably added to some of your confusion when trying to get a smaller reproduction, but I forgot about it. Once we disable memory maps, we can see that ripgrep slows down a bit, but it’s still 2x faster than grep.
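
To make the distinction concrete, here's a minimal sketch of the two strategies (not ripgrep's actual search code; it assumes the memmap2 and memchr crates as dependencies):

// Sketch: mmap-based vs. buffered substring search. Not ripgrep's
// implementation; assumes the `memmap2` and `memchr` crates.
use std::fs::File;
use std::io::Read;

use memchr::memmem;
use memmap2::Mmap;

fn search_mmap(path: &str, needle: &[u8]) -> std::io::Result<bool> {
    let file = File::open(path)?;
    // The OS pages the file in lazily; we search one contiguous slice,
    // with no copying and no line-oriented buffering at all.
    let mmap = unsafe { Mmap::map(&file)? };
    Ok(memmem::find(&mmap, needle).is_some())
}

fn search_buffered(path: &str, needle: &[u8]) -> std::io::Result<bool> {
    // A buffered, line-oriented searcher must keep growing its buffer
    // until it sees a line terminator, which is exactly what makes a
    // 120MB single line expensive. (Simplified here to one big read.)
    let mut buf = Vec::new();
    File::open(path)?.read_to_end(&mut buf)?;
    Ok(memmem::find(&buf, needle).is_some())
}

fn main() -> std::io::Result<()> {
    println!("mmap:     {}", search_mmap("longline", b"zzzzz")?);
    println!("buffered: {}", search_buffered("longline", b"zzzzz")?);
    Ok(())
}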

Another curiosity here is that grep's maximum memory usage is roughly 2x ripgrep's. My guess is that this is because grep uses a larger buffer size, but I'm not 100% sure.

However, things start to change once we try to search multiple files at once. Notably, this does no directory traversal and uses no parallelism, so we can rule those out as the cause:

$ time rg -a -j1 zzzzz longline 2*

real    0.549
user    0.464
sys     0.083
maxmem  159 MB
faults  0

$ time rg -a -j1 zzzzz longline 2* --mmap

real    0.030
user    0.020
sys     0.010
maxmem  117 MB
faults  0

$ time grep -a zzzzz longline 2*

real    0.331
user    0.150
sys     0.180
maxmem  231 MB
faults  0

The first two commands confirm that using memory maps “fixes” this problem, so it’s really starting to look like very long lines are the issue. Despite the fact that grep was 2x slower searching just longline, it winds up being noticeably faster in this particular search. For completeness, we verify that 2* isn’t causing any issues on its own:

$ time rg -a -j1 zzzzz 2*

real    0.010
user    0.006
sys     0.003
maxmem  10 MB
faults  0

$ time grep -a zzzzz 2*

real    0.008
user    0.002
sys     0.005
maxmem  10 MB
faults  0

So that seems fine. Digging into this a bit more, running perf record with call-graph recording enabled (the -g flag) lets us track more precisely where the time is going. Running

$ perf record -g rg -a -j1 --no-mmap zzzzz longline

looks fairly uninteresting and about what I’d expect. Most time is being spent in resizing the buffer allocation to accommodate the large line, zeroing the memory and running memchr on it for the actual search.

However, changing the command to

$ perf record -g rg -a -j1 --no-mmap zzzzz longline 2*

reveals that 90% of the time is being spent in __memmove_avx_unaligned_erms, and moreover, all of that time is coming from within ripgrep’s LineBuffer::roll method. Now that’s strange, but I think it’s also consistent with everything you’ve observed so far. In other words, we have two pieces that, on their own, are fine, but when combined, something perverse is happening.

My instinct tells me that something is going wrong with reusing the allocation. In particular, if a file like longline grows the allocation for the line buffer to be very large, and that same buffer is reused for much smaller files, then perhaps whenever those files need to roll the buffer, they wind up paying a much larger cost for shifting things around.

Hmmm… yes… this is starting to make sense. In the roll implementation, we have this call:

self.buf.copy_within_str(self.pos.., 0);

which effectively moves everything at self.pos and after it back to the beginning of the buffer. If this line buffer had grown very large—due to a file with very long lines—then this is going to do a ton of copying.

But copying everything after self.pos is not necessary. All we need to do is copy anything that has been written after self.pos back to the beginning. This is exactly the range self.pos..self.end, where any bytes after self.end are just treated as free space that the buffer can use on the next read. So, let’s try changing the code to

self.buf.copy_within_str(self.pos..self.end, 0);

And re-running the command

$ time rg -a -j1 zzzzz longline 2*

real    0.159
user    0.066
sys     0.092
maxmem  159 MB
faults  0

Voilà! To pop back and make sure everything works across the entire directory:

$ time rg -auuuj1 zzzzz

real    0.158
user    0.053
sys     0.104
maxmem  159 MB
faults  0

$ time rg -auuu zzzzz

real    0.167
user    0.057
sys     0.153
maxmem  177 MB
faults  0

$ time grep -ar zzzzz

real    0.338
user    0.120
sys     0.217
maxmem  231 MB
faults  0

So I think that fixes it. 😃 I think this explains everything, even your directory traversal experiments and the otherwise inconsistent results you were seeing. Namely, if the file with a very long line is searched after most other files, then the performance impact will be small, since the oversized buffer will be reused for only a comparatively small number of files. So the only way to observe this is to be in a situation where the file with the long line is searched early on.
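
To make the fix concrete, here is a minimal sketch of the roll logic, written against the standard copy_within slice method; it is an illustration only, not ripgrep's actual LineBuffer:

// Illustration of the roll bug and fix; not ripgrep's actual LineBuffer.
struct LineBuffer {
    buf: Vec<u8>, // allocation may have grown huge from an earlier long-line file
    pos: usize,   // start of the unconsumed bytes
    end: usize,   // end of the valid bytes; buf[end..] is free space for the next read
}

impl LineBuffer {
    fn roll(&mut self) {
        // Buggy version: copies from `pos` to the end of the *allocation*,
        // so a buffer grown to 120MB pays that cost on every roll, even
        // when the files now being searched are tiny:
        //
        //     self.buf.copy_within(self.pos.., 0);
        //
        // Fixed version: copy only the valid bytes in `pos..end`.
        self.buf.copy_within(self.pos..self.end, 0);
        self.end -= self.pos;
        self.pos = 0;
    }
}

fn main() {
    // A 16-byte allocation holding 4 valid, unconsumed bytes at offset 4.
    let mut lb = LineBuffer { buf: vec![0u8; 16], pos: 4, end: 8 };
    lb.roll();
    assert_eq!((lb.pos, lb.end), (0, 4));
}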

This was an excellent bug. Thank you so much for sticking with it and providing a reproduction that I could dig into. This is now fixed on master!

But generally speaking, yes: if I grep for regex-less queries, or queries with a moderate amount of regex, I get similar times

For grins, can you try searching for qqqqqqqqqq? (I am trying to rule out pathological behavior in the literal searcher. ripgrep uses a different one from grep. Searching for a pattern with the same rare letter repeated should hopefully put these on a level playing field. Again, just a shot in the dark.)
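
For what it's worth, here's a tiny sketch of the idea, using the memchr crate that ripgrep builds on (an illustration of rare-byte literal scanning, not ripgrep's actual internals):

// A literal searcher like memchr's memmem scans for rare candidate bytes
// before verifying full matches. A needle of one repeated rare letter
// gives grep and ripgrep the same simple workload.
use memchr::memmem;

fn main() {
    let haystack = b"aaaaaaaaaa qqqqqqqqqq aaaaaaaaaa";
    let finder = memmem::Finder::new(b"qqqqqqqqqq");
    assert_eq!(finder.find(haystack), Some(11));
    println!("found at index {:?}", finder.find(haystack));
}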

You probably know this already, but I didn't think of it before: the openat and stat errors come from nonexistent .ignore, .git, etc. files.

Right yeah. Those shouldn’t be happening though when running with -uuu. If they are, then that’s a bug.

Thanks for really digging into this! I’ll be quite curious to see what you find!

Have you tried different queries?

Yes, of course, but the results have to be taken with care, because with "complex" regex queries ripgrep can easily beat grep even while slowed down by this issue. But generally speaking, yes: if I grep for regex-less queries, or queries with a moderate amount of regex, I get similar times.


In the meantime, I created some simple projects to test the ignore and walkdir crates. You probably know this already, but I didn't think of it before: the openat and stat errors come from nonexistent .ignore, .git, etc. files.

UPDATE: And the futex errors come from the ignore crate's parallel walking.

UPDATE 2: Walking the directory takes 0.150s; the parallel walk varies widely, between 0.125s and 0.170s. I will try making some changes to ignore's parallel walk code to see whether anything changes. A sketch of the kind of test project I mean follows below.
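
For reference, a minimal sketch of such a test project, timing the ignore crate's sequential walk against its parallel walk (an assumption about the shape of those projects, not their actual code):

// Time ignore's sequential walk vs. its parallel walk over the current
// directory. Sketch only; counts entries instead of searching them.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;

use ignore::{WalkBuilder, WalkState};

fn main() {
    let start = Instant::now();
    let sequential = WalkBuilder::new(".").build().count();
    println!("sequential: {} entries in {:?}", sequential, start.elapsed());

    let count = AtomicUsize::new(0);
    let start = Instant::now();
    WalkBuilder::new(".").build_parallel().run(|| {
        Box::new(|entry| {
            if entry.is_ok() {
                count.fetch_add(1, Ordering::Relaxed);
            }
            WalkState::Continue
        })
    });
    println!(
        "parallel:   {} entries in {:?}",
        count.load(Ordering::Relaxed),
        start.elapsed()
    );
}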

Nope, that’s very wrong

Oops, sorry. What I had in mind was that they don't have to be printed, so they don't have to be stored. Silly me.