ripgrep: --vimgrep doesn't satisfy vim's spec for multi-line searches

NOTE from BurntSushi: This issue has been re-classified as a bug that was discovered in the course of discussion. There’s too much context here to split it apart, which GitHub doesn’t really make easy to do anyway. So I’ve re-purposed this issue for the bug described in the title. The main details of the bug are described here: https://github.com/BurntSushi/ripgrep/issues/1866#issuecomment-843020533

Below is the original issue.


Describe your feature request

Vim recently added a new feature called “text properties” that can be used to highlight text. All it needs to know to highlight is:

  • lnum : line number where match begins
  • col : column (counted in bytes) where match begins
  • end_lnum : line number where match ends
  • end_col : column (counted in bytes) where match ends

Ideally there would be an option like --json-pos that just prints the positions of matches for each file, perhaps in JSON format like the following. If that is not possible or too cumbersome, is there a way I can already do this efficiently using ripgrep?

[
  {
    "file": "fileA.txt",
    "positions": [
      {
        "lnum": 10,
        "col": 4,
        "end_lnum": 12,
        "end_col": 7
      },
      {
        "lnum": 15,
        "col": 3,
        "end_lnum": 15,
        "end_col": 5
      },
      {
        "lnum": 17,
        "col": 12,
        "end_lnum": 20,
        "end_col": 10
      }
    ]
  },
  {
    "file": "fileB.txt",
    "positions": [
      {
        "lnum": 1,
        "col": 4,
        "end_lnum": 1,
        "end_col": 7
      },
      {
        "lnum": 30,
        "col": 3,
        "end_lnum": 31,
        "end_col": 5
      }
    ]
  }
]
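
To make the four fields concrete, here is a minimal sketch of how they could be derived from a match's byte span. It uses Python's own re module rather than ripgrep, and match_positions is a hypothetical helper name. It assumes 1-based line numbers, 1-based byte columns, and an end_col that points just after the last byte of the match.

```python
import re


def match_positions(data: bytes, pattern: bytes) -> list:
    """Return one position dict per match of `pattern` in `data`.

    Assumptions (not taken from ripgrep): lines and columns are 1-based,
    columns are counted in bytes, and end_col is exclusive, meaning it
    points just after the last byte of the match.
    """
    positions = []
    for m in re.finditer(pattern, data):
        start, end = m.span()
        # rfind returns -1 when there is no earlier newline, so adding 1
        # gives byte offset 0 as the start of the first line.
        start_of_line = data.rfind(b"\n", 0, start) + 1
        start_of_end_line = data.rfind(b"\n", 0, end) + 1
        positions.append({
            "lnum": data.count(b"\n", 0, start) + 1,
            "col": start - start_of_line + 1,
            "end_lnum": data.count(b"\n", 0, end) + 1,
            "end_col": end - start_of_end_line + 1,
        })
    return positions


sample = b"alpha\nbeta gamma\ndelta"
for pos in match_positions(sample, rb"gamma\ndelta|alpha"):
    print(pos)
```

A match that ends with a newline would be reported as ending at column 1 of the following line under these assumptions, so a real consumer may want to treat that case specially.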

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 28 (14 by maintainers)

Most upvoted comments

You make it sound like we should know what you don’t know. There was nothing in your prior comments to come to that conclusion. Also, given you added the --vimgrep option, any user would assume you have read what the spec is.

Yes, you’re right, I’m sorry. I spoke out of frustration.

Thank you for the follow up details. If you don’t mind, I’d like to classify this issue as a bug and re-purpose it to “--vimgrep doesn’t satisfy vim’s spec for multi-line searches.”

If and when you get to a point where it’s clear that the --json output needs more information, please open a new issue.

@BurntSushi Regarding your “I don’t really know why you would expect the complete matches to not be printed.”: you are not printing complete matches. You are giving \n special treatment and printing lines that don’t really “match” in the rest of the industry’s terminology. In the :vimgrep format, only the starting line of each match is printed, which also helps in counting the number of matches if someone uses the same output in other scripts/tools. If someone were to feed ripgrep’s --vimgrep output to three different consumers, 1) Vim, 2) another program that expects the :vimgrep format, and 3) a script that counts matches by line count, the results would be inconsistent in all three cases, because the output doesn’t really comply with the :vimgrep format. It’s never a good idea to go by opinion alone (“…I don’t really know why you…”), because every format is a contract, and ripgrep’s --vimgrep option breaks that contract while using the format’s name and describing itself as such.

Regarding :vimgrep’s format: the col there refers to the byte offset (byte index) from the start of the line. Just FYI, so that you don’t print a character count as the “column number”.
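
A small illustration of the difference, assuming the file is UTF-8 encoded (byte_col is a hypothetical helper used only for this example): as soon as a line contains a multi-byte character before the match, the byte column and the character column disagree.

```python
def byte_col(line: str, char_index: int) -> int:
    """Return the 1-based byte column of the character at `char_index`.

    Assumes the line is stored as UTF-8 on disk.
    """
    return len(line[:char_index].encode("utf-8")) + 1


line = "na\u00efve quux"  # "\u00ef" (i with diaeresis) is 2 bytes in UTF-8
char_index = line.index("quux")

print("character column:", char_index + 1)
print("byte column:", byte_col(line, char_index))
```

For pure ASCII lines the two numbers are identical, which is why this kind of mismatch is easy to miss in testing.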

--vimgrep prints a line for each match.

But there are only two matches in the multi-line example. Hence, there should be only two lines in the --vimgrep output (and not four). I would have expected it to print only the line and column numbers where each match starts. Currently, every line that is part of a match is printed.

In my opinion, --vimgrep and multi-line mode can interact together. With Vim’s internal :vimgrep command, I get two matches (just like with ripgrep’s --json option):

Example file:

foobar
foobar
foo quux

Open the file in Vim and run:

:vimgrep /foobar\nfoobar\nfoo\|quux/ %

Vim will find the following two matches (open Vim’s quickfix window with :copen):

test|1 col 1| foobar
test|3 col 5| foo quux

This is the same result that I also get with --json, after transforming the byte offsets to line+column numbers.

I tried to compare the behavior with GNU grep and git-grep using their PCRE regex engines, but neither seems to work correctly.

printf 'foobar\nfoobar\nfoo quux' | grep -znP 'foobar\nfoobar\nfoo|quux'

Output:

1:foobar
foobar
foo quux

Unfortunately, grep doesn’t provide a --column option, and multi-line regex patterns work only with the -z option, which treats the entire input as one line, making it completely useless for Vim.

(I will dig into your links when I get some time, and I’ll plan to have this fixed for the next release if it’s feasible to do so. I expect it will be.)

Thank you for the quick fix. But shouldn’t there be only two matches, just like with the --json output? The regex 'foobar\nfoobar\nfoo|quux' matches only at the first foobar on the first line and at the quux on the third line. Hence:

1:1:foobar
3:5:quux

or, to be consistent with the text entry of the json output:

1:1:foobar\nfoobar\nfoo
3:5:quux

For comparison:

printf 'foobar\nfoobar\nfoo quux' | rg -U 'foobar\nfoobar\nfoo|quux' --json | jq 'select(.type == "match")'

Output:

{
  "type": "match",
  "data": {
    "path": {
      "text": "<stdin>"
    },
    "lines": {
      "text": "foobar\nfoobar\nfoo quux"
    },
    "line_number": 1,
    "absolute_offset": 0,
    "submatches": [
      {
        "match": {
          "text": "foobar\nfoobar\nfoo"
        },
        "start": 0,
        "end": 17
      },
      {
        "match": {
          "text": "quux"
        },
        "start": 18,
        "end": 22
      }
    ]
  }
}
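
For reference, this is roughly how I turn such a match message into one line per match. It is only a sketch: vimgrep_lines is my own hypothetical helper, not part of ripgrep. It assumes the field layout shown in the output above, that path and lines carry a "text" field (valid UTF-8), and that submatch start offsets are byte offsets relative to the start of lines.text.

```python
SAMPLE = {
    "type": "match",
    "data": {
        "path": {"text": "<stdin>"},
        "lines": {"text": "foobar\nfoobar\nfoo quux"},
        "line_number": 1,
        "absolute_offset": 0,
        "submatches": [
            {"match": {"text": "foobar\nfoobar\nfoo"}, "start": 0, "end": 17},
            {"match": {"text": "quux"}, "start": 18, "end": 22},
        ],
    },
}


def vimgrep_lines(message: dict) -> list:
    """Return one 'path:lnum:col:text' string per submatch.

    Only the line on which each match starts is reported. lnum and col are
    1-based, and col is counted in bytes.
    """
    data = message["data"]
    path = data["path"]["text"]
    block = data["lines"]["text"].encode("utf-8")
    first_lnum = data["line_number"]

    out = []
    for sub in data["submatches"]:
        start = sub["start"]
        # Line number: newlines before the match start, offset by the
        # line number of the first line in the block.
        lnum = first_lnum + block.count(b"\n", 0, start)
        line_start = block.rfind(b"\n", 0, start) + 1
        line_end = block.find(b"\n", start)
        if line_end == -1:
            line_end = len(block)
        col = start - line_start + 1
        text = block[line_start:line_end].decode("utf-8", errors="replace")
        out.append(f"{path}:{lnum}:{col}:{text}")
    return out


for line in vimgrep_lines(SAMPLE):
    print(line)
```

Each entry carries the text of the line on which the match starts, similar to what Vim's quickfix window shows, rather than the full multi-line match text.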

@BurntSushi Your example from above doesn’t print the correct numbers with the --vimgrep option:

printf 'foobar\nfoobar\nfoo quux' | rg -U 'foobar\nfoobar\nfoo|quux' --vimgrep

Output:

<stdin>:1:1:foobar
<stdin>:2:1:foobar
<stdin>:3:1:foo quux
<stdin>:3:19:foo quux

I would have expected something like:

<stdin>:1:1:foobar\nfoobar\nfoo
<stdin>:3:5:quux

Or am I misunderstanding something here?