vrl: null and unicode escapes not working in string literals

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I recently found out that the null and Unicode character escape sequences don’t work in VRL string literal expressions. The documentation describes them clearly, so I think something is off in the VRL parser.

Examples from the VRL REPL:

$ "hello\0world"
error[E202]: syntax error
  ┌─ :1:1
  │
1 │ "hello\0world"
  │ ^^^^^^^^^^^^^^ unexpected error: invalid escape character: \0
  │
  = see language documentation at https://vrl.dev
$ "hello\u{1F30E}world"  # also tested with \U{1F30E} and \u1F30E just in case
error[E202]: syntax error
  ┌─ :1:1
  │
1 │ "hello\u{1F30E}world"
  │ ^^^^^^^^^^^^^^^^^^^^^ unexpected error: invalid escape character: \u
  │
  = see language documentation at https://vrl.dev

All the other documented escape sequences seem to work fine (although not all of them are printed correctly):

$ "hello\nworld"
"hello\nworld"

$ "hello\"world"
"hello\"world"

$ "hello\'world"
"hello'world"

$ "hello\\world"
"hello\\world"

$ "hello\rworld"  # \r (carriage return) was emitted raw instead of printed as an escape, so the cursor returned to the start of the line and world" overwrote "hello
world"

$ "hello\tworld"  # \t (tab) was emitted raw instead of printed as an escape
"hello  world"

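The inconsistent output above (\n, \" and \\ shown as escapes, while \r and \t go to the terminal raw) could be avoided if the REPL rendered string results with a formatter that escapes every control character. A minimal sketch, assuming the printer has the value as a Rust `&str`: Rust’s `Debug` formatting for `str` quotes the value and escapes \t, \r, \n, \" and \\ while leaving ' unescaped, which matches what the REPL already prints for the working cases. `render_for_repl` is a hypothetical name, not part of VRL.

```rust
/// Hypothetical REPL printer (not VRL's actual code): render a string result
/// with Rust's `Debug` formatting, which wraps it in double quotes and escapes
/// control characters such as \t and \r (as well as \n, \" and \\), so nothing
/// is sent to the terminal raw.
fn render_for_repl(value: &str) -> String {
    format!("{value:?}")
}

fn main() {
    // The same inputs as the REPL session above.
    let samples = [
        "hello\nworld",
        "hello\"world",
        "hello'world",
        "hello\\world",
        "hello\rworld",
        "hello\tworld",
    ];
    for s in samples {
        println!("{}", render_for_repl(s));
    }
}
```

With this approach the carriage-return case would print as an escaped string instead of overwriting the start of the line.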
Configuration

N/A

Version

vector 0.20.0 (x86_64-unknown-linux-gnu 2a706a3 2022-02-11)

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 19 (12 by maintainers)

Most upvoted comments

This is correct. I didn’t port them over, for no particular reason other than wanting to start out simple and add more escape sequences as the community requested them.

I think it makes sense to handle escape characters similar to how the Rust compiler handles them.
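A minimal sketch of what Rust-like escape handling could look like, assuming the lexer passes the literal’s contents without the surrounding quotes. `unescape` is a hypothetical function, not VRL’s actual lexer. It keeps the escapes VRL already accepts (\n, \r, \t, \\, \', \") and adds \0, \xNN and \u{...} following a subset of Rust’s rules: \x is limited to 0x00 through 0x7F so the result stays valid UTF-8, and \u{...} takes one to six hex digits that must name a valid Unicode scalar value (so surrogates are rejected).

```rust
/// Hypothetical sketch (not VRL's lexer): decode the body of a string literal,
/// without its surrounding quotes, using a subset of Rust's escape rules.
fn unescape(src: &str) -> Result<String, String> {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Escapes VRL already accepts.
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('\\') => out.push('\\'),
            Some('\'') => out.push('\''),
            Some('"') => out.push('"'),
            // New: the null character.
            Some('0') => out.push('\0'),
            // New: \xNN with exactly two hex digits, ASCII range only (as in Rust).
            Some('x') => {
                let hex: String = chars.by_ref().take(2).collect();
                if hex.len() != 2 || !hex.chars().all(|h| h.is_ascii_hexdigit()) {
                    return Err(format!("invalid \\x escape: \\x{hex}"));
                }
                let value = u8::from_str_radix(&hex, 16).expect("two hex digits fit in a u8");
                if value > 0x7F {
                    return Err(format!("\\x{hex} is out of range (maximum is \\x7F)"));
                }
                out.push(char::from(value));
            }
            // New: \u{N...} with 1 to 6 hex digits naming a Unicode scalar value.
            Some('u') => {
                if chars.next() != Some('{') {
                    return Err("expected '{' after \\u".to_string());
                }
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(h) if h.is_ascii_hexdigit() && hex.len() < 6 => hex.push(h),
                        _ => return Err("invalid \\u{...} escape".to_string()),
                    }
                }
                let code = u32::from_str_radix(&hex, 16)
                    .map_err(|_| "empty \\u{} escape".to_string())?;
                let ch = char::from_u32(code)
                    .ok_or_else(|| format!("\\u{{{hex}}} is not a Unicode scalar value"))?;
                out.push(ch);
            }
            Some(other) => return Err(format!("invalid escape character: \\{other}")),
            None => return Err("unterminated escape at end of string".to_string()),
        }
    }
    Ok(out)
}

fn main() {
    for literal in [r"hello\0world", r"hello\u{1F30E}world", r"hello\qworld"] {
        println!("{} -> {:?}", literal, unescape(literal));
    }
}
```

Under these rules, both inputs that fail in the report decode successfully, while an unknown escape such as \q still fails with the same kind of message the REPL shows today.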

@StephenWakely I compiled Vector with every VRL reference pointed at my GitHub repo. I just pushed the fork of Vector where I made the change:

arthmoeros/vector@4f8eb9e

So maybe I’m doing something wrong when compiling Vector? (I don’t really know what, though, hehe)

Ah, you want to specify branch rather than rev.

vrl = { package = "vrl", git = "https://github.com/arthmoeros/vrl", branch = "null_escape_lexer" }

An easier way to test directly in the VRL repo is to just run the cli project:

> cd lib/cli
> cargo run

I’m glad the omission was just for simplicity of the initial implementation rather than a limitation 😃 And yes, parity with Rust’s escape character mechanism would be very nice and complete indeed.

In fairness, I don’t think the missing escape characters I reported are a big use-case; in the long time since the new lexer was introduced, I haven’t seen anyone other than me ask for them.

The use-case that led me to discover the missing escape characters: I was testing my implementations of ip_ntop() and ip_pton() in the VRL REPL and couldn’t generate strings with arbitrary bytes. In other words, the escape I actually needed was the binary \xNN escape. The documentation then pointed me to \uNNNNNN (not exactly the same, but close enough), which didn’t work either.
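The difference between the two escapes matters for byte-oriented functions such as ip_ntop(): a Unicode escape yields a code point encoded as UTF-8, so only code points up to U+007F map to a single byte with the same value. The sketch below illustrates this with Rust’s own literals, Rust being the model suggested above for VRL’s escapes. `utf8_bytes_of` is a hypothetical helper, and the sketch describes Rust semantics, not VRL behavior.

```rust
/// Hypothetical helper: the UTF-8 bytes a `\u{..}` escape would produce for a
/// code point, or `None` if the code point is not a valid Unicode scalar value.
fn utf8_bytes_of(code_point: u32) -> Option<Vec<u8>> {
    let ch = char::from_u32(code_point)?;
    let mut buf = [0u8; 4];
    Some(ch.encode_utf8(&mut buf).as_bytes().to_vec())
}

fn main() {
    // U+0041 ('A') is ASCII: one byte, identical to what a \x41 byte escape gives.
    assert_eq!(utf8_bytes_of(0x41), Some(vec![0x41]));

    // U+00FF needs two bytes in UTF-8, so "\u{FF}" is not the raw byte 0xFF.
    assert_eq!(utf8_bytes_of(0xFF), Some(vec![0xC3, 0xBF]));

    // A Rust byte-string literal can hold the single raw byte 0xFF,
    // which is not valid UTF-8 on its own.
    let raw: &[u8] = b"\xFF";
    assert_eq!(raw, &[0xFF]);
    assert!(std::str::from_utf8(raw).is_err());

    // Surrogate code points are not valid `char`s, so no \u escape can produce them.
    assert_eq!(utf8_bytes_of(0xD800), None);

    println!("all byte/code point checks passed");
}
```

This is why \u only works as a stand-in for \xNN when the test data stays in the ASCII range.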