vector: Regex transform failing occasionally

Occasionally (once every million or two log lines), Vector reports an error like:

Mar 17 07:47:23 m80 vector[22684]: Mar 17 07:47:23.645  WARN transform{name=tinydns-anycast-regex type=regex}: vector::transforms::regex_parser: 1 "Regex pattern failed to match." events were rate limited. rate_limit_secs=5
Mar 17 07:47:23 m80 vector[22684]: Mar 17 07:47:23.645  WARN transform{name=tinydns-anycast-regex type=regex}: vector::transforms::regex_parser: Regex pattern failed to match. field="0000000000000ffff0d39b447:1cce:fb3b + 000c 92.143.112.1[...]" rate_limit_secs=30

The log line that it gets stuck on is perfectly formatted in the log file. Due to the specifics of our setup, each user query results in two log entries, with the timestamp being the only difference between them. One of the log lines makes it out to the final sink, and the other generates the error above (and is lost).

The twin log lines look like this:

@400000005e7080952657f0dc 00000000000000000000ffff0d39b447:1cce:fb3b + 000c 92.143.112.146.in-addr.arpa
@400000005e708095265bb5b4 00000000000000000000ffff0d39b447:1cce:fb3b + 000c 92.143.112.146.in-addr.arpa

The vector.toml source/transform configuration that is processing them looks like this:

data_dir = "/var/lib/vector"

[sources.tinydns-anycast-file]
  type = "file"
  include = ["/var/log/tinydns-anycast/current"]

# Parse out the data to distinct fields
[transforms.tinydns-anycast-regex]
  type = "regex_parser"
  inputs = ["tinydns-anycast-file"]
  regex = '\s*(?P<tai64n>\S+) (?P<ip>[^:]+):(?P<port>[^:]+):(?P<id>[^: ]+) ((?P<res>\S) )?(?P<type>\S+) (?P<qry>\S+)\s*'

<snip>

FWIW, the errors/missed lines occur hours away from the rotation of the input file (/var/log/tinydns-anycast/current in this case).
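
One way to rule out the pattern itself is to run it against the sample lines outside Vector. Below is a minimal sketch in Python, assuming Python's re module treats this particular pattern the same way Vector's regex engine does (the constructs used here are common to both, but that is an assumption). Note that the field value in the warning starts mid-line: the "@..." timestamp and some leading zeros are missing. PARTIAL_LINE is a reconstruction of that value, assuming the part hidden behind "[...]" matches the rest of the full line.

```python
import re

# Pattern copied from the regex_parser transform above.
PATTERN = re.compile(
    r'\s*(?P<tai64n>\S+) (?P<ip>[^:]+):(?P<port>[^:]+):(?P<id>[^: ]+) '
    r'((?P<res>\S) )?(?P<type>\S+) (?P<qry>\S+)\s*'
)

# One of the twin lines, exactly as it appears in the log file.
FULL_LINE = (
    "@400000005e708095265bb5b4 "
    "00000000000000000000ffff0d39b447:1cce:fb3b + 000c "
    "92.143.112.146.in-addr.arpa"
)

# Assumption: the rejected event is the same line with its leading bytes
# missing, reconstructed from the field value shown in the warning.
PARTIAL_LINE = (
    "0000000000000ffff0d39b447:1cce:fb3b + 000c "
    "92.143.112.146.in-addr.arpa"
)


def parse(line):
    """Return the named captures as a dict, or None if the pattern fails."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print("full line:   ", parse(FULL_LINE))
    print("partial line:", parse(PARTIAL_LINE))
```

In this sketch the full line parses and the partial one does not, because every colon in the partial line sits before its first space. That would be consistent with the transform occasionally receiving an incomplete line rather than with a faulty pattern, but it does not prove where the leading bytes were lost.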

Any ideas why the transform might be failing and how to avoid those failures?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

@jszwedko aware (and that’s what I’ve been doing over the past week). Although I didn’t think of ever doing log(.); neat.

regex_parser has one feature though that’s very handy: sequential array execution. When ported to remap, we ended up with ~30 ??-separated parse_regex calls for the same effect, which reads a bit less cleanly.
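
For readers unfamiliar with the idea, "sequential array execution" here means trying a list of patterns in order and keeping the first one that matches. A chain of ??-separated parse_regex calls gets the same effect by falling through to the next call whenever one fails. The sketch below shows the concept in Python; it is not Vector's implementation, and both patterns and the helper name parse_first_match are hypothetical, made up for illustration.

```python
import re

# Hypothetical patterns, ordered from most to least specific.
PATTERNS = [
    # A tinydns-style query line with all fields present.
    re.compile(
        r'^(?P<tai64n>@\S+) (?P<ip>[0-9a-f]+):(?P<port>[0-9a-f]+):'
        r'(?P<id>[0-9a-f]+) (?P<res>\S) (?P<type>\S+) (?P<qry>\S+)$'
    ),
    # Fallback: a timestamp followed by a free-form message.
    re.compile(r'^(?P<tai64n>@\S+) (?P<msg>.+)$'),
]


def parse_first_match(line, patterns=PATTERNS):
    """Try each pattern in order; return the first match's named captures."""
    for pattern in patterns:
        match = pattern.search(line)
        if match:
            return match.groupdict()
    return None  # no pattern matched


if __name__ == "__main__":
    print(parse_first_match(
        "@400000005e7080952657f0dc "
        "00000000000000000000ffff0d39b447:1cce:fb3b + 000c "
        "92.143.112.146.in-addr.arpa"
    ))
    print(parse_first_match("@400000005e7080952657f0dc starting"))
```

Keeping the patterns in a list means adding a 31st case is a one-line change, which is roughly the readability point being made above.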