go-carbon: [BUG] find failed, can't expand globs

Describe the bug

We attempted to upgrade from go-carbon 0.14.0 to version 0.15.0. Initially the upgrade seemed to go fine, but shortly afterwards we began seeing no data in Grafana and traced the issue back to these errors in the go-carbon logs.

Logs

The hostname and metric strings in the following log have been sanitized. The metric string contained dot-delimited alphanumeric characters but no wildcard characters.

Oct 20 15:49:35 hostname go-carbon[130923]: {"level":"ERROR","timestamp":"2020-10-20T15:49:35.633Z","logger":"access","message":"fetch failed","handler":"render","url":"/render/?format=carbonapi_v3_pb","peer":"127.0.0.1:33600","carbonapi_uuid":"54ff70c2-a91b-47d4-82e0-e95e4a1685c0","carbonzipper_uuid":"54ff70c2-a91b-47d4-82e0-e95e4a1685c0","format":"carbonapi_v3_pb","targets":["foo.bar.baz"],"runtime_seconds":0.000319742,"reason":"failed to read data","http_code":400,"error":"find failed, can't expand globs"}
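For anyone triaging similar output, here is a minimal sketch of pulling the failing targets and error strings out of lines like the one above. It assumes the JSON payload follows a syslog-style prefix, and that pasted logs may have had their quotes converted to typographic ones. The function names `parse_access_line` and `failed_fetches` are hypothetical helpers, not part of go-carbon.

```python
import json

def parse_access_line(line: str) -> dict:
    """Return the JSON payload of a syslog-prefixed go-carbon log line."""
    # Undo "smart quote" substitution that some editors apply to pasted logs.
    for curly, plain in (("\u201c", '"'), ("\u201d", '"'), ("\u2019", "'")):
        line = line.replace(curly, plain)
    start = line.index("{")  # the JSON object starts after the syslog prefix
    return json.loads(line[start:])

def failed_fetches(lines):
    """Yield (targets, http_code, error) for ERROR entries of the access logger."""
    for line in lines:
        try:
            entry = parse_access_line(line)
        except ValueError:
            continue  # no JSON payload on this line, skip it
        if entry.get("level") == "ERROR" and entry.get("logger") == "access":
            yield entry.get("targets"), entry.get("http_code"), entry.get("error")

# Shortened version of the log line from this report.
sample = (
    'Oct 20 15:49:35 hostname go-carbon[130923]: '
    '{"level":"ERROR","logger":"access","targets":["foo.bar.baz"],'
    '"http_code":400,"error":"find failed, can\'t expand globs"}'
)

for targets, code, error in failed_fetches([sample]):
    print(targets, code, error)
```

Grouping the yielded tuples by error string is a quick way to see whether every failing target hits the same condition.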

Go-carbon Configuration:

[common]
user = "carbon"
graph-prefix = "carbon.agents.{host}"
metric-endpoint = "local"
max-cpu = 16
metric-interval = "1m0s"

[whisper]
data-dir = "/data/graphite/whisper"
schemas-file = "/etc/go-carbon/storage-schemas.conf"
#aggregation-file = "/etc/go-carbon/storage-aggregation.conf"
workers = 32
max-updates-per-second = 70000
max-creates-per-second = 2000
hard-max-creates-per-second = false
sparse-create = false
flock = false
hash-filenames = false
enabled = true

[cache]
max-size = 80000000
write-strategy = "max"

[udp]
listen = "127.0.0.1:2003"
log-incomplete = true
buffer-size = 0
enabled = true

[tcp]
listen = "127.0.0.1:2003"
buffer-size = 0
#compression = ""
enabled = true

[pickle]
listen = "127.0.0.1:2004"
max-message-size = 67108864
buffer-size = 0
enabled = true

[carbonlink]
listen = "127.0.0.1:7002"
read-timeout = "30s"
enabled = true

[grpc]
listen = "127.0.0.1:7003"
enabled = true

[tags]
tagdb-url = "http://127.0.0.1:8000"
tagdb-chunk-size = 32
tagdb-update-interval = 100
local-dir = "/var/lib/graphite/tagging/"
tagdb-timeout = "1s"
enabled = false

[carbonserver]
listen = "127.0.0.1:8080"
query-cache-enabled = true
query-cache-size-mb = 0
find-cache-enabled = true
buckets = 10
max-globs = 100
fail-on-max-globs = false
metrics-as-counters = false
trigram-index = false
trie-index = true
internal-stats-dir = ""
read-timeout = "1m0s"
idle-timeout = "1m0s"
write-timeout = "1m0s"
scan-frequency = "5m0s"
enabled = true

[dump]
path = "/var/lib/graphite/dump/"
restore-per-second = 0
enabled = false

[pprof]
listen = "127.0.0.1:7007"
enabled = true

[prometheus]
endpoint = "/metrics"
enabled = true

[prometheus.labels]

[[logging]]
logger = ""
file = "stderr"
level = "warn"
encoding = "json"
encoding-time = "iso8601"
encoding-duration = "seconds"

Metric retention and aggregation schemas

storage-schemas.conf:

[01.carbon]
pattern = ^carbon\.
retentions = 60s:15d,5m:180d

[02.foo]
pattern = ^foo\.bar\.
retentions = 60s:15d,5m:180d

[03.metric_testing]
pattern = ^metric_testing
retentions = 60s:1d,5m:7d

[04.grafana]
pattern = ^grafana\.
retentions = 60s:15d

[99.default_1min_for_90day]
pattern = .*
retentions = 60s:15d,5m:180d
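To check which section applies to the sanitized target `foo.bar.baz`, the schemas above can be modelled as an ordered list where the first matching pattern wins. This is a sketch under the assumption that sections are evaluated in file order (no `priority` keys are set here) and that Python's `re` treats these simple patterns the same way go-carbon's regex engine does. `match_schema` is a hypothetical helper.

```python
import re

# (section name, pattern, retentions) in the order they appear in the file.
SCHEMAS = [
    ("01.carbon", r"^carbon\.", "60s:15d,5m:180d"),
    ("02.foo", r"^foo\.bar\.", "60s:15d,5m:180d"),
    ("03.metric_testing", r"^metric_testing", "60s:1d,5m:7d"),
    ("04.grafana", r"^grafana\.", "60s:15d"),
    ("99.default_1min_for_90day", r".*", "60s:15d,5m:180d"),
]

def match_schema(metric: str):
    """Return (section, retentions) of the first schema whose pattern matches."""
    for name, pattern, retentions in SCHEMAS:
        if re.search(pattern, metric):
            return name, retentions
    return None  # unreachable here because the last pattern matches everything

print(match_schema("foo.bar.baz"))
```

Under these assumptions the target falls into `[02.foo]`, so the retention configuration itself does not look like the cause of the failed fetch.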

Simplified query (if applicable)

See logging output.

Additional context

This is go-carbon version 0.15.0 running on Ubuntu 16.04.7 LTS. It is being used as a backendsv2 backend (protocol carbonapi_v3_pb) for carbonapi version 0.14.1-1.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 35 (22 by maintainers)

Most upvoted comments

I can’t really speak to what a sane default might be. I think the issue here is less about the value and more about making sure folks are aware of it during the upgrade cycle. Hopefully it’s far enough in the past that it won’t matter for most other users.

But in our case, the one query that surfaced this issue was returning almost 30k metrics. I don’t think we have a good way of measuring per-query metric numbers (this would be a useful metric to export imho, either in go-carbon or carbonapi), so for now we’re erring on the side of caution and increasing our setting to 100k.
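In the absence of an exported per-query metric count, a rough estimate can be taken from disk. The sketch below counts the whisper files a query would match. It assumes `hash-filenames = false`, the plain `<data-dir>/a/b/c.wsp` layout, and queries that only use `*`, `?`, and `[...]` (Python's `glob` does not expand `{a,b}` alternation). It also ignores metrics that exist only in the cache, so treat the result as a lower bound. `count_matching_metrics` is a hypothetical helper, not a go-carbon or carbonapi API.

```python
import glob
import os

def count_matching_metrics(data_dir: str, query: str) -> int:
    """Count whisper files under data_dir that match a dot-delimited query."""
    # Map the query onto the directory layout: foo.bar.* -> <data_dir>/foo/bar/*.wsp
    pattern = os.path.join(data_dir, *query.split(".")) + ".wsp"
    return sum(1 for path in glob.iglob(pattern) if os.path.isfile(path))

if __name__ == "__main__":
    # data-dir taken from the [whisper] section of the config in this report.
    print(count_matching_metrics("/data/graphite/whisper", "foo.bar.*"))
```

Running this for the heaviest dashboard queries gives a starting point for choosing a limit with some headroom, rather than guessing.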

That XXX_NoUnkeyedLiteral might be related to: https://github.com/go-graphite/protocol/commit/ec66858bcd41d96d7fd8aac4ffcc643763ad2cfd

The version currently pinned at https://github.com/go-graphite/go-carbon/blob/master/go.mod#L21 is older than that commit, and two new fields have been introduced since (go-graphite/carbonapi uses them now).

The good thing is that they do not matter much for go-carbon. Taking them into account could be somewhat useful if a graphite-web 1.1.x instance points to carbonapi, but that’s not a hard requirement.

"debug" would be better.

There were no other logs but we were logging at level = "warn". Would it help for me to increase logging output?