google-cloud-go: logging: Flush should return an error if any of the logs failed

Currently Log will not return an error and neither will Flush.

This means that my only option to wait for all the previous entries to be logged successfully is to Close the client.

cc: @rakyll

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 21 (16 by maintainers)

Most upvoted comments

If Flush returned an error it would allow me to verify the auth is correct.

Use Ping for that.

Flush blocks until all logs have been sent, and returns an error if any of the logging operations failed since last call to Flush, if any. Close calls Flush and returns an error if it fails, or if any of the other things that Close does fails.

I can do that. I’ll make a CL in the next couple of days. I understand why people think that is useful.

But let me explain why it isn’t useful. I actually said it above, but I’ll try again, since no one addressed my points.

The logpipe program is designed incorrectly. It should not fail on the first log error. The best it can do is to write log errors to stderr and let humans decide what to do. That is (surprise!) the default behavior of the logging client. (But if you did want to exit on the first error, just put os.Exit into the OnError function and don’t bother with all the synchronization.)

Arguably, logpipe shouldn’t exit with failure even if there were any errors from the logging service. (Do you really want your 8-hour pipeline to fail because one debug log message didn’t get written?) Maybe it should only fail if all or most of the RPCs failed. I don’t know what the right behavior is, actually, but that’s exactly why nothing that Flush or Close could return would be useful (short of all the errors, which takes too much memory).

I think at the heart of the problem is your view that the logging service is like a database or a file system, where an error means data loss, which must be dealt with. I think of it as a best-effort fire-and-forget service, where a little bit of data loss probably doesn’t matter (but a lot does). Since it’s impossible to know what errors matter, the best the client can do is make all the errors available and let the user decide. Any one error returned from Flush or Close will be at best an approximation. It’s unlikely to be actionable. If an action is taken, it’s likely to be log.Fatal, which is almost certainly too dramatic.

(I’m sure you’ve never written a Go program that exits when log.Print fails. I’m sure because log.Print doesn’t return an error.)

So I think that the right way to treat the error from Close (which will include the error from Flush) is the same way we treat the error when we close a file after reading it:

defer client.Close()

This is what I would expect:

Flush blocks until all logs have been sent, and returns an error if any of the logging operations failed since last call to Flush, if any.

Close calls Flush and returns an error if it fails, or if any of the other things that Close does fails. Calling Log or Flush after Close is an error, maybe even a panic.

I’m very scared of the OnError mechanism, introducing data races is too easy making everyone’s code brittle. If Flush returned an error it would allow me to verify the auth is correct by simply doing:

logger.Log("server is starting")
if err := logger.Flush(); err != nil {
    log.Fatalf("could not log: %v", err)
}