azure-functions-host: Redesign how we log worker process errors
In the current implementation of how we handle worker process errors, we log a message as an error
based on if the message contains the words “error”, “fail” or “severe”, this is fragile and does not cover all cases on when an issue has occurred i.e. exceptions etc.
# Method: ParseErrorMessageAndLog
if ((msg.IndexOf("error", StringComparison.OrdinalIgnoreCase) > -1) ||
(msg.IndexOf("fail", StringComparison.OrdinalIgnoreCase) > -1) ||
(msg.IndexOf("severe", StringComparison.OrdinalIgnoreCase) > -1))
{
LogError(msg);
}
It’s worth noting that this method (ParseErrorMessageAndLog
) is only used by the error callback:
private void OnErrorDataReceived(object sender, DataReceivedEventArgs e)
{
if (e.Data != null)
{
ParseErrorMessageAndLog(e.Data);
}
}
Maybe it is safe to assume that all messages coming from the worker through the error event listener Process.ErrorDataReceived += (sender, e) => OnErrorDataReceived(sender, e);
should be logged as errors, or at least warnings?
Following tasks were identified as next steps to resolve this workitem
- Function host: Review the option of exposing UseStdErrorStreamForErrorsOnly setting to users under a more descriptive app setting value
- Extension Authors: Look into the possibility of intercepting the stdError stream, identify log level through language specific constructs and pass that info via a log prefix.
- Investigation task on dotnet worker (#1116)
- Identify what logs we are missing by introducing worker faults at different points of execution
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 17 (17 by maintainers)
We rarely use the stdout/stderr streams in the first place - we prioritize sending logs over the rpc channel. That being said, users can write directly to stdout/stderr themselves, so I would be hesitant to change this behavior (worker logs should be prefixed with “LanguageWorkerConsoleLog” and user logs are not).
Fwiw, I think the existing default is the “correct” behavior, although perhaps not the most intuitive behavior. Stderr should be assumed to be any diagnostic data, not just error data. I would advocate for improving the error detection (add exception to the list?) and somehow pressuring both users & workers to send proper rpc messages instead where they have to explicitly choose the log level
Have spoken to several language owners on this topic and it appears that this will be beneficial in catching issues that have been missed in our kusto logs. Will need to have a further discussion on usage of RpcLog vs stderr/stdout. We also discussed changing this to be a capability instead of a worker.config setting.
@fabiocav if we feel this is a breaking change, we can punt this down the road, otherwise we can push to next sprint to start making changes.
A few references pointing to
stderr
being used for diagnostic data, although I agree the name is terrible:This came up a lot when I worked on VS Code tooling that referenced a cli (usually core tools). Here is theoretical output for the command
func --version
:It’s really important that “1.0.0” is the only output that goes to “stdout” for anyone parsing the result of
func --version
(aka our VS Code extension). That means all the other stuff has to go to “stderr” even though not all of it is an error. This situation is the main reason that many cli’s use “stderr” for diagnostic data because they want to provide parse-able output and diagnostic data and you need two different streams to do that. I understand the host doesn’t fit neatly into this “parseable output for a cli” scenario, but just explaining why I think using stderr for diagnostic data is a lot more common that you’d think based on the name.It doesn’t look like any of the language workers are utilizing this in their worker.config files. The default value for
UseStdErrorStreamForErrorsOnly
is false, meaning logs from stderr are written with info level by default unless the message contains the keywords “error” , “severe” or “fail”, then the logs are written with error level.@pragnagopa from my understanding, Java is the only language that writes to the stderr stream for all msgs and so only Java should have
UseStdErrorStreamForErrorsOnly == false
and it should betrue
for all other languages?cc: @ejizba, @Francisco-Gamino, @kaibocai, @kshyju, @vrdmr - FYI, this might be impacting how errors from workers are logged and depending on logging behaviour, you maybe need to update your worker.config file
Example
Current behaviour -> an error in the process logs as info
Behaviour with UseStdErrorStreamForErrorsOnly set to true
Ah, that’s great! We’ll validate and close this issue once we do. We’ll work with the language worker owners to update this where appropriate.
I just realized - we did add the capability - https://github.com/Azure/azure-functions-host/blob/dev/src/WebJobs.Script/Workers/Rpc/RpcWorkerDescription.cs#L82 - so worker should be able to specify if stderr stream should be considered as error logs only - @liliankasem / @fabiocav - please give that a try