FluidFramework: DataProcessingError issues

  1. Looking at DataProcessingError.wrapIfUnrecognized() implementation, I do not understand why we are not using fluidErrorCode and instead coming up with a new concept - dataProcessingCodepath?

     const dpe = new DataProcessingError(errMsg, "" /* fluidErrorCode */);
    
     error.addTelemetryProperties({
         dataProcessingError: 1,
         dataProcessingCodepath,
     });
    
  2. The following code seems wrong. I feel DataProcessingError’s constructor should be protected, and this code likely wants DataCorruptionError instead. It would be also great to have good comments above these classes to clearly articular the expected difference in usage.

         const error = new DataProcessingError(
             "unexpectedAckReceived",
             "unexpectedAckReceived",
             {
    
  3. My main concern though is usage of errorType = ContainerErrorType.dataProcessingError. Given that some data processing error will get it, and some will not (because they already have some other type), presence of this errorType is not something I’d want to rely on. I think dataProcessingError: 1 is the right model here and the right way to find and differentiate these errors.

  4. I think we should have some kind of “undefined” or “none” errorType that can be overwritten by any layer (same way as we “overwrite” it if the error does not conform to IFluidErrorBase), but at the same time workflow allows us to convert error to IFluidErrorBase and add properties to it that will be preserved through any future conversions. I have a feeling that some of the other wrap / wrapErrorAndLog / wrapError flows might have similar concern.

Mark’s responses/proposal

  1. We are retiring fluidErrorCode, see #8929. This dataProcessingCodepath will probably not be interesting to primary consumer of these errors, which will be partners who see the container close like this, but might be interesting to us, so stashing in a new prop.
  2. Agree on both points - hiding the constructor, and adding comments everywhere
  3. See this comment and surrounding conversation. I agree with how it is now, and will try to clarify in the code / comments why it is this way
  4. This is a good point. Today if someone called normalizeError on the way up to a call to CreateProcessingError, it would be thwarted and would not classify it properly (it would remain genericError rather than dataProcessingError. Opened a separate issue: #9027

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (15 by maintainers)

Commits related to this issue

Most upvoted comments

Looking at our data again, now that the instrumentation bug is fixed, it’s back to what Tony showed above - only errorTypes DataCorruptionError and DataProcessingError. With that in mind, I’m going to do a quick PR to update CreateProcessingError function to always wrap, getting rid of the special case when errorType is already set.

So here is what I see - these 2 queries capture more of the essence of what happens (even though they capture same instance of error multiple times as it gets recorded by multiple events). Not all errors show up as error events, but it does not make a difference in how we should format these errors (as they do show up programmatically, as well as in telemetry). Plus some were incorrectly not marked as errors in prior builds. Example

  • eventName = “fluid:telemetry:Container:ContainerClose”
  • errorType = “outOfStorageError”

union Office_Fluid_FluidRuntime_* | where Event_Time > ago(10d) | where Data_dataProcessingError == 1 | summarize count() by Data_errorType, Data_dataProcessingCodepath

Data_errorType	Data_dataProcessingCodepath		count_
dataProcessingError					538
dataCorruptionError					293
dataProcessingError	realizeFluidDataStoreContext	161
authorizationError					66
genericNetworkError					35
outOfStorageError					24
serviceReadOnly						16
invalidFileNameError					3
throttlingError						3
dataProcessingError	deltaManagerInboundErrorHandler	3
fileNotFoundOrAccessDeniedError				2

union Office_Fluid_FluidRuntime_* | where Event_Time > ago(10d) | where Data_dataProcessingError == 1 | summarize count() by Data_errorType, Data_fluidErrorCode, Data_dataProcessingCodepath

Data_errorType		Data_fluidErrorCode	Data_dataProcessingCodepath	count_
dataCorruptionError	duplicateDataStoreCreatedWithExistingId			293
dataProcessingError	realizeFluidDataStoreContext				212
dataProcessingError	deltaManagerInboundErrorHandler				173
dataProcessingError				realizeFluidDataStoreContext	161
dataProcessingError	errorWhileUploadingBlobsWhileAttaching			99
dataProcessingError	channelDeltaConnectionFailedToProcessMessage		54
genericNetworkError	Error 400						27
authorizationError	Error 401						27
outOfStorageError	Error 507						19
authorizationError	Error 403						19
authorizationError	Error 401 (Unauthorized)				18
serviceReadOnly		Error 403 (Forbidden)					9
genericNetworkError	Error 400 (Bad Request)					8
serviceReadOnly		Error 403						7
outOfStorageError	Error 507 (Insufficient Storage)			5
throttlingError		Coherency409						3
dataProcessingError				deltaManagerInboundErrorHandler	3
fileNotFoundOrAccessDeniedError	Error 404					2
authorizationError	Error 403 (Forbidden)					2
invalidFileNameError	invalidFilename						2
invalidFileNameError	Invalid filename. Please try again.			1