tensorflow: Two roundings in MultiplyByQuantizedMultiplier (TFLite) lead to results inconsistent with TensorFlow

https://github.com/tensorflow/tensorflow/blob/ff91cd691027076e6128afbd902d3d38e3672787/tensorflow/lite/kernels/internal/common.h#L105 Per my understanding, MultiplyByQuantizedMultiplier is used to simulate floating-point multiplication. But in some cases, its two roundings produce a result different from floating-point arithmetic, which in turn leads to unpredictable TFLite accuracy compared with the TensorFlow training pipeline.

inline int32 MultiplyByQuantizedMultiplier(int32 x, int32 quantized_multiplier, int shift)

Assume x = 7984, quantized_multiplier = 1583594044, shift = -9 (the simulated floating-point arithmetic is std::round(7984 * 0.0014402703931141053)). The 64-bit product of x and quantized_multiplier is 12643414847296 (0xB7FC6402F40). It undergoes two rounding shifts (one in SaturatingRoundingDoublingHighMul, the other in RoundingDivideByPOT) and yields 12. But the floating-point result is 11. If we combine these two rounding shifts into one, i.e. perform a single rounding shift of the full 64-bit product by 40 bits, the result is 11, which matches the floating-point result.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 16 (4 by maintainers)

Most upvoted comments

Thanks @bjacob for the detailed responses.

We cannot change the default rounding mode in TFLite due to backwards-compatibility issues. That being said, since you are targeting a microcontroller, TFLite Micro may be able to accommodate this need in their infra. Will assign to them to get their thoughts.

Thanks!

Bit-exactness is not part of the contract here. Do whatever is friendly to your target hardware as long as it’s a reasonable round-to-nearest scheme without significant bias.

It is expected that different hardware targets will run different, non-bit-exact flavors of MultiplyByQuantizedMultiplier and RoundingRightShift.

Regarding MultiplyByQuantizedMultiplier: For example, we have seen people use 16-bit fixedpoint instead of 32-bit fixedpoint multipliers on Qualcomm HVX, and we have seen people use float32 instead of 32-bit fixedpoint multipliers on Google EdgeTPU.

Regarding RoundingRightShift: Even on ARM NEON, our thinking has evolved a bit. We used to think that it would be important for the rounding right shift to break ties away from zero, out of a concern that breaking ties upward would be “biased”. However, in practice, away-from-zero is only unbiased if the data is centered around zero, which it is not always, so there was no practical unbiasing benefit. We now happily use ARM NEON’s own “rounding right shift”, breaking ties upward.

In the newer matrix multiplication library used by TFLite in the quantized case on ARM, we are using SQRDMULH followed by RSHL by default, i.e. the native NEON rounding shift per the previous paragraph. We’ve retained the ability to compile the old break-ties-away-from-zero way but might drop it soon. Search for NATIVE_ROUNDING in this file: https://github.com/google/ruy/blob/a0ca5e6d3d1b1771197368c5cf38de1f9fe7f1e0/ruy/kernel_arm64.cc

We hadn’t bothered to update these TFLite functions because not much time was being spent there.

Do you care about the efficiency of these functions because they’re significant in your CPU profiles, or do you care about the clarification that bit-exactness is not required for your own ports? In the former case, we’ll happily take a patch optimizing these functions by relaxing them as in the above-linked ruy code. In the latter case, this might be more of a documentation issue. Do you still see a need for SaturatingRoundingDoublingHighMulWithShift given the non-requirement of bit-exactness?

I do agree that the function as a whole would no longer translate to a common hardware instruction.

Right, that’s the concern that I was highlighting here.

I agree with everything you say. I hadn’t been following these developments around TFLite for microcontrollers. It does sound like some documentation, or at least comments, to make the conclusions of the present discussion more discoverable would be useful.

Point taken that SaturatingRoundingDoublingHighMulWithShift, while less friendly to NEON hardware, is more friendly to Cortex-M, so it arguably makes a better reference function to have here in the TFLite code. I’d like to let others take it from here, as I haven’t been closely involved in this code for the past 2 years and shouldn’t end up owning it again. @suharshs?