Clipper2: FP exception in GetIntersectPoint()

Executing the following code results in a floating point exception in function Point64 GetIntersectPoint(const Active& e1, const Active& e2). Namely, when a double value is cast to int64_t, it appears to be outside the range of int64_t.

The coordinates of the polygon vertices are within the range -4.6e18 … 4.6e18. According to the documentation, that is a valid input.

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <fenv.h>
#include <glob.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;

#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))

int main(int, char**) {
    feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );

    Path64 a, b;
    ADD(a,          -7009388980LL,     1671079308836362LL);
    ADD(a,  -576460711222920704LL,     1671063793410802LL);
    ADD(a,  -864690986154904320LL,  -286977111392363424LL);
    ADD(a, -1152921411648951168LL,  -575625207815834176LL);
    ADD(a,  -864691057648438912LL,  -864273328105026304LL);
    ADD(a,  -576460762799345152LL, -1152921434302619648LL);
    ADD(a,         -12880478468LL, -1152921451122536832LL);
    ADD(a,   576460787720869760LL, -1152921504606846976LL);
    ADD(a,   864691268131862400LL,  -864273433561362176LL);
    ADD(a,  1152921504606846848LL,  -575625211080225088LL);
    ADD(a,   864691062448491520LL,  -286977010832613792LL);
    ADD(a,   576460654512679296LL,     1671130833237401LL);

    ADD(b,   -54234486065476976LL,  1151250415789406208LL);
    ADD(b,  -612617068314194048LL,  1152921504606846976LL);
    ADD(b,  -864691197085273984LL,   867615548203339008LL);
    ADD(b, -1152921504606846848LL,   578967376389736064LL);
    ADD(b,  -864691140504451968LL,   290319223463759488LL);
    ADD(b,  -576460711222912256LL,     1671063793410802LL);
    ADD(b,          -7009388980LL,     1671079308836362LL);
    ADD(b,   576460654512679296LL,     1671130833237401LL);
    ADD(b,   864690908098943872LL,   290319324023499392LL);
    ADD(b,  1100313577101746304LL,   576428327042791872LL);
    ADD(b,   785779376874207360LL,   863806873623168128LL);
    ADD(b,   487696647658790656LL,  1150382388220066304LL);

    const int m = 10;
    for(size_t i=0; i<a.size(); i++) { a[i].x /= m; a[i].y /= m; b[i].x /= m, b[i].y /= m; }

    Paths64 AA; AA.push_back(a);
    Paths64 BB; BB.push_back(b);
    Paths64 solution = Intersect(Paths64(AA), Paths64(BB), FillRule::NonZero);
}

Division by ten is not essential.

About this issue

Original URL
State: closed
Created 2 years ago
Comments: 144 (75 by maintainers)

Commits related to this issue

Fixed overflow error in GetIntersectPoint (#317) — committed to AngusJohnson/Clipper2 by AngusJohnson 2 years ago
Fixed integer overflows (#317) Other minor tweaks :) — committed to AngusJohnson/Clipper2 by AngusJohnson 2 years ago
bugfixes for GetIntersectPoint() (#317) __cpp_exceptions to enable exceptions (#305) — committed to AngusJohnson/Clipper2 by AngusJohnson 2 years ago
further bugfixes for GetIntersectPoint() (#317) — committed to AngusJohnson/Clipper2 by AngusJohnson 2 years ago
additional tweaks to GetIntersectPoint() (#317) — committed to AngusJohnson/Clipper2 by AngusJohnson 2 years ago

Most upvoted comments

@AngusJohnson I checked your last version of GetIntersectPoint. It throws an exception in the following example.

#include <stdio.h>
#include "clipper2/clipper.h"
using namespace Clipper2Lib;

#define ADD(A,X,Y) A.push_back(Point64(int64_t(X), int64_t(Y)))

int main(int, char**) {
    Path64 a,b;
    ADD(a,-6991627179LL,1666844791673082LL);
    ADD(a,-574999959023595328LL,1666829315563631LL);
    ADD(a,-862499858060358144LL,-286249911015197856LL);
    ADD(a,-1149999907277659648LL,-574166572783239680LL);
    ADD(a,-862499929372727936LL,-862083258356526976LL);
    ADD(a,-575000010469324992LL,-1149999929873923840LL);
    ADD(a,-12847839319LL,-1149999946651219328LL);
    ADD(a,575000035327698432LL,-1150000000000000128LL);
    ADD(a,862500139322786176LL,-862083363545636480LL);
    ADD(a,1150000000000000000LL,-574166576039358592LL);
    ADD(a,862499934160617216LL,-286249810710266752LL);
    ADD(a,574999902457057728LL,1666896185511221LL);
    ADD(b,-54097055806558968LL,1148333145723817472LL);
    ADD(b,-611064695858513920LL,1150000000000000128LL);
    ADD(b,-862500068456230016LL,865417009264721152LL);
    ADD(b,-1150000000000000000LL,577500272297585920LL);
    ADD(b,-862500012018783744LL,289583554170215680LL);
    ADD(b,-574999959023586880LL,1666829315563631LL);
    ADD(b,-6991627179LL,1666844791673082LL);
    ADD(b,574999902457057728LL,1666896185511221LL);
    ADD(b,862499780202191616LL,289583654475137088LL);
    ADD(b,1097525381052289280LL,574967656905021504LL);
    ADD(b,783788210901215872LL,861617985870938496LL);
    ADD(b,486460823713113792LL,1147467317737478272LL);

    Paths64 AA; AA.push_back(a);
    Paths64 BB; BB.push_back(b);
    Paths64 solution = Difference(Paths64(AA), Paths64(BB), FillRule::NonZero);
}

bahvalo on Nov 14, 2022

Thanks. I will have a good look again, I just need a day or two to enjoy the weekend and recharge.

I’m getting seriously off-topic, but thanks for everything you do.

I actually admire that kind of dedication; I also would have a ton of code to share, but I guess I’m just scared of getting stuck doing tech support (or I don’t want that extra source of stress on top?). Either way, thanks, it’s well appreciated. I applaud you.

alexisnaveros on Nov 12, 2022

I had to look that up, it appears the Microsoft compiler doesn’t support a 128 bits integers type. But they have the intrinsics _mul128(), _udiv128(), __shiftright128(), _addcarry_u64(), _subborrow_u64(), __lzcnt64() which could be used to implement everything.

So it could be done… although with a #if for MSVC, another for gcc/clang, and a generic fallback.

Here’s BP5 and BP6:

static inline int64_t idiv128( __int128 num, int64_t denom )
{
  int64_t res;
  asm (
  "idivq %1\n\t"
  : "=a"(res) : "rm" (denom), "A" (num) );
  return res;
}

/* denom must be >= 0 for proper rounding behavior */
static inline int64_t idiv128_round( __int128 num, int64_t denom )
{
  int64_t res, rem;
  asm (
  "idivq %2\n\t"
  : "=a"(res), "=d"(rem) : "rm" (denom), "A" (num) );
  res += ( rem >= ( denom >> 1 ) );
  res -= ( rem <= -( denom >> 1 ) );
  return res;
}

static inline  __int128 imul128( int64_t a, int64_t b )
{
  __int128 res;
  asm (
  "imulq %2\n\t"
  : "=A"(res) : "a" (a), "rm" (b) );
  return res;
}

/* GCC is smart enough to grab the higher half of the __int128 register pair, without any store/load/shift/whatever */
/* Can all compilers do that? Feel free to tweak as necessary for your compiler to emit good code */
#if 1
 #define MATH_INT128_GET_HIGH64(x) (int64_t)((x)>>64)
#else
 #define MATH_INT128_GET_HIGH64(x) (int64_t)(*((int64_t *)(&x)+1))
#endif

static inline int mathVertexLineLineIntersection_PB5( int64_t *hitpt, int64_t l0p0x, int64_t l0p0y, int64_t l0p1x, int64_t l0p1y, int64_t l1p0x, int64_t l1p0y, int64_t l1p1x, int64_t l1p1y )
{
  int64_t xormask, denomhigh;
  __int128 denom;
  __int128 alpha_num;
  denom = imul128( l0p1x - l0p0x, l1p1y - l1p0y ) - imul128( l0p1y - l0p0y, l1p1x - l1p0x );
  if( denom == 0 )
  {
    // segments lay on parallel lines
    // special case with segments on the same line -- may be considered if necessary
    return 0;
  }
  else
  {
    alpha_num = imul128( l1p0x - l0p0x, l1p1y - l1p0y ) - imul128( l1p0y - l0p0y, l1p1x - l1p0x );
    xormask = MATH_INT128_GET_HIGH64( denom ) >> 63;
    alpha_num = ( alpha_num ^ xormask ) - xormask;
    denom = ( denom ^ xormask ) - xormask;
#if ABORT_IF_HIT_IS_OUTSIDE_LINE_SEGMENTS
    __int128 beta_num;
    if( (unsigned __int128)alpha_num > denom )
      return 0;
    beta_num  = imul128( l1p0x - l0p0x, l0p1y - l0p0y ) - imul128( l1p0y - l0p0y, l0p1x - l0p0x );
    if( (unsigned __int128)beta_num > denom )
      return 0;
#endif
    denomhigh = MATH_INT128_GET_HIGH64( denom );
    if( denomhigh )
    {
      int shift = 65 - __builtin_clzll( denomhigh );
      alpha_num >>= shift;
      denom >>= shift;
    }
    hitpt[0] = l0p0x + idiv128_round( imul128( alpha_num, l0p1x - l0p0x ), denom );
    hitpt[1] = l0p0y + idiv128_round( imul128( alpha_num, l0p1y - l0p0y ), denom );
  }
  return 1;
}

static inline int mathVertexLineLineIntersection_PB6( int64_t *hitpt, int64_t l0p0x, int64_t l0p0y, int64_t l0p1x, int64_t l0p1y, int64_t l1p0x, int64_t l1p0y, int64_t l1p1x, int64_t l1p1y )
{
  int shift;
  int64_t xormask, denomhigh, denomlow;
  int64_t signbit, halfdenom;
  __int128 denom;
  __int128 alpha_num;
  __int128 hitx, hity;
  denom = imul128( l0p1x - l0p0x, l1p1y - l1p0y ) - imul128( l0p1y - l0p0y, l1p1x - l1p0x );
  if( denom == 0 )
  {
    // segments lay on parallel lines
    // special case with segments on the same line -- may be considered if necessary
    return 0;
  }
  else
  {
    alpha_num = imul128( l1p0x - l0p0x, l1p1y - l1p0y ) - imul128( l1p0y - l0p0y, l1p1x - l1p0x );
    xormask = MATH_INT128_GET_HIGH64( denom ) >> 63;
    alpha_num = ( alpha_num ^ xormask ) - xormask;
    denom = ( denom ^ xormask ) - xormask;
#if ABORT_IF_HIT_IS_OUTSIDE_LINE_SEGMENTS
    __int128 beta_num;
    if( (unsigned __int128)alpha_num > denom )
      return 0;
    beta_num  = imul128( l1p0x - l0p0x, l0p1y - l0p0y ) - imul128( l1p0y - l0p0y, l0p1x - l0p0x );
    if( (unsigned __int128)beta_num > denom )
      return 0;
#endif
    denomhigh = MATH_INT128_GET_HIGH64( denom );
    if( denomhigh )
    {
      shift = 65 - __builtin_clzll( denomhigh );
      alpha_num >>= shift;
      denom >>= shift;
    }
    denomlow = (int64_t)denom;
    halfdenom = denomlow >> 1;
    hitx = imul128( alpha_num, l0p1x - l0p0x );
    signbit = MATH_INT128_GET_HIGH64( hitx ) >> 63;
    hitx += ( halfdenom ^ signbit ) - signbit;
    hitpt[0] = l0p0x + idiv128( hitx, denomlow );
    hity = imul128( alpha_num, l0p1y - l0p0y );
    signbit = MATH_INT128_GET_HIGH64( hity ) >> 63;
    hity += ( halfdenom ^ signbit ) - signbit;
    hitpt[1] = l0p0y + idiv128( hity, denomlow );
  }
  return 1;
}

The difference between the two is just rounding occurring after the division (PB5) or before (PB6).

alexisnaveros on Nov 19, 2022

Hey, I improved your version a little bit:

/* denom must be >= 0 for proper rounding behavior */
__attribute__ ((always_inline)) static inline int64_t idiv128_round( __int128 num, int64_t denom )
{
  int64_t res, rem;
  asm (
  "idivq %2\n\t"
  : "=a"(res), "=d"(rem) : "rm" (denom), "A" (num) );
  res += ( rem >= ( denom >> 1 ) );
  res -= ( rem <= -( denom >> 1 ) );
  return res;
}

__attribute__ ((always_inline)) static inline  __int128 imul128( int64_t a, int64_t b )
{
  __int128 res;
  asm (
  "imulq %2\n\t"
  : "=A"(res) : "a" (a), "rm" (b) );
  return res;
}

static inline int mathVertexLineLineIntersection_PB5( int64_t *hitpt, int64_t l0p0x, int64_t l0p0y, int64_t l0p1x, int64_t l0p1y, int64_t l1p0x, int64_t l1p0y, int64_t l1p1x, int64_t l1p1y )
{
  __int128 denom;
  __int128 alpha_num;
  denom = imul128( l0p1x - l0p0x, l1p1y - l1p0y ) - imul128( l0p1y - l0p0y, l1p1x - l1p0x );
  if( denom == 0 )
  {
    // segments lay on parallel lines
    // special case with segments on the same line -- may be considered if necessary
    return 0;
  }
  else
  {
    alpha_num = imul128( l1p0x - l0p0x, l1p1y - l1p0y ) - imul128( l1p0y - l0p0y, l1p1x - l1p0x );
    if( denom < 0 )
    {
      alpha_num = -alpha_num;
      denom = -denom;
    }
    int64_t *high = (int64_t *)(&denom)+1;
    if( *high )
    {
      int shift = 65 - __builtin_clzll( *high );
      alpha_num >>= shift; 
      denom >>= shift;
    }
    hitpt[0] = l0p0x + idiv128_round( imul128( alpha_num, l0p1x - l0p0x ), denom );
    hitpt[1] = l0p0y + idiv128_round( imul128( alpha_num, l0p1y - l0p0y ), denom );
  }
  return 1;
}

It’s not longer checking if the intersection point is within the two edges, I suspect Clipper2 wants the intersection point between the two vectors; it already decided that they intersect when that function is called.

The extra rounding corrects these tiny errors:

Line0: 12500000000000 24999999999981 to 18750000000000 12499999999990
Line1: 12499999999990 25000000000000 to 18749999999991 12500000000008
 -=Intersection =-
We should get : 18749999999910 12500000000170
          PB3 : 18749999999909 12500000000170 ~ error 1
          PB4 : 18749999999909 12500000000171 ~ error 2
          PB5 : 18749999999910 12500000000170 ~ error 0
       BN1024 : 18749999999910 12500000000170 ~ error 0

Benchmark A     : 51 ms
Benchmark PB3   : 212 ms
Benchmark PB4   : 203 ms
Benchmark PB5   : 205 ms
Benchmark BN1024: 32458 ms

Hey, don’t laugh at my 1024 bits solution! It’s fast, I’m telling you. 😁

alexisnaveros on Nov 16, 2022

As this saga still doesn’t want to end, as I was really tired of never knowing what the proper answers were, I wrote an intersection test using my 1024 bits math library (all written in amd64 assembly, yarr).

The BN1024 results are absolutely correct, the two cases above:

Line0: 12500000000000 24999999999981 to 18750000000000 12499999999990
Line1: 12499999999990 25000000000000 to 18749999999991 12500000000008
 -=Intersection =-
We should get : 18749999999910 12500000000170
            A : 18716242253290 12481001853171 ~ error 38736482110
            B : 18749067298232 12501865403526 ~ error 2085584355
            C : 18747573756714 12504808554489 ~ error 5385986510
            D : 18743051132788 12513688432033 ~ error 15351218882
            E : 18745878208877 12508243582235 ~ error 9216604938
            F : 18745878208877 12508243582235 ~ error 9216604938
            G : 18743051132788 12513688432033 ~ error 15351218882
            H : 18737564891724 12524870216543 ~ error 27805747213
          PB3 : 18749999999909 12500000000170 ~ error 1
       BN1024 : 18749999999910 12500000000170 ~ error 0

Only PB3 gets it right. The difference of 1 is due to a lack of rounding in PB3(), easily fixed.

Line0: 18749999999991 12500000000008 to 24999999999991 20
Line1: 18750000000000 12499999999990 to 25000000000000 1
 -=Intersection =-
We should get : 18749999999892 12500000000206
            A : 18800852960221 12516376449324 ~ error 53424822508
            B : 18747327264085 12505345471819 ~ error 5976418950
            C : 18750018496234 12499721864080 ~ error 278750461
            D : 18749999999892 12500000000206 ~ error 0
            E : 18744313563339 12511372873313 ~ error 12715258684
            F : 18744313563339 12511372873313 ~ error 12715258684
            G : 18750075295280 12499795466152 ~ error 217953148
            H : 18750006649306 12499986701378 ~ error 14868542
          PB3 : -1 -1 ~ error 22534695471676
       BN1024 : 18749999999892 12500000000206 ~ error 0

The intersection point between the two vectors is actually outside of the line0 segment, the segments don’t intersect. PB3 returned a failure because this check fails:

    if( alpha_num < 0 || beta_num < 0 || alpha_num > denom || beta_num > denom )
      return 0;

and that’s quite correct, the hit is actually outside the line0 range (alpha_num is negative).

alexisnaveros on Nov 16, 2022

My last example yields

S =     1.875060039840250e+27
S =     1.874999999997778e+27
S =     1.875347110620309e+27
S =     1.874999999997365e+27
S =     1.874999999997015e+27
S =     1.874999999995516e+27

Rounding the input coordinates can’t affect so much. Assuming that the last result is correct, the error of the first intersection area is 6e+22. For the initial data within +/- 1e+14 range, I consider this result to be inaccurate.

bahvalo on Nov 16, 2022

Well, I guess that’s the fastest version we got, with perfect accuracy:

#define CC_MIN(x,y) ((x)>(y)?(y):(x))
#define CC_MAX(x,y) ((x)<(y)?(y):(x))
#define DETECT_HIT_OVERFLOW (0)

static inline int mathVertexLineLineIntersection_F( int64_t *hitpt, int64_t l0p0x, int64_t l0p0y, int64_t l0p1x, int64_t l0p1y, int64_t l1p0x, int64_t l1p0y, int64_t l1p1x, int64_t l1p1y )
{
  int64_t originx, originy;
  int64_t bb0minx, bb0miny, bb0maxx, bb0maxy, bb1minx, bb1miny, bb1maxx, bb1maxy;
  double ln0a, ln0b, ln1a, ln1b;
  double ln0c, ln1c, det, hitx, hity;
  ln0a = (double)( l0p1y - l0p0y );
  ln0b = (double)( l0p0x - l0p1x );
  ln1a = (double)( l1p1y - l1p0y );
  ln1b = (double)( l1p0x - l1p1x );
  det = ( ln1a * ln0b ) - ( ln0a * ln1b );
  if( det == 0.0 )
    return 0;
  bb0minx = CC_MIN( l0p0x, l0p1x );
  bb0miny = CC_MIN( l0p0y, l0p1y );
  bb0maxx = CC_MAX( l0p0x, l0p1x );
  bb0maxy = CC_MAX( l0p0y, l0p1y );
  bb1minx = CC_MIN( l1p0x, l1p1x );
  bb1miny = CC_MIN( l1p0y, l1p1y );
  bb1maxx = CC_MAX( l1p0x, l1p1x );
  bb1maxy = CC_MAX( l1p0y, l1p1y );
  originx = ( CC_MIN( bb0maxx, bb1maxx ) + CC_MAX( bb0minx, bb1minx ) ) >> 1;
  originy = ( CC_MIN( bb0maxy, bb1maxy ) + CC_MAX( bb0miny, bb1miny ) ) >> 1;
  ln0c = ( ln0a * (double)( l0p0x - originx ) ) + ( ln0b * (double)( l0p0y - originy ) );
  ln1c = ( ln1a * (double)( l1p0x - originx ) ) + ( ln1b * (double)( l1p0y - originy ) );
  hitx = ( ( ln0b * ln1c ) - ( ln1b * ln0c ) ) / det;
  hity = ( ( ln1a * ln0c ) - ( ln0a * ln1c ) ) / det;
#if DETECT_HIT_OVERFLOW
  if( fmax( fabs( (double)originx + hitx ), fabs( (double)originy + hity ) ) >= (double)(INT64_MAX-1) )
    return 0;
#endif
  hitpt[0] = originx + (int64_t)nearbyint( hitx );
  hitpt[1] = originy + (int64_t)nearbyint( hity );
  return 1;
}

It requires input to be within INT64_MAX/2 range, and assumes that detecting output hit overflow is pointless as it can’t happen. My only concern is that some compilers might output a ton of branches instead of conditional moves.

I do have a SSE 4.2 version, but it’s only 38% faster (_mm_cmpgt_epi64/_mm_blendv_epi8 is not really better than cmp/cmov/cmov), assuming you compile with -ffast-math (to output one divsd instead of two).

EDIT: And the second place is only 5% slower, without any risk of a compiler emitting branches:

static inline int mathVertexLineLineIntersection_G( int64_t *hitpt, int64_t l0p0x, int64_t l0p0y, int64_t l0p1x, int64_t l0p1y, int64_t l1p0x, int64_t l1p0y, int64_t l1p1x, int64_t l1p1y )
{
  int64_t originx, originy;
  double ln0a, ln0b, ln1a, ln1b;
  double ln0c, ln1c, det, detinv;
  ln0a = (double)( l0p1y - l0p0y );
  ln0b = (double)( l0p0x - l0p1x );
  ln1a = (double)( l1p1y - l1p0y );
  ln1b = (double)( l1p0x - l1p1x );
  det = ( ln1a * ln0b ) - ( ln0a * ln1b );
  if( det == 0.0 )
    return 0;
  detinv = 1.0 / det;
  ln0c = ( ln0a * (double)l0p0x ) + ( ln0b * (double)l0p0y );
  ln1c = ( ln1a * (double)l1p0x ) + ( ln1b * (double)l1p0y );
  originx = (int64_t)nearbyint( ( ( ln0b * ln1c ) - ( ln1b * ln0c ) ) * detinv );
  originy = (int64_t)nearbyint( ( ( ln1a * ln0c ) - ( ln0a * ln1c ) ) * detinv );
  ln0c = ( ln0a * (double)( l0p0x - originx ) ) + ( ln0b * (double)( l0p0y - originy ) );
  ln1c = ( ln1a * (double)( l1p0x - originx ) ) + ( ln1b * (double)( l1p0y - originy ) );
  hitpt[0] = originx + (int64_t)nearbyint( ( ( ln0b * ln1c ) - ( ln1b * ln0c ) ) * detinv );
  hitpt[1] = originy + (int64_t)nearbyint( ( ( ln1a * ln0c ) - ( ln0a * ln1c ) ) * detinv );
  return 1;
}

alexisnaveros on Nov 14, 2022

Perhaps offer a compilation-time switch then? Some people need speed, other people need accuracy and robustness above all.

Robustness is a given regardless of the degree of precision required. And that’s why I was chasing so hard the overflow errors above. And precision is important too for everyone, up to a point it unacceptably compromises performance. And yes I am now considering a compiler switch as evidently some (?very few) like you do require extremely high (> 1.0e12) degrees of precision.

AngusJohnson on Nov 14, 2022

Hey Angus. Well, perfect accuracy is achievable without special floating point types, it just needs code that’s a bit more complicated… and slower, yes.

Perhaps offer a compilation-time switch then? Some people need speed, other people need accuracy and robustness above all (bahvalo above seems to require accuracy ~ so do I; my computations take hours/days, the result is useless if incorrect, and I sure want to switch to Clipper2 eventually).

Well, I think I’ll cook up a quick SSE optimized version, and without requiring AVX-512 (int64_t<->double is awkward without _mm_cvtepi64_pd() and _mm_cvtpd_epi64(), but I’ll figure out something). Perhaps a 2-3x speed boost will make you change your mind? 😉

alexisnaveros on Nov 13, 2022

I made the tweaks in a clipper.engine.cpp I had from a couple days ago, the header says Date : 28 October 2022. The numerical precision and results improved as described above. Here’s that actual tweaked file:

Thanks. I will have a good look again, I just need a day or two to enjoy the weekend and recharge.

AngusJohnson on Nov 12, 2022

@alexisnaveros

Infinities, like other abnormal numbers (including denormals), have abysmal performance on almost all CPUs. It varies between a couple times slower and several hundred times slower.

It is true for 387 ISA and not for SSE2. I believe you hardly would pick Pentium2 nowadays for these tasks.

#include <chrono>
#include <iomanip>
#include <iostream>
#include <numeric>
#include <utility>
#include <vector>
#include <algorithm>

 
int main()
{
    std::cout.imbue(std::locale(""));
    std::cout << std::fixed << std::setprecision(1);
    auto eval = [](auto fun) {
        const auto t1 = std::chrono::high_resolution_clock::now();
        const auto name = fun();
        const auto t2 = std::chrono::high_resolution_clock::now();
        const std::chrono::duration<double, std::milli> ms = t2 - t1;
        std::cout << std::setw(28) << std::left << name << "\t time: " << ms.count() << " ms\n";
    };
    {
        std::vector<double> v(100'000'007, 0.1);
 
        eval([&v]{
            double m = std::numeric_limits<double>::max();
	    std::for_each(v.begin(), v.end(), [m](double &d){ d *= m; });
            return "*= std::numeric_limits<double>.max()";
        });
    }
    {
        std::vector<double> v(100'000'007, 0.1);
 
        eval([&v]{
            double m = std::numeric_limits<double>::infinity();
	    std::for_each(v.begin(), v.end(), [m](double &d){ d *= m; });
            return "*= std::numeric_limits<double>.infinity()";
        });
    }
    {
        std::vector<double> v(100'000'007, 0.1);
 
        eval([&v]{
            double m = std::numeric_limits<double>::max();
	    std::for_each(v.begin(), v.end(), [m](double &d){ d += m; });
            return "+= std::numeric_limits<double>.max()";
        });
    }
    {
        std::vector<double> v(100'000'007, 0.1);
 
        eval([&v]{
            double m = std::numeric_limits<double>::infinity();
	    std::for_each(v.begin(), v.end(), [m](double &d){ d += m; });
            return "+= std::numeric_limits<double>.infinity()";
        });
    }
    {
        std::vector<double> v(100'000'007, 0.1);
 
        eval([&v]{
            double m = std::numeric_limits<double>::max();
	    std::for_each(v.begin(), v.end(), [m](double &d){ d /= m; });
            return "/= std::numeric_limits<double>.max()";
        });
    }
    {
        std::vector<double> v(100'000'007, 0.1);
 
        eval([&v]{
            double m = std::numeric_limits<double>::infinity();
	    std::for_each(v.begin(), v.end(), [m](double &d){ d /= m; });
            return "/= std::numeric_limits<double>.infinity()";
        });
    }
}

Compiled with g++ -mfpmath=sse -std=c++17 -O1 Output:

*= std::numeric_limits<double>.max()	 time: 87.0 ms
*= std::numeric_limits<double>.infinity()	 time: 111.7 ms
+= std::numeric_limits<double>.max()	 time: 86.0 ms
+= std::numeric_limits<double>.infinity()	 time: 79.1 ms
/= std::numeric_limits<double>.max()	 time: 5,227.6 ms
/= std::numeric_limits<double>.infinity()	 time: 143.5 ms

sergey-239 on Nov 12, 2022

It’s a property of FP types representation, not of a language.

Yep, see my edit above.😁

AngusJohnson on Nov 11, 2022

Thank you Alexis. That’s very helpful and might solve this problem.

AngusJohnson on Nov 11, 2022