bitcoin: memcmp with constants that contain zero bytes are broken in GCC

It appears that there is a bug in certain GCC releases (in the version 9 and 10 series) where an optimization step breaks correctness of memcmp when at least one of the arguments is a compile-time constant array that contains at least one zero byte.

GCC bug is here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189. It was stumbled upon by @roconnor-blockstream in https://github.com/bitcoin-core/secp256k1/pull/822. It is being tracked for libsecp256k1 in https://github.com/bitcoin-core/secp256k1/issues/823.

I have verified that in some instances it also affects C++, and may even affect std::lexicographical_compare.

This may be relevant in some of our code (in particular, the CNetAddr IP range checking does comparisons with constants that contain zeroes, but perhaps more).

Solutions:

  • Build with -fno-builtin-memcmp, but we should measure performance impact.
  • Very carefully inspect the codebase for potential cases, and use a custom memcmp for those.

TODO:

  • Verify if compiler-generated memcmp calls may be affected as well

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 19 (19 by maintainers)

Commits related to this issue

Most upvoted comments

According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189#c33, this has been fixed in GCC 10.3.0, which we currently use to build releases. The relevant backported commit, on the releases/gcc-10 branch, is https://github.com/gcc-mirror/gcc/commit/e4c9aac98611f63847ef6c57916808d9a2d7abcb.

So the fix is in GCC master, 12, 11 and 10 branches. Currently we support GCC 8.1+.

Here is the example of the bug appearing when using the > operator on 2 arrays: https://godbolt.org/z/j5en9d Here it appears with std::lexicographical_compare: https://godbolt.org/z/a6scYK

-fno-builtin-memcmp Doesn’t do anything for either of these (as they didn’t call memcmp)

Technically my claim was, from my very limited understanding, the bug can only change non-zero results from memcpy to zero results. Still, even in this case if the result of memcmp(x,y) is changed from a 1 to a 0, the value of memcmp(...) <= 0 will change from false to true.

I worked up a clang AST query that can be used to detect the test-case. I’m not sure if it’s perfect though as I don’t know exactly what gcc will screw up (copies of pointers, de-references, etc.). If this is useful, it’d be trivial to hook it up to a clang-tidy check, or even a compiler plugin to make it easier to check depends as well.

memcmp.cpp

(exact test-case taken from gcc bug report):

using size_t = decltype(sizeof(0));
int memcmp(const void *s1, const void *s2, size_t n);

static const float z[1] = {0};
int f(float x)
{
    return memcmp(&x, z, sizeof(x));
}

In action:

$ clang-query ../memcmp.cpp -- -std=c++17
clang-query> m callExpr(hasAnyArgument(ignoringImpCasts(declRefExpr(to(varDecl(hasType(qualType(constantArrayType(),isConstQualified())),hasDescendant(integerLiteral(equals(0)))))))),callee(functionDecl(hasName("memcmp"))))

Match #1:

/home/cory/dev/bitcoin-tidy-experiments/build/../memcmp.cpp:7:12: note: "root" binds here
    return memcmp(&x, z, sizeof(x));
           ^~~~~~~~~~~~~~~~~~~~~~~~
1 match.

The query:

callExpr(
    hasAnyArgument(ignoringImpCasts(
        declRefExpr(to(varDecl(
            hasType(qualType(
                constantArrayType(),
                isConstQualified()
            )),
            hasDescendant(integerLiteral(equals(0)))
        ))))
    ),
    callee(functionDecl(hasName("memcmp")))
)

also indentation

I was curious to see how much work it would be to work around this, so I hacked up (and vendored) my libstdc++ headers in order to add detection for the guilty functions, and started a branch of workarounds.

The guilty v9 headers (which call __builtin_memcmp themselves) are:

  • bits/char_traits.h
  • bits/stl_algobase.h

v10 adds:

  • array (I haven’t looked into this one yet)

The workarounds mostly involve creating comparators for containers to avoid the default lexicographical_compare and getting rid of some std::equal usage. Those are pretty straightforward. But then I discovered that many of the boost string parsing functions end up being guilty.

For example: boost::split(vDeploymentParams, strDeployment, boost::is_any_of(":")); ends up internally calling std::__equal<true>::equal().

And while this might be a good excuse to get rid of some of boost usage, I’m afraid that the boost libs themselves are going to be a problem. And I don’t think we want to be patching boost to fix this.

Even worse than boost, glibc and libstdc++ themselves are likely affected as well (for different reasons).

I can push up my partial work if anyone is interested, but it’s not looking like this is a good way forward.

This is a patch against the broken GCC versions that emits a warning when the bug could be triggered: https://gist.github.com/real-or-random/1548239059158370416dfdc6a866329a

When I compile the project with this patch, I don’t get any warning. That’s good news. But unfortunately, this is not perfeclty sound. It catches all known examples but it fails to catch the std::lexicographical_compare example in this thread… Maybe the reason is buried somewhere in this function but I don’t know. https://gcc.gnu.org/onlinedocs/gcc-10.2.0/libstdc++/api/a00596_source.html#l01592