oqs-provider: Unable to run tests on OSX

Updated

To make some details clear, as the previous overly generic description invited useless, equally generic observations like “could not replicate on Windows”.

Describe the bug

ctest crashes with SIGTRAP.

To Reproduce

Steps to reproduce the behavior:

  1. Build or install OpenSSL system-wide, e.g., in /opt/local/libexec/openssl3. For this test I used OpenSSL-3.1.0, installed as a binary via Macports.
  2. Build and install liboqs system-wide. I used liboqs master and installed it under /opt/local: /opt/local/lib for the shared library, /opt/local/include for the header files.
  3. Clone, build, and install this provider (don’t forget to edit openssl.cnf as appropriate).
  4. export OPENSSL_APP=/opt/local/libexec/openssl3/bin/openssl and export OPENSSL_MODULES=/opt/local/libexec/openssl3/lib/ossl-modules
  5. Optional? To make the environment closer to mine, install pkcs11-provider and the GOST engine system-wide, and adjust openssl.cnf to point at them.
  6. Further complication: install oqs-provider and make it available system-wide by adding it to openssl.cnf.
  7. Go to _build and run ctest --output-on-failure
  8. Observe the error report.
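
Steps 4 and 7 above can be sketched as a small shell snippet. This is only an illustration: the prefix is the path from this report and will differ on other machines, and `run_tests` is a hypothetical helper name, not something shipped with oqs-provider.

```shell
#!/bin/sh
# Prefix of the system-wide OpenSSL 3 install (path taken from this report;
# adjust it for your own machine).
OPENSSL_PREFIX=/opt/local/libexec/openssl3

# Step 4: tell the test scripts which openssl binary and module directory to use.
export OPENSSL_APP="$OPENSSL_PREFIX/bin/openssl"
export OPENSSL_MODULES="$OPENSSL_PREFIX/lib/ossl-modules"

# Step 7: run the test suite from the build directory.
# run_tests is a hypothetical helper; $1 is the build directory (default: _build).
run_tests() {
    ( cd "${1:-_build}" && ctest --output-on-failure )
}
```

Calling `run_tests` from the top of the oqs-provider checkout then reproduces step 7; the subshell keeps the caller's working directory unchanged.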

Expected behavior

Tests passing.

Crash report

Translated Report (Full Report Below)
-------------------------------------

Process:               oqs_test_kems [17508]
Path:                  /Users/USER/*/oqs_test_kems
Identifier:            oqs_test_kems
Version:               ???
Code Type:             ARM-64 (Native)
Parent Process:        ctest [17500]
Responsible:           Terminal [983]
User ID:               501

Date/Time:             2023-03-27 15:39:40.6519 -0400
OS Version:            macOS 13.2.1 (22D68)
Report Version:        12
Anonymous UUID:        161C054B-E964-CDD3-5EBC-5A9DBE3E2AE2


Time Awake Since Boot: 66000 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BREAKPOINT (SIGTRAP)
Exception Codes:       0x0000000000000001, 0x00000001a6283108

Termination Reason:    Namespace SIGNAL, Code 5 Trace/BPT trap: 5
Terminating Process:   exc handler [17508]

Application Specific Information:
BUG IN CLIENT OF LIBPLATFORM: Trying to recursively lock an os_once_t
Abort Cause 259


Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_platform.dylib      	       0x1a6283108 _os_once_gate_recursive_abort + 36
1   libsystem_platform.dylib      	       0x1a627f710 _os_once_gate_wait + 348
2   libsystem_pthread.dylib       	       0x1a624dd84 pthread_once + 100
3   libcrypto.3.dylib             	       0x1012bdfcc CRYPTO_THREAD_run_once + 12
4   libcrypto.3.dylib             	       0x1012d0ea4 ossl_obj_add_object + 236
5   gostprov.dylib                	       0x1015bf5b4 populate_gost_engine + 116
6   gostprov.dylib                	       0x1015bd478 OSSL_provider_init + 116
7   libcrypto.3.dylib             	       0x1012bbddc provider_activate + 260
8   libcrypto.3.dylib             	       0x1012bbc48 ossl_provider_activate + 56
9   libcrypto.3.dylib             	       0x1012ba93c provider_conf_init + 608
10  libcrypto.3.dylib             	       0x101212c4c CONF_modules_load + 856
11  libcrypto.3.dylib             	       0x101212ee8 CONF_modules_load_file_ex + 120
12  libcrypto.3.dylib             	       0x101213738 ossl_config_int + 68
13  libcrypto.3.dylib             	       0x1012b2400 ossl_init_config_ossl_ + 16
14  libsystem_pthread.dylib       	       0x1a624ddec __pthread_once_handler + 76
15  libsystem_platform.dylib      	       0x1a627d7e0 _os_once_callout + 32
16  libsystem_pthread.dylib       	       0x1a624dd84 pthread_once + 100
17  libcrypto.3.dylib             	       0x1012bdfcc CRYPTO_THREAD_run_once + 12
18  libcrypto.3.dylib             	       0x1012b2208 OPENSSL_init_crypto + 1104
19  libcrypto.3.dylib             	       0x1012d1098 obj_lock_initialise_ossl_ + 20
20  libsystem_pthread.dylib       	       0x1a624ddec __pthread_once_handler + 76
21  libsystem_platform.dylib      	       0x1a627d7e0 _os_once_callout + 32
22  libsystem_pthread.dylib       	       0x1a624dd84 pthread_once + 100
23  libcrypto.3.dylib             	       0x1012bdfcc CRYPTO_THREAD_run_once + 12
24  libcrypto.3.dylib             	       0x1012d0408 OBJ_sn2nid + 112
25  libcrypto.3.dylib             	       0x1012d02f4 OBJ_txt2obj + 216
26  libcrypto.3.dylib             	       0x1012d0944 OBJ_txt2nid + 20
27  libcrypto.3.dylib             	       0x1012bd048 core_obj_create + 36
28  oqsprovider.0.5.0-dev.dylib   	       0x100f73678 OSSL_provider_init + 292
29  libcrypto.3.dylib             	       0x1012bbddc provider_activate + 260
30  libcrypto.3.dylib             	       0x1012bbc48 ossl_provider_activate + 56
31  libcrypto.3.dylib             	       0x1012ba93c provider_conf_init + 608
32  libcrypto.3.dylib             	       0x101212c4c CONF_modules_load + 856
33  libcrypto.3.dylib             	       0x101212ee8 CONF_modules_load_file_ex + 120
34  libcrypto.3.dylib             	       0x1012af1a4 OSSL_LIB_CTX_load_config + 20
35  oqs_test_kems                 	       0x100de7420 main + 80
36  dyld                          	       0x1a5f27e50 start + 2544


Thread 0 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000103   x1: 0x000000016f019b90   x2: 0x00000001a624dda0   x3: 0x0000000000000103
    x4: 0x000000000000000a   x5: 0x0000000024200000   x6: 0x0000000000000000   x7: 0x0000000000000500
    x8: 0x0000000000000103   x9: 0x0000000000000103  x10: 0x0000000000000103  x11: 0x0000600000fb8000
   x12: 0x0000000000000010  x13: 0x00000000fffffcee  x14: 0x00000000000007fb  x15: 0x00000000a4188ffb
   x16: 0x00000001a627d760  x17: 0x00000002066400a0  x18: 0x0000000000000000  x19: 0x0000000101438d58
   x20: 0x0000000000000103  x21: 0x00000001a624dda0  x22: 0x000000016f019b90  x23: 0x0000000000000103
   x24: 0x0000000000000103  x25: 0x0000000000000000  x26: 0x0000000000000002  x27: 0x0000000000000002
   x28: 0x0000600000fa4000   fp: 0x000000016f019b80   lr: 0x00000001a627f710
    sp: 0x000000016f019b50   pc: 0x00000001a6283108 cpsr: 0x60001000
   far: 0x00000001ff2bc0b8  esr: 0xf2000001 (Breakpoint) brk 1

Environment (please complete the following information):

  • OS: MacOS Ventura 13.2.1
  • OpenSSL version 3.1.0
  • This provider version: current master (0.5.0-dev)

Additional context

This is on a MacBook Pro with an Apple Silicon M2 chip. A similar problem occurs on an Intel-based iMac (same process as above).

Note: commenting out, e.g., GOST provider in openssl.cnf did not help.

$ openssl version
OpenSSL 3.1.0 14 Mar 2023 (Library: OpenSSL 3.1.0 14 Mar 2023)
$ openssl list -providers
Providers:
  base
    name: OpenSSL Base Provider
    version: 3.1.0
    status: active
  default
    name: OpenSSL Default Provider
    version: 3.1.0
    status: active
  gost
    name: OpenSSL GOST Provider
    status: active
  legacy
    name: OpenSSL Legacy Provider
    version: 3.1.0
    status: active
  oqs
    name: OpenSSL OQS Provider
    version: 0.5.0-dev
    status: active
  pkcs11
    name: PKCS#11 Provider
    version: 3.1.0
    status: active
$ 

About this issue

  • State: closed
  • Created a year ago
  • Comments: 69 (65 by maintainers)

Most upvoted comments

Finally finding time to work on oqsprovider again… 😕 So, thanks, @mouse07410 for the report above. I could reproduce and track in #160. If you agree this is the same (remaining) issue in this issue thread (?), let’s close this and continue in #160.

Yes, probably - but (a) where, in your opinion, is the root cause (that is, which component issues those improper calls, and why), and (b) how do we fix it, and where (in which component)?

This is a bug in the OpenSSL library, not in the config or providers.

Also, it looks like the fix was merged two-three weeks ago into 3.1, so should’ve been picked by Macports by now? I’m trying to understand why I don’t see the behavior change yet on my machines…

Because it’s only in git, not in a stable release yet. When 3.1.1 eventually gets released, the fix will be included.

A comment at the same moment. Impressive 😃

Thus, if tests are supported at all, limiting “full test suite” (5 tests?) to the unreleased master only does not seem like a good idea.

Formulated that way, I’d agree. However, the tests I consider most important (openssl req|verify|ca|dgst|cms|x509|s_client|s_server for quite some params and all algorithms) are run for each version. The ones I’d limit are 3 (of the 5) additional ctest tests: Those are API tests with limited additional value over the tests mentioned above – limited knowing that these tests ran successfully in master over time/while master was 3.0 and then 3.1 and under the assumption that OpenSSL doesn’t change API functionality in non-master versions – which I’d consider a pretty safe bet.

But whatever, using the internal ssltestlib only was meant to eliminate the need to write my own separate test harness – I’m pretty much on my own doing this project, so need to be “economical”. A much cleaner way indeed would be to “break free” from this dependency. Added #137 to track.

I’d be willing to go with even a patch for test/helpers/ssltestlib.c in this repo (maybe just include it in the README), since OpenSSL does not want to incorporate it.

Agreed, that’d be a stopgap measure short of resolving #137. However, I’ve been there before (patching upstream code for this purpose in oqsprovider) and it was a never-ending problem: as soon as something changed upstream I had to change the patch. “Doing” this in documentation is IMO too problematic for most users (who typically don’t read documentation anyway). I may give it a try again, though, considering that older (I know, I know…) OpenSSL versions shouldn’t change code so quickly as to make the patch worthless equally quickly.

Thanks for the feedback and food for thought.

openssl/openssl#19326

Done: https://github.com/openssl/openssl/issues/20653

Another thought/question: Does something change if you (also) disable the PKCS#11 provider? And the Legacy provider?

For the (perverted) fun of it, I tried it - with the very obvious and very much expected result: no, it makes no difference.

Which should’ve been obvious from the fact that just disabling the GOST provider remedied the problem, with all the other providers remaining enabled.

Summary: disabling GOST provider remedies the problem, regardless of the other providers. Enabling GOST provider manifests this problem, regardless of the other providers.

Recommendations?

I’ll work on #136. As we always aimed to support all OpenSSL 3.x versions, that should solve the problem. Stay tuned.

Thanks very much for these thoughts, trials and very helpful comments!

First, I’d rather not do a full rebuild of everything (OpenSSL, liboqs) merely for the pleasure of being able to run the tests locally.

Completely understandable and logical. I just need(ed) a baseline to see whether oqsprovider works at all under OSX.

I am building this provider for the system-wide (Macports-installed) OpenSSL-3.1.0, which is binaries-only. So, I am trying to test it against that.

Very good goal and one I’d (also) like to see achieved for sure.

What would happen if I change it to activate = 0? Would the corresponding provider still load automatically when needed?

No. But I’d argue it’s not needed for this (oqsprovider) test. oqsprovider only needs the default provider to function properly (for classic/PQ hybrid algs). So setting all “activate=0” except for default and oqsprovider would be a(nother) baseline.
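
A minimal sketch of such a baseline configuration, written out from the shell. The section names (`openssl_init`, `provider_sect`, `default_sect`, `oqsprovider_sect`) and the file location are assumptions for illustration, and the provider is assumed to be installed as `oqsprovider` in the directory that OPENSSL_MODULES points at. All other providers are left out of the list entirely rather than set to `activate = 0`, since, as far as I know, older OpenSSL 3.x releases treat the mere presence of `activate` as enabling the provider regardless of its value.

```shell
#!/bin/sh
# Hypothetical location for the baseline config; any writable path works.
BASELINE_CONF="${TMPDIR:-/tmp}/openssl-oqs-baseline.cnf"

# Only the default provider and oqsprovider are listed, so nothing else
# is activated from this config file.
cat > "$BASELINE_CONF" <<'EOF'
openssl_conf = openssl_init

[openssl_init]
providers = provider_sect

[provider_sect]
default = default_sect
oqsprovider = oqsprovider_sect

[default_sect]
activate = 1

[oqsprovider_sect]
activate = 1
EOF

# Use the baseline config for this shell session only.
export OPENSSL_CONF="$BASELINE_CONF"
```

With this in place, `openssl list -providers` should show only the two listed providers, which makes it easier to tell whether a crash depends on one of the others.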

  • insufficient configurability of the oqs-provider

Well, we’re relying on the configuration capabilities of openssl, namely OPENSSL_MODULES env var. The test scripts set a default (sensible for a local build) if nothing else is set (which may be more sensible in a setup without a local openssl build).

But then again, I’m all ears for suggestions what additional config options you’d find helpful.

Also, note that the script thinks all the tests “passed”, despite evidence to the contrary.

That’s a clear mistake – not checking the retval of ctest. Noted for improvement.
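
One way a wrapper script could propagate the ctest exit status is sketched below. `run_and_report` is a hypothetical helper name and is not part of the existing oqs-provider scripts; it simply runs whatever command it is given and reports success only when that command exits with status 0.

```shell
#!/bin/sh
# run_and_report (hypothetical helper): run the given command and return
# its exit status, printing a verdict that matches what actually happened.
run_and_report() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Tests FAILED (exit status $rc)" >&2
        return "$rc"
    fi
    echo "All tests passed"
}
```

A test script would then call it as `run_and_report ctest --output-on-failure || exit 1`, so that a crash such as the SIGTRAP above is no longer reported as a pass.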

The immediate cause of this failure is the insistence of the scripts to find openssl.cnf in the same directory as the openssl executable

Agreed, the scripts should not set them “hard” via the -conf parameter (but instead rely on the standard OPENSSL_CONF env var – which then needs to have contents needed for testing, i.e., something along the lines of “scripts/openssl-ca.cnf”). Noted as further point for improvement.
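
A rough sketch of that idea: honour OPENSSL_CONF when the caller has set it and fall back to a test configuration otherwise. `choose_openssl_conf` is a hypothetical helper, and the fallback path in the usage note is only the example named above, not necessarily what the scripts would end up using.

```shell
#!/bin/sh
# choose_openssl_conf (hypothetical helper): print the config file to use.
# $1 is the fallback, used only when OPENSSL_CONF is unset or empty.
choose_openssl_conf() {
    echo "${OPENSSL_CONF:-$1}"
}
```

A test script could then run `OPENSSL_CONF="$(choose_openssl_conf scripts/openssl-ca.cnf)"; export OPENSSL_CONF` once at startup and invoke openssl without any `-conf` argument, leaving users free to supply their own configuration through the environment.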

Certificate request self-signature did not match the contents
8096FA56F87F0000:error:4000000D:lib(128):oqs_sig_verify:reason(13):/Users/ur20980/src/oqs-provider/oqsprov/oqs_sig.c:400:
8096FA56F87F0000:error:06880006:asn1 encoding routines:ASN1_item_verify_ctx:EVP lib:crypto/asn1/a_verify.c:217:

Those are very helpful hints. But one I cannot understand (how it hits that error condition – short of maybe being unable to get the file input). Did any files get created in the tmp directory? The “interop.log” file should be a bit more informative…

BTW, and talking about “informative”: if/as debugging is a pain, if you build oqsprovider with -DCMAKE_BUILD_TYPE=Debug you can make it very chatty by enabling environment variables, the most relevant being listed here.

--> Would you have time/inclination to give this a try in your setup, and run the tests with all of them set (at least OQSPROV=1 OQSKM=1 OQSKEY=1)? interop.log should then become telltale as to what the problem is.
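
As a sketch, the debug run suggested above could look like this. The `_build` directory matches the one used in the reproduction steps; `configure_debug_build` is a hypothetical helper name, and the project still needs to be rebuilt after reconfiguring.

```shell
#!/bin/sh
# configure_debug_build (hypothetical helper): reconfigure the provider as a
# Debug build; run it from the top of the oqs-provider checkout, then rebuild.
configure_debug_build() {
    cmake -DCMAKE_BUILD_TYPE=Debug -S . -B _build
}

# Switch on the provider's debug output for everything started from this shell.
export OQSPROV=1 OQSKM=1 OQSKEY=1
```

Running the tests from the same shell afterwards should make interop.log considerably more verbose.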

In the meantime, I’ll try to create a setup mirroring yours on the M1 I have remotely available…

Finally, confirmation that OSX builds and tests OK via CI: https://app.circleci.com/pipelines/github/open-quantum-safe/oqs-provider/418/workflows/2b807d0b-0dbb-4c93-ada0-b41242e02c63/jobs/440. Thanks for triggering us to support OSX, @mouse07410: looks like you were the first person caring to use (or actually test 😃) oqs-provider on that platform!