openssl: openssl hang at crypto/rand/rand_unix.c:494

Hello. After upgrading OpenSSL to 1.1.1d, our sshd process hangs in the function wait_random_seeded, because /dev/random is always unavailable on our old system.

compat_futex(0x2ad32eb0, 0x81 /* FUTEX_??? */, 2147483647) = 0
shmget(0x72, 1, 0)                      = -1 ENOENT (No such file or directory)
newuname({sys="Linux", node="qd02-s00c08h4", ...}) = 0
open("/dev/random", O_RDONLY)           = 3
compat_select(4, [3], NULL, NULL, NULL  /// hang here

Could this be enhanced to break out of the wait after a timeout?

Thanks, Mark

            /* Open /dev/random and wait for it to be readable */
            if ((fd = open(DEVRANDOM_WAIT, O_RDONLY)) != -1) {
                if (DEVRANDM_WAIT_USE_SELECT && fd < FD_SETSIZE) {
                    FD_ZERO(&fds);
                    FD_SET(fd, &fds);
                    /* rand_unix.c:494 is the select() call below */
                    while ((r = select(fd + 1, &fds, NULL, NULL, NULL)) < 0
                           && errno == EINTR);
                } else {
                    while ((r = read(fd, &c, 1)) < 0 && errno == EINTR);
                }

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 31 (26 by maintainers)

Most upvoted comments

The point is that /dev/urandom will likely produce the same random-looking numbers each time the system boots up. You basically had no security at all with 1.1.1b on a system that has no entropy source. The only thing that changed is that openssl is now aware of the situation.

Adding a timeout here would be bad. Without a source of entropy, there is no security: it’s the same as running telnetd.

To get going again:

  1. Get /dev/random working.
  2. Run egd and configure OpenSSL to use it with --with-rand-seed=egd.
  3. Run CPU Jitter to feed the kernel some entropy.
  4. Add --with-rand-seed=none to the configure line and accept that there will be no security.
  5. Define DEVRANDOM_WAIT to be something that won’t block (e.g. /dev/zero) and accept that there will be no security.
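
To see whether options 1 or 3 are having any effect, one can look at the kernel's own entropy estimate. The following is only a minimal sketch, assuming a Linux system that exposes /proc/sys/kernel/random/entropy_avail; the function name kernel_entropy_avail is made up for this example and is not part of OpenSSL.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not an OpenSSL function): return the kernel's
 * current entropy estimate in bits, or -1 if it cannot be read.
 * Assumes the Linux procfs file below exists on the system.
 */
long kernel_entropy_avail(void)
{
    FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");
    long bits = -1;

    if (f == NULL)
        return -1;
    /* The file holds a single decimal number followed by a newline */
    if (fscanf(f, "%ld", &bits) != 1)
        bits = -1;
    fclose(f);
    return bits;
}
```

Calling kernel_entropy_avail() before and after starting an entropy feeder gives a rough indication of whether the pool is actually being filled.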

The change log needs to be read from the bottom up, so the newest entry is “DEVRANDOM … improved for older Linux systems”. But since your kernel is 4.19.10, your Linux has the getrandom syscall, and your system will most likely “hang” in this call: syscall(__NR_getrandom, buf, buflen, 0). As you said, that can take several minutes, but it will only happen once; after that no further delay is expected.
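
As a rough way to check whether that getrandom call would block, the same syscall can be issued with the GRND_NONBLOCK flag, which makes it fail with EAGAIN instead of waiting while the kernel CRNG is not yet initialized. This is a sketch only, assuming Linux with glibc's SYS_getrandom constant available; the function name kernel_rng_is_seeded is hypothetical and not part of OpenSSL.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
# define GRND_NONBLOCK 0x0001   /* value from the Linux UAPI headers */
#endif

/*
 * Hypothetical helper (not an OpenSSL function).
 * Returns 1 if the kernel CRNG is initialized, 0 if a blocking
 * getrandom() would currently wait, and -1 if this cannot be determined
 * (for example, the syscall is not available).
 */
int kernel_rng_is_seeded(void)
{
#ifdef SYS_getrandom
    unsigned char c;
    long r;

    /* Retry if interrupted by a signal, like the loops in rand_unix.c */
    do {
        r = syscall(SYS_getrandom, &c, 1, GRND_NONBLOCK);
    } while (r < 0 && errno == EINTR);

    if (r == 1)
        return 1;
    if (r < 0 && errno == EAGAIN)
        return 0;
#endif
    return -1;
}
```

A return value of 0 early after boot would match the several-minute delay described here.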

While I respect your ability to assess risk of the potential insecurity in your specific situation, I will state for the broader audience that virtio-rng-pci is the generally recommended option for this sort of scenario, and the kernel PRNG is functioning as designed in the VM environment with poor entropy-collection characteristics.

A timeout is probably reasonable, but using SIGALRM to do so is not – the library shouldn’t be messing with the application’s signal handlers more than we already are.

@bernd-edlinger got it spot on. /dev/urandom isn’t going to provide security in this instance.

@beldmit, a diagnostic after a timeout is a good suggestion. Assuming the wait continues afterwards…

Actually producing the syslog message could theoretically even help in producing entropy in the kernel

@t8m you just invented the perfect perpetuum mobile which solves all our entropy source problems: let’s just send a few kBytes of Lorem ipsum to the syslog daemon before reading from /dev/random.

There could be a syslog message created after some timeout and then openssl could go back to waiting for entropy. Actually producing the syslog message could theoretically even help in producing entropy in the kernel if the syslog message is saved to a hard drive or sent over network somewhere.
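
For illustration, the "diagnostic after a timeout, then keep waiting" idea could look roughly like the sketch below. It is not OpenSSL code: the function name wait_readable_with_diagnostic, the interval parameter and the message text are all made up for this example. It relies on a select() timeout rather than SIGALRM, so no signal handlers are touched.

```c
#include <errno.h>
#include <sys/select.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an OpenSSL function): block until fd is
 * readable, logging a warning every interval_sec seconds while waiting.
 * Returns the number of warnings logged before fd became readable,
 * or -1 on error. The wait itself is never abandoned.
 */
int wait_readable_with_diagnostic(int fd, int interval_sec)
{
    int warnings = 0;

    if (fd < 0 || fd >= FD_SETSIZE || interval_sec <= 0)
        return -1;

    for (;;) {
        fd_set fds;
        struct timeval tv;
        int r;

        /* select() may modify both arguments, so reset them each round */
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec = interval_sec;
        tv.tv_usec = 0;

        r = select(fd + 1, &fds, NULL, NULL, &tv);
        if (r > 0)
            return warnings;            /* fd is readable */
        if (r < 0 && errno != EINTR)
            return -1;                  /* real error */
        if (r == 0) {
            /* Timed out: emit a diagnostic, then go back to waiting */
            syslog(LOG_WARNING,
                   "still waiting for the random device to become readable");
            warnings++;
        }
    }
}
```

With fd opened on /dev/random, a call such as wait_readable_with_diagnostic(fd, 30) would behave like the existing select() loop, except that an administrator would see a log entry every 30 seconds explaining why the process appears stuck.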

I will state for the broader audience that virtio-rng-pci is the generally recommended option for this sort of scenario, and the kernel PRNG is functioning as designed in the VM environment with poor entropy-collection characteristics.

Noted. We’ll investigate more and see if this is something that we can reasonably fit into our configuration.

I do, however, think that the release notes should clarify the situation around DEVRANDOM_WAIT as they’re currently very contradictory. It’s explicitly stated that it’s disabled for Linux, yet the functionality remains enabled for Linux.

Edit: And thank you guys for taking the time to comment, I really appreciate it.

@bernd-edlinger From openssl-1.1.1d/NEWS (==> added by me):

  Major changes between OpenSSL 1.1.1c and OpenSSL 1.1.1d [10 Sep 2019]

      o Fixed a fork protection issue (CVE-2019-1549)
      o Fixed a padding oracle in PKCS7_dataDecode and CMS_decrypt_set1_pkey
        (CVE-2019-1563)
      o For built-in EC curves, ensure an EC_GROUP built from the curve name is
        used even when parsing explicit parameters
      o Compute ECC cofactors if not provided during EC_GROUP construction
        (CVE-2019-1547)
==>   o Early start up entropy quality from the DEVRANDOM seed source has been
        improved for older Linux systems
      o Correct the extended master secret constant on EBCDIC systems
      o Use Windows installation paths in the mingw builds (CVE-2019-1552)
      o Changed DH_check to accept parameters with order q and 2q subgroups
      o Significantly reduce secure memory usage by the randomness pools
==>   o Revert the DEVRANDOM_WAIT feature for Linux systems

From openssl-1.1.1d/CHANGES:

Changes between 1.1.1c and 1.1.1d [10 Sep 2019]
...
  *) Revert the DEVRANDOM_WAIT feature for Linux systems

     The DEVRANDOM_WAIT feature added a select() call to wait for the
     /dev/random device to become readable before reading from the
     /dev/urandom device.

     It turned out that this change had negative side effects on
     performance which were not acceptable. After some discussion it
     was decided to revert this feature and leave it up to the OS
     resp. the platform maintainer to ensure a proper initialization
     during early boot time.
     [Matthias St. Pierre]

This is a fresh installation of Gentoo (OpenSSL 1.1.1d built from source) on the 4.19.10 kernel (yes, it’s a bit out of date; no, I unfortunately cannot change it). This system is running as a virtual machine and hangs until the crng successfully initializes (which can take upwards of 6 minutes). The older VM we had that used OpenSSL 1.0.2o did not exhibit this issue. One of my coworkers was able to add a virtio-rng-pci device and get it to initialize in a more normal amount of time, but we’d prefer not to have to tweak our VM config as a workaround for this issue. Our current plan is to rebuild OpenSSL 1.1.1d with DEVRANDOM_WAIT set to /dev/urandom, as we’re not overly worried about the potential insecurity in our specific situation.