openssl: Report out-of-memory and any other malloc failures already within CRYPTO_malloc and friends

As partly discussed already in #5781 and #6217:

Currently there are some 900 places within crypto/ and ssl/ where in case CRYPTO_malloc() and friends returned NULL an error report with the reason ERR_R_MALLOC_FAILURE is pushed on the ERR queue, rather than the allocation function reporting the error itself. This approach is not only pretty redundant and cumbersome but also more or less inaccurate and error-prone:

Assuming that the callers of the allocation function(s) are responsible for flagging memory allocation failure, it easily happens that callers simply forget (or do not care) to do so. There are (or at least were) many occurrences of such false negatives.
Indirect callers of functions that involve memory allocation can also report false positives regarding ERR_R_MALLOC_FAILURE where they misinterpret the reason for obtaining a NULL pointer in case this was not actually due to an malloc failure but due to some other issue (see, e.g…, two instances fixed in #6217).
It also happens that indirect callers of functions that involve memory allocation re-report ERR_R_MALLOC_FAILURE after obtaining a NULL pointer though the malloc failure has already been reported by a function in between (see, e.g…, two instances fixed in #6217 caused by the same calls PKCS12err(PKCS12_F_PKCS12_SAFEBAG_CREATE_PKCS8_ENCRYPT, ERR_R_MALLOC_FAILURE); as in the item before).
If a function like CRYPTO_malloc() does the error reporting itself this relieves all its callers from doing so.
A function like CRYPTO_malloc() can do the failure/error reporting more accurately than its callers because it knows, in contrast to them, whether the failure is due to memory shortage or due to other issues like unsuitable arguments.

When I asked why CRYPTO_malloc() does not do the error reporting itself I learned about the following reasons.

An malloc failure could either be because the system is running low on memory, or the call to CRYPTO_malloc() was erroneous (e.g. because it incorrectly attempts to allocate a vast amount of memory). In the second case this kind of problem would be difficult to find if all you have is the error code in CRYPTO_malloc().
In the same direction goes the desire to preserve the file/line numbers (and if possible also the function name) of the caller(s) for being able to trace.
A entirely different issue is that the ERR reporting may in turn need memory allocation (in case its internal queue allocated so far has become full), which may cause a loop in case memory is exhausted.

Here are some thoughts on these issues and suggestions how to deal with them.

In case of an actual out-of-memory situation it would not help to obtain information about the caller because this is a low-level issue related to the execution environment (rather than to the caller).
In case of other reasons for malloc failures the malloc functions should give specific error reasons (which the callers cannot do), such as an overly large amount of memory requested or any internal allocation function issue. In case the caller gave an unsuitable argument it may be helpful to know which caller it was, but this is a very general issue that would better be solved more generally (such as, by providing a stack trace as far as possible). When specific value is seen in reporting malloc failures within some calling functions of course this can be done in addition to a more low-level reporting by the malloc function itself.
In order to remove the obstacle of a potential loop with the ERR queue, it should be possible to adapt its implementation such that there is always an element pre-allocated and reserved for use by malloc functions in case there is actually shortage of memory, such that in case the malloc function adds an ERR_R_OUT_OF_MEMORY entry there is no need to (recursively) allocate it. When an out-of-memory error has already been reported (and thus the reserved cell has been filled) any further requests to add elements to the ERR queue should be ignored in case the queue buffer is full and would need re-allocation (supposedly this would be acceptable because the entry regarding the out-of-memory error is already in the queue and at least from the library’s point of view an out-of-memory condition is fatal anyway).

About this issue

Original URL
State: closed
Created 6 years ago
Comments: 15 (15 by maintainers)

Commits related to this issue

ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Fixes #6251 — committed to siemens/openssl by DDvO 3 years ago
ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Fixes #6251 — committed to siemens/openssl by DDvO 3 years ago
ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Fixes #6251 — committed to siemens/openssl by DDvO 3 years ago
ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Fixes #6251 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: David von Oheimb <da... — committed to sftcd/openssl by DDvO 3 years ago
ERR: Make CRYPTO_malloc() and friends report ERR_R_MALLOC_FAILURE Fixes #6251 Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Paul Dale <pauli@openssl.org> Reviewed-by: David von Oheimb <da... — committed to beldmit/openssl by DDvO 3 years ago

Most upvoted comments

Yes, the interesting question is, how you want to make sure that the error state is always allocated whenever a new thread starts and how to prevent the new thread from running into an undefined state, when that allocation fails.

bernd-edlinger on May 14, 2018

Hmmm… but pushing all these file names and line numbers on the error stack after a malloc failure will most certainly fail. So unless you keep track of the call stacks before the error occurs (you would have to do it all the time, which is a waste) or rely on os methods to walk the stack, the best you can get is probably a single malloc error message (which is much better than no error message at all). And in that case, I don’t see any reason why this could not have been done by CRYPTO_malloc() itself.

mspncp on May 14, 2018

As pointed out in https://github.com/openssl/openssl/pull/6217#issuecomment-388477438, there was some initial discussion in ~~#6217~~ #5781 about doing the reporting in CRYPTO_malloc() itself, using file and line information provided by the caller. I don’t really remember, why that thought was discarded at that time. Maybe it was simply overlayed by the discussion about the memory allocation problem in ERR_put_error(), which started briefly after that. Looking back, it’s a pity that we did not follow that idea any further.

mspncp on May 14, 2018

It is often the case that malfunction starts at a single malloc failure, and it is good to have the complete call stack when that happens, becase you cannot tell where exactly the error handling is broken, it is usually not the direct caller of CRYPTO_malloc.

bernd-edlinger on May 14, 2018

In case the caller gave an unsuitable argument it may be helpful to know which caller it was,

As pointed out in #6217, CRYPTO_malloc() actually knows who the caller is because it gets passed the file and line number of the caller in its arguments. So it seems we could record information about the caller from inside the CRYPTO_malloc() implementation.

mattcaswell on May 14, 2018