magento2: Layout corruption with php-fpm (a Magento 1 problem returns)
Preconditions
- Run Magento 2 (any version) under php-fpm. Any valid php version - tested with php 5.6 to 7.0.11
Steps to reproduce
- Hit it with heavy traffic including xmlrpc calls
- Visit a page on frontend or backend
Expected result
Actual result
- Page displays without head tag (sometimes - intermittent)
This is fixed by clearing the layout cache. This displays in the same way as an issue in Magento 1 - see https://www.c3media.co.uk/blog/c3-news/security-fix-might-take-site
The fix in Magento 1 was to add the following to bootstrap.php:
if (function_exists('libxml_disable_entity_loader')) {
libxml_disable_entity_loader(false);
}
Why is this needed? Because there are a number of places in Magento 2 where $oldval = libxml_disable_entity_loader(true) is called and then later libxml_disable_entity_loader($oldval). In theory this should be fine, but the value is not thread-safe, so when running under php-fpm it only requires a call to occur in between these statements for the entire thread to end up set to TRUE. Why this then corrupts to cache it was never clear in Magento 1, but that was the upshot.
The above fix was added in Magento 1.9.2.0, but was not included in Magento 2. In one file (vendor/magento/framework/Xml/Security.php) there is an acknowledgment of the issue:
/**
* If running with PHP-FPM we perform an heuristic scan
* We cannot use libxml_disable_entity_loader because of this bug
* @see https://bugs.php.net/bug.php?id=64938
*/
…but sadly it still calls libxml_disable_entity_loader despite the warning. This is also called in some of the Zend and Symfony components, so it’s not really reasonable to handle all these situations. Thus the need for the suggested fix.
Not entirely sure how to fix this in the short-term, as unlike Magento 2 we cannot just edit the bootstrap.php file in Magento 2.1.0. Any suggestions?
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 32 (12 by maintainers)
Commits related to this issue
- magento2 issue #6942 added check for page layout loaded from cache (2.1-develop) — committed to mrIntegrator/magento2 by mrIntegrator 6 years ago
- magento2 issue #6942 fixed integration test (2.2-develop) — committed to mrIntegrator/magento2 by mrIntegrator 6 years ago
- magento2 issue #6942 added check for page layout loaded from cache — committed to mrIntegrator/magento2 by mrIntegrator 6 years ago
The real production case for which I’ve spent 40 hours debugging:
Symptoms Page with empty HTML. Literally any page can be empty. (we catched it on production on checkout/success page)
Full cache flush helps temporary.
Pre-requisites
How It Happens
How to easily reproduce
KEYS *LAYOUT*You may need to filter by layouts hash.Actual result: Empty page, no HTML code at all. Error in logs “before.body.end” doesn’t exists.
Root cause Magento does not check return of cache item with page layout value (for example: “1column”, “3columns”)
Permanent fix Store Layout XML and Page Layout items in one cache item (*_PAGE_CACHE_MERGED) It will eliminate the case when one item is lost and also it should give very tiny performance improvement, because 1 item is saved/loaded instead of two.
Pull requests
Other action items
Sorry, that it’s been a while. This bug came back to mind when I noticed one of sites still had layout cache disabled in production as a precaution. Lo and behold, the bug still exists in the latest version of Magento (currently 2.2.3). So here’s what I’ve discovered.
Steps to Reproduce:
Install Magento with Redis cache (I used db 0 for sessions, 1 for cache, 2 for FPC).
Navigate to a page not cached by FPC (I used customer/account/login).
Delete the page layout cache key in Redis. (Found by doing a var_dump of $cacheIdPageLayout in Magento\Framework\View\Model\Layout\Merge::load).
Refresh the page and behold the headlessness of your broken page.
Now this is just how to quickly reproduce what happens under heavy traffic as mentioned in the initial bug report. We have only experienced this happening under heavy traffic as well. The problem is in the logic of this function:
We can see here that there is a check as to whether the layout updates were loaded from cache, but if they were, it’s just assumed that page layout will be loaded from cache as well. The problem is that when Redis reaches its maxmemory setting (as it might under heavy traffic), it will begin evicting keys (see https://redis.io/topics/lru-cache). So if the page layout key is evicted while the layout key is not, the bug occurs. To fix this bug, we have to check for a successful load of both layout and page layout from the cache.
I will create a PR with a fix for this. Hopefully Magento can get this fixed quickly and we’ll all be able to turn layout cache back on!
@TomashKhamlai why was this issue closed while PR https://github.com/magento/magento2/pull/16428 and https://github.com/magento/magento2/pull/14129 have never been merged? It seems to me that https://github.com/mrIntegrator/magento2/blob/2.3-develop/lib/internal/Magento/Framework/View/Model/Layout/Merge.php#L441 hasn’t changed either and I assume this issue still exists.
We’ve been having the same problem with a M2.3 store when redis is set so
maxmemory-policy allkeys-lru. Currently testing withvolatile-lru.