magento2: Magento 2 suddenly does not work with REDIS anymore after upgrade to (M2.1.6)

Preconditions

  1. M2.1.4 > M2.1.6 - NGINX - phpREDIS

Steps to reproduce

  1. Run $php bin/magento setup:static-content:deploy"
  2. Getting: [Exception] Notice: unserialize(): Error at offset 0 of 1 bytes in /var/www/domain.com/vendor/magento/module-config/App/Config/Type/System.php on line 214

Expected result

  1. Able to deploy without errors

Actual result

  1. The newly updated System.php file (M2.1.6) “vendor/magento/module-config/App/Config/Type/System.php” was changed and breaking the connection with REDIS. It was however working fine with M2.1.4 and all earlier version of System.php
  2. Run $php bin/magento setup:static-content:deploy"
  3. Now getting: [Exception] Notice: unserialize(): Error at offset 0 of 1 bytes in /var/www/domain.com/vendor/magento/module-config/App/Config/Type/System.php on line 214

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 25 (14 by maintainers)

Most upvoted comments

Apologies. I did rush through to find a solution and didn’t spend the time to explain clearly. I’ll try my best to detail everything I found.

I was randomly receiving the following error like others:

[Exception] Notice: unserialize(): Error at offset 0 of 1 bytes in /var/www/domain.com/vendor/magento/module-config/App/Config/Type/System.php on line 214

My investigation started by inserting old school I-Dont-Know-What-A-Debugger-Is style error_log lines into this file around the cache loads: LOAD: [cache-entry] and then checking if the result is unserialisable or not before Magento attempts to do it, logging either LOAD SUCCESSFUL or LOAD FAILURE: [the-value-that-we-couldn't-unserialize].

From this I could see that 1 was returned from a load for system_defaultweb and Magento was expecting serialised data. Further from these logs I could see there were many other system_defaultweb loads happening that worked fine.

From this I added further error_log to Cm_Redis_Cache_Backend in the load() and decodeData() functions. It would log the first 3 characters of the decoded data.

The presence of multiple CMCACHE LOAD logs in a row without an immediately following CMCACHE DECODE (which the code dictates must happen) indicated an impossible code path explained only by parallelism. Investigating static-content:deploy we can see that it performs parallelism. The Magento\Deploy\Console\Command\DeployStaticContentCommand::execute method uses a Model to manage deployment of each area, which can be found at Magento\Deploy\Model\DeployManager.

In Magento\Deploy\Model\DeployManager::deploy there is a check to see if parallelism is enabled and available (-j set to > 1 and pctnl_fork function available.) If so, it calls runInParallel. This function uses Magento\Deploy\Model\ProcessQueueManager to queue a list of tasks to be run in parallel, and this in turn creates processes up to the amount specified by -j (default 3) by creating Magento\Deploy\Model\ProcessManager objects.

Magento\Deploy\Model\ProcessManager has a fork method that calls pctnl_fork. After forking, nothing is adjusted so all open descriptors, including MySQL connections, Redis connections, etc etc, are still held, and whilst held the process continues to run the requested processing, using the same connections.

This triggers wild behaviours with a variety of errors such as the unserialise problem, lost connections to redis, corrupt data errors from redis where data could not be uncompressed (if you have enabled compression) and various other random exceptions. This is caused because of a race condition between reads on the Redis connection from the different parallel processes. They essentially battle with each other to receive data, and will at times receive data that is a response to a request a completing different process made. In our case of the exception above, in App/Config/Type/System.php, the process was expecting a response for system_defaultweb but was actually receive a response for system_cache_exists that was made by a different process, which of course returns a single integer 1, and not serialised data as is expected. In fact, the process that wanted system_cache_exists received system_defaultweb response, but this did not cause an exception there was it was simply evaluated for trueness.

Best solution is to pcntl_exec to replace the current image for the child with a new PHP run starting from, say, magento static:deploy-content --child-process <info-it-can-run>, that then creates it’s own connections to MySQL / Redis as needed.

Another method would be to ensure that these child processes never use Redis. There’s also the possibility of attempting to open another Redis connection too which might get through this. In either case though, it would be extremely important not to close the existing Redis connection as this is most likely going to trigger a QUIT to be sent across the connection. This is why sometimes when one child finishes before others have you see Connection to Redis Lost, since one child will send QUIT when PHP exits, and the other children (and the parent) would be left with an open Redis connection where QUIT was sent, which would be an invalid state.

Overall, workaround is to disable the static-content:deploy parallelism if you are using Redis, by passing -j 1 to the arguments.

Hope this is enough detail.

Workaround for now which is deterministically safe:

magento setup:static-content:deploy -j 1

The -j 1 sets parallel processing to just a single process, which effectively undoes the parallelism change that must have been introduced in Magento 2.1.4 or thereabouts.

Yes, that’s enough detail. Lemme get my popcorn, though. And thank you very much for your time explaining.

I am using Redis version 3.0.6 and Magento 2.1.5. It is working smooth for me. Could you share the server confifuration ?

Hi @commcad.

Have you followed this steps while upgrading?

Please, pay attention for 5th step - clear var/ directories as well as run FLUSHALL command in redis cli.