magento2: Partial sitemaps have wrong URLs in sitemap index

Preface

I went through all the currently open issues (5 at the time of writing) that come up when searching for ‘sitemap’, and none of them relate to this problem. There are several closed issues about this problem, but instead of reopening one of them, I decided to open a new one that clearly describes the issue and its cause.

For example, https://github.com/magento/magento2/issues/19565 describes the same problem, but it was closed even though the problem can still be reproduced with the latest version (68c48d55231de6849195e090e2b3ce750db61825) of the codebase.

Preconditions (*)

  1. Magento 2.3-develop with products, installed in a folder nested one or more levels inside the Magento user’s home folder (for example /home/magento/public_html).
  2. A web server serving the installation folder.
  3. Nginx (the issue has only been reproduced with Nginx).

Steps to reproduce (*)

  1. Create a sitemap for the store (I used /pub/sitemaps folder and sitemap.xml as the configuration).
  2. Set the sitemap generation limits so that the sitemap is split into parts (e.g. set the product limit to a number smaller than the number of products in the store).
  3. Set up a cron schedule to generate the sitemap.
  4. Either wait for the system cron to generate the sitemap, or run bin/magento cron:run to generate it manually. NB: to reproduce the issue, you need to run the cron:run command from the Magento user’s home directory, as the system cron would.

Expected result (*)

  1. The partial sitemaps listed in the sitemap index have the correct URL (e.g. storeurl/pub/sitemap-1-1.xml).

Actual result (*)

  1. The partial sitemap URLs contain the folder structure between the Magento user’s home folder and the installation folder (in my case storeurl/public_html/pub/sitemap-1-1.xml).

Cause

The problem is caused by the _getStoreBaseDomain method in the Magento\Sitemap\Model\Sitemap class. The code initializes the variables $documentRoot and $baseDir so that $baseDir holds the path to the directory where the Magento installation resides, and $documentRoot holds the path to the directory the code is run from. In my example, $baseDir would be /home/magento/public_html/ and $documentRoot would be /home/magento.

The method then checks whether the $baseDir path contains $documentRoot; if it does, it strips $documentRoot from $baseDir and assigns the result to the $installationFolder variable. In my example, $installationFolder would therefore be 'public_html'. $installationFolder is then appended to the store URL, which is why the partial sitemap URLs contain the public_html part when they should not.
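To make the failure mode concrete, here is a minimal Python model of the path handling described above. It is not Magento's PHP source: the function name `store_base_domain` and its parameters are hypothetical, the store URL is assumed to be already reduced to scheme and host, and the fallback branch is simplified. Running it with the two working directories from this report shows why starting cron from the home folder adds `public_html` to the URL, while running it from the installation folder does not.

```python
def store_base_domain(store_url: str, document_root: str, base_dir: str) -> str:
    """Hypothetical model of _getStoreBaseDomain as described in this issue."""
    # Normalize Windows separators and trim leading/trailing slashes on both paths.
    doc_root = document_root.replace("\\", "/").strip("/")
    base = base_dir.replace("\\", "/").strip("/")

    if base.startswith(doc_root):
        # Whatever is left after removing the "document root" is treated as
        # the installation folder and appended to the store URL.
        installation_folder = base.replace(doc_root, "", 1).strip("/")
        return (store_url.rstrip("/") + "/" + installation_folder).rstrip("/")

    # Assumption: otherwise fall back to the plain store URL.
    return store_url.rstrip("/")


# Cron started from the user's home folder: the "document root" is /home/magento.
from_home = store_base_domain("https://example.com", "/home/magento", "/home/magento/public_html")
# Cron started from the installation folder: nothing is left to append.
from_install = store_base_domain("https://example.com", "/home/magento/public_html", "/home/magento/public_html")

print(from_home + "/pub/sitemap-1-1.xml")     # https://example.com/public_html/pub/sitemap-1-1.xml
print(from_install + "/pub/sitemap-1-1.xml")  # https://example.com/pub/sitemap-1-1.xml
```

The model makes the dependency explicit: the result changes with the directory the process is started from, not with anything in the store configuration.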

I’m not creating a pull request to fix this because I’m unsure what the correct way to handle it is, but the current code appears to produce wrong URLs when run from cron. When the sitemap is generated from the admin panel, it works as expected.

Hacky workaround

You can work around the issue by prefixing the cron:run line in your crontab with cd /path/to/magento/folder &&.

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (12 by maintainers)

Most upvoted comments

Hi @mattijv.

Thank you for your report and collaboration!

The issue was fixed by the Magento team. The fix was delivered into the magento/magento2:2.3-develop branch(es). Related commit(s):

The fix will be available with the upcoming 2.3.5 release.

Hello @mattijv, I have installed Nginx and reproduced your issue. Thank you for your contribution and collaboration!