magento2: Partial sitemaps have wrong URLs in sitemap index
Preface
I went through all the current open issues (5 at the time of writing) that come up when searching for ‘sitemap’ and none of them are related to this problem. There are several closed issues about this problem, but instead of reopening one of them, I decided to open a new one to clearly describe the issue and cause.
For example, https://github.com/magento/magento2/issues/19565 describes the same problem, but it has been closed even though the problem can be reproduced with the latest version (68c48d55231de6849195e090e2b3ce750db61825) of the codebase.
Preconditions (*)
- Magento 2.3-develop with products, installed to a folder one or more levels below the Magento user’s home folder (/home/magento/public_html for example).
- Webserver serving the installation folder.
- Nginx (reproduced only with nginx)
Steps to reproduce (*)
- Create a sitemap for the store (I used /pub/sitemaps folder and sitemap.xml as the configuration).
- Set the sitemap generation limits so that the sitemap will be split into parts (i.e. set the product limit to some number < number of products in the store).
- Set up a cron schedule to generate the sitemap.
- Either wait for the system cron to generate the sitemap, or run bin/magento cron:run to generate it manually. NB: to reproduce the issue, you need to run the cron:run command from the Magento user’s home directory like the system cron would.
Expected result (*)
- The partial sitemaps listed in the sitemap index have the correct URL (e.g. storeurl/pub/sitemap-1-1.xml).
Actual result (*)
- The partial sitemap URLs contain the folder structure between the Magento user’s home folder and the installation folder (in my case that would be storeurl/public_html/pub/sitemap-1-1.xml).
Cause
The problem is caused by the _getStoreBaseDomain method in the Magento\Sitemap\Model\Sitemap class. The code initializes the variables $documentRoot and $baseDir so that $baseDir holds the path to the directory where the Magento installation resides, and $documentRoot contains the path to the directory where the code is run from. In my example these would be /home/magento/public_html/ and /home/magento, respectively.
The method then checks if the $baseDir path contains $documentRoot, strips $documentRoot from $baseDir, and stores the result in the $installationFolder variable. In my example $installationFolder would therefore be 'public_html'. The $installationFolder is then appended to the store URL. Because of this, the partial sitemap URLs contain the public_html part when they should not.
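The behaviour described above can be modelled with a small shell sketch. This is not Magento's code: the function name sitemap_base_url and its arguments are hypothetical, and the logic only mirrors the description in this report (strip the document root from the base dir and append whatever is left to the store URL).

```shell
# Simplified model of the prefix stripping described above.
# NOT Magento's actual implementation; names are hypothetical.
# Usage: sitemap_base_url STORE_URL BASE_DIR DOCUMENT_ROOT
sitemap_base_url() (
    # The ( ... ) body runs in a subshell, so these variables stay local.
    store_url="${1%/}"
    base_dir="${2%/}"
    document_root="${3%/}"
    installation_folder=""

    # If base_dir starts with document_root, the remainder is treated
    # as the installation folder.
    case "$base_dir" in
        "$document_root"*)
            installation_folder="${base_dir#"$document_root"}"
            installation_folder="${installation_folder#/}"
            ;;
    esac

    # The installation folder, if any, is appended to the store URL.
    if [ -n "$installation_folder" ]; then
        printf '%s/%s\n' "$store_url" "$installation_folder"
    else
        printf '%s\n' "$store_url"
    fi
)

# Run from the home directory (as cron does): public_html leaks into the URL.
sitemap_base_url "storeurl" "/home/magento/public_html/" "/home/magento"
# Run from the installation folder: nothing is left to append.
sitemap_base_url "storeurl" "/home/magento/public_html/" "/home/magento/public_html"
```

With the paths from my example, the first call yields storeurl/public_html and the second yields storeurl, which matches the difference between running from the home directory and running from the installation folder.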
I’m not creating a pull request to fix this as I’m unsure what the correct way to handle this is, but the current code seems to work incorrectly when used with cron. When the sitemap is generated from the admin view it works as expected.
Hacky workaround
You can get around the issue by adding cd /path/to/magento/folder && at the beginning of the cron:run line in your crontab.
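To illustrate why the prefix helps, here is a minimal sketch of what cd ... && does to the working directory of the command that follows it. The temporary directory is a hypothetical stand-in for /path/to/magento/folder, and pwd stands in for bin/magento cron:run so the snippet can run without a Magento installation.

```shell
# Stand-in for /path/to/magento/folder: a temporary directory (hypothetical).
magento_dir="$(mktemp -d)"

# Without the prefix, the job runs wherever cron starts it
# (the current directory stands in for the home directory here).
without_prefix="$(pwd)"

# With the prefix, the command after && runs inside the Magento folder.
# pwd stands in for bin/magento cron:run.
with_prefix="$(cd "$magento_dir" && pwd)"

echo "without prefix: $without_prefix"
echo "with prefix:    $with_prefix"

# Clean up the stand-in directory.
rmdir "$magento_dir"
```

Because the command then starts inside the installation folder, the two paths compared in the Cause section should be equal and no extra folder is appended to the URL.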
About this issue
- State: closed
- Created 5 years ago
- Comments: 18 (12 by maintainers)
Hi @mattijv.
Thank you for your report and collaboration!
The issue was fixed by the Magento team. The fix was delivered into the magento/magento2:2.3-develop branch(es). Related commit(s):
The fix will be available with the upcoming 2.3.5 release.
Hello @mattijv, I have installed nginx and reproduced your issue. Thank you for your contribution and collaboration!