site-kit-wp: Permuting site URLs can fail for some domains and environments
Bug Description
I recently discovered a few tests that only fail locally with some versions of PHP related to testing Module::permute_site_url
.
After digging into it, I found the problem was related to parse_url
which was malforming parts of the URL when given certain UTF-8 multi-byte characters as used in some IDN domains. This goes back to an old bug in PHP and also seems to be related to the server’s configured locale.
Relevant test output
There were 2 failures:
1) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test" ('http://éxämplę.test', array('http://éxämplę.test', 'https://éxämplę.test', 'http://www.éxämplę.test', 'https://www.éxämplę.test', 'http://xn--xmpl-loa2a55a.test', 'https://xn--xmpl-loa2a55a.test', 'http://www.xn--xmpl-loa2a55a.test', 'https://www.xn--xmpl-loa2a55a.test'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
- 0 => 'http://www.xn--xmpl-loa2a55a.test'
- 1 => 'http://www.éxämplę.test'
- 2 => 'http://xn--xmpl-loa2a55a.test'
- 3 => 'http://éxämplę.test'
- 4 => 'https://www.xn--xmpl-loa2a55a.test'
- 5 => 'https://www.éxämplę.test'
- 6 => 'https://xn--xmpl-loa2a55a.test'
- 7 => 'https://éxämplę.test'
+ 0 => 'http://www.xn--xmpl?_-bua2b.test'
+ 1 => 'http://www.éxämpl?_.test'
+ 2 => 'http://xn--xmpl?_-bua2b.test'
+ 3 => 'http://éxämpl?_.test'
+ 4 => 'https://www.xn--xmpl?_-bua2b.test'
+ 5 => 'https://www.éxämpl?_.test'
+ 6 => 'https://xn--xmpl?_-bua2b.test'
+ 7 => 'https://éxämpl?_.test'
)
/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51
2) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test/sub-directory" ('http://éxämplę.test/sub-directory', array('http://éxämplę.test/sub-directory', 'https://éxämplę.test/sub-directory', 'http://www.éxämplę.test/sub-directory', 'https://www.éxämplę.test/sub-directory', 'http://xn--xmpl-loa2a55a.test...ectory', 'https://xn--xmpl-loa2a55a.tes...ectory', 'http://www.xn--xmpl-loa2a55a....ectory', 'https://www.xn--xmpl-loa2a55a...ectory'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
- 0 => 'http://www.xn--xmpl-loa2a55a.test/sub-directory'
- 1 => 'http://www.éxämplę.test/sub-directory'
- 2 => 'http://xn--xmpl-loa2a55a.test/sub-directory'
- 3 => 'http://éxämplę.test/sub-directory'
- 4 => 'https://www.xn--xmpl-loa2a55a.test/sub-directory'
- 5 => 'https://www.éxämplę.test/sub-directory'
- 6 => 'https://xn--xmpl-loa2a55a.test/sub-directory'
- 7 => 'https://éxämplę.test/sub-directory'
+ 0 => 'http://www.xn--xmpl?_-bua2b.test/sub-directory'
+ 1 => 'http://www.éxämpl?_.test/sub-directory'
+ 2 => 'http://xn--xmpl?_-bua2b.test/sub-directory'
+ 3 => 'http://éxämpl?_.test/sub-directory'
+ 4 => 'https://www.xn--xmpl?_-bua2b.test/sub-directory'
+ 5 => 'https://www.éxämpl?_.test/sub-directory'
+ 6 => 'https://xn--xmpl?_-bua2b.test/sub-directory'
+ 7 => 'https://éxämpl?_.test/sub-directory'
)
/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51
Do not alter or remove anything below. The following sections will be managed by moderators only.
Acceptance criteria
- A new
URL
class should be created in theCore\Util
namespace with a single staticparse
method for now- The signature and behavior should match that of
(wp_)parse_url
with the main difference being that it is safe to use with multi-byte UTF-8 IDN domains - Any given URLs that do not contain problematic characters could continue to use
wp_parse_url
internally
- The signature and behavior should match that of
- All usage of
(wp_)parse_url
should be refactored to use the new utility
Implementation Brief
- Create a new class named
URL
in theincludes/Core/Util
folder having theCore\Util
namespace as per the AC. - Create a static method
parse
with the same signature aswp_parse_url()
, i.e. it should accept$url
and$component
as parameters.- Check if
$url
contains multibyte chars, i.e. ifmb_strlen($url, 'UTF-8') != strlen($url)
. - If this is not the case, simply delegate the call to
wp_parse_url()
and return its output. - If the string does contain multibyte chars, use the same logic within
wp_parse_url()
to parse the$url
. But instead of calling PHP’s inbuiltparse_url()
method, create and call a function that encodes the URL and decodes the parts later. An example of this function can be found in this comment. - Find and replace the usage of
wp_parse_url()
andparse_url()
methods withURL::parse()
.
- Check if
Test Coverage
- Add phpunit tests for
URL::parse()
with similar URL’s used inGoogle\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url
.
QA Brief
- This issue modifies the code that handles URLs in the “back-end” of the plugin. As such, smoke testing the plugin would be required with particular attention to the following areas. When doing these tests, use a site URL / domain name that has foreign UTF-8 multi-byte characters such as
https://www.éxämplę.com
.- Plugin setup (redirects from Google Authentication service to the application).
- User input survey.
- Entity Search box (verifying the entity URLs on the entity dashboard)
- AdSense Earnings Report
- Creating a new Analytics and Analytics 4 property from within the plugin.
- Creating a TagManager container.
QA:Eng
- Test this issue on a development machine where the
ModuleTest::test_permute_site_url()
test fails on the latest release but is fixed when this issue is merged in.
Changelog entry
- Fix issues in permutation site URLs with multi-byte UTF-8 IDN domains.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (2 by maintainers)
@wpdarren Have created a new issue #5868. Could you please check the Steps to reproduce. When I was testing this, I did have some inconsistencies in the redirects. I even managed to connect Analytics once (after a few failures). So not sure what is going on here.
@aaemnnosttv Would you like me to add ACs to this new issue?
@felixarntz updated 👍
@aaemnnosttv Yeah, I feel that would be better handled separately. Would you be okay removing that above note though from the ACs? I think we should aim to use the new helper throughout.
@felixarntz I was thinking URLs for the SK service for example but I’m also not opposed to always using the utility in all cases and letting it determine if special handling is necessary or not. In that case, it would be nice to update our PHPCS rules to require this is used instead of
parse_url
orwp_parse_url
but that might be too much to do here.cc: @felixarntz