got: Don't force query string normalization

What problem are you trying to solve?

In a URL like http://example.org/random?param=SOMETHING~SOMETHING, the special character ~ is percent-encoded before the request is sent, resulting in http://example.org/random?param=SOMETHING%7ESOMETHING, which some HTTP servers do not support (they do not decode it).

As described by RFC 3986 in section 2.3: "URI comparison implementations do not always perform normalization prior to comparison. For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers."

Also in RFC 3986, section 6.2.2.2: "The percent-encoding mechanism is a frequent source of variance among otherwise identical URIs. In addition to the case normalization issue noted above, some URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts. These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3."
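To make the quoted rule concrete, here is a minimal sketch of a normalizer that decodes only the percent-encoded octets that correspond to unreserved characters. The helper name `decodeUnreserved` is hypothetical; it is not part of got or Node.js, and it only handles single-octet ASCII escapes.

```javascript
// Hypothetical helper (an assumption for illustration, not an existing API).
// Unreserved characters per RFC 3986 section 2.3: ALPHA, DIGIT, "-", ".", "_", "~".
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

function decodeUnreserved(uriComponent) {
  return uriComponent.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const char = String.fromCharCode(parseInt(hex, 16));
    // Decode only when the octet maps to an unreserved character;
    // every other escape is left exactly as it was.
    return UNRESERVED.test(char) ? char : match;
  });
}

console.log(decodeUnreserved('param=SOMETHING%7ESOMETHING'));
// prints "param=SOMETHING~SOMETHING"
```

Under this rule, %7E and ~ are equivalent, so a producer that leaves ~ unencoded is the one following the RFC's recommendation.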

The percent-encoding of ~ by got happens because Node.js follows the “WHATWG URL API” (https://nodejs.org/api/url.html#url_the_whatwg_url_api), whose query serialization omits ~ from the characters left unencoded (see https://url.spec.whatwg.org/#interface-urlsearchparams, the Note below the example, and https://url.spec.whatwg.org/#urlencoded-serializing) and, by the way, includes *.
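The difference can be seen directly in Node.js. This small sketch uses only the global `URL` and `URLSearchParams` classes; the parameter names are made up for illustration.

```javascript
// Parsing a URL on its own leaves "~" in the query untouched.
const parsed = new URL('http://example.org/random?param=SOMETHING~SOMETHING');
console.log(parsed.search);
// prints "?param=SOMETHING~SOMETHING"

// Serializing through URLSearchParams uses the
// application/x-www-form-urlencoded rules: "~" is escaped, "*" is not.
const params = new URLSearchParams();
params.append('param', 'SOMETHING~SOMETHING');
params.append('star', 'a*b');
const serialized = params.toString();
console.log(serialized);
// prints "param=SOMETHING%7ESOMETHING&star=a*b"
```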

Describe the feature

My proposal is to add a flag to the options that prevents the normalization by skipping the append and delete of “_GOT_INTERNAL_TRIGGER_NORMALIZATION”.
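A rough sketch of the idea, assuming the normalization is triggered by appending and then deleting a dummy search parameter as described above. The function `buildUrl` and the option name `normalizeQuery` are hypothetical; this is not got's actual code or API.

```javascript
const TRIGGER = '_GOT_INTERNAL_TRIGGER_NORMALIZATION';

// Hypothetical wrapper: "normalizeQuery" is an assumed option name.
function buildUrl(input, { normalizeQuery = true } = {}) {
  const url = new URL(input);
  if (normalizeQuery) {
    // Mutating searchParams makes the URL re-serialize its whole query
    // with the application/x-www-form-urlencoded rules.
    url.searchParams.append(TRIGGER, '');
    url.searchParams.delete(TRIGGER);
  }
  return url;
}

const input = 'http://example.org/random?param=SOMETHING~SOMETHING';
console.log(buildUrl(input).href);
// prints "http://example.org/random?param=SOMETHING%7ESOMETHING"
console.log(buildUrl(input, { normalizeQuery: false }).href);
// prints "http://example.org/random?param=SOMETHING~SOMETHING"
```

With the flag off, the query string reaches the server exactly as the caller wrote it.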

Checklist

  • I have read the documentation and made sure this feature doesn’t already exist.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 34 (1 by maintainers)

Most upvoted comments

Invalid URL. The % character should be escaped.

It’s not invalid. Your statement would be true only if the query string were formatted as application/x-www-form-urlencoded, which requires the % character to be escaped.

What about a custom format that doesn’t use percent escapes? It’s still within the specs: as described before, HTTP does not specify the content of the query string (except for the # character, which terminates the query string).
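As an illustration of that point, here is a small sketch with a made-up query string that is not form-encoded and contains a literal %. The URL and the `_tmp` parameter name are assumptions chosen only for the example.

```javascript
// A query in a custom (non form-urlencoded) format with a bare "%".
const custom = new URL('http://example.org/random?50%off~today');
const before = custom.search;

// Any searchParams mutation re-serializes the query as
// application/x-www-form-urlencoded.
custom.searchParams.append('_tmp', '');
custom.searchParams.delete('_tmp');
const after = custom.search;

console.log(before); // prints "?50%off~today"
console.log(after);  // prints "?50%25off%7Etoday="
```

The parser accepts the original query as it is, but the forced normalization rewrites it into a different string, which a server expecting the custom format may no longer understand.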

You can argue that application/x-www-form-urlencoded is a spec and Got follows it, but on the homepage I read

Human-friendly and powerful HTTP request library for Node.js

and not

Human-friendly and powerful HTTP (with query string as application/x-www-form-urlencoded) request library for Node.js

What I really meant by

The spec is fine

was that that’s not the problem.

I know this “Feature request” started as “I’ve a problem with ~, damn WHATWG spec”, but after this comment (https://github.com/sindresorhus/got/issues/1234#issuecomment-625757221), it became a “Bug report” about the incorrect handling of the query string.

I wanted to discuss this issue exhaustively because, as you said, it would be a breaking change. In particular, I think it is going to break the merging of the URL params.