go: net/http: RawPath shows inconsistent, case-sensitive behaviour with percent-encoded Unicode URI strings

What version of Go are you using (`go version`)?

$ go version
go version go1.12.7 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

Any environment

go env Output

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build606111517=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"fmt"
	"html/template"
	"log"
	"net/http"
	"net/url"
)

type content struct {
	TplEncoded      string
	ManuallyEncoded template.URL

	ShowPaths bool
	RawPath   string
	Path      string
}

func main() {
	// template.Must panics if the template fails to parse instead of
	// silently returning a nil template.
	tpl := template.Must(template.New("test").Parse(`<!doctype html>
	<html>
	<head>
		<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
		<meta charset="utf-8" />
	</head>
	<body>
		{{ if .ShowPaths }}
			<p>RawPath = {{ .RawPath }}</p>
			<p>Path = {{ .Path }}</p>
		{{ else }}
			<a href="/link/{{ .TplEncoded }}">Template encoded link</a><br />
			<a href="/link/{{ .ManuallyEncoded }}">Manually encoded link</a>
			<br />
			<p>View this page's source to see the (lower/upper)case difference
			in the links</p>
		{{ end }}
	</body>
	</html>`))

	// Renders the root page with the two differently encoded links.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		s := "😋" // Unicode emoji.
		err := tpl.Execute(w, content{
			// html/template percent-encodes this with lowercase hex digits.
			TplEncoded: s,

			// url.PathEscape percent-encodes with uppercase hex digits.
			ManuallyEncoded: template.URL(url.PathEscape(s)),
		})
		if err != nil {
			log.Println(err)
		}
	})

	// This handler reports a different RawPath depending on the case of the
	// hex digits in the request URI.
	http.HandleFunc("/link/", func(w http.ResponseWriter, r *http.Request) {
		err := tpl.Execute(w, content{
			ShowPaths: true,
			RawPath:   r.URL.RawPath,
			Path:      r.URL.Path,
		})
		if err != nil {
			log.Println(err)
		}
	})

	fmt.Println("Go to http://127.0.0.1:8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

What did you expect to see?

url.PathEscape("😋") => %F0%9F%98%8B

/link/%F0%9F%98%8B (A) and /link/%f0%9f%98%8b (B), which use uppercase and lowercase hex digits respectively, are equivalent per RFC 3986. An http.HandlerFunc handling either URL is expected to behave consistently.

What did you see instead?

An HTTP handler that processes the equivalent URIs A and B behaves differently. A, which uses uppercase hex digits, produces an empty http.Request.URL.RawPath, whereas B, which uses lowercase hex digits, produces an http.Request.URL.RawPath containing the original lowercase percent-encoded path. This breaks Unicode URL handling in popular HTTP routers like chi and httprouter.
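The difference can be reproduced without a server: net/http parses the request target with url.ParseRequestURI, and url.Parse goes through the same path handling. A minimal sketch; rawAndEscaped is a hypothetical helper:

```go
package main

import (
	"fmt"
	"net/url"
)

// rawAndEscaped parses rawURL and returns its decoded Path, the RawPath
// field, and the result of EscapedPath. Hypothetical helper.
func rawAndEscaped(rawURL string) (path, rawPath, escaped string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", "", err
	}
	return u.Path, u.RawPath, u.EscapedPath(), nil
}

func main() {
	for _, s := range []string{
		"/link/%F0%9F%98%8B", // A: uppercase hex, same as Go's default escaping
		"/link/%f0%9f%98%8b", // B: lowercase hex, differs from default escaping
	} {
		p, rp, ep, err := rawAndEscaped(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("Path=%q RawPath=%q EscapedPath=%q\n", p, rp, ep)
	}
}
```

Both inputs yield Path "/link/😋". A leaves RawPath empty, B sets it to "/link/%f0%9f%98%8b", and EscapedPath returns each input unchanged.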

I discovered this inconsistency when using html/template, which percent-encodes Unicode strings in <a> href attributes with lowercase hex digits, whereas url.PathEscape produces uppercase hex digits.

About this issue

State: open
Created 5 years ago
Comments: 21 (8 by maintainers)

Commits related to this issue

https://github.com/golang/go/issues/33596 — committed to subnut/go by subnut 2 years ago
net/url: normalize hex values before comparision Fixes: https://github.com/golang/go/issues/33596 — committed to subnut/go by subnut 2 years ago
net/url: clarify RawPath documentation Consistently recommend using EscapedPath rather than RawPath directly. For #33596. Change-Id: Ibe5c2dfa7fe6b1fbc540efed6db1291fc6532726 Reviewed-on: https://g... — committed to golang/go by neild 2 years ago
net/url: clarify RawPath documentation Consistently recommend using EscapedPath rather than RawPath directly. For #33596. Change-Id: Ibe5c2dfa7fe6b1fbc540efed6db1291fc6532726 Reviewed-on: https://g... — committed to jproberts/go by neild 2 years ago

Most upvoted comments

The original implementation of URL did not preserve the original escaped form of a URL. After parsing a URL string, there was no way to distinguish between a/b, a%2fb, and a%2Fb, since these all unescape to the same string. The escaped form of Path was accessible via the EscapedPath method.
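With a current Go release, the a/b, a%2fb, a%2Fb example can be checked directly: all three parse to the same Path, and only EscapedPath (backed by RawPath) tells them apart. A sketch, with describe as a hypothetical helper:

```go
package main

import (
	"fmt"
	"net/url"
)

// describe parses s and returns its decoded Path and its EscapedPath.
func describe(s string) (path, escaped string, err error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", "", err
	}
	return u.Path, u.EscapedPath(), nil
}

func main() {
	// All three inputs decode to the same Path, "/a/b".
	for _, s := range []string{"/a/b", "/a%2fb", "/a%2Fb"} {
		p, e, err := describe(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s Path=%q EscapedPath=%q\n", s, p, e)
	}
}
```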

Go 1.5 changed the EscapedPath method to return the original path. It did this by adding the RawPath field, which is set only when the original escaped path differs from the default escaping of Path.

I think there’s an argument that RawPath should have been an unexported field, to avoid confusion, but it’s far too late to worry about that now. The behavior of RawPath is fairly straightforward, however:

Code that parses raw URLs sets RawPath only if the original escaped path differs from the default escaping of Path.
EscapedPath returns RawPath if it is a valid escaping of Path; otherwise it returns the default escaping of Path.
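The second rule means a stale RawPath is ignored rather than trusted. The sketch below, assuming a current Go release, builds one URL whose RawPath is a valid alternative escaping of Path, one whose RawPath decodes to something else, and one whose Path is overwritten after parsing; escapedPaths and afterPathChange are hypothetical names:

```go
package main

import (
	"fmt"
	"net/url"
)

// escapedPaths compares EscapedPath for a URL whose RawPath is a valid,
// non-default escaping of Path and for one whose RawPath does not match.
func escapedPaths() (valid, stale string) {
	v := &url.URL{Path: "/a/b", RawPath: "/a%2Fb"} // decodes to Path: used
	s := &url.URL{Path: "/a/b", RawPath: "/x%2Fy"} // decodes to "/x/y": ignored
	return v.EscapedPath(), s.EscapedPath()
}

// afterPathChange parses a URL, then overwrites Path without touching RawPath.
func afterPathChange() (string, error) {
	u, err := url.Parse("/a%2fb") // sets RawPath to "/a%2fb"
	if err != nil {
		return "", err
	}
	u.Path = "/c/d" // RawPath no longer decodes to Path
	return u.EscapedPath(), nil
}

func main() {
	valid, stale := escapedPaths()
	fmt.Println(valid, stale) // /a%2Fb /a/b
	changed, err := afterPathChange()
	if err != nil {
		panic(err)
	}
	fmt.Println(changed) // /c/d
}
```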

The RawPath field does not exist for efficiency reasons. It exists to allow EscapedPath to return the original encoded form of a URL. Most code should use EscapedPath rather than accessing RawPath directly.

neild on Jul 18, 2022

IMO, the fact that u.EscapedPath() is returning a percent-encoded string that has lowercase hexadecimal digits is itself a bug.

For a URL constructed by url.Parse, EscapedPath returns the original path, preserving the exact escaping used in that original path.

For a URL constructed manually, such as url.URL{Path: "a?b"}, EscapedPath uses upper case hexadecimal digits to escape the path as recommended by RFC 3986.
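The two cases can be compared side by side: a URL built from a decoded Path is escaped with uppercase digits, while a parsed URL keeps whatever case the input used. manualEscaped and parsedEscaped are hypothetical helpers for this sketch:

```go
package main

import (
	"fmt"
	"net/url"
)

// manualEscaped builds a URL from a decoded path and returns its EscapedPath.
func manualEscaped(path string) string {
	u := &url.URL{Path: path}
	return u.EscapedPath()
}

// parsedEscaped parses an already-encoded path and returns its EscapedPath.
func parsedEscaped(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.EscapedPath(), nil
}

func main() {
	fmt.Println(manualEscaped("a?b")) // a%3Fb
	fmt.Println(manualEscaped("/😋"))  // /%F0%9F%98%8B
	p, err := parsedEscaped("/%f0%9f%98%8b")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // /%f0%9f%98%8b (original case preserved)
}
```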

neild on Jul 16, 2022