go: net/url: misleading error message when url has a leading space

What version of Go are you using (go version)?

$ go version
go version go1.11.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/stefanb/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/stefanb/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.3/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.3/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z2/3bdl__bs0xxd3kmf5c1pj0gm0000gn/T/go-build212726590=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
	"net/http"
)

func main() {
	_, err := http.Get(" http://example.org")
	if err != nil {
		panic(err)
	}
}

https://play.golang.org/p/jkcYSD6ZRcO

What did you expect to see?

Either

  • a generic "invalid URL" error message, or
  • a detailed "invalid URL scheme" error, or
  • a detailed "leading space in URL" error

What did you see instead?

A misleading detailed error message:

parse  http://example.org: first path segment in URL cannot contain colon

In #24246 the proposed solution was to trim the URLs before using them, but from the given error message it is very hard to see what the real problem is.
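For reference, the trimming workaround mentioned above can be sketched as follows. This is only an illustration: `parseTrimmed` is a hypothetical helper name, not part of `net/url`, and it assumes that silently dropping surrounding whitespace is acceptable for the caller.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseTrimmed is a hypothetical helper (not part of net/url): it strips
// leading and trailing whitespace before handing the string to url.Parse.
func parseTrimmed(raw string) (*url.URL, error) {
	return url.Parse(strings.TrimSpace(raw))
}

func main() {
	u, err := parseTrimmed(" http://example.org")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(u.Scheme, u.Host)
}
```

This avoids the error but does nothing to explain it, which is why the message itself is still worth improving.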

I propose to either:

  • adjust the error message, e.g.:
parse  http://example.org: URL scheme cannot contain spaces

or

  • quote the problematic URL in the error message, so that it is more obvious even if the message itself remains misleading:
parse " http://example.org": first path segment in URL cannot contain colon

or

  • ideally both adjust the error message + quote the URL:
parse " http://example.org": URL scheme cannot contain spaces

The "cannot contain spaces" wording in the error message could be changed to "cannot contain whitespace" or even "contains invalid characters", depending on how the check is implemented.

Current implementation: https://github.com/golang/go/blob/b50210f5719c15cd512857e2e29e1de152155b35/src/net/url/url.go#L540-L562
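To make the third option concrete, here is a rough user-side sketch of what the combined behaviour could look like. It is not the `net/url` implementation: `parseVerbose` is a hypothetical wrapper, and the rule "whitespace before the first colon means a malformed scheme" is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseVerbose is a hypothetical wrapper around url.Parse. On failure it
// quotes the input with %q so leading or trailing spaces become visible,
// and reports whitespace before the first colon as a scheme problem.
func parseVerbose(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err == nil {
		return u, nil
	}
	// Assumption: text before the first colon was meant to be the scheme.
	if i := strings.Index(raw, ":"); i >= 0 && strings.ContainsAny(raw[:i], " \t\r\n") {
		return nil, fmt.Errorf("parse %q: URL scheme cannot contain whitespace", raw)
	}
	// Otherwise keep the underlying reason, but quote the URL.
	var uerr *url.Error
	if errors.As(err, &uerr) {
		return nil, fmt.Errorf("parse %q: %v", raw, uerr.Err)
	}
	return nil, err
}

func main() {
	_, err := parseVerbose(" http://example.org")
	fmt.Println(err)
}
```

The `%q` verb does the quoting, so the stray space shows up inside the quotes even when the rest of the message stays unchanged.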

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Technically, according to RFC3986-Sec3.1 -

Scheme names consist of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (“+”), period (“.”), or hyphen (“-”).

Therefore, anything that does not start with a letter should be rejected, and that would be the correct behavior. However, I had a look at how other implementations in the wild handle it.

NodeJS accepts it-

> require("url").parse('  https://example.org')
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: 'example.org',
  port: null,
  hostname: 'example.org',
  hash: null,
  search: null,
  query: null,
  pathname: '/',
  path: '/',
  href: 'https://example.org/' }

Curl does not -

curl '   https://example.org' 
curl: (1) Protocol "   https" not supported or disabled in libcurl

Same with wget -

wget '   https://example.org' 
   https://example.org: Scheme missing.

Python also fails to parse it -

>>> from urllib.parse import urlparse
>>> o = urlparse('  https://example.org')
>>> o
ParseResult(scheme='', netloc='', path='  https://example.org', params='', query='', fragment='')

Note that scheme and netloc are empty.

In light of the above, let us just return a more descriptive error (with quotes around the url) if we see a space. How about -

parse " http://example.org": URL scheme does not begin with an alpha-numeric character.

That error message is incorrect. Per the RFC text quoted above, a scheme must begin with a letter (an alpha character), not an alpha-numeric character.
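The distinction can be seen in a small check written directly from the RFC 3986 section 3.1 wording quoted earlier. This is a sketch for illustration only; `validScheme` and `isAlpha` are hypothetical names, not functions from `net/url`.

```go
package main

import "fmt"

// isAlpha reports whether c is an ASCII letter.
func isAlpha(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z'
}

// validScheme reports whether s follows the RFC 3986 section 3.1 rule:
// a letter first, then any mix of letters, digits, "+", "-" or ".".
func validScheme(s string) bool {
	if s == "" || !isAlpha(s[0]) {
		return false
	}
	for i := 1; i < len(s); i++ {
		c := s[i]
		switch {
		case isAlpha(c), '0' <= c && c <= '9', c == '+', c == '-', c == '.':
			// allowed character, keep scanning
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"http", " http", "1http", "svn+ssh"} {
		fmt.Printf("%q valid=%v\n", s, validScheme(s))
	}
}
```

Both " http" and "1http" are rejected here: a digit is allowed after the first character but not as the first one, so "alpha-numeric" would describe the rule too loosely.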

(Sent by email in reply to Agniva De Sarker's comment above, dated Dec 20, 2018, 11:27 PM.)