go: x/net/html: unexpected whitespace rendering of html

What version of Go are you using (go version)?

$ go version
go version go1.12 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/tcurdt/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/tcurdt/.go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.12/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.12/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/pf/7vhqx5bn41qddypw08w9jc4w0000gn/T/go-build451640899=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I am parsing and then rendering HTML.

package main

import (
	"bytes"
	"fmt"
	"os"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	var input = `<!DOCTYPE html>
<html>
<head>
  <title>Title of the document</title>
</head>
<body>
   body content <p>more content</p>
</body>
</html>`
	doc, err := html.Parse(strings.NewReader(input))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error parsing: %s\n", err.Error())
		os.Exit(1)
	}

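	// Render into a buffer and check the error instead of discarding it.
	var buf bytes.Buffer
	if err := html.Render(&buf, doc); err != nil {
		fmt.Fprintf(os.Stderr, "error rendering: %s\n", err)
		os.Exit(1)
	}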

	fmt.Println("--")
	fmt.Print(input)
	fmt.Println("--")

	fmt.Println("--")
	fmt.Print(buf.String())
	fmt.Println("--")
}

What did you expect to see?

With the docs saying:

Rendering is done on a ‘best effort’ basis: calling Parse on the output of Render will always result in something similar to the original tree, but it is not necessarily an exact clone unless the original tree was ‘well-formed’.

Given that the HTML is well-formed, I’d expect the output to be the same as the input.

What did you see instead?

Instead I am seeing changes in whitespace:

--
<!DOCTYPE html>
<html>
<head>
  <title>Title of the document</title>
</head>
<body>
   body content <p>more content</p>
</body>
</html>--
--
<!DOCTYPE html><html><head>
  <title>Title of the document</title>
</head>
<body>
   body content <p>more content</p>

</body></html>--
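
To check that the two dumps above differ only in whitespace, one crude approach is to strip every whitespace character from both and compare what is left. The sketch below uses only the standard library; `stripSpace` is a hypothetical helper name, and the two constants are copied from the output above. Because it also removes whitespace inside text content, it is a sanity check rather than a proper tree comparison.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// input is the document passed to html.Parse in the report above.
const input = `<!DOCTYPE html>
<html>
<head>
  <title>Title of the document</title>
</head>
<body>
   body content <p>more content</p>
</body>
</html>`

// rendered is the html.Render output quoted in the report above.
const rendered = `<!DOCTYPE html><html><head>
  <title>Title of the document</title>
</head>
<body>
   body content <p>more content</p>

</body></html>`

// stripSpace (hypothetical helper) drops every Unicode whitespace rune from s.
func stripSpace(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	fmt.Println("byte-identical:", input == rendered)
	fmt.Println("identical ignoring whitespace:", stripSpace(input) == stripSpace(rendered))
}
```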

IMO there should be a test case verifying that the output matches the input for the documented case.

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 17 (2 by maintainers)

Most upvoted comments

@tcurdt lxml may work for you then; this maintains the whitespace between </body> and </html> (but ditches the whitespace after </html>):

import lxml.html

foo = """<!DOCTYPE html>
<html>
<head>
  <title>Title of the document</title>
</head>
<body>
   body content <p>more content</p>
</body>

</html>

	"""

bar = lxml.html.fromstring(foo)

print(lxml.html.tostring(bar))

Things get more interesting with comment blocks, where Go, Chrome and Firefox all disagree…

Input:

<html><head></head><body>
	<!--a-->

		<!--b-->
			
</body>
</html>

<!--c-->

Output in Go, which changes the whitespace but keeps the final comment in the same relative place:

<html><head></head><body>
	<!--a-->

		<!--b-->





</body></html><!--c-->

Chrome moves the final comment inside the body:

<head></head><body>
	<!--a-->

		<!--b-->
			



<!--c--></body></html>

Firefox does the same as Go, with slightly different spacing:

<html><head></head><body>
	<!--a-->

		<!--b-->
			



</body></html>
<!--c-->

Nit: I think this is related to x/net/html, not x/net/template.