ox: Incorrect HTML dumped after being parsed

Dumping HTML after parsing it with Ox.parse produces incorrect HTML.

require 'ox'
# Ox::VERSION == "2.8.4"

html = <<-HTML
<!DOCTYPE html >
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello World</h1>
    <p>Lorem Ipsum Dolor Sit</p>
  </body>
</html>
HTML

Ox.default_options = {
    mode:   :generic,
    effort: :tolerant,
    smart:  true
}
puts Ox.dump(Ox.parse(html))

The output is:

<!DOCTYPE html >
<html>
  <head>
    <meta charset="utf-8">
      <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <title>Hello World</title>
      </meta>
    </meta>
  </head>
  <body>
    <h1>Hello World</h1>
    <p>Lorem Ipsum Dolor Sit</p>
  </body>
</html>

Either this is a bug, or documentation is missing on how to parse, alter, and reconstitute HTML the way Nokogiri does…

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

So Ox would have to support HTML using the Ox.parse and Ox.dump methods. That is something to put on the requested feature list. I can see how it involves fewer steps, even if the SAX/Builder combo has the potential to be a lot faster.
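
As a rough illustration of the SAX/Builder idea, here is a minimal pure-Ruby sketch. It does not use Ox's API: HtmlWriter, its method names, and the VOID_ELEMENTS list are hypothetical stand-ins for a builder driven by SAX-style callbacks. The point it shows is that void elements such as meta have to be closed as soon as they are opened, so that later siblings are not nested inside them the way they are in the output above.

```ruby
# HTML void elements: they never have children or an end tag.
VOID_ELEMENTS = %w[area base br col embed hr img input link meta param source track wbr].freeze

# Characters that must be escaped in text and attribute values.
HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical event-driven writer (an assumption for illustration, not an Ox class).
# A SAX handler would forward its callbacks to these methods.
class HtmlWriter
  def initialize
    @out = +''
    @open = [] # stack of currently open non-void elements
  end

  def doctype(value)
    @out << "<!DOCTYPE #{value}>"
  end

  def start_element(name, attrs = {})
    name = name.to_s
    @out << "<#{name}"
    attrs.each { |key, value| @out << %( #{key}="#{escape(value)}") }
    @out << '>'
    # Void elements are complete right away, so they are never pushed on the stack.
    @open.push(name) unless VOID_ELEMENTS.include?(name)
  end

  def text(value)
    @out << escape(value)
  end

  def end_element(name)
    name = name.to_s
    return if VOID_ELEMENTS.include?(name) # no end tag for void elements
    raise ArgumentError, "unexpected </#{name}>" unless @open.last == name

    @open.pop
    @out << "</#{name}>"
  end

  def to_s
    @out.dup
  end

  private

  def escape(value)
    value.to_s.gsub(/[&<>"]/, HTML_ESCAPES)
  end
end

writer = HtmlWriter.new
writer.doctype('html')
writer.start_element(:head)
writer.start_element(:meta, { 'charset' => 'utf-8' })
writer.end_element(:meta) # ignored: meta is a void element
writer.start_element(:meta, { 'http-equiv' => 'X-UA-Compatible', 'content' => 'IE=edge' })
writer.start_element(:title)
writer.text('Hello World')
writer.end_element(:title)
writer.end_element(:head)
puts writer.to_s
# prints: <!DOCTYPE html><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Hello World</title></head>
```

The writer emits compact output with no indentation to keep the sketch short. A real implementation on top of Ox would also need to decide how to handle comments, CDATA, and whitespace-only text.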

You know my weakness, benchmarks. Now I have to put together something. 😊

Can you give an example of what you mean by tags?

I suppose a set of examples with comments might be helpful. There are many use cases.