hugo: RSS feeds do not validate

The RSS 2.0 feeds generated by Hugo do not validate. For example, look at feedvalidator.org’s report on the spf13.com blog.

Sorry
This feed does not validate.

line 9, column 4: Undefined channel element: author [help]

    <author>Steve Francia</author>
    ^
line 18, column 27: Invalid email address: Steve Francia (15 occurrences) [help]

      <author>Steve Francia</author>
                           ^
line 165, column 44: pubDate must be an RFC-822 date-time: Tue, 01 Jul 2014 00:00:00 UTC (13 occurrences) [help]

      <pubDate>Tue, 01 Jul 2014 00:00:00 UTC</pubDate>
                                            ^
In addition, interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII" [help]

line 110, column 0: description should not contain relative URL references: /templates/list (14 occurrences) [help]

</description>
line 1619, column 0: Non-html tag: figcaption (4 occurrences) [help]

        &lt;img src=&#34;/media/pingdom-old.png&#34; alt=&#34;Pingdom of for ...

I spent several years working on a feed reader at Apple, and malformed feeds were the bane of my existence. It’s not hard to check that you’re generating a valid feed; please do so, for the sake of the feed readers!

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

Thank you for reporting this issue, @snej.

For those who are interested, here is what I have learned thus far. According to the RSS 2.0 Specification:

  1. <channel> does not have an <author> element, but <managingEditor> and <webMaster> are available.
  2. <managingEditor>, <webMaster> and <author> all expect an email address followed by an optional full name in parentheses, e.g. geo@herald.com (George Matesky)
  3. <pubDate> etc. must strictly adhere to RFC-822 Date and Time Specification, which accepts UT, GMT or Z as valid and equivalent timezones, but not UTC.
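
To make points 2 and 3 concrete, here is a minimal Go sketch of values that should satisfy the validator. The helper names rssPerson and rssPubDate are hypothetical and are not part of Hugo, and this is not necessarily what the embedded template does. It assumes the date arrives as an RFC 3339 string and sidesteps the zone-name problem by emitting a numeric offset, which RFC-822 also allows.

```go
package main

import (
	"fmt"
	"time"
)

// rssPerson (hypothetical helper) formats a person the way RSS 2.0 expects
// for <author>, <managingEditor> and <webMaster>: the email address first,
// then the optional full name in parentheses.
func rssPerson(email, name string) string {
	if name == "" {
		return email
	}
	return fmt.Sprintf("%s (%s)", email, name)
}

// rssPubDate (hypothetical helper) parses an RFC 3339 timestamp and formats
// it with a numeric zone offset (+0000) instead of a zone name, so the
// string "UTC" can never appear in the output.
func rssPubDate(rfc3339 string) (string, error) {
	t, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return "", err
	}
	return t.Format(time.RFC1123Z), nil
}

func main() {
	fmt.Println(rssPerson("geo@herald.com", "George Matesky"))

	d, err := rssPubDate("2014-07-01T00:00:00Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // Tue, 01 Jul 2014 00:00:00 +0000
}
```

In a template, the equivalent idea would be a layout string ending in -0700 rather than MST.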

{{ .Date.Format "MST" }} generates UTC when the date in the front matter has no time component, e.g. date = "2015-01-12". Nowadays, however, hugo new post/test.md automatically puts in the current date/time with a timezone offset, e.g. date = "2015-01-12T14:33:38-07:00", so most users won’t see this bug. With TZ=GMT hugo new post/test.md, though, the resulting timestamp is date = "2015-01-12T14:33:38Z", which becomes UTC in the feed XML file.
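
The behaviour can be reproduced outside Hugo with Go's standard time package. This is only a sketch: zoneAbbrev is a hypothetical helper, and its two-layout parsing is a stand-in for Hugo's own front matter date handling, which I have not reproduced here.

```go
package main

import (
	"fmt"
	"time"
)

// zoneAbbrev (hypothetical helper) parses a front matter date, either
// RFC 3339 or date-only, and returns what the "MST" layout element
// renders for it.
func zoneAbbrev(date string) (string, error) {
	for _, layout := range []string{time.RFC3339, "2006-01-02"} {
		if t, err := time.Parse(layout, date); err == nil {
			return t.Format("MST"), nil
		}
	}
	return "", fmt.Errorf("unrecognized date: %q", date)
}

func main() {
	// A date without a zone, and a timestamp ending in Z, are both
	// placed in the UTC location, whose abbreviation is "UTC".
	for _, d := range []string{"2015-01-12", "2015-01-12T14:33:38Z"} {
		abbr, err := zoneAbbrev(d)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s\n", d, abbr)
	}
}
```

Both inputs print UTC, which is exactly the zone name the validator rejects.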

I have modified the rss.xml template in hugo/tpl/template_embedded.go accordingly, committed as 700c2b8. The W3C feed validator now comes out clean for the RSS feed on my simplistic personal website.

However, the other errors aren’t as trivial:

  • Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII" [help]

  • line 31, column 0: description should not contain iframe tag (6 occurrences) [help]

  • line 207, column 0: description should not contain relative URL references: /templates/list (4 occurrences) [help]

    </description>
  • line 2407, column 0: description should not contain script tag (3 occurrences) [help]

It would seem that, to get feeds of pages with extras like <iframe> and <script> to validate, tags like these will need to be filtered out and all relative URL references will need to be converted to full URLs: a kind of sanitization run, so to speak.
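
To give the discussion something concrete, here is a rough sketch of what such a sanitization run could look like in Go, using only the standard library. Everything in it is an assumption rather than existing Hugo code: sanitizeForFeed is a hypothetical function, it expects the rendered HTML before it is escaped into <description>, and it uses regular expressions where a real implementation would more likely want a proper HTML parser.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

var (
	// Naive patterns: they match an opening tag through its closing tag.
	scriptRe = regexp.MustCompile(`(?is)<script\b.*?</script\s*>`)
	iframeRe = regexp.MustCompile(`(?is)<iframe\b.*?</iframe\s*>`)
	// Double-quoted href/src attributes only.
	attrRe = regexp.MustCompile(`(?i)\b(href|src)="([^"]*)"`)
)

// sanitizeForFeed (hypothetical) strips <script> and <iframe> elements and
// resolves relative href/src values against baseURL.
func sanitizeForFeed(content, baseURL string) (string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	content = scriptRe.ReplaceAllString(content, "")
	content = iframeRe.ReplaceAllString(content, "")
	content = attrRe.ReplaceAllStringFunc(content, func(m string) string {
		parts := attrRe.FindStringSubmatch(m)
		ref, err := url.Parse(parts[2])
		if err != nil {
			return m // leave unparseable values untouched
		}
		return fmt.Sprintf(`%s="%s"`, parts[1], base.ResolveReference(ref).String())
	})
	return content, nil
}

func main() {
	in := `<p><a href="/templates/list">list</a></p>` +
		`<script>alert(1)</script>` +
		`<iframe src="https://example.com/v"></iframe>`
	out, err := sanitizeForFeed(in, "http://example.org/")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Absolute URLs pass through unchanged because resolving an absolute reference against a base returns the reference itself. Single-quoted or unquoted attributes, and other non-feed-safe tags such as figcaption, are not handled by this sketch.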

I am too new to Go and to the Hugo team to tackle that. Should something like this (probably an enhancement rather than a bugfix) be added to Hugo? Please discuss. 😃