ArchiveBox: Link parsing: Pinboard private feeds don't seem to get parsed properly

I would love to have the cron job that monitors my Pocket feed also monitor my private Pinboard feed. However, every method of passing the feed to bookmark-archiver that the instructions describe fails, each in its own way.

If I pass a public feed, like http://feeds.pinboard.in/rss/u:username/, it works fine. But if I pass a private feed, like https://feeds.pinboard.in/rss/secret:xxxx/u:username/private/, it errors out. I have tried the RSS, JSON, and Text feeds, and none work.

Examples follow. (I’ve simply replaced the actual feed I used to test with the demo URL Pinboard provides.)

./archive "https://feeds.pinboard.in/rss/secret:xxxx/u:username/private/"

[*] [2018-10-18 21:14:03] Downloading https://feeds.pinboard.in/rss/secret:xxxx/u:username/private/ > output/sources/feeds.pinboard.in-1539897243.txt
[X] No links found :(

./archive "https://feeds.pinboard.in/json/secret:xxxx/u:username/private/"

[*] [2018-10-18 21:13:46] Downloading https://feeds.pinboard.in/json/secret:xxxx/u:username/private/ > output/sources/feeds.pinboard.in-1539897226.txt
Traceback (most recent call last):
  File "./archive", line 161, in <module>
    links = merge_links(archive_path=out_dir, import_path=source)
  File "./archive", line 53, in merge_links
    raw_links = parse_links(import_path)
  File "/home/USERNAME/datahoarding/bookmark-archiver/archiver/parse.py", line 54, in parse_links
    links += list(parser_func(file))
  File "/home/USERNAME/bookmark-archiver/archiver/parse.py", line 108, in parse_json_export
    url = erg['url']
KeyError: 'url'

./archive "https://feeds.pinboard.in/text/secret:xxxx/u:username/private/"

[*] [2018-10-18 21:17:57] Downloading https://feeds.pinboard.in/text/secret:xxxx/u:username/private/ > output/sources/feeds.pinboard.in-1539897477.txt
[X] No links found :(

Even though the script says that links are not found, they are definitely there, and simply pasting the URL into a browser outputs the feed in the proper format. I used this script successfully with other methods, like the Pinboard manual export, Pocket manual export AND RSS feed, and browser export. Is this just not a supported method for importing/monitoring?

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 19 (16 by maintainers)

Most upvoted comments

From the settings->backup page:

Legacy HTML (seems to be broken HTML/XML?)

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Pinboard Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL>
<p>

<DT><A HREF="https://github.com/trailofbits/algo" ADD_DATE="1542616733" PRIVATE="1" TOREAD="1" TAGS="vpn,scripts,toread">Algo VPN scripts</A>
<DT><A HREF="http://www.ulisp.com/" ADD_DATE="1542374412" PRIVATE="1" TOREAD="1" TAGS="arduino,avr,embedded,lisp,toread">uLisp</A>

</DL>
</p>
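
The legacy HTML export looks broken because the Netscape bookmark format leaves its <DT> tags unclosed, so a strict XML parser will likely reject it. A minimal sketch of a tolerant reader, using Python's standard html.parser, might look like the following. The class name BookmarkParser and the output keys are hypothetical and are not taken from the ArchiveBox code.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect <A> entries from a Netscape-style bookmark export (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the <A> entry currently being read, if any

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so HREF becomes href.
        if tag == "a":
            attrs = dict(attrs)
            self._current = {
                "url": attrs.get("href"),
                "timestamp": attrs.get("add_date"),
                "tags": attrs.get("tags", ""),
                "title": "",
            }

    def handle_data(self, data):
        # Text between <A ...> and </A> is the bookmark title.
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL>
<p>
<DT><A HREF="http://www.ulisp.com/" ADD_DATE="1542374412" PRIVATE="1" TOREAD="1" TAGS="arduino,avr,embedded,lisp,toread">uLisp</A>
</DL>
</p>
"""

parser = BookmarkParser()
parser.feed(sample)
parser.close()
```

After feeding the sample, parser.links holds one entry per bookmark; the unclosed <DT> and the stray </p> are simply ignored.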

XML

<?xml version="1.0" encoding="UTF-8"?>
	<posts user="aaronmueller">
<post href="https://github.com/trailofbits/algo" time="2018-11-19T08:38:53Z" description="Algo VPN scripts" extended="" tag="vpn scripts" hash="18d708f67bb26d843b1cac4530bb52aa"  shared="no" toread="yes" />
<post href="http://www.ulisp.com/" time="2018-11-16T13:20:12Z" description="uLisp" extended="" tag="arduino avr embedded lisp" hash="2a17ae95925a03a5b9bb38cf7f6c6f9b"  shared="no" toread="yes" />
</posts>

JSON

[{"href":"https:\/\/github.com\/trailofbits\/algo","description":"Algo VPN scripts","extended":"","meta":"62325ba3b577683aee854d7f191034dc","hash":"18d708f67bb26d843b1cac4530bb52aa","time":"2018-11-19T08:38:53Z","shared":"no","toread":"yes","tags":"vpn scripts"},
{"href":"http:\/\/www.ulisp.com\/","description":"uLisp","extended":"","meta":"7bd0c0ef31f69d1459e3d37366e742b3","hash":"2a17ae95925a03a5b9bb38cf7f6c6f9b","time":"2018-11-16T13:20:12Z","shared":"no","toread":"yes","tags":"arduino avr embedded lisp"}]
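
Note that the entries in this export store the address under "href", not "url", which would explain the KeyError: 'url' in the traceback from the original report. A rough sketch of a more forgiving reader is shown below. The function name parse_pinboard_json and the output keys are hypothetical, and the fallback to "url" is an assumption about other JSON sources rather than something confirmed in this thread.

```python
import json


def parse_pinboard_json(text):
    """Yield link dicts from a Pinboard-style JSON export (hypothetical helper)."""
    for entry in json.loads(text):
        # The export shown above uses "href"; "url" is tried as a fallback (assumption).
        url = entry.get("href") or entry.get("url")
        if not url:
            continue  # skip entries that carry no recognizable URL field
        yield {
            "url": url,
            "title": entry.get("description") or url,
            "tags": entry.get("tags", ""),
            "time": entry.get("time"),
        }


# One entry trimmed from the export above; json.loads turns "\/" back into "/".
sample = (
    r'[{"href":"https:\/\/github.com\/trailofbits\/algo",'
    r'"description":"Algo VPN scripts",'
    r'"time":"2018-11-19T08:38:53Z","tags":"vpn scripts"}]'
)
links = list(parse_pinboard_json(sample))
```

Using .get() instead of indexing means an unexpected entry is skipped rather than aborting the whole import.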

Private RSS feed:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://web.resource.org/cc/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://pinboard.in">
    <title>Pinboard (private aaronmueller)</title>
    <link>https://pinboard.in/u:aaronmueller/private/</link>
    <description></description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="https://mehkee.com/"/>
        <rdf:li rdf:resource="https://qmk.fm/"/>
      </rdf:Seq>
    </items>
  </channel>

  <item rdf:about="https://mehkee.com/">
    <title>Mehkee - Mechanical Keyboard Parts &amp; Accessories</title>
    <dc:date>2018-11-08T21:29:32+00:00</dc:date>
    <link>https://mehkee.com/</link>
    <dc:creator>aaronmueller</dc:creator>
    <dc:subject>keyboard gadget diy</dc:subject>
    <dc:source>http://pinboard.in/</dc:source>
    <dc:identifier>http://pinboard.in/u:aaronmueller/b:xxx/</dc:identifier>
    <taxo:topics>
      <rdf:Bag>
        <rdf:li rdf:resource="http://pinboard.in/u:aaronmueller/t:keyboard"/>
        <rdf:li rdf:resource="http://pinboard.in/u:aaronmueller/t:gadget"/>
        <rdf:li rdf:resource="http://pinboard.in/u:aaronmueller/t:diy"/>
      </rdf:Bag>
    </taxo:topics>
  </item>
  <item rdf:about="https://qmk.fm/">
    <title>QMK Firmware - An open source firmware for AVR and ARM based keyboards</title>
    <dc:date>2018-11-06T22:36:21+00:00</dc:date>
    <link>https://qmk.fm/</link>
    <dc:creator>aaronmueller</dc:creator>
    <dc:subject>firmware keyboard</dc:subject>
    <dc:source>http://pinboard.in/</dc:source>
    <dc:identifier>http://pinboard.in/u:aaronmueller/b:xxx/</dc:identifier>
    <taxo:topics>
      <rdf:Bag>
        <rdf:li rdf:resource="http://pinboard.in/u:aaronmueller/t:firmware"/>
        <rdf:li rdf:resource="http://pinboard.in/u:aaronmueller/t:keyboard"/>
      </rdf:Bag>
    </taxo:topics>
  </item>
</rdf:RDF>
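
One possible reason a parser reports "No links found" for this feed (not confirmed in this thread) is that it is RSS 1.0/RDF: every <item> lives in the default namespace http://purl.org/rss/1.0/, so a plain ElementTree search for "item" matches nothing. The sketch below shows a namespace-aware lookup with xml.etree.ElementTree. The function name parse_pinboard_rss and the output keys are hypothetical; the namespace URIs are copied from the feed's root element.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map; the prefixes are local to this sketch.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_pinboard_rss(text):
    """Yield link dicts from an RSS 1.0 (RDF) feed (hypothetical helper)."""
    root = ET.fromstring(text)
    # Items are direct children of <rdf:RDF>, in the RSS 1.0 namespace.
    for item in root.findall("rss:item", NS):
        url = item.findtext("rss:link", namespaces=NS)
        if not url:
            continue  # skip items without a <link> element
        yield {
            "url": url,
            "title": item.findtext("rss:title", default=url, namespaces=NS),
            "time": item.findtext("dc:date", namespaces=NS),
        }


# A trimmed copy of the feed above, reduced to one item.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="https://qmk.fm/">
    <title>QMK Firmware</title>
    <dc:date>2018-11-06T22:36:21+00:00</dc:date>
    <link>https://qmk.fm/</link>
  </item>
</rdf:RDF>"""

links = list(parse_pinboard_rss(sample))
unqualified = ET.fromstring(sample).findall("item")  # no namespace: matches nothing
```

Here links contains the single bookmark, while unqualified is empty, which mirrors the symptom in the original report.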

I ran into the same problem. I solved it with a little Go program that logs in to Pinboard and clicks the actual “backup my bookmarks in legacy Netscape format” button, which works fine for me.

package main

import (
  "flag"
  "os"

  "gopkg.in/headzoo/surf.v1"
)

var username = flag.String("username", "", "pinboard username")
var password = flag.String("password", "", "pinboard password")

func main() {
  flag.Parse()

  bow := surf.NewBrowser()
  err := bow.Open("https://pinboard.in/")
  if err != nil {
    panic(err)
  }

  form, formErr := bow.Form("form[name=login]")
  if formErr != nil {
    panic(formErr)
  }

  form.Input("username", *username)
  form.Input("password", *password)
  // Capture the error from Submit itself; the outer err is nil at this point.
  if err := form.Submit(); err != nil {
    panic(err)
  }

  err = bow.Open("https://pinboard.in/export/format:html/")
  if err != nil {
    panic(err)
  }

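  // Write the export to stdout and fail loudly if the download breaks.
  if _, err := bow.Download(os.Stdout); err != nil {
    panic(err)
  }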
}

Build and run:

$ export GOPATH="$PWD"
$ go get gopkg.in/headzoo/surf.v1
$ go build -o fuPin src/aaron-fischer.net/fupin/main.go
$ ./fuPin -username=[USERNAME] -password=[PASSWORD] > bookmarks.html

Seems to work for me on the most recent master (ce257949b4468c77412c026b5987c3f37bad6443). 😃 Thanks a ton.

My original issue doesn’t seem to be the same problem that @f0086 is dealing with.

I am very sorry, but it does not work. You are using the wrong URLs. You need to use the URL in the <link></link> tag. I will have a look at this.

#123 seems related to this 😃

EDIT: Ok, I had a quick look at the code, but did not find a proper solution. The xml.etree.ElementTree component is not working as expected I think, but I am not a Python guy, so not sure about that. My setup (see above) works great for me, so I have no interest in spending an evening debugging this for now, sorry 😦 Maybe it is not worth it anyway, because of #123 ?!?