rdflib: Invalid URI

Serializing a graph raised this exception:

  File "ws23/ws23.py", line 35, in web_search_to_triples
    triples = g.serialize(format=RDFLIB_FORMAT).split("\n")
  File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 936, in serialize
    serializer.serialize(stream, base=base, encoding=encoding, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 208, in serialize
    if self.statement(subject) and not firstTime:
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/n3.py", line 92, in statement
    or super(N3Serializer, self).statement(subject))
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 269, in statement
    return self.s_squared(subject) or self.s_default(subject)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 282, in s_squared
    self.predicateList(subject)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 373, in predicateList
    self.objectList(properties[propList[0]])
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 388, in objectList
    self.path(objects[0], OBJECT)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/n3.py", line 96, in path
    super(N3Serializer, self).path(node, position, newline)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 288, in path
    or self.p_default(node, position, newline)):
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 294, in p_default
    self.write(self.label(node, position))
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 310, in label
    return self.getQName(node, position == VERB) or node.n3()
  File "/usr/local/lib/python2.7/dist-packages/rdflib/term.py", line 224, in n3
    raise Exception('"%s" does not look like a valid URI, I cannot serialize this as N3/Turtle. Perhaps you wanted to urlencode it?'%self)
Exception: "http://www.imdb.com/title/tt0091369//search/title?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt" does not look like a valid URI, I cannot serialize this as N3/Turtle. Perhaps you wanted to urlencode it?

This is the full triple:

_:node81b1978fce492c4b779bdd9d709f9e7f <http://schema.org/Movie/url> <http://www.imdb.com/title/tt0091369//search/title?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt> .

The triple parses correctly but cannot be serialized:

from rdflib import Graph

# The failing triple; its object URI contains unescaped spaces.
t = "_:node81b1978fce492c4b779bdd9d709f9e7f <http://schema.org/Movie/url> <http://www.imdb.com/title/tt0091369//search/title?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt> ."
g = Graph()
g.parse(data=t, format="n3")  # parsing succeeds
for s, p, o in g:
    print("%s %s %s" % (s, p, o))  # this form works on Python 2 and 3
g.serialize(format="n3")  # this call raises the exception

Please tell me if you need more details.

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 2
  • Comments: 24 (13 by maintainers)

Most upvoted comments

Is there a way to skip the triples (while parsing) with such invalid URI?
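One possible approach, sketched under assumptions: since the report shows that the triple parses but fails on serialization, offending triples could be dropped after parsing and before serializing. The check below uses the characters that the Turtle and N-Triples IRIREF grammar forbids between angle brackets. `Uri`, `looks_like_valid_iri`, and `drop_invalid_triples` are hypothetical names for illustration, not rdflib API; `Uri` only stands in for `rdflib.URIRef` so the sketch runs without rdflib.

```python
# Characters the Turtle/N-Triples IRIREF grammar forbids inside <...>,
# in addition to U+0000..U+0020 (which includes the space character).
FORBIDDEN = set('<>"{}|^`\\')


class Uri(str):
    """Hypothetical stand-in for rdflib.URIRef (assumption, not rdflib API)."""


def looks_like_valid_iri(value):
    """Return True if no character in value is forbidden in an IRIREF."""
    return not any(ch in FORBIDDEN or ord(ch) <= 0x20 for ch in value)


def drop_invalid_triples(triples):
    """Split triples into (kept, skipped) based on their URI terms only."""
    kept, skipped = [], []
    for triple in triples:
        # Literals (plain str here) may legitimately contain spaces.
        uris = [term for term in triple if isinstance(term, Uri)]
        if all(looks_like_valid_iri(u) for u in uris):
            kept.append(triple)
        else:
            skipped.append(triple)
    return kept, skipped
```

With a real graph, the same idea would presumably iterate over the parsed triples and test `isinstance(term, URIRef)` instead of the stand-in class. Keeping the skipped triples in a separate list makes it possible to log or repair them later instead of losing them silently.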

I object to auto-correcting such things: in the long term it will introduce more errors than it solves.

Let’s extend your example a bit:

https://allevents.in/santa rita
https://allevents.in/santa%20rita?query= foo bar&bla  # only query part unescaped? is the & part of the query value or a new param?
https://allevents.in/santa+rita?query= foo bar  # other common " " replacement, should query part do it similar?
https://allevents.in/santa_rita?query= foo bar  # wikipedia " " replacement
http://example.com/jörn  # did you actually mean the IRI (UTF-8 'ö') or URI ('%C3%B6')?
...
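
To make the ambiguity concrete, here is a small sketch (Python 3 standard library only, which is an assumption since the report itself runs on Python 2.7). It shows that several equally plausible "corrections" of the same input produce different strings, so any automatic choice would be a guess.

```python
from urllib.parse import quote, quote_plus

raw_path = "santa rita"

# Three plausible "corrections" of the same input, all different strings.
candidates = {
    "percent-encoded": quote(raw_path),
    "plus-encoded": quote_plus(raw_path),
    "underscore": raw_path.replace(" ", "_"),
}

for name, value in candidates.items():
    print(name, "->", "https://allevents.in/" + value)

# Non-ASCII input raises the IRI-versus-URI question.
iri_form = "http://example.com/jörn"
uri_form = "http://example.com/" + quote("jörn")
print(iri_form, "vs", uri_form)
```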

I see that it is tempting to say “auto-correct the simple and extremely common stuff”. However, we have to weigh this against providing a consistent, deterministic lib. I’m quite convinced that the way we handle this, namely expecting the developer to give us correct URIs, is the least problematic in the end.
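
As an illustration of what "the developer gives us correct URIs" could look like in practice, here is a minimal sketch that percent-encodes the path and query before the string is handed to rdflib. `encode_url` is a hypothetical helper, and the choice of `safe` characters is an assumption that encodes one particular policy (existing `%` escapes, `=` and `&` are left alone); other inputs may need a different policy, which is exactly why the library does not guess.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(raw):
    """Percent-encode path and query of raw, keeping existing % escapes.

    Hypothetical helper: the 'safe' sets below are a policy choice made by
    the caller, not something rdflib prescribes.
    """
    parts = urlsplit(raw)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


raw = (
    "http://www.imdb.com/title/tt0091369//search/title"
    "?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK"
    "&ref_=tt_dt_dt"
)
encoded = encode_url(raw)
print(encoded)
```

The encoded string contains no spaces, so it avoids the character that the exception in the report complains about.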

Given that, warnings during early development are a good way to make a developer aware of the problem. If a developer uses invalid URIs as URIs, they should definitely know about it. In production code without a configured logger, that warning isn’t shown, if I’m not mistaken. The other cases are:

  • you have a logger configured (then please configure it as you like; the rdflib logging messages are in the rdflib namespace, so if you want to silence them, do so there)
  • if you don’t have a logger configured:
    • if you’re in interactive mode, then you’re probably developing and should know
    • if you’re not, then the warnings aren’t shown
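
For the first case, a minimal sketch of silencing those warnings with the standard `logging` module. It relies on the statement above that rdflib logs under the `rdflib` namespace; the child logger name `rdflib.term` used for the check is an assumption for illustration.

```python
import logging

# Raise the threshold for everything under the "rdflib" logger namespace,
# so WARNING-level messages from rdflib are no longer emitted.
logging.getLogger("rdflib").setLevel(logging.ERROR)

# Child loggers with no level of their own inherit the effective level.
# "rdflib.term" is an assumed child logger name, used only for illustration.
child = logging.getLogger("rdflib.term")
print(child.isEnabledFor(logging.WARNING))
```

Setting the level on the namespace logger rather than on the root logger leaves the application's own log output untouched.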