rdflib: Invalid URI
I got this:
  File "ws23/ws23.py", line 35, in web_search_to_triples
    triples = g.serialize(format=RDFLIB_FORMAT).split("\n")
  File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 936, in serialize
    serializer.serialize(stream, base=base, encoding=encoding, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 208, in serialize
    if self.statement(subject) and not firstTime:
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/n3.py", line 92, in statement
    or super(N3Serializer, self).statement(subject))
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 269, in statement
    return self.s_squared(subject) or self.s_default(subject)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 282, in s_squared
    self.predicateList(subject)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 373, in predicateList
    self.objectList(properties[propList[0]])
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 388, in objectList
    self.path(objects[0], OBJECT)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/n3.py", line 96, in path
    super(N3Serializer, self).path(node, position, newline)
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 288, in path
    or self.p_default(node, position, newline)):
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 294, in p_default
    self.write(self.label(node, position))
  File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/serializers/turtle.py", line 310, in label
    return self.getQName(node, position == VERB) or node.n3()
  File "/usr/local/lib/python2.7/dist-packages/rdflib/term.py", line 224, in n3
    raise Exception('"%s" does not look like a valid URI, I cannot serialize this as N3/Turtle. Perhaps you wanted to urlencode it?'%self)
Exception: "http://www.imdb.com/title/tt0091369//search/title?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt" does not look like a valid URI, I cannot serialize this as N3/Turtle. Perhaps you wanted to urlencode it?
This is the full triple:
_:node81b1978fce492c4b779bdd9d709f9e7f <http://schema.org/Movie/url> <http://www.imdb.com/title/tt0091369//search/title?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt> .
I observed that the triple parses correctly but cannot be serialized:
from __future__ import print_function  # lets print(...) behave the same on Python 2.7 and 3
from rdflib import Graph

t = "_:node81b1978fce492c4b779bdd9d709f9e7f <http://schema.org/Movie/url> <http://www.imdb.com/title/tt0091369//search/title?locations=West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt> ."
g = Graph()
g.parse(data=t, format="n3")  # parsing succeeds
for s, p, o in g:
    print(s, p, o)
g.serialize(format="n3")  # raises the exception shown above
Please tell me if you need more details.
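The exception message suggests URL-encoding the value. As a minimal sketch of what that could look like on the caller's side, assuming Python 3's urllib.parse (on Python 2.7, as in the traceback, the equivalent function is urllib.quote): the helper name encode_uri and the choice of safe characters are illustrative assumptions, not part of rdflib.

```python
from urllib.parse import quote

# Characters left untouched: the RFC 3986 reserved characters, plus "%" so
# that sequences which are already percent-encoded are not encoded twice.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def encode_uri(raw):
    """Percent-encode characters, such as spaces, that may not appear in a URI.

    Hypothetical helper for illustration only.
    """
    return quote(raw, safe=SAFE_CHARS)


raw = ("http://www.imdb.com/title/tt0091369//search/title?locations="
       "West Wycombe Park, West Wycombe, Buckinghamshire, England, UK&ref_=tt_dt_dt")
encoded = encode_uri(raw)
print(encoded)
```

The encoded string contains %20 in place of each space and could then be wrapped in rdflib's URIRef before the triple is added to the graph. Whether the encoded URI still identifies the resource the data meant is something only the caller can judge.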
About this issue
- State: closed
- Created 10 years ago
- Reactions: 2
- Comments: 24 (13 by maintainers)
Is there a way to skip triples with such invalid URIs while parsing?
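One possible workaround, sketched here under several assumptions, is to filter the input before it reaches the parser. The sketch assumes line-based input with one triple per line (as in the example above) and rejects any line whose <...> terms contain a character that the Turtle grammar forbids inside an IRI. The names has_invalid_iri and split_lines are hypothetical and not part of rdflib, and for simplicity the sketch also rejects IRIs that use \u escapes.

```python
import re

# Characters the Turtle/N-Triples grammar forbids inside <...>, in addition
# to control characters and the space (U+0000 to U+0020).
FORBIDDEN_IRI_CHARS = set('<>"{}|^`\\')

STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')  # quoted literal, with escapes
IRI = re.compile(r'<([^>]*)>')                      # anything between < and >


def has_invalid_iri(line):
    """Return True if any <...> term on the line contains a forbidden character."""
    # Blank out string literals first so that "<" or ">" inside them is ignored.
    without_literals = STRING_LITERAL.sub('""', line)
    for iri in IRI.findall(without_literals):
        if any(ch in FORBIDDEN_IRI_CHARS or ord(ch) <= 0x20 for ch in iri):
            return True
    return False


def split_lines(ntriples_text):
    """Split line-based triples into (kept, skipped) lists of lines."""
    kept, skipped = [], []
    for line in ntriples_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        (skipped if has_invalid_iri(line) else kept).append(line)
    return kept, skipped
```

The kept lines can be joined with newlines and passed to g.parse(data=..., format="n3") as in the example above, while the skipped lines can be logged so that the bad data is not dropped silently.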
I object to auto-correcting such things; in the long term it will introduce more errors than it solves.
Let’s extend your example a bit…:
I see that it is tempting to say “auto-correct the simple and extremely common stuff”. However, we have to weigh this against providing a consistent, deterministic library. I’m quite convinced that the way we handle this, namely expecting the developer to give us correct URIs, is the least problematic in the end.
Given that: warnings in early development are a good way to make a developer aware. If a developer uses invalid URIs as URIs, they should definitely know about it. In production code without a configured logger, that warning isn’t shown, if I’m not mistaken. The other cases are: