rdflib: RDFlib makes invalid Turtle cURIs

Some cURIs cannot be parsed with RDFlib, even those produced by RDFlib.

For example:

Suppose we have the following RDF/Turtle:

@prefix ex: <https://example.org/term#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://example.org/entity> ex:anchorOf "This press"^^xsd:string ;
            a <http://dbpedia.org/resource/This_(journal)> .

If we load the Turtle with g.parse(data=ttl) and serialize it with g.serialize(), we get:

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix ex: <https://example.org/term#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/entity> a dbr:This_(journal) ;
    ex:anchorOf "This press"^^xsd:string .

Then, if we load that output again with g.parse(data=ttl, format="turtle"), we get the following error:

rdflib.plugins.parsers.notation3.BadSyntax: at line 5 of <>: Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in: "…b'/2001/XMLSchema#> .\n\n<http://example.org/entity> a dbr:This_'^b'(journal) ;\n ex:anchorOf "This press"^^xsd:string .'"

The error is caused by dbr:This_(journal), more precisely by the unescaped '(' and ')' in the local part. It seems RDFlib cannot parse its own output.

Any ideas on what may be happening here?

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 17 (6 by maintainers)

Most upvoted comments

It is important here (when parsing and serializing the various forms) to note and comply with the appropriate specifications:

  • QNames (for XML elements and attributes)
  • PNames (for Turtle, TriG and SPARQL)
  • CURIEs (for RDFa, mainly)
  • Compact IRIs (for JSON-LD; mostly(?) identical to CURIEs)

This list is informally ordered from most to least restrictive in terms of what characters are allowed. In PNames, more are allowed than in QNames, but some have to be escaped; in CURIEs, no valid IRI character has to be escaped in the local part. The rules for how prefixes are defined and their forms also differ. (In all but QNames, the _ prefix is used for blank node identifiers. In RDFa and JSON-LD, CURIEs mainly share the same lexical space as regular IRIs, for better or worse.)
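
To make the PName escaping rule concrete, here is a rough sketch of how a local part such as This_(journal) could be written as a valid Turtle PName. It is not rdflib's API: escape_pn_local and to_pname are hypothetical helpers. It also assumes the local part contains only characters that are either allowed in PN_LOCAL or listed in the Turtle PN_LOCAL_ESC production. Anything else (a space, for instance) is not checked and would need the full <...> IRI form instead.

```python
# Characters that Turtle's PN_LOCAL_ESC production allows after a backslash,
# minus '_', '-' and '.', which may appear unescaped in most positions.
ALWAYS_ESCAPE = set("~!$&'()*+,;=/?#@%")


def escape_pn_local(local: str) -> str:
    """Backslash-escape the reserved characters in a PName local part."""
    out = []
    last = len(local) - 1
    for i, ch in enumerate(local):
        needs_escape = (
            ch in ALWAYS_ESCAPE
            or (ch == "-" and i == 0)          # '-' may not start a local name
            or (ch == "." and i in (0, last))  # '.' may not start or end one
        )
        out.append("\\" + ch if needs_escape else ch)
    return "".join(out)


def to_pname(prefix: str, namespace: str, iri: str) -> str:
    """Abbreviate iri as prefix:local, escaping the local part."""
    if not iri.startswith(namespace):
        raise ValueError(f"{iri!r} is not in namespace {namespace!r}")
    return f"{prefix}:{escape_pn_local(iri[len(namespace):])}"


print(to_pname(
    "dbr",
    "http://dbpedia.org/resource/",
    "http://dbpedia.org/resource/This_(journal)",
))
# prints dbr:This_\(journal\)
```

With the escapes in place, dbr:This_\(journal\) names the same IRI as <http://dbpedia.org/resource/This_(journal)>, which is the form a Turtle serializer would need to emit (or else fall back to the full IRI) for the output to parse again.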