rdflib: SDO namespace does not match http://schema.org predicates

In namespace.py, SDO is defined as SDO = Namespace("https://schema.org/"), so it won’t match predicates that use the http://schema.org/ form.

For example, if I create a graph like this:

import rdflib
graph = rdflib.Graph()
subj = rdflib.term.BNode()
pred = rdflib.term.URIRef('http://schema.org/availability')
obj = rdflib.term.Literal("https://schema.org/InStock")
graph.add((subj, pred, obj))

I would expect this to return a list of length 1, but it returns an empty list instead:

list(graph.subject_objects(rdflib.namespace.SDO.availability))

According to the schema.org FAQ, you can use either http://schema.org or https://schema.org as the namespace, so the two should be treated as equivalent.

I’m not sure whether the library offers a way to treat the two as equivalent. In current data from the Web Data Commons, it seems that http://schema.org is more common, but both forms occur.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (6 by maintainers)

Most upvoted comments

@nicholascar, this approach can make unintended changes to the data. For example, applying that code to the following JSON-LD would turn the description literal into an https URIRef:

{
    "@context":"http://schema.org/",
    "@type":"Thing",
    "description":"http://schema.org/ is the start of this description"
}

Adding a check on the object’s type avoids this:

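import rdflib

# Iterate over a snapshot so the graph is not modified while it is being read.
for s, p, o in list(g.triples((None, None, None))):
    new_s, new_p, new_o = s, p, o

    if str(s).startswith("http://schema.org/"):
        new_s = rdflib.URIRef(str(s).replace("http://", "https://", 1))

    # Predicates are always URIRefs, so a string check is enough here.
    if str(p).startswith("http://schema.org/"):
        new_p = rdflib.URIRef(str(p).replace("http://", "https://", 1))

    # Only rewrite objects that are URIRefs; Literals such as descriptions are left alone.
    if isinstance(o, rdflib.URIRef) and str(o).startswith("http://schema.org/"):
        new_o = rdflib.URIRef(str(o).replace("http://", "https://", 1))

    # Replace the triple once, with all rewritten positions applied together.
    if (new_s, new_p, new_o) != (s, p, o):
        g.remove((s, p, o))
        g.add((new_s, new_p, new_o))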

Do you mean something like this:

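from rdflib import URIRef

# Iterate over a snapshot so the graph is not modified while it is being read.
for s, p, o in list(g.triples((None, None, None))):
    new_s, new_p, new_o = s, p, o

    if str(s).startswith("http://schema.org"):
        new_s = URIRef(str(s).replace("http", "https", 1))

    if str(p).startswith("http://schema.org"):
        new_p = URIRef(str(p).replace("http", "https", 1))

    # Caution: this also matches Literal values whose text starts with this URL.
    if str(o).startswith("http://schema.org"):
        new_o = URIRef(str(o).replace("http", "https", 1))

    # Replace the triple once, with all rewritten positions applied together.
    if (new_s, new_p, new_o) != (s, p, o):
        g.remove((s, p, o))
        g.add((new_s, new_p, new_o))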