jena: Lucene query with text:prop not working in some cases?

Version

4.9.0

Question

This query works, searching all the fields:

select * where {
?s text:query ("beer" 10) .
}

However, this query, which should search only the rdfs:label and mt:altLabel fields, returns 0 hits:

select * where {
?s text:query (mt:defQuery "beer" 10) .
}

This query also returns 0 hits:

select * where {
?s text:query (mt:includeNotes "beer" 10) .
}

Excerpt from mytest.ttl:

# Text index description
<#indexLucene> 
    a text:TextIndexLucene ;
    text:directory ".../indexes/mytest" ;
    text:entityMap <#entMap> ;
    text:storeValues true ;
    text:analyzer [
       a text:ConfigurableAnalyzer ;
       text:tokenizer text:StandardTokenizer ;
       text:filters (text:ASCIIFoldingFilter text:LowerCaseFilter)
       ] ;
    text:queryParser text:AnalyzingQueryParser ;
    text:multilingualSupport true ;
    text:propLists (
        [ text:propListProp mt:defQuery ;
          text:props ( 
             rdfs:label
             mt:altLabel
             ) ;
        ]
        [ text:propListProp mt:includeNotes ;
          text:props ( 
             rdfs:label
             mt:altLabel
             mt:note
             ) ;
        ]
    ) ;
     .

<#entMap> 
    a text:EntityMap ;
    text:defaultField     "ftext" ;
    text:entityField      "uri" ;
    text:uidField         "uid" ;
    text:langField        "lang" ;
    text:graphField       "graph" ;
    text:map (
         [ text:field "ftext" ; text:predicate rdfs:label ]
         [ text:field "ftext" ; text:predicate mt:altLabel ]
         [ text:field "ftext" ; text:predicate mt:note ]
         ) .

About this issue

  • State: open
  • Created 8 months ago
  • Comments: 20 (11 by maintainers)

Most upvoted comments

Hi @filak and thanks for the precise examples, and thanks for the ping.

I had some trouble replicating the issues described.

One thing I noticed in the test data is that the mx namespace isn’t declared. What is the prefix mx: in mx:alt_label? Is it just a typo in the example?

I copied one of the existing tests that uses propLists to recreate the errors. I got the 3 expected results back with the test data, and no items back when I tried to replicate the other example.

I first got the warning message

23:03:36 WARN  TextQueryPF :: Predicate not indexed: http://id.example.test/vocab/#alt_label
23:03:36 WARN  TextQueryPF :: objectToStruct: props are not indexed [http://www.w3.org/2004/02/skos/core#prefLabel, http://www.w3.org/2004/02/skos/core#altLabel, http://www.w3.org/2000/01/rdf-schema#label, http://id.example.test/vocab/#alt_label]

while running the test. I had to add the predicate to the text map and rerun the test, this time without a warning, to get the expected result back:

    "    text:map (",
    "         [ text:field \"label\" ; text:predicate rdfs:label ; text:noIndex true ]",
    "         [ text:field \"altLabel\" ; text:predicate skos:altLabel ]",
+   "         [ text:field \"alt_Label\" ; text:predicate mt:alt_label ]",
    "         [ text:field \"prefLabel\" ; text:predicate skos:prefLabel ]",
    "         [ text:field \"comment\" ; text:predicate rdfs:comment ]",
    "         [ text:field \"workAuthorshipStatement\" ; text:predicate spec:workAuthorshipStatement ]",
    "         [ text:field \"workEditionStatement\" ; text:predicate spec:workEditionStatement ]",
    "         [ text:field \"workColophon\" ; text:predicate spec:workColophon ]",
    "         ) ."
             

Was the "props are not indexed" warning shown above silent when you ran it?
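For reference, here is roughly what that warning seems to ask for, written as assembler Turtle instead of Java test strings. This is only a sketch, not your configuration: the ex: namespace, the ex:labels list name, the field names and the in-memory directory are all made up for illustration. The point is that, judging by the log above, every predicate named in a text:props list also needs its own entry in text:map.

```turtle
# Sketch only: ex:, ex:labels and the field names are hypothetical.
@prefix text: <http://jena.apache.org/text#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .

<#indexLucene>
    a text:TextIndexLucene ;
    text:directory "mem" ;          # in-memory index, assumed here to keep the sketch small
    text:entityMap <#entMap> ;
    text:propLists (
        [ text:propListProp ex:labels ;
          text:props ( rdfs:label ex:altLabel ) ]
    ) .

<#entMap>
    a text:EntityMap ;
    text:defaultField "label" ;
    text:entityField  "uri" ;
    text:map (
        [ text:field "label"    ; text:predicate rdfs:label ]
        # If this entry were missing, ex:altLabel would be listed in
        # text:props without being indexed.
        [ text:field "altLabel" ; text:predicate ex:altLabel ]
    ) .
```

With a setup like this, a query of the form ?s text:query (ex:labels "beer" 10) should only refer to predicates that the entity map knows about.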

I’m not sure what happens in the second step, but one thought from the example above: there may have been leftover documents in the Lucene folder if it wasn’t deleted during debugging.

I think that deleting existing Lucene documents isn’t part of what the Java reindexing command does. My information might be outdated or wrong on this, but in our CI jobs we still delete the Lucene folder before indexing an offline database.

See the two tests which pass at https://github.com/apache/jena/compare/main...OyvindLGjesdal:jena:debug-text-prop-not-working-in-some-cases

I didn’t replicate your configuration in the test, so it could also be other stuff that breaks, but hope this helps.