scryer-prolog: Much slower than SWI checking [wn_valid] from wordnet-prolog

I have forked ekaf/wordnet-prolog and made a few changes needed to get a validity check to run in Scryer.

Whereas swipl does [wn_valid]. in about 13 seconds, I find that scryer-prolog (“v0.9.1-51-gfd191285”) takes 1 hour. All but 24 seconds of this is spent in goal check_keys/0:

check_keys:-
  writeln('Searching for ambiguous sense keys'),
  findall(X, multikey(X), L),
  list_to_set(L,S),
  member(K,S),
  format("Ambiguous key: ~w\n",[K]),
  findall(I, sk(I,_,K), IL),
  list_to_set(IL,IS),
  member(I,IS),
  g(I,G),
  format("    ->~w (~w)\n",[I,G]),
  false.
check_keys:-
  ok,
  nl.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments: 20 (6 by maintainers)

Most upvoted comments

With the above application of term_expansion/2, I can close this issue.

Cowabunga:

% Appended the following to 'scryer.pl'
:- discontiguous(sk_1/3).
:- discontiguous(sk_3/3).

term_expansion(sk(X,Y,Z),
               [sk_1(X,Y,Z),
                sk_3(Z,X,Y)]).

sk(X, Y, Z) :-
        (   nonvar(X) ->
            sk_1(X, Y, Z)
        ;   nonvar(Z) ->
            sk_3(Z, X, Y)
        ;   sk_1(X, Y, Z)
        ).

Result:

?- [wn_valid].
output/wn_valid.pl-Output-3.1
Semantic Relations: [at,cs,ent,hyp,ins,mm,mp,ms,sim,vgp]
... etc. etc. ...
   % CPU time: 30.251s
   true.
?- 

… or just compare ?- multikey(X), false.

May I suggest to use the GNU Prolog repository to file and discuss GNU Prolog-specific issues:

https://github.com/didoudiaz/gprolog ?

Would you mind reducing the size of your finding? For example, you could introduce false after some goal and compare that failure slice with both systems. It could be the first findall/3 alone that makes problems. Also, are you sure that you used the same apply/2 definition? If you can, directly rewrite any goal for apply/2 with a fixed list to the corresponding call/N. These apply/2 goals are really so 1970s

I have added a support for term_expansion/2 in gprolog which works with the above proposed workaround. However, here is an alternative solution based on findall/setof and keysort (which are generally efficiently implemented). It avoids to duplication of the large sk/3 database (207272 facts) as with the term_expansion/2 solution.

To try it: simply replace in wn_valid.pl the definitons of multikey/1 and checkeys/0 by the following code:

multikey(K, IL) :-
        findall(K-I, sk(I, _, K), L1),
        keysort(L1, L2),
        setof(I, member(K-I, L2), IL),
        length(IL, N),
        N > 1.

check_keys:-
        write('Searching for ambiguous sense keys'), nl,
        multikey(K, IL),
        format('Ambiguous key: ~w\n',[K]),
        member(I,IL),
        g(I,G),
        format('    ->~w (~w)\n',[I,G]),
        false.
check_keys:-
        ok,
        nl.

@dcnorris I had to do several changes in the original code from Eric Kafe (too much SWI-oriented). I also implemented the changes for the `term_expansion/2’ solution and for the above solution. I can share them with you if you want.

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

From the GNU Prolog documentation:

The GNU Prolog compiler (section 4.4) automatically calls expand_term/2 on each Term1 read in. However, in the current release, only DCG transformation are done by the compiler (i.e. term_expansion/2 cannot be used). To use term_expansion/2, it is necessary to call expand_term/2 explicitly.

And building on that, you can extend sk/3 to dynamically switch between the different variants, for example (in the case above):

sk(X, Y, Z) :-
        (   nonvar(X) ->
            sk_1(X, Y, Z)
        ;   nonvar(Z) ->
            sk_3(Z, X, Y)
        ;   sk_1(X, Y, Z)
        ).

In this way, all your code can use sk/3, and when Scryer Prolog’s built-in indexing improves, you simply remove these manual workarounds.

@dcnorris: Improving ISO compliance for better portability would definitely help already, if only to make such comparisons between different systems easier.

Regarding the auxiliary facts specifically useful for Scryer Prolog and other systems with limited built-in indexing, you may be able to use Scryer Prolog’s term_expansion/2 mechanism to automatically generate, for each defined fact, also its corresponding alternative definition with a different argument order.

For example, try:

:- discontiguous(sk_1/3).
:- discontiguous(sk_3/3).

sk(X, Y, Z) :- sk_1(X, Y, Z).

term_expansion(sk(X,Y,Z),
               [sk_1(X,Y,Z),
                sk_3(Z,X,Y)]).

sk(100001740,1,'entity%1:03:00::').
sk(100001930,1,'physical_entity%1:03:00::').
sk(100002137,1,'abstraction%1:03:00::').
sk(100002137,2,'abstract_entity%1:03:00::').
sk(100002452,1,'thing%1:03:00::').
sk(100002684,1,'object%1:03:00::').
sk(100002684,2,'physical_object%1:03:00::').
...

In this way, you automatically get, for each sk/3 fact that occurs in the file, the corresponding sk_1/3 and sk_3/3 facts.

This seems like an issue with indexing. Specifically, we have:

multikey(K):-
  sk(I,_,K),
  sk(J,_,K),
  I\=J.

where sk/3 consists of hundreds of thousands of plain facts.

In the highlighted goal above, i.e., sk(J, _, K), J is a variable and K is instantiated. So, a Prolog system with good indexing can use K to quickly detect which facts apply in this case.

Scryer Prolog currently only indexes on the first instantiated argument that occurs in any clause of a predicate, so in this case, only the first argument of sk/3 is used for indexing. This leads to quadratic overhead multikey/1, because for each fact, all facts have to be checked.

Such an issue can be worked around “manually”, by adding (or generating) auxiliary facts, say sk_3/3, where the arguments are reordered to benefit from the engine’s argument indexing if they are instantiated at the time of the call.

Ideally of course, the system would automatically generate suitable indices on demand, depending on which arguments are instantiated at the time of the call, so that the fitting clauses can later be found much more quickly.

Related to #192.

It’s all in multikey/1, then:

?- time((multikey(X), false)).
   % CPU time: 3716.558s
   false.

SWI by comparison:

?- time((multikey(X), false)). % Using swipl
% 414,620 inferences, 0.078 CPU in 0.078 seconds (100% CPU, 5328386 Lips)
false.

Here is multikey/1 for quick reference:

multikey(K):-
  sk(I,_,K),
  sk(J,_,K),
  I\=J.

I am sold on failure slicing!