age: Index is not used in the WHERE clause

Describe the bug

Index is not used in the WHERE clause.

How are you accessing AGE (Command line, driver, etc.)?

  • JDBC

What data setup do we need to do?

LOAD 'age';
SET search_path = ag_catalog, "$user", public;

select ag_catalog.create_graph('test_graph');
select ag_catalog.create_vlabel('test_graph','profile');

-- works only for MATCH because GIN only makes sense for pattern matching what MATCH clause is
CREATE INDEX profile_gin_idx ON test_graph."profile" USING GIN (properties);

-- does not work in the where clause
CREATE INDEX profile_pet_btree_idx1 ON test_graph."profile" USING BTREE ((properties -> 'pet'));

-- does not work in the where clause
CREATE INDEX profile_pet_btree_idx2 ON test_graph."profile" USING BTREE (ag_catalog.agtype_access_operator(properties, '"pet"'::ag_catalog.agtype));

------ generate more data -------
DO
$do$
    BEGIN
        FOR i IN 1..10000 LOOP
            EXECUTE format('
                select * from ag_catalog.cypher(''test_graph'',
                $$
                    CREATE (any_vertex: profile { `id`: "%s", `pet`: "%s", `hidden`: %s })
                    RETURN any_vertex
                $$
            ) as (any_vertex ag_catalog.agtype)',
                   (SELECT uuid_in(md5(random()::text || now()::text)::cstring)),
                   (SELECT ('[0:2]={dog,cat,bird}'::text[])[floor(random()*3)]),
                   (SELECT ('[0:1]={true,false}'::text[])[floor(random()*2)])
                );
        END LOOP;
    END
$do$;

What is the command that caused the error?

select
    any_profile
from ag_catalog.cypher('test_graph',$$
    EXPLAIN ANALYZE MATCH (any_profile:`profile` { hidden: false })
    WHERE any_profile.pet = 'dog'
    RETURN any_profile
$$
) as (any_profile ag_catalog.agtype);

QUERY PLAN
Bitmap Heap Scan on profile any_profile (cost=20.08…52.40 rows=1 width=32) (actual time=1.111…22.752 rows=1616 loops=1)
Recheck Cond: (properties @> agtype_build_map(‘hidden’::text, ‘false’::agtype))
Filter: (agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(id, _label_name(‘17463’::oid, id), properties), ‘“pet”’::agtype]) = ‘“dog”’::agtype)
Rows Removed by Filter: 3384
Heap Blocks: exact=143
-> Bitmap Index Scan on profile_gin_idx (cost=0.00…20.08 rows=10 width=0) (actual time=1.010…1.010 rows=5000 loops=1)
Index Cond: (properties @> agtype_build_map(‘hidden’::text, ‘false’::agtype))
Planning Time: 0.733 ms
Execution Time: 22.869 ms

Expected behavior Either of two indexes should be used

Environment (please complete the following information):

  • Version: [e.g. 1.3.0]

Additional context

As we see from the plan the filter function is applied as:
Filter: (agtype_access_operator(VARIADIC ARRAY[_agtype_build_vertex(id, _label_name('17463'::oid, id), properties), '"pet"'::agtype]) = '"dog"'::agtype)

We also tried to simulate the functional index like this with no luck because there is a mutable function involved: \

CREATE INDEX profile_pet_btree_idx3 ON test_graph."profile" USING BTREE (
ag_catalog.agtype_access_operator(VARIADIC ARRAY[age_properties(_agtype_build_vertex(id, _label_name('16945'::oid, id), properties)), '"pet"'::ag_catalog.agtype])
);

Could it be possible to add an index support for the WHERE clause at least via any functional index by not involving mutable functions ?

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 44 (28 by maintainers)

Most upvoted comments

@jrgemignani, @rafsun42 ok I will create a new issue for the proposed solution for IN operator and link it to the this one. Thank you.

@rafsun42 thank you. We will check this next week and let you know.

@vladiksun We probably should keep the issues separate but link them. Really long threads can lose their readability, especially if they diverge too much. Also, having a long thread with many issues gives the impression that it is just one issue that is being worked on, when it can be many more.

@jrgemignani I believe @rafsun42 also suggested the solution for IN operator. I can move the issue with “IN” operator to another ticket. Let me know if it is convenient to do so or leave this issue open until the solution for “IN” operator is implemented.

@rafsun42 we checked. It works on my machine. This could be a good solution

@dehowef I believe for now only IN operator. But thanks to @jrgemignani the fix for “=” operator is already in version 1.4.0

@dehowef Thank you for paying attention to this. Right now, the “==” operator also involves functional calls, so the only way to kick out the index is to reverse engineer the execution plan and extract the function itself in order to create the index.

Ideally, the WHERE should not use functional indexes at all, like MATCH does not for GIN indexes, for two reasons:

  • It could look like magic, hence the need to extract the chain of functions calls from the plan.
  • If functions definitions change, the indexes stop working.

@dehowef thanks for taking a look at this.

@vladiksun I’ll be looking into the implementation of the support function for b-tree indexing. I’ve been looking at how indexing works in AGE for quite a bit on and off. Hopefully we can fix this up soon. Thank you for communicating with us~

@jrgemignani Looks like master moved to postgres 15, we compiled the master and started checking. https://github.com/apache/age/issues/1140 closed.

@jrgemignani thank you. We will try this fix any time soon. As for the graph OID I believe we could get this OID by graph name from apache age tables and dynamically construct index creation scripts.

@jrgemignani this looks promising, thank you. By the way can you get rid of using label id while creating the index definition? As I recall we’ve got this whole function call from the execution plan as it is. I assume this might involve changing the main logic so the code doesn’t use label ID either. But when used with liquebase in the kubernetes environment this means we should precalculate the label id before we use it which is not convinient for schema initialization scripts. Ideally the index creation scripts should use predefined data if possible.