gpdb: FOR over SELECT fails on temporary tablespace in PL/pgSQL
Bug Report
Greenplum version or build
6.18.2, 6X_STABLE (current top e3e95865e77)
The issue is very strange, different versions behave quite differently. See details in the following sections.
Expected behavior
The following code won’t fail:
create tablespace myspace location '/tmp/xxx123';
set temp_tablespaces = myspace; -- this is important to show the bug, without this it works.
DO
$$
DECLARE l_rec RECORD;
BEGIN
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table
(
key VARCHAR NOT NULL,
value VARCHAR NOT NULL
) DISTRIBUTED RANDOMLY;
INSERT INTO tmp_table VALUES ('a',1),('b',2);
FOR l_rec IN ( SELECT key, value
FROM tmp_table
)
LOOP
RAISE NOTICE '%:%', l_rec.key, l_rec.value;
END LOOP;
END;
$$;
Actual behavior
Current 6X (top commit: e3e95865e), I hit two kinds of errors:
In a newly created cluster, run the case:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------
PostgreSQL 9.4.26 (Greenplum Database 6.18.2+dev.171.ge3e95865e7 build dev)
(1 row)
gpadmin=# DO
$$
DECLARE l_rec RECORD;
BEGIN
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table
(
key VARCHAR NOT NULL,
value VARCHAR NOT NULL
) DISTRIBUTED RANDOMLY;
INSERT INTO tmp_table VALUES ('a',1),('b',2);
FOR l_rec IN ( SELECT key, value
FROM tmp_table
)
LOOP
RAISE NOTICE '%:%', l_rec.key, l_rec.value;
END LOOP;
END;
$$;
NOTICE: table "tmp_table" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS tmp_table"
PL/pgSQL function inline_code_block line 4 at SQL statement
ERROR: too many LWLocks taken (lwlock.c:559) (seg0 slice1 127.0.1.1:6002 pid=228592) (lwlock.c:559)
CONTEXT: PL/pgSQL function inline_code_block line 11 at FOR over SELECT rows
gpadmin=#
log out and then log in to re-run:
gpadmin=# set temp_tablespaces = myspace;
SET
gpadmin=# DO
$$
DECLARE l_rec RECORD;
BEGIN
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table
(
key VARCHAR NOT NULL,
value VARCHAR NOT NULL
) DISTRIBUTED RANDOMLY;
INSERT INTO tmp_table VALUES ('a',1),('b',2);
FOR l_rec IN ( SELECT key, value
FROM tmp_table
)
LOOP
RAISE NOTICE '%:%', l_rec.key, l_rec.value;
END LOOP;
END;
$$;
NOTICE: table "tmp_table" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS tmp_table"
PL/pgSQL function inline_code_block line 4 at SQL statement
ERROR: could not create directory "pg_tblspc/16414/GPDB_6_301908232/16410": No such file or directory
CONTEXT: SQL statement "CREATE TEMPORARY TABLE tmp_table
(
key VARCHAR NOT NULL,
value VARCHAR NOT NULL
) DISTRIBUTED RANDOMLY"
PL/pgSQL function inline_code_block line 5 at SQL statement
Then drop tablespace and re-create and rerun:
gpadmin=# drop tablespace myspace ;
DROP TABLESPACE
gpadmin=# create tablespace myspace location '/tmp/xxx123';
CREATE TABLESPACE
gpadmin=# set temp_tablespaces = myspace;
SET
gpadmin=# DO
$$
DECLARE l_rec RECORD;
BEGIN
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table
(
key VARCHAR NOT NULL,
value VARCHAR NOT NULL
) DISTRIBUTED RANDOMLY;
INSERT INTO tmp_table VALUES ('a',1),('b',2);
FOR l_rec IN ( SELECT key, value
FROM tmp_table
)
LOOP
RAISE NOTICE '%:%', l_rec.key, l_rec.value;
END LOOP;
END;
$$;
NOTICE: table "tmp_table" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS tmp_table"
PL/pgSQL function inline_code_block line 4 at SQL statement
ERROR: relation not found (OID 16453) (seg0 slice1 127.0.1.1:6002 pid=228752)
DETAIL: This can be validly caused by a concurrent delete operation on this object.
CONTEXT: PL/pgSQL function inline_code_block line 11 at FOR over SELECT rows
Test using 6.18.2:
psql:a.sql:25: ERROR: could not open existing temporary file "base/pgsql_tmp/pgsql_tmp_sess15_w3326427_qdxid7_sync8": No such file or directory (fd.c:1532) (seg0 slice1 10.117.190.60:6002 pid=3326433) (fd.c:1532)
Step to reproduce the behavior
Run the code:
create tablespace myspace location '/tmp/xxx123';
set temp_tablespaces = myspace; -- this is important to show the bug, without this it works.
DO
$$
DECLARE l_rec RECORD;
BEGIN
DROP TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table
(
key VARCHAR NOT NULL,
value VARCHAR NOT NULL
) DISTRIBUTED RANDOMLY;
INSERT INTO tmp_table VALUES ('a',1),('b',2);
FOR l_rec IN ( SELECT key, value
FROM tmp_table
)
LOOP
RAISE NOTICE '%:%', l_rec.key, l_rec.value;
END LOOP;
END;
$$;
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 17 (14 by maintainers)
Commits related to this issue
- Not use temp tablespaces for snapshot buf-file passed to reader cursor. The buf-file of snapshot passed to reader cursor is not large so we do not need to put it in temp tablespaces even the GUC is s... — committed to kainwen/gpdb by kainwen 3 years ago
- Not use temp tablespaces for snapshot buf-file passed to reader cursor. The buf-file of snapshot passed to reader cursor is not large so we do not need to put it in temp tablespaces even the GUC is s... — committed to greenplum-db/gpdb by kainwen 3 years ago
To provide clarity on the issue with #12755, here is stack where recursion happens (will post other update for
relation not founderror)High-level
I am feeling for shared snapshot and for combocids for sure we should always place them in default locations.
temp_tablespacesare really more for spill files and such, not for such small meta-data files. Though need to see how that can possibly be achieved from code.One of the workarounds for this issue could be to avoid
PrepareTempTablespaces()if file outlives the transaction boundaries as in that case anywaysOpenNamedTemporaryFile()is going to place the file in database’s default location. For combocids we always passinterXactto true currently.Note: gpdb master branch doesn’t face this issue or challenge because we moved away from writing shared snapshot and combocid to on-disk file. Instead we use DSM segment shared by the QE writer and reader process for this.
Why 6X_STABLE (top fc39bc9cd35972) now will hit infinite loop?
Toast table and comb cid
See tmp_table’s definition:
The table’s fields is toast-able, so while creating the table, it will also create a related toast table, and a
heap_updatewill be invoked to set thepg_classentry’sreltoastrelidcolumn.In the same transaction, it creates a tuple and update a tuple, so the old tuple’s
cmin_cmaxwill be a comb cid. (You can use pageinspect andgp_select_invisibleto do some local experiment).Reader Gang of Cursor
It will invoke
readSharedLocalSnapshot_forCursorto try to fetch the buf-file and load the snapshot, and if the GUCtemp_tablespacesis set, it will try to get the information of the tablespace, this leads to catalog access (detailed please refer above comments call stack https://github.com/greenplum-db/gpdb/issues/12871#issuecomment-982205797):pg_tablespace(oid = 1213)pg_classusing seq scantmp_tableinpg_class(here by old, I mean the one beforeheap_update, the one that not set toastrelid), that tuple is created and updated by current transaction, so it will try to compare command idloadSharedComboCommandId(we are cursor, we are reader gang, let’s go to read the buf-file and load it)BufFileOpenNamedTempis invoked, chain:GetRealCmin -> getSharedComboCidEntry -> loadSharedComboCommandId -> BufFileOpenNamedTempMisc: not toast-able
So if the table cannot be toast-able, we can never hit this issue, because there is no comb cid.
I try this code and never hit the issue:
Un-answered Questions
ERROR: could not create directory "pg_tblspc/16414/GPDB_6_301908232/16410": No such file or directory?Let me for sake of completeness post the RCA for the original issue that was reported for 6.18.2 release, the error
psql:a.sql:25: ERROR: could not open existing temporary file "base/pgsql_tmp/pgsql_tmp_sess15_w3326427_qdxid7_sync8": No such file or directory (fd.c:In writer gang during
DefineRelation()whenGetDefaultTablespace()is called, it internally callsPrepareTempTablespaces()which sets up the tablespaces for writer gang. So, the shared snapshot file gets created intemp_tablespaceslocation. Though in reader gang, noPrepareTempTablespaces()call happens hence it continues to look for the file in default location causing the error.