ibis: bug: unexpected SQL generated for multiple join
What happened?
I run this code snippet:
lvl1 = ibis.table([('id', 'int64'), ('slug', 'string'), ('lvl2_key', 'int64')])
lvl2 = ibis.table([('id', 'int64'), ('slug', 'string'), ('lvl3_key', 'int64')])
lvl3 = ibis.table([('id', 'int64'), ('slug', 'string')])
expr = (
lvl1
.join(lvl2, lvl2.id == lvl1.lvl2_key)
.join(lvl3, lvl3.id == lvl2.lvl3_key)
.select(
lvl1.id.name('lvl1_id'),
lvl1.slug.name('lvl1_name'),
lvl2.slug.name('lvl2_name'),
lvl3.slug.name('lvl3_name'),
)
)
ibis.show_sql(expr)
which generates the following SQL:
SELECT
t3.id AS lvl1_id,
t3.slug AS lvl1_name,
t2.slug AS lvl2_name,
t1.slug AS lvl3_name
FROM unbound_table_1 AS t3, unbound_table_2 AS t2, (
SELECT
t2.id AS id_x,
t2.slug AS slug_x,
t2.lvl2_key AS lvl2_key,
t3.id AS id_y,
t3.slug AS slug_y,
t3.lvl3_key AS lvl3_key
FROM unbound_table_1 AS t2
JOIN unbound_table_2 AS t3
ON t3.id = t2.lvl2_key
) AS t0
JOIN unbound_table_3 AS t1
ON t1.id = t2.lvl3_key
when I would expect:
SELECT
t0.id AS lvl1_id,
t0.slug AS lvl1_name,
t1.slug AS lvl2_name,
t2.slug AS lvl3_name
FROM unbound_table_1 AS t0
JOIN unbound_table_2 AS t1
ON t1.id = t0.lvl2_key
JOIN unbound_table_3 AS t2
ON t2.id = t1.lvl3_key
The way the SQL is currently written causes problems when the expression is reused in subsequent joins.
What version of ibis are you using?
4.1.0
What backend(s) are you using, if any?
DuckDB, BigQuery
Relevant log output
No response
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 24 (9 by maintainers)
Honestly, I’d really prefer Ibis to behave like SQL and not try to guess at what I want. If there are duplicate columns which can’t be logically resolved then error out at query time and tell the user they need to resolve the duplicates. Making magic
_xand_ycolumns appear is what Pandas does but it isn’t very helpful in my experience.