ibis: bug: unexpected SQL generated for multiple join

What happened?

I run this code snippet:

lvl1 = ibis.table([('id', 'int64'), ('slug', 'string'), ('lvl2_key', 'int64')])
lvl2 = ibis.table([('id', 'int64'), ('slug', 'string'), ('lvl3_key', 'int64')])
lvl3 = ibis.table([('id', 'int64'), ('slug', 'string')])

expr = (
    lvl1
    .join(lvl2, lvl2.id == lvl1.lvl2_key)
    .join(lvl3, lvl3.id == lvl2.lvl3_key)
    .select(
        lvl1.id.name('lvl1_id'),
        lvl1.slug.name('lvl1_name'),
        lvl2.slug.name('lvl2_name'),
        lvl3.slug.name('lvl3_name'),
    )
)
ibis.show_sql(expr)

which generates the following SQL:

SELECT
  t3.id AS lvl1_id,
  t3.slug AS lvl1_name,
  t2.slug AS lvl2_name,
  t1.slug AS lvl3_name
FROM unbound_table_1 AS t3, unbound_table_2 AS t2, (
  SELECT
    t2.id AS id_x,
    t2.slug AS slug_x,
    t2.lvl2_key AS lvl2_key,
    t3.id AS id_y,
    t3.slug AS slug_y,
    t3.lvl3_key AS lvl3_key
  FROM unbound_table_1 AS t2
  JOIN unbound_table_2 AS t3
    ON t3.id = t2.lvl2_key
) AS t0
JOIN unbound_table_3 AS t1
  ON t1.id = t2.lvl3_key

when I would expect:

SELECT
  t0.id AS lvl1_id,
  t0.slug AS lvl1_name,
  t1.slug AS lvl2_name,
  t2.slug AS lvl3_name
FROM unbound_table_1 AS t0
JOIN unbound_table_2 AS t1
  ON t1.id = t0.lvl2_key
JOIN unbound_table_3 AS t2
  ON t2.id = t1.lvl3_key

The way the SQL is currently written causes problems when the expression is reused in subsequent joins.

What version of ibis are you using?

4.1.0

What backend(s) are you using, if any?

DuckDB, BigQuery

Relevant log output

No response

Code of Conduct

  • I agree to follow this project’s Code of Conduct

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 24 (9 by maintainers)

Most upvoted comments

Honestly, I’d really prefer Ibis to behave like SQL and not try to guess at what I want. If there are duplicate columns which can’t be logically resolved then error out at query time and tell the user they need to resolve the duplicates. Making magic _x and _y columns appear is what Pandas does but it isn’t very helpful in my experience.