ibis: bug: filters not applied to joined tables

Given, two tables, if they are joined, and then a filter is applied, the filter is actually being ignored:

import ibis

ibis.options.interactive = True

connection = ibis.sqlite.connect('data/geography.db')
countries = connection.table('countries')
gdp = connection.table('gdp')

expr = (countries.inner_join(gdp,
                             predicates=countries['iso_alpha3'] == gdp['country_code'])
                 .filter(gdp['year'] == 2017))
expr

Intuitively, the result should only contain the rows with year == 2017, but the filter is not applied.

The database in this example can be found at: https://github.com/datapythonista/ibis/tree/tutorial/docs/source/tutorial/data

The SQL compiled from the expression, can be obtained with:

>>> str(expr.compile())
SELECT t0.iso_alpha2, t0.iso_alpha3, t0.iso_numeric, t0.fips, t0.name, t0.capital, t0.area_km2, t0.population, t0.continent, t0.country_code, t0.year, t0.value 
FROM (SELECT t1.iso_alpha2 AS iso_alpha2, t1.iso_alpha3 AS iso_alpha3, t1.iso_numeric AS iso_numeric, t1.fips AS fips, t1.name AS name, t1.capital AS capital, t1.area_km2 AS area_km2, t1.population AS population, t1.continent AS continent, t2.country_code AS country_code, t2.year AS year, t2.value AS value 
FROM base.countries AS t1 JOIN base.gdp AS t2 ON t1.iso_alpha3 = t2.country_code) AS t0, base.gdp 
WHERE base.gdp.year = ?

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 20 (20 by maintainers)

Commits related to this issue

Most upvoted comments

A third alternative is to call materialize in the filter:

import ibis

connection = ibis.sqlite.connect("docs/source/tutorial/data/geography.db")
countries = connection.table("countries")
gdp = connection.table("gdp")

expr = countries.inner_join(
    gdp, predicates=countries["iso_alpha3"] == gdp["country_code"]
).filter(lambda t: t.materialize()["year"] == 2017)

I would like to release after finishing the refactoring of the backends. Whenever that happens, but I think it’ll probably be in one week or two. If this is implemented by then it can surely be released, but I don’t see this as a blocker.