ibis: bug: error on `pivot_longer` of more than one category, using Polars backend
What happened?
When pivoting a table with more than one value in the ‘retained’ (category) column, the Polars backend throws the error below, while the DuckDB backend works fine. The same code works on the Polars backend if the retained column holds only one value.
Code to reproduce:
"""Polars backend throws an error if more than one category is pivoted."""
import ibis
from ibis import _
import ibis.expr.selectors as s
import pandas as pd

data = {
    "product": ["apples", "oranges"],
    "price": [4, 7],
    "qty": [42, 84],
}
df = pd.DataFrame(data)

conn_duck = ibis.duckdb.connect()
conn_polars = ibis.polars.connect()
conn_duck.register(df, table_name="fruit")
conn_polars.register(df, table_name="fruit")


def pivot_duckdb():
    """Works fine for two categories."""
    t = conn_duck.table("fruit")
    return t.pivot_longer(~s.c("product"))


def pivot_polars_error():
    """ERROR !!!"""
    t = conn_polars.table("fruit")
    return t.pivot_longer(~s.c("product"))


def pivot_polars_ok():
    """Works fine if limited to a single category."""
    t = conn_polars.table("fruit").filter(_["product"] == "apples")
    return t.pivot_longer(~s.c("product"))


if __name__ == "__main__":
    print(pivot_duckdb().execute())  # works OK, two categories
    print(pivot_polars_ok().execute())  # works OK, single category
    print(pivot_polars_error().execute())  # throws error for more than one category
What version of ibis are you using?
5.0.0
What backend(s) are you using, if any?
DuckDB, Polars
Relevant log output
Traceback (most recent call last):
  File "/Users/redacted/PycharmProjects/ibis-demo/bugs/polars_pivot.py", line 41, in <module>
    print(pivot_polars_error().execute())
  File "/Users/redacted/.pyenv/versions/ibis-experiments/lib/python3.10/site-packages/ibis/expr/types/core.py", line 303, in execute
    return self._find_backend(use_default=True).execute(
  File "/Users/redacted/.pyenv/versions/ibis-experiments/lib/python3.10/site-packages/ibis/backends/polars/__init__.py", line 328, in execute
    df = lf.collect()
  File "/Users/redacted/.pyenv/versions/ibis-experiments/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1443, in collect
    return pli.wrap_df(ldf.collect())
exceptions.ComputeError: series length 2 doesn't match the dataframe height of 4
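For reference, the expected long-format result here has four rows (two products times two pivoted columns), which lines up with the "dataframe height of 4" in the message above. A length-2 series would be consistent with the `product` column not being repeated for each pivoted column, though that reading is an assumption and is not confirmed in this report. The following is a minimal pure-Python sketch of the intended reshaping, independent of ibis; the helper `pivot_longer_rows` is hypothetical, and the output column names `name` and `value` mirror the `pivot_longer` defaults.

```python
# Hypothetical helper, not part of ibis: turns wide rows into long rows,
# repeating the retained (id) column once per pivoted column.
def pivot_longer_rows(rows, id_col, names_to="name", values_to="value"):
    long_rows = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the retained column is carried along, not pivoted
            long_rows.append({id_col: row[id_col], names_to: col, values_to: val})
    return long_rows


# Same data as the "fruit" table in the reproduction script.
fruit = [
    {"product": "apples", "price": 4, "qty": 42},
    {"product": "oranges", "price": 7, "qty": 84},
]

long_fruit = pivot_longer_rows(fruit, "product")
for r in long_fruit:
    print(r)
```

Each input row contributes one output row per pivoted column, so `product` appears twice per fruit. Row order in a real backend may differ from this sketch.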
About this issue
- State: closed
- Created a year ago
- Comments: 26 (15 by maintainers)
Ok, awesome! I think this gives me enough to finally be able to address this issue:
Will report back if I still need some more help!