datafusion: Don't ignore failed optimizer rules

Describe the bug Some logical optimizer rules like type coercion are actually essential for correct query execution, but are currently silently ignored when they failed.

To Reproduce TBD

Expected behavior Don’t skip failed rules (this probably requires us to remove the non-failing OptimizerRule::optimize method).

Workaround This behavior is controlled by a config option (that defaults to true meaning to silently skip failed errors).

You can see this in the datafusion cli:

DataFusion CLI v20.0.0
❯ show all;
+-----------------------------------------------------------+------------+
| name                                                      | setting    |
+-----------------------------------------------------------+------------+
...
| datafusion.optimizer.skip_failed_rules                    | true       |
...
+-----------------------------------------------------------+------------+
35 rows in set. Query took 0.038 seconds.

Datafusion can be configured to fail hard if an optimizer rule fails like

❯ -- default behavior
❯ set datafusion.optimizer.skip_failed_rules = false;
0 rows in set. Query took 0.023 seconds.

Additional context See this discussion.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

Well I think the rationale / question is if SQL expression semantics allow / require eager evaluation or if delayed evaluation

These are the variants for runtime behavior, not related to compile time checking I think. I’ve done some tests on Postgres, and find the expression when FALSE then ... will be removed before type checking, so if you have the expr when FALSE then 1/0 in your query, it is always valid: https://onecompiler.com/postgresql/3yskvh35x

But if the invalid expression can not be removed at compile time, a Divide by zero error will always return before executing the query, even if the invalid code won’t be touched at runtime: https://onecompiler.com/postgresql/3yskvqaxf

I tried to remove the config skip_failed_rules and met several test failures tracked in #4685. Feel free to update the task-list if you find more and welcome to fix them if you have time!

panic!(…): there is a bug in the optimizer.

I would personally rather have an InternalError rather than a panic

Make sense to me. It’s a good discussion point. Some rule is essential, we shouldn’t tolerate them failed. BTW, I think we also should make optimizer rule more stable instead of ignoring error.
IMHO, I think it’s not good that tolerate error in optimizer. If a rule want to tolerate failed, I think it may be better handle inside itself.

The “when to check errors” is highly dependent on the database. I’ve worked with MS cloud SQL and Exasol before and in some cases, type errors (e.g. due to incompatible operators or functions that only accept a certain type) were only raised if at least a single row was passed throw it (post joins and where selections). These errors happened at query execution (not planning). From a user’s perspective this was highly confusing. I recommend we take a look at the SQL standard and if it allows for it we bail out of invalid queries as early as possible, no matter if there’s any data to query.

original “ignore optimizer rule if error” was added (maybe by @andygrove or @avantgardnerio ) because at that time there was no way to distinguish between “Optimizer didn’t error but could not handle the query” and “optimizer hit a real error”

That’s correct. Now that we have the try_optimize method I support failing on errors and we will soon fix cases where rules incorrectly return errors.