datafusion: Null constants cannot be used in most expressions

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I want to be able to use null constants in my DataFusion queries (e.g. NULL < 5). While at first this seems a silly use case, we end up with such predicates during automated rewrites in IOx, where we sometimes replace column references with null constants (see https://github.com/influxdata/influxdb_iox/issues/883 for more details).

While they are allowed in the SQL or expr language (e.g. Expr::Literal(ScalarValue::Utf8(None))), they typically produce a runtime error before execution.

Describe the solution you’d like

I want to be able to write expressions like NULL < 5 in all cases where I could write col < 5 and have DataFusion evaluate them correctly.
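For reference, under standard SQL semantics a comparison with NULL on either side evaluates to NULL, and a filter keeps a row only when its predicate is definitely true. The following is a minimal sketch of that behavior in plain Rust, modelling a nullable integer as Option<i64>. The helper names lt_nullable and filter_keeps are hypothetical and are not part of DataFusion's API.

```rust
/// SQL-style `a < b` over nullable integers (hypothetical helper).
/// If either side is NULL (`None`), the result is NULL (`None`)
/// rather than true or false.
fn lt_nullable(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x < y),
        _ => None,
    }
}

/// A filter keeps a row only when the predicate is definitely true,
/// so a NULL predicate result drops the row (hypothetical helper).
fn filter_keeps(predicate: Option<bool>) -> bool {
    predicate == Some(true)
}
```

With these definitions, lt_nullable(None, Some(5)) models NULL < 5 and yields None, so a filter on that predicate keeps no rows.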

The proper fix for this ticket is likely first-class support for DataType::Null, followed by proper coercion logic in the codebase where required. See the discussion by @Dandandan and @Jimexist on #1179 at https://github.com/apache/arrow-datafusion/issues/1179#issuecomment-952997737
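To illustrate what coercion logic for a Null type could look like, here is a rough sketch using a toy type enum instead of Arrow's real DataType. ToyType and comparison_coercion are hypothetical names, and the rules shown are an assumption for illustration, not DataFusion's actual coercion rules.

```rust
/// Toy stand-in for a small subset of Arrow's data types (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Null,
    Int64,
    Float64,
    Utf8,
}

/// Pick the common type for a comparison such as `lhs < rhs`.
/// In this sketch, Null coerces to whatever type the other side has,
/// and Null compared with Null stays Null.
fn comparison_coercion(lhs: ToyType, rhs: ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (lhs, rhs) {
        // Identical types need no coercion (this also covers Null vs Null).
        (a, b) if a == b => Some(a),
        // A Null operand adopts the type of the other operand.
        (Null, other) | (other, Null) => Some(other),
        // Mixed numeric types widen to Float64 in this sketch.
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        // Anything else (e.g. Utf8 vs Int64) has no implicit coercion here.
        _ => None,
    }
}
```

Under these assumed rules, NULL < 5 would be planned as an Int64 comparison, the same as col < 5 for an Int64 column.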

Task List

Additional context

This is a tracking ticket for a variety of null incorrectness issues I found while working on https://github.com/apache/arrow-datafusion/issues/1179

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 2
  • Comments: 17 (17 by maintainers)

Most upvoted comments

@alamb Maybe I can pick up some tickets as appropriate.

Thank you! @alamb

Seems #2253 is merged, I’ll try this suggestion ❤️

However, once #2253 is merged, I think you can make your datafusion branch point directly at a revision on apache/arrow-rs rather than my arrow-rs branch.

Thank you @alamb for making this branch. I haven’t submitted a PR to a private repo before; I will try this branch later in my spare time.

I can’t remember how far I got with this one. If I were doing this project, what I would do is try to change the code so that a Null constant (aka a ScalarValue with an embedded None) is typed as DataType::Null, and then implement coercion rules for DataType::Null <-> all other types.

@alamb I opened a PR, apache/arrow-rs#1572, to implement rules for casting DataType::Null from and to all other types in arrow-rs. I think we can solve most of the NULL coercion issues in DataFusion based on it. Please review, thank you.
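As a rough illustration of the "from Null" direction of such a cast (not the code in that PR): casting a Null-typed column to any other type can produce a column of the target type in which every slot is null. ToyColumn, ToyKind and cast_null_column are hypothetical names; real Arrow arrays are not modelled here.

```rust
/// Toy column representation (hypothetical, not Arrow's array types).
#[derive(Debug, PartialEq)]
enum ToyColumn {
    /// A Null-typed column only needs to remember its length.
    Null(usize),
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

/// Target type of the cast (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyKind {
    Null,
    Int64,
    Utf8,
}

/// Cast a Null-typed column of length `len` to `to`.
/// Every slot of the result is null, whatever the target type is.
fn cast_null_column(len: usize, to: ToyKind) -> ToyColumn {
    match to {
        ToyKind::Null => ToyColumn::Null(len),
        ToyKind::Int64 => ToyColumn::Int64(vec![None; len]),
        ToyKind::Utf8 => ToyColumn::Utf8(vec![None; len]),
    }
}
```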

@alamb I have a draft up; please take an initial look: https://github.com/apache/arrow-datafusion/pull/1199