Support API for "pre-image" for pruning predicate evaluation #19722

sdf-jkl · 2026-01-09T21:34:58Z

Which issue does this PR close?

closes Support "pre-image" for pruning predicate evaluation #18320

Rationale for this change

Splitting the PR to make it more readable.

What changes are included in this PR?

Adds the udf_preimage logic, without the date_part implementation.
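For readers new to the term: the preimage of a constant `c` under a function `f` is the set of inputs `x` for which `f(x) = c`. When that set is a contiguous range, a predicate on `f(col)` can be rewritten as a range predicate on `col` itself, which pruning can evaluate. The sketch below models this idea with plain `i64` values. Only the variant names `None` and `Range` come from this PR; the `Preimage` trait, the `Bucket10` function, and the half-open `[lower, upper)` bounds are illustrative assumptions, not the DataFusion API.

```rust
/// Simplified stand-in for this PR's `PreimageResult` (variant names match;
/// the payload here is a pair of i64 bounds instead of `expr` + `Interval`).
#[derive(Debug, PartialEq)]
enum PreimageResult {
    /// No preimage is known for the value.
    None,
    /// `f(x) == c` holds exactly when `lower <= x < upper`.
    Range { lower: i64, upper: i64 },
}

/// Hypothetical trait for a function that can report its own preimage.
trait Preimage {
    fn eval(&self, x: i64) -> i64;
    fn preimage(&self, c: i64) -> PreimageResult;
}

/// Hypothetical example function: f(x) = floor(x / 10).
struct Bucket10;

impl Preimage for Bucket10 {
    fn eval(&self, x: i64) -> i64 {
        // div_euclid rounds toward negative infinity for a positive divisor,
        // so -1 maps to bucket -1 rather than bucket 0.
        x.div_euclid(10)
    }

    fn preimage(&self, c: i64) -> PreimageResult {
        // Every x in [10c, 10c + 10) maps to c; give up if the bounds overflow.
        let lower = c.checked_mul(10);
        let upper = lower.and_then(|l| l.checked_add(10));
        match (lower, upper) {
            (Some(lower), Some(upper)) => PreimageResult::Range { lower, upper },
            _ => PreimageResult::None,
        }
    }
}
```

With such a function, `f(col) = 50` can be answered as `col >= 500 AND col < 510`, which a min/max statistics check can evaluate directly.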

Are these changes tested?

Added unit tests that use a test-specific function.

Are there any user-facing changes?

No

alamb

Thank you @sdf-jkl -- I reviewed this PR carefully this morning and it looks great (thank you for splitting up the work). I found it well commented, well designed, and a joy to read.

I do think we need to add unit tests for this feature. I know you have them lined up in #18789, but I think writing unit tests for the rewrite itself will make it easiest to validate.

I also have some questions about the rewrite for = (aka the boundary conditions).

datafusion/expr/src/udf.rs

alamb · 2026-01-18T12:31:33Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+            // NOTE: we only consider immutable UDFs with literal RHS values
+            Expr::BinaryExpr(BinaryExpr { left, op, right }) => {
+                use datafusion_expr::Operator::*;
+                let is_preimage_op = matches!(


It might be nice (as a follow-on PR) to mention this list in the docs for preimage, e.g. that it only applies to the predicates =, !=, ...

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

sdf-jkl · 2026-01-19T18:52:46Z

I'll take a look at the failing tests

sdf-jkl · 2026-01-19T20:00:18Z

Should be good

I'll create a separate issue for the date_part implementation.

alamb

Thank you so much @sdf-jkl -- this is looking so close. I have one more comment on the API, but after that I think we'll be good.

Let me know if you would prefer that I make these changes directly rather than leaving feedback. I figured you would appreciate the reviews and the back and forth.

datafusion/expr/src/udf.rs

alamb · 2026-01-19T21:31:05Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+    let Expr::ScalarFunction(ScalarFunction { func, args }) = left_expr else {
+        return Ok((None, None));
+    };
+    if !is_literal_or_literal_cast(right_expr) {


I wonder if there is a reason to limit this to literals? It seems like the call to pre_image could handle this (and basically return early if it wasn't a literal).

I think this is still an open question, but it is ok to handle as a follow on PR (aka widen the expressions)

Do you have an example where we could use a non-literal expr on the RHS of a comparison with preimage? I can't come up with one, but if there is one, we could move the expression matching into the preimage impl.
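To make the literal-only restriction concrete, here is a sketch of what a guard like `is_literal_or_literal_cast` checks, written against a toy expression type. The `Expr` enum below is a simplified stand-in invented for illustration; DataFusion's real `Expr` is much richer, and the real helper may differ (for example, this sketch accepts nested casts). `Literal(None)` models a SQL NULL.

```rust
/// Toy expression type (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// `None` models a NULL literal.
    Literal(Option<i64>),
    /// A cast of an inner expression; the target type is omitted here.
    Cast(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

/// True if `e` is a literal, or a (possibly nested) cast of a literal.
fn is_literal_or_literal_cast(e: &Expr) -> bool {
    match e {
        Expr::Literal(_) => true,
        Expr::Cast(inner) => is_literal_or_literal_cast(inner),
        // Columns and compound expressions are rejected up front.
        _ => false,
    }
}
```

Moving this check into the preimage implementation, as discussed in this thread, would let each function decide for itself which right-hand sides it can handle.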

alamb · 2026-01-19T21:32:17Z

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

+                return Ok(None);
+            }
+            match lit_expr {
+                Expr::Literal(ScalarValue::Int32(Some(500)), _) => {


Given this has to check for Expr::Literal anyway, I think the expression simplifier could just pass whatever argument it has in here, rather than only doing so for columns and literals.

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

sdf-jkl · 2026-01-19T22:57:15Z

I definitely appreciate the feedback, and the back and forth. Thanks, I'll work on addressing it.

alamb

I am also working on some tests -- I'll make a PR proposing additional coverage for this PR.

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

alamb · 2026-01-20T02:02:50Z

@sdf-jkl here are some tests and other small suggestions

Add tests for additional cases sdf-jkl/datafusion#1

Add tests for additional cases

sdf-jkl · 2026-01-20T15:53:57Z

Wow, this is much cleaner, thanks!

alamb · 2026-01-20T17:21:36Z

I think this PR needs two more things:

Fix the NULL handling (probably by not calling preimage with NULL constants)
Update the API to have only a single method
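One way to read the NULL-handling request: the caller checks the right-hand side before invoking preimage, so implementations never see a NULL constant. For the ordinary comparison operators, `f(col) <op> NULL` is NULL for every row, so there is no range to rewrite to. The sketch below assumes a single `preimage` method standing in for the updated API; the trait, `Bucket10`, and `preimage_for_literal` are hypothetical names, and the real signature differs.

```rust
/// Simplified stand-in for `PreimageResult` (bounds instead of `Interval`).
#[derive(Debug, PartialEq)]
enum PreimageResult {
    None,
    Range { lower: i64, upper: i64 },
}

/// Hypothetical single-method trait: implementations only ever receive a
/// non-NULL constant, so they need no NULL handling of their own.
trait Preimage {
    fn preimage(&self, c: i64) -> PreimageResult;
}

/// Hypothetical function f(x) = floor(x / 10), preimage(c) = [10c, 10c + 10).
struct Bucket10;

impl Preimage for Bucket10 {
    fn preimage(&self, c: i64) -> PreimageResult {
        match c.checked_mul(10).and_then(|l| l.checked_add(10).map(|u| (l, u))) {
            Some((lower, upper)) => PreimageResult::Range { lower, upper },
            None => PreimageResult::None,
        }
    }
}

/// Caller-side guard; `rhs == None` models a NULL literal on the right-hand side.
fn preimage_for_literal(f: &dyn Preimage, rhs: Option<i64>) -> PreimageResult {
    match rhs {
        // Comparing against NULL yields NULL for every row: skip the call.
        None => PreimageResult::None,
        Some(c) => f.preimage(c),
    }
}
```

Keeping the guard in one place means each function's preimage implementation stays focused on the math of inverting itself.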

(I am trying to keep my review context under control, so trying to focus on getting stuff through before starting more)

sdf-jkl · 2026-01-20T19:02:19Z

Both done. Re-requested a review.

alamb

Thank you very much @sdf-jkl

This looks good to me. I would like to change the signature to use Interval rather than Box<Interval> and there are a few other small comments, but we can also do this as a follow on PR (or I can push some commits to this PR)

Thank you for hanging with this one

FYI @colinmarc -- once we get this in, I think @sdf-jkl plans to implement preimage for date_part. Perhaps you are interested in something similar for date_trunc.

Also, FYI @jonahgao and @xudong963 / @zhuqi-lucas in case you are interested in this PR (the primary use case is improving the handling of date/timestamp predicates).

datafusion/expr/src/preimage.rs

alamb · 2026-01-20T21:18:49Z

datafusion/expr/src/preimage.rs

+    None,
+    /// The expression always evaluates to the specified constant
+    /// given that `expr` is within the interval
+    Range { expr: Expr, interval: Box<Interval> },


Is there any reason to Box this? I think it might be simpler if it was Interval

Clippy suggested boxing it because one enum variant is much larger than the other (the threshold is 200 bytes; None is 0 bytes and Range is at least 240 bytes).

warning: large size difference between variants
  --> datafusion/expr/src/preimage.rs:22:1
   |
22 | / pub enum PreimageResult {
23 | |     /// No preimage exists for the specified value
24 | |     None,
   | |     ---- the second-largest variant carries no data at all
25 | |     /// The expression always evaluates to the specified constant
26 | |     /// given that `expr` is within the interval
27 | |     Range { expr: Expr, interval: Interval },
   | |     ---------------------------------------- the largest variant contains at least 240 bytes
28 | | }
   | |_^ the entire enum is at least 240 bytes
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/rust-1.92.0/index.html#large_enum_variant
   = note: `#[warn(clippy::large_enum_variant)]` on by default
help: consider boxing the large fields or introducing indirection in some other way to reduce the total size of the enum
   |
27 -     Range { expr: Expr, interval: Interval },
27 +     Range { expr: Expr, interval: Box<Interval> },
   |
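The trade-off clippy is pointing at can be observed with `std::mem::size_of`: every value of an enum is as large as its largest variant, so a `None` costs as much stack space as a `Range`. Boxing moves the large payload to the heap and leaves only a pointer in the enum. The types below are made-up stand-ins (a 232-byte array in place of `Interval`, a `u64` in place of `Expr`, so the variant's fields total 240 bytes), not the real DataFusion types.

```rust
use std::mem::size_of;

/// Stand-in for a large payload such as `Interval`; the size is arbitrary.
#[allow(dead_code)]
struct BigPayload([u8; 232]);

/// Payload stored inline: the whole enum is at least as big as `Range`.
#[allow(dead_code)]
enum Unboxed {
    None,
    Range { tag: u64, interval: BigPayload },
}

/// Payload stored behind a pointer: the enum shrinks to a few words.
#[allow(dead_code)]
enum Boxed {
    None,
    Range { tag: u64, interval: Box<BigPayload> },
}
```

The price of `Box<Interval>` is one heap allocation per `Range` result; whether that is worth a smaller enum is the judgment call discussed in this thread.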

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

alamb · 2026-01-20T21:21:53Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+    let Expr::ScalarFunction(ScalarFunction { func, args }) = left_expr else {
+        return Ok((None, None));
+    };
+    if !is_literal_or_literal_cast(right_expr) {


I think this is still an open question, but it is ok to handle as a follow on PR (aka widen the expressions)

alamb · 2026-01-20T21:24:24Z

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

+    if !is_literal_or_literal_cast(right_expr) {
+        return Ok(PreimageResult::None);
+    }
+    if func.signature().volatility != Volatility::Immutable {


Also, for a follow-on PR: I think it would be safe to rewrite stable functions as well (their values don't change during the statement).
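The volatility gate under discussion can be sketched as a small predicate. The three levels below mirror the names DataFusion uses (`Immutable`, `Stable`, `Volatile`), but the enum is redeclared locally and both helper functions are hypothetical: one encodes the rule in this PR (immutable only), the other the suggested follow-on (immutable or stable).

```rust
/// Local mirror of the three function volatility levels.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Volatility {
    Immutable,
    Stable,
    Volatile,
}

/// Rule in this PR: only immutable functions are considered for preimage.
fn preimage_allowed_current(v: Volatility) -> bool {
    v == Volatility::Immutable
}

/// Suggested follow-on: stable functions also return the same value for the
/// whole statement, so the rewrite would hold for every row it is applied to.
fn preimage_allowed_follow_on(v: Volatility) -> bool {
    matches!(v, Volatility::Immutable | Volatility::Stable)
}
```

Volatile functions (for example, random number generators) stay excluded in both versions, since their output can differ from row to row.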

datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

alamb · 2026-01-20T21:27:17Z

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

+        Operator::LtEq => expr.lt(upper),
+        // <expr> = x ==> (<expr> >= lower) and (<expr> < upper)
+        //
+        // <expr> is not distinct from x ==> (<expr> is NULL and x is NULL) or ((<expr> >= lower) and (<expr> < upper))


Suggested change

// <expr> is not distinct from x ==> (<expr> is NULL and x is NULL) or ((<expr> >= lower) and (<expr> < upper))

// <expr> is not distinct from x ==> (<expr> is NULL) or ((<expr> >= lower) and (<expr> < upper))

I am not sure this IS NOT DISTINCT rewrite is correct, as it is rewritten to just the range predicate. If expr is NULL and the literal is non-NULL, the original expression is FALSE, but the rewrite evaluates to NULL (x >= lower AND x < upper), which is not equivalent and violates the “same nullability” expectation for simplified expressions.

@alamb In a WHERE clause, FALSE and NULL might behave similarly (both filter out the row), so it may be safe here?

If we want to keep false:

Operator::IsNotDistinctFrom => {
    // expr IS NOT DISTINCT FROM x => must return FALSE if expr is NULL
    // because we know x is NOT NULL.
    expr.clone().is_not_null().and(
        and(expr.clone().gt_eq(lower), expr.lt(upper))
    )
}

@xudong963 this solves the issue. Thanks!
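The comparison rewrites in this file all follow from a half-open preimage `[lower, upper)` for the constant `x`. For example, `f(col) <= x` keeps exactly the rows with `col < upper`, which matches `Operator::LtEq => expr.lt(upper)` in the diff. The sketch below spells out all six comparison operators for a stand-in function `f(v) = floor(v / 10)` and lets the test check each rewrite by brute force. The `Op` enum and the function are illustrative assumptions; only the `=` and `<=` rules are quoted from the diff, and in this sketch the `<`/`>` rules additionally rely on `f` being non-decreasing.

```rust
#[derive(Debug, Clone, Copy)]
enum Op {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Hypothetical non-decreasing function with preimage(c) = [10c, 10c + 10).
fn f(v: i64) -> i64 {
    v.div_euclid(10)
}

/// Half-open preimage bounds (no overflow handling; fine for small inputs).
fn preimage(c: i64) -> (i64, i64) {
    (c * 10, c * 10 + 10)
}

/// The original predicate `f(v) <op> c`.
fn original(op: Op, v: i64, c: i64) -> bool {
    let y = f(v);
    match op {
        Op::Eq => y == c,
        Op::NotEq => y != c,
        Op::Lt => y < c,
        Op::LtEq => y <= c,
        Op::Gt => y > c,
        Op::GtEq => y >= c,
    }
}

/// The rewritten predicate, which looks only at `v` and the bounds.
fn rewritten(op: Op, v: i64, c: i64) -> bool {
    let (lower, upper) = preimage(c);
    match op {
        Op::Eq => v >= lower && v < upper,
        Op::NotEq => v < lower || v >= upper,
        Op::Lt => v < lower,
        Op::LtEq => v < upper,
        Op::Gt => v >= upper,
        Op::GtEq => v >= lower,
    }
}
```

Because the rewritten forms compare the raw column against constants, they can be checked against per-file or per-row-group min/max statistics, which is what makes them useful for pruning.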

xudong963 · 2026-01-21T02:55:07Z

I'll have a look at the PR today

xudong963 · 2026-01-21T13:36:28Z

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

+/// [preimage]: https://en.wikipedia.org/wiki/Image_(mathematics)#Inverse_image
+///
+pub(super) fn rewrite_with_preimage(
+    _info: &SimplifyContext,


Do we need this arg?

@alamb mentioned that we should keep it in #18789 (comment), but it was a while ago.

xudong963 · 2026-01-21T13:49:00Z

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

+        Operator::LtEq => expr.lt(upper),
+        // <expr> = x ==> (<expr> >= lower) and (<expr> < upper)
+        //
+        // <expr> is not distinct from x ==> (<expr> is NULL and x is NULL) or ((<expr> >= lower) and (<expr> < upper))


@alamb In a WHERE clause, FALSE and NULL might behave similarly (both filter out the row), so it may be safe here?

If we want to keep false:

Operator::IsNotDistinctFrom => {
    // expr IS NOT DISTINCT FROM x => must return FALSE if expr is NULL
    // because we know x is NOT NULL.
    expr.clone().is_not_null().and(
        and(expr.clone().gt_eq(lower), expr.lt(upper))
    )
}
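The NULL disagreement is easiest to see with SQL's three-valued logic spelled out. The sketch below models NULL as `None`, uses a hypothetical bucketing function `f(v) = floor(v / 10)` with preimage `[10c, 10c + 10)`, and compares the original `f(v) IS NOT DISTINCT FROM c` (for a non-NULL `c`) against the range-only rewrite and the rewrite with `IS NOT NULL` prepended. All names here are illustrative, not DataFusion code.

```rust
/// SQL three-valued boolean: `None` is NULL.
type SqlBool = Option<bool>;

/// SQL (Kleene) AND: FALSE wins over NULL, NULL wins over TRUE.
fn and3(a: SqlBool, b: SqlBool) -> SqlBool {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Hypothetical function: NULL in, NULL out; otherwise floor(v / 10).
fn f(v: Option<i64>) -> Option<i64> {
    v.map(|v| v.div_euclid(10))
}

/// `f(v) IS NOT DISTINCT FROM c` with non-NULL `c`: never NULL.
fn original(v: Option<i64>, c: i64) -> SqlBool {
    Some(f(v) == Some(c))
}

/// Range-only rewrite `v >= lower AND v < upper`: NULL when `v` is NULL.
fn range_only(v: Option<i64>, c: i64) -> SqlBool {
    let (lower, upper) = (c * 10, c * 10 + 10);
    and3(v.map(|v| v >= lower), v.map(|v| v < upper))
}

/// Fixed rewrite `v IS NOT NULL AND (v >= lower AND v < upper)`.
fn with_not_null(v: Option<i64>, c: i64) -> SqlBool {
    and3(Some(v.is_some()), range_only(v, c))
}
```

For non-NULL `v` all three agree; for NULL `v` the original and the fixed rewrite give FALSE while the range-only rewrite gives NULL. In a WHERE clause both outcomes drop the row, but the difference becomes visible wherever the predicate's value is used directly, for example under NOT, since NOT FALSE is TRUE while NOT NULL is NULL.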

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Add udf_preimage logic

d94889a

github-actions bot added the logical-expr (Logical plan and expressions) and optimizer (Optimizer rules) labels Jan 9, 2026

sdf-jkl and others added 7 commits January 9, 2026 20:37

Cargo fmt

4aa7f4e

Fix err in rewrite_with_preimage

2329c12

Rewrite the preimage_in_comparison

7ac8325

cargo fmt

7a3e8b3

Fix ci

fbd5dcc

Fix GtEq, Lt logic

d920735

Merge branch 'main' into smaller-preimage-pr-1

5ffb704

sdf-jkl mentioned this pull request Jan 10, 2026

Support "pre-image" for pruning predicate evaluation #18789

Closed

8 tasks

sdf-jkl mentioned this pull request Jan 18, 2026

Optimize the evaluation of DATE_TRUNC(<col>) == <constant>) when pushed down #18319

Open

alamb reviewed Jan 18, 2026

sdf-jkl force-pushed the smaller-preimage-pr-1 branch from f308662 to 5ffb704 January 18, 2026 18:21

github-actions bot removed the documentation (Improvements or additions to documentation) label Jan 18, 2026

alamb changed the title ~~Support "pre-image" for pruning predicate evaluation #1~~ Support API for "pre-image" for pruning predicate evaluation Jan 19, 2026

Make test field nullable

9f845e7

alamb reviewed Jan 19, 2026

alamb reviewed Jan 20, 2026

datafusion/optimizer/src/simplify_expressions/udf_preimage.rs

Add tests for additional cases

510b5bc

alamb mentioned this pull request Jan 20, 2026

Add tests for additional cases sdf-jkl/datafusion#1

Merged

alamb added 2 commits January 19, 2026 20:58

simplify

b9f5c2c

Simplfy

ec8cc7e

Merge pull request #1 from alamb/alamb/more_tests

47a18dc

Add tests for additional cases

sdf-jkl added 2 commits January 20, 2026 11:40

Add rhs Null guard

01b254b

Fix comment

d8b4f0f

sdf-jkl added 4 commits January 20, 2026 12:50

Update API

116d6e2

clippy

c0ed63c

Fix null handling unit test

5856150

Fix null handling test

c53a9fc

sdf-jkl requested a review from alamb January 20, 2026 19:01

alamb approved these changes Jan 20, 2026

xudong963 reviewed Jan 21, 2026

sdf-jkl and others added 4 commits January 21, 2026 09:27

Update datafusion/expr/src/preimage.rs

9b32843

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Fix docs

53f72ed

Fix comment

ba5be8a

Fix is_not_distinct_from rewrite

46a941f
