
@andygrove commented Jan 16, 2026

Summary

  • Adds native Comet support for Spark's date_format function
  • Uses DataFusion's to_char function for the underlying implementation
  • Supports a whitelist of common format strings that can be reliably mapped between Spark SimpleDateFormat patterns and strftime patterns

Supported formats include:

  • Date formats: yyyy-MM-dd, yyyy/MM/dd, yyyyMMdd, yyyyMM
  • Time formats: HH:mm:ss, HH:mm, HH, mm, ss
  • DateTime formats: yyyy-MM-dd HH:mm:ss, yyyy/MM/dd HH:mm:ss
  • Day/month names: EEEE, EEE, MMMM, MMM
  • 12-hour time: hh:mm:ss a, hh:mm a, h:mm a
  • ISO format: yyyy-MM-dd'T'HH:mm:ss
  • Single components: yyyy, yy, MM, dd

Unsupported format strings will fall back to Spark execution.
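The whitelist approach can be sketched as a plain exact-match lookup from a Spark pattern to a strftime-style pattern. This is a hypothetical Rust illustration, not the code in this PR. The function name `spark_to_strftime` is made up, and the target strings assume chrono-style specifiers, the format family that DataFusion's `to_char` accepts. Any pattern not in the table returns `None`, which stands in for falling back to Spark.

```rust
/// Hypothetical whitelist mapping Spark `date_format` patterns to
/// chrono-style strftime patterns (illustrative only; the PR's actual
/// table may differ). `None` means "not whitelisted, fall back to Spark".
fn spark_to_strftime(spark_pattern: &str) -> Option<&'static str> {
    let mapped = match spark_pattern {
        // Date formats
        "yyyy-MM-dd" => "%Y-%m-%d",
        "yyyy/MM/dd" => "%Y/%m/%d",
        "yyyyMMdd" => "%Y%m%d",
        "yyyyMM" => "%Y%m",
        // Time formats
        "HH:mm:ss" => "%H:%M:%S",
        "HH:mm" => "%H:%M",
        "HH" => "%H",
        "mm" => "%M",
        "ss" => "%S",
        // Date-time formats
        "yyyy-MM-dd HH:mm:ss" => "%Y-%m-%d %H:%M:%S",
        "yyyy/MM/dd HH:mm:ss" => "%Y/%m/%d %H:%M:%S",
        // Day and month names
        "EEEE" => "%A",
        "EEE" => "%a",
        "MMMM" => "%B",
        "MMM" => "%b",
        // 12-hour time; `%-I` drops the leading zero to mirror Spark's `h`
        "hh:mm:ss a" => "%I:%M:%S %p",
        "hh:mm a" => "%I:%M %p",
        "h:mm a" => "%-I:%M %p",
        // ISO 8601 with a quoted literal 'T' separator in the Spark pattern
        "yyyy-MM-dd'T'HH:mm:ss" => "%Y-%m-%dT%H:%M:%S",
        // Single components
        "yyyy" => "%Y",
        "yy" => "%y",
        "MM" => "%m",
        "dd" => "%d",
        // Anything else is unsupported
        _ => return None,
    };
    Some(mapped)
}

fn main() {
    for p in ["yyyy-MM-dd", "h:mm a", "yyyy-MM-dd'T'HH:mm:ss", "D"] {
        match spark_to_strftime(p) {
            Some(f) => println!("{p} -> {f}"),
            None => println!("{p} -> unsupported, fall back to Spark"),
        }
    }
}
```

An exact-match table, rather than a token-by-token translator, keeps the compatibility surface small: each entry can be checked against Spark individually, and a pattern using other letters (such as `D`, day-of-year) falls through to Spark instead of producing subtly different output.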

Timezone Support

Currently, only the UTC timezone is fully compatible. Non-UTC timezones are marked as Incompatible and fall back to Spark by default. Users can opt in to Comet execution for non-UTC timezones by setting spark.comet.expr.DateFormatClass.allowIncompatible=true, but results may then differ from Spark's.

See #3202 for tracking full timezone support.
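The timezone rule amounts to a small planning decision. The sketch below is hypothetical: `Plan` and `plan_date_format` are illustrative names, not Comet APIs. It treats only the literal ID `"UTC"` as compatible (a real session timezone might also be spelled `Etc/UTC` or `+00:00`), and it assumes an unsupported format string falls back regardless of the allowIncompatible setting.

```rust
/// Hypothetical planning outcome for a `date_format` call.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Run natively; results expected to match Spark.
    Native,
    /// Run natively because the user opted in; results may differ.
    NativeIncompatible,
    /// Let Spark evaluate the expression.
    FallBackToSpark,
}

/// Assumption: the format whitelist is checked before the timezone,
/// and only the exact string "UTC" counts as a compatible timezone.
fn plan_date_format(format_supported: bool, session_tz: &str, allow_incompatible: bool) -> Plan {
    if !format_supported {
        return Plan::FallBackToSpark;
    }
    if session_tz == "UTC" {
        Plan::Native
    } else if allow_incompatible {
        Plan::NativeIncompatible
    } else {
        Plan::FallBackToSpark
    }
}

fn main() {
    println!("{:?}", plan_date_format(true, "UTC", false)); // Native
    println!("{:?}", plan_date_format(true, "America/Los_Angeles", false)); // FallBackToSpark
    println!("{:?}", plan_date_format(true, "America/Los_Angeles", true)); // NativeIncompatible
}
```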

Test Plan

  • Added comprehensive tests in CometTemporalExpressionSuite:
    • Tests all supported format strings with timestamp columns
    • Tests with literal timestamps (constant folding disabled)
    • Tests null handling
    • Tests fallback to Spark for unsupported formats
    • Tests non-UTC timezone fallback behavior
    • Tests allowIncompatible config for non-UTC timezones

Note: This PR was generated with AI assistance.

Closes #3088

andygrove and others added 2 commits January 16, 2026 07:59
Adds native Comet support for Spark's `date_format` function using
DataFusion's `to_char` function. Supports a whitelist of common format
strings that can be reliably mapped between Spark SimpleDateFormat
patterns and strftime patterns.

Supported formats include:
- yyyy-MM-dd, yyyy/MM/dd (date formats)
- HH:mm:ss, HH:mm (time formats)
- yyyy-MM-dd HH:mm:ss (datetime formats)
- EEEE, EEE, MMMM, MMM (day/month names)
- hh:mm:ss a, hh:mm a, h:mm a (12-hour time)
- yyyy-MM-dd'T'HH:mm:ss (ISO format)

Closes apache#3088

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds tests to verify date_format behavior with non-UTC timezones:
- Non-UTC timezones are marked as Incompatible and fall back to Spark
- Users can enable with spark.comet.expr.DateFormatClass.allowIncompatible=true

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove changed the title from "feat: add support for date_format expression" to "feat: add partial support for date_format expression" on Jan 16, 2026
@codecov-commenter commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 86.20690% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.00%. Comparing base (f09f8af) to head (8d8e544).
⚠️ Report is 851 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...c/main/scala/org/apache/comet/serde/datetime.scala    85.96%    4 Missing and 4 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3201      +/-   ##
============================================
+ Coverage     56.12%   60.00%   +3.88%     
- Complexity      976     1429     +453     
============================================
  Files           119      170      +51     
  Lines         11743    15746    +4003     
  Branches       2251     2603     +352     
============================================
+ Hits           6591     9449    +2858     
- Misses         4012     4977     +965     
- Partials       1140     1320     +180     


@andygrove requested a review from wForget on January 16, 2026 17:59

Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: date_format_class

2 participants