Skip to content

Conversation

@akravchukdremio
Copy link

@akravchukdremio akravchukdremio commented Jan 15, 2026

Rationale for this change

Fixes #48866. The Gandiva precompiled time functions castTIMESTAMP_utf8 and castTIME_utf8 currently reject timestamp and time string literals with more than 3 subsecond digits (beyond millisecond precision), throwing an "Invalid millis" error. This behavior is inconsistent with other implementations.

What changes are included in this PR?

  • Fixed castTIMESTAMP_utf8 and castTIME_utf8 functions to truncate subseconds beyond 3 digits instead of throwing an error
  • Updated tests. Replaced error-expecting tests with truncation verification tests and added edge cases

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the awaiting review Awaiting review label Jan 15, 2026
@github-actions
Copy link

⚠️ GitHub issue #48866 has been automatically assigned in GitHub to PR creator.

@xxlaykxx
Copy link
Contributor

@kou Could you plz take a look?

Comment on lines 750 to 754
// Truncate to 3 digits (milliseconds precision) if more digits are provided
while (sub_seconds_len > 3) {
ts_fields[TimeFields::kSubSeconds] /= 10;
sub_seconds_len--;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we factor out this common code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review, moved common code in the separate inline function

@kou
Copy link
Member

kou commented Jan 22, 2026

@xxlaykxx Could you also review this? We don't have enough maintainers for Gandiva.

@kou
Copy link
Member

kou commented Jan 22, 2026

@akravchukdremio Could you rebase on main to fix CI failures?

@akravchukdremio
Copy link
Author

@akravchukdremio Could you rebase on main to fix CI failures?

Sure, rebased PR

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 23, 2026
@kou
Copy link
Member

kou commented Jan 25, 2026

@lriggs Could you review this?


// Normalize sub-seconds value to milliseconds precision (3 digits).
// Truncates if more than 3 digits are provided, pads with zeros if fewer than 3 digits
FORCE_INLINE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use ARROW_FORCE_INLINE instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done

// Normalize sub-seconds value to milliseconds precision (3 digits).
// Truncates if more than 3 digits are provided, pads with zeros if fewer than 3 digits
FORCE_INLINE
int32_t normalize_subseconds_to_millis(int32_t subseconds, int32_t num_digits) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define this in anonymous namespace? (If we can't use anonymous namespace here, can we use static to hide this symbol?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, make sense. We couldn't define here anonymous namespace since this file is put under extern "C", so I've used static

@akravchukdremio akravchukdremio force-pushed the GH-48866 branch 3 times, most recently from aa201ec to d233f19 Compare January 26, 2026 14:52
@akravchukdremio
Copy link
Author

Fixed such error:

/arrow/cpp/src/gandiva/precompiled/time.cc:573:16: error: 'always_inline' function might not be inlinable unless also declared 'inline' [-Werror=attributes]
  573 | static int32_t normalize_subseconds_to_millis(int32_t subseconds, int32_t num_digits) {
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

by adding inline keyword for normalize_subseconds_to_millis function in this commit: d233f19


// Normalize sub-seconds value to milliseconds precision (3 digits).
// Truncates if more than 3 digits are provided, pads with zeros if fewer than 3 digits
ARROW_FORCE_INLINE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might use the regular 'inline'. I don't see ARROW_FORCE_INLINE really being used anywhere and I don't feel like I know enough that this function should be forced inline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that 'FORCE_INLINE' is used a lot, so maybe that is ok.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see ARROW_FORCE_INLINE really being used anywhere

That's right, ARROW_FORCE_INLINE is used only in this file: https://github.com/akravchukdremio/arrow/blob/GH-48866/cpp/src/arrow/compute/kernels/gather_internal.h#L55. Initially I've used FORCE_INLINE, which is used in many places as far as I can see (around 10 files in gandiva), but then I've changed it to ARROW_FORCE_INLINE to address this comment from committer: #48867 (comment).

I don't feel like I know enough that this function should be forced inline

This is just a helper function, initially I've done my changes directly in castTIMESTAMP_utf8 and castTIME_utf8 functions: b377b59, but then there was a comment to factor out a common code: #48867 (comment) - so I decided to create small function for that.

Why I've used FORCE_INLINE initially? Because there was already a function in this file with such macros, see here: https://github.com/akravchukdremio/arrow/blob/GH-48866/cpp/src/gandiva/precompiled/time.cc#L564-L568, and I've decided to use it too to make our code execution faster (gandiva is aiming for it, AFAIK), since invoking function is an action to put new variables in the call stack. But I agree, that probably just inline keyword would be enough, function is not too complex, so compiler seems anyway will inline it by just having inline keyword.

@kou what do you think, is it fine to just remove ARROW_FORCE_INLINE and use only inline keyword instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that 'FORCE_INLINE' is used a lot, so maybe that is ok.

It was initially there, then I've changed it to ARROW_FORCE_INLINE to address another review comment: #48867 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't need force inline, inline is OK.

For FORCE_INLINE and ARROW_FORCE_INLINE: I want to remove FORCE_INLINE and GDV_FORCE_INLINE and use ARROW_FORCE_INLINE instead of them. We don't need multiple similar definitions in the same project.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou Got it, thanks for explanation. I've removed ARROW_FORCE_INLINE since we don't need force inline

EXPECT_EQ(context.get_error(),
"Invalid millis for timestamp value 2000-01-01 00:00:00.1000");
context.Reset();
// Test truncation of subseconds to 3 digits (milliseconds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some test cases with less than three digits?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lriggs
Copy link
Contributor

lriggs commented Jan 27, 2026

Looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[C++][Gandiva] castTIMESTAMP_utf8 and castTIME_utf8 should truncate subseconds beyond milliseconds, not reject them

4 participants