-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48866: [C++][Gandiva] Truncate subseconds beyond milliseconds in castTIMESTAMP_utf8 and castTIME_utf8
#48867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
|
@kou Could you plz take a look? |
cpp/src/gandiva/precompiled/time.cc
Outdated
| // Truncate to 3 digits (milliseconds precision) if more digits are provided | ||
| while (sub_seconds_len > 3) { | ||
| ts_fields[TimeFields::kSubSeconds] /= 10; | ||
| sub_seconds_len--; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we factor out this common code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review, moved common code in the separate inline function
|
@xxlaykxx Could you also review this? We don't have enough maintainers for Gandiva. |
|
@akravchukdremio Could you rebase on main to fix CI failures? |
5aac113 to
3d7ef89
Compare
Sure, rebased PR |
3d7ef89 to
b818113
Compare
|
@lriggs Could you review this? |
cpp/src/gandiva/precompiled/time.cc
Outdated
|
|
||
| // Normalize sub-seconds value to milliseconds precision (3 digits). | ||
| // Truncates if more than 3 digits are provided, pads with zeros if fewer than 3 digits | ||
| FORCE_INLINE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use ARROW_FORCE_INLINE instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done
cpp/src/gandiva/precompiled/time.cc
Outdated
| // Normalize sub-seconds value to milliseconds precision (3 digits). | ||
| // Truncates if more than 3 digits are provided, pads with zeros if fewer than 3 digits | ||
| FORCE_INLINE | ||
| int32_t normalize_subseconds_to_millis(int32_t subseconds, int32_t num_digits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define this in anonymous namespace? (If we can't use anonymous namespace here, can we use static to hide this symbol?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, make sense. We couldn't define here anonymous namespace since this file is put under extern "C", so I've used static
aa201ec to
d233f19
Compare
|
Fixed such error: by adding |
cpp/src/gandiva/precompiled/time.cc
Outdated
|
|
||
| // Normalize sub-seconds value to milliseconds precision (3 digits). | ||
| // Truncates if more than 3 digits are provided, pads with zeros if fewer than 3 digits | ||
| ARROW_FORCE_INLINE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might use the regular 'inline'. I don't see ARROW_FORCE_INLINE really being used anywhere and I don't feel like I know enough that this function should be forced inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that 'FORCE_INLINE' is used a lot, so maybe that is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see ARROW_FORCE_INLINE really being used anywhere
That's right, ARROW_FORCE_INLINE is used only in this file: https://github.com/akravchukdremio/arrow/blob/GH-48866/cpp/src/arrow/compute/kernels/gather_internal.h#L55. Initially I've used FORCE_INLINE, which is used in many places as far as I can see (around 10 files in gandiva), but then I've changed it to ARROW_FORCE_INLINE to address this comment from committer: #48867 (comment).
I don't feel like I know enough that this function should be forced inline
This is just a helper function, initially I've done my changes directly in castTIMESTAMP_utf8 and castTIME_utf8 functions: b377b59, but then there was a comment to factor out a common code: #48867 (comment) - so I decided to create small function for that.
Why I've used FORCE_INLINE initially? Because there was already a function in this file with such macros, see here: https://github.com/akravchukdremio/arrow/blob/GH-48866/cpp/src/gandiva/precompiled/time.cc#L564-L568, and I've decided to use it too to make our code execution faster (gandiva is aiming for it, AFAIK), since invoking function is an action to put new variables in the call stack. But I agree, that probably just inline keyword would be enough, function is not too complex, so compiler seems anyway will inline it by just having inline keyword.
@kou what do you think, is it fine to just remove ARROW_FORCE_INLINE and use only inline keyword instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that 'FORCE_INLINE' is used a lot, so maybe that is ok.
It was initially there, then I've changed it to ARROW_FORCE_INLINE to address another review comment: #48867 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't need force inline, inline is OK.
For FORCE_INLINE and ARROW_FORCE_INLINE: I want to remove FORCE_INLINE and GDV_FORCE_INLINE and use ARROW_FORCE_INLINE instead of them. We don't need multiple similar definitions in the same project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kou Got it, thanks for explanation. I've removed ARROW_FORCE_INLINE since we don't need force inline
| EXPECT_EQ(context.get_error(), | ||
| "Invalid millis for timestamp value 2000-01-01 00:00:00.1000"); | ||
| context.Reset(); | ||
| // Test truncation of subseconds to 3 digits (milliseconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some test cases with less than three digits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have such test cases, see here: https://github.com/akravchukdremio/arrow/blob/GH-48866/cpp/src/gandiva/precompiled/time_test.cc#L100-L104 and https://github.com/akravchukdremio/arrow/blob/GH-48866/cpp/src/gandiva/precompiled/time_test.cc#L154-L158. Seems like enough, but if you think we need more - I can add them
|
Looks good. |
…s in `castTIMESTAMP_utf8` and `castTIME_utf8`
…ds_to_millis` by static keywoard
d233f19 to
0c431a3
Compare
0c431a3 to
58c988d
Compare
Rationale for this change
Fixes #48866. The Gandiva precompiled time functions
castTIMESTAMP_utf8andcastTIME_utf8currently reject timestamp and time string literals with more than 3 subsecond digits (beyond millisecond precision), throwing an "Invalid millis" error. This behavior is inconsistent with other implementations.What changes are included in this PR?
castTIMESTAMP_utf8andcastTIME_utf8functions to truncate subseconds beyond 3 digits instead of throwing an errorAre these changes tested?
Yes
Are there any user-facing changes?
No
castTIMESTAMP_utf8andcastTIME_utf8should truncate subseconds beyond milliseconds, not reject them #48866