2 changes: 2 additions & 0 deletions NEWS.md
@@ -56,6 +56,8 @@

5. The data.table test suite is a bit more robust to lacking UTF-8 support via a new `requires_utf8` argument to `test()` to skip tests when UTF-8 support is not available, [#7336](https://github.com/Rdatatable/data.table/issues/7336). Thanks @MichaelChirico for the suggestion and @ben-schwen for the implementation.

6. Grouping operations whose `j` is a single-element constant `list()` expression, e.g. `DT[, .(1), by=x]`, are now optimized to avoid per-group allocation overhead, [#712](https://github.com/Rdatatable/data.table/issues/712). Thanks @macrakis for the report and @ben-schwen for the fix.

## data.table [v1.18.0](https://github.com/Rdatatable/data.table/milestone/37?closed=1) 23 December 2025

### BREAKING CHANGE
16 changes: 16 additions & 0 deletions R/data.table.R
@@ -189,6 +189,16 @@ replace_dot_alias = function(e) {
list(jsub=jsub, jvnames=jvnames, funi=funi+1L)
}

# Optimize constant list() expressions to avoid per-group allocation overhead
# e.g., list(1) -> 1 or list(x) -> x, when the single argument passes is_constantish(), #712
# return NULL for no optimization possible
.optimize_constant_list = function(jsub) {
  if (!jsub %iscall% "list") return(NULL)
  if (length(jsub) != 2L) return(NULL)
  if (!is_constantish(jsub[[2L]])) return(NULL)
  jsub[[2L]]
}

# Optimize .SD subsetting patterns like .SD[1], head(.SD), first(.SD)
# return NULL for no optimization possible
.optimize_sd_subset = function(jsub, sdvars, SDenv, envir) {
@@ -505,6 +515,12 @@ replace_dot_alias = function(e) {
return(list(GForce=FALSE, jsub=jsub, jvnames=jvnames))
}

# Step 0: Unwrap constant list() to avoid per-group allocation, #712
if (!is.null(unwrapped_consts <- .optimize_constant_list(jsub))) {
  if (verbose) catf("Optimized j from '%s' to bare constant '%s'\n", deparse(jsub), deparse(unwrapped_consts, width.cutoff=200L, nlines=1L))
  jsub = unwrapped_consts
}

# Step 1: Apply lapply(.SD) optimization
lapply_result = .optimize_lapply(jsub, jvnames, sdvars, SDenv, verbose, envir)
jsub = lapply_result$jsub
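
For readers who want to try the Step 0 unwrapping outside of data.table, here is a minimal base-R sketch of the same checks applied to quoted expressions. The names `is_constantish_sketch` and `unwrap_constant_list` are hypothetical. The stand-in predicate only assumes that any non-call (a literal or a bare symbol) qualifies, which may be narrower or broader than data.table's internal `is_constantish()`.

```r
# Hypothetical stand-in for data.table's internal is_constantish():
# assumption for illustration only, any non-call counts as "constantish".
is_constantish_sketch = function(q) !is.call(q)

# Mirrors the three guards of .optimize_constant_list() using base R only.
unwrap_constant_list = function(jsub) {
  # Guard 1: jsub must be a call to list()
  is_list_call = is.call(jsub) && is.name(jsub[[1L]]) &&
    as.character(jsub[[1L]]) == "list"
  if (!is_list_call) return(NULL)
  # Guard 2: exactly one argument (the call head plus one element)
  if (length(jsub) != 2L) return(NULL)
  # Guard 3: the argument must not itself be a call
  if (!is_constantish_sketch(jsub[[2L]])) return(NULL)
  jsub[[2L]]
}

unwrap_constant_list(quote(list(1)))       # the literal 1
unwrap_constant_list(quote(list(x)))       # the symbol x
unwrap_constant_list(quote(list(1, 2)))    # NULL: more than one element
unwrap_constant_list(quote(list(sum(x))))  # NULL: argument is a call
unwrap_constant_list(quote(c(1)))          # NULL: not a list() call
```

Returning `NULL` for "no optimization" lets the caller keep `jsub` untouched with a single `is.null()` check, which is the pattern the hunk above uses.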
7 changes: 7 additions & 0 deletions inst/tests/tests.Rraw
@@ -21484,3 +21484,10 @@ dt = data.table(a=1:4, b=1:2)
test(2362.51, optimize=0:2, dt[, c(list()), b, verbose=TRUE], data.table(b=integer(0L)), output="GForce FALSE")
test(2362.52, optimize=0:2, dt[, c(lapply(.SD, sum), list()), b, verbose=TRUE], output=out)
test(2362.53, optimize=0:2, dt[, list(lapply(.SD, sum), list()), b, verbose=TRUE], output="GForce FALSE")

# dt[, j=list(var), by] is slower than dt[, j=var, by], #712
dt = data.table(x=rep(1:3, 2L), y=1L)
test(2363.1, dt[, .(1), by=x, verbose=TRUE], dt[, 1, by=x], output="Optimized j from.*to bare constant")
dt = data.table(x=1:5, key="x")
test(2363.2, dt[dt, list(1), by=.EACHI, verbose=TRUE], dt[dt, 1, by=.EACHI], output="Optimized j from.*to bare constant")
test(2363.3, dt[dt, list(x), by=.EACHI, verbose=TRUE], dt[dt, x, by=.EACHI], output="Optimized j from.*to bare constant")
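
Test 2363.1 relies on the wrapped and bare forms of `j` returning the same result, so the optimization should only affect speed. A small sketch of that equivalence check outside the test harness follows. It assumes the data.table package is installed and skips the comparison otherwise; `results_match` is a name introduced here for illustration.

```r
# Assumption: data.table is installed. If it is not, results_match stays NA.
results_match = NA
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt = data.table(x = rep(1:3, 2L), y = 1L)
  wrapped = dt[, .(1), by = x]  # j wrapped in list(), the form discussed in #712
  bare = dt[, 1, by = x]        # bare constant j
  # Both forms should produce the same grouped table
  results_match = isTRUE(all.equal(wrapped, bare))
}
results_match
```

Adding `verbose=TRUE` to the wrapped call on a build that includes this change should print the "Optimized j from ... to bare constant" message that the tests match on.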