HubertKrawczyk commented Oct 31, 2025

Removed the old hand-written optimization in LibSpoofPrimitives.vectOuterMultAdd. The method now relies only on LibMatrixMult.vectMultiplyAdd, whose vectorized implementation is faster.
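For readers unfamiliar with the primitive, here is a minimal sketch of the general pattern: an outer-product multiply-add expressed as one scaled vector add per output row. The class and method names (OuterMultAddSketch, scaledAdd, outerMultAdd) and their signatures are hypothetical stand-ins chosen for illustration. This is not the SystemDS code, and the scalar loop below only marks the place where a vectorized kernel would be called.

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical names, not the SystemDS implementation.
final class OuterMultAddSketch {

    // c[ci .. ci+len) += aval * b[bi .. bi+len)
    // Stand-in for a vectorized multiply-add kernel (plain scalar loop here).
    static void scaledAdd(double aval, double[] b, double[] c, int bi, int ci, int len) {
        for (int j = 0; j < len; j++) {
            c[ci + j] += aval * b[bi + j];
        }
    }

    // C += a (outer) b, where C is a len1 x len2 row-major block starting at ci.
    // Each output row i is updated by a single scaledAdd call with scalar a[ai + i].
    static void outerMultAdd(double[] a, double[] b, double[] c,
                             int ai, int bi, int ci, int len1, int len2) {
        for (int i = 0; i < len1; i++) {
            scaledAdd(a[ai + i], b, c, bi, ci + i * len2, len2);
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2};
        double[] b = {3, 4, 5};
        double[] c = new double[6]; // 2 x 3 result, initially zero
        outerMultAdd(a, b, c, 0, 0, 0, 2, 3);
        System.out.println(Arrays.toString(c)); // prints [3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
    }
}
```

Delegating every row to one kernel means any improvement to that kernel benefits the outer-product path as well, which matches the motivation stated above.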

Benchmark (CPU: Ryzen 7600, 32GB RAM):
I modified the test script src/test/scripts/functions/codegen/rowAggPattern26.dml to run the multiplication 500 times on larger data:

X = matrix(seq(1,3000000), 60000, 50);
P = matrix(seq(1,600000), 60000, 10)
while(FALSE){}

for (i in 1:500) {
R = t(P) %*% X;
}
write(R, $1)

I then ran public void testCodegenRowAgg26SP() from RowAggTmplTest.java.

Statistics before this change:

SystemDS Statistics:
Total elapsed time:		10,088 sec.
Total compilation time:		0,692 sec.
Total execution time:		9,396 sec.
Number of compiled Spark inst:	8.
Number of executed Spark inst:	507.
Cache hits (Mem/Li/WB/FS/HDFS):	1/0/0/0/0.
Cache writes (Li/WB/FS/HDFS):	0/1/0/0.
Cache times (ACQr/m, RLS, EXP):	0,302/0,003/0,004/0,000 sec.
HOP DAGs recompiled (PRED, SB):	0/503.
HOP DAGs recompile time:	0,277 sec.
Functions recompiled:		2.
Functions recompile time:	0,002 sec.
Codegen compile (DAG,CP,JC):	516/502/1.
Codegen enum (ALLt/p,EVALt/p):	516/0/0/0.
Codegen compile times (DAG,JC):	0,333/0,153 sec.
Codegen enum plan cache hits:	0/0.
Codegen op plan cache hits:	501/502.
Spark ctx create time (lazy):	0,672 sec.
Spark trans counts (par,bc,col):1/500/1.
Spark trans times (par,bc,col):	0,000/0,348/0,302 secs.
Total JIT compile time:		11.256 sec.
Total JVM GC count:		13.
Total JVM GC time:		0.047 sec.
Heavy hitter instructions:
  #  Instruction     Time(s)  Count
  1  sp_spoofRATMP1    7,636    500
  2  sp_chkpoint       0,847      2
  3  sp_write          0,383      1
  4  sp_seq            0,184      2
  5  sp_rshape         0,035      2
  6  createvar         0,015    506
  7  mvvar             0,007    500
  8  rmvar             0,004      7
  9  assignvar         0,000      4
 10  cpvar             0,000      4

After this change:

SystemDS Statistics:
Total elapsed time:		8,618 sec.
Total compilation time:		0,685 sec.
Total execution time:		7,933 sec.
Number of compiled Spark inst:	8.
Number of executed Spark inst:	507.
Cache hits (Mem/Li/WB/FS/HDFS):	1/0/0/0/0.
Cache writes (Li/WB/FS/HDFS):	0/1/0/0.
Cache times (ACQr/m, RLS, EXP):	0,349/0,003/0,004/0,000 sec.
HOP DAGs recompiled (PRED, SB):	0/503.
HOP DAGs recompile time:	0,269 sec.
Functions recompiled:		2.
Functions recompile time:	0,002 sec.
Codegen compile (DAG,CP,JC):	516/502/1.
Codegen enum (ALLt/p,EVALt/p):	516/0/0/0.
Codegen compile times (DAG,JC):	0,332/0,150 sec.
Codegen enum plan cache hits:	0/0.
Codegen op plan cache hits:	501/502.
Spark ctx create time (lazy):	0,686 sec.
Spark trans counts (par,bc,col):1/500/1.
Spark trans times (par,bc,col):	0,000/0,394/0,349 secs.
Total JIT compile time:		11.18 sec.
Total JVM GC count:		13.
Total JVM GC time:		0.047 sec.
Heavy hitter instructions:
  #  Instruction     Time(s)  Count
  1  sp_spoofRATMP1    6,216    500
  2  sp_chkpoint       0,849      2
  3  sp_write          0,362      1
  4  sp_seq            0,172      2
  5  sp_rshape         0,031      2
  6  createvar         0,014    506
  7  mvvar             0,007    500
  8  rmvar             0,004      7
  9  assignvar         0,000      4
 10  cpvar             0,000      4
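To make the before/after comparison explicit, the following small sketch computes the relative time reduction from the figures reported in the two statistics blocks. The class and method names are hypothetical, and the inputs are simply the numbers copied from the logs above (decimal commas written as decimal points).

```java
import java.util.Locale;

// Illustrative arithmetic on the reported timings; names are hypothetical.
final class SpeedupSketch {

    // Fraction of time saved: (before - after) / before
    static double relativeReduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // sp_spoofRATMP1: 7.636 s before, 6.216 s after
        System.out.printf(Locale.ROOT, "sp_spoofRATMP1: %.1f%% less time%n",
                100 * relativeReduction(7.636, 6.216));   // prints 18.6% less time
        // Total elapsed time: 10.088 s before, 8.618 s after
        System.out.printf(Locale.ROOT, "total elapsed: %.1f%% less time%n",
                100 * relativeReduction(10.088, 8.618));  // prints 14.6% less time
    }
}
```

These are single-run numbers from one machine, so the percentages are indicative rather than a measured distribution.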

@mboehm7

github-project-automation bot moved this to In Progress in SystemDS PR Queue Oct 31, 2025
HubertKrawczyk changed the title from "vectOuterMultAdd use optimized LibMatrixMult.vectMultiplyAdd only" to "LibSpoofPrimitives.vectOuterMultAdd use optimized LibMatrixMult.vectMultiplyAdd only" Oct 31, 2025
mboehm7 commented Nov 1, 2025

Thanks for the patch @HubertKrawczyk. The fact that even the runtime of the Spark instructions (which carry additional overhead compared to local kernels) improved indicates very robust performance improvements.

mboehm7 closed this in e40bbfe Nov 1, 2025
github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Nov 1, 2025
codecov bot commented Nov 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.25%. Comparing base (67ff6be) to head (76c5cf1).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2349      +/-   ##
============================================
- Coverage     72.25%   72.25%   -0.01%     
+ Complexity    46746    46740       -6     
============================================
  Files          1504     1504              
  Lines        177334   177324      -10     
  Branches      34844    34842       -2     
============================================
- Hits         128138   128127      -11     
  Misses        39522    39522              
- Partials       9674     9675       +1     

