@janniklinde

This patch introduces a new SourceStream which is produced by OOCReblockInstruction. This change aims to prevent redundant disk writes caused by CachingStreams evicting the direct output of an upstream reblock instruction. The SourceStream triggers custom cache handling in the CachingStream and maintains an index for future direct access of source tiles on disk.
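The idea can be illustrated with a minimal sketch. This is not the code from this patch: `TileLocation`, `SourceFile`, `TileCache`, and all method names below are hypothetical, and a byte array stands in for the file on disk. It only shows the principle: tiles with a known location in the source file are dropped on eviction and re-read from that location later, while other tiles still have to be spilled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class SourceBackedCacheSketch {
    /** Hypothetical position of a tile inside the original input file. */
    public static final class TileLocation {
        public final int offset;
        public final int length;
        public TileLocation(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Stand-in for the source file on disk; tiles are read back by offset. */
    public static final class SourceFile {
        private final byte[] data;
        public SourceFile(byte[] data) { this.data = data; }
        public byte[] read(TileLocation loc) {
            return Arrays.copyOfRange(data, loc.offset, loc.offset + loc.length);
        }
    }

    /** LRU cache that spills only tiles without a known source location. */
    public static final class TileCache {
        private final int capacity;
        private final SourceFile source;
        // Index of tiles that can be re-read directly from the source file.
        private final Map<Integer, TileLocation> sourceIndex = new HashMap<>();
        // Stand-in for spill files written on eviction.
        private final Map<Integer, byte[]> spilled = new HashMap<>();
        // Access-ordered map: iteration starts at the least recently used tile.
        private final LinkedHashMap<Integer, byte[]> resident =
            new LinkedHashMap<>(16, 0.75f, true);
        public int spillWrites = 0;

        public TileCache(int capacity, SourceFile source) {
            this.capacity = capacity;
            this.source = source;
        }

        /** Registers a tile that came straight from the source file. */
        public void putSourceTile(int id, TileLocation loc, byte[] tile) {
            sourceIndex.put(id, loc);
            put(id, tile);
        }

        /** Registers a derived tile that has no copy on disk yet. */
        public void put(int id, byte[] tile) {
            resident.put(id, tile);
            evictIfNeeded();
        }

        /** Returns the tile, reloading it from source or spill if evicted. */
        public byte[] get(int id) {
            byte[] tile = resident.get(id);
            if (tile == null) {
                TileLocation loc = sourceIndex.get(id);
                tile = (loc != null) ? source.read(loc) : spilled.remove(id);
                if (tile != null)
                    put(id, tile);
            }
            return tile;
        }

        private void evictIfNeeded() {
            while (resident.size() > capacity) {
                Iterator<Map.Entry<Integer, byte[]>> it = resident.entrySet().iterator();
                Map.Entry<Integer, byte[]> eldest = it.next();
                Integer id = eldest.getKey();
                byte[] tile = eldest.getValue();
                it.remove();
                // Source tiles already exist on disk, so no write is needed.
                if (!sourceIndex.containsKey(id)) {
                    spilled.put(id, tile);
                    spillWrites++;
                }
            }
        }
    }
}
```

In this sketch, evicting a source tile costs no disk write, and `spillWrites` only increases when a derived tile is evicted. The actual index and eviction policy in the patch may differ.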

These changes lead to significant speedups: PCA on a 1Mx1000 input matrix now runs in ~26s on our local machine, compared to ~40s before. Part of this speedup comes from the now-parallel read in OOCReblockInstruction, which previously was not possible due to OOM errors with 5GB of allowed RAM.
In addition, we are now able to run the same PCA script with 1GB of memory, which completes in ~31s.

5GB: [screenshot: monitor]

1GB: [screenshot: monitor]

Experiments on the scale-out node also show significant speedups for PCA, reaching roughly a 3x speedup over CP at 100g (33.061s vs. 10.502s).

mode  conf  run [s]
cp    100g  33.061
ooc   100g  10.502
cp    10g   nan
ooc   10g   13.767
cp    1g    nan
ooc   1g    19.031

@mboehm7

mboehm7 commented Jan 5, 2026

LGTM - thanks for the improvement @janniklinde. The performance numbers under different JVM sizes are already very promising. Great job.

@mboehm7 mboehm7 closed this in 0260387 Jan 5, 2026
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Jan 5, 2026