Skip to content

Conversation

@j143
Copy link
Member

@j143 j143 commented Oct 26, 2025

Implemented a base OOC Eviction Manager & used it in ResettableStream instead of _cache.

@j143
Copy link
Member Author

j143 commented Oct 28, 2025

Hi @mboehm7 - could you please check if this is the implementation that's on your mind?

<the tests run fine on local, checking why it's failing on github actions>

Copy link
Contributor

@mboehm7 mboehm7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @j143. Overall, this eviction manager looks already very good. Once the minor comments are addressed we can already merge it in, and after some preliminary experiments than do the major performance improvements incrementally (e.g., not writing individual blocks but configurable partitions of blocks).

LocalFileUtils.createLocalFileIfNotExist(_spillDir);
}

public static synchronized OOCEvictionManager getInstance() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can simply make the methods static synchronized, and remove this getInstance method (right now every stream would work with the common object but use non-synchronized methods, and thus, are not thread-safe)

/**
* Store a block in the OOC cache (serialize once)
*/
public void put(String key, IndexedMatrixValue value) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use stream ID instead of the string key

Comment on lines 93 to 94
long size = estimateSerializedSize(mb);
ByteBuffer bbuff = new ByteBuffer(size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are deadline with many blocks, I would recommend to keep only shallow copies of the individual blocks without any serialization (which might increase the size but reduce the overhead)


// Spill to disk
String filename = _spillDir + "/" + key;
bbuff.evictBuffer(filename);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a TODO in here, that we should improve the evictBuffer to avoid storing a file a second time if it already exists (I would have to double check if we correctly do that in our normal lazywritebuffer).

return mb.getExactSerializedSize();
}

private MatrixIndexes parseIndexesFromKey(String key) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid this string concatenation and parsing by simply using the stream ID and blockID as is.

@j143
Copy link
Member Author

j143 commented Oct 31, 2025

Hi @mboehm7 , I have addressed most of the comments. The tests are failing in GitHub actions - I tried to test with similar memory configuration in my local and couldn't reproduce.

@mboehm7
Copy link
Contributor

mboehm7 commented Oct 31, 2025

Maybe test your eviction logic with very small buffer pool size (e.g., 4KB) in order to force eviction. The differences in results indicate that as soon as we're evicting we produce incorrect results.

@j143
Copy link
Member Author

j143 commented Oct 31, 2025

Now, I am able to reproduce this issue in a linux machine - it's retrieving incorrect blocks due to index mismatch. checking the root cause further.


_size + requiredSize: 4000009 + 4000009; _limit: 4718 bytes

I have decreased the buffer limit still the same. I checked that the eviction logic is triggered.

_size + requiredSize: 4000009 + 4000009; _limit: 4718
Evicting directory: target\testTemp\functions\ooc\lmDSTest\lmDS/localtmp/_p24484_10.120.110.127//ooc_stream/tmp0/0_0
_size + requiredSize: 4000009 + 4000009; _limit: 4718
Evicting directory: target\testTemp\functions\ooc\lmDSTest\lmDS/localtmp/_p24484_10.120.110.127//ooc_stream/tmp0/0_1
_size + requiredSize: 4000009 + 4000009; _limit: 4718
``

@mboehm7
Copy link
Contributor

mboehm7 commented Oct 31, 2025

alright, I'll have a look either today or tomorrow.

@mboehm7 mboehm7 closed this in 8c667b5 Oct 31, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in SystemDS PR Queue Oct 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants