
Conversation


@ankitsol ankitsol commented Dec 7, 2025

No description provided.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957 Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 3m 30s HBASE-28957 passed
+1 💚 compile 4m 3s HBASE-28957 passed
-0 ⚠️ checkstyle 0m 11s /buildtool-branch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 2m 10s HBASE-28957 passed
+1 💚 spotless 0m 51s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for patch
+1 💚 mvninstall 3m 10s the patch passed
+1 💚 compile 4m 2s the patch passed
+1 💚 javac 4m 2s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 9s /buildtool-patch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 2m 19s the patch passed
+1 💚 hadoopcheck 12m 30s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 18s The patch does not generate ASF License warnings.
44m 13s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7528/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7528
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 27679a0a2f2e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / d2e1a17
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7528/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957 Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for branch
+1 💚 mvninstall 3m 32s HBASE-28957 passed
+1 💚 compile 1m 20s HBASE-28957 passed
+1 💚 javadoc 0m 44s HBASE-28957 passed
+1 💚 shadedjars 6m 19s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
+1 💚 mvninstall 3m 7s the patch passed
+1 💚 compile 1m 19s the patch passed
+1 💚 javac 1m 19s the patch passed
+1 💚 javadoc 0m 41s the patch passed
+1 💚 shadedjars 6m 11s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 233m 6s /patch-unit-hbase-server.txt hbase-server in the patch failed.
+1 💚 unit 18m 51s hbase-backup in the patch passed.
281m 31s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7528/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7528
Optional Tests javac javadoc unit compile shadedjars
uname Linux ad59f18e01da 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / d2e1a17
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7528/1/testReport/
Max. process+thread count 4281 (vs. ulimit of 30000)
modules C: hbase-server hbase-backup U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7528/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@anmolnar anmolnar left a comment

@ankitsol You might want to raise the PR against the HBASE-28957_rebased branch, but otherwise it looks good to me.

replicateContext.entries.stream().map(WAL.Entry::getEdit).flatMap(e -> e.getCells().stream())
.forEach(this::checkCell);
return ReplicationResult.COMMITTED;
getReplicationSource().cleanupHFileRefsAndPersistOffsets(replicateContext.getEntries());
Contributor

Why do we need to call this here?

// replicate the batches to sink side.
parallelReplicate(replicateContext, batches);
return ReplicationResult.COMMITTED;
getReplicationSource().cleanupHFileRefsAndPersistOffsets(replicateContext.getEntries());
Contributor

OK, I think I get why you call this method here: since at this point we can be sure the WAL entries have been persisted, it is OK for us to persist the offset. But I would prefer we follow the old way and call this in ReplicationSourceShipper.

Contributor

But I would prefer we follow the old way and call this in ReplicationSourceShipper.

I think it's doable. @ankitsol?

/**
* Replicate the given set of entries (in the context) to the other cluster. Can block until all
* the given entries are replicated. Upon this method is returned, all entries that were passed in
* the context are assumed to be persisted in the target cluster.
Contributor

What I mean is that we should add a method, maybe called beforePersistingReplicationOffset, and call it before we call updateLogPosition in ReplicationSourceShipper. For the old implementation we just do nothing, as we can be sure that everything is persisted; for the S3-based endpoint, we close the writer to persist the data on S3.
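For illustration, a minimal sketch of such a hook is below, assuming a default no-op plus an S3-style override; the interface and class names are invented for this example and are not the actual HBase ReplicationEndpoint API.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;

// Simplified stand-in for the real endpoint hierarchy; names are illustrative only.
interface OffsetAwareEndpoint {
  // Default no-op: endpoints that already persist data on every shipment
  // have nothing extra to do before the offset is recorded.
  default void beforePersistingReplicationOffset() throws IOException {
  }
}

// Hypothetical S3-backed endpoint that keeps an open backup writer.
class S3BackupEndpoint implements OffsetAwareEndpoint {
  private FSDataOutputStream backupWriter; // assumed S3-backed output stream

  @Override
  public void beforePersistingReplicationOffset() throws IOException {
    // Roll the current backup file so already-shipped WAL entries are durable
    // on S3 before the source records the new replication offset.
    if (backupWriter != null) {
      backupWriter.close();
      backupWriter = null; // a new file is opened lazily on the next shipment
    }
  }
}
```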

Contributor

You cannot close the writer every time something is shipped, because closing and re-opening the same stream is a costly operation, if it is even supported. We have to wait for enough data to be shipped (file size limit) or for the configured time to elapse (time limit) before closing the current stream and opening a new one. This is controlled by the replication endpoint itself.

Contributor

So you also need to change the logic in ReplicationSourceShipper so that it does not always record the offset after shipping. And I do not think this can 'ONLY' be controlled by the replication endpoint: in ReplicationSourceShipper you know the size of the WAL entries, and you also know how much time has elapsed since the last recording, so it should be easy to implement the logic there?
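As a rough sketch of that shipper-side bookkeeping (the class, fields, and thresholds below are invented for illustration, not actual ReplicationSourceShipper members):

```java
// Tracks how much has been shipped since the last offset recording and
// decides when a size or time limit is hit. Purely illustrative.
class OffsetRecordingPolicy {
  private final long sizeLimitBytes; // e.g. record after this many bytes shipped
  private final long timeLimitMs;    // e.g. record after this many milliseconds
  private long bytesSinceLastRecord = 0;
  private long lastRecordTimeMs = System.currentTimeMillis();

  OffsetRecordingPolicy(long sizeLimitBytes, long timeLimitMs) {
    this.sizeLimitBytes = sizeLimitBytes;
    this.timeLimitMs = timeLimitMs;
  }

  // Called after each shipment with the size of the shipped batch; returns
  // true when the shipper should persist the replication offset.
  boolean shouldRecordOffset(long shippedBytes) {
    bytesSinceLastRecord += shippedBytes;
    long elapsed = System.currentTimeMillis() - lastRecordTimeMs;
    if (bytesSinceLastRecord >= sizeLimitBytes || elapsed >= timeLimitMs) {
      bytesSinceLastRecord = 0;
      lastRecordTimeMs = System.currentTimeMillis();
      return true;
    }
    return false;
  }
}
```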

Contributor

That's an interesting idea. @vinayakphegde @ankitsol wdyt?

@Apache9 Let's say the ReplicationSourceShipper controls when to record the offset. How would it know which kind of replication endpoint it is working with? Does it need to record the offset after every shipment, or use a time/size limit? Shall we make it a new attribute of the endpoints?

Contributor

I prefer we control it by a time/size limit.

Even if the endpoint can persist the data after every shipment, we do not need to record the offset every time, right? We just need to make sure that once the ReplicationSourceShipper wants to record the offset, all the data before that offset has been persisted. So we can introduce a beforePersistingReplicationOffset method on the replication endpoint: if you persist the data after every shipment, you just do nothing; if it is the S3-based endpoint, we close the output file to persist the data.

This way, the ReplicationSourceShipper does not need to know whether the endpoint persists the data after every shipment. And in the future, for HBaseInterClusterReplicationEndpoint, we could also introduce some asynchronous mechanism to increase performance.
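Putting the earlier sketches together, the shipping loop could wire them up roughly like this; afterShipment and persistOffset are hypothetical stand-ins (reusing the illustrative OffsetAwareEndpoint and OffsetRecordingPolicy types from above), not the real ReplicationSourceShipper code.

```java
import java.io.IOException;

// Toy wiring of the two earlier sketches: decide via the size/time policy,
// flush the endpoint, and only then persist the offset.
class ShipperOffsetFlow {
  private final OffsetAwareEndpoint endpoint;
  private final OffsetRecordingPolicy policy;

  ShipperOffsetFlow(OffsetAwareEndpoint endpoint, OffsetRecordingPolicy policy) {
    this.endpoint = endpoint;
    this.policy = policy;
  }

  // Called by the shipping loop after each successful shipment.
  void afterShipment(long shippedBytes) throws IOException {
    if (policy.shouldRecordOffset(shippedBytes)) {
      endpoint.beforePersistingReplicationOffset(); // make shipped data durable first
      persistOffset();                              // only then record the offset
    }
  }

  private void persistOffset() {
    // Stand-in for ReplicationSourceShipper#updateLogPosition.
  }
}
```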

Author

@Apache9 This seems like a good approach to me. We also want to move ahead with both options, time and size.

For the time-based approach, I need one clarification: should this counter run in a separate thread, since it would be independent of the shipEdits() logic?

Another question: I need to save this time and size context somewhere; do you recommend keeping it in ReplicationSourceShipper or ReplicationSource?

cc: @anmolnar @vinayakphegde

Contributor

Even if the endpoint can persist the data after every shipment, we do not need to record the offset every time, right?

Are we talking about modifying the behaviour of existing replication endpoints?
Currently both data and offsets are persisted at every shipment. Would you like to change this to be controlled by time and size limits generally?

Author

@Apache9 This seems like a good approach. We also want it to be both time- and size-based.

I have two questions regarding the time-based approach:

  1. Should this time-based counter run on a separate thread? Currently in ContinuousBackupReplicationEndpoint we implemented it as a separate thread.
  2. Where should we save the time/size-based context: ReplicationSourceShipper or ReplicationSource, considering that ReplicationSourceShipper is itself a thread?

CC @anmolnar @vinayakphegde

}
}

private ReplicationResult getReplicationResult() {
Contributor

As said above, we should keep these methods here, and before calling them, call the method on the ReplicationEndpoint to flush the data out.
