HBASE-29800 WAL logs are unprotected during first full backup#7717
HBASE-29800 WAL logs are unprotected during first full backup#7717DieterDP-ng wants to merge 1 commit intoapache:masterfrom
Conversation
The BackupLogCleaner prevents WAL files that are needed for future backups from being deleted. In the case where a backup root has a single running backup, there was a small timeframe where relevant files were unprotected because only completed backups were taken into consideration. This commit fixes this. The old mechanism relied on the "backup start code", which is a timestamp that denotes (per backup root) the lowest (earliest) log-roll timestamp that occurred for the backup. Because this concept had no added value, but is complex to reason about, it is removed. Usages are replaced with equal behavior based on timestamps stored in the backup info. (The backup start codes were calculated in the same way, just stored separately.) Note that the backup start code calculation suffers from HBASE-29628 (log-roll timestamps of decommissioned region servers are not cleaned up, causing the start code to be lower than it should be). That problem is still present in this commit.
2dc0d50 to
5863d06
Compare
| for (BackupInfo backup : backups) { | ||
| BackupInfo existingEntry = newestBackupPerRootDir.get(backup.getBackupRootDir()); | ||
| if (existingEntry == null || existingEntry.getStartTs() < backup.getStartTs()) { | ||
| if (existingEntry == null || existingEntry.getState() == BackupState.RUNNING) { |
There was a problem hiding this comment.
Are backups returned sorted latest --> oldest? If so this change makes sense, otherwise we may not be grabbing the latest completed backup with the first entry
There was a problem hiding this comment.
They are, because that's part of the contract of BackupSystemTable#getBackupHistory.
|
|
||
| Long storedTs = boundaries.get(regionServerAddress); | ||
| if (storedTs == null || logRollTs < storedTs) { | ||
| boundaries.put(regionServerAddress, logRollTs); |
There was a problem hiding this comment.
Side note. I've noticed that the current implementation of BackupBoundaries can lead to problems due to boundaries being a global map.
I lean towards having a separate PR to address those issues, but if you'd prefer we could group together both changes in the same patch
There was a problem hiding this comment.
Better to do separate PRs I think. I'll try to make time to review your PR asap.
|
This looks good to me; appreciate the code cleanups here as well |
If you're satisfied, can I get an official review checkmark? Let's put those committer privileges to use ;) |
The BackupLogCleaner prevents WAL files that are needed for future backups from being deleted. In the case where a backup root has a single running backup, there was a small timeframe where relevant files were unprotected because only completed backups were taken into consideration. This commit fixes this.
The old mechanism relied on the "backup start code", which is a timestamp that denotes (per backup root) the lowest (earliest) log-roll timestamp that occurred for the backup. Because this concept had no added value, but is complex to reason about, it is removed. Usages are replaced with equal behavior based on timestamps stored in the backup info. (The backup start codes were calculated in the same way, just stored separately.)
Note that the backup start code calculation suffers from HBASE-29628 (log-roll timestamps of decommissioned region servers are not cleaned up, causing the start code to be lower than it should be). That problem is still present in this commit.