HDDS-14618. Support including only specified containers in Container Balancer. #9839

Open
sravani-revuri wants to merge 2 commits into apache:master from sravani-revuri:HDDS-14618

@sravani-revuri
Contributor

What changes were proposed in this pull request?

Container Balancer already has configurations for including and excluding certain datanodes and for excluding certain containers. These help focus balancing on the specified datanodes or skip problematic containers.

It would also help to have a configuration for including only certain containers, meaning that no containers other than the specified ones are balanced. This can be used to target a certain disk on a specific Datanode.
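The include-only behaviour described above can be sketched as a simple membership check. This is a minimal illustration, not the code in this patch: the class name `IncludeFilterSketch`, the method `isAllowed`, and the rule that an empty include set means "no restriction" are all assumptions made for the example.

```java
import java.util.Set;

// Hypothetical sketch; names and the empty-set rule are assumptions, not Ozone APIs.
final class IncludeFilterSketch {
  private IncludeFilterSketch() { }

  /**
   * Returns true if the container may be considered for balancing.
   * Assumption: an empty include set means every container is allowed,
   * which matches the "None" value shown when the option is not set.
   */
  static boolean isAllowed(long containerId, Set<Long> includeContainers) {
    return includeContainers.isEmpty() || includeContainers.contains(containerId);
  }

  public static void main(String[] args) {
    Set<Long> include = Set.of(1L);
    for (long id = 1; id <= 3; id++) {
      // With include = {1}, only container #1 is allowed.
      System.out.println("container #" + id + " allowed: " + isAllowed(id, include));
    }
  }
}
```

With an include set of `{1}`, only container #1 passes the check, so a balancer using such a filter would never schedule any other container.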

Although Ozone now has a disk balancer, this Container Balancer configuration can serve as a secondary option or as a workaround for the disk balancer.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-14618?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

How was this patch tested?

Unit test and manual test.

Only include container command:

CLI Output:

bash-5.1$ ozone admin containerbalancer start --include-containers "1" -t 0.1 -d 100 -i 3 && ozone admin containerbalancer status --verbose
Container Balancer started successfully.
ContainerBalancer is Running.
Started at: 2026-02-26 06:14:15
Balancing duration: 1s

Container Balancer Configuration values:
Key                                                Value
Threshold                                          0.1
Max Datanodes to Involve per Iteration(percent)    100
Max Size to Move per Iteration                     0GB
Max Size Entering Target per Iteration             26GB
Max Size Leaving Source per Iteration              26GB
Number of Iterations                               3
Time Limit for Single Container's Movement         65min
Time Limit for Single Container's Replication      50min
Interval between each Iteration                    0min
Whether to Enable Network Topology                 false
Whether to Trigger Refresh Datanode Usage Info     false
Container IDs to Include to Balancing              1
Container IDs to Exclude from Balancing            None
Datanodes Specified to be Balanced                 None
Datanodes Excluded from Balancing                  None

Current iteration info:
Key                                                Value
Iteration number                                   1
Iteration duration                                 1s
Iteration result                                   -
Size scheduled to move                             100 MB
Moved data size                                    0 B
Scheduled to move containers                       1
Already moved containers                           0
Failed to move containers                          0
Failed to move containers by timeout               0
Entered data to nodes                              
96a5d009-1192-4ba0-baaf-fbf0a2da270f <- 100 MB
Exited data from nodes                             
c384030d-fad6-49cf-adb3-6a921685b1b9 -> 100 MB

Logs:

2026-02-26 06:15:12,457 [scm1-ContainerBalancerTask-2] INFO balancer.ContainerBalancerTask: ContainerBalancer is trying to move container #1 with size 104857600B from source datanode f23f9870-b23f-46d2-90a2-5a476e6e1745(ozone-balancer-datanode3-1.ozone-balancer_default/172.19.0.7) to target datanode b17dc72e-7008-47db-8b40-c55cd4458692(ozone-balancer-datanode1-1.ozone-balancer_default/172.19.0.15)

Include all containers and exclude all containers:

CLI Output:

bash-5.1$ ozone admin containerbalancer start --include-containers "1,2,3" --exclude-containers "1,2,3" -t 0.1 -d 100 -i 3 && ozone admin containerbalancer status --verbose
Container Balancer started successfully.
ContainerBalancer is Not Running.

Overlap in include and exclude:

CLI Output:

bash-5.1$ ozone admin containerbalancer start --include-containers "1,2,3" --exclude-containers "1,2" -t 0.1 -d 100 -i 3 && ozone admin containerbalancer status --verbose
Container Balancer started successfully.
ContainerBalancer is Running.
Started at: 2026-02-26 06:36:22
Balancing duration: 1s

Container Balancer Configuration values:
Key                                                Value
Threshold                                          0.1
Max Datanodes to Involve per Iteration(percent)    100
Max Size to Move per Iteration                     0GB
Max Size Entering Target per Iteration             26GB
Max Size Leaving Source per Iteration              26GB
Number of Iterations                               3
Time Limit for Single Container's Movement         65min
Time Limit for Single Container's Replication      50min
Interval between each Iteration                    0min
Whether to Enable Network Topology                 false
Whether to Trigger Refresh Datanode Usage Info     false
Container IDs to Include to Balancing              1,2,3
Container IDs to Exclude from Balancing            1,2
Datanodes Specified to be Balanced                 None
Datanodes Excluded from Balancing                  None

Current iteration info:
Key                                                Value
Iteration number                                   1
Iteration duration                                 1s
Iteration result                                   -
Size scheduled to move                             100 MB
Moved data size                                    0 B
Scheduled to move containers                       1
Already moved containers                           0
Failed to move containers                          0
Failed to move containers by timeout               0
Entered data to nodes                              
9ded157d-25af-4b2f-865d-2e8faf8eb88e <- 100 MB
Exited data from nodes                             
c1b96b62-935c-4170-bef5-5a2e21496484 -> 100 MB

Logs:

2026-02-26 06:36:22,594 [scm1-ContainerBalancerTask-2] INFO balancer.ContainerBalancerTask: ContainerBalancer is trying to move container #3 with size 104857600B from source datanode c1b96b62-935c-4170-bef5-5a2e21496484(ozone-balancer-datanode5-1.ozone-balancer_default/172.19.0.3) to target datanode 9ded157d-25af-4b2f-865d-2e8faf8eb88e(ozone-balancer-datanode6-1.ozone-balancer_default/172.19.0.12)

No overlap in include and exclude:

CLI Output:

bash-5.1$ ozone admin containerbalancer start --include-containers "1,2" --exclude-containers "4,5" -t 0.1 -d 100 -i 3 && ozone admin containerbalancer status --verbose
Container Balancer started successfully.
ContainerBalancer is Running.
Started at: 2026-02-26 13:42:17
Balancing duration: 1s

Container Balancer Configuration values:
Key                                                Value
Threshold                                          0.1
Max Datanodes to Involve per Iteration(percent)    100
Max Size to Move per Iteration                     0GB
Max Size Entering Target per Iteration             26GB
Max Size Leaving Source per Iteration              26GB
Number of Iterations                               3
Time Limit for Single Container's Movement         65min
Time Limit for Single Container's Replication      50min
Interval between each Iteration                    0min
Whether to Enable Network Topology                 false
Whether to Trigger Refresh Datanode Usage Info     false
Container IDs to Include to Balancing              1,2
Container IDs to Exclude from Balancing            4,5
Datanodes Specified to be Balanced                 None
Datanodes Excluded from Balancing                  None

Current iteration info:
Key                                                Value
Iteration number                                   1
Iteration duration                                 1s
Iteration result                                   -
Size scheduled to move                             200 MB
Moved data size                                    0 B
Scheduled to move containers                       2
Already moved containers                           0
Failed to move containers                          0
Failed to move containers by timeout               0
Entered data to nodes                              
25d253f8-f876-42af-8e79-04023836381f <- 100 MB
8998b9a7-3626-47e4-829c-b475979130ac <- 100 MB
Exited data from nodes                             
d3345c9e-05b3-4b11-b13e-8a29d082e775 -> 100 MB
201a47c3-a26b-4648-bb17-da1b1d8214ee -> 100 MB

Logs:

2026-02-26 13:42:17,861 [scm1-ContainerBalancerTask-1] INFO balancer.ContainerBalancerTask: ContainerBalancer is trying to move container #2 with size 104857600B from source datanode d3345c9e-05b3-4b11-b13e-8a29d082e775(ozone-balancer-datanode3-1.ozone-balancer_default/172.19.0.11) to target datanode 25d253f8-f876-42af-8e79-04023836381f(ozone-balancer-datanode1-1.ozone-balancer_default/172.19.0.10)
2026-02-26 13:42:17,863 [scm1-ContainerBalancerTask-1] INFO balancer.ContainerBalancerTask: ContainerBalancer is trying to move container #1 with size 104857600B from source datanode 201a47c3-a26b-4648-bb17-da1b1d8214ee(ozone-balancer-datanode5-1.ozone-balancer_default/172.19.0.13) to target datanode 8998b9a7-3626-47e4-829c-b475979130ac(ozone-balancer-datanode4-1.ozone-balancer_default/172.19.0.12)
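The three include/exclude runs above are consistent with treating the effective set as "included IDs minus excluded IDs": `1,2,3` minus `1,2` leaves only container #3, `1,2,3` minus `1,2,3` leaves nothing to move, and `1,2` minus `4,5` leaves both containers. The sketch below reproduces that arithmetic on the comma-separated values passed to `--include-containers` and `--exclude-containers`. It is inferred from the observed output only; the class and method names are hypothetical and the patch may implement the check differently.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch; not taken from the Ozone code base.
final class EffectiveContainersSketch {
  private EffectiveContainersSketch() { }

  /** Parses a comma-separated list such as "1,2,3" into an ordered set of IDs. */
  static Set<Long> parseIds(String csv) {
    if (csv == null || csv.trim().isEmpty()) {
      return new LinkedHashSet<>();
    }
    return Arrays.stream(csv.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .map(Long::parseLong)
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  /** Assumed rule: a container listed in both sets is treated as excluded. */
  static Set<Long> effective(String includeCsv, String excludeCsv) {
    Set<Long> result = parseIds(includeCsv);
    result.removeAll(parseIds(excludeCsv));
    return result;
  }

  public static void main(String[] args) {
    System.out.println(effective("1,2,3", "1,2"));   // prints [3]
    System.out.println(effective("1,2,3", "1,2,3")); // prints []
    System.out.println(effective("1,2", "4,5"));     // prints [1, 2]
  }
}
```

This sketch covers only the case where an include list is given; it does not model the behaviour when `--include-containers` is omitted.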
