Skip to content

HDDS-14662. Container Balancer does not validate include/exclude datanode names#9812

Open
sravani-revuri wants to merge 7 commits intoapache:masterfrom
sravani-revuri:HDDS-14662
Open

HDDS-14662. Container Balancer does not validate include/exclude datanode names#9812
sravani-revuri wants to merge 7 commits intoapache:masterfrom
sravani-revuri:HDDS-14662

Conversation

@sravani-revuri
Copy link
Contributor

What changes were proposed in this pull request?

The ozone admin containerbalancer start command accepts invalid datanode names in -- include-datanodes- and --exclude-datanodes and reports success instead of failing with a validation error. This differs from other admin commands that validate hostnames and fail on invalid input.

bash-5.1$ ozone admin containerbalancer start --exclude-datanodes "55abc" -t 0.1 -d 100 -i 3
Container Balancer started successfully.

bash-5.1$ ozone admin containerbalancer start --include-datanodes="123!"
Container Balancer started successfully. 

// other admin commands behavior

bash-5.1$ ozone admin datanode decommission ozone-datanode-1
Started decommissioning datanode(s):
ozone-datanode-1
Error: ozone-datanode-1: ozone-datanode-1: Name or service not known
Some nodes could not enter the decommission workflow

bash-5.1$ ozone admin datanode diskbalancer status ozone-datanode-3 555hgkjmdhkj
Error on node [555hgkjmdhkj]: Invalid host name: local host is: "ca8e95ea2b71/172.18.0.2"; destination host is: "555hgkjmdhkj":19864; java.net.UnknownHostException: Invalid host name: local host is: "ca8e95ea2b71/172.18.0.2"; destination host is: "555hgkjmdhkj":19864; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost; For more details see: http://wiki.apache.org/hadoop/UnknownHost
Failed to get DiskBalancer status from nodes: [555hgkjmdhkj]
Status result:
Datanode              Status     Threshold(%)  BandwidthInMB  Threads   StopAfterDiskEven  SuccessMove FailureMove BytesMoved(MB) EstBytesToMove(MB) EstTimeLeft(min)
9e2ef66785cc            STOPPED     10.0000     10       5      true         0      0      0        0         0

Proposed Fix
Add validation for -- include-datanodes and -- exclude-datanodes before starting the balancer.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14662

How was this patch tested?

manual testing done.

Incorrect value given for include node:

bash-5.1$ ozone admin containerbalancer start --include-datanodes "wrongname" -t 0.1 -d 100 -i 3
Failed to start Container Balancer.
Failure reason: Invalid configuration: The included datanode 'wrongname' does not exist or is not registered with SCM. Please check the hostname/IP.
Failed to start Container Balancer. Invalid configuration: The included datanode 'wrongname' does not exist or is not registered with SCM. Please check the hostname/IP.

bash-5.1$ ozone admin containerbalancer start --include-datanodes "883.883.883.883" -t 0.1 -d 100 -i 3
Failed to start Container Balancer.
Failure reason: Invalid configuration: The included datanode '883.883.883.883' does not exist or is not registered with SCM. Please check the hostname/IP.
Failed to start Container Balancer. Invalid configuration: The included datanode '883.883.883.883' does not exist or is not registered with SCM. Please check the hostname/IP.

Incorrect value given for exclude node:

bash-5.1$ ozone admin containerbalancer start --exclude-datanodes "wrongname" -t 0.1 -d 100 -i 3
Failed to start Container Balancer.
Failure reason: Invalid configuration: The excluded datanode 'wrongname' does not exist or is not registered with SCM. Please check the hostname/IP.
Failed to start Container Balancer. Invalid configuration: The excluded datanode 'wrongname' does not exist or is not registered with SCM. Please check the hostname/IP.

bash-5.1$ ozone admin containerbalancer start --exclude-datanodes "883.883.883.883" -t 0.1 -d 100 -i 3
Failed to start Container Balancer.
Failure reason: Invalid configuration: The excluded datanode '883.883.883.883' does not exist or is not registered with SCM. Please check the hostname/IP.
Failed to start Container Balancer. Invalid configuration: The excluded datanode '883.883.883.883' does not exist or is not registered with SCM. Please check the hostname/IP.


Correct values given for exclude node:

bash-5.1$ ozone admin containerbalancer start --exclude-datanodes "ozone-balancer-datanode2-1.ozone-balancer_default,ozone-balancer-datanode3-1.ozone-balancer_default" -t 0.1 -d 100 -i 3
Container Balancer started successfully.

bash-5.1$ ozone admin containerbalancer start --exclude-datanodes "172.19.0.2" -t 0.1 -d 100 -i 3
Container Balancer started successfully.

Correct values given for include node:

bash-5.1$ ozone admin containerbalancer start --include-datanodes "ozone-balancer-datanode1-1.ozone-balancer_default,ozone-balancer-datanode2-1.ozone-balancer_default,ozone-balancer-datanode6-1.ozone-balancer_
default" -t 0.1 -d 100 -i 3
Container Balancer started successfully.

bash-5.1$ ozone admin containerbalancer start --include-datanodes "172.19.0.2" -t 0.1 -d 100 -i 3
Container Balancer started successfully.

Copy link
Contributor

@sarvekshayr sarvekshayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for identifying and fixing this issue @sravani-revuri. Please add a test to verify the same.

Copy link
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please find the inlined comments. Rest all looks good to me.

Copy link
Contributor

@sreejasahithi sreejasahithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sravani-revuri for working on this, overall LGTM
just left few minor comments.

Copy link
Contributor

@sreejasahithi sreejasahithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sravani-revuri for updating.

Copy link
Contributor

@Gargi-jais11 Gargi-jais11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@sreejasahithi sreejasahithi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sravani-revuri
LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants