
HDDS-14897. Add multiple S3 gateways to the rolling-upgrade suite#10028

Merged
errose28 merged 5 commits into apache:HDDS-14496-zdu from dombizita:HDDS-14897
Apr 21, 2026

@dombizita
Contributor

What changes were proposed in this pull request?

Use HAProxy to load-balance multiple S3 gateways. I made the necessary changes in docker-compose.yaml and adjusted the shell scripts accordingly. I didn't use the existing s3-haproxy.yaml, as the one in common did not work out of the box with the Ozone HA setup (also found HDDS-14956). As this suite always needs multiple S3 gateways, I think it's okay to have the setup in docker-compose.yaml.

One change that stands out is in hadoop-ozone/dist/src/main/compose/testlib.sh. Without that change I hit this error:

OCI runtime exec failed: exec failed: unable to start container process: exec: "bash": executable file not found in $PATH

Cursor help: "This is from reorder_om_nodes in testlib.sh. It iterates over ALL containers and runs docker exec ... bash -c "...". The HAProxy container (ha-s3g-1) uses haproxy:lts-alpine — Alpine Linux — which only has sh, not bash."

This is new because the Ozone HA suite never used the S3 HAProxy setup before, and outside Ozone HA we do not call reorder_om_nodes. This fix simply skips the container when the exec fails, and since the HAProxy container doesn't need ozone-site.xml, that is safe. The downside is that it would also silently swallow genuine bash failures. Another solution is to use sh instead of bash.
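A minimal sketch of a middle ground between the two options above: probe each container for bash through sh first, skip only the containers that lack it, and still fail on genuine bash errors. This is not the code merged in testlib.sh. The helper names `container_has_bash` and `exec_in_bash_containers` and the `DOCKER_EXEC` override are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the code merged in testlib.sh.
# DOCKER_EXEC is an assumed override hook so the logic can be tried without Docker.
DOCKER_EXEC="${DOCKER_EXEC:-docker exec}"

# Succeeds only if the container has bash on its PATH.
# The probe itself runs under sh, which Alpine-based images do provide.
container_has_bash() {
  ${DOCKER_EXEC} "$1" sh -c 'command -v bash' >/dev/null 2>&1
}

# Runs a bash snippet in each named container, skipping those without bash.
# A failure inside a container that does have bash still aborts with an error.
exec_in_bash_containers() {
  local script="$1"
  shift
  local container
  for container in "$@"; do
    if ! container_has_bash "${container}"; then
      echo "Skipping ${container}: bash not available" >&2
      continue
    fi
    ${DOCKER_EXEC} "${container}" bash -c "${script}" || return 1
  done
}
```

One caveat of this approach: a container that is stopped or unreachable also fails the probe and would be skipped, so it narrows the set of swallowed failures rather than eliminating it.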

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14897

How was this patch tested?

CI with the rolling upgrade test suite: https://github.com/dombizita/ozone/actions/runs/23846523428
With the suite commented out (current state on HDDS-14496-zdu): https://github.com/dombizita/ozone/actions/runs/23846562903

--- RESTARTING s3g1 WITH IMAGE 2.2.0 ---
Using Docker Compose v2
==============================================================================
2.2.0-2.2.0-2-s3g1-generate-generate-s3g1 :: Generate data                    
==============================================================================
Create a volume and bucket                                            | PASS |
------------------------------------------------------------------------------
Create key                                                            | PASS |
------------------------------------------------------------------------------
Create a bucket in s3v volume                                         | PASS |
------------------------------------------------------------------------------
Create key in the bucket in s3v volume                                | PASS |
------------------------------------------------------------------------------
Try to create a bucket using S3 API                                   | PASS |
------------------------------------------------------------------------------
Create key using S3 API                                               | PASS |
------------------------------------------------------------------------------
2.2.0-2.2.0-2-s3g1-generate-generate-s3g1 :: Generate data            | PASS |
6 tests, 6 passed, 0 failed
==============================================================================
Output:  /tmp/smoketest/upgrade/result/robot-2.2.0-2.2.0-2-s3g1-001.xml
Using Docker Compose v2
==============================================================================
2.2.0-2.2.0-2-s3g1-validate-generate-s3g1 :: Smoketest ozone cluster startup  
==============================================================================
Read data from previously created key                                 | PASS |
------------------------------------------------------------------------------
Read key created with Ozone Shell using S3 API                        | PASS |
------------------------------------------------------------------------------
Read key created with S3 API using S3 API                             | PASS |
------------------------------------------------------------------------------
2.2.0-2.2.0-2-s3g1-validate-generate-s3g1 :: Smoketest ozone clust... | PASS |
3 tests, 3 passed, 0 failed
==============================================================================
Output:  /tmp/smoketest/upgrade/result/robot-2.2.0-2.2.0-2-s3g1-002.xml
--- RESTARTING s3g2 WITH IMAGE 2.2.0 ---
Using Docker Compose v2
==============================================================================
2.2.0-2.2.0-2-s3g2-generate-generate-s3g2 :: Generate data                    
==============================================================================
Create a volume and bucket                                            | PASS |
------------------------------------------------------------------------------
Create key                                                            | PASS |
------------------------------------------------------------------------------
Create a bucket in s3v volume                                         | PASS |
------------------------------------------------------------------------------
Create key in the bucket in s3v volume                                | PASS |
------------------------------------------------------------------------------
Try to create a bucket using S3 API                                   | PASS |
------------------------------------------------------------------------------
Create key using S3 API                                               | PASS |
------------------------------------------------------------------------------
2.2.0-2.2.0-2-s3g2-generate-generate-s3g2 :: Generate data            | PASS |
6 tests, 6 passed, 0 failed
==============================================================================
Output:  /tmp/smoketest/upgrade/result/robot-2.2.0-2.2.0-2-s3g2-001.xml
Using Docker Compose v2
==============================================================================
2.2.0-2.2.0-2-s3g2-validate-generate-s3g2 :: Smoketest ozone cluster startup  
==============================================================================
Read data from previously created key                                 | PASS |
------------------------------------------------------------------------------
Read key created with Ozone Shell using S3 API                        | PASS |
------------------------------------------------------------------------------
Read key created with S3 API using S3 API                             | PASS |
------------------------------------------------------------------------------
2.2.0-2.2.0-2-s3g2-validate-generate-s3g2 :: Smoketest ozone clust... | PASS |
3 tests, 3 passed, 0 failed
==============================================================================
Output:  /tmp/smoketest/upgrade/result/robot-2.2.0-2.2.0-2-s3g2-002.xml
--- RESTARTING s3g3 WITH IMAGE 2.2.0 ---
Using Docker Compose v2
==============================================================================
2.2.0-2.2.0-2-s3g3-generate-generate-s3g3 :: Generate data                    
==============================================================================
Create a volume and bucket                                            | PASS |
------------------------------------------------------------------------------
Create key                                                            | PASS |
------------------------------------------------------------------------------
Create a bucket in s3v volume                                         | PASS |
------------------------------------------------------------------------------
Create key in the bucket in s3v volume                                | PASS |
------------------------------------------------------------------------------
Try to create a bucket using S3 API                                   | PASS |
------------------------------------------------------------------------------
Create key using S3 API                                               | PASS |
------------------------------------------------------------------------------
2.2.0-2.2.0-2-s3g3-generate-generate-s3g3 :: Generate data            | PASS |
6 tests, 6 passed, 0 failed
==============================================================================
Output:  /tmp/smoketest/upgrade/result/robot-2.2.0-2.2.0-2-s3g3-001.xml
Using Docker Compose v2
==============================================================================
2.2.0-2.2.0-2-s3g3-validate-generate-s3g3 :: Smoketest ozone cluster startup  
==============================================================================
Read data from previously created key                                 | PASS |
------------------------------------------------------------------------------
Read key created with Ozone Shell using S3 API                        | PASS |
------------------------------------------------------------------------------
Read key created with S3 API using S3 API                             | PASS |
------------------------------------------------------------------------------
2.2.0-2.2.0-2-s3g3-validate-generate-s3g3 :: Smoketest ozone clust... | PASS |
3 tests, 3 passed, 0 failed

@dombizita dombizita requested review from adoroszlai and errose28 April 2, 2026 05:48
@github-actions github-actions Bot added the zdu label (Pull requests for Zero Downtime Upgrade (ZDU), https://issues.apache.org/jira/browse/HDDS-14496) on Apr 2, 2026
Contributor

@adoroszlai adoroszlai left a comment

Thanks @dombizita, LGTM.

Comment thread hadoop-ozone/dist/src/main/compose/upgrade/compose/ha/load.sh Outdated
Comment thread hadoop-ozone/dist/src/main/compose/testlib.sh Outdated
Contributor

@errose28 errose28 left a comment

Thanks for adding this @dombizita. I don't have any experience configuring HAProxy, but Cursor found this potential issue:

s3-haproxy.cfg uses plain balance roundrobin with no check / option httpchk and no option redispatch / retries. While a backend is stopped during rolling_restart_service, HAProxy still has a 1-in-3 chance of selecting it on each new connection, so S3 calls can fail even though two gateways are up. That works against “constant uptime” and can make the upgraded callbacks flaky.
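For illustration, the settings named above could be combined roughly as follows. This is a sketch under stated assumptions, not the s3-haproxy.cfg from this pull request: the function name `render_s3_haproxy_cfg`, the section names, the timeouts, and the check intervals are all hypothetical. The host names s3g1 to s3g3 match the test output, and port 9878 is assumed to be the S3 gateway port. It uses plain TCP health checks via `check`, because an `option httpchk` probe would need a known health endpoint, which is not assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the s3-haproxy.cfg from this pull request.
# Prints an HAProxy config to stdout for the given port and backend host names.
render_s3_haproxy_cfg() {
  local port="$1"
  shift
  cat <<EOF
defaults
    mode http
    timeout connect 5s
    timeout client 60s
    timeout server 60s
    # Retry a failed connection attempt, and allow the retry to go to another server.
    retries 3
    option redispatch

frontend s3_frontend
    bind *:${port}
    default_backend s3_gateways

backend s3_gateways
    balance roundrobin
EOF
  local host
  for host in "$@"; do
    # "check" enables periodic TCP health checks, so a stopped gateway is
    # taken out of rotation after "fall" consecutive failed checks.
    echo "    server ${host} ${host}:${port} check inter 2s fall 2 rise 2"
  done
}

render_s3_haproxy_cfg 9878 s3g1 s3g2 s3g3
```

There is still a short window between a gateway stopping and its health check marking it down; `retries` with `option redispatch` is meant to cover connection failures in that window.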

Comment thread hadoop-ozone/dist/src/main/compose/testlib.sh Outdated
@dombizita
Contributor Author

Thanks for the review @adoroszlai and @errose28! I addressed your comments in the latest commits. I also enabled the rolling-upgrade test suite in the upgrade/test.sh file and disabled the non-rolling-upgrade one. I think this is a good idea: since we are on a feature branch, if there is some flakiness we can comment the suite out at any time and fix it.

Contributor

@errose28 errose28 left a comment

LGTM

@errose28 errose28 merged commit fca46ec into apache:HDDS-14496-zdu Apr 21, 2026
29 checks passed