sai_test config: drop redundant bridge_id, env-gated port bring-up, fix v4/v6 NHG port-list aliasing #2301
Signed-off-by: Nicholas Ching <nicholaslching@gmail.com>
…default option unchanged
Signed-off-by: Nicholas Ching <nicholaslching@gmail.com>
Signed-off-by: Nicholas Ching <nicholaslching@gmail.com>
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
for index, item in enumerate(port_list):
    port_bp = sai_thrift_create_bridge_port(
        self.client,
        bridge_id=bridge_id,
How will this change affect other platforms? Can this be fixed in the VPP platform?
In the SAI spec, SAI_BRIDGE_PORT_ATTR_BRIDGE_ID is not valid for SAI_BRIDGE_PORT_TYPE_PORT.
Lines 182 to 190 in 0aba236
For this type, the bridge port is implicitly associated with the switch’s default 1Q bridge, so bridge_id should not be passed on create.
VPP (via saiserver) uses the sonic-sairedis meta layer, which enforces the @condition above and therefore rejects the extra bridge_id for the PORT type. This made bridge-port creation fail at SAI API validation during the tests. Since saibridge.h says bridge_id should not be included for the PORT type, I believe VPP's enforcement is correct and the fix is relevant to other platforms as well; they were likely just ignoring the extra attribute.
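To make the failure mode concrete, here is a minimal Python sketch of the kind of conditional-attribute check described above. It is not the sonic-sairedis meta code (which is C++ and driven by metadata generated from the SAI headers): the string constants, BRIDGE_ID_VALID_TYPES, AttrValidationError and validate_bridge_port_create are hypothetical, and the set of types for which BRIDGE_ID is allowed is illustrative only. The one property the sketch relies on, per the comment above, is that SAI_BRIDGE_PORT_TYPE_PORT is not in that set.

```python
from typing import Mapping

# Hypothetical string stand-ins for the SAI attribute and enum identifiers.
ATTR_TYPE = "SAI_BRIDGE_PORT_ATTR_TYPE"
ATTR_PORT_ID = "SAI_BRIDGE_PORT_ATTR_PORT_ID"
ATTR_BRIDGE_ID = "SAI_BRIDGE_PORT_ATTR_BRIDGE_ID"
TYPE_PORT = "SAI_BRIDGE_PORT_TYPE_PORT"
TYPE_SUB_PORT = "SAI_BRIDGE_PORT_TYPE_SUB_PORT"

# ASSUMPTION: illustrative set of bridge-port types whose @condition allows
# BRIDGE_ID. The authoritative list is the @condition in saibridge.h; this
# sketch only relies on TYPE_PORT not being in it.
BRIDGE_ID_VALID_TYPES = frozenset({TYPE_SUB_PORT, "SAI_BRIDGE_PORT_TYPE_1D_ROUTER"})


class AttrValidationError(ValueError):
    """Raised when a create request carries an attribute whose condition is false."""


def validate_bridge_port_create(attrs: Mapping[str, object]) -> None:
    """Validate a bridge-port create request the way a meta layer might."""
    port_type = attrs.get(ATTR_TYPE)
    if port_type is None:
        raise AttrValidationError(f"{ATTR_TYPE} is mandatory on create")
    # A conditional attribute may only be passed when its condition holds.
    if ATTR_BRIDGE_ID in attrs and port_type not in BRIDGE_ID_VALID_TYPES:
        raise AttrValidationError(
            f"{ATTR_BRIDGE_ID} is not valid when {ATTR_TYPE} == {port_type}"
        )


if __name__ == "__main__":
    # PORT type without bridge_id: accepted (the switch uses its default 1Q bridge).
    validate_bridge_port_create({ATTR_TYPE: TYPE_PORT, ATTR_PORT_ID: 0x1000})
    # PORT type with the extra bridge_id: rejected, like the VPP create failure.
    try:
        validate_bridge_port_create(
            {ATTR_TYPE: TYPE_PORT, ATTR_PORT_ID: 0x1000, ATTR_BRIDGE_ID: 0x2000}
        )
    except AttrValidationError as exc:
        print("rejected:", exc)
```

Under this model, a backend that skips the check silently accepts the extra attribute, which is consistent with other platforms having passed these tests before.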
Got it. I see create_bridge_ports has only one caller, and it uses the PORT type, so we can safely remove bridge_id.
Context / motivation
Part of the SAIVPP unit-test framework (see PR 1 / our docker-sai-test-vpp/devdocs/, esp. the 6-19 entry). This PR makes three independent correctness/robustness fixes to the OCP sai_test config helpers that the suite needs when run against the VPP SAI backend.
What this change does
test/sai_test/config/port_configer.py: drop bridge_id on create_bridge_port. Passing bridge_id to sai_thrift_create_bridge_port caused a create failure in our backend. The bridge port is implicitly placed on the default 1Q bridge, so the argument is redundant; removing it lets bridge-port creation succeed.
test/sai_test/config/port_configer.py: env-gated, bounded port bring-up wait. turn_up_and_get_checked_ports() waited for oper-status UP serially, one port at a time (retries × sleep per port). On a 32-port topology where oper-status is slow to settle, this adds ~60s+ of dead time to every common-config build. The wait is now tunable via environment variables (SAI_PORT_UP_RETRIES, SAI_PORT_UP_POLL_INTERVAL, SAI_PORT_UP_SHARED_WAIT) and can optionally poll all ports together in one bounded window. The defaults preserve the original per-port wait, so real-HW and other OCP consumers are unaffected unless they opt in; only our harness sets the fast values.
test/sai_test/config/route_configer.py: give the v4/v6 NHGs independent member_port_indexs. create_nexthop_group_by_nexthops() constructed the v4 and v6 NexthopGroup objects so that they shared one Python list object for member_port_indexs. A mutation on one group (in the member remove/re-add tests) then corrupted the other group's port list (ValueError: list.remove(x): x not in list). Each group now gets its own list(...) copy. This was a latent aliasing bug, independent of any harness specifics.
Scope / risk
bridge_id removal relies on the default 1Q bridge, which is already how these tests use it.
Dependencies
None. Related to #2299 and #2300; however, their edits are isolated, and each targets master.
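For the port bring-up change in port_configer.py, a minimal sketch of how an env-gated, bounded wait could be structured. This is not the code in the PR: wait_ports_up, the is_up callback and the placeholder default values are hypothetical, and treating SAI_PORT_UP_SHARED_WAIT as an on/off flag is an assumption about its semantics. Only the three environment variable names come from the description.

```python
import os
import time
from typing import Callable, Iterable, List

# ASSUMPTION: placeholder defaults, not the values used in port_configer.py.
DEFAULT_RETRIES = 10
DEFAULT_POLL_INTERVAL_S = 1.0


def _env_int(name: str, default: int) -> int:
    raw = os.environ.get(name, "").strip()
    return int(raw) if raw else default


def _env_float(name: str, default: float) -> float:
    raw = os.environ.get(name, "").strip()
    return float(raw) if raw else default


def _env_flag(name: str) -> bool:
    # ASSUMPTION: SAI_PORT_UP_SHARED_WAIT is treated as a boolean opt-in.
    return os.environ.get(name, "").strip().lower() in ("1", "true", "yes")


def wait_ports_up(
    port_ids: Iterable[int],
    is_up: Callable[[int], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Return the ports that reached oper-status UP, in input order."""
    retries = _env_int("SAI_PORT_UP_RETRIES", DEFAULT_RETRIES)
    interval = _env_float("SAI_PORT_UP_POLL_INTERVAL", DEFAULT_POLL_INTERVAL_S)
    ports = list(port_ids)
    up = set()

    if not _env_flag("SAI_PORT_UP_SHARED_WAIT"):
        # Default path: each port gets its own retry budget, one port after another.
        for port in ports:
            for attempt in range(retries):
                if is_up(port):
                    up.add(port)
                    break
                if attempt + 1 < retries:
                    sleep(interval)
    else:
        # Opt-in path: every still-pending port is polled each round,
        # so the whole set shares a single retries x interval window.
        pending = list(ports)
        for attempt in range(retries):
            still_down = []
            for port in pending:
                if is_up(port):
                    up.add(port)
                else:
                    still_down.append(port)
            pending = still_down
            if not pending:
                break
            if attempt + 1 < retries:
                sleep(interval)

    return [p for p in ports if p in up]
```

The worst case is the point of the change: the serial path can spend roughly ports × retries × interval seconds, while the shared window is capped at about retries × interval regardless of port count. Injecting sleep keeps the sketch testable without real delays.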
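The member_port_indexs aliasing fixed in route_configer.py can be reproduced in isolation. In this sketch, NexthopGroup is a hypothetical two-field stand-in for the real helper class, and build_groups_aliased / build_groups_fixed are illustrative names, not functions in the PR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NexthopGroup:
    """Hypothetical, trimmed-down stand-in for the config helper's NexthopGroup."""
    name: str
    member_port_indexs: List[int] = field(default_factory=list)


def build_groups_aliased(port_indexs: List[int]) -> Tuple[NexthopGroup, NexthopGroup]:
    # Bug pattern: both groups hold a reference to the same list object.
    return NexthopGroup("v4", port_indexs), NexthopGroup("v6", port_indexs)


def build_groups_fixed(port_indexs: List[int]) -> Tuple[NexthopGroup, NexthopGroup]:
    # Fix pattern: each group gets its own shallow copy via list(...).
    return NexthopGroup("v4", list(port_indexs)), NexthopGroup("v6", list(port_indexs))


if __name__ == "__main__":
    v4, v6 = build_groups_aliased([1, 2, 3])
    v4.member_port_indexs.remove(2)  # member-remove step on the v4 group
    try:
        v6.member_port_indexs.remove(2)  # port 2 already vanished from v6 as well
    except ValueError as exc:
        print("aliased:", exc)

    v4, v6 = build_groups_fixed([1, 2, 3])
    v4.member_port_indexs.remove(2)
    v6.member_port_indexs.remove(2)  # independent copy, so this succeeds
    print("fixed:", v4.member_port_indexs, v6.member_port_indexs)
```

A shallow list(...) copy is enough here because the port indexes are ints, which are immutable; a list of mutable objects would need copy.deepcopy for the groups to be fully independent.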