bgp_apply fixes #773
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
The bgp_apply API endpoint is meant to be idempotent and to fully define the state of the world from a BGP config standpoint. This updates the bgp_apply API handler to delete all existing BGP routers except for any with an ASN matching the API request. This also adds missing logic to collect unnumbered peer groups, rather than comparing unnumbered peers against numbered peer groups.
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
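A minimal sketch of the stale-router selection described above, assuming a simplified, hypothetical RouterConfig that carries only an ASN (mgd's real router config has more fields, and stale_router_asns is an illustrative name, not an mgd function). Every router whose ASN differs from the request's is a deletion candidate:

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified router config: only the ASN matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RouterConfig {
    pub asn: u32,
}

/// Returns the ASNs of the routers that a bgp_apply for `requested_asn`
/// should delete: every existing router except the one matching the request.
pub fn stale_router_asns(existing: &[RouterConfig], requested_asn: u32) -> BTreeSet<u32> {
    existing
        .iter()
        .map(|r| r.asn)
        .filter(|asn| *asn != requested_asn)
        .collect()
}
```

Collecting into a BTreeSet deduplicates the ASNs and makes the deletion order deterministic (ascending ASN).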
The Nbr/Unbr types scoped to do_bgp_apply() were manually implementing PartialEq and Hash using just the IP/Interface fields from the peers in the API request, which discarded the ASN. In the case of an ASN update, a peer needs the ASN to distinguish between (Peer, old ASN) and (Peer, new ASN) so it doesn't get put into the to_modify bucket. This removes the manual trait impls in favor of deriving them. With that in place, to_add gets (Peer, new ASN) and to_remove gets (Peer, old ASN). However, because deletes are processed last, the peer is removed in the final state rather than added to the new ASN. Moving the deletes ahead of the adds/changes resolves the order of operations problem, resulting in (Peer, new ASN) as the final state.
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
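To illustrate both the identity problem and the ordering problem, here is a sketch using a hypothetical Nbr with derived traits and a plain HashMap keyed by peer address standing in for the running state. Keying the state by address alone is an assumption, chosen because it reproduces the failure mode described above: removing (Peer, old ASN) also removes the peer that was re-added under the new ASN. The names diff, apply_deletes_first and apply_deletes_last are illustrative only:

```rust
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;

/// Hypothetical stand-in for the Nbr type in do_bgp_apply(). Deriving
/// PartialEq/Eq/Hash makes the ASN part of the identity, so the same peer
/// under two ASNs is two distinct entries.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Nbr {
    pub addr: IpAddr,
    pub asn: u32,
}

/// Returns (to_remove, to_add) by set difference.
pub fn diff(current: &HashSet<Nbr>, desired: &HashSet<Nbr>) -> (Vec<Nbr>, Vec<Nbr>) {
    let to_remove: Vec<Nbr> = current.difference(desired).cloned().collect();
    let to_add: Vec<Nbr> = desired.difference(current).cloned().collect();
    (to_remove, to_add)
}

/// Running state modeled as peer address -> ASN (an assumption). Deletes
/// run first, so the add under the new ASN is what survives.
pub fn apply_deletes_first(state: &mut HashMap<IpAddr, u32>, to_remove: &[Nbr], to_add: &[Nbr]) {
    for n in to_remove {
        state.remove(&n.addr);
    }
    for n in to_add {
        state.insert(n.addr, n.asn);
    }
}

/// The old ordering, for contrast: the trailing delete of (peer, old ASN)
/// removes the peer that was just added under the new ASN.
pub fn apply_deletes_last(state: &mut HashMap<IpAddr, u32>, to_remove: &[Nbr], to_add: &[Nbr]) {
    for n in to_add {
        state.insert(n.addr, n.asn);
    }
    for n in to_remove {
        state.remove(&n.addr);
    }
}
```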
Extract do_delete_router so router deletion prunes the router's neighbors instead of orphaning them, and have bgp_apply reuse it for stale-router cleanup rather than reaping routers inline. Add tests for ASN changes, empty applies, and router deletion.
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
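A minimal sketch of that deletion behavior, assuming a hypothetical in-memory ConfigStore with neighbors keyed by (ASN, peer address). It is not mgd's storage layout, only an illustration of pruning neighbors together with their router and reusing that single path for stale-router cleanup:

```rust
use std::collections::BTreeMap;
use std::net::IpAddr;

/// Hypothetical in-memory config store, not mgd's storage layout.
#[derive(Default)]
pub struct ConfigStore {
    /// ASN -> router name (placeholder value).
    pub routers: BTreeMap<u32, String>,
    /// (ASN, peer address) -> neighbor name.
    pub neighbors: BTreeMap<(u32, IpAddr), String>,
}

impl ConfigStore {
    /// Deletes a router and prunes every neighbor under its ASN, so no
    /// neighbor is left orphaned.
    pub fn delete_router(&mut self, asn: u32) {
        self.neighbors.retain(|key, _| key.0 != asn);
        self.routers.remove(&asn);
    }

    /// Stale-router cleanup goes through delete_router instead of removing
    /// router entries inline.
    pub fn remove_stale_routers(&mut self, keep_asn: u32) {
        let stale: Vec<u32> = self
            .routers
            .keys()
            .copied()
            .filter(|asn| *asn != keep_asn)
            .collect();
        for asn in stale {
            self.delete_router(asn);
        }
    }
}
```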
jgallagher
left a comment
The changes LGTM, but I'd generally prefer if someone more familiar with mgd gave a final approval.
I did try all the reproduction steps from #772 and they all look good, with one possible caveat I wanted to ask about: If I have an unnumbered peer in ASN 123 like this:
% curl -H 'api-version: 8.0.0' 'http://[::1]:8889/bgp/config/routers'
[{"asn":123,"id":123,"listen":"[::]:179","graceful_shutdown":false}]
% curl -H 'api-version: 8.0.0' 'http://[::1]:8889/bgp/config/unnumbered-neighbors?asn=123'
[{"asn":123,"name":"unnumbered-qsfp0","group":"qsfp0","interface":"tfportqsfp0_0","act_as_a_default_ipv6_router":1,"hold_time":6,"idle_hold_time":3,"delay_open":0,"connect_retry":3,"keepalive":2,"resolution":100,"passive":false,"remote_asn":null,"min_ttl":null,"md5_auth_key":null,"multi_exit_discriminator":null,"communities":[],"local_pref":null,"enforce_first_as":false,"vlan_id":null,"ipv4_unicast":{"nexthop":null,"import_policy":"NoFiltering","export_policy":"NoFiltering"},"ipv6_unicast":{"nexthop":null,"import_policy":"NoFiltering","export_policy":"NoFiltering"},"deterministic_collision_resolution":false,"idle_hold_jitter":null,"connect_retry_jitter":{"min":0.75,"max":1.0},"src_addr":null,"src_port":null}]
and I apply a config that has ASN 123 with no peers:
% cat req123-empty.json
{
"asn": 123,
"originate": [],
"peers": {},
"unnumbered_peers": {}
}
% curl -v -X POST -H 'content-type: application/json' -H 'api-version: 8.0.0' 'http://[::1]:8889/bgp/omicron/apply' --data @req123-empty.json
then it does remove the neighbors as expected, but I'm still left with a router entry for 123:
% curl -H 'api-version: 8.0.0' 'http://[::1]:8889/bgp/config/unnumbered-neighbors?asn=123'
[]
% curl -H 'api-version: 8.0.0' 'http://[::1]:8889/bgp/config/routers'
[{"asn":123,"id":123,"listen":"[::]:179","graceful_shutdown":false}]
I think that makes sense - I asked for a config to be applied that has ASN 123 but has no peers. But is this a reasonable config at all?
The context for this is mostly: what should Nexus/sled-agent do if the desired config is "no BGP"? I think based on the above, instead of trying to apply a config (which requires an ASN), it should:
- Get the current routers (we expect to get back 0 or 1, given Nexus's current "at most one ASN per switch" constraint)
- Delete them all
Does that sound right?
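One way the "no BGP" flow above could look, sketched against a hypothetical RouterAdmin trait (the trait, its method names, ensure_no_bgp and FakeAdmin are illustrative stand-ins, not the generated mgd client). Because it lists before deleting, re-running it after a partial failure picks up wherever the previous attempt stopped:

```rust
/// Hypothetical client abstraction over "list routers" and "delete router".
pub trait RouterAdmin {
    type Error;
    fn list_router_asns(&mut self) -> Result<Vec<u32>, Self::Error>;
    fn delete_router(&mut self, asn: u32) -> Result<(), Self::Error>;
}

/// Drives the switch to "no BGP": list whatever routers exist (0 or 1 under
/// the one-ASN-per-switch constraint, but any number is handled) and delete
/// each one. Returns how many routers were deleted.
pub fn ensure_no_bgp<C: RouterAdmin>(client: &mut C) -> Result<usize, C::Error> {
    let asns = client.list_router_asns()?;
    for asn in &asns {
        client.delete_router(*asn)?;
    }
    Ok(asns.len())
}

/// In-memory fake used to exercise the logic.
pub struct FakeAdmin {
    pub routers: Vec<u32>,
}

impl RouterAdmin for FakeAdmin {
    type Error = String;

    fn list_router_asns(&mut self) -> Result<Vec<u32>, String> {
        Ok(self.routers.clone())
    }

    fn delete_router(&mut self, asn: u32) -> Result<(), String> {
        let before = self.routers.len();
        self.routers.retain(|a| *a != asn);
        if self.routers.len() == before {
            Err(format!("no router with ASN {}", asn))
        } else {
            Ok(())
        }
    }
}
```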
let log = ctx.log.clone();

// Validate originate prefixes before processing
validate_prefixes(&rq.originate)?;
Why did this move to the end?
A general/vague concern with this function is: what happens if we return early via ? partway through? It looks like that will have applied whatever things happened before - will that leave us in a state where a future apply can still work, or could we be in some weird / broken intermediate state?
I moved it to the end because it seemed relevant to group it with the application of the originated routes.
That said, I do think we have a lot of early returns without cleanup... I'll see about doing request validation up front rather than along the way.
Question for you: are you okay with changes to the shape of this API? e.g. if I wanted to push some of the validations into the ApplyRequest type rather than putting logic into the handler, would that be ok for nexus?
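A sketch of what pushing validation into the request type could look like, assuming hypothetical, simplified Prefix4, ApplyRequest and ValidatedApplyRequest types; the two checks shown (length at most 32, no host bits set) are illustrative and not necessarily what validate_prefixes does. If the handler only accepts the validated type, a ? during validation returns before any state has been touched:

```rust
use std::convert::TryFrom;
use std::net::Ipv4Addr;

/// Hypothetical, simplified shapes; the real ApplyRequest has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Prefix4 {
    pub addr: Ipv4Addr,
    pub len: u8,
}

pub struct ApplyRequest {
    pub asn: u32,
    pub originate: Vec<Prefix4>,
}

/// A request that has passed every check up front.
pub struct ValidatedApplyRequest {
    pub asn: u32,
    pub originate: Vec<Prefix4>,
}

impl TryFrom<ApplyRequest> for ValidatedApplyRequest {
    type Error = String;

    fn try_from(rq: ApplyRequest) -> Result<Self, Self::Error> {
        for p in &rq.originate {
            if p.len > 32 {
                return Err(format!("invalid prefix length {} for {}", p.len, p.addr));
            }
            // Reject prefixes with host bits set, e.g. 10.0.0.1/8.
            let mask = if p.len == 0 { 0 } else { u32::MAX << (32 - p.len) };
            if u32::from(p.addr) & !mask != 0 {
                return Err(format!("host bits set in {}/{}", p.addr, p.len));
            }
        }
        Ok(ValidatedApplyRequest { asn: rq.asn, originate: rq.originate })
    }
}
```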
);
}

/// Regression test for https://github.com/oxidecomputer/maghemite/issues/772
Thanks for these tests 👍
I don't think the shape of the API today really provides a way to remove a BGP instance. ApplyRequest has a mandatory ASN and no parameters for requesting deletion... plus there's no DELETE verb for it either. So does it sound "right"? It's working as expected, but I think it would be better to change the shape to allow for a clean deletion instead of leaving inert config behind. Thoughts?
I've been working on switching the new sled-agent scrimlet reconcilers to use the REST API instead of the apply API. It's doable, but I don't think the reasons here are right, and I think in general we probably do want to prefer compound operations with atomic semantics over independent APIs. (We've had to do nontrivial work in Reconfigurator to change from individual imperative operations to larger, compound idempotent operations for correctness, similar to the REST vs apply question here.) A couple of the problems we've had with individual APIs that could also apply here are:
1 is my biggest concern with networking APIs generally (both in omicron and here). We need a way for a Nexus that's potentially operating on stale data to be unable to make changes that overwrite changes pushed by a Nexus operating on newer data. The most common pattern we use for that elsewhere is for Nexus to have a compound object with a generation number, and it sends that object to the server, which is responsible for (a) rejecting requests whose generation number is stale and (b) ensuring the settings described by the object are applied in an idempotent way. In this specific case, we can sidestep this problem entirely, because with the work to move reconciliation from Nexus to the scrimlet sled-agent, there will only be a single mgd client, so none of this applies. But I wanted to jot these notes down while the thoughts were fresh.
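A minimal sketch of the generation-number pattern described above, using hypothetical BgpConfig, Server and ApplyOutcome types (not an existing mgd or Omicron API). Treating an equal generation with different contents as a conflict is a design choice assumed here; re-sending the identical object is a no-op, which is what makes retries idempotent:

```rust
/// Hypothetical compound config object carrying a generation number.
#[derive(Debug, Clone, PartialEq)]
pub struct BgpConfig {
    pub generation: u64,
    pub asn: u32,
}

#[derive(Debug, PartialEq)]
pub enum ApplyOutcome {
    Applied,
    AlreadyCurrent,
    RejectedStale { current: u64 },
    Conflict,
}

/// Hypothetical server side: remembers the last applied config.
#[derive(Default)]
pub struct Server {
    pub current: Option<BgpConfig>,
}

impl Server {
    pub fn apply(&mut self, cfg: BgpConfig) -> ApplyOutcome {
        if let Some(cur) = &self.current {
            // (a) Reject anything older than what is already applied.
            if cfg.generation < cur.generation {
                return ApplyOutcome::RejectedStale { current: cur.generation };
            }
            // Same generation: identical contents is a no-op, else a conflict.
            if cfg.generation == cur.generation {
                return if *cur == cfg {
                    ApplyOutcome::AlreadyCurrent
                } else {
                    ApplyOutcome::Conflict
                };
            }
        }
        // (b) Newer generation: replace the whole desired state at once.
        self.current = Some(cfg);
        ApplyOutcome::Applied
    }
}
```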
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Fixes: #783
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
Updates entries stored in the persistent (sled) DB to use the BGP ASN as part of their key (prepended ahead of the &'static str key). This keeps config stored for a Router with ASN X from being used by a different Router with ASN Y, and also provides some of the namespacing necessary for multiple Routers to coexist.
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
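A sketch of one possible key layout, assuming the ASN is encoded as a fixed-width big-endian u32 ahead of the existing &'static str key (asn_key is a hypothetical helper, and mgd's actual byte layout may differ). Fixed width avoids ambiguity between, say, ASN 1 with key "2x" and ASN 12 with key "x", which would collide under decimal string concatenation; big-endian keeps each ASN's entries contiguous and numerically ordered in an ordered tree:

```rust
/// Builds an ASN-namespaced key: 4 big-endian ASN bytes, then the key name.
pub fn asn_key(asn: u32, key: &'static str) -> Vec<u8> {
    let mut k = Vec::with_capacity(4 + key.len());
    k.extend_from_slice(&asn.to_be_bytes());
    k.extend_from_slice(key.as_bytes());
    k
}
```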
Force-pushed a6346df to bbec566
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
I figure it is worth it to get the bgp_apply changes put in place even if we decide to go another route on the API endpoints for nexus. The code is already written so there's no additional cost there, and we can at least plug the gaps in the interim before larger changes come in. So I've pushed a few more commits that put the prefix validation back at the front of do_bgp_apply() and fix another bug related to old state not getting cleaned up upon BGP Router deletion (#783).
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
rcgoodfellow
left a comment
For the ASN namespacing, while it's consistent with the current multi-instance router design that's keyed on ASN, I'm not sure it's the direction we'll ultimately wind up heading in. From RFD 662 and in particular this section, I think the direction we are going in is toward router instances that are tied to user-defined configurations (likely keyed by a UUID that is managed by Omicron) where the same ASN may be used across configurations.
I'm not opposed to these ASN-based namespacing changes landing, as they're consistent with the current design. Just want to be clear that the direction is likely to change in the relatively near future.
Adds a test case that validates the expected properties of a router del.
It ensures that neighbor, policy and origin{4,6} entries are cleaned up
and are not inherited by a router created after the deletion.
Signed-off-by: Trey Aspelund <trey@oxidecomputer.com>
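The properties that test checks can be sketched with a hypothetical in-memory Db standing in for the persistent store, with keys laid out as a big-endian ASN prefix plus key name (an assumption; the key names below are illustrative). Deleting a router removes everything under its prefix, so a router later created with the same ASN starts from an empty namespace:

```rust
use std::collections::BTreeMap;

/// Hypothetical ordered KV store standing in for the persistent DB.
#[derive(Default)]
pub struct Db {
    pub tree: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Db {
    fn key(asn: u32, name: &str) -> Vec<u8> {
        let mut k = asn.to_be_bytes().to_vec();
        k.extend_from_slice(name.as_bytes());
        k
    }

    pub fn put(&mut self, asn: u32, name: &str, val: &[u8]) {
        self.tree.insert(Self::key(asn, name), val.to_vec());
    }

    pub fn get(&self, asn: u32, name: &str) -> Option<&Vec<u8>> {
        self.tree.get(&Self::key(asn, name))
    }

    /// Router deletion: drop every entry under the ASN's prefix
    /// (neighbors, policies, origin4/origin6, ...).
    pub fn delete_router(&mut self, asn: u32) {
        let prefix = asn.to_be_bytes();
        self.tree.retain(|k, _| !k.starts_with(&prefix));
    }
}
```

retain walks the whole tree; for a large store, a range or prefix scan over just the ASN's keys would avoid touching other routers' entries.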
Working through some more fun with a4x2 currently. All the zones deploy, nexus gets an external IP, NAT entries are set up, but running snoop within the nexus zone doesn't show any packets arriving. BGP state looks good on both rack switches, cr1, cr2, and ce... so it's not clear yet why ingress traffic doesn't get to nexus. Possibly a firewall thing.
Fixes a few issues in the bgp_apply API handler.
Centralizes router deletion logic into do_delete_router() method.
Adds unit tests for do_delete_router() and for do_bgp_apply().
Fixes: #772
Fixes: #783