fix: node dependent resources #353

Merged
jason-lynch merged 2 commits into main from fix/PLAT-547/node-dependent-resources-ii
Apr 22, 2026
Conversation

jason-lynch (Member) commented Apr 20, 2026

Summary

Node resources aren't created until after all instances are available. This means that node-dependent resources, such as the pgBackRest stanza resource, cannot be created until after all instances are available.

This change adds a mechanism to treat node-dependent instance resources separately and ensure they're created after the node resource. This fixes a bug where you could not create a new node from a backup if the new node would have more than one instance.

Changes

  • Move common.InstancePaths to the database package so that it can be used by the database.Orchestrator interface.
  • Add a separate InstanceResources.NodeDependents field for node-dependent resources and rename the Resources field to InstanceDependencies to clarify that these are resources that the instance depends on.
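
The ordering constraint behind the new NodeDependents field can be illustrated with a minimal sketch. The field names InstanceDependencies and NodeDependents come from this PR; the instancePlan type, the string resource names, and the creationOrder helper are hypothetical and are not the project's actual planner. DatabaseDependencies is left out because the description does not say where it falls in the order.

```go
package main

import "fmt"

// instancePlan is a hypothetical, simplified view of one instance's resources.
// The two field names follow the PR; the string element type is an assumption.
type instancePlan struct {
	ID                   string
	InstanceDependencies []string // needed before the instance can start
	NodeDependents       []string // need the node resource to exist first
}

// creationOrder lists resource names in an order that satisfies the
// constraint from the summary: the node comes after every instance, and
// node dependents come after the node.
func creationOrder(node string, instances []instancePlan) []string {
	var order []string
	for _, inst := range instances {
		order = append(order, inst.InstanceDependencies...)
		order = append(order, "instance/"+inst.ID)
	}
	order = append(order, "node/"+node)
	for _, inst := range instances {
		order = append(order, inst.NodeDependents...)
	}
	return order
}

func main() {
	// Resource names below are made up for illustration.
	order := creationOrder("n1", []instancePlan{
		{ID: "n1-a", InstanceDependencies: []string{"config/n1-a"}, NodeDependents: []string{"stanza/n1"}},
		{ID: "n1-b", InstanceDependencies: []string{"config/n1-b"}},
	})
	for _, name := range order {
		fmt.Println(name)
	}
}
```

With a single combined list, the stanza would have been scheduled alongside the first instance's own dependencies, before the node (and therefore the second instance) was available, which appears to be the multi-instance restore failure described in the summary.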

Testing

There is a unit test for this case, and I've added a read replica to the TestS3CreateDBFromBackup E2E test to exercise this case.

# Only works on the lima and EC2 environments because of the S3 dependency.
make update-lima-fixture

use-lima

make test-e2e E2E_RUN=TestS3CreateDBFromBackup

PLAT-547

coderabbitai Bot commented Apr 20, 2026

Warning

Rate limit exceeded

@jason-lynch has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 11 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 10 minutes and 11 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b696137c-744f-40b7-bb06-2d524752cd31

📥 Commits

Reviewing files that changed from the base of the PR and between d13512f and bab94e3.

📒 Files selected for processing (27)
  • e2e/backup_restore_test.go
  • server/internal/database/instance_resource.go
  • server/internal/database/operations/add_nodes.go
  • server/internal/database/operations/add_nodes_test.go
  • server/internal/database/operations/common.go
  • server/internal/database/operations/end.go
  • server/internal/database/operations/helpers_test.go
  • server/internal/database/operations/restore_database.go
  • server/internal/database/operations/restore_database_test.go
  • server/internal/database/operations/update_database_test.go
  • server/internal/database/operations/update_nodes.go
  • server/internal/database/operations/update_nodes_test.go
  • server/internal/database/orchestrator.go
  • server/internal/database/paths.go
  • server/internal/database/paths_test.go
  • server/internal/orchestrator/common/etcd_creds.go
  • server/internal/orchestrator/common/patroni_config_generator.go
  • server/internal/orchestrator/common/patroni_config_generator_test.go
  • server/internal/orchestrator/common/pgbackrest_config.go
  • server/internal/orchestrator/common/pgbackrest_stanza.go
  • server/internal/orchestrator/common/postgres_certs.go
  • server/internal/orchestrator/swarm/orchestrator.go
  • server/internal/orchestrator/systemd/orchestrator.go
  • server/internal/orchestrator/systemd/patroni_unit.go
  • server/internal/orchestrator/systemd/pgbackrest_restore.go
  • server/internal/orchestrator/systemd/unit_options_test.go
  • server/internal/pgbackrest/config.go
Walkthrough

This PR refactors instance resource management across database orchestration by splitting resources into three categories: instance dependencies, database dependencies, and node dependents. The change introduces InstanceState() and InstancePaths() methods on orchestrators, consolidates path-related constants into a centralized database package, and updates all operational logic to use these new abstractions.

Changes

Cohort: Core Instance Resource Model
File(s): server/internal/database/instance_resource.go, server/internal/database/orchestrator.go
Summary: Added InstanceID() and PostgresVersion() methods to InstanceResource. Restructured InstanceResources to split resources into InstanceDependencies, DatabaseDependencies (unchanged), and new NodeDependents field. Renamed State() to InstanceState() and added AddNodeDependents() method. Updated NewInstanceResources() signature to accept three resource categories and node dependents. Added InstancePaths() method to Orchestrator interface.

Cohort: Path Management and Constants
File(s): server/internal/database/paths.go, server/internal/database/paths_test.go
Summary: Renamed package from common to database. Added 10 new exported constants for etcd and Postgres certificate/key file names. Updated PgBackRestConfig signature to use pgbackrest.ConfigType. Improved restore argument parsing with strings.CutPrefix(). Updated test package and type references accordingly.

Cohort: Credential and Certificate Files
File(s): server/internal/orchestrator/common/etcd_creds.go, server/internal/orchestrator/common/postgres_certs.go
Summary: Removed local certificate filename constants and replaced with imported database.* constants for both etcd and Postgres certificate files in read and write operations.

Cohort: Patroni Configuration
File(s): server/internal/orchestrator/common/patroni_config_generator.go, server/internal/orchestrator/common/patroni_config_generator_test.go
Summary: Updated PatroniConfigGeneratorOptions.Paths type from InstancePaths to database.InstancePaths. Switched all certificate filename constants from postgres.* to database.* variants. Updated test cases to use database package types for path configuration.

Cohort: PgBackRest Configuration
File(s): server/internal/orchestrator/common/pgbackrest_config.go, server/internal/pgbackrest/config.go
Summary: Removed local PgBackRestConfigType alias from common. Added new ConfigType type and constants ConfigTypeBackup/ConfigTypeRestore to pgbackrest package. Updated PgBackRestConfigIdentifier() and PgBackRestConfig struct to use pgbackrest.ConfigType. Changed Paths field to database.InstancePaths.

Cohort: PgBackRest Stanza
File(s): server/internal/orchestrator/common/pgbackrest_stanza.go
Summary: Removed Paths field from PgBackRestStanza struct. Changed Refresh() and Create() to fetch primary instance via database.GetPrimaryInstance and compute paths using orchestrator.InstancePaths() instead of stored paths. Updated error handling messages.

Cohort: Swarm Orchestrator
File(s): server/internal/orchestrator/swarm/orchestrator.go
Summary: Updated instanceResources() to return three resource categories instead of one. Added public InstancePaths() method returning database.InstancePaths. Updated GenerateInstanceResources() and GenerateInstanceRestoreResources() to pass partitioned resources to database.NewInstanceResources(). Reorganized resource collection into instance dependencies, database dependencies, and node dependents.

Cohort: Systemd Orchestrator
File(s): server/internal/orchestrator/systemd/orchestrator.go, server/internal/orchestrator/systemd/patroni_unit.go, server/internal/orchestrator/systemd/pgbackrest_restore.go, server/internal/orchestrator/systemd/unit_options_test.go
Summary: Renamed instancePaths() to public InstancePaths() and changed return type to database.InstancePaths. Updated GenerateInstanceResources() to partition resources into three categories. Changed pgBackRest config type constants from common.PgBackRestConfigType* to pgbackrest.ConfigType*. Updated PatroniUnitOptions() and PgBackRestRestore to use database.InstancePaths. Updated test to import database package types.

Cohort: Database Operations: Core Logic
File(s): server/internal/database/operations/common.go, server/internal/database/operations/add_nodes.go, server/internal/database/operations/end.go, server/internal/database/operations/update_nodes.go, server/internal/database/operations/restore_database.go, server/internal/database/operations/update_database.go
Summary: Removed instanceState() helper function from common.go. Updated NodeResources.nodeResourceState() to include instance.NodeDependents. Changed all operations to call inst.InstanceState() method instead of helper function. Updated resource collection to use InstanceDependencies instead of Resources field.

Cohort: Database Operations: Test Logic
File(s): server/internal/database/operations/add_nodes_test.go, server/internal/database/operations/restore_database_test.go, server/internal/database/operations/update_nodes_test.go, server/internal/database/operations/update_database_test.go, server/internal/database/operations/helpers_test.go
Summary: Updated all test state construction to use InstanceDependencies instead of Resources. Added new test case for "two instances with node dependent resource" in add_nodes_test. Extended makeInstance() helper signature. Added nodeDependentResource test stub type implementing resource.Resource with node dependencies. Updated expected state assertions across all operation tests.

Cohort: E2E Test Data
File(s): e2e/backup_restore_test.go
Summary: Updated TestS3CreateDBFromBackup test data to change restored database node spec HostIds from single host to multiple hosts.

Poem

🐰 Hops with glee at resources refactored clear,
No more single bucket—three homes now appear!
Paths consolidated, constants pulled from the scattered dark,
Orchestrators dance with methods new, leaving their mark.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 19.23%, which is insufficient. The required threshold is 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name: Title check
Status: ✅ Passed
Explanation: The title 'fix: node dependent resources' directly relates to the main change: adding a mechanism to handle node-dependent instance resources separately. It clearly summarizes the primary fix without unnecessary details.

Check name: Description check
Status: ✅ Passed
Explanation: The PR description covers the summary, changes, testing instructions, and issue reference. All key sections of the template are addressed.


codacy-production Bot commented Apr 20, 2026

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 30 · duplication -3

coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/internal/orchestrator/common/pgbackrest_stanza.go (1)

47-51: ⚠️ Potential issue | 🔴 Critical

PgBackRestStanza's Dependencies() should explicitly include InstanceResourceIdentifier.

resource.FromContext does not enforce declared dependency constraints—it retrieves any resource present in state using state.Get(identifier) without validation. Both Refresh() and Create() call database.GetPrimaryInstance, which internally executes resource.FromContext twice: first for NodeResource (declared), then for InstanceResource (not declared). While the current state-building order (end.go) ensures InstanceResource is in state, this dependency is implicit and fragile. If state loading, reconstruction, or orchestration order changes, the undeclared dependency could fail intermittently. Declare InstanceResourceIdentifier as an explicit dependency.

This applies to the other methods at lines 57–69 and 97–109 that similarly call GetPrimaryInstance or access instance data without declaring the dependency.
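
A minimal sketch of what declaring the dependency explicitly buys, under stated assumptions: the Identifier type, the stanza struct, its PrimaryInstanceID field, and the missingDependencies helper are all hypothetical stand-ins and do not reflect the project's resource package. Which instance identifier a per-node stanza would actually declare is left open here.

```go
package main

import "fmt"

// Identifier is a hypothetical stand-in for a resource identifier.
type Identifier struct {
	Type string
	ID   string
}

// stanza is a simplified stand-in for a node-level resource that also reads
// instance data at run time. PrimaryInstanceID is an assumed field.
type stanza struct {
	NodeName          string
	PrimaryInstanceID string
}

// Dependencies declares both resources the stanza reads, so a planner can
// validate them up front instead of relying on state-building order.
func (s stanza) Dependencies() []Identifier {
	return []Identifier{
		{Type: "node", ID: s.NodeName},
		{Type: "instance", ID: s.PrimaryInstanceID},
	}
}

// missingDependencies returns the declared dependencies not present in state.
func missingDependencies(deps []Identifier, present map[Identifier]bool) []Identifier {
	var missing []Identifier
	for _, d := range deps {
		if !present[d] {
			missing = append(missing, d)
		}
	}
	return missing
}

func main() {
	s := stanza{NodeName: "n1", PrimaryInstanceID: "n1-a"}
	// Only the node is in state; the instance lookup would fail later.
	present := map[Identifier]bool{{Type: "node", ID: "n1"}: true}
	for _, d := range missingDependencies(s.Dependencies(), present) {
		fmt.Printf("missing dependency: %s/%s\n", d.Type, d.ID)
	}
}
```

With the instance declared, a missing resource shows up as a planning-time error rather than an intermittent failure inside Refresh() or Create().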

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/common/pgbackrest_stanza.go` around lines 47 -
51, PgBackRestStanza.Dependencies currently only returns NodeResourceIdentifier
but the code paths in Refresh() and Create() call database.GetPrimaryInstance
(and otherwise access instance data) which relies on InstanceResource being
present; update PgBackRestStanza.Dependencies to also include
database.InstanceResourceIdentifier(p.ClusterName, p.NodeName) (or the correct
InstanceResourceIdentifier constructor used elsewhere) so InstanceResource is an
explicit dependency; also audit the Refresh() and Create() methods to ensure any
other implicit instance accesses correspond to declared identifiers and add
InstanceResourceIdentifier entries where missing.
🧹 Nitpick comments (2)
server/internal/database/orchestrator.go (1)

63-71: Consider renaming AddResources to AddInstanceDependencies for API consistency.

Now that the backing field is InstanceDependencies and there are sibling methods AddDatabaseDependencies / AddNodeDependents, the name AddResources is misleading — it suggests it adds to "resources" generically but it only appends to InstanceDependencies. Renaming would make the three add-methods symmetric with the three fields and reduce confusion at call sites (e.g. line 41 in NewInstanceResources).
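
The symmetric naming suggested here might look like the sketch below. The three field names and the AddDatabaseDependencies / AddNodeDependents method names are taken from this review; ResourceData is a simplified placeholder, and the real methods reportedly convert their inputs with resource.ToResourceDataSlice and return an error, which this sketch omits.

```go
package main

import "fmt"

// ResourceData is a simplified placeholder for the project's serialized
// resource type.
type ResourceData struct {
	Name string
}

// InstanceResources holds the three categories named in this PR.
type InstanceResources struct {
	InstanceDependencies []ResourceData
	DatabaseDependencies []ResourceData
	NodeDependents       []ResourceData
}

// AddInstanceDependencies is the proposed new name for AddResources.
func (r *InstanceResources) AddInstanceDependencies(res ...ResourceData) {
	r.InstanceDependencies = append(r.InstanceDependencies, res...)
}

// AddDatabaseDependencies appends to the database-level category.
func (r *InstanceResources) AddDatabaseDependencies(res ...ResourceData) {
	r.DatabaseDependencies = append(r.DatabaseDependencies, res...)
}

// AddNodeDependents appends resources that must wait for the node resource.
func (r *InstanceResources) AddNodeDependents(res ...ResourceData) {
	r.NodeDependents = append(r.NodeDependents, res...)
}

func main() {
	var r InstanceResources
	// Resource names are made up for illustration.
	r.AddInstanceDependencies(ResourceData{Name: "patroni-config"})
	r.AddNodeDependents(ResourceData{Name: "pgbackrest-stanza"})
	fmt.Println(len(r.InstanceDependencies), len(r.DatabaseDependencies), len(r.NodeDependents))
}
```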

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/database/orchestrator.go` around lines 63 - 71, Rename the
method AddResources on the InstanceResources type to AddInstanceDependencies to
match the backing field InstanceDependencies and sibling methods
AddDatabaseDependencies/AddNodeDependents; update any call sites (e.g.,
NewInstanceResources) to use AddInstanceDependencies, preserving the current
behavior of converting inputs via resource.ToResourceDataSlice and appending to
r.InstanceDependencies, and keep the original signature and error handling so
implementation logic does not change.
server/internal/orchestrator/swarm/orchestrator.go (1)

1026-1033: Hardcoded container paths and ignored pgVersion — worth a comment.

A few observations on the swarm InstancePaths implementation:

  1. pgVersion is accepted but intentionally unused (hence _), whereas the systemd implementation requires it (returns an error if empty). Since the interface contract now allows pgVersion to be significant, consider adding a brief comment explaining why it's irrelevant for swarm (paths are fixed by the container image layout, not by PG major version). This will spare future maintainers from guessing.

  2. The container-internal paths /opt/pgedge, /usr/bin/pgbackrest, and /usr/local/bin/patroni are hardcoded here. They're correct for the current pgedge image, but if the image layout ever changes they'll silently drift. Consider promoting them to package-level consts (e.g. near OverlayDriver on line 44) so they're discoverable alongside other swarm constants and easier to update.

🧹 Suggested cleanup
 const (
 	OverlayDriver = "overlay"
+
+	// Container-internal paths for the pgedge postgres image.
+	containerPgEdgeBaseDir = "/opt/pgedge"
+	containerPgBackRestPath = "/usr/bin/pgbackrest"
+	containerPatroniPath    = "/usr/local/bin/patroni"
 )
-func (o *Orchestrator) InstancePaths(_ *ds.Version, instanceID string) (database.InstancePaths, error) {
+// InstancePaths returns filesystem paths for a swarm-managed instance. pgVersion is
+// ignored because the pgedge container image has a fixed layout regardless of PG major.
+func (o *Orchestrator) InstancePaths(_ *ds.Version, instanceID string) (database.InstancePaths, error) {
 	return database.InstancePaths{
-		Instance:       database.Paths{BaseDir: "/opt/pgedge"},
+		Instance:       database.Paths{BaseDir: containerPgEdgeBaseDir},
 		Host:           database.Paths{BaseDir: filepath.Join(o.cfg.DataDir, "instances", instanceID)},
-		PgBackRestPath: "/usr/bin/pgbackrest",
-		PatroniPath:    "/usr/local/bin/patroni",
+		PgBackRestPath: containerPgBackRestPath,
+		PatroniPath:    containerPatroniPath,
 	}, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/internal/orchestrator/swarm/orchestrator.go` around lines 1026 - 1033,
The InstancePaths method on Orchestrator currently ignores the pgVersion
parameter and returns hardcoded container-internal paths; add a short comment
inside Orchestrator.InstancePaths explaining that pgVersion is intentionally
unused for swarm because container image layout fixes binary locations, and
replace the literal strings "/opt/pgedge", "/usr/bin/pgbackrest", and
"/usr/local/bin/patroni" with package-level consts (e.g. define PGEDGE_BASE_DIR,
PGBACKREST_PATH, PATRONI_PATH near OverlayDriver) so the values are discoverable
and maintainable across the package; update InstancePaths to reference those
consts and keep behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/internal/database/instance_resource.go`:
- Around line 161-165: InstanceResource.PostgresVersion currently returns an
empty ds.Version when Spec.PgEdgeVersion.PostgresVersion is nil, which hides the
real error; change PostgresVersion to return (*ds.Version, error) and return a
descriptive error when the version is missing, then update callers (notably the
callsite in pgbackrest_stanza.go that passes the result into
orchestrator.InstancePaths) to handle the error and avoid passing an empty
version into orchestrator.InstancePaths; ensure error messages clearly identify
the InstanceResource and missing PostgresVersion so the root cause is surfaced.

In `@server/internal/orchestrator/common/pgbackrest_stanza.go`:
- Around line 26-29: The PgBackRestStanza struct was changed to remove the Paths
field, which will cause diffs against stored resources that still include
`paths`; update the resource handling to avoid noisy updates by either
incrementing the resource version or ignoring the old field: bump the
ResourceVersion from "1" to "2" in the resource registration that exposes
PgBackRestStanza (so the controller performs a one-time migration), or add the
JSONPath `/paths` to the DiffIgnore() list used when comparing stored vs desired
state; locate the PgBackRestStanza type and the resource/version registration
code and update the ResourceVersion constant or update the DiffIgnore()
implementation to include `/paths` so existing deployments won’t generate
unnecessary Update operations.
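
The first inline comment above, returning an error instead of an empty version, could be sketched roughly as follows. The Version struct, the PgVersion field, and ErrMissingPostgresVersion are hypothetical simplifications; the real method reads Spec.PgEdgeVersion.PostgresVersion and returns a *ds.Version.

```go
package main

import (
	"errors"
	"fmt"
)

// Version is a simplified placeholder for ds.Version.
type Version struct {
	Major int
}

// ErrMissingPostgresVersion is a hypothetical sentinel error.
var ErrMissingPostgresVersion = errors.New("postgres version is not set")

// InstanceResource is a reduced stand-in; PgVersion is an assumed field.
type InstanceResource struct {
	ID        string
	PgVersion *Version
}

// PostgresVersion surfaces a missing version as an error that names the
// instance, rather than handing callers an empty value.
func (r *InstanceResource) PostgresVersion() (*Version, error) {
	if r.PgVersion == nil {
		return nil, fmt.Errorf("instance %q: %w", r.ID, ErrMissingPostgresVersion)
	}
	return r.PgVersion, nil
}

func main() {
	inst := &InstanceResource{ID: "n1-a"}
	if _, err := inst.PostgresVersion(); err != nil {
		// A caller such as the stanza resource would stop here instead of
		// passing an empty version on to InstancePaths.
		fmt.Println("error:", err)
	}
}
```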


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dc2dbb2c-68c8-46e2-8af4-f7a3a1e7ff01

📥 Commits

Reviewing files that changed from the base of the PR and between 107121a and d13512f.


Comment thread server/internal/database/instance_resource.go Outdated
Comment thread server/internal/orchestrator/common/pgbackrest_stanza.go
pgBackRest stanza is a per-node resource, but it had an instance paths
property, which can be unique per instance.

This commit refactors the recently-added instance paths types and makes
it so that the instance paths are computed at run time in the pgBackRest
stanza resource.

Note that we haven't yet refactored the swarm package to use the new
common resources, so this bug and change only affected the systemd
orchestrator.

PLAT-547
jason-lynch force-pushed the fix/PLAT-547/node-dependent-resources-ii branch from d13512f to 880ff1b on April 20, 2026 14:21
Node resources aren't created until after all instances are available.
This means that node-dependent resources, such as the pgBackRest stanza
resource, cannot be created until after all instances are available.

This change adds a mechanism to treat node-dependent instance resources
separately and ensure they're created after the node resource. This
fixes a bug where you could not create a new node from a backup if the
new node would have more than one instance.

PLAT-547
jason-lynch force-pushed the fix/PLAT-547/node-dependent-resources-ii branch from 880ff1b to bab94e3 on April 20, 2026 14:25
mmols (Member) left a comment

LGTM, E2E worked locally as well

jason-lynch merged commit 8eb56a4 into main on Apr 22, 2026
3 checks passed
jason-lynch deleted the fix/PLAT-547/node-dependent-resources-ii branch on April 22, 2026 14:40