operator: KBS API for LUKS key registration #248

Open
iroykaufman wants to merge 5 commits into trusted-execution-clusters:main from iroykaufman:luks-key

@iroykaufman iroykaufman commented May 4, 2026

Member

Currently, every time the LUKS key is updated, the operator patches the trustee deployment, which causes a restart of the pod. This PR introduces a way to avoid this by setting the LUKS key using the KBS API.

Core implementation points:

  • Right before the operator first deploys the trustee, create an auth key for the KBS API and save it as a Kubernetes secret.
  • Replace the patch mechanism with API calls.
  • Add a reconcile loop that watches for changes in the trustee deployment and re-syncs any LUKS keys that were lost.
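
The reconcile loop in the last point can be illustrated with a minimal, standard-library-only sketch. It is not the PR's code: `TrusteeStatus`, `resync_keys`, `reconcile`, and the `send` callback are hypothetical stand-ins for the Deployment status, the resync routine, and the KBS API call. The idea shown is only that nothing is sent until trustee is ready, and that one failing machine does not block the others.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of the trustee Deployment status.
#[derive(Debug, Clone, Copy)]
pub struct TrusteeStatus {
    pub desired_replicas: i32,
    pub ready_replicas: i32,
}

/// Trustee counts as ready once every desired replica reports ready.
pub fn trustee_is_ready(status: &TrusteeStatus) -> bool {
    status.desired_replicas > 0 && status.ready_replicas >= status.desired_replicas
}

/// Re-send every machine key through `send`, which stands in for the KBS API
/// call. A failure for one machine is recorded and does not stop the others.
/// Returns the ids that could not be sent.
pub fn resync_keys<F>(keys: &BTreeMap<String, Vec<u8>>, mut send: F) -> Vec<String>
where
    F: FnMut(&str, &[u8]) -> Result<(), String>,
{
    let mut failed = Vec::new();
    for (id, key) in keys {
        if let Err(e) = send(id, key) {
            eprintln!("failed to sync LUKS key for {id}: {e}");
            failed.push(id.clone());
        }
    }
    failed
}

/// One reconcile pass: skip while trustee is not ready, otherwise resync.
/// Returns `None` when the pass was skipped.
pub fn reconcile<F>(
    status: &TrusteeStatus,
    keys: &BTreeMap<String, Vec<u8>>,
    send: F,
) -> Option<Vec<String>>
where
    F: FnMut(&str, &[u8]) -> Result<(), String>,
{
    if !trustee_is_ready(status) {
        return None;
    }
    Some(resync_keys(keys, send))
}
```

Returning the failed ids rather than aborting on the first error is one possible design; it lets a later reconcile pass retry only what is still missing.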

Tests:

  • test_luks_key_sync - this test checks that the LUKS key is first sent to trustee and validates that the keys are sent again after trustee restarts. At the end, I also delete one of the machines and check that the secret has been deleted.

Summary by Sourcery

Integrate KBS API–based LUKS key management for trustee, replacing deployment patching with authenticated API calls and adding automatic resync on trustee restarts.

New Features:

  • Introduce authenticated KBS API integration to store and manage per-machine LUKS keys instead of mounting secrets into the trustee deployment.
  • Add a controller that watches the trustee deployment and re-synchronizes all machine LUKS keys to KBS when the trustee becomes ready.
  • Generate and manage an Ed25519 keypair as a Kubernetes secret for authenticating KBS API access from the operator.

Enhancements:

  • Label and configure the trustee (KBS) service and deployment consistently via a shared label selector constant and updated volume templates.
  • Switch KBS resource storage to the kvstorage plugin and configure KBS admin to use the public auth key for API access.

Build:

  • Pin compute-pcrs-lib to a specific Git revision for reproducible builds and add kbs-client as a new operator dependency, along with installing perl in the container image for the build tooling.

Tests:

  • Add an end-to-end test that verifies initial LUKS key upload, re-sync after trustee restart, and LUKS key deletion on machine removal, plus unit tests for Ed25519 key generation.

@Jakob-Naucke Jakob-Naucke left a comment

Member

Thank you for creating this, and thank you for already creating a test. Make sure that linting & build/unit tests pass.

Comment thread operator/src/trustee.rs Outdated
pub async fn launch_trustee_sync_controller(client: Client) {
let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
let watcher_config = watcher::Config {
label_selector: Some("app=kbs".to_string()),

Member

nit: maybe a constant also used in generate_kbs_{service,deployment} is better

Comment thread operator/src/trustee.rs Outdated
use serde::{Serialize, Serializer};
use serde_json::{Value::String as JsonString, json};
use std::collections::BTreeMap;
use kbs_client;

Member

Member Author

Right

@iroykaufman iroykaufman force-pushed the luks-key branch 2 times, most recently from dc726d8 to c5ec81d Compare May 6, 2026 15:22
@iroykaufman

Member Author

The CI is failing because the build needs some Perl dependencies. I solved this by adding RUN dnf install -y perl-FindBin perl-core to the Containerfile. @Jakob-Naucke, any idea what the best approach is to solve it for the CI?

@Jakob-Naucke

Member

@iroykaufman these tests run in (Ubuntu) GHA containers that aren't affected by the Containerfile (this would get picked up in integration tests once I manage to fix them). You'll need to add them to the steps in .github/rust.yml:

steps:
  - name: "Install OpenSSL dependencies"
    run: apt-get install -y …

@Jakob-Naucke

Member

@iroykaufman whoops, you can set your own build container, and we do, and it's a Fedora container, so it should be dnf as usual. You did set it in the good place though afaict.

@Jakob-Naucke

Jakob-Naucke commented May 7, 2026

Member

@iroykaufman thanks for the updates. integration tests are failing because these packages are missing there too. in my intuition, those shouldn't need to be built on the host at all so let me look into that first.

e: before the next push you can also still look into the lint failures

@iroykaufman

Member Author

@iroykaufman thanks for the updates. integration tests are failing because these packages are missing there too. in my intuition, those shouldn't need to be built on the host at all so let me look into that first.

Thanks, let me know and then I'll push with the lint fix

@iroykaufman

iroykaufman commented May 7, 2026

Member Author

@Jakob-Naucke maybe updating the container file in buildroot with the openssl dependency is better than adding them in the GitHub workflow files. WDYT?

@Jakob-Naucke

Member

@Jakob-Naucke maybe updating the container file in buildroot with the openssl dependency is better than adding them in the GitHub workflow files. WDYT?

Yes. I got confused by first thinking we were on GHA's Ubuntu containers, but the buildroot is the better place.

@Jakob-Naucke

Member

and the CI host does need these to compile the integration tests, which is slightly not how I thought cargo dependencies worked, but alas, these packages are now installed

@iroykaufman

Member Author

I deleted the changes for the github workflow and opened PR#9. When this is merged, the CI should work

@alicefr

alicefr commented May 11, 2026

Contributor

@sourcery-ai review

@sourcery-ai

sourcery-ai Bot commented May 11, 2026


Reviewer's Guide

Implements KBS API–based LUKS key management so trustee no longer needs its deployment patched on each key update, introduces an auth keypair/secret for KBS, adds a controller to resync machine LUKS keys when trustee pods restart, switches KBS storage to kvstorage, and adds an end‑to‑end test covering LUKS key sync and deletion.

Sequence diagram for LUKS key registration via KBS API

sequenceDiagram
    actor User as ClusterOperator
    participant K8s as KubernetesAPI
    participant RegSrv as RegisterServerController
    participant Trustee as TrusteeModule
    participant KBS as KBSService

    User->>K8s: Create/Update Machine
    K8s-->>RegSrv: Machine event
    RegSrv->>RegSrv: keygen_reconcile
    RegSrv->>Trustee: generate_secret(client, id, owner_reference)
    Trustee->>K8s: Create Secret id (root key)
    K8s-->>Trustee: Secret id created

    RegSrv->>Trustee: send_secret(client, id)
    Trustee->>K8s: Get Secret id
    K8s-->>Trustee: Secret id (root data)
    Trustee->>K8s: Get Secret trustee-auth
    K8s-->>Trustee: Secret trustee-auth (private.key)
    Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
    KBS-->>Trustee: 201 Created
    Trustee-->>RegSrv: Ok
    RegSrv-->>K8s: Reconcile result Action::await_change()

Sequence diagram for LUKS key resync on trustee restart

sequenceDiagram
    participant K8s as KubernetesAPI
    participant SyncCtl as TrusteeSyncController
    participant TrusteeDep as TrusteeDeployment
    participant Trustee as TrusteeModule
    participant KBS as KBSService

    Note over TrusteeDep,K8s: TrusteeDeployment restarted
    K8s-->>SyncCtl: Deployment event (label app=kbs)
    SyncCtl->>SyncCtl: trustee_deployment_reconcile
    SyncCtl->>TrusteeDep: Read status (ready_replicas, replicas)
    TrusteeDep-->>SyncCtl: Status ready
    SyncCtl->>Trustee: sync_all_machine_luks_key(client)

    Trustee->>K8s: List Secrets (default namespace)
    K8s-->>Trustee: Secrets owned by Machines
    loop For each machine Secret id
        Trustee->>Trustee: send_secret(client, id)
        Trustee->>K8s: Get Secret id
        K8s-->>Trustee: Secret id (root data)
        Trustee->>K8s: Get Secret trustee-auth
        K8s-->>Trustee: Secret trustee-auth (private.key)
        Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
        KBS-->>Trustee: Response
    end
    Trustee-->>SyncCtl: Ok
    SyncCtl-->>K8s: Action::await_change()

Sequence diagram for LUKS key deletion via KBS API

sequenceDiagram
    participant K8s as KubernetesAPI
    participant RegSrv as RegisterServerController
    participant Trustee as TrusteeModule
    participant KBS as KBSService

    K8s-->>RegSrv: Machine deletion event
    RegSrv->>RegSrv: keygen_reconcile finalizer
    RegSrv->>Trustee: delete_secret(client, id)
    Trustee->>K8s: Get Secret trustee-auth
    K8s-->>Trustee: Secret trustee-auth (private.key)
    Trustee->>KBS: delete_resource(url, auth_key, path, [])
    KBS-->>Trustee: 200 OK
    Trustee-->>RegSrv: Ok
    RegSrv-->>K8s: Finalizer completed (Action::await_change())

Class diagram for new Ed25519 key pair and trustee auth secret generation

classDiagram
    class Ed25519KeyPair {
        +Vec~u8~ private_key_pem
        +Vec~u8~ public_key_pem
    }

    class TrusteeModule {
        +generate_ed25519_key_pair() Result~Ed25519KeyPair~
        +generate_trustee_auth_keys_secret(client, owner_reference) Result~()~
        +get_auth_key(client) Result~String~
        +send_secret(client, id) Result~()~
        +delete_secret(client, id) Result~()~
        +sync_all_machine_luks_key(client) Result~()~
        +launch_trustee_sync_controller(client) void
    }

    TrusteeModule --> Ed25519KeyPair : creates

File-Level Changes

Change: Replace trustee deployment patching with KBS API calls for LUKS key lifecycle (create/sync/delete).
Details:
  • Introduce get_auth_key helper to read the trustee auth private key from a namespaced Kubernetes Secret
  • Implement send_secret to fetch machine LUKS key Secret data and call kbs_client::set_resource against the KBS HTTP endpoint
  • Implement delete_secret to remove machine LUKS key resources via kbs_client::delete_resource
  • Update keygen_reconcile to use send_secret on create and delete_secret on cleanup instead of mount_secret/unmount_secret, and improve cleanup error reporting
Files:
  • operator/src/trustee.rs
  • operator/src/register_server.rs
  • operator/src/kbs-config.toml
  • operator/src/main.rs
  • operator/Cargo.toml
  • lib/src/endpoints.rs

Change: Add trustee auth keypair generation and wiring so KBS API is authenticated via Ed25519 public key and key material is mounted into KBS pods.
Details:
  • Add Ed25519KeyPair type and generate_ed25519_key_pair helper with tests validating PEM format and uniqueness
  • Implement generate_trustee_auth_keys_secret to create the trustee-auth Secret with private and public keys owned by the cluster
  • Extend KBS volume templates to mount the trustee-auth Secret under /key, exposing only the public key file
  • Update kbs-config.toml admin section to use auth_public_key=/key/public.pub and switch the resource plugin type to kvstorage
  • Set the KBS deployment label selector to use the KBS_LABEL_SELECTOR constant and apply matching labels on the Deployment
Files:
  • operator/src/trustee.rs
  • operator/src/kbs-config.toml
  • operator/src/main.rs
  • lib/src/endpoints.rs
  • tests/Cargo.toml

Change: Introduce a trustee deployment watcher/controller that resynchronizes all machine LUKS keys to KBS when trustee becomes ready.
Details:
  • Add sync_all_machine_luks_key to enumerate Machine-owned Secrets and push each to KBS via send_secret with logging and per-secret error handling
  • Implement trustee_deployment_reconcile to detect when the trustee Deployment has desired ready replicas and then trigger a full machine LUKS key resync
  • Add launch_trustee_sync_controller to start a kube-runtime Controller watching Deployments labeled with app=KBS_LABEL_SELECTOR, and wire it into main() startup
Files:
  • operator/src/trustee.rs
  • operator/src/main.rs
  • lib/src/endpoints.rs

Change: Extend test coverage with an end-to-end LUKS key sync scenario and expose a helper for waiting on Deployment readiness.
Details:
  • Add test_luks_key_sync integration test that creates two Machines, waits for their Secrets, verifies both are sent to KBS from operator logs, restarts the trustee Deployment and checks resync logs, deletes one Machine and confirms its secret is removed from KBS, and performs cleanup
  • Expose TestContext::wait_for_deployment_ready as pub so tests can wait for trustee Deployment readiness
Files:
  • tests/trusted_execution_cluster.rs
  • test_utils/src/lib.rs

Change: Tidy up dependencies and build/test support for the new KBS client and tests.
Details:
  • Pin compute-pcrs-lib to a specific git revision for reproducibility
  • Add kbs-client git dependency to the operator crate for calling the KBS API
  • Add chrono as a test dependency to support timestamped deployment restarts
  • Install perl in the Containerfile to satisfy new build or test requirements
Files:
  • Cargo.toml
  • operator/Cargo.toml
  • tests/Cargo.toml
  • Containerfile
  • Cargo.lock

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In launch_trustee_sync_controller, the watcher label_selector is set as "app={KBS_LABEL_SELECTOR}", which will not expand the constant and thus never match your KBS pods; this should be built with format!("app={KBS_LABEL_SELECTOR}") or similar.
  • The new test_luks_key_sync relies on grepping operator logs for specific hard-coded message substrings (including the exact count "Syncing 2 machine luks key to KBS"), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.
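
The first point is easy to reproduce in isolation. The sketch below is illustrative only; the constant's value "kbs" is an assumption based on the `app=kbs` selector shown earlier in the thread, and the two function names are hypothetical. It shows that a plain string literal keeps the braces verbatim, while `format!` resolves the name inside them.

```rust
/// Assumed value; the thread only shows that the label is `app=kbs`.
const KBS_LABEL_SELECTOR: &str = "kbs";

/// A plain string literal is never interpolated: the braces stay as written,
/// so this selector would not match any `app=kbs` resource.
pub fn literal_selector() -> String {
    "app={KBS_LABEL_SELECTOR}".to_string()
}

/// `format!` resolves the identifier inside the braces, including constants.
pub fn formatted_selector() -> String {
    format!("app={KBS_LABEL_SELECTOR}")
}
```

Calling `literal_selector()` yields the text with the braces still in it, whereas `formatted_selector()` yields `app=kbs`.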
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `launch_trustee_sync_controller`, the watcher `label_selector` is set as `"app={KBS_LABEL_SELECTOR}"`, which will not expand the constant and thus never match your KBS pods; this should be built with `format!("app={KBS_LABEL_SELECTOR}")` or similar.
- The new `test_luks_key_sync` relies on grepping operator logs for specific hard-coded message substrings (including the exact count `"Syncing 2 machine luks key to KBS"`), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.

## Individual Comments

### Comment 1
<location path="operator/src/trustee.rs" line_range="244" />
<code_context>
+pub async fn launch_trustee_sync_controller(client: Client) {
+    let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
+    let watcher_config = watcher::Config {
+        label_selector: Some("app={KBS_LABEL_SELECTOR}".to_string()),
+        ..Default::default()
+    };
</code_context>
<issue_to_address>
**issue (bug_risk):** The label selector string is not interpolating `KBS_LABEL_SELECTOR` and will not match any pods.

Because the selector is the literal string `"app={KBS_LABEL_SELECTOR}"`, the controller will never match any `kbs`-labelled resources. Build the selector string dynamically (e.g. `format!("app={}", KBS_LABEL_SELECTOR)` or a const with the expanded value) so it actually aligns with the deployment and service labels.
</issue_to_address>

### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="604-613" />
<code_context>
+    // Delete machine1 and verify its secret is removed from both K8s and KBS
</code_context>
<issue_to_address>
**issue (testing):** Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs

Right now the test only checks the operator logs for `"Secret {id1} deleted successfully"`, which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, `secrets_api.get(&machine1_uuid).await` returns `NOT_FOUND` or fails via the Poller).
</issue_to_address>

Comment thread operator/src/trustee.rs Outdated
Comment thread tests/trusted_execution_cluster.rs
@iroykaufman iroykaufman force-pushed the luks-key branch 2 times, most recently from a751ea3 to eeb4005 Compare May 14, 2026 14:03

@Jakob-Naucke Jakob-Naucke left a comment

Member

  • If changes to the base branch made a rebase necessary, the usual practice is to fix and keep the existing commits, so I think you can squash in this instance.
  • Pushing new buildroots got auto-disabled due to inactivity (the more you know). I re-enabled it so FindBin etc. are hopefully there on next attempt.

Comment thread operator/src/kbs-config.toml Outdated

[admin]
type = "DenyAll"
type = "Simple"

Member

Sorry about this one, didn't see it coming but you fixed it :)

@iroykaufman

Member Author

If changes to the base branch made a rebase necessary, the usual practice is to fix and keep the existing commits, so I think you can squash in this instance.

This is still WIP so I'll squash it into one commit after everything is fixed.

@iroykaufman iroykaufman force-pushed the luks-key branch 4 times, most recently from 9c68c4b to 7dd60c7 Compare May 21, 2026 11:03
@alicefr

alicefr commented May 21, 2026

Contributor

@iroykaufman can you please split the git history into multiple commits to make the review easier

@openshift-ci openshift-ci Bot added the lgtm label Jun 30, 2026
)
.await?;

test_ctx.wait_for_deployment_ready(&deployments, TRUSTEE_DEPLOYMENT, 120).await?;

@alicefr alicefr Jun 30, 2026

Contributor

Instead of polling here, can we replace it with await_condition?
Something like:

  let is_available = |obj: Option<&Deployment>| -> bool {
      obj.and_then(|d| d.status.as_ref())
          .map(|r| r >= 1)
          .unwrap_or(false)
  };
  let ready = await_condition(deployments.clone(), TRUSTEE_DEPLOYMENT, is_available);
  timeout(Duration::from_secs(120), ready).await.context("waiting for trustee to be ready")??;

Contributor

@Jakob-Naucke we could probably replace several polling operations with that. WDYT?

Member

ah yes, I missed these

Contributor

Don't we need to look for a condition with ( type==Available && status == "True" ) ?
Alternatively we can check for r.available_replicas >= 1
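
The two readiness checks mentioned here can be compared in a small sketch. The structs below are simplified local stand-ins for the Kubernetes Deployment types (an assumption: only the fields needed for the check are modelled), and the function names are hypothetical. Either predicate has the `Option<&Deployment> -> bool` shape that a condition-based wait would expect.

```rust
/// Simplified stand-ins for the Kubernetes Deployment types. Only the fields
/// used by the predicates are modelled here.
#[derive(Debug, Default, Clone)]
pub struct DeploymentCondition {
    pub type_: String,
    pub status: String,
}

#[derive(Debug, Default, Clone)]
pub struct DeploymentStatus {
    pub available_replicas: Option<i32>,
    pub conditions: Option<Vec<DeploymentCondition>>,
}

#[derive(Debug, Default, Clone)]
pub struct Deployment {
    pub status: Option<DeploymentStatus>,
}

/// True when the Deployment carries an `Available` condition set to "True".
pub fn has_available_condition(obj: Option<&Deployment>) -> bool {
    obj.and_then(|d| d.status.as_ref())
        .and_then(|s| s.conditions.as_ref())
        .map(|conds| {
            conds
                .iter()
                .any(|c| c.type_ == "Available" && c.status == "True")
        })
        .unwrap_or(false)
}

/// True when at least one replica is reported as available.
pub fn has_available_replica(obj: Option<&Deployment>) -> bool {
    obj.and_then(|d| d.status.as_ref())
        .and_then(|s| s.available_replicas)
        .map(|r| r >= 1)
        .unwrap_or(false)
}
```

Both predicates return false for a missing object or a missing status, so a Deployment that has not reported any status yet is treated as not ready.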

Member Author

The poller here is inside wait_for_deployment_ready. Is that what you mean?

Contributor

yes, I would completely remove the wait_for_deployment_ready

Member Author

Remove wait_for_deployment_ready and use await_condition instead? I ask because I still have to check that the object is ready.

Member

Yes, à la

let depl_ready = |depl: Option<&Deployment>| {
(which you could move up a level for reuse)

Member Author

@alicefr @Jakob-Naucke is it fine to just change wait_for_deployment_ready to use await_condition instead of polling?

Member

yes, most preferably in a commit before the other ones, create the function and start using it from apply_cr_manifests, and then use it like you're already doing

.with_interval(Duration::from_millis(500))
.with_error_message("AttestationKeys not approved with secrets".to_string());

poller

Contributor

This can probably be refactored with await_condition

Contributor

and the same above using the polling

Member Author

Polling whose condition reads the pod logs can't be replaced with await_condition. There are two pollers that don't use logs, so I'll refactor those.

@alicefr alicefr Jul 1, 2026

Contributor

We shouldn't rely on logs for validating the test. That also needs to be changed.

Member

I do get @iroykaufman's point here #248 (comment)

@alicefr alicefr Jul 1, 2026

Contributor

Instead of checking the logs we could exec in the pod and verify that the secret is gone if that's really necessary. It seems to me more reliable.

@alicefr alicefr Jul 1, 2026

Contributor

Does trustee save the secrets and AKs in its filesystem since the API introduction?

Member Author

With the current kbs config the storage backend is configured to be on the filesystem.

Comment thread operator/src/trustee.rs
secret
.data
.as_ref()
.and_then(|d| String::from_utf8(d.get("public_key").unwrap().0.clone()).ok())

Contributor

what if the secret doesn't have the field public_key, this will panic:

Suggested change
.and_then(|d| String::from_utf8(d.get("public_key").unwrap().0.clone()).ok())
.filter_map(|secret| {
secret
.data
.as_ref()
.and_then(|d| d.get("public_key"))
.and_then(|pk| String::from_utf8(pk.0.clone()).ok())
})
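
The difference is easy to check outside the operator. The sketch below is a standard-library-only illustration, not the PR's code: `SecretData` and `public_keys` are hypothetical names, and a Secret's data is modelled as a plain map from key name to raw bytes. Secrets without data, without a `public_key` entry, or with a non-UTF-8 value are skipped rather than causing a panic.

```rust
use std::collections::BTreeMap;

/// Stand-in for a Secret's `data` map (assumption: key name to raw bytes).
pub type SecretData = BTreeMap<String, Vec<u8>>;

/// Collects every valid UTF-8 `public_key`, skipping secrets that have no
/// data, no `public_key` entry, or a value that is not valid UTF-8.
pub fn public_keys(secrets: &[Option<SecretData>]) -> Vec<String> {
    secrets
        .iter()
        .filter_map(|data| {
            data.as_ref()
                .and_then(|d| d.get("public_key"))
                .and_then(|pk| String::from_utf8(pk.clone()).ok())
        })
        .collect()
}
```

Silently skipping malformed entries is one option; logging each skipped secret would make missing keys easier to diagnose.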

Comment thread operator/src/main.rs
Comment on lines +168 to +171
match reference_values::create_pcrs_config_map(client.clone()).await {
Ok(_) => info!("Created bare configmap for PCRs"),
Err(e) => error!("Failed to create the PCRs configmap: {e}"),
}

@alicefr alicefr Jun 30, 2026

Contributor

This changes the previous behavior where we were returning an error
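
To make the behavioral difference concrete, here is a minimal sketch under stated assumptions: `SetupError`, `setup_logging_only`, and `setup_propagating` are hypothetical, and each startup step is reduced to a name plus a precomputed result. The logging variant always carries on, while the propagating variant stops at the first failure and hands it back to the caller.

```rust
use std::fmt;

/// Hypothetical error type describing which startup step failed.
#[derive(Debug, PartialEq)]
pub struct SetupError {
    pub step: &'static str,
    pub cause: String,
}

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} failed: {}", self.step, self.cause)
    }
}

impl std::error::Error for SetupError {}

/// Log-and-continue: every step is visited and the caller never learns
/// that one of them failed.
pub fn setup_logging_only(steps: &[(&'static str, Result<(), String>)]) {
    for (step, result) in steps {
        match result {
            Ok(()) => println!("{step}: ok"),
            Err(e) => eprintln!("{step} failed: {e}"),
        }
    }
}

/// Propagating: the first failure aborts setup and is returned to the caller.
pub fn setup_propagating(
    steps: &[(&'static str, Result<(), String>)],
) -> Result<(), SetupError> {
    for (step, result) in steps {
        result
            .clone()
            .map_err(|cause| SetupError { step: *step, cause })?;
    }
    Ok(())
}
```

Which variant fits depends on whether the operator can do useful work without the failed resource; the sketch only shows what each choice makes visible to the caller.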

Comment thread operator/src/main.rs
Comment on lines +173 to +182
match trustee::generate_trustee_auth_keys_secret(client.clone(), owner_reference.clone()).await
{
Ok(_) => info!("Generate auth keys for the KBS API",),
Err(e) => error!("Failed to create the auth keys: {e}"),
}

match trustee::generate_rv_data(client.clone(), owner_reference.clone()).await {
Ok(_) => info!("Created configmap for reference values"),
Err(e) => error!("Failed to create the reference values configmap: {e}"),
}

Contributor

Should this return an error instead of just printing it?

Member

👍 I know it wasn't out when this was raised but we changed this in #285

Contributor

Please fix similarly the call to generate_trustee_auth_keys_secret() above

Comment thread operator/src/trustee.rs
Comment on lines +347 to +356
let c = Arc::unwrap_or_clone(client.clone());
if let Err(e) = sync_resource_policy(c.clone()).await {
warn!("Failed to sync resource policy to KBS: {e}");
}
if let Err(e) = sync_attestation_policy(c.clone()).await {
warn!("Failed to sync attestation policy to KBS: {e}");
}
if let Err(e) = sync_reference_values_from_configmap(&c).await {
warn!("Failed to sync reference values to KBS: {e}");
}

Contributor

Why do we only print a warning and not return an error if those fail?

Comment thread operator/src/trustee.rs
.into_iter()
pub async fn update_attestation_keys(client: Client) -> Result<()> {
let secrets: Api<Secret> = Api::default_namespaced(client.clone());
let secret_list = secrets.list(&Default::default()).await?;

Contributor

This doesn't use the secret store any longer, which was introduced by #267 to avoid direct calls to the API server.

}

#[tokio::test]
async fn test_adopt_approved_images() {

Contributor

Why don't those tests apply any longer?

Member Author

Because we no longer need the function adopt_approved_images, we only use and test (above) adopt_approved_image

Comment thread operator/src/trustee.rs
Comment on lines +192 to +193
let secret_api: Api<Secret> = Api::default_namespaced(client.clone());
let auth_secret = secret_api.get(TRUSTEE_AUTH_SECRET).await?;

Contributor

Ideally, this should use the secret store and not call the API server directly.

Comment thread operator/src/trustee.rs
Comment on lines +211 to +212
let secret_api: Api<Secret> = Api::default_namespaced(client.clone());

Contributor

Same here, please use the secret store

@alicefr

alicefr commented Jun 30, 2026

Contributor

@iroykaufman can you please also split the commit 774f99b into multiple commits? It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit. I would also separate the testing into its own commit. Then it becomes easier to read.

@alicefr

alicefr commented Jun 30, 2026

Contributor

I would also personally move the 512baf6 into a separate PR. It really solves a bug and it isn't part of this feature

@Jakob-Naucke

Member

@iroykaufman can you please also split the commit 774f99b into multiple commits? It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit.

@iroykaufman was the problem that the kbs-client won't work as used here with 17? if it does work, I agree. if it doesn't, I prefer big commits over breaking bisects @alicefr

Comment thread operator/src/main.rs
Comment on lines +173 to +182
match trustee::generate_trustee_auth_keys_secret(client.clone(), owner_reference.clone()).await
{
Ok(_) => info!("Generate auth keys for the KBS API",),
Err(e) => error!("Failed to create the auth keys: {e}"),
}

match trustee::generate_rv_data(client.clone(), owner_reference.clone()).await {
Ok(_) => info!("Created configmap for reference values"),
Err(e) => error!("Failed to create the reference values configmap: {e}"),
}

Contributor

Please apply the same fix to the call to generate_trustee_auth_keys_secret() above.

Comment thread operator/src/trustee.rs
}

fn generate_ed25519_key_pair() -> Result<Ed25519KeyPair> {
let key = openssl::pkey::PKey::generate_ed25519()?;

Contributor

For now it's OK, but we should later prefer PQC keys (if available) over Ed25519

)
.await?;

test_ctx.wait_for_deployment_ready(&deployments, TRUSTEE_DEPLOYMENT, 120).await?;

Contributor

Don't we need to look for a condition with (type == Available && status == "True")?
Alternatively, we can check for r.available_replicas >= 1.
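
The two checks suggested above could look roughly like the sketch below. It is only an illustration: `Condition` and `DeploymentStatus` are hypothetical, simplified stand-ins for the Kubernetes Deployment status (they are not the types used in this PR), and the helper names `is_available` and `has_available_replica` are made up for the example.

```rust
/// Hypothetical stand-in for one Deployment status condition.
/// Only the `type` and `status` fields discussed in the review are modeled.
#[derive(Debug, Clone)]
pub struct Condition {
    pub type_: String,
    pub status: String,
}

/// Hypothetical stand-in for the parts of a Deployment status used here.
#[derive(Debug, Clone, Default)]
pub struct DeploymentStatus {
    pub conditions: Vec<Condition>,
    pub available_replicas: Option<i32>,
}

/// First option: look for a condition with type "Available" and status "True".
pub fn is_available(status: &DeploymentStatus) -> bool {
    status
        .conditions
        .iter()
        .any(|c| c.type_ == "Available" && c.status == "True")
}

/// Second option: at least one replica is reported as available.
/// A missing replica count is treated as zero.
pub fn has_available_replica(status: &DeploymentStatus) -> bool {
    status.available_replicas.unwrap_or(0) >= 1
}
```

A wait helper in the tests could poll the deployment and return once either predicate holds. Which one to use depends on whether the test cares about the controller's own availability verdict or only about a replica being up.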

Comment thread operator/Cargo.toml
kbs-client = {git = "https://github.com/confidential-containers/trustee.git", rev = "e65897a9ad4eb3ac69fa2ec75ed831200eb2acd7", default-features = false, features = ["native-tls"] }
jsonwebtoken = { version = "10.4.0", default-features = false, features = ["use_pem"] }
jsonwebtoken-openssl = "1.0.0"
jiff = "0.2.29"

Contributor

nit: can we use chrono instead of jiff (for deletion_timestamp)? It's already required.

@iroykaufman

Member Author

> > @iroykaufman can you please also split the commit 774f99b into multiple It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit.

> @iroykaufman was the problem that the kbs-client won't work as used here with 17? if it does work, I agree. if it doesn't, I prefer big commits over breaking bisects @alicefr

Correct. When bumping to trustee v0.20.0, it didn't work as is because the trustee storage backend changed. The idea was to use the API as an abstraction layer that could make it much easier to bump trustee to a later version or change the storage backend in the future.

@iroykaufman

Member Author

> I would also personally move the 512baf6 into a separate PR. It really solves a bug and it isn't part of this feature

I created PR #302 for it. Once it gets merged, I will rebase this PR.

Comment thread tests/trusted_execution_cluster.rs
@alicefr

alicefr commented Jul 1, 2026

Contributor

@iroykaufman do you know where trustee stores the secrets and attestation keys after they have been configured via the API? I hope it stores them somewhere in its filesystem. If that's the case, we need to persist them; otherwise, if trustee restarts, we are going to lose them.

@uril

uril commented Jul 1, 2026

Contributor

@alicefr The current solution does not persist the Trustee data on the filesystem. When the Trustee pod restarts, the operator uploads all the keys (and other configuration) to the new pod.

Currently Trustee does keep those on its filesystem, but that may change in the future. For example, there is an idea of storing the data in a database.
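
As a rough illustration of the re-upload behavior described above, here is a minimal sketch. Everything in it is hypothetical: the `KeyStore` trait stands in for whatever client talks to the KBS API (it is not the kbs-client interface), `InMemoryStore` simulates a Trustee pod that loses its state on restart, and the key names are invented for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical abstraction over the key upload call; not the kbs-client API.
pub trait KeyStore {
    fn put(&mut self, name: &str, key: &[u8]) -> Result<(), String>;
}

/// In-memory stand-in for a Trustee pod whose stored keys vanish on restart.
#[derive(Default)]
pub struct InMemoryStore {
    entries: BTreeMap<String, Vec<u8>>,
}

impl InMemoryStore {
    /// Simulate a pod restart by dropping everything that was uploaded.
    pub fn restart(&mut self) {
        self.entries.clear();
    }

    /// Number of keys currently held by the store.
    pub fn count(&self) -> usize {
        self.entries.len()
    }
}

impl KeyStore for InMemoryStore {
    fn put(&mut self, name: &str, key: &[u8]) -> Result<(), String> {
        // Overwriting an existing entry is fine, so uploads are idempotent.
        self.entries.insert(name.to_string(), key.to_vec());
        Ok(())
    }
}

/// Upload every known key to the store and return how many were sent.
/// Intended to be called whenever the store becomes ready again.
pub fn resync<S: KeyStore>(
    store: &mut S,
    keys: &BTreeMap<String, Vec<u8>>,
) -> Result<usize, String> {
    for (name, key) in keys {
        store.put(name, key)?;
    }
    Ok(keys.len())
}
```

In this sketch the operator side remains the source of truth, so losing the store's contents on restart only costs one more `resync` call. Persisting Trustee's own storage would be a separate design decision.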

@Jakob-Naucke Jakob-Naucke self-requested a review July 1, 2026 11:05
@Jakob-Naucke Jakob-Naucke dismissed their stale review July 1, 2026 11:06

valid comments have been raised and I would like to re-review and prevent an accidental merge

@openshift-ci

openshift-ci Bot commented Jul 1, 2026


[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iroykaufman

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alicefr

alicefr commented Jul 1, 2026

Contributor

> > > @iroykaufman can you please also split the commit 774f99b into multiple It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit.

> > @iroykaufman was the problem that the kbs-client won't work as used here with 17? if it does work, I agree. if it doesn't, I prefer big commits over breaking bisects @alicefr

> Correct. When bumping to trustee v0.20.0, it didn't work as is because the trustee storage backend changed. The idea was to use the API as an abstraction layer that could make it much easier to bump trustee to a later version or change the storage backend in the future.

But is the old behavior of using config map for secrets, AK and RV still supported by 0.20? Or did they suddenly move and only support adding the various objects via API?

@Jakob-Naucke

Member

> But is the old behavior of using config map for secrets, AK and RV still supported by 0.20? Or did they suddenly move and only support adding the various objects via API?

no, file system (NB it doesn't matter if it's a configmap) is no longer supported AFAICT, see confidential-containers/trustee#1190

@iroykaufman

Member Author

> > > > @iroykaufman can you please also split the commit 774f99b into multiple It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit.

> > > @iroykaufman was the problem that the kbs-client won't work as used here with 17? if it does work, I agree. if it doesn't, I prefer big commits over breaking bisects @alicefr

> > Correct. When bumping to trustee v0.20.0, it didn't work as is because the trustee storage backend changed. The idea was to use the API as an abstraction layer that could make it much easier to bump trustee to a later version or change the storage backend in the future.

> But is the old behavior of using config map for secrets, AK and RV still supported by 0.20? Or did they suddenly move and only support adding the various objects via API?

Trustee v0.20.0 replaced file-based configuration with a KV storage backend that Trustee manages internally: it reads and writes to its own storage directories (e.g. /opt/trustee/storage/kbs/, /opt/trustee/storage/reference_value/). While it is still possible to mount these files with some extra effort, doing so in this version and moving forward does not appear to be the correct approach.

@alicefr

alicefr commented Jul 1, 2026

Contributor

> > > > > @iroykaufman can you please also split the commit 774f99b into multiple It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit.

> > > > @iroykaufman was the problem that the kbs-client won't work as used here with 17? if it does work, I agree. if it doesn't, I prefer big commits over breaking bisects @alicefr

> > > Correct. When bumping to trustee v0.20.0, it didn't work as is because the trustee storage backend changed. The idea was to use the API as an abstraction layer that could make it much easier to bump trustee to a later version or change the storage backend in the future.

> > But is the old behavior of using config map for secrets, AK and RV still supported by 0.20? Or did they suddenly move and only support adding the various objects via API?

> Trustee v0.20.0 replaced file-based configuration with a KV storage backend that Trustee manages internally it reads and writes to its own storage directories (e.g. /opt/trustee/storage/kbs/, /opt/trustee/storage/reference_value/). While it is still possible to mount these files with some extra effort, doing so in this version and moving forward does not appear to be the correct approach.

I'm not asking because I want to keep the old behavior. But if mounting the secrets and other objects still works, we could gradually move each component one by one in separate commits. That way, git bisect keeps working and the PR becomes a bit more readable.

@iroykaufman

Member Author

> > > > > > @iroykaufman can you please also split the commit 774f99b into multiple It does too many things in one. I would at least split the bump of the kbs-client version and the configuration into a separate commit.

> > > > > @iroykaufman was the problem that the kbs-client won't work as used here with 17? if it does work, I agree. if it doesn't, I prefer big commits over breaking bisects @alicefr

> > > > Correct. When bumping to trustee v0.20.0, it didn't work as is because the trustee storage backend changed. The idea was to use the API as an abstraction layer that could make it much easier to bump trustee to a later version or change the storage backend in the future.

> > > But is the old behavior of using config map for secrets, AK and RV still supported by 0.20? Or did they suddenly move and only support adding the various objects via API?

> > Trustee v0.20.0 replaced file-based configuration with a KV storage backend that Trustee manages internally it reads and writes to its own storage directories (e.g. /opt/trustee/storage/kbs/, /opt/trustee/storage/reference_value/). While it is still possible to mount these files with some extra effort, doing so in this version and moving forward does not appear to be the correct approach.

> I didn't ask because I want to keep the old behavior, but if the mounting of the secrets and other objects still works, we could gradually add each component one by one in separate commits. In this way, we keep the git bisect working but the PR becomes a bit more readable

I understand. Initially I tried to do exactly that, but the file formats also changed (you can choose to use either JSON or individual files). If I remember correctly, we use both formats: JSON for reference values and individual files for secrets. Because of this, it would create some extra work that would need to be reverted later. Maybe I missed something, but this is what I ran into.

@uril

uril commented Jul 1, 2026

Contributor

Trustee v0.20.0 can still keep configuration files on its filesystem, so it is possible to update to 0.20.0 first.
As Roy mentioned, that would take some adjustments to the configuration files. For example, reference values would move from JSON to files (each PCR has a separate file instead of a single JSON file), and those adjustments would be overwritten by the following patches.

@alicefr

alicefr commented Jul 2, 2026

Contributor

@uril @iroykaufman thanks for the explanation! No, I don't want you to do useless extra work. Let's keep it this way then.


4 participants