fix: bypass SageMakerClient singleton for cross-region model package resolution #5924

Open
lucasjia-aws wants to merge 8 commits into aws:master from lucasjia-aws:nova_tests

@lucasjia-aws
Collaborator

The SageMakerClient singleton caches the first region it is initialized with and ignores subsequent region parameters. This causes Nova integ tests (which run in us-east-1) to fail when the singleton was already created with us-west-2 by an earlier test in the same process.
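The caching behavior described above can be reproduced with a toy model. The sketch below is not the SDK's `SageMakerClient`; `ToyClient` and `get_client` are hypothetical names used only to show how a first-call-wins cache silently ignores a later region argument.

```python
# Toy illustration only: ToyClient and get_client are hypothetical names,
# not part of the SageMaker SDK.
class ToyClient:
    def __init__(self, region_name):
        self.region_name = region_name


_cached = None  # process-wide cache, shared by every caller


def get_client(region_name):
    """Return the cached client, creating it on the first call only."""
    global _cached
    if _cached is None:
        _cached = ToyClient(region_name)
    # The region_name argument is ignored once a client is cached.
    return _cached


first = get_client("us-west-2")   # an earlier test initializes the cache
second = get_client("us-east-1")  # a later test asks for a different region
print(second.region_name)         # prints "us-west-2"
```

Any lookup made through `second` would therefore target us-west-2, which matches the ARNs in the errors reported here.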

Errors observed:

  • ModelPackageGroup arn:aws:sagemaker:us-west-2:784379639078:model-package-group/sdk-test-finetuned-models does not exist
  • DescribeModelPackage: ARN should be scoped to correct region: us-west-2

Fix: use session.boto_session.client("sagemaker") directly instead of ModelPackageGroup.get() / ModelPackage.get() in the three call sites that resolve model package resources. This respects the session's actual region without depending on the singleton's cached state.
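A minimal sketch of this approach, assuming `session` exposes a `boto_session` attribute as described above. The helper name `describe_group_via_session` is hypothetical and is not taken from the diff; `describe_model_package_group` is the boto3 SageMaker client call.

```python
# Hypothetical helper; the actual call sites in this PR may look different.
def describe_group_via_session(session, group_name):
    """Describe a model package group in the session's own region."""
    # Build a client from the session so the region comes from the session,
    # not from any process-wide cached client.
    client = session.boto_session.client("sagemaker")
    return client.describe_model_package_group(
        ModelPackageGroupName=group_name
    )
```

Because the client is created per call from the session, two sessions bound to different regions resolve their resources independently within the same process.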

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

_update_pipeline_lineage assumed the version context always exists.
When it's been deleted or never created (e.g. prior run failure),
DescribeContext throws ResourceNotFound. Now catches the error and
recreates the version context with proper associations.
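The describe-then-recreate pattern from this commit message can be sketched as follows. This is an illustration under assumptions, not the code in `_update_pipeline_lineage`: the function name, its parameters, and the `AssociatedWith` association type are hypothetical, while `describe_context`, `create_context`, and `add_association` are boto3 SageMaker client calls.

```python
# Hypothetical helper illustrating the recovery path; names and the
# association type are assumptions, not taken from the diff.
def ensure_version_context(client, context_name, source_uri,
                           context_type, parent_arn=None):
    """Return the context ARN, recreating the context if it is missing."""
    try:
        return client.describe_context(ContextName=context_name)["ContextArn"]
    except client.exceptions.ResourceNotFound:
        pass  # deleted or never created, e.g. after a prior run failure

    arn = client.create_context(
        ContextName=context_name,
        Source={"SourceUri": source_uri},
        ContextType=context_type,
    )["ContextArn"]

    # Re-link the new context to its parent so lineage stays connected.
    if parent_arn is not None:
        client.add_association(
            SourceArn=parent_arn,
            DestinationArn=arn,
            AssociationType="AssociatedWith",
        )
    return arn
```

Catching only `ResourceNotFound` keeps other failures, such as throttling or permission errors, visible to the caller.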
…ates app

Replace hard-coded MLflow app ARN with a conftest fixture that finds an
existing ready app or creates a temporary one (cleaned up after tests).
Prevents failures when the hard-coded app is deleted or quota is full.

X-AI-Prompt: add self-healing mlflow fixture for llm_as_judge integ tests
X-AI-Tool: kiro-cli
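
The find-or-create behavior of the fixture described in this commit can be sketched as a context manager. The callables `list_apps`, `create_app`, and `delete_app`, the `"Ready"` status string, and the dictionary keys are all hypothetical stand-ins; the real conftest fixture and the MLflow app API it calls are not shown here.

```python
from contextlib import contextmanager


# Hypothetical sketch: the three callables stand in for whatever API the
# real fixture uses to list, create, and delete MLflow apps.
@contextmanager
def ready_app(list_apps, create_app, delete_app):
    """Yield the ARN of a ready app, creating a temporary one if needed."""
    for app in list_apps():
        if app["status"] == "Ready":
            # Reuse an existing app and leave it in place afterwards.
            yield app["arn"]
            return

    arn = create_app()
    try:
        yield arn
    finally:
        # Only apps created here are cleaned up.
        delete_app(arn)
```

In a `conftest.py`, a fixture could wrap this with `with ready_app(...) as arn: yield arn`, so cleanup runs after the dependent tests finish.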