fix(remote): remove -U flag from auto-injected sagemaker requirements install#5922
fix(remote): remove -U flag from auto-injected sagemaker requirements install#5922nileshpatil6 wants to merge 1 commit into
Conversation
The @Remote decorator injects sagemaker>=3.2.0,<4.0.0 into a requirements file and installed it with pip install -r ... -U. The -U flag forces pip to upgrade sagemaker to the latest version within the range even if a compatible version is already installed. This created a version mismatch between the client (which serialized the function at version 3.5.0) and the container (which deserialized at 3.11.0), causing DeserializationError. Remove the -U flag from _install_requirements_txt and _install_req_txt_in_conda_env in both sagemaker-core and sagemaker-train. The version constraint already ensures compatibility; forced upgrades are not needed and actively harmful when the serialization format changes between minor versions. Fixes aws#5872 Signed-off-by: nileshpatil6 <technil6436@gmail.com>
There was a problem hiding this comment.
Thanks for addressing the forced upgrade issue. That makes sense given the DeserializationError.. I am trying to fugure out and learn something here from your PR,
One question: without the -U flag, how do we ensure environments don’t silently stick to an older but still compatible 3.x version that might miss important bug fixes? Curious if that trade‑off was considered during testing.
I wonder about the long‑term maintenance angle: do we risk jobs running with older 3.x versions that technically satisfy the constraint but lack fixes? The contributing guidelines emphasize guarding against future breaking changes; would an integration test around version pinning help catch regressions?
|
Good question. The trade-off is intentional, and it comes down to what the auto-injected constraint is for. The With If a user wants a specific newer version in the container, they can still pin it explicitly via |
Fixes #5872
What happened
The
@remote(and@step) decorator injectssagemaker>=3.2.0,<4.0.0into a temporary requirements file, then installs it with:The
-Uflag forces pip to upgrade sagemaker to the latest version within that range, even when a compatible version is already installed. The logs show this clearly:After the forced upgrade, the container tries to deserialize a payload serialized by the 3.5.0 client using a different format, which throws
DeserializationError.Fix
Remove the
-Uflag from_install_requirements_txtand_install_req_txt_in_conda_envin bothsagemaker-coreandsagemaker-train. The version constraint (>=3.2.0,<4.0.0) already guarantees a compatible version will be present; there is no reason to force an upgrade.Files changed
sagemaker-core/src/sagemaker/core/remote_function/runtime_environment/runtime_environment_manager.pysagemaker-train/src/sagemaker/train/remote_function/runtime_environment/runtime_environment_manager.pyTesting
Existing unit tests pass (
47 passedintest_runtime_environment_manager.pyfor sagemaker-core). No test changes needed since the tests mock_run_shell_cmdand verify it is called, which still holds.