
Fix PySparkProcessor V3 ProcessingInput construction #5759

Open
Evan-W-ang wants to merge 1 commit into aws:master from Evan-W-ang:fix/pysparkprocessor-v3-processinginput

Conversation

@Evan-W-ang

Use V3-compatible ProcessingInput construction in PySparkProcessor.

PySparkProcessor still built internal ProcessingInput objects with the
legacy source/destination fields in _stage_configuration() and
_stage_submit_deps(). In V3, ProcessingInput now expects s3_input, so
those internal code paths can fail during pipeline definition or upsert
with validation errors.

This change updates both code paths to build ProcessingInput with
ProcessingS3Input while preserving the same staged S3 URIs and local
mount paths. It also adds regression tests covering configuration
staging and local dependency staging.

@Evan-W-ang
Author

Summary

This PR updates PySparkProcessor to construct ProcessingInput using the
V3-compatible s3_input=ProcessingS3Input(...) shape instead of the legacy
source / destination fields.

Problem

In V3, sagemaker.core.processing.ProcessingInput no longer accepts:

  • source
  • destination

and instead expects V3 fields such as input_name and s3_input.

However, PySparkProcessor still used the legacy constructor internally in:

  • _stage_configuration()
  • _stage_submit_deps()

This can cause validation failures during pipeline definition / upsert.
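The shape change can be sketched with minimal stand-in classes. Note these are illustrative only: the field names (`s3_uri`, `local_path`) are assumptions based on the description in this PR, not the actual sagemaker-core signatures.

```python
from dataclasses import dataclass

# Stand-ins mimicking the V3 shapes described above; the real classes
# live in sagemaker.core.processing and their fields may differ.
@dataclass
class ProcessingS3Input:
    s3_uri: str      # staged location in S3
    local_path: str  # mount path inside the processing container

@dataclass
class ProcessingInput:
    input_name: str
    s3_input: ProcessingS3Input

# Legacy (pre-V3) call -- now rejected with "Extra inputs are not permitted":
#   ProcessingInput(source="s3://bucket/conf",
#                   destination="/opt/ml/processing/input/conf")

# V3-compatible construction:
conf_input = ProcessingInput(
    input_name="conf",
    s3_input=ProcessingS3Input(
        s3_uri="s3://bucket/prefix/configuration.json",
        local_path="/opt/ml/processing/input/conf",
    ),
)
```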

Fix

This change:

  1. replaces internal legacy ProcessingInput(...) construction with the
    V3-style s3_input=ProcessingS3Input(...) shape
  2. preserves the existing S3 staging behavior
  3. preserves the existing local mount path behavior
  4. avoids relying on legacy .destination access where an explicit local path is sufficient
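The steps above can be sketched as a single hypothetical helper (the names `build_staged_input`, `s3_uri`, and `local_path` are illustrative assumptions; the real code lives in `_stage_configuration()` and `_stage_submit_deps()`):

```python
from dataclasses import dataclass

# Minimal stand-ins for the V3 models; field names are assumptions.
@dataclass
class ProcessingS3Input:
    s3_uri: str
    local_path: str

@dataclass
class ProcessingInput:
    input_name: str
    s3_input: ProcessingS3Input

def build_staged_input(name: str, staged_s3_uri: str,
                       mount_path: str) -> ProcessingInput:
    """Hypothetical helper mirroring the fix: the staged S3 URI and the
    local mount path are unchanged; only the constructor shape differs,
    and the mount path is passed in explicitly instead of being read
    back via the legacy .destination attribute."""
    return ProcessingInput(
        input_name=name,
        s3_input=ProcessingS3Input(s3_uri=staged_s3_uri,
                                   local_path=mount_path),
    )

deps_input = build_staged_input(
    "submit_deps",
    "s3://bucket/prefix/submit_deps.zip",
    "/opt/ml/processing/input/py_files",
)
```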

Tests

Added regression tests covering:

  • _stage_configuration() building a V3-compatible ProcessingInput
  • _stage_submit_deps() building a V3-compatible ProcessingInput for local dependencies

Example failure before this change

```
ValidationError: 2 validation errors for ProcessingInput
source
  Extra inputs are not permitted
destination
  Extra inputs are not permitted
```

Motivation

Users migrating to V3 naturally update their own processing inputs/outputs to the new schema, but Spark processing can still fail because of internal legacy construction in PySparkProcessor. This patch makes that internal behavior consistent with the V3 processing models.


Test command
```bash
cd ~/sagemaker-python-sdk/sagemaker-core
. .venv/bin/activate
python -m pytest tests/unit/spark/test_processing.py tests/unit/test_processing.py -q
```

Files to include

  • sagemaker-core/src/sagemaker/core/spark/processing.py
  • sagemaker-core/tests/unit/spark/test_processing.py

