Context
Tracking follow-up work surfaced by a third-party (Gemini) technical review of s3proxy. The review verdict is production-ready; the items below are evolution, not release blockers. Each is intended to ship as its own PR so review and bisect stay clean.
Source review: internal — S3Proxy Technical Review & Recommendations.
Proposed follow-up PRs (suggested order)
PR A — Sign S3 object metadata (DEK tag) — [security / medium]
Problem. An attacker with S3 write access (but no KEK) can swap ciphertexts between objects or tamper with the DEK metadata header. The proxy will then happily wrap/unwrap the wrong DEK with a valid KEK and return attacker-chosen plaintext for a victim key, or fail in confusing ways.
Sketch.
- HMAC-SHA256 over a canonical encoding of
(bucket, key, DEK ciphertext, KEK version tag, any other auth-relevant metadata).
- HMAC key derived from the KEK seed via HKDF with a fresh
info label (do not reuse the DEK-wrapping key).
- Store the tag in a new
x-amz-meta-<dektag>-mac header.
- Backward compat. Legacy objects have no MAC. Either: (a) accept-unsigned read mode behind a flag with a deprecation timeline, or (b) lazy rewrite on read. Needs an explicit migration plan; do not break existing deployments.
Acceptance. Tampered DEK or swapped ciphertext is rejected before any decryption attempt. Legacy unsigned objects keep reading under the compat flag. New writes always include the MAC.
PR B — Decouple HTTP routing from S3 protocol logic — [code-quality / medium]
Problem. router.go handles both HTTP routing and S3 semantics. This conflicts with the layered architecture documented in AGENTS.md.
Sketch.
- Extract S3 protocol logic into a
service (or usecase) package: GetObject, PutObject, HeadObject, list ops, multipart gate.
- Router becomes a thin HTTP→service mapper: parse request → call service → serialize response.
- Pure refactor; no behavior change. Tests should continue to pass without modification (or with mechanical import updates).
Acceptance. Zero behavior diff, lint clean, tests green. Sets the stage for PR E (streaming) by giving it a clean seam to plug into.
PR C — Redact upstream error details from client responses — [security / low]
Problem. Raw AWS XML responses or internal error chains may reach end-users, leaking backend topology, bucket names, request IDs, or library internals.
Sketch.
- Audit every error path that flows into an HTTP response.
- Split: structured
slog line gets the full error (with %+v / wrapped chain); HTTP response body gets a sanitized, generic message + a correlation ID.
- Re-use the existing tracing/request ID if present so operators can still link client report → log.
Acceptance. No internal error string leaks to the response body in production mode. Logs remain verbose. Snapshot tests cover the redaction layer.
PR D — Perf polish: buffer pool + idle connection tuning — [performance / medium]
Problem. Under high throughput the proxy allocates fresh []byte buffers per request and uses default http.Transport idle settings.
Sketch.
sync.Pool of []byte buffers sized for the dominant body class (e.g. MaxPutBodySize tier). Wire into the encrypt/decrypt path and any io.Copy staging buffer.
- Expose
MaxIdleConnsPerHost (and MaxIdleConns, IdleConnTimeout) as env-configurable settings on forwardHTTPClient. Pick sane defaults but do not bake them.
- Must-have. Benchmark before/after with a representative workload (concurrent PUT of N-MB objects). Numbers in the PR description or this PR does not land.
Acceptance. Measured drop in allocations / GC pause / p99 latency. No regression on small-object workload.
PR E (epic) — Streaming AEAD encryption — [performance / high]
Problem. Today the proxy buffers entire object bodies in RAM to encrypt/decrypt. RSS is bounded by object_size × concurrent_requests, which caps the workload before CPU does.
Sketch.
- Switch encryption path from one-shot AES-GCM-SIV to a streaming AEAD (e.g. Tink's
AES256_GCM_HKDF_4KB / _1MB streaming primitives).
- Process in constant-memory chunks (64 KiB – 1 MiB), wire directly between
r.Body and the upstream PUT request body.
- Format incompatibility. Streaming AEAD ciphertext layout is not interchangeable with AES-GCM-SIV. Existing objects cannot be read by the new path and vice-versa. Options: (1) dual-read fallback that tries SIV then streaming, (2) per-object version marker in the metadata header, (3) explicit one-shot migration job. Needs a design doc before any code lands.
- Depends on PR B for a clean service-layer seam.
- Depends on PR D for the baseline benchmarks that justify the cost.
Acceptance. Constant RAM per request regardless of object size. Existing objects remain readable under the chosen compat strategy. New writes use the streaming format. Bench numbers attached.
Out of scope for this issue
- TLS certificate lifecycle (cert-manager / Vault sidecar): operational / Helm chart concern. Tracked in deployment docs, not in proxy code.
Notes
- Order is suggested, not mandatory. PR A (security) and PR B (refactor) are independent and can be done in parallel.
- PR E (streaming) is the biggest payoff but also the biggest design effort — do not start until D's benchmarks confirm the RAM bottleneck on a real workload.
Context
Tracking follow-up work surfaced by a third-party (Gemini) technical review of
s3proxy. The review verdict is production-ready; the items below are evolution, not release blockers. Each is intended to ship as its own PR so review and bisect stay clean.Source review: internal — S3Proxy Technical Review & Recommendations.
Proposed follow-up PRs (suggested order)
PR A — Sign S3 object metadata (DEK tag) —
[security / medium]Problem. An attacker with S3 write access (but no KEK) can swap ciphertexts between objects or tamper with the DEK metadata header. The proxy will then happily wrap/unwrap the wrong DEK with a valid KEK and return attacker-chosen plaintext for a victim key, or fail in confusing ways.
Sketch.
(bucket, key, DEK ciphertext, KEK version tag, any other auth-relevant metadata).infolabel (do not reuse the DEK-wrapping key).x-amz-meta-<dektag>-macheader.Acceptance. Tampered DEK or swapped ciphertext is rejected before any decryption attempt. Legacy unsigned objects keep reading under the compat flag. New writes always include the MAC.
PR B — Decouple HTTP routing from S3 protocol logic —
[code-quality / medium]Problem.
router.gohandles both HTTP routing and S3 semantics. This conflicts with the layered architecture documented inAGENTS.md.Sketch.
service(orusecase) package:GetObject,PutObject,HeadObject, list ops, multipart gate.Acceptance. Zero behavior diff, lint clean, tests green. Sets the stage for PR E (streaming) by giving it a clean seam to plug into.
PR C — Redact upstream error details from client responses —
[security / low]Problem. Raw AWS XML responses or internal error chains may reach end-users, leaking backend topology, bucket names, request IDs, or library internals.
Sketch.
slogline gets the full error (with%+v/ wrapped chain); HTTP response body gets a sanitized, generic message + a correlation ID.Acceptance. No internal error string leaks to the response body in production mode. Logs remain verbose. Snapshot tests cover the redaction layer.
PR D — Perf polish: buffer pool + idle connection tuning —
[performance / medium]Problem. Under high throughput the proxy allocates fresh
[]bytebuffers per request and uses defaulthttp.Transportidle settings.Sketch.
sync.Poolof[]bytebuffers sized for the dominant body class (e.g.MaxPutBodySizetier). Wire into the encrypt/decrypt path and anyio.Copystaging buffer.MaxIdleConnsPerHost(andMaxIdleConns,IdleConnTimeout) as env-configurable settings onforwardHTTPClient. Pick sane defaults but do not bake them.Acceptance. Measured drop in allocations / GC pause / p99 latency. No regression on small-object workload.
PR E (epic) — Streaming AEAD encryption —
[performance / high]Problem. Today the proxy buffers entire object bodies in RAM to encrypt/decrypt. RSS is bounded by
object_size × concurrent_requests, which caps the workload before CPU does.Sketch.
AES256_GCM_HKDF_4KB/_1MBstreaming primitives).r.Bodyand the upstream PUT request body.Acceptance. Constant RAM per request regardless of object size. Existing objects remain readable under the chosen compat strategy. New writes use the streaming format. Bench numbers attached.
Out of scope for this issue
Notes