Backport: abort leaked request Context in StatelessAuthenticationFilter to all remaining customers (propagate #1352 / dc57ffded8) #1353

Description

@jr-rk

1. Problem

On several customer branches a submitted item can silently end up orphaned: owning_collection = NULL and in_archive = false while its collection2item mapping and handle survive. The item then disappears from search and collection listings, has no breadcrumb, and "Edit Item" throws HTTP 500 (NPE in CanManageMappingsFeature, cf. #1348).

The behavioral fix already shipped to customer/zcu-data in #1352 (and previously to dtq-dev / customer/lindat as dc57ffded8 / #852). It is still missing from the remaining customer branches. This issue tracks propagating that fix to every customer instance that lacks it.

2. Root cause (hypothesis)

A Hibernate lost update via a leaked, thread-bound session on the request error path:

  1. A slow submission file upload (WorkspaceItemRestRepository.upload: wis.find loads the Item, then a long stream, then context.commit() at the end) holds the Item in its Hibernate L1 cache with pre-deposit values (owning_collection = null, in_archive = false).
  2. A concurrent deposit installs the item (sets owning_collection, in_archive = true) and deletes the workspace item the upload is working on.
  3. The upload throws (500 / ClientAbortException: Broken pipe) before reaching context.commit().
  4. DSpaceRequestContextFilter assigns its context local after chain.doFilter(...), so on an exception that assignment is skipped, context stays null, and its finally skips the abort — leaving the Hibernate session bound to the Tomcat worker thread (ThreadLocal), still holding the dirty, stale Item.
  5. A later request that reuses that thread commits → Hibernate flushes the leftover dirty Item as a full-row UPDATE (Item has no @DynamicUpdate/@Version), reverting owning_collection → NULL and in_archive → false. The collection2item join row and handle survive → the observed orphan.
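
The five steps above can be replayed in a toy model. This is a minimal sketch under stated assumptions, not the real code path: ItemRow, Session, BOUND and buggyFilter are hypothetical stand-ins for the item row, the Hibernate session, the thread-bound session and the null-local pattern in DSpaceRequestContextFilter. It also assumes the upload dirties the Item before it fails.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: these classes are hypothetical stand-ins, not DSpace or Hibernate types.
class LeakedSessionDemo {

    /** One row of the item table, reduced to the two columns that get reverted. */
    static final class ItemRow {
        final String owningCollection;
        final boolean inArchive;

        ItemRow(String owningCollection, boolean inArchive) {
            this.owningCollection = owningCollection;
            this.inArchive = inArchive;
        }
    }

    /** Shared "database": item id -> current row. */
    static final Map<String, ItemRow> DB = new HashMap<>();

    /** Toy session: a first-level cache plus a dirty set; commit writes whole cached rows. */
    static final class Session {
        private final Map<String, ItemRow> cache = new HashMap<>();
        private final Set<String> dirty = new HashSet<>();

        ItemRow load(String id) {
            return cache.computeIfAbsent(id, DB::get);
        }

        void touch(String id) {
            dirty.add(id);
        }

        void commit() {
            for (String id : dirty) {
                DB.put(id, cache.get(id)); // full-row write of the cached (possibly stale) copy
            }
            dirty.clear();
        }

        void abort() {
            cache.clear();
            dirty.clear();
        }
    }

    /** Session bound to the worker thread, as in steps 4 and 5. */
    static final ThreadLocal<Session> BOUND = ThreadLocal.withInitial(Session::new);

    /** Filter with the null-local pattern: the local is assigned only after the chain returns. */
    static void buggyFilter(Runnable chain) {
        Session session = null;
        try {
            chain.run();
            session = BOUND.get(); // never reached when chain.run() throws
        } finally {
            if (session != null) {
                session.abort(); // skipped on the error path, so the session leaks
            }
        }
    }

    /** Replays steps 1 to 5 on one thread and returns the final database row. */
    static ItemRow runScenario() {
        BOUND.remove();
        DB.clear();
        DB.put("item-1", new ItemRow(null, false)); // pre-deposit state

        try {
            buggyFilter(() -> {
                Session s = BOUND.get();
                s.load("item-1");                               // step 1: stale copy cached
                s.touch("item-1");                              // assumed: the upload dirties the item
                DB.put("item-1", new ItemRow("col-1", true));   // step 2: concurrent deposit
                throw new IllegalStateException("Broken pipe"); // step 3: upload fails
            });
        } catch (IllegalStateException expected) {
            // step 4: the filter did not abort; the session is still bound to this thread
        }

        buggyFilter(() -> BOUND.get().commit()); // step 5: later request on the same thread
        return DB.get("item-1");
    }

    public static void main(String[] args) {
        ItemRow row = runScenario();
        System.out.println("owning_collection=" + row.owningCollection
            + ", in_archive=" + row.inArchive);
    }
}
```

In this model the deposit's values are overwritten by the second request even though that request never touched the item itself, which mirrors the orphaned state described in section 1.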

Real incident: item 62133a8a-fbd1-474f-9b55-fa92d318aa03 (handle 20.500.14592/107) on a ZCU-DATA instance.

3. Acceptance criteria

  • Every customer branch that lacks the guard carries the StatelessAuthenticationFilter change: chain.doFilter(...) wrapped in try { … } finally { … }, and in the finally the request Context is read from req.getAttribute(ContextUtil.DSPACE_CONTEXT) and abort()ed when still valid.
  • On the normal success path the change is a no-op (request already committed → context no longer valid).
  • The behavioral file StatelessAuthenticationFilter.java on each patched branch is equivalent to the already-shipped version on dtq-dev / customer/zcu-data / customer/vsb-tuo.
  • One PR per customer branch, base = that branch, CI green, human sign-off.
  • No orphaning under the concurrent upload-vs-deposit repro on a patched instance (matches LINDAT behaviour).
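
The shape of the guard described in the first two criteria might look roughly like the sketch below. It is self-contained for illustration only: Context, Chain, the Map used as the request, the finish() method and the attribute key value are hypothetical stand-ins, and the real change lives in StatelessAuthenticationFilter and reads ContextUtil.DSPACE_CONTEXT from the servlet request.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins only; not the DSpace classes of the same names.
class AbortGuardSketch {

    /** Placeholder for ContextUtil.DSPACE_CONTEXT; the actual key value is an assumption. */
    static final String DSPACE_CONTEXT = "dspace.context";

    /** Minimal stand-in for the request Context. */
    static final class Context {
        private boolean valid = true;
        boolean aborted = false;

        boolean isValid() {
            return valid;
        }

        /** Stand-in for the request finishing its transaction normally. */
        void finish() {
            valid = false;
        }

        void abort() {
            valid = false;
            aborted = true;
        }
    }

    /** Stand-in for FilterChain; the request is modelled as an attribute map. */
    interface Chain {
        void doFilter(Map<String, Object> req) throws Exception;
    }

    /** Outer filter: always inspects the request attribute in finally, even on exceptions. */
    static void doFilterInternal(Map<String, Object> req, Chain chain) throws Exception {
        try {
            chain.doFilter(req);
        } finally {
            Object attr = req.get(DSPACE_CONTEXT);
            if (attr instanceof Context) {
                Context context = (Context) attr;
                if (context.isValid()) {
                    context.abort(); // error path: drop the leaked, still-open context
                }
            }
        }
    }

    /** Error path: the request fails before finishing its transaction. */
    static Context runFailing() {
        Map<String, Object> req = new HashMap<>();
        Context context = new Context();
        try {
            doFilterInternal(req, r -> {
                r.put(DSPACE_CONTEXT, context);
                throw new IOException("Broken pipe");
            });
        } catch (Exception expected) {
            // the exception still propagates; only the cleanup is added
        }
        return context;
    }

    /** Success path: the request finishes on its own, so the guard has nothing to do. */
    static Context runSuccessful() throws Exception {
        Map<String, Object> req = new HashMap<>();
        Context context = new Context();
        doFilterInternal(req, r -> {
            r.put(DSPACE_CONTEXT, context);
            context.finish();
        });
        return context;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("failing request aborted: " + runFailing().aborted);
        System.out.println("successful request aborted: " + runSuccessful().aborted);
    }
}
```

Reading the context from the request attribute inside finally, rather than from a local assigned after the chain call, is what keeps the guard effective when the chain throws.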

4. Stack(s) & type

  • Stack: backend (DSpace, dspace-server-webapp).
  • Type: backport / propagation (bug fix, data-integrity).

5. Scope (in / out)

In scope

  • Port only the behavioral StatelessAuthenticationFilter hunk (the outer filter's try/finally abort of the leaked Context) to every customer branch missing it.

Out of scope

  • The diagnostic / CI files from the original dc57ffded8 (Context.java, Utils.java, HibernateDBConnection.java, log4j2.xml, build.yml) — they conflict on diverged branches and are not needed for the fix.
  • Defense-in-depth follow-ups (separate issues): fix the latent null-local bug in DSpaceRequestContextFilter (read the Context inside finally); add @DynamicUpdate/@Version to Item/DSpaceObject for a deterministic optimistic-lock guard; add the read-path 500 guard in CanManageMappingsFeature (#1348, "fix: NPE (HTTP 500) in CanManageMappingsFeature for item with null owning collection").
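
To illustrate why a version column would make the stale write fail deterministically, here is a toy sketch of the version check. It is not Hibernate code: Row and update are hypothetical, and update only mimics an UPDATE whose WHERE clause includes the version that was loaded. With @Version, Hibernate would report the rejected write as an optimistic-locking failure.

```java
// Toy model of a versioned row; the names are hypothetical, not DSpace or Hibernate types.
class VersionGuardSketch {

    /** Item row reduced to the two affected columns plus a version counter. */
    static final class Row {
        final String owningCollection;
        final boolean inArchive;
        final long version;

        Row(String owningCollection, boolean inArchive, long version) {
            this.owningCollection = owningCollection;
            this.inArchive = inArchive;
            this.version = version;
        }
    }

    private Row current = new Row(null, false, 0);

    Row read() {
        return current;
    }

    /**
     * Mimics UPDATE ... WHERE version = loaded.version.
     * Returns false when another writer has already bumped the version.
     */
    boolean update(Row loaded, String owningCollection, boolean inArchive) {
        if (loaded.version != current.version) {
            return false; // stale copy: the write is rejected instead of silently applied
        }
        current = new Row(owningCollection, inArchive, loaded.version + 1);
        return true;
    }

    public static void main(String[] args) {
        VersionGuardSketch table = new VersionGuardSketch();
        Row staleCopy = table.read();                                   // upload loads the item
        boolean deposit = table.update(table.read(), "col-1", true);    // concurrent deposit
        boolean staleFlush = table.update(staleCopy, null, false);      // leaked session flushes
        System.out.println("deposit applied: " + deposit + ", stale flush applied: " + staleFlush);
    }
}
```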

6. Constraints

  • No DB migration — pure Java, no Flyway change.
  • API-compatible — no REST contract change; success path is a no-op.
  • Module boundary — change confined to dspace-server-webapp; relies only on Context, ContextUtil, and the existing log field, all already imported on the target branches.
  • Backport family by family; gate each PR on the green-light checklist (CI + human sign-off).

7. Backport context

  • Source PR: #1352 ("ZCU-DATA/fix: prevent orphaned items - abort leaked request Context (backport dc57ffded8/#852 to zcu-data)").

  • Source commit to cherry-pick: ea6116ebec1a82403b1bbe07be8cc5da38edaf8b (single-file port; cleaner to propagate than the multi-file dc57ffded8).

  • Original fix: dc57ffded8 / #852 ("Transaction bug - close context in finally block (#845)").

  • Targeting: all customers from customers/registry.yml that lack the guard (do not hardcode). Current coverage audit:

    customer     family  be_branch             status
    tul          7.5     customer/TUL          missing → backport
    zcu-pub      7.6.1   customer/zcu-pub      missing → backport
    sav          7.6.1   customer/sav          missing → backport
    uk           7.6.1   customer/uk           missing → backport (⚠ not in registry)
    mendelu      9.1     customer/mendelu      missing → backport
    jcu          9.3     customer/jcu          missing → backport
    zcu-data     7.6.1   customer/zcu-data     present (#1352)
    vsb-tuo      7.6.5   customer/vsb-tuo      present (identical guard)
    ufal         7.6.5   dtq-dev               present (dc57ffded8)
    lindat       -       customer/lindat       present (dc57ffded8)
    palo-docker  -       customer/palo-docker  present

    Registry gap: customer/uk (live, DSpace 7.6.1, last commit 2025-09) is a real customer branch but is absent from customers/registry.yml. It is included as a target here; the registry should be updated so future fan-outs don't silently skip it (lindat and palo-docker are likewise unlisted but already carry the fix).

  • BE module impact: dspace-server-webapp only. The target region (the chain.doFilter(req, res); call that closes the doFilterInternal method) is byte-identical across all 5 missing branches, so the cherry-pick should apply with little or no conflict.

8. Verification plan

  • Automated: per branch — mvn checkstyle:check -pl dspace-server-webapp + module compile; full test suite runs in PR CI.
  • Content check: post-port StatelessAuthenticationFilter.java is equivalent to the shipped dtq-dev / zcu-data version.
  • Manual (on a test instance): run the repro — start a large-file submission upload, trigger the deposit concurrently. Before: item ends owning_collection = NULL / in_archive = false. After: item stays correct (matches LINDAT).

9. Risk & rollback

10. Related links
