Skip to content

[Port to dtq-dev] Issue ufal/clarin-dspace#1317 metadata health check#1307

Merged
milanmajchrak merged 7 commits into
dataquest-dev:dtq-devfrom
ufal:backport-1338-to-dtq-dev
Jun 26, 2026
Merged

[Port to dtq-dev] Issue ufal/clarin-dspace#1317 metadata health check#1307
milanmajchrak merged 7 commits into
dataquest-dev:dtq-devfrom
ufal:backport-1338-to-dtq-dev

Conversation

@kosarko

@kosarko kosarko commented Apr 17, 2026

Copy link
Copy Markdown

Port of ufal#1338 by @kuchtiak-ufal to dtq-dev.

Summary by CodeRabbit

  • New Features
    • Added a Metadata check health check that scans repository metadata and produces both plain-text and JSON reports with categorized error/warning totals and per-type breakdowns, including example messages (with automatic truncation).
    • Introduced configurable message matching rules to classify validation issues consistently.
    • Updated report diff support to include the metadata check’s errorCount and warningCount.
  • Chores
    • Enabled the Metadata check in the healthcheck module configuration.
  • Tests
    • Added integration tests verifying aggregated results and capped/truncated reporting behavior.

* Issue 1317: health-check for metadata - initial commit

* implement MetadataCheck report

* improve MetadataCheck

* improve MetadataCheck with more sophisticated selection of error/warning messages

* code cleanup

* rename qa-metadata-error-patterns.json to metadata-check-patterns.json

* implement copilot suggestions

* add test for Metadata check to HealthReportIT

* improved documentation

* added test

* improve documentation

* set default error dispersion quota to 5

(cherry picked from commit 39157a5)

@milanmajchrak milanmajchrak left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also failing integration tests.

Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
@kuchtiak-ufal

Copy link
Copy Markdown
Collaborator

Regarding "There are also failing integration tests".

There's a complex HealthReportIT test testing this check.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a MetadataCheck health report that runs metadataqa, classifies output with configurable patterns, aggregates counts and examples, updates report JSON mappings, and adds integration tests for normal and size-limited reporting.

Changes

Metadata Check Implementation and Integration

Layer / File(s) Summary
MetadataCheck class structure and pattern loading
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java, dspace-api/src/main/resources/metadata-check-patterns.json
Introduces MetadataCheck with configuration constants and static pattern maps; loads error and warning patterns from metadata-check-patterns.json; adds helpers to build in-memory pattern maps and format issue labels.
MetadataCheck execution and message processing
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
Implements run(ReportInfo) to read config limits and dispersion quotas, execute metadataqa with a MetadataReporter, match lines to configured patterns, categorize as errors or warnings, aggregate counts, store example messages, and apply dispersion-based replacement when message-storage limits are exceeded.
Registration, report schema, and configuration
dspace/config/modules/healthcheck.cfg, dspace-api/src/main/resources/report-diff-fields.json
Adds Metadata check to healthcheck.checks and plugin.named.org.dspace.health.Check; extends report-diff-fields.json with errorCount and warningCount mappings and reorders licenses in the field order.
Integration tests and helper methods
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
Adds imports and two tests: testMetadataCheck validates counts and report JSON, and testMetadataCheckWithRestrictedReportSize validates truncation and dispersion behavior; adds helpers to extract item URIs, select the latest report, and find a named check in report JSON.

Sequence Diagram(s)

sequenceDiagram
  participant HealthReportIT
  participant MetadataCheck
  participant metadataqa curation
  participant MetadataReporter
  participant ReportInfo
  HealthReportIT->>MetadataCheck: run(ReportInfo)
  MetadataCheck->>metadataqa curation: execute metadataqa
  metadataqa curation->>MetadataReporter: append output lines
  MetadataReporter->>MetadataReporter: classify and count messages
  MetadataReporter-->>MetadataCheck: aggregated results
  MetadataCheck->>ReportInfo: set text and JSON report
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • milanmajchrak
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is far too brief and omits the required template sections and testing checkboxes. Add the required Problem description, Analysis, Problems, Manual Testing, and Copilot review sections, even if some are empty.
Docstring Coverage ⚠️ Warning Docstring coverage is 3.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and clearly points to the metadata health check port to dtq-dev.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (11)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (5)

370-370: 💤 Low value

Add explicit private access modifier to helper method.

The method lacks an explicit access modifier, defaulting to package-private. For test helper methods that should not be accessed outside this class, use explicit private.

Suggested fix
-ReportResult findLastReportResult(List<ReportResult> reportResults) {
+private ReportResult findLastReportResult(List<ReportResult> reportResults) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 370,
The helper method findLastReportResult currently has package-private visibility;
change its declaration to include the private modifier so it becomes private
ReportResult findLastReportResult(List<ReportResult> reportResults). Update the
method signature in HealthReportIT (method name: findLastReportResult) to be
explicitly private and ensure no external references rely on package visibility.

375-375: 💤 Low value

Add explicit private access modifier to helper method.

The method lacks an explicit access modifier, defaulting to package-private. For test helper methods that should not be accessed outside this class, use explicit private.

Suggested fix
-JsonNode findCheckByName(JsonNode root, String checkName) {
+private JsonNode findCheckByName(JsonNode root, String checkName) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 375,
The helper method findCheckByName in HealthReportIT is declared with default
(package-private) visibility; make it explicitly private to restrict access to
this test class only by adding the private access modifier to the method
declaration (i.e., change "JsonNode findCheckByName(...)" to "private JsonNode
findCheckByName(...)" within the HealthReportIT class).

256-256: ⚡ Quick win

Add Javadoc for public test method.

Per coding guidelines, all public methods should have Javadoc comments. Consider adding a brief description explaining how this test validates report size restrictions via configuration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 256,
Add a Javadoc comment to the public test method in HealthReportIT that is
annotated with `@Test` (the test validating report size restrictions via
configuration). Place a brief Javadoc immediately above the method (above the
`@Test` annotation) describing the test purpose, e.g., that it verifies report
size limits are enforced based on configuration and what behavior is asserted;
reference the test’s name in the description if present to make intent clear.
Ensure the Javadoc follows project style (one-line summary and optionally a
short sentence) and is applied to the public test method in class
HealthReportIT.

Source: Coding guidelines


209-214: ⚡ Quick win

Consider more flexible assertion approach for formatted output.

The assertions use hardcoded whitespace patterns (" ".repeat(15)) that tightly couple tests to the exact output formatting. If MetadataCheck adjusts padding logic, these tests will fail even though the functionality remains correct.

Alternative assertion approach

Consider using regex patterns or extracting numeric values separately:

-assertThat(messages.get(0), containsString("dc.relation issues:  " + " ".repeat(15) + "2"));
+assertThat(messages.get(0), containsString("dc.relation issues:"));
+assertThat(messages.get(0), containsString("2"));

Or use a regex matcher to verify the count appears after the label without enforcing exact spacing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
209 - 214, The assertions in HealthReportIT are brittle due to hardcoded
spacing; update the containsString checks on messages.get(0) to use flexible
regex or numeric extraction instead of " ".repeat(15). For each label (e.g.,
"dc.relation issues", "dc.title issues", "dc.type issues", "Error count total",
"dc.subject issues", "Warning count total") replace the containsString
assertions with a regex-based assertion (e.g., pattern "dc\\.relation
issues:\\s+2") or extract the numeric suffix via a Pattern/Matcher and assert
the parsed integer equals the expected count; keep the checks against
messages.get(0) and the same labels (MetadataCheck-related labels) so only
spacing enforcement is removed.

154-154: ⚡ Quick win

Add Javadoc for public test method.

Per coding guidelines, all public methods should have Javadoc comments. Consider adding a brief description of what this test validates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 154,
Add a Javadoc comment to the public test method in the HealthReportIT class (the
method annotated with `@Test` at the shown location); the comment should be a
brief one-line description stating what the test validates (e.g., expected
behavior or assertion being checked) and can omit `@param/`@return since it’s a
test method; place the Javadoc immediately above the method declaration so it
documents that specific test.

Source: Coding guidelines

dspace-api/src/main/java/org/dspace/health/MetadataCheck.java (6)

461-465: 💤 Low value

Clarify sentinel value for highestFrequency initialization.

Line 463 initializes highestFrequency to Integer.MAX_VALUE, which acts as a sentinel meaning "not yet calculated". This is non-obvious and could confuse maintainers. A brief comment explaining that this sentinel value ensures the replacement logic recalculates the actual frequency on first use (lines 407-409) would improve clarity.

📝 Suggested comment
         StoredMessagesInfo() {
             this.count = 0;
+            // Sentinel value; actual frequency is computed lazily in replaceMessage
             this.highestFrequency = Integer.MAX_VALUE;
             storedMessages = new TreeMap<>();
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
461 - 465, The constructor StoredMessagesInfo() sets highestFrequency to
Integer.MAX_VALUE as a sentinel meaning "not yet calculated", which is
non-obvious; update the constructor (StoredMessagesInfo) to add a short comment
next to the highestFrequency initialization explaining that Integer.MAX_VALUE is
used intentionally so the replacement logic in the class will trigger
recalculation on first use (i.e., it signals "uninitialized" and ensures the
algorithm that compares/replaces frequencies will compute the real highest
frequency when needed).

212-224: 💤 Low value

Simplify pattern collection with direct iteration.

The getPatterns method uses StreamSupport.stream(patterns.spliterator(), false) to iterate ArrayNode. Since ArrayNode implements Iterable<JsonNode>, you can iterate directly or use forEach without constructing a stream.

♻️ Simpler alternative
     private static Map<String, List<String>> getPatterns(JsonNode parentNode) {
         Map<String, List<String>> validationPatterns = new HashMap<>();
         parentNode.fieldNames().forEachRemaining(validationType -> {
             List<String> validationMessages = new ArrayList<>();
             ArrayNode patterns = parentNode.withArray(validationType);
-            StreamSupport.stream(patterns.spliterator(), false).forEach(message -> {
+            patterns.forEach(message -> {
                 validationMessages.add(message.asText());
             });
             validationPatterns.put(validationType, validationMessages);
         });
         return validationPatterns;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
212 - 224, The getPatterns method currently builds validationMessages by
streaming over the ArrayNode via StreamSupport; simplify this by iterating the
ArrayNode directly (it implements Iterable<JsonNode>) — inside getPatterns,
replace StreamSupport.stream(patterns.spliterator(), false).forEach(...) with a
direct for-each loop or patterns.forEach(node -> ...) that calls
validationMessages.add(node.asText()), keeping the rest of the logic
(validationPatterns.put(validationType, validationMessages)) unchanged to
preserve behavior.

21-31: ⚡ Quick win

Consider standardizing on a single JSON library.

The code uses both Jackson (ObjectMapper, JsonNode) for reading input at lines 203, 206-208 and org.json (JSONObject, JSONArray) for constructing output at lines 92, 174-190. Mixing JSON libraries increases the dependency surface and can lead to version conflicts or inconsistent behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
21 - 31, The code mixes Jackson and org.json; standardize on Jackson in the
MetadataCheck class by replacing all org.json JSONObject/JSONArray usages (the
places constructing/reporting JSON output) with Jackson types
(ObjectNode/ArrayNode) created from the existing ObjectMapper, update the code
paths that currently call new JSONObject()/new JSONArray() to build
ObjectNode/ArrayNode trees, replace any JSONObject.toString() or similar with
ObjectMapper.writeValueAsString(), and remove org.json imports so only Jackson
imports remain (keep existing ObjectMapper/JsonNode/ArrayNode usage for parsing
and output).

65-71: ⚖️ Poor tradeoff

Static initialization failure will cause confusing class-loading errors.

If metadata-check-patterns.json is missing or malformed, the IOException caught at line 68 is wrapped in a RuntimeException that will fail during class initialization, potentially far from where the healthcheck is invoked. This makes debugging difficult because the stack trace will point to the static initializer rather than the health-report execution context.

Consider lazy-loading the patterns in run() or providing a more descriptive startup validation that fails fast with clear diagnostics.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
65 - 71, The static initializer in MetadataCheck currently calls loadPatterns()
and throws a RuntimeException on IOException which causes class-loading
failures; change this to lazy-load or validate patterns inside the run() method
instead: remove (or guard) the static loadPatterns() call and ensure
loadPatterns() is invoked from run() (or another startup-validation method) with
proper try/catch that logs a clear, descriptive error and returns a sensible
health-check result instead of letting initialization fail; reference
MetadataCheck, its static initializer, loadPatterns(), and run() when making the
change.

407-407: 💤 Low value

Remove unnecessary Objects.requireNonNull wrapper.

Line 407 wraps getMessageWithHighestCount() in Objects.requireNonNull(), but the method at lines 431-437 uses .orElseThrow() which already throws NoSuchElementException and never returns null. The requireNonNull is redundant.

🧹 Proposed simplification
-            String messageKeyWithHighestFrequency = Objects.requireNonNull(getMessageWithHighestCount(storedMessages));
+            String messageKeyWithHighestFrequency = getMessageWithHighestCount(storedMessages);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` at line 407,
The assignment to messageKeyWithHighestFrequency unnecessarily wraps
getMessageWithHighestCount(storedMessages) with Objects.requireNonNull; remove
the requireNonNull wrapper and assign the result directly since
getMessageWithHighestCount uses orElseThrow() and cannot return null—update the
line that assigns messageKeyWithHighestFrequency to call
getMessageWithHighestCount(storedMessages) directly (referencing the variable
messageKeyWithHighestFrequency and the method getMessageWithHighestCount).

335-340: ⚡ Quick win

Pattern matching is substring-based, not regex, which may confuse maintainers.

The code treats ^ and $ as literal string anchors for prefix/suffix matching rather than regex metacharacters. While this works correctly, calling them "patterns" (line 42, 62-63, 206-208, 314) and storing them in variables like errorPatterns suggests regex matching, but the implementation uses startsWith, endsWith, and contains for simple substring matching.

Consider either:

  1. Renaming variables/documentation to clarify these are "message filters" or "substring rules" rather than "patterns"
  2. Using actual Pattern.compile() and Matcher for true regex support if future extensibility is needed
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
335 - 340, The code in MetadataCheck uses variables like errorPatterns and a
local variable pattern but implements substring logic with
startsWith/endsWith/contains, which is misleading; rename errorPatterns to
messageFilters (and the local variable pattern to filter) and update inline
comments/docstrings to describe "substring filters" or "message rules"
(prefix/suffix/contains) instead of "patterns"; update all references to
errorPatterns, pattern, and any related method/field names in MetadataCheck to
the new names so maintainers understand this is simple substring matching
(alternatively, if you want true regex support, replace the
startsWith/endsWith/contains logic with Pattern.compile(filter) and
Matcher.matches() and update docs accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 337-341: Lines in MetadataCheck/MetadataReporter exceed the
120-char limit (notably the pattern-match conditional using messageKey/pattern
and the addMessage(...) calls and dispersion quota conditionals). Fix by
breaking long expressions into shorter named boolean/local variables (e.g.,
compute startsWithMatch, endsWithMatch, containsMatch from messageKey and
pattern) and by formatting addMessage(type, message, counts, storedMessagesInfo,
limit, dispersionQuota) with one argument per line or grouping to keep each line
<120 chars; do the same for the dispersion quota conditionals (assign
intermediate variables like isOverDispersion or hasQuota) to shorten the
conditional expressions. Ensure you update the code in the MetadataCheck class
where messageKey/pattern matching and addMessage calls occur and in the
dispersion quota checks so every logical fragment is on its own line and no line
exceeds 120 chars.
- Around line 78-79: Several lines in MetadataCheck exceed the 120-character
limit; split long statements and extract intermediate variables to shorten
lines. For the configuration reads (the assignment to maxErrorsToShow and the
other getIntProperty/getBooleanProperty calls), break the call across multiple
lines or assign configurationService.getIntProperty/getBooleanProperty(...) to a
clearly named local variable before using it so each line stays under 120 chars
(refer to the variable maxErrorsToShow and the other config variables in that
block). For the long Curator.curate(...) expression, pull each chained service
factory call into its own local variable (e.g., obtain the service factory, then
the curator, then call curate) so the final Curator.curate(...) invocation is
short; ensure string literals are wrapped or concatenated across lines if
needed. Keep identifiers unchanged and only restructure into multiple shorter
statements to comply with the 120-character rule.
- Around line 73-74: Add a Javadoc comment for the public method run in class
MetadataCheck: document the parameter ReportInfo ri (what it represents),
describe the returned String (what the report contains or format), and note any
exceptions or error conditions the method may throw or return values used to
indicate failure; place the Javadoc immediately above the public String
run(ReportInfo ri) declaration so it follows project guidelines for public
methods.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 266-267: The test in HealthReportIT mutates global config via
configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8)
and
configurationService.setProperty("healthcheck.metadata.error-dispersion-quota",
1) without restoring originals; capture the original values (e.g., using
configurationService.getProperty or getPropertyAsType) before setting, then
ensure they are restored in a finally block or an `@After` method so other tests
aren’t polluted; update the test method (or add a tearDown/@After method) to
restore the saved originals (handling null/missing keys appropriately) after the
assertions complete.

---

Nitpick comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 461-465: The constructor StoredMessagesInfo() sets
highestFrequency to Integer.MAX_VALUE as a sentinel meaning "not yet
calculated", which is non-obvious; update the constructor (StoredMessagesInfo)
to add a short comment next to the highestFrequency initialization explaining
that Integer.MAX_VALUE is used intentionally so the replacement logic in the
class will trigger recalculation on first use (i.e., it signals "uninitialized"
and ensures the algorithm that compares/replaces frequencies will compute the
real highest frequency when needed).
- Around line 212-224: The getPatterns method currently builds
validationMessages by streaming over the ArrayNode via StreamSupport; simplify
this by iterating the ArrayNode directly (it implements Iterable<JsonNode>) —
inside getPatterns, replace StreamSupport.stream(patterns.spliterator(),
false).forEach(...) with a direct for-each loop or patterns.forEach(node -> ...)
that calls validationMessages.add(node.asText()), keeping the rest of the logic
(validationPatterns.put(validationType, validationMessages)) unchanged to
preserve behavior.
- Around line 21-31: The code mixes Jackson and org.json; standardize on Jackson
in the MetadataCheck class by replacing all org.json JSONObject/JSONArray usages
(the places constructing/reporting JSON output) with Jackson types
(ObjectNode/ArrayNode) created from the existing ObjectMapper, update the code
paths that currently call new JSONObject()/new JSONArray() to build
ObjectNode/ArrayNode trees, replace any JSONObject.toString() or similar with
ObjectMapper.writeValueAsString(), and remove org.json imports so only Jackson
imports remain (keep existing ObjectMapper/JsonNode/ArrayNode usage for parsing
and output).
- Around line 65-71: The static initializer in MetadataCheck currently calls
loadPatterns() and throws a RuntimeException on IOException which causes
class-loading failures; change this to lazy-load or validate patterns inside the
run() method instead: remove (or guard) the static loadPatterns() call and
ensure loadPatterns() is invoked from run() (or another startup-validation
method) with proper try/catch that logs a clear, descriptive error and returns a
sensible health-check result instead of letting initialization fail; reference
MetadataCheck, its static initializer, loadPatterns(), and run() when making the
change.
- Line 407: The assignment to messageKeyWithHighestFrequency unnecessarily wraps
getMessageWithHighestCount(storedMessages) with Objects.requireNonNull; remove
the requireNonNull wrapper and assign the result directly since
getMessageWithHighestCount uses orElseThrow() and cannot return null—update the
line that assigns messageKeyWithHighestFrequency to call
getMessageWithHighestCount(storedMessages) directly (referencing the variable
messageKeyWithHighestFrequency and the method getMessageWithHighestCount).
- Around line 335-340: The code in MetadataCheck uses variables like
errorPatterns and a local variable pattern but implements substring logic with
startsWith/endsWith/contains, which is misleading; rename errorPatterns to
messageFilters (and the local variable pattern to filter) and update inline
comments/docstrings to describe "substring filters" or "message rules"
(prefix/suffix/contains) instead of "patterns"; update all references to
errorPatterns, pattern, and any related method/field names in MetadataCheck to
the new names so maintainers understand this is simple substring matching
(alternatively, if you want true regex support, replace the
startsWith/endsWith/contains logic with Pattern.compile(filter) and
Matcher.matches() and update docs accordingly).

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Line 370: The helper method findLastReportResult currently has package-private
visibility; change its declaration to include the private modifier so it becomes
private ReportResult findLastReportResult(List<ReportResult> reportResults).
Update the method signature in HealthReportIT (method name:
findLastReportResult) to be explicitly private and ensure no external references
rely on package visibility.
- Line 375: The helper method findCheckByName in HealthReportIT is declared with
default (package-private) visibility; make it explicitly private to restrict
access to this test class only by adding the private access modifier to the
method declaration (i.e., change "JsonNode findCheckByName(...)" to "private
JsonNode findCheckByName(...)" within the HealthReportIT class).
- Line 256: Add a Javadoc comment to the public test method in HealthReportIT
that is annotated with `@Test` (the test validating report size restrictions via
configuration). Place a brief Javadoc immediately above the method (above the
`@Test` annotation) describing the test purpose, e.g., that it verifies report
size limits are enforced based on configuration and what behavior is asserted;
reference the test’s name in the description if present to make intent clear.
Ensure the Javadoc follows project style (one-line summary and optionally a
short sentence) and is applied to the public test method in class
HealthReportIT.
- Around line 209-214: The assertions in HealthReportIT are brittle due to
hardcoded spacing; update the containsString checks on messages.get(0) to use
flexible regex or numeric extraction instead of " ".repeat(15). For each label
(e.g., "dc.relation issues", "dc.title issues", "dc.type issues", "Error count
total", "dc.subject issues", "Warning count total") replace the containsString
assertions with a regex-based assertion (e.g., pattern "dc\\.relation
issues:\\s+2") or extract the numeric suffix via a Pattern/Matcher and assert
the parsed integer equals the expected count; keep the checks against
messages.get(0) and the same labels (MetadataCheck-related labels) so only
spacing enforcement is removed.
- Line 154: Add a Javadoc comment to the public test method in the
HealthReportIT class (the method annotated with `@Test` at the shown location);
the comment should be a brief one-line description stating what the test
validates (e.g., expected behavior or assertion being checked) and can omit
`@param/`@return since it’s a test method; place the Javadoc immediately above the
method declaration so it documents that specific test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c06009b-3f0d-4601-8de5-14c743daa6dd

📥 Commits

Reviewing files that changed from the base of the PR and between 82085cb and 24d3dbb.

📒 Files selected for processing (5)
  • dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
  • dspace-api/src/main/resources/metadata-check-patterns.json
  • dspace-api/src/main/resources/report-diff-fields.json
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
  • dspace/config/modules/healthcheck.cfg

Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (1)

361-362: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Configuration cleanup fix is incomplete—restore original values in try/finally.

While the cleanup at the end of the test addresses the normal execution path, two issues remain:

  1. Not protected by try/finally: If an assertion or exception occurs before line 361, the configuration properties remain modified, polluting subsequent tests.
  2. Setting to null vs. restoring original values: The properties are set to null rather than restored to their original values. This may not properly restore defaults and could interfere with other tests that expect specific default values (typically max-errors-to-show=100 and error-dispersion-quota=5).
🔒 Recommended fix: capture original values and restore in finally block
 public void testMetadataCheckWithRestrictedReportSize() throws Exception {
-    // set max-errors-to-show to 8 and error-dispersion-quota to 1,
-    // This test has 14 errors in total, but the report will contain only 8 error messages.
-    // The errors with low frequency will be prioritized.
-    // The error-dispersion-quota set to 1 means that the number of errors shown
-    // for each error will be almost the same, in this case maximally 2 errors for each error type
     context.turnOffAuthorisationSystem();
 
     ConfigurationService configurationService = DSpaceServicesFactory.getInstance().getConfigurationService();
-    configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8);
-    configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", 1);
+    int originalMaxErrors = configurationService.getIntProperty("healthcheck.metadata.max-errors-to-show", 100);
+    int originalQuota = configurationService.getIntProperty("healthcheck.metadata.error-dispersion-quota", 5);
+    
+    try {
+        // set max-errors-to-show to 8 and error-dispersion-quota to 1,
+        // This test has 14 errors in total, but the report will contain only 8 error messages.
+        // The errors with low frequency will be prioritized.
+        // The error-dispersion-quota set to 1 means that the number of errors shown
+        // for each error will be almost the same, in this case maximally 2 errors for each error type
+        configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8);
+        configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", 1);
 
-    Community community = CommunityBuilder.createCommunity(context)
+        Community community = CommunityBuilder.createCommunity(context)
                 .withName("Community")
                 .build();
 
-    Collection collection = CollectionBuilder.createCollection(context, community)
+        Collection collection = CollectionBuilder.createCollection(context, community)
                 .withName("Collection")
                 .withSubmitterGroup(eperson)
                 .build();
 
-    Item item1 = ItemBuilder.createItem(context, collection)
+        Item item1 = ItemBuilder.createItem(context, collection)
                 .withTitle("Test item 1")
                 .withType("corpus")
                 .withSubject("Test subject")
                 .withMetadata("local", "branding", null, "Community")
                 .build();
 
-    Item item2 = ItemBuilder.createItem(context, collection)
+        Item item2 = ItemBuilder.createItem(context, collection)
                 .withTitle("Test item 2")
                 .withType("toolService")
                 .withSubject("Test subject")
                 .withMetadata("local", "branding", null, "Community")
                 .withMetadata("dc", "relation", "replaces", findItemUri(item1))
                 .build();
 
-    ItemBuilder.createItem(context, collection)
+        ItemBuilder.createItem(context, collection)
                 .withTitle("Test item 3")
                 .withType("toolService")
                 .withSubject("Test subject")
                 .withMetadata("local", "branding", null, "Community")
                 .withMetadata("dc", "relation", "isreplacedby", findItemUri(item2))
                 .build();
 
-    // create 4 items with missing title
-    for (int i = 0; i < 4; i++) {
-        ItemBuilder.createItem(context, collection)
-                .withType("toolService")
-                .withSubject("Test subject")
-                .withMetadata("local", "branding", null, "Community")
-                .build();
-    }
+        // create 4 items with missing title
+        for (int i = 0; i < 4; i++) {
+            ItemBuilder.createItem(context, collection)
+                    .withType("toolService")
+                    .withSubject("Test subject")
+                    .withMetadata("local", "branding", null, "Community")
+                    .build();
+        }
 
-    // create 4 items with missing type
-    for (int i = 4; i < 8; i++) {
-        ItemBuilder.createItem(context, collection)
-                .withTitle("Test Item " + i)
-                .withSubject("Test subject")
-                .withMetadata("local", "branding", null, "Community")
-                .build();
-    }
+        // create 4 items with missing type
+        for (int i = 4; i < 8; i++) {
+            ItemBuilder.createItem(context, collection)
+                    .withTitle("Test Item " + i)
+                    .withSubject("Test subject")
+                    .withMetadata("local", "branding", null, "Community")
+                    .build();
+        }
 
-    // create 4 items with duplicate type
-    for (int i = 8; i < 12; i++) {
-        ItemBuilder.createItem(context, collection)
-                .withTitle("Test Item " + i)
-                .withType("toolService")
-                .withType("corpus")
-                .withSubject("Test subject")
-                .withMetadata("local", "branding", null, "Community")
-                .build();
-    }
+        // create 4 items with duplicate type
+        for (int i = 8; i < 12; i++) {
+            ItemBuilder.createItem(context, collection)
+                    .withTitle("Test Item " + i)
+                    .withType("toolService")
+                    .withType("corpus")
+                    .withSubject("Test subject")
+                    .withMetadata("local", "branding", null, "Community")
+                    .build();
+        }
 
-    TestDSpaceRunnableHandler testDSpaceRunnableHandler = new TestDSpaceRunnableHandler();
+        TestDSpaceRunnableHandler testDSpaceRunnableHandler = new TestDSpaceRunnableHandler();
 
-    // with "health-report -c 5", only Metadata check is running
-    String[] args = new String[]{"health-report", "-c", "5"};
-    ScriptLauncher.handleScript(args, ScriptLauncher.getConfig(kernelImpl), testDSpaceRunnableHandler, kernelImpl);
+        // with "health-report -c 5", only Metadata check is running
+        String[] args = new String[]{"health-report", "-c", "5"};
+        ScriptLauncher.handleScript(args, ScriptLauncher.getConfig(kernelImpl), testDSpaceRunnableHandler, kernelImpl);
 
-    assertThat(testDSpaceRunnableHandler.getErrorMessages(), empty());
-    List<String> messages = testDSpaceRunnableHandler.getInfoMessages();
+        assertThat(testDSpaceRunnableHandler.getErrorMessages(), empty());
+        List<String> messages = testDSpaceRunnableHandler.getInfoMessages();
 
-    assertThat(messages, hasSize(1));
-    assertThat(messages.get(0), containsString("dc.relation issues:  " + " ".repeat(15) + "2"));
-    assertThat(messages.get(0), containsString("dc.title issues:     " + " ".repeat(15) + "4"));
-    assertThat(messages.get(0), containsString("dc.type issues:      " + " ".repeat(15) + "4"));
-    assertThat(messages.get(0), containsString("duplicate value issues:" + " ".repeat(13) + "4"));
-    assertThat(messages.get(0), containsString("Error count total:   " + " ".repeat(14) + "14"));
+        assertThat(messages, hasSize(1));
+        assertThat(messages.get(0), containsString("dc.relation issues:  " + " ".repeat(15) + "2"));
+        assertThat(messages.get(0), containsString("dc.title issues:     " + " ".repeat(15) + "4"));
+        assertThat(messages.get(0), containsString("dc.type issues:      " + " ".repeat(15) + "4"));
+        assertThat(messages.get(0), containsString("duplicate value issues:" + " ".repeat(13) + "4"));
+        assertThat(messages.get(0), containsString("Error count total:   " + " ".repeat(14) + "14"));
 
-    assertThat(messages.get(0), containsString("Errors:"));
+        assertThat(messages.get(0), containsString("Errors:"));
 
-    // check if dc.type error is present exactly 2 times
-    assertThat(StringUtils.countMatches(messages.get(0), "Does not have dc.type metadata"), is(2));
-    // check if dc.title error is present exactly 2 times
-    assertThat(StringUtils.countMatches(messages.get(0), "Item has no dc.title metadata"), is(2));
-    // check id duplicate value error is present exactly 2 times
-    assertThat(StringUtils.countMatches(messages.get(0), "value [dc.type] is present multiple times"), is(2));
+        // check if dc.type error is present exactly 2 times
+        assertThat(StringUtils.countMatches(messages.get(0), "Does not have dc.type metadata"), is(2));
+        // check if dc.title error is present exactly 2 times
+        assertThat(StringUtils.countMatches(messages.get(0), "Item has no dc.title metadata"), is(2));
+        // check id duplicate value error is present exactly 2 times
+        assertThat(StringUtils.countMatches(messages.get(0), "value [dc.type] is present multiple times"), is(2));
 
-    // check if all dc.relation errors are present
-    assertThat(StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.replaces"), is(1));
-    assertThat(
-            StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.isreplacedby"), is(1));
-    assertThat(messages.get(0), containsString("and more..."));
-
-    configurationService.setProperty("healthcheck.metadata.max-errors-to-show", null);
-    configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", null);
+        // check if all dc.relation errors are present
+        assertThat(StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.replaces"), is(1));
+        assertThat(
+                StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.isreplacedby"), is(1));
+        assertThat(messages.get(0), containsString("and more..."));
+    } finally {
+        configurationService.setProperty("healthcheck.metadata.max-errors-to-show", originalMaxErrors);
+        configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", originalQuota);
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
361 - 362, Capture the original values of the two properties before modifying
them (use configurationService.getProperty or equivalent to read
"healthcheck.metadata.max-errors-to-show" and
"healthcheck.metadata.error-dispersion-quota"), then wrap the test logic that
mutates them in a try/finally and restore the saved originals in the finally
block (call configurationService.setProperty with the original values; if
originals were null/absent, remove/reset appropriately). Update references
around configurationService.setProperty(...) in HealthReportIT so the properties
are restored to their prior values regardless of test failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 361-362: Capture the original values of the two properties before
modifying them (use configurationService.getProperty or equivalent to read
"healthcheck.metadata.max-errors-to-show" and
"healthcheck.metadata.error-dispersion-quota"), then wrap the test logic that
mutates them in a try/finally and restore the saved originals in the finally
block (call configurationService.setProperty with the original values; if
originals were null/absent, remove/reset appropriately). Update references
around configurationService.setProperty(...) in HealthReportIT so the properties
are restored to their prior values regardless of test failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebebc4a2-be81-4b68-af63-72821b98391d

📥 Commits

Reviewing files that changed from the base of the PR and between 24d3dbb and 16eb768.

📒 Files selected for processing (1)
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java

@milanmajchrak milanmajchrak requested a review from Kasinhou June 24, 2026 10:13

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (1)

511-513: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Handle missing checks defensively in findCheckByName.

If checks is absent/non-array, the current loop can fail with NPE instead of returning null.

Suggested fix
     JsonNode findCheckByName(JsonNode root, String checkName) {
-        for (JsonNode check : root.get("checks")) {
+        JsonNode checksNode = root.get("checks");
+        if (checksNode == null || !checksNode.isArray()) {
+            return null;
+        }
+        for (JsonNode check : checksNode) {
             if (check.get("name").asText().equals(checkName)) {
                 return check;
             }
         }
         return null;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
511 - 513, Handle missing or non-array checks defensively in findCheckByName:
the loop currently assumes root.get("checks") is always present and iterable,
which can throw an NPE. Update the JsonNode traversal in findCheckByName to
first verify that the checks node exists and is an array before iterating, and
return null when it is absent, null, or not an array.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 268-269: Add Javadoc to the public test method
testStoredArgsContainAllCheckOptions in HealthReportIT so it complies with the
guideline requiring documentation for all public methods; place the comment
directly above the method and keep it concise, describing that the test verifies
the stored arguments include all check options.
- Around line 276-283: Avoid indexing allReports before confirming it has at
least one element, since ReportResultService.findAll(context) may return an
empty list and allReports.get(allReports.size() - 1) would throw
IndexOutOfBoundsException. In HealthReportIT, add an explicit assertion that
allReports is not empty before sorting and selecting latest, and keep the
existing latest report checks against ReportResult#getArgs and
handler.getErrorMessages() after that guard.

---

Outside diff comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 511-513: Handle missing or non-array checks defensively in
findCheckByName: the loop currently assumes root.get("checks") is always present
and iterable, which can throw an NPE. Update the JsonNode traversal in
findCheckByName to first verify that the checks node exists and is an array
before iterating, and return null when it is absent, null, or not an array.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 50c0c733-195d-416a-beb3-e311b8f8f625

📥 Commits

Reviewing files that changed from the base of the PR and between 16eb768 and b2c1f41.

📒 Files selected for processing (2)
  • dspace-api/src/main/resources/report-diff-fields.json
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (1)

511-513: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Handle missing checks defensively in findCheckByName.

If checks is absent/non-array, the current loop can fail with NPE instead of returning null.

Suggested fix
     JsonNode findCheckByName(JsonNode root, String checkName) {
-        for (JsonNode check : root.get("checks")) {
+        JsonNode checksNode = root.get("checks");
+        if (checksNode == null || !checksNode.isArray()) {
+            return null;
+        }
+        for (JsonNode check : checksNode) {
             if (check.get("name").asText().equals(checkName)) {
                 return check;
             }
         }
         return null;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
511 - 513, Handle missing or non-array checks defensively in findCheckByName:
the loop currently assumes root.get("checks") is always present and iterable,
which can throw an NPE. Update the JsonNode traversal in findCheckByName to
first verify that the checks node exists and is an array before iterating, and
return null when it is absent, null, or not an array.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 268-269: Add Javadoc to the public test method
testStoredArgsContainAllCheckOptions in HealthReportIT so it complies with the
guideline requiring documentation for all public methods; place the comment
directly above the method and keep it concise, describing that the test verifies
the stored arguments include all check options.
- Around line 276-283: Avoid indexing allReports before confirming it has at
least one element, since ReportResultService.findAll(context) may return an
empty list and allReports.get(allReports.size() - 1) would throw
IndexOutOfBoundsException. In HealthReportIT, add an explicit assertion that
allReports is not empty before sorting and selecting latest, and keep the
existing latest report checks against ReportResult#getArgs and
handler.getErrorMessages() after that guard.

---

Outside diff comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 511-513: Handle missing or non-array checks defensively in
findCheckByName: the loop currently assumes root.get("checks") is always present
and iterable, which can throw an NPE. Update the JsonNode traversal in
findCheckByName to first verify that the checks node exists and is an array
before iterating, and return null when it is absent, null, or not an array.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 50c0c733-195d-416a-beb3-e311b8f8f625

📥 Commits

Reviewing files that changed from the base of the PR and between 16eb768 and b2c1f41.

📒 Files selected for processing (2)
  • dspace-api/src/main/resources/report-diff-fields.json
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
🛑 Comments failed to post (2)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (2)

268-269: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add Javadoc for this public test method.

testStoredArgsContainAllCheckOptions is public and currently undocumented.

As per coding guidelines: "Provide Javadoc comments for all public classes and methods."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
268 - 269, Add Javadoc to the public test method
testStoredArgsContainAllCheckOptions in HealthReportIT so it complies with the
guideline requiring documentation for all public methods; place the comment
directly above the method and keep it concise, describing that the test verifies
the stored arguments include all check options.

Source: Coding guidelines


276-283: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Avoid indexing allReports before proving it is non-empty.

allReports.get(allReports.size() - 1) can throw IndexOutOfBoundsException and hide the real test failure cause.

Suggested fix
-        context.reloadEntity(eperson);
-        List<ReportResult> allReports = reportResultService.findAll(context);
-        // findAll() does not guarantee ordering; sort by lastModified so the newest report is last.
-        allReports.sort(java.util.Comparator.comparing(ReportResult::getLastModified));
-        ReportResult latest = allReports.get(allReports.size() - 1);
-
         assertThat(handler.getErrorMessages(), empty());
+        context.reloadEntity(eperson);
+        List<ReportResult> allReports = reportResultService.findAll(context);
+        assertThat("Expected at least one report result", allReports.isEmpty(), org.hamcrest.Matchers.is(false));
+        ReportResult latest = findLastReportResult(allReports);
         assertThat(latest.getArgs(), containsString("-c: 2"));
         assertThat(latest.getArgs(), containsString("-c: 3"));
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
276 - 283, Avoid indexing allReports before confirming it has at least one
element, since ReportResultService.findAll(context) may return an empty list and
allReports.get(allReports.size() - 1) would throw IndexOutOfBoundsException. In
HealthReportIT, add an explicit assertion that allReports is not empty before
sorting and selecting latest, and keep the existing latest report checks against
ReportResult#getArgs and handler.getErrorMessages() after that guard.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports the “metadata health check” feature to dtq-dev by introducing a new MetadataCheck healthcheck plugin which runs the existing metadataqa curation task repository-wide and reports aggregated error/warning counts in both text and JSON (for persistence and diffing).

Changes:

  • Enable a new “Metadata check” in the healthcheck configuration and plugin registry.
  • Add MetadataCheck implementation + a JSON resource defining message classification patterns.
  • Extend health report diff field mappings and add integration tests covering JSON output and truncated/dispersion behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
dspace/config/modules/healthcheck.cfg Registers and enables the new “Metadata check” in the healthcheck checks list.
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java Implements the new healthcheck which runs metadataqa, aggregates counts, and produces text/JSON reports with truncation/dispersion logic.
dspace-api/src/main/resources/metadata-check-patterns.json Defines patterns used to categorize metadataqa messages into error/warning types.
dspace-api/src/main/resources/report-diff-fields.json Adds metadata error/warning totals to report diff field mappings and ordering.
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java Adds integration tests verifying metadata check output, JSON persistence, and restricted report-size behavior.

Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java Outdated
Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
Comment thread dspace-api/src/main/resources/metadata-check-patterns.json

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 391-392: The metadata summary logic in MetadataCheck should guard
against non-positive report limits before attempting replacement. Update the
path around getMessageWithHighestCount() / replaceMessage() so that when
healthcheck.metadata.max-errors-to-show or max-warnings-to-show is 0 or
negative, it skips replacement and handles the first message directly instead of
querying an empty storedMessages map. Use the existing MetadataCheck methods and
storedMessages flow to keep the fix localized.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3a26365-fa8e-4fe1-89c9-9a115c49e6b4

📥 Commits

Reviewing files that changed from the base of the PR and between b2c1f41 and 6008d6f.

📒 Files selected for processing (3)
  • dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
  • dspace-api/src/main/resources/metadata-check-patterns.json
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
✅ Files skipped from review due to trivial changes (1)
  • dspace-api/src/main/resources/metadata-check-patterns.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java

Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
@milanmajchrak milanmajchrak merged commit 6e0ed0c into dataquest-dev:dtq-dev Jun 26, 2026
17 checks passed
milanmajchrak added a commit that referenced this pull request Jul 2, 2026
* UFAL/Fixed failing integration test (ufal#1332) (#1249)

* Add debug messages to fauling test

(cherry picked from commit 4cc3694)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* [Port to dtq-dev] Fix OpenAIRE integration: null handling and HTTP client lifecycle (#1248)

* Fix OpenAIRE integration: null handling and HTTP client lifecycle (ufal#1330)

* Add test for OpenAIRE connector

* Initial plan

* Add null check for OpenAIRE response to prevent NullPointerException

Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>

* Fix HTTP client lifecycle to prevent premature connection closure

Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>

* Keep the try with resources but copy the response

into an in memory stream and return that

* license:check

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
(cherry picked from commit 02984db)

* Handle NumberFormatException in OpenAIREFundingDataProvider.getNumberOfResults and use explicit UTF-8 charset in OpenAIRERestConnectorTest

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>

* UFAL/Added a comment to do not forget mounting the file which is changed via ocnfiguration feature (#1247)

* UFAL/Issue 1315: Store file preview to database when file preview is created on Item Page load. (ufal#1316) (#1241)

* Issue ufal/clarin-dspace1315: Store file preview to database when file preview is created on item page load

* assert text improvement

* PR comments: commit context only when any of the file preview is successfully created

* change variable name

(cherry picked from commit aab626b)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/Issue 1313: fixed error when file preview is not generated for bitstream with store_number = 77 (ufal#1318) (#1240)

* Issue ufal#1313: fixed error when file preview is not generated for bitstream with store number = 77

* resolve MR comments

(cherry picked from commit 04d64f7)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/Nw version metadata issues (#1236)

* Issue ufal#1266: dc.date.available and dc.relation.replaces metadata not cleared properly (ufal#1307)

* Issue ufal#1266: dc.date.available and dc.relation.replaces metadata not cleaned properly in new item version

* resolve MR comments - update ignoredMetadataFields in versioning-service.xml

* update ClarinVersionedHandleIdentifierProviderIT test to check dc.identifier.uri metadata for new version

(cherry picked from commit 7ffaf9a)

* Issue 1319: do not copy dc.identifier.doi metadata when new item version is created

(cherry picked from commit 1b7ed17)

---------

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/Fix: add bitstream download-by-handle endpoint for curl instructions (#1252)

* fix: add bitstream download-by-handle endpoint for curl instructions

Adds GET /api/core/bitstreams/handle/{prefix}/{suffix}/{filename} endpoint
that directly serves bitstream content by item handle and filename.

This resolves the issue where curl download instructions generated by the
UI produced URLs pointing to non-existent backend endpoints, resulting in
404 errors when users attempted to download files via command line.

The new endpoint resolves the handle to an Item, finds the bitstream by
exact filename in ORIGINAL bundles, and streams the raw content with
correct Content-Type and Content-Disposition headers.

Refs: dataquest-dev/dspace-angular#1210

* Fixed compliing errors

* Small refactoring - use constants and removed unnecessary changes

* added comments, return 404 status instead of 402

* unauthorized instead of forbidden

* fix: use RFC 5987 Content-Disposition for non-ASCII filenames

curl -J on Windows cannot create files with non-ASCII characters (e.g.
diacritics like e/a) from a raw UTF-8 Content-Disposition filename header.

Uses filename*=UTF-8''percent-encoded-name (RFC 5987/6266) which curl
properly decodes. Also includes an ASCII fallback in filename param.

* fix: move context.complete() after streaming to prevent truncated downloads

context.complete() was called before bitstreamService.retrieve(), closing
the DB connection and causing 'end of response with X bytes missing' errors.
Now context.complete() is called only after the full content has been streamed.
For S3 redirect and HEAD paths, context.complete() remains before return
since no streaming is needed.

* fix: use real UTF-8 filename in Content-Disposition instead of ASCII fallback

The filename parameter now contains the original name (with diacritics like
e/a) instead of replacing non-ASCII chars with underscores. Characters in
the ISO-8859-1 range are transmitted correctly by Tomcat and understood by
curl on Western/Central-European systems. The filename* parameter still
provides RFC 5987 percent-encoded UTF-8 for modern clients (curl 7.56+).

* fix: revert to ASCII fallback in Content-Disposition, add edge-case tests

Content-Disposition filename parameter now uses ASCII fallback (non-ASCII
replaced with underscore) per RFC 6266. Modern clients use filename* (RFC
5987) which has the full UTF-8 name. The curl command no longer relies on
Content-Disposition at all (uses -o instead of -OJ).

New integration tests for edge cases:
- Multiple dots in filename (archive.v2.1.tar.gz)
- Double quotes in filename (escaped in Content-Disposition)
- CJK characters (beyond ISO-8859-1)
- Same filename in ORIGINAL and TEXT bundles (only ORIGINAL served)

* fix: resolve compilation errors and fix IT test assertions

- Remove duplicate HttpStatus import (apache vs spring)
- Add missing MediaType import (spring)
- Fix Content-Type assertion to include charset=UTF-8
- Use URI.create() for pre-encoded URLs in tests to prevent
  double-encoding (%25) rejection by StrictHttpFirewall

All 15 integration tests pass.

* test: add complex filename test (diacritics, plus, hash, unmatched paren)

New IT test for filename 'Media (+)#9) ano' verifying correct URL decoding,
Content-Disposition encoding, and content delivery. 16/16 tests pass.

* fix authorization, comments, tests

* fix: change expected status from 401 to 403 for authenticated non-admin user

The test downloadBitstreamByHandleUnauthorizedForNonAdmin uses getClient(token)
which means the user IS authenticated. The controller correctly returns 403
(Forbidden) for authenticated users without access, not 401 (Unauthorized).
401 is only for anonymous/unauthenticated requests.

---------

Co-authored-by: Paurikova2 <michaela.paurikova@dataquest.sk>

* Reduce warn logs noise (#1268)

* Log 404 responses at DEBUG instead of WARN to reduce log noise

* Log 404 responses at DEBUG instead of WARN (configurable via logging.server.debug-404)

* Skip stack trace extraction for suppressed 404 debug logs

* Replace custom debug-404 property with dedicated Log4j2 logger (org.dspace.app.rest.NotFound)

* Suppress 404 warn logs via dedicated Log4j2 logger (org.dspace.app.rest.NotFound)

* Turn off that warn logs for the dspace.log

* Updated log name to be more unique

* The row lenght was updated to be less than 120 chars (#1274)

* Reduce noisy WARN logs to DEBUG level (#1269)

Changed two frequently occurring WARN log messages to DEBUG level:
- Context.java: 'Initializing a context while an active transaction exists'
- ClarinItemServiceImpl.java: 'Cannot update item dates metadata because the approximate date is empty'

* Added oai bundle exclude feature

* Updated docs

* Fix OAI bundle exclusion docs and isolate OAIPMHBundleExposureIT config state

* Fix indentation in OAIPMHBundleExposureIT field declaration

* Updated docs

* fix failing Curation tests (ufal#1353) (#1304)

* fix RequiredMetadataIT failure

* different fix for failing curator tests

* change response to see last bitstream format results

* cleaning custom bitstream format creation in PreviewContentServiceImplIT test

* add debug messages

* IIIFCacheEventConsumer: don't consume events when event subject is null

(cherry picked from commit 50db8cd)

**NOTE**: This is without the `dspace-api/src/test/java/org/dspace/curate/ItemMetadataQACheckerIT.java` change
will add that one into #1237

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/Remove 'clariah' submission process (#1305)

Removed the 'clariah' submission process it is not used

(cherry picked from commit cae96d5)

* UFAL/Issue 1349: admin user is not allowed to delete himself/herself (ufal#1350) (#1306)

* Issue 1349: admin user is not allowed to delete himself/herself

* improve the fix: test context.getCurrentUser() for null

* throw IllegalStateException rather than AuthorizeException, and allow client to see the error message

(cherry picked from commit b913627)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* Issue 1354: add dc.relation.isreplacedby only when item is installed (ufal#1356) (#1308)

* Issue 1354: add dc.relation.isreplacedby only when item is installed

* set dc.relation.replaces on new item creation

* fixed JavaDoc

* fixed failing tests

* test if dc.relation.isreplacedby metadata are only added when new item version is installed

* use Context#reloadEntity rather than calling find method

* removing unused field

* also removing the now unused import

---------


(cherry picked from commit 3e204e5)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/Issue 1339: fixed NPE when hidden item metadata are checked for the item with deleted submitter (ufal#1344) (#1290)

* Issue 1339: fixed NPE when hidden item metadata are checked for the item with deleted submitter

(cherry picked from commit 64fda50)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* Issue 1321: disable File preview for files where user has no Bitstream READ permission (ufal#1327) (#1280)

* Issue 1321: disable File preview for files where the user has no Bitstream READ permission

* alow file preview in case only the License agreement is needed

* don't allow to create file preview for non-authorized user, nor for item that requires license confirmation

* fixed failing FilePreviewIT test. Now only the user with file READ permission can generate file preview

* add more tests for HTML file preview

* add test for HTML File preview

* improve warning messages

* extend test to see if non admin user can see already generated file preview

---------


(cherry picked from commit 50d7bbc)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/issue 1324: curation task - implement 3 types of reporters to allow proper report writing to selected destination (ufal#1326) (#1264)

* issue 1324: implement 3 types of reporters to allow proper writing to file or to console

* don't throw exception when not needed

* no AbstractUnitTest required - AbstractDSpaceTest is sufficient

* improve append(str) method in reporters, by using StringUtils.chomp() method

(cherry picked from commit 6f19d1f)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* Issue ufal#1292 link/unlink items with version relationship (ufal#1304) (#1253)

* Issue 1292: script to allow link two items into version relationship

* implement link and unlink actions

* ItemVersionLinkerIT test

* improve ItemVersionLinkerIT, fix ScriptRestRepositoryIT

* improve test to be more realistic

* improve options description

* better call of itemService.clearMetadata()

* clear correctly dc.relation.replaces and dc.relation.isreplacedby

* use dc.identifier.uri metadata value rather than item.getHandle() to set dc.relation.replaces and dc.relation.isreplacedby

* code-cleanup

* add also as a cli script

* Use Item.ANY instead of null

tested with the production db dump on the items mentions in the issue
(11234/1-5537). It was returning:
```
The script has started
Item '11234/1-5537' has no handle assigned.
```

because it's dc.identifier.uri.*

---------



(cherry picked from commit 903b35a)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/[Port to dtq-dev] Port ItemMetadataQAChecker curation task from v5 to v7 (#1237)

* Port ItemMetadataQAChecker curation task from v5 to v7 (ufal#1312)

* Add ItemMetadataQAChecker curation task with tests

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
(cherry picked from commit d5517be)

* Issue ufal#1310 curation task to check relation metadata (ufal#1325)

* issue 1310: check versioning releationship for items with relation metadata

* improve implementation + test

* More readable, I think.

* update logging

add the handle of the referenced item where possible

* a test case to cover "no related item"

* improve the failure message

---------

Co-authored-by: Ondřej Košarko <ko_ok@centrum.cz>
(cherry picked from commit c97f406)

* fix failing Curation tests (ufal#1353)

* fix RequiredMetadataIT failure

* different fix for failing curator tests

* change response to see last bitstream format results

* cleaning custom bitstream format creation in PreviewContentServiceImplIT test

* resolve MR comments

* add debug messages

* more debug messages

* IIIFCacheEventConsumer: don't consume events when event subject is null

(cherry picked from commit 50db8cd)

**NOTE**: this is just `dspace-api/src/test/java/org/dspace/curate/ItemMetadataQACheckerIT.java` the rest is in #1304

* removed empty line

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/Separate CLARIN license payload from sections.license (#1319)

* fix(submission): separate CLARIN license payload from sections.license

* fix(clarin-license): rename step path to /select, rename DataClarinLicense to ClarinDataLicense, and clean up comments

* fix(clarin-license): align DTO name with Rest suffix convention

* fix(submission): align CLARIN section DTO naming and apply Copilot review fixes

* Align CLARIN license patch semantics and tighten section path handling

* Fix checkstyle issue

* Stabilize unknown CLARIN license metadata assertion

* Added doc and checked null value

* Harden CLARIN license patch handling and logging

* UFAL/Fix clarin-license IT: expect 422 for empty value on select patch (#1323)

* Expect 422 for empty value on clarin-license select patch

* Treat blank value as clear on clarin-license section select

* UFAL/Fix refbox buttons (ufal#1367) (#1318)

* adding test and fixing the issue

fixes ufal#1366 there was a conflict between the
produces=application/json and response.setContentType("application/xml")

* Strengthen citations endpoint test assertions for ufal#1366 regression coverage

Agent-Logs-Url: https://github.com/ufal/clarin-dspace/sessions/e385ef9e-8186-4899-b811-fc82bbfa942b



---------



(cherry picked from commit 8400982)

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>

* Health report, report diff fixes (#1254)

* fix(health-report): fix CLI args, multi-check support, and report-diff comparison logic

* Fix ReportDiff setup and date validation

* fix failed integration test

* added tests, used -c 1 2 instead of -c 1 -c 2

* used UNLIMITED_VALUES unstead of MAX_VALUE

* improved doc

* used multilist for -c , removed unused method

* improved doc

* WIP updated health report and report diff

* Complete update of health-report and report-diff

* Sorting reports and updating tests, plus enable multiple -c in report-diff

* Improved docs, output and info and logic

* Improved comparision of reports w/o changes, or with only one report specified

* Updated tests

* Removed unused Report and refactor getChecks

* Address review comments: simplify null checks, deduplicate skipped-checks section, validate report IDs before defaulting

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Matus Kasak <matus.kasak@dataquest.sk>
Co-authored-by: Paurikova2 <michaela.paurikova@dataquest.sk>
Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

* Fix flaky tests in IT pipeline (#1321)

* Fixed integration tests because they use to fail sometimes

* test: stabilize flaky CI tests (Hibernate cleanup retry, Shibboleth auth sequence reset, ORCID assertion hardening)

* test: fix flaky ITs at the source (live ORCID, Shibboleth config-reload) + Hibernate CME diagnostics

ORCID CachingOrcidRestConnectorTest no longer hits the live ORCID sandbox:
search/getLabel/search_fail mock the HTTP layer (httpGet made protected) with a
canned expanded-search response, so they are deterministic instead of asserting
against fluctuating sandbox data.

Shibboleth WWW-Authenticate flakiness: add a test-only config-definition.xml with
config-reload=false. Runtime setProperty(...AuthenticationMethod...) overrides were
silently discarded whenever the auto-reload listener rebuilt the combined config
(restoring clarin-dspace.cfg's [Password, ClarinShib] default), intermittently
leaking 'password realm' into the header. Verified: with auto-reload off the override
survives; the explicit reloadConfig() reset in @after still works.

Hibernate ConcurrentModificationException in @after cleanup: the per-session JDBC
ResourceRegistry is not thread-safe, so the CME means two threads touch one Session.
Capture a full thread dump on CME (target/cme-dumps/) to identify the colliding
thread in CI; keep a resilient retry so an already-passed test isn't failed by this
teardown race. (Context.finalize() ruled out: sessions are thread-local.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: revert IT-env config-reload=false override

Disabling config auto-reload globally in the test environment broke
AuthorizeConfigIT.testReloadConfiguration, which deliberately verifies that
AuthorizeConfiguration picks up live changes written to local.cfg via the
auto-reload mechanism. Auto-reload is a tested feature here, so it must not be
disabled to work around the Shibboleth WWW-Authenticate flakiness.

The Shibboleth flakiness (runtime setProperty override discarded when the combined
config is rebuilt) needs a reload-safe fix in the auth test instead; tracked
separately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: make Shibboleth auth-sequence override reload-safe (fix WWW-Authenticate flakiness)

The flaky 'password realm' leak in AuthenticationRestControllerIT had this root cause:
configurationService.setProperty(plugin.sequence...AuthenticationMethod, ...) only
updates the in-memory view of the combined configuration. That view is discarded
whenever it is rebuilt, and the auto-reload listener rebuilds it as soon as any
reloadable cfg file's mtime changes mid-run (e.g. another test writing local.cfg).
When that rebuild lands between the override and the request, clarin-dspace.cfg's
default [PasswordAuthentication, ClarinShibAuthentication] returns and 'password realm'
leaks into the header. The previous clear-then-set helper did not help (it is
equivalent to a plain setProperty).

Fix: set the sequence via a JVM system property (highest-precedence override layer,
re-read on every rebuild) + reloadConfig(), and clear it in @after. This survives
auto-reload without disabling it (so AuthorizeConfigIT, which verifies auto-reload,
still passes).

Verified in the real /api/authn/status endpoint: an explicit reloadConfig() after a
setProperty override reproduces the leak, while the system-property approach keeps the
header Shibboleth-only across rebuilds. Full AuthenticationRestControllerIT (43 tests)
passes, and running it alongside ClarinAuthenticationRestControllerIT /
AnonymousAdditionalAuthorizationFilterIT confirms the property does not leak across classes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: add Hibernate concurrency monitor + CI upload to pinpoint @after CME

The intermittent ConcurrentModificationException in @after cleanup is a genuine
cross-thread data race on Hibernate's per-session, non-thread-safe JDBC
ResourceRegistry (xref): a second thread mutates the test thread's session while
it commits/rolls back. Verified against hibernate-core-5.6.15 sources that the
releaseResources forEach lambda never touches xref, so single-thread re-entrancy
is impossible (this disproves the earlier HHH-15116 single-thread theory). The
window is microseconds, so it does not reproduce locally even with deliberate
cross-thread session sharing; it only surfaces under CI load.

A live thread dump of a running IT JVM shows NO legitimate background thread ever
touches Hibernate (all are Solr/HTTP/Jetty/JVM). So the culprit is a transient
thread, and any non-test thread caught inside Hibernate JDBC/session code is by
definition the offender.

- HibernateConcurrencyMonitor: JVM-wide background sampler that records (de-duped)
  any non-test thread found inside org.hibernate.{resource.jdbc,engine.jdbc,
  internal.SessionImpl}; flushed to target/cme-dumps/ on CME and at JVM shutdown.
  Pure observer, never changes test behaviour.
- AbstractIntegrationTestWithDatabase: start the monitor and mark the JUnit thread
  in setUp; flush it alongside the existing thread dump on a captured CME.
- build.yml: always-upload **/target/cme-dumps/** (not gated on failure) so a
  successful cleanup retry no longer hides the diagnostic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: don't close iterate() Hibernate stream from a finalize() (root cause of flaky CME)

Root cause of the intermittent ConcurrentModificationException in @after integration-test
cleanup, identified via the HibernateConcurrencyMonitor CI dumps: the GC Finalizer thread,
running org.dspace.core.AbstractHibernateDAO$1.finalize(), closed the Hibernate Stream
returned by AbstractHibernateDAO.iterate(). Closing a stream closes its ScrollableResults,
which mutates the owning Session's per-session, non-thread-safe JDBC ResourceRegistry (xref)
- but on the Finalizer thread, concurrently with the thread that owns the session. When that
collided with the owning thread's commit/rollback (releaseResources -> xref.forEach), it threw
ConcurrentModificationException. The CI dumps showed this exact finalizer stack as the only
non-test thread inside Hibernate in dspace-api, and present in dspace-server-webapp too.

This was confirmed genuine cross-thread access (not the previously assumed single-thread/HHH
bug): verified against hibernate-core-5.6.15 sources that the releaseResources forEach lambda
never touches xref, so single-thread re-entrancy is impossible.

Fix: close the backing stream on the owning thread when iteration is exhausted, and remove the
finalize() override. An iterator abandoned before exhaustion is released safely when its
Context/Session is closed (releaseResources then runs on the owning thread).

Adds AbstractHibernateDAOIteratorIT to guard against reintroducing a stream-closing finalizer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix: remove broken Context.finalize() that leaked finalizer-thread sessions

Context.finalize() ran on the GC Finalizer thread and called
dbConnection.isTransActionAlive()/abort(), which resolve sessionFactory.getCurrentSession()
to a brand-new session bound to the Finalizer thread - never the (now-unreachable) thread
that opened the Context. So it could not roll back the Context's transaction anyway; it only
opened and leaked a throwaway Hibernate session on the Finalizer thread, and threw
IllegalStateException once the SessionFactory was closed (seen in the CI thread dumps used to
diagnose the flaky integration-test ConcurrentModificationException).

Abandoned Contexts are cleaned up safely when their owning thread's session ends; callers
already close Contexts via complete()/abort()/try-with-resources (Context is AutoCloseable).

Removes the now-redundant ContextTest.testFinalize (close()/abort() are covered by
testClose/testAbort/testAbort2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: remove flaky-CME diagnostic scaffolding and teardown retry (root cause fixed)

The intermittent @after ConcurrentModificationException is now fixed at its source
(AbstractHibernateDAO.iterate no longer closes its Hibernate stream from a finalizer; broken
Context.finalize() removed). The temporary diagnostics that pinpointed it are no longer needed:

- Restore AbstractIntegrationTestWithDatabase.destroy() to its plain form (drop the 3x cleanup
  retry and the per-CME thread dump) and remove the HibernateConcurrencyMonitor wiring.
- Delete HibernateConcurrencyMonitor.
- Revert the build.yml always-upload of target/cme-dumps.

CI keeps -Dfailsafe.rerunFailingTestsCount=2 as the generic flaky-test safety net.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* revert: keep Context.finalize() (out of scope, not the CME cause)

Reverts the Context.finalize() removal (and the ContextTest.testFinalize deletion). The flaky
@after ConcurrentModificationException is fully fixed by the AbstractHibernateDAO.iterate()
change alone; Context.finalize() runs on a single GC Finalizer thread against its own
finalizer-thread session and provably cannot cause that cross-thread xref race. Removing a
finalizer from this core, widely-used class is a riskier change that does not belong in a
flaky-test fix, so leave Context untouched. The (pre-existing, harmless) finalizer-thread
session it opens can be addressed separately if desired.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: address review comments on flaky-test fix

- AbstractHibernateDAOIteratorIT: add Javadoc to the test method and walk the iterator's full
  class hierarchy (up to Object) when asserting no finalize() override, so a finalizer
  reintroduced on a superclass/helper is also caught (per CodeRabbit review).
- AuthenticationRestControllerIT: wrap an over-length (122 char) Javadoc line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* Security patches from vanilla DSpace 7.6.7 (CVE-2026-49830, CVE-2026-49831) (#1340)

* ORE aggregated resource URI validation

(cherry picked from commit 7ac17f6)

* Velocity and template safety for Email and LDN messages

* Safer Velocity configuration
* New "message.templates.allowed-config" config
* Remove "UnmodifiableConfiguration" in favour of a
  simple Map of whitelisted Config keys/values
* Centralise Velocity config in core Utils
* Small javadoc changes

(cherry picked from commit b2d6141)
(cherry picked from commit 5b31db5)

* Better null checking in allowed config props

(cherry picked from commit 6b66531)
(cherry picked from commit 46a0dfb)

* Access configurationService at runtime, not rely on class setup

(cherry picked from commit 5803819)
(cherry picked from commit 4be430f)

* Remove strict mode Velocity engine configuration (allow nulls)

(cherry picked from commit 655fc62)

* Filter requests for JSPs or traversal

(cherry picked from commit cf9be85)
(cherry picked from commit dc3e455)

* Add additional logging to GlobalRequestSecurityFilter

(cherry picked from commit 295a046)
(cherry picked from commit 0b1deae)

* Fix import order

(cherry picked from commit e2e6a79)
(cherry picked from commit 2e40077)

* Update sitemap traversal test expectations

(cherry picked from commit 56ae287)
(cherry picked from commit 1a3dfd7)

* Backport GlobalRequestSecurityFilter for javax

(cherry picked from commit 8a2eee9)

* Add secure file access methods

(cherry picked from commit 22bec44)

* Backport Curation I/O using secure file access

Removes some JDK >= 16 usage

(cherry picked from commit 55905a2)

* Curation config support for allowed base paths

(cherry picked from commit 4502224)

* Move curation -r reporter param to CLI only

(cherry picked from commit 277af82)

* Fix import order

(cherry picked from commit a757221)

* Ignore CurationScriptIT -T taskFile tests, to rewrite w/ CLI

(cherry picked from commit 6437472)
(cherry picked from commit 37cd6eb)

* Move taskfile -T option to CLI script config only

(cherry picked from commit 00e4979)
(cherry picked from commit 27708ea)

* UFAL/Allow lr.help.mail in email template config allowlist

The 7.6.7 Velocity hardening restricts templates to an allowlisted
"config" map (Utils.getAllowedTemplateConfig). UFAL/CLARIN templates
(clarin_download_link_admin, clarin_token, matomo_report,
share_submission) reference config.get('lr.help.mail'), which vanilla
DSpace does not ship, so it was missing from the allowlist and those
emails would render a null help address. Add it to dspace.cfg only;
Utils.java stays identical to upstream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Kim Shepherd <kim@shepherd.nz>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* Autolabel for new issues (#1341)

Co-authored-by: Matus Kasak <matus.kasak@dataquest.sk>

* [Port to dtq-dev] Issue 1364: tgz file preview fix (#1338)

* Issue 1364: tgz file preview fix (ufal#1372)

* Issue 1364: tgz file preview fix

* resolve MR comments + fixed test

* Update help message for force preview option

* get rid of the password requirement on FilePreview script

---------

Co-authored-by: Ondřej Košarko <kosarko@ufal.mff.cuni.cz>
(cherry picked from commit 00a2a37)

* PR Comments

---------

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* [Port to dtq-dev] Issue 1343: add PUT and DELETE endpoint methods to ClarinLicenseLabel REST repository (#1325)

* Issue 1343: add PUT and DELETE endpoint methods to ClarinLicenseLabel REST repository (ufal#1357)

* Issue 1343: add PUT and DELETE endpoint methods to ClarinLicenseLabelRest repository

* resolve Copilot comments

* fixed PUT request in ClarinLicenseLabelRestRepository

* added check for Clarin License Label -> Label string to be shorter that 5 characters

* implement coorrect put method in ClarinLicenseLabelRestRepository

* add constraints to license_label table: made label UNIQUE, make license_label not deletable when used in clarin licenses

* change order of deleting objects in test cleanup(): delete license objects before license_label objects (to satisfy license_label constraints)

* resolve PR Copilot comments, fixed failing ClarinWorkspaceItemRestRepositoryIT

* prevent creating duplicate Clarin License Labels in REST API

* not necessary to trim label twice

* minor fixes, suggested by Copilot

* Rename SQL migration files to use today's date (2026.06.01)

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
(cherry picked from commit b041c90)

* Fix test compilation: update createClarinLicense call sites for new label arg

The backport added a `label` String parameter to the createClarinLicense
test helper but left six 4-arg call sites unchanged, breaking testCompile
in dspace-server-webapp. Pass a label at each remaining call site
("lbl"; "lbl1"/"lbl2" for the paired-license test) to match the helper's
new signature, preserving the previously hard-coded "lbl" behaviour.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* PR comments: code cleanup

---------

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* AI-Skills/Wire private AI skills submodule (.dspace-skills) (#1343)

* test: de-flake ORCID cache tests and ZIP-download IT (#1344)

* test: de-flake ORCID cache tests and ZIP-download IT

Two independent flaky tests keep turning the dtq-dev pipeline red after #1321:

1. CachingOrcidRestConnectorTest.testCachable / testCacheableWithError still
   hit the live ORCID sandbox (#1321 only mocked getLabel/search/search_fail).
   They use the real Spring @Cacheable bean (to exercise the CGLIB caching
   proxy), so they could not be spied. Point the bean's apiURL at a local
   MockWebServer serving the canned orcid-expanded-search.xml instead -- keeps
   the real caching proxy and HTTP transport under test, removes the network
   dependency. No production change.

2. MetadataBitstreamControllerIT.downloadAllZip compared the response to a
   locally-built ZIP byte-for-byte; a ZIP entry's DOS timestamp (2s resolution)
   defaults to "now" on both sides and differs across a 2s boundary. Assert the
   unzipped entry name + content instead of raw bytes.

Test-only changes. Verified locally (ORCID class 8/8 green, repeated offline
runs; webapp test module compiles; checkstyle clean).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: assert exactly one ZIP entry in downloadAllZip

Address CodeRabbit: a Map keyed by entry name could mask duplicate ZIP entries
(same filename overwrites). Track the entry count separately and assert it is 1,
so an unexpected extra entry fails the test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: de-flake SSR authorization and Solr TopCountries ITs

Two more intermittent failures on dtq-dev, both addressed at the root cause
(test-only changes):

1. AuthorizationRestRepositoryIT.findByObjectSSRTest flaked with 400 instead of
   200. The test sets dspace.server.ssr.url (used by Utils.getBaseObjectRestFromUri
   to resolve the request URI) and the AlwaysThrowExceptionFeature.turnoff flag via
   configurationService.setProperty(...). Such in-memory overrides are silently
   dropped when the combined config is rebuilt by the auto-reload listener (fires on
   any reloadable cfg file mtime change mid-run). When ssr.url is dropped the URI no
   longer resolves -> 400; if turnoff were also dropped the /search/object path would
   let alwaysexception throw -> 500. Fix: set both via JVM system properties (+
   reloadConfig) so they sit in the highest-precedence override layer and survive
   auto-reload, cleared in @after. Same pattern as #1321's Shibboleth fix
   (AuthenticationRestControllerIT#setAuthenticationMethodSequence). Applied to all
   three SSR tests.

2. StatisticsRestRepositoryIT.topCountriesReport_Community_Visited flaked with an
   empty report (points: []). postView() commits with waitSearcher=false, so the
   just-posted view events can be invisible to the immediately-following report query
   (and a dropped solr-statistics.autoCommit override would skip the commit entirely).
   Fix: after posting the view events, force solrLoggerService.commit() (waitSearcher
   =true) so the events are flushed and visible before the report is queried.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* [Backport dtq-dev] Issue 1360: allow to change license in workflow item, in PATCH operation (ufal#1365) (#1327)

* Issue 1360: allow to change license in workflow item, in PATCH operation (ufal#1365)

* Issue 1360: allow to change license in workflow item, in PATCH operation

* improve fix + Integration test

* resolve Copilot comments

* resolve more MR comments

* return 404 in case PATCH request references non existing license

* improve error message for task claimed by other user

* resolve second round of Copilot comments

* small code refactoring

* fixed failing test

(cherry picked from commit 40672c3)

* resolve PR comments

---------

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* [Port to dtq-dev] Issue ufal#1317 metadata health check (#1307)

* Issue ufal#1317 metadata health check (ufal#1338)

* Issue 1317: health-check for metadata - initial commit

* implement MetadataCheck report

* improve MetadataCheck

* improve MetadataCheck with more sophisticated selection of error/warning messages

* code cleanup

* rename qa-metadata-error-patterns.json to metadata-check-patterns.json

* implement copilot suggestions

* add test for Metadata check to HealthReportIT

* improved documentation

* added test

* improve documentation

* set default error dispersion quota to 5

(cherry picked from commit 39157a5)

* add JavaDoc description

* resolve coderabbitai comment

* update report-diff-fields.json

* resolve Copilot comments

---------

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* test: de-flake ItemHandleCheckerIT (mock the live handle resolver) (#1346)

* test: de-flake ItemHandleCheckerIT (mock the live handle resolver)

ItemHandleCheckerIT.testItemHandleNotFound intermittently failed on dtq-dev with
a synthetic 617 (SocketTimeoutException) instead of 404. The checkhandles task
(ItemHandleChecker) issues a HEAD request to each item's handle URL
(handle.canonical.prefix + handle) with a 3s read timeout; the test pointed the
prefix at the live http://hdl.handle.net/ and asserted the resolver returns 404
(non-existent handle) and 302->200 (a real handle). When the live resolver was
slow/unreachable the HEAD timed out -> 617 -> red pipeline. Same class of flake
as the ORCID tests; not a regression (the merge commit passed many other runs).

Fix (test-only): serve the handle URLs from a local okhttp3 MockWebServer. setUp
sets handle.canonical.prefix to the mock base URL; a path-based dispatcher returns
302 (+Location to a -target path) for the redirect handle, 200 for the target, and
404 otherwise. The real/invalid/ignored URLs are derived from the mock base instead
of hard-coded hdl.handle.net literals; the server is closed in @after. All six tests
keep their original assertions and behavior; only the live-network hop is removed.

Verified locally: 6/6 green across 4 runs, no external network involved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test: restore handle.canonical.prefix in ItemHandleCheckerIT teardown

Address CodeRabbit: setUp overwrites the shared handle.canonical.prefix with the
per-run mock-server URL. Capture the previous value and restore it in destroy()
(after closing the mock server) so a later test in the same JVM is never left
pointing at the now-closed localhost port. (The superclass destroy() reloadConfig()
already resets it, but restoring explicitly makes the test self-contained.)

Re-verified locally: 6/6 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

* UFAL/Obtain special groups from user context when new token is generated (on token refresh)  (ufal#1378) (#1347)

* Issue 1373: obtain special groups from user context when new token is generated (on token refresh)

* resolve Copilot comments

* resolve Copilot Comments: compute special groups only when when user is authenticated

* Remove HttpSession dependency from ClarinShibAuthentication

Use request-scoped attributes for shib.authenticated instead of
HttpSession/JSESSIONID, aligning with upstream ShibAuthentication.
Follow-up to ufal#1373/ufal#1378.

* Guard against null special groups in Context.getSpecialGroups

A special-group UUID may reference a Group that has since been deleted;
GroupService.find returns null in that case. The list was built with an
unconditional add, so it could contain null elements, which caused an NPE
downstream (e.g. SpecialGroupClaimProvider.getValue maps group.getID()
while generating the JWT sg claim on token refresh). Filter nulls once
here so every caller is covered. Follow-up to ufal#1373/ufal#1378.

---------


(cherry picked from commit 4c294b2)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

* UFAL/fix: DOI Organizer creates duplicate dc.identifier.doi metadata (ufal#1368) (#1350)

* Issue 1361 fix: DOI Organizer creates duplicate dc.identifier.doi metadata

* copilot comments

* Issue 1361: make DOI metadata save additive, report duplicates via QA

saveDOIToObject now adds dc.identifier.doi only when that exact value is
not already present, and no longer deletes metadata. The previous fix
cleared existing values when more than one was found or when a different
DOI was present; since this method runs after the DOI has already been
registered with the external agency, silently dropping a (possibly
legacy/citable) identifier is lossy and irreversible. The operation stays
idempotent, so re-registration no longer creates duplicate values.

Items that legitimately end up with more than one dc.identifier.doi value
are now surfaced for manual review by adding dc.identifier.doi to the
ItemMetadataQAChecker noDuplicate list (metadataqa curation task) rather
than being cleaned up silently in the write path.

Tests: flip the replace test to assert a pre-existing different DOI is
preserved alongside the new one, keep the idempotency test, and add a QA
checker IT asserting an item with two DOIs fails curation.

* local field renaming

---------


(cherry picked from commit 3b7db4c)

Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>

---------

Co-authored-by: Ondřej Košarko <ko_ok@centrum.cz>
Co-authored-by: Milan Kuchtiak <kuchtiak@ufal.mff.cuni.cz>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: kosarko <1842385+kosarko@users.noreply.github.com>
Co-authored-by: Paurikova2 <michaela.paurikova@dataquest.sk>
Co-authored-by: Kasinhou <129340513+Kasinhou@users.noreply.github.com>
Co-authored-by: Matus Kasak <matus.kasak@dataquest.sk>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Kim Shepherd <kim@shepherd.nz>
Co-authored-by: jurinecko <juraj.roka@dataquest.sk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants