Skip to content

[tests] Improve DoNotLeakWeakReferences reliability#11615

Merged
jonathanpeppers merged 4 commits into
mainfrom
jonathanpeppers/tests-donotleak-reliability
Jun 10, 2026
Merged

[tests] Improve DoNotLeakWeakReferences reliability#11615
jonathanpeppers merged 4 commits into
mainfrom
jonathanpeppers/tests-donotleak-reliability

Conversation

@jonathanpeppers

Copy link
Copy Markdown
Member

Improves reliability of the flaky JnienvTest.DoNotLeakWeakReferences test by consolidating three small fixes:

  1. Marshal worker thread exceptions — capture and re-throw exceptions from the worker Thread, so assertion failures inside it produce a real test failure instead of being swallowed.
  2. Allow drift in peer countsRuntime.GetSurfacedObjects() is process-global, and NUnit/MTP infrastructure (per-test TestExecutionContext, listeners, Console capture, etc.) creates and releases transient Java.Lang.Object peers around every test. Use Is.EqualTo(...).Within(tolerance) instead of Assert.AreEqual so the count can drift by a few entries between snapshots.
  3. Widen the tolerance to 10 — small bump to account for additional drift seen on slower devices/configurations. A real leak would dwarf this.

Split out of #11224 (branch jonathanpeppers/dogfood-dotnet-test) so it can land independently of the larger MTP / dotnet test dogfooding work. Only touches tests/Mono.Android-Tests/Mono.Android-Tests/Java.Interop/JnienvTest.cs.

jonathanpeppers and others added 3 commits June 9, 2026 14:17
Under stock NUnit 3 (after the dotnet test migration), an unhandled
NUnit AssertionException thrown from inside a raw 'new Thread (...)'
body becomes an unhandled exception on the .NET runtime side, which
SIGABRTs the whole process. The instrumentation host then never gets
a Finish() call and 'dotnet test' reports 'Zero tests ran' for the
entire run -- masking the real failure (Assert #2 inside the worker)
and taking out every other test in the assembly with it.

Capture exceptions from the worker thread in a local, then rethrow on
the test thread after Join() so an Assert failure (or any other
exception) becomes a normal NUnit test failure instead of a process
abort.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build 14275720 had this test fail in 6 lanes simultaneously with two
failure modes:

  #2  Expected: 36/37  But was: 37/38   (+1 extra peer at allocation)
  #3  Expected: 36/37  But was: 34/35   (1-3 fewer peers after GC)

Build 14212378 on main (pre-PR) hit the same #2 failure, so this is a
pre-existing flake -- not a regression. Before ac81297, mode #2
escaped from new Thread() and SIGABRT'd the process (manifesting as
'Zero tests ran' for the whole lane); the marshal-exception fix now
surfaces it as a real test failure, which is why we suddenly see it.

Runtime.GetSurfacedObjects() is process-global: NUnit/MTP per-test
infrastructure (TestExecutionContext, listeners, Console capture)
allocates and releases Java.Lang.Object peers around every test, so
the count drifts by a few entries between snapshots. Observed drift
across recent builds is in the range -3..+1; allow +/-5 tolerance
which is plenty tight to still detect a real leak (which would dwarf
this) while removing the noise.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On the CoreCLRTrimmable lane, the surfaced-object count drifted by 7
(expected 24 +/- 5, actual 31). Bump tolerance from 5 to 10 so transient
GC/NUnit infrastructure drift doesn't fail the test. A real leak would
still produce far higher counts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 9, 2026 19:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the reliability of the flaky JnienvTest.DoNotLeakWeakReferences device test by making failures from a worker thread observable and by tolerating small process-global drift in Runtime.GetSurfacedObjects() counts between snapshots.

Changes:

  • Add a tolerance window when asserting surfaced peer counts to account for per-test infrastructure churn.
  • Capture exceptions from the worker thread and fail the test on the main thread when they occur.

@jonathanpeppers jonathanpeppers added the ready-to-review This PR is ready to review/merge, I think any CI failures are just flaky (ignorable). label Jun 10, 2026
@jonathanpeppers jonathanpeppers merged commit be1e3b7 into main Jun 10, 2026
40 checks passed
@jonathanpeppers jonathanpeppers deleted the jonathanpeppers/tests-donotleak-reliability branch June 10, 2026 14:06
jonathanpeppers added a commit that referenced this pull request Jun 10, 2026
## Summary

Migrate Mono.Android device test infrastructure from the legacy NUnitLite stack to `dotnet test` with the Microsoft Testing Platform (MTP), following the pattern established for the `androidtest` template in 031246d.

Callers now run device tests with stock `dotnet test`, no custom MSBuild targets required. **NUnitLite is removed entirely** from the workload — both as a framework assembly and as a vendored copy.

## Changes

### New: MTP-based `TestInstrumentation` base class

- `tests/TestRunner.Core/TestInstrumentation.cs` is now an MTP-based on-device runner using stock NUnit (`NUnitTestAssemblyRunner`) plus `Microsoft.Testing.Platform`.
- Mirrors the `androidtest` template: tests run synchronously on the instrumentation thread (`NumberOfTestWorkers = 0`, see comment in code), TRX is written to the app data dir, and the path is sent back through the instrumentation bundle so the host adapter can pick it up.

### Updated test projects

- **Mono.Android.NET-Tests**, **Java.Interop-Tests**, **JcwGen-Tests**: now reference stock NUnit `3.13.3` from NuGet (no NUnitLite / TestRunner.NUnit), set `IsTestingPlatformApplication=true`, and use the new `TestInstrumentation` base.

### Deleted (~41K lines)

- `src/Xamarin.Android.NUnitLite/` — the `Xamarin.Android.NUnitLite.dll` framework assembly (NUnitLite wrapper, `TestSuiteInstrumentation`, GUI `TestSuiteActivity` / `TestResultActivity` / `OptionsActivity`, layouts).
- `src-ThirdParty/NUnitLite/` — the embedded NUnitLite source we used to compile into the above.
- `tests/TestRunner.NUnit/` — `NUnitTestRunner` / `NUnitTestInstrumentation`.
- `tests/TestRunner.xUnit/` — `XUnitTestRunner` / `XUnitTestInstrumentation`.
- `Xamarin.Android.NUnitLite.dll/.pdb/.xml` removed from `build-tools/installers/create-installers.targets` (no longer shipped in the framework).
- Various custom `RunTestApp` / device-test MSBuild targets.

The `XA4313` deprecation message in `Common.targets` (which suggested customers migrate `Xamarin.Android.NUnitLite` → `Xamarin.Legacy.NUnitLite`) is preserved as the existing escape hatch; the `Xamarin.Legacy.NUnitLite` NuGet is published from a separate repo and is unaffected by these changes.

### NativeAOT trimmer fix

NUnit 3.13.3's `EquatablesComparer` walks every value via reflection: `type.GetInterfaceMap(typeof(IEquatable<T>))` then `MethodInfo.Invoke` on the result. Under stock NativeAOT this throws `Arg_NotSupportedException` because ILC strips the dispatch-slot mapping. NUnit 3.14+ guards this with try/catch, but we cannot bump globally (it regresses the Mono Interpreter `AwaitAdapter` lane).

Real fix, no `[Category("NativeAOTIgnore")]` shortcuts:

- `tests/Mono.Android-Tests/Mono.Android-Tests/Mono.Android.NET-Tests.csproj`: under `'$(PublishAot)' == 'true'`, set `<IlcGenerateCompleteTypeMetadata>true</IlcGenerateCompleteTypeMetadata>` and pull in a new `NativeAOT.rd.xml`.
- `tests/Mono.Android-Tests/Mono.Android-Tests/NativeAOT.rd.xml` roots the closed `IEquatable<T>` instantiations the test value types need (`String`, `Boolean`, `IntPtr`, `TimeSpan`, `DateTime`) plus the concrete types themselves as `Dynamic="Required All"`, so ILC keeps the slot mapping. Also roots `TaskAwaitAdapter+GenericAdapter<VoidTaskResult>` for NUnit's async adapter.

Result on CI build [14325032](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=14325032): NativeAOT lane 230 passed / 0 failed (43 legitimate category exclusions).

### CI updates (`apk-instrumentation.yaml`)

- Build with `-t:Install`, run with `dotnet test --no-build --report-trx`.
- Test results are published as VSTest (TRX) instead of NUnit XML.
- Added `adb logcat -d` capture so device-test crashes can be diagnosed without re-running CI.

### Test fixes / bookkeeping

- Centralized `JavaSystem.LoadLibrary ("reuse-threads")` in `TestInstrumentation.OnCreate` so it runs on the main Java thread instead of inside individual tests.
- Synced `ExcludedTestNames` / `ExcludedCategories` to match `main`.

## Out of scope

- `EmbeddedDSO` and `Locale-Tests` — old-style csproj that still `<Reference Include="Xamarin.Android.NUnitLite" />`. They need an SDK migration before this PR can fully delete that assembly's identity from the build (the `XA4313` deprecation will fire on these projects today). Tracked separately.

## Already split out

These improvements were carved out of this PR and merged separately to `main`:

- #11614 — `[tests] Use Android.Util.Log in JnienvTest.ThreadReuse callback`
- #11615 — `[tests] Improve DoNotLeakWeakReferences reliability`
- #11609 — `Bump to dotnet/java-interop@b881d21` — picks up dotnet/java-interop#1437 (modernize NUnit attributes), which is why no `NUnitCompatibility.cs` shim is needed in this PR.

## Related

- Template work: 031246d
- #11523 — fixed `UrlEscaping_Bug43411` on `main` (test had been silently dead under NUnitLite)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-review This PR is ready to review/merge, I think any CI failures are just flaky (ignorable).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants