Process shake restarts asynchronously by crtschin · Pull Request #4879 · haskell/haskell-language-server

crtschin · 2026-03-31T19:53:56Z

Would close #4725.

Implements the ideas from #4725 (comment). Makes the following changes:

- Processes notifications asynchronously, while preserving the invariant that any notification preceding a request finishes before that request is processed.
- The above opens up the possibility of multiple shake restarts being in flight at once, because multiple notifications can arrive in quick succession. Instead of storing these restarts in a queue, all restart changes are accumulated in a single slot, to be processed at once.
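
To illustrate the single-slot idea, here is a minimal sketch using `Control.Concurrent.STM` from the `stm` package. The `PendingRestart` type, its fields, and the bodies of `newRestartSlot`, `submitRestart` and `takeRestart` are assumptions made up for this example and are not taken from the PR's code.

```haskell
import Control.Concurrent.STM

-- Hypothetical stand-in for the data one restart request carries.
data PendingRestart = PendingRestart
  { reasons    :: [String]    -- why a restart was requested
  , dirtyFiles :: [FilePath]  -- files invalidated by the triggering notifications
  } deriving (Eq, Show)

-- Merging two requests keeps everything both of them asked for.
instance Semigroup PendingRestart where
  PendingRestart r1 d1 <> PendingRestart r2 d2 =
    PendingRestart (r1 <> r2) (d1 <> d2)

-- A single slot instead of a queue: Nothing means no restart is pending.
newtype RestartSlot = RestartSlot (TVar (Maybe PendingRestart))

newRestartSlot :: IO RestartSlot
newRestartSlot = RestartSlot <$> newTVarIO Nothing

-- Called by notification handlers: merge into whatever is already pending.
submitRestart :: RestartSlot -> PendingRestart -> STM ()
submitRestart (RestartSlot var) new = modifyTVar' var (<> Just new)

-- Called by the worker: block until something is pending, then take it all.
takeRestart :: RestartSlot -> STM PendingRestart
takeRestart (RestartSlot var) = do
  pending <- readTVar var
  case pending of
    Nothing -> retry
    Just p  -> writeTVar var Nothing >> pure p

main :: IO ()
main = do
  slot <- newRestartSlot
  -- Two notifications arrive before the worker gets to run.
  atomically $ submitRestart slot (PendingRestart ["edit A.hs"] ["A.hs"])
  atomically $ submitRestart slot (PendingRestart ["edit B.hs"] ["B.hs"])
  -- The worker sees one merged restart instead of two queued ones.
  merged <- atomically $ takeRestart slot
  print merged
```

With a queue the worker would perform two restarts here. With the slot it performs one that covers both edits.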

crtschin · 2026-04-09T17:30:38Z

I can't get tests to pass with notifications fully async. That kind of makes sense, looking at what these notifications indicate. I thought I could get away with handling only the VFS side of LSP synchronously, but it appears not.

I'm now trying out a small variant where notifications are still handled synchronously, but session restarts are initiated asynchronously while still blocking subsequent LSP requests. So restarts still get squashed.

soulomoon

Good work on the restart squash.
But if notifications are still handled synchronously, #4725 won't be closed, since consecutive typing emits consecutive notifications and their restarts won't be able to merge.

soulomoon · 2026-05-03T10:37:03Z

-  sessionRestartTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "RestartTQueue"
+  sessionRestartTQueue <- liftIO $ newRestartSlot
  sessionLoaderTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "SessionLoaderTQueue"
+  ContT $ \action -> withRestartWorker ideMVar $ action ()


It would be better to extend the logic of the workerThread module so it can be reused here.

There is a slight chance that downstream code would swallow the async exception and hang the shutdown. workerThread handles that by setting up a shutdown flag.
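
As a rough illustration of the shutdown-flag idea, the sketch below shows a worker loop that checks a flag before each job, so it can stop even when a job has caught the exception meant to cancel it. This is an assumption about the general technique, not the actual workerThread implementation. `workerLoop` and `demo` are hypothetical names.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.STM
import Control.Exception (SomeException, try)

-- Hypothetical worker loop: the flag is checked before every job, so the loop
-- can stop even if a job swallowed the exception that was meant to cancel it.
workerLoop :: TVar Bool -> TQueue (IO ()) -> IO ()
workerLoop shutdown queue = do
  next <- atomically $ do
    stop <- readTVar shutdown
    if stop then pure Nothing else Just <$> readTQueue queue
  case next of
    Nothing  -> pure ()
    Just job -> do
      -- Deliberately catches everything, standing in for downstream code
      -- that swallows exceptions, including asynchronous ones.
      _ <- try job :: IO (Either SomeException ())
      workerLoop shutdown queue

-- Runs one job, then stops the worker via the flag and waits for it to exit.
demo :: IO Bool
demo = do
  shutdown <- newTVarIO False
  queue    <- newTQueueIO
  exited   <- newEmptyMVar
  jobRan   <- newEmptyMVar
  _ <- forkIO (workerLoop shutdown queue >> putMVar exited ())
  atomically $ writeTQueue queue (putMVar jobRan True)
  ran <- takeMVar jobRan
  atomically $ writeTVar shutdown True  -- wakes the worker blocked on the empty queue
  takeMVar exited
  pure ran

main :: IO ()
main = demo >>= print
```

Because the flag and the queue are read in the same STM transaction, setting the flag also wakes a worker that is blocked on an empty queue.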

crtschin · 2026-05-04T17:45:02Z

> But if notifications are still handled synchronously, #4725 won't be closed, since consecutive typing emits consecutive notifications and their restarts won't be able to merge.

Thanks for taking a peek at the PR! The idea I'm going for here is to still have notifications be synchronous, but, unlike now, have them not wait for the shake restart to finish. So multiple subsequent edits will issue multiple synchronous notifications, which will all start asynchronous shake restarts. This allows the restarts to merge.

Note that this is still me experimenting, and I'm not sure whether I'm breaking preconditions. This is all quite nuanced.

soulomoon · 2026-05-04T21:52:26Z

> So multiple subsequent edits will issue multiple synchronous notifications which will all start asynchronous shake restarts. This allows the restarts to merge.

Good idea.

> they don't wait for the shake restart to finish

I've experienced this before: if we don't wait for Shake to fully restart, requests may observe stale state. I think we should ensure any pending restarts have completed before processing new requests, but keep the notification side of the story as you suggested. It might work.

It might also be a good idea to go through the occurrences of mkPluginNotificationHandler and check whether any bugs would occur.

----- update -----

Also, should we move the "updatePositionMapping" in the notification handler into the restart, so that every restart (merged or not) sees a consistent mapping state?

crtschin · 2026-05-15T23:13:37Z

Alright, report time. Tests now pass. The PR now runs the notification handlers synchronously, but any shake restart is done asynchronously through a worker thread. Some notes:

- I refactored WorkerThread so I could swap out the queue for a single slot that merges elements.
- While shake restarts are asynchronous, most notifications have side effects that they want to happen in between shake restarts, e.g. changing the set of FOI. What to do with these in the context of merging shake restarts tripped me up quite a bit. The current version runs them prior to even submitting the shake restart.
- The eval plugin contains a request that triggers a shake restart and subsequently waits on it. As the shake restart is now async, I had to add a blocking wait on the shake restart finishing.
- Other variations I tried that didn't pass tests:
  - Running notifications async.
  - Running notifications sync, with async shake restarts, but accumulating the notification actions and calling them in between shake restarts.

I ran the benchmarks in #4934, with smaller (very small) typing delays, which is optimal for observing the effect of this PR, but not super realistic. I couldn't observe a difference with the default delay of 0.25s.

Running the typing burst benchmarks with very small typing delays

Example, Version, Configuration, name, success, samples, startup, setup, userT, delayedT, 1stBuildT, avgPerRespT, totalT, rulesBuilt, rulesChanged, rulesVisited, rulesTotal, ruleEdges, ghcRebuilds, maxResidency, allocatedBytes
cabal, upstream, All, hover after typing burst, True, 100, 4.07, 0.00, 6.49, 14.59, 0.55, 0.03, 21.09, 18, 15, 1164, 4042, 19942, 134, 286MB, 90219MB
cabal, HEAD, All, hover after typing burst, True, 100, 3.99, 0.00, 5.95, 14.93, 0.67, 0.03, 20.89, 18, 15, 1181, 4041, 19942, 114, 277MB, 88055MB
cabal, upstream, All, getDefinition after typing burst, True, 100, 0.66, 0.00, 7.51, 14.49, 0.64, 0.03, 22.02, 17, 14, 1183, 4045, 19949, 137, 271MB, 85329MB
cabal, HEAD, All, getDefinition after typing burst, True, 100, 0.57, 0.00, 6.10, 12.72, 0.75, 0.03, 18.83, 18, 15, 1181, 4043, 19949, 119, 279MB, 79739MB
cabal, upstream, All, completions after typing burst, True, 100, 0.67, 0.00, 9.62, 12.36, 0.76, 0.04, 21.99, 19, 16, 1182, 4043, 19941, 127, 348MB, 90711MB
cabal, HEAD, All, completions after typing burst, True, 100, 0.56, 0.00, 8.96, 11.11, 0.88, 0.04, 20.09, 19, 16, 1182, 4042, 19941, 117, 317MB, 88975MB

soulomoon · 2026-05-15T23:27:42Z

> any shake restart is done asynchronously through a worker thread

@crtschin consider this: notification A, then request B. How do we ensure that B runs after A's session restart in this PR?

crtschin · 2026-05-15T23:48:02Z

> @crtschin consider this, notification A then request B. How do we ensure B run after A's session restart in this PR ?

Good question, I'll walk you through an example timeline.

1. Notification A adds a lock identifying itself.
2. Notification A gets executed.
3. Notification A queues up a shakeRestart via one of the set*Modified helpers.
4. Notification A spawns a thread that waits for the restart to finish and then fills the lock it created in step (1).
5. Notification A finishes.
6. Request B starts being processed asynchronously.
7. Request B waits on all notification locks that were submitted prior.
8. The worker thread finishes the shake restart submitted by notification A.
9. Request B proceeds and gets handled.

Critically, it's the locks added by notifications, and waited on by follow-up requests, that ensure the ordering.
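
The timeline above can be sketched with plain `MVar`s. This is a toy model only. `LockRegistry`, `registerLock`, `waitForPriorNotifications` and `timeline` are hypothetical names, the restart is simulated with a delay, and locks are never removed from the registry, which a real implementation would need to handle.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical registry of the locks added by notifications so far.
type LockRegistry = IORef [MVar ()]

-- Step 1: a notification registers an empty lock before it runs.
registerLock :: LockRegistry -> IO (MVar ())
registerLock registry = do
  lock <- newEmptyMVar
  atomicModifyIORef' registry (\locks -> (lock : locks, ()))
  pure lock

-- Step 7: a request waits on every lock registered before it started.
-- readMVar does not empty the lock, so several requests can wait on it.
waitForPriorNotifications :: LockRegistry -> IO ()
waitForPriorNotifications registry = readIORef registry >>= mapM_ readMVar

-- Runs the timeline once and returns the observed order of events.
timeline :: IO [String]
timeline = do
  registry <- newIORef []
  events   <- newIORef []
  let record e = atomicModifyIORef' events (\es -> (es ++ [e], ()))
  restartDone <- newEmptyMVar
  lockA <- registerLock registry                          -- step 1
  record "A handled"                                      -- step 2
  _ <- forkIO $ do                                        -- step 3: simulated worker
    threadDelay 50000
    record "restart finished"                             -- step 8
    putMVar restartDone ()
  _ <- forkIO $ readMVar restartDone >> putMVar lockA ()  -- step 4
  requestDone <- newEmptyMVar
  _ <- forkIO $ do                                        -- steps 6 and 7
    waitForPriorNotifications registry
    record "B handled"                                    -- step 9
    putMVar requestDone ()
  takeMVar requestDone
  readIORef events

main :: IO ()
main = timeline >>= mapM_ putStrLn
```

Request B is forked before the simulated restart finishes, yet it can only record its event after lock A has been filled.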

soulomoon

@crtschin Good work, and thanks for the explanation.
There is a problem with ioActionBetweenShakeSession.

soulomoon · 2026-05-16T04:26:33Z

+              -- old session and @GetLinkable@ errors.
+              -- See Note [Notification vs request restart ordering]
+              (modifyForEvaluate queueForEvaluation >> waitForLastRestart st)
+              (modifyForEvaluate unqueueForEvaluation)


Why do we not need to waitForLastRestart for unqueueForEvaluation as well?

crtschin

Good one. I was thinking waiting was only needed for the session setup, but it makes sense for the eval itself as well.

soulomoon · 2026-05-16T04:35:32Z

+  -- do need to occur here, the keys they invalidate need to propagate to the
+  -- worker so it can be used during the concrete restart.
+  -- See Note [Housekeeping rule cache and dirty key outside of hls-graph].
+  !newDirty <- fromListKeySet <$> ioActionBetweenShakeSession


ioActionBetweenShakeSession should be run after all the shake rules that rely on these states have fully stopped.
If it happens here, there is a chance that the newer state gets picked up by a rule run from an old step, resulting in an inconsistent shake database. This was an old bug that stayed in HLS for years until it was fixed.

crtschin

Hmm, I tried this, but some tests failed. I'll try this again, as some of the failures may have been eval-related.

soulomoon

Thanks. Failing tests probably mean that something is out of sync. We need to be very careful and find all the parts that could be out of sync.

Moves in-between shake actions to be accumulated and run on the worker thread again, instead of prior to restart submission.

`unload` and `loadDecls` must share the same `Interp`, otherwise stale artefacts survive into splice evaluation. Add a `Var (Maybe Interp)` to `SessionEnv` so all `emptyHscEnvM` calls can reuse the first `Interp` created.

crtschin · 2026-05-18T23:14:35Z

I think I figured out where the failure I encountered came from: it was TH interpreter related. A new interpreter is created on each init of a HscEnv in a session for each module. If a restart occurs concurrently with a load, I believe the loader could observe different TH splice declarations in dependencies, which are unloaded in a different interpreter than the one they were loaded in.

The fix in the last commit is to keep an MVar that holds the first interpreter that's ever instantiated and shares it across all HscEnv initializations. There are already explicit calls to unload in the interpreter, so I think it's safe to keep a single instance around permanently.
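
A minimal sketch of the "keep the first interpreter" pattern follows, assuming an `MVar (Maybe Interp)` cache. The `Interp` type here is a hypothetical stand-in rather than GHC's real one, and `getOrCreateInterp` is a name invented for this example.

```haskell
import Control.Concurrent.MVar
import Data.IORef

-- Hypothetical stand-in for GHC's interpreter handle; the Int only tells
-- instances apart.
newtype Interp = Interp Int deriving (Eq, Show)

-- Return the cached interpreter, creating and caching it on first use.
-- modifyMVar holds the MVar while creating, so concurrent callers cannot
-- each create their own instance.
getOrCreateInterp :: MVar (Maybe Interp) -> IO Interp -> IO Interp
getOrCreateInterp cache create = modifyMVar cache $ \cached ->
  case cached of
    Just interp -> pure (cached, interp)
    Nothing     -> do
      interp <- create
      pure (Just interp, interp)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  -- Every call to create would yield a distinct interpreter.
  let create = atomicModifyIORef' counter (\n -> (n + 1, Interp n))
  cache  <- newMVar Nothing
  first  <- getOrCreateInterp cache create
  second <- getOrCreateInterp cache create
  print (first, second, first == second)
```

Both calls return the same instance, which is the property that lets an unload and a later load target the same interpreter.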

soulomoon · 2026-05-19T07:14:56Z

The TH interpreter is outside my area; I know nothing about it.
Perhaps @fendor can review the TH interpreter part?

soulomoon

`atomically $ updatePositionMapping ide (VersionedTextDocumentIdentifier _uri _version) []` in places like `mkPluginNotificationHandler LSP.SMethod_TextDocumentDidOpen` might need to go into setFileModified too, for the cases where it is called.

In fact, I am not very sure about this, but this moving part is worth thinking about. On the one hand, it needs to be in sync with the internal shake state; on the other hand, it needs to be in sync with the actual file state. We are bound to have a close/open world gap here. Perhaps the better idea is to halt the shake session immediately when a restart is needed, but that would be a whole lot of changes.

See lastValueIO: we attach the version of the mapping to the build result using this.

crtschin · 2026-05-21T21:40:52Z

I think I see what you meant now with the position mapping. I didn't quite get it at first. So the 2 approaches are:

1. Previously: Update the position mapping in the notification handler.
2. Proposed: Update the position mapping when the shake restart is processed, in between shake sessions.

Consider the following sequence of events:

a. Request $R$ arrives
b. Notification $N$ arrives
c. Request $R$ executes its actions.
d. Notification $N$ restart is processed.

With approach (1), the position mapping is updated between steps (b) and (c). This means that $R$ sees the most up-to-date position mapping.

With approach (2), the position mapping is updated after step (d). This means that $R$ sees the position mapping that was valid at the time it started being processed.

Thinking about this, I think both might be wrong. With approach (1) it might be possible for an in-progress request action to see multiple different position mappings for a single rule. Approach (2) is wrong in that the request would be working with a position mapping that doesn't reflect the file.

I think the right solution would be for the processing of a request to be atomic relative to a position mapping. This would require useWithStaleFast (and lastValueIO) to be in STM. That would be a pretty huge refactor though, as useWithStaleFast is a 6-year-old function that's used in a lot of places.

Converting back to draft, as this notes a pretty fundamental issue with the PR.

crtschin requested review from fendor and wz1000 as code owners March 31, 2026 19:53

crtschin marked this pull request as draft March 31, 2026 19:56

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 4 times, most recently from cc251cd to 88e0f88 Compare April 1, 2026 23:22

crtschin mentioned this pull request Apr 7, 2026

Avoid blocking when prepping pragmas for inlay #4882

Closed

fendor requested a review from soulomoon April 7, 2026 09:37

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from 0a7302b to 4090ec3 Compare April 9, 2026 17:27

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 2 times, most recently from 42b60e4 to 69e0d70 Compare April 9, 2026 22:30

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 5 times, most recently from 1dfd24e to 5978cad Compare May 2, 2026 22:54

soulomoon added the performance Issues about memory consumption, responsiveness, etc. label May 3, 2026

soulomoon reviewed May 3, 2026

crtschin changed the title ~~Process notifications asynchronously~~ Process shake restarts asynchronously May 4, 2026

crtschin changed the title ~~Process shake restarts asynchronously~~ WIP: Process shake restarts asynchronously May 4, 2026

crtschin added 6 commits May 13, 2026 22:38

Run notifications async to other notifications

cb98898

Add shake restart merging

8414482

Add tests on shake restart merging

85cbae7

Only do async session restarts

08aa58d

Don't swallow async cancel exceptions

1d92f82

Move withRestartWorker to the other worker functions

aa88d13

Move worker restart ref into WorkerThread

224e74f

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from 7e4c4c7 to 224e74f Compare May 13, 2026 20:38

crtschin added 2 commits May 14, 2026 03:52

Ensure notification side-effects are immediately visible

46beafa

Ensure eval handlers can wait on session restarts

b6bd052

crtschin changed the title ~~WIP: Process shake restarts asynchronously~~ Process shake restarts asynchronously May 15, 2026

crtschin marked this pull request as ready for review May 15, 2026 23:14

crtschin requested a review from soulomoon May 15, 2026 23:19

soulomoon reviewed May 16, 2026

Address review comments

fad54eb

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 2 times, most recently from ca33b02 to 26d319d Compare May 17, 2026 20:11

crtschin added 2 commits May 18, 2026 00:00

Reduce worker thread exported interface

d0aba40

Fix unnecessary indenting changes

310c16d

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch 4 times, most recently from 0ea6a00 to 2f26366 Compare May 18, 2026 23:00

Reuse TH interpreter accross reloads

0af5fa8

crtschin force-pushed the crtschin/deduplicate-shake-restarts branch from 2f26366 to 0af5fa8 Compare May 18, 2026 23:06

soulomoon reviewed May 21, 2026

crtschin marked this pull request as draft May 21, 2026 21:41
