Process shake restarts asynchronously#4879
Conversation
cc251cd to
88e0f88
Compare
0a7302b to
4090ec3
Compare
|
I can't get tests to pass with notifications fully async. Kinda makes sense looking at what these notifications indicate. I thought I could get away with only LSP handling everything synchronously of the VFS-side, but it appears not. I'm now trying out a small variant where notifications are still synchronously, but that the session restarts are initiated async, while still blocking subsequent LSP requests. So restarts still get squashed. |
42b60e4 to
69e0d70
Compare
1dfd24e to
5978cad
Compare
| sessionRestartTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "RestartTQueue" | ||
| sessionRestartTQueue <- liftIO $ newRestartSlot | ||
| sessionLoaderTQueue <- withWorkerQueueSimple (cmapWithPrio Session.LogSessionWorkerThread recorder) "SessionLoaderTQueue" | ||
| ContT $ \action -> withRestartWorker ideMVar $ action () |
There was a problem hiding this comment.
better to extend the logic of workerThread module so it can be reused here.
there is a slight chance that downstream code would swallow the async exception and hang the shutdown. workerThread handled that by setting up a shutdown flag .
Thanks for taking a peak at the PR! The idea I'm going for now here is to still have notifications be synchronous, but that they don't wait for the shake restart to finish, unlike now. So multiple subsequent edits will issue multiple synchronous notifications which will all start asynchronous shake restarts. This allows the restarts to merge. Note that this is still me experimenting, I'm not sure if I'm breaking preconditions. This is all quite nuancy. |
good idea
I’ve experienced this before: if we don’t wait for Shake to fully restart, requests may observe stale state. I think we should ensure any pending restarts have completed before processing new requests but keep the notifications side of the story as you suggested, It might work. Also it might be a good idea to go through the occurences of ----- update ---- Also we should move the "updatePositionMapping" in the notification handler into restart so every restart(merged or not) would see consistent mapping state ? |
7e4c4c7 to
224e74f
Compare
|
Alright, report time. Tests now pass. It now runs the notification handlers synchronously but any shake restart is done asynchronously through a worker thread. Some notes.
I ran the benchmarks in #4934, with smaller (very small) typing delays, which is optimal for observing the effect of this PR, but not super realistic. I couldn't observe a difference with the default delay of Running the typing burst benchmarks with very small typing delays
Example, Version, Configuration, name, success, samples, startup, setup, userT, delayedT, 1stBuildT, avgPerRespT, totalT, rulesBuilt, rulesChanged, rulesVisited, rulesTotal, ruleEdges, ghcRebuilds, maxResidency, allocatedBytes
cabal, upstream, All, hover after typing burst, True, 100, 4.07, 0.00, 6.49, 14.59, 0.55, 0.03, 21.09, 18, 15, 1164, 4042, 19942, 134, 286MB, 90219MB
cabal, HEAD, All, hover after typing burst, True, 100, 3.99, 0.00, 5.95, 14.93, 0.67, 0.03, 20.89, 18, 15, 1181, 4041, 19942, 114, 277MB, 88055MB
cabal, upstream, All, getDefinition after typing burst, True, 100, 0.66, 0.00, 7.51, 14.49, 0.64, 0.03, 22.02, 17, 14, 1183, 4045, 19949, 137, 271MB, 85329MB
cabal, HEAD, All, getDefinition after typing burst, True, 100, 0.57, 0.00, 6.10, 12.72, 0.75, 0.03, 18.83, 18, 15, 1181, 4043, 19949, 119, 279MB, 79739MB
cabal, upstream, All, completions after typing burst, True, 100, 0.67, 0.00, 9.62, 12.36, 0.76, 0.04, 21.99, 19, 16, 1182, 4043, 19941, 127, 348MB, 90711MB
cabal, HEAD, All, completions after typing burst, True, 100, 0.56, 0.00, 8.96, 11.11, 0.88, 0.04, 20.09, 19, 16, 1182, 4042, 19941, 117, 317MB, 88975MB |
@crtschin consider this, notification A then request B. How do we ensure B run after A's session restart in this PR ? |
Good question, I'll walk you through an example timeline.
Critically it's the locks that are added by notifications, which are waited on by follow-up requests, that ensures the ordering. |
| -- old session and @GetLinkable@ errors. | ||
| -- See Note [Notification vs request restart ordering] | ||
| (modifyForEvaluate queueForEvaluation >> waitForLastRestart st) | ||
| (modifyForEvaluate unqueueForEvaluation) |
There was a problem hiding this comment.
why do we not need to waitForLastRestart for unqueueForEvaluation as well ?
There was a problem hiding this comment.
Good one, I was thinking waiting was only needed for the session setup, but it makes sense for the eval itself as well.
| -- do need to occur here, the keys they invalidate need to propagate to the | ||
| -- worker so it can be used during the concrete restart. | ||
| -- See Note [Housekeeping rule cache and dirty key outside of hls-graph]. | ||
| !newDirty <- fromListKeySet <$> ioActionBetweenShakeSession |
There was a problem hiding this comment.
ioActionBetweenShakeSession should be run after all the shake rules that rely on these states fully stop.
If it happpens here, there is chance that the newer state being picked up by an old step rule run and result in inconsistancy of the shake database. It is used to be an old bug stay in hls for years until it was fixed.
There was a problem hiding this comment.
Hmm, I tried this, but some tests failed. I'll try this again, as some of the failures may have been eval-related.
There was a problem hiding this comment.
Thanks, some tests failed probably means something is out of sync. We need to be very careful and find out all of the possible out of sync parts.
Moves in-between shake actions to be accumulated and ran on the worker thread again, instead of prior to restart submission.
ca33b02 to
26d319d
Compare
0ea6a00 to
2f26366
Compare
`unload` and `loadDecls` must share the same `Interp` stale artefacts survive into splice evaluation. Add a `Var (Maybe Interp)` to `SessionEnv` so all `emptyHscEnvM` calls can reuse the first `Interp` created.
2f26366 to
0af5fa8
Compare
|
I think I got where the failure I encountered came from, it was TH interpreter related. A new interpreter is created on each init of a The fix in the last commit is to keep a |
|
The TH interpreter is out of my area, I know nothing about it. |
There was a problem hiding this comment.
atomically $ updatePositionMapping ide (VersionedTextDocumentIdentifier _uri _version) [] in places like mkPluginNotificationHandler LSP.SMethod_TextDocumentDidOpen might need to go into setFileModified too for cases it is called
In fact I am not very sure about this, but worth thinking about this moving part. On the one hand, we need it to be sync with the internal shake state, on the other hand, we need it to sync with the actual file state. We are bound to have a close/open world gap here. Perhaps the better idea is to halt the shake session immediatly when a restart is needed. But that was a whole lot of changes.
See lastValueIO, we would attach the version of the mapping to the build result using the this.
|
I think I see what you meant now with the position mapping. I didn't quite get it at first. So the 2 approaches are:
Consider the following sequence of events: a. Request With approach (1), the position mapping is updated between steps (b) and (c). This means that With approach (2), the position mapping is updated after step (d). This means that Thinking about this, I think both might be wrong. With approach (1) it might be possible for an in-progress request action to see multiple different position mappings for a single rule. Approach (2) is wrong in that the request would be working with a position mapping that doesn't reflect the file. I think the right solution would be for the processing of a request to be atomic relative to a position mapping. This would require Converting back to draft, as this notes a pretty fundamental issue with the PR. |

Would close #4725.
Implements the ideas from #4725 (comment). Makes the following changes: