
[NFC] Simplify lexer and move to header#8597

Merged
tlively merged 2 commits into main from parser-slowdown
Apr 14, 2026


@tlively
Member

@tlively tlively commented Apr 13, 2026

The lexer previously used its own internal `LexerCtx` abstraction, which allowed it to consume the characters that made up a token without changing the lexer state and then update the state all at once when committing to consuming those characters. However, manually resetting the lexer to its original position when giving up on parsing a token is simple enough that this abstraction was not pulling its weight. Simplify the lexer by removing the internal contexts, and move the simplified method bodies to `lexer.h`. We generally try to avoid putting lots of code in headers, but in this case making the code available to the inliner, along with removing the extra layer of abstraction, makes the parser about 20% faster.
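The description does not include the lexer code itself. As a rough illustration of the pattern it describes (remember the position, try to lex, restore the position on failure), here is a hypothetical, heavily simplified lexer. The `Lexer` struct and its `takeLParen`, `takeKeyword`, and `takeU64` methods are invented for this sketch and are not Binaryen's API. All member functions are defined in the class body, so they are implicitly inline and visible to the optimizer in every file that includes the header, which is the tradeoff the description mentions.

```cpp
// Hypothetical, simplified sketch of a header-only lexer; not Binaryen's lexer.
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

struct Lexer {
  std::string_view buffer;
  size_t pos = 0;  // The only lexer state: an offset into `buffer`.

  explicit Lexer(std::string_view in) : buffer(in) {}

  // Member functions defined in the class body are implicitly inline, so
  // every translation unit that includes this header can inline them.

  void skipSpace() {
    while (pos < buffer.size() &&
           std::isspace(static_cast<unsigned char>(buffer[pos]))) {
      ++pos;
    }
  }

  // True if index i ends a token: end of input, whitespace, or a paren.
  bool atBoundary(size_t i) const {
    return i >= buffer.size() ||
           std::isspace(static_cast<unsigned char>(buffer[i])) ||
           buffer[i] == '(' || buffer[i] == ')';
  }

  bool takeLParen() {
    size_t start = pos;
    skipSpace();
    if (pos < buffer.size() && buffer[pos] == '(') {
      ++pos;
      return true;
    }
    pos = start;  // Give up: restore the original position.
    return false;
  }

  // Lex an exact keyword such as "func" or "i32.add".
  bool takeKeyword(std::string_view kw) {
    size_t start = pos;
    skipSpace();
    if (buffer.substr(pos, kw.size()) == kw && atBoundary(pos + kw.size())) {
      pos += kw.size();
      return true;
    }
    pos = start;  // Give up: restore the original position.
    return false;
  }

  // Lex an unsigned decimal integer (overflow is not checked in this sketch).
  std::optional<unsigned long long> takeU64() {
    size_t start = pos;
    skipSpace();
    unsigned long long value = 0;
    size_t digits = 0;
    while (pos < buffer.size() &&
           std::isdigit(static_cast<unsigned char>(buffer[pos]))) {
      value = value * 10 + static_cast<unsigned long long>(buffer[pos] - '0');
      ++pos;
      ++digits;
    }
    if (digits == 0 || !atBoundary(pos)) {
      pos = start;  // e.g. "12x" or "(" is not an integer, so undo.
      return std::nullopt;
    }
    return value;
  }
};
```

Compared with a separate context object that buffers progress and then commits it, the only bookkeeping per attempt here is a single saved `start` offset, which is the simplification the description argues for.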
@tlively tlively requested a review from kripken April 13, 2026 23:58
@tlively tlively requested a review from a team as a code owner April 13, 2026 23:58
tlively added a commit that referenced this pull request Apr 14, 2026
The first parser pass is responsible for two things: finding the locations of definitions of top-level module items like globals and functions, and finding the locations of implicit function type definitions. It previously accomplished the latter by fully parsing every instruction in each function. But the IR is not constructed in this phase of parsing, so fully parsing every instruction was largely wasted work. Optimize the parser by parsing only the instructions that might have implicit type definitions and otherwise blindly matching parentheses to skip over the rest of the function body. Combined with #8597, this speeds up parsing by 30-40%.
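The commit message does not show how the body is skipped. The following is a hypothetical sketch of "blindly matching parentheses" over raw WebAssembly text; the function name `skipToMatchingParen` is invented for illustration, and the real implementation may well work on lexer tokens rather than characters. Even a blind matcher has to step over string literals and comments, since a `)` inside them would otherwise throw off the count. The sketch assumes the WAT text-format rules for double-quoted strings with backslash escapes, `;;` line comments, and nestable `(; ... ;)` block comments.

```cpp
// Hypothetical sketch: skip to the ')' matching an already-consumed '('.
#include <cstddef>
#include <optional>
#include <string_view>

// `pos` is the index just past an opening '('. Returns the index just past
// its matching ')', or nullopt if the input is unbalanced or truncated.
std::optional<size_t> skipToMatchingParen(std::string_view s, size_t pos) {
  size_t depth = 1;
  while (pos < s.size()) {
    char c = s[pos];
    if (c == '"') {
      // String literal: skip to the closing quote, honoring backslash escapes.
      ++pos;
      while (pos < s.size() && s[pos] != '"') {
        pos += (s[pos] == '\\') ? 2 : 1;
      }
      if (pos >= s.size()) return std::nullopt;  // Unterminated string.
      ++pos;  // Past the closing quote.
    } else if (c == ';' && pos + 1 < s.size() && s[pos + 1] == ';') {
      // Line comment: skip to the end of the line.
      while (pos < s.size() && s[pos] != '\n') ++pos;
    } else if (c == '(' && pos + 1 < s.size() && s[pos + 1] == ';') {
      // Block comment, which may nest.
      size_t commentDepth = 1;
      pos += 2;
      while (pos < s.size() && commentDepth > 0) {
        if (s.compare(pos, 2, "(;") == 0) {
          ++commentDepth;
          pos += 2;
        } else if (s.compare(pos, 2, ";)") == 0) {
          --commentDepth;
          pos += 2;
        } else {
          ++pos;
        }
      }
      if (commentDepth > 0) return std::nullopt;  // Unterminated comment.
    } else if (c == '(') {
      ++depth;
      ++pos;
    } else if (c == ')') {
      ++pos;
      if (--depth == 0) return pos;
    } else {
      ++pos;  // Any other character is ignored without being parsed.
    }
  }
  return std::nullopt;
}
```

Everything other than parentheses, strings, and comments is passed over without being lexed into tokens or parsed into instructions, which is where the savings over full parsing would come from.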
Member

@kripken kripken left a comment


Do I want to know how this affects our compile times? 😄

@tlively
Member Author

tlively commented Apr 14, 2026

No difference! (at least on a very unscientific experiment with N=1)

@tlively tlively merged commit 54f9f7a into main Apr 14, 2026
16 checks passed
@tlively tlively deleted the parser-slowdown branch April 14, 2026 15:29

2 participants