⚡️ Add few-shot examples and a "SQON Cheat Sheet" to improve LLM SQON Generation #1078
Open
mistryrn wants to merge 1 commit into
… Generation

* Added few-shot examples of valid SQON to the `sqon` parameter description in the `execute-query` tool
* Added a "SQON Cheat Sheet" as the `text` response from the `get-sqon-schema` tool, while still returning the full schema response in `structuredContent`
* The cheat sheet is structured to help LLMs understand how to structure SQON and build a query, with examples to reinforce best practices and override any erroneous/outdated training data
* Updated the `execute-query` tool description to reinforce utilizing the "SQON cheat sheet"
* Updated `get-sqon-schema` integration tests to reflect new response structure
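Embedding few-shot examples in a parameter description, as the commit message describes, might look roughly like the sketch below. This is not the PR's actual code: `FewShotExample`, `FEW_SHOT_EXAMPLES`, `buildSqonParamDescription`, the field names, and the exact SQON node shape (`op` / `content` / `fieldName` / `value`) are assumptions for illustration and should be checked against `SqonSchema`.

```typescript
// Hypothetical few-shot examples. The requests, field names, and values are
// invented, and the SQON node shape is an assumption to verify against SqonSchema.
type FewShotExample = { request: string; sqon: unknown };

const FEW_SHOT_EXAMPLES: FewShotExample[] = [
  {
    request: 'records where gender is Female',
    sqon: {
      op: 'and',
      content: [{ op: 'in', content: { fieldName: 'gender', value: ['Female'] } }],
    },
  },
  {
    request: 'records where gender is Female and country is Canada or Mexico',
    sqon: {
      op: 'and',
      content: [
        { op: 'in', content: { fieldName: 'gender', value: ['Female'] } },
        { op: 'in', content: { fieldName: 'country', value: ['Canada', 'Mexico'] } },
      ],
    },
  },
];

// Build a description string that pairs each natural-language request with
// its serialized SQON, so the model sees complete, valid examples.
function buildSqonParamDescription(examples: FewShotExample[]): string {
  const blocks = examples.map(
    (example, index) =>
      `Example ${index + 1}: ${example.request}\n${JSON.stringify(example.sqon)}`,
  );
  return ['A SQON filter object. Valid examples:', ...blocks].join('\n\n');
}

const description = buildSqonParamDescription(FEW_SHOT_EXAMPLES);
console.log(description);
```

Because the `sqon` parameter is typed as unknown at the protocol level, a description string like this is one of the few places where a small model sees the expected structure before calling the tool.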
mistryrn commented Jun 25, 2026
| **Fix:** (a) Docs/schema comments pass: update README.md:13, docs/usage/02-arranger-components.md (title + line 29), configs.json.schema:28, and console strings in configs/index.ts. (b) Identifier rename pass (separate commit): buildCatalogsFromFolder -> buildCatalogsFromDirectory, folderName -> directoryName in apps/search-server/src/configs/. (c) Cross-references: add pointer to docs/concepts.md early in docs/usage/02-arranger-components.md; introduce "filter clause" for leaf nodes in docs/sqon/03-sqon-in-detail.md. | ||
| **Standalone:** yes; (a) is docs-only; (b) is a mechanical rename; (c) is a docs addition. All three independent. | ||
| ### `docs/concepts.md` SQON examples use `field` instead of `fieldName` |
Contributor
Author
Removing this from tech debt, as the docs in question have been corrected as of 834ad94#diff-c282aa25f1dabaaba19197006c4aa6bab00e8d2b2100375c42b41b7cf4bc55cb
| * SQON schema in `@overture-stack/sqon` (modules/sqon/src/schema): the operator names and node | ||
| * shapes below are derived from it. Every example here is checked against `SqonSchema`. | ||
| */ | ||
| const SQON_CHEAT_SHEET = `SQON cheat sheet (Serializable Query Object Notation). |
Contributor
Author
This is the result of a few revisions of a "SQON Cheat Sheet". It is intended to be returned alongside the full JSON schema response (the schema is returned as `structuredContent`, the cheat sheet as `text`) to guide the LLM in generating valid SQON. It can certainly be refined further, but so far the results are promising 🙏
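The split described in this comment can be sketched as follows. This is a minimal illustration, not the PR's implementation: `ToolResult` is a local stand-in for the MCP tool result shape (a `content` array of text blocks plus an optional `structuredContent` object), and `buildSqonSchemaResult`, `CHEAT_SHEET_TEXT`, and `FULL_SCHEMA` are hypothetical names with placeholder values.

```typescript
// Local stand-in for the MCP tool result shape; a real server would use
// the types provided by its MCP SDK instead.
type ToolResult = {
  content: Array<{ type: 'text'; text: string }>;
  structuredContent?: Record<string, unknown>;
};

// Placeholder values (assumptions): the real cheat sheet and schema live in the PR.
const CHEAT_SHEET_TEXT = 'SQON cheat sheet (Serializable Query Object Notation). ...';
const FULL_SCHEMA: Record<string, unknown> = { type: 'object' };

// Human/LLM-readable guidance goes in the text block; the machine-readable
// JSON schema is still returned in full as structuredContent.
function buildSqonSchemaResult(
  cheatSheet: string,
  schema: Record<string, unknown>,
): ToolResult {
  return {
    content: [{ type: 'text', text: cheatSheet }],
    structuredContent: schema,
  };
}

const result = buildSqonSchemaResult(CHEAT_SHEET_TEXT, FULL_SCHEMA);
console.log(result.content[0].text);
```

Keeping the schema in `structuredContent` means clients that validate against it lose nothing, while models that only read the text block get a generation guide.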
Summary
Enhancements to PR #1077 to improve the likelihood that smaller LLMs such as `gemma-4-e4b` can successfully generate valid SQON for use with the `execute-query` tool. Adds few-shot examples of valid SQON, as well as clear instructions on how to build SQON from a user's query.

Issues
Description of Changes
Initial testing of the `execute-query` tool with a small local LLM (`gemma-4-e4b`) revealed that, despite having access to the full SQON schema from the `get-sqon-schema` tool, small LLMs still failed to produce valid SQON. An analysis led by a Claude thinking model determined 5 potential root causes for the failures, summarized below:

* `get-sqon-schema` returns a validation artifact, not a generation guide
* … `$refs`
* `sqon` input parameter to `execute-query` is `Zod.unknown()`, which provides zero structure at the protocol level

The docs have since been updated (thanks, @justincorrigible!), so this PR focuses on issues 1 and 2 as follows:

* … `execute-query`/SQON-generation flow, aiming to take precedence over any prior training.
* … `text` response of `get-sqon-schema` to be a SQON generation guide, while keeping the `structuredContent` as the full JSON schema.

Local re-testing with `gemma-4-e4b` following these 2 enhancements has shown promising results so far, allowing the model to generate valid SQON for some basic example queries.

MCP Server

* Added few-shot examples of valid SQON to the `sqon` parameter description in the `execute-query` tool
* Added a "SQON Cheat Sheet" as the `text` response from the `get-sqon-schema` tool, while still returning the full schema response in `structuredContent`
* Updated the `execute-query` tool description to reinforce utilizing the "SQON cheat sheet"
* Updated `get-sqon-schema` integration tests to reflect new response structure

Special Instructions
There are no additional steps required to test these changes.
Readiness Checklist
* … `.env.schema` file and documented in the README