feat: Text2SQL — guarded natural-language query (sqlglot AST + mandatory readonly)#2
Open
dividduang wants to merge 5 commits into
added 5 commits
June 22, 2026 22:32
- Add text2sql engine: guardrails, schema metadata, readonly DB access
- Add dataset/table/example CRUD, schema, service layer
- Wire text2sql capability and v1 router endpoint
- Add web search (Exa/Tavily) and Text2SQL settings in plugin.toml
- Add guardrails tests
…config mgmt
- Resolve conflicts in plugin.toml, .env.example, 8 SQL files
- Reassign Text2SQL snowflake menu IDs 840-843 -> 849-852 to avoid collisions with upstream's EditAIDefaultModel/AIConfigManage/QuickPhrase
- Preserve all Text2SQL python (crud/model/schema/service/text2sql/tests)
- Adopt upstream: default_model mgmt, AI config menu, chat refactor
Address upstream review blockers for the Text2SQL feature.

Security (fail-closed):
- Rewrite guardrail with sqlglot AST: full schema.table allowlist (fixes mysql.user->user namespace-collision bypass), reject tableless recon (@@hostname/USER()/SLEEP/LOAD_FILE...), deny dangerous funcs/vars, scan for write/DDL nodes (DELETE...RETURNING in subquery), LIMIT clamp.
- Make readonly DB mandatory (no main-DB fallback).
- Wire AI_TEXT2SQL_ENABLED (capability builder + run_query).
- _execute_final: add asyncio.wait_for timeout; remove dead Text2SqlTimeoutError.

Model consolidation (M4):
- Add AIDefaultModelScene.text2sql; resolve via ai_default_model_service.
- Remove AI_TEXT2SQL_PROVIDER_ID / AI_TEXT2SQL_MODEL_ID.

Tests: 41 guardrail cases incl. namespace-collision, tableless recon, INTO OUTFILE, DELETE...RETURNING, large-LIMIT clamp.
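The namespace-collision fix in this commit comes down to comparing fully qualified names. The following is a minimal sketch of that comparison only, not the PR's guardrail: it assumes table references have already been extracted from the parsed statement as (schema, table) pairs, and `TableRef`, `GuardrailViolation`, `check_tables`, and the `default_schema` qualification rule are all hypothetical names introduced for illustration.

```python
from typing import Iterable, Optional, Set, Tuple

# (schema or None, table), as assumed to come out of an AST walk.
TableRef = Tuple[Optional[str], str]


class GuardrailViolation(Exception):
    """Hypothetical error type for this sketch."""


def check_tables(refs: Iterable[TableRef], allowlist: Set[str], default_schema: str) -> None:
    """Raise unless every reference is on the allowlist by full schema.table name."""
    refs = list(refs)
    if not refs:
        # Tableless statements (e.g. SELECT @@hostname) are refused outright.
        raise GuardrailViolation("tableless query rejected")
    for schema, table in refs:
        # Assumption: unqualified names resolve to the dataset's default schema.
        qualified = f"{schema or default_schema}.{table}".lower()
        # Compare the whole qualified name; the schema is never stripped.
        if qualified not in allowlist:
            raise GuardrailViolation(f"table not allowed: {qualified}")
```

With an allowlist of `{"app.user"}`, a reference to `mysql.user` is rejected because the comparison is made on `mysql.user`, never on the bare `user`.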
Replace the bespoke 3-tool pydantic-ai Agent (list_tables/describe_table/execute_sql) with a single structured-output call (output_type=Text2SqlResult, no tools). Table/column context is pre-fetched server-side and inlined into the system prompt.

Why: the tool loop duplicated the plugin's existing capability/builtin_toolset pipeline (reviewer 'major reject reason'). The execute_sql self-correction loop was redundant -- the final SQL is re-guarded + re-executed by _execute_final regardless.

Net effect: smaller attack surface (the Agent can no longer execute SQL at all), one LLM round-trip instead of N.

run_query return contract unchanged. _resolve_model / _execute_final / _write_history untouched. Drops now-unused imports (json, guardrail exceptions).
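The flow this commit describes (the model returns a structured result, and only the server re-guards and executes it) can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's code: the `str` field types on `Text2SqlResult`, the `execute_final` signature, and the `guard` and `run_readonly` callables are hypothetical stand-ins for the real `_execute_final`, guardrail, and readonly DB layer.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, List


@dataclass(frozen=True)
class Text2SqlResult:
    """Structured LLM output: the generated SQL plus a short summary."""
    sql: str
    summary: str


async def execute_final(
    result: Text2SqlResult,
    guard: Callable[[str], str],
    run_readonly: Callable[[str], Awaitable[List[Any]]],
    timeout_s: float,
) -> List[Any]:
    """Re-guard the model's SQL, then run it on the readonly connection with a timeout."""
    # The guard raises on any violation, so unvetted SQL never reaches the DB.
    safe_sql = guard(result.sql)
    # asyncio.wait_for cancels the query coroutine and raises a timeout error on expiry.
    return await asyncio.wait_for(run_readonly(safe_sql), timeout=timeout_s)
```

Because the model only ever produces data (`sql`, `summary`), execution stays entirely in server code that applies the guard and the timeout on every call.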
The chat capability tool text2sql_query was described as 'FBA 业务数据' ("FBA business data") with order/supplier examples, so the model did not invoke it for log/count questions (e.g. 'how many operation logs today'). Broaden the description to explicitly cue logs/counts/stats and state that any database-data question should prefer this tool.
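Member
Suggest making this a standalone plugin. When vibe coding, send this to the AI: "use fba skills depends_on to split the text2sql plugin out as a standalone plugin".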
What
Text2SQL: a user asks a natural-language question, an LLM generates SQL, and the system executes it against a SELECT-only readonly account and returns the rows. It is exposed as a chat capability (the `text2sql_query` function tool) plus a standalone `/queries` endpoint, with dataset/table/example management.

Security model (fail-closed)
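Fail-closed here means that a missing or disabled setting stops execution instead of degrading to a less safe path. A minimal sketch of such a gate follows. It assumes the settings arrive as a plain string mapping (how the plugin actually loads `plugin.toml` values is not shown in this PR text), and `resolve_readonly_settings`, `Text2SqlDisabled`, and `ReadonlyNotConfigured` are hypothetical names.

```python
from typing import Dict, Mapping

REQUIRED_READONLY_KEYS = (
    "AI_TEXT2SQL_READONLY_HOST",
    "AI_TEXT2SQL_READONLY_PORT",
    "AI_TEXT2SQL_READONLY_USER",
    "AI_TEXT2SQL_READONLY_PASSWORD",
)


class Text2SqlDisabled(Exception):
    """Hypothetical: raised when the feature flag is not explicitly enabled."""


class ReadonlyNotConfigured(Exception):
    """Hypothetical: raised when any readonly connection setting is missing."""


def resolve_readonly_settings(settings: Mapping[str, str]) -> Dict[str, str]:
    """Return the readonly connection settings, or refuse; never fall back."""
    # The flag defaults to off: anything other than an explicit "true" refuses.
    if settings.get("AI_TEXT2SQL_ENABLED", "false").strip().lower() != "true":
        raise Text2SqlDisabled("AI_TEXT2SQL_ENABLED is not true")
    missing = [k for k in REQUIRED_READONLY_KEYS if not settings.get(k, "").strip()]
    if missing:
        # No fallback to the writable main DB: an incomplete config is an error.
        raise ReadonlyNotConfigured("missing: " + ", ".join(missing))
    return {k: settings[k] for k in REQUIRED_READONLY_KEYS}
```

The important property is the absence of any `else` branch that returns main-DB credentials: every incomplete configuration ends in an exception.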
- sqlglot AST guardrail (`text2sql/guardrails.py`): single SELECT only; every `Table` node is checked against the dataset allowlist by full `schema.table`, with no namespace stripping, so `mysql.user` / `information_schema.*` exfiltration via name collision is blocked. Rejects tableless recon (`@@hostname`, `USER()`, `VERSION()`, `SLEEP`, `LOAD_FILE`…), dangerous functions, system/session variables, and write/DDL nodes anywhere in the tree (including PostgreSQL `DELETE … RETURNING` inside a subquery). Clamps `LIMIT` to `max_rows`.
- Mandatory readonly DB (`text2sql/readonly_db.py`): if `AI_TEXT2SQL_READONLY_*` is unset, execution refuses; there is no fallback to the writable main DB.
- `AI_TEXT2SQL_ENABLED` gate (default `false`) on both the capability builder and `/queries`.
- The final SQL is re-guarded in `_execute_final` (defense in depth); the agent itself never executes SQL.

Architecture
- A single LLM call returns `{sql, summary}` via structured output. This avoids a second pydantic-ai Agent with its own tool loop / model resolution / history (the prior duplication).
- The model is resolved from `AIDefaultModelScene.text2sql` via `ai_default_model_service` (no parallel `PROVIDER_ID`/`MODEL_ID` config).
- … `ai_*`/`gen_*` plugin tables.

Tests
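A pure-logic guardrail case needs no database connection. As a hedged illustration of the large-`LIMIT` clamp case only, the table-driven check below uses `clamp_limit`, a hypothetical helper working on plain integers; the PR's guardrail applies the clamp to the parsed statement instead.

```python
from typing import Optional


def clamp_limit(requested: Optional[int], max_rows: int) -> int:
    """Return the row limit to enforce: never more than max_rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    if requested is None:
        # A query without LIMIT still gets the cap applied.
        return max_rows
    return min(requested, max_rows)


# (requested LIMIT, max_rows, expected effective LIMIT)
CASES = [
    (None, 100, 100),       # no LIMIT in the query
    (10, 100, 10),          # small LIMIT is kept
    (100, 100, 100),        # boundary value
    (1_000_000, 100, 100),  # large LIMIT is clamped
]

for requested, max_rows, expected in CASES:
    assert clamp_limit(requested, max_rows) == expected
```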
41 guardrail cases (pure-logic, no DB) covering: namespace-collision bypass, tableless recon, all DML/DDL, multi-statement, `INTO OUTFILE`, `DELETE … RETURNING` in subquery, UNION/CTE exfil, large-`LIMIT` clamp.

Config (`plugin.toml`)
`AI_TEXT2SQL_ENABLED` (default false), `_SCHEMA`, `_MAX_ROWS`, `_TIMEOUT`, `_MAX_RETRIES`, `_READONLY_{HOST,PORT,USER,PASSWORD}`.

Follow-ups (not in this PR)