feat(datafabric): inject ontology schema into inner SQL agent system prompt#911

Open
sankalp-uipath wants to merge 24 commits into main from feat/datafabric-ontology-fetch-tool

Conversation

@sankalp-uipath

@sankalp-uipath sankalp-uipath commented Jun 16, 2026

What

Grounds the Data Fabric inner SQL agent with the configured ontology by fetching it deterministically and embedding it in the inner system prompt — there is no LLM-decided tool. When a context has a nested ontologySet, the ontology's OWL schema is fetched once (at sub-graph build) and injected, so the model always writes SQL against the real schema.

  • ontology_fetcher.py: fetch_ontology_text(entities_service, ontologies) fetches each configured ontology's OWL via EntitiesService.get_ontology_file_async and concatenates the results; individual failures degrade gracefully to "use the entity schemas" rather than raising.
  • datafabric_tool.py — the handler's _ensure_datafabric_graph eagerly fetches the ontology text (once, cached with the compiled graph) when ontologies are configured and the flag is on, and passes it into the graph.
  • datafabric_prompt_builder.py — the ## Available Ontology section embeds the OWL as the authoritative schema the LLM grounds on.
  • datafabric_subgraph.py — the inner agent has a single tool, execute_sql; the ontology arrives via the prompt, not a tool.
  • context_tool.py — resolves ontologies from the nested ontologySet (flag-gated at the entry).
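
A minimal sketch of the fetch-and-degrade behaviour described in the list above. It is not the PR's code: `OntologyFileSource`, `get_owl`, `FakeSource`, and the fallback wording are hypothetical stand-ins, since the real call goes through EntitiesService.get_ontology_file_async, whose exact signature is not shown here.

```python
import asyncio
from typing import Optional, Protocol


class OntologyFileSource(Protocol):
    """Hypothetical stand-in for the service that serves ontology files."""

    async def get_owl(self, name: str, folder_key: Optional[str]) -> str: ...


# Assumed fallback wording; the PR only says failures degrade to the entity schemas.
FALLBACK = "Ontology '{name}' could not be loaded; use the entity schemas."


async def fetch_ontology_text_sketch(
    source: OntologyFileSource,
    ontologies: list[tuple[str, Optional[str]]],
) -> str:
    """Fetch every configured ontology and concatenate the results.

    A failed fetch is replaced by a fallback note instead of raising.
    """
    blocks: list[str] = []
    for name, folder_key in ontologies:
        try:
            owl = await source.get_owl(name, folder_key)
            blocks.append(f"### Ontology: {name}\n{owl}")
        except Exception:  # degrade gracefully, never fail the run
            blocks.append(FALLBACK.format(name=name))
    return "\n\n".join(blocks)


class FakeSource:
    """In-memory source used only to exercise the sketch."""

    async def get_owl(self, name: str, folder_key: Optional[str]) -> str:
        if name == "broken":
            raise RuntimeError("simulated fetch failure")
        return f"<owl:Ontology rdf:about='{name}'/>"


def demo_fetch() -> str:
    return asyncio.run(
        fetch_ontology_text_sketch(FakeSource(), [("library", "f1"), ("broken", None)])
    )
```

Calling `demo_fetch()` returns one block for the ontology that loaded and one fallback note for the one that failed, so the caller always gets a string to embed.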

Why

Deterministic injection guarantees the model always has the ontology in context. An optional fetch_ontology tool (the earlier approach) is frequently not called by the model, so grounding silently doesn't happen. Injecting it directly removes that failure mode — and it keeps the inner graph a plain single-tool SQL loop.

Notes

  • Feature-flagged (DataFabricOntologyEnabled, default off) at every entry — ontology resolution (context_tool) and the eager fetch (datafabric_tool) are each gated on the same shared constant. Off ⇒ no fetch, byte-for-byte the original entities-only prompt.
  • Ontology names/folders are pinned from the agent definition, never supplied by the LLM.
  • Graceful degrade: a missing/oversized ontology never fails the run.
  • Resolution uses name + folderId only; aligned with SDK #1728 dropping the unused ontology referenceKey.
  • Depends on SDK #1728 (uipath 2.11.17 / uipath-platform 0.1.83, nested ontologySet model). Unit/lint CI stays red until #1728 merges and publishes; then uv lock (range uipath<2.12.0) turns them green. Do not merge a .dev pin.
  • Runtime delivery of the flag also requires it in uipath-agents-python's _ALL_FLAGS prefetch list + the gitops flag (gitops-centralized-cluster PR) deployed for the target tenants.
  • Follow-up PR adds R2RML fetching on top of this (branch feat/datafabric-ontology-r2rml-grounding).
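
The flag gating in the notes above can be sketched roughly as follows. The flag name comes from the PR; the `flags` dictionary lookup, `BASE_PROMPT`, and `build_inner_prompt` are assumptions made for illustration, not the repository's actual flag client or prompt builder.

```python
from typing import Optional

FLAG_NAME = "DataFabricOntologyEnabled"  # flag name taken from the PR notes

# Hypothetical placeholder for the original entities-only prompt.
BASE_PROMPT = "You are a SQL expert. Use the entity schemas below."


def build_inner_prompt(flags: dict[str, bool], ontology_text: Optional[str]) -> str:
    """Return the entities-only prompt unless the flag is on AND text exists."""
    if not flags.get(FLAG_NAME, False) or not ontology_text:
        # Flag off (the default) or nothing fetched: prompt is unchanged.
        return BASE_PROMPT
    return f"{BASE_PROMPT}\n\n## Available Ontology\n\n{ontology_text}"
```

With the flag absent or off, the function returns `BASE_PROMPT` unchanged, which is the "byte-for-byte the original prompt" property the notes call out.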

Copilot AI review requested due to automatic review settings June 16, 2026 12:50

Copilot AI left a comment

Pull request overview

Adds an optional fetch_ontology inner tool to the Data Fabric SQL sub-agent so the inner LLM can retrieve a configured ontology’s OWL schema from the QueryEngine REST API and use it to generate semantically-correct SQL.

Changes:

  • Introduces an ontology REST client (fetch_ontology_owl) with name validation and size limiting.
  • Adds a fetch_ontology leaf tool with an instance-level cache and wires it into the inner Data Fabric subgraph alongside execute_sql.
  • Threads ontology_name / folder_key into the Data Fabric tool construction path (with an env-var fallback).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/uipath_langchain/agent/tools/datafabric_tool/ontology_fetch_tool.py | New leaf tool (fetch_ontology) and cached fetcher wrapper for inner SQL agent use. |
| src/uipath_langchain/agent/tools/datafabric_tool/ontology_client.py | New client helper to fetch OWL content via EntitiesService.request_async, including name validation and payload cap. |
| src/uipath_langchain/agent/tools/datafabric_tool/models.py | Adds an intentionally-empty args schema (OntologyFetchInput) for the new tool. |
| src/uipath_langchain/agent/tools/datafabric_tool/datafabric_tool.py | Plumbs ontology_name / folder_key into the query handler creation (currently with env-var fallback). |
| src/uipath_langchain/agent/tools/datafabric_tool/datafabric_subgraph.py | Adds optional fetch_ontology tool binding and dispatch-by-tool-name inside the inner subgraph. |

Comment thread src/uipath_langchain/agent/tools/datafabric_tool/datafabric_subgraph.py Outdated
Comment thread src/uipath_langchain/agent/tools/datafabric_tool/datafabric_tool.py Outdated
Comment thread src/uipath_langchain/agent/tools/datafabric_tool/ontology_client.py Outdated
Comment on lines +47 to +50
The result is cached on this instance. Because the instance lives as long
as the compiled sub-graph (which the handler caches), repeated calls across
queries hit the API at most once, surviving the per-query reset of the
inner sub-graph state.
safe_name = _validate_ontology_name(ontology_name)
# Same datafabric_ service the entities calls target; matches the
# QueryEngine ontology route GET /ontologies/{ontologyName}/files/{fileType}.
endpoint = f"datafabric_/api/ontologies/{safe_name}/files/owl"

Contributor

these need to be stitched in uipath-python

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +142 to +150
@@ -131,35 +143,62 @@
*[self._execute_tool_call(tc) for tc in last.tool_calls]
)
tool_messages = [msg for msg, _ in results]
all_succeeded = bool(results) and all(success for _, success in results)
# End as soon as ANY tool call is a terminal success (a row-returning
# execute_sql). `any` not `all`: a non-terminal tool (e.g. fetch_ontology)
# co-issued in the same turn must not prevent a successful SQL from ending
# the loop.
any_succeeded = any(success for _, success in results)

Contributor

Instead of the any_ check, check the feature flag (FF) to see which graph gets constructed.

Comment on lines 198 to 202
ToolMessage(
content=str(result),
tool_call_id=tool_call["id"],
- name="execute_sql",
+ name=name,
),
Comment on lines +165 to +169
# Ontologies are first-class bindings, mirroring entity_set: a LIST, each
# carrying its own folderId so it is resolved from its own folder (entities
# may also span several folders). Empty → no fetch tool added. Config comes
# only from the agent definition (the binding), never from process env.
entity_folders = {
Comment on lines +98 to +103
out = await graph.tool_node(DataFabricSubgraphState(messages=[ai]))
# SQL returned rows → terminal, even though fetch_ontology (non-terminal)
# was co-issued in the same turn. This is the all()->any() fix.
assert out["last_tool_success"] is True
assert len(out["messages"]) == 2

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Comment on lines +14 to +16
import logging
from typing import Any

return self._cached
if not self._ontologies:
return "No ontologies are configured for this agent."
blocks = [await self._fetch_one(name, folder) for name, folder in self._ontologies]
tool_messages = [msg for msg, _ in results]
return {
"messages": tool_messages,
"iteration_count": state.iteration_count + len(last.tool_calls),
Comment thread src/uipath_langchain/agent/tools/datafabric_tool/datafabric_tool.py Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Comment on lines +145 to +157
# End as soon as ANY tool call is a terminal success (a row-returning
# execute_sql). `any` not `all`: a non-terminal tool (e.g. fetch_ontology)
# co-issued in the same turn must not prevent a successful SQL from ending
# the loop.
any_succeeded = any(success for _, success in results)
# When short-circuiting to END, return ONLY the terminal-success
# ToolMessages so the outer agent's result is the query rows — not a
# co-issued fetch_ontology's OWL. On a non-terminal turn keep all messages
# so the inner LLM can use them on its next pass.
if any_succeeded:
tool_messages = [msg for msg, success in results if success]
else:
tool_messages = [msg for msg, _ in results]
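
The `any`-based termination and message filtering quoted above can be exercised in isolation. This is a sketch only: plain strings stand in for ToolMessage objects, and `summarize_turn` is a hypothetical helper name, not a function from the PR.

```python
def summarize_turn(results: list[tuple[str, bool]]) -> tuple[list[str], bool]:
    """Decide which messages to return and whether the loop should end.

    Each item pairs a tool message with a 'terminal success' flag
    (True only for a row-returning execute_sql in the quoted code).
    """
    any_succeeded = any(success for _, success in results)
    if any_succeeded:
        # Short-circuit: keep only the terminal-success messages.
        messages = [msg for msg, success in results if success]
    else:
        # Non-terminal turn: keep everything for the next LLM pass.
        messages = [msg for msg, _ in results]
    return messages, any_succeeded
```

For a turn such as `[("<owl/>", False), ("rows: 3", True)]`, an `all`-based check would keep the loop running, whereas the `any`-based version ends it and returns only the SQL rows.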
Comment on lines +55 to +57
self._entities_service = entities_service
self._ontologies = ontologies
self._cached: str | None = None

Contributor

The outer agent does not support true parallel invocation. It executes parallel tool calls sequentially, so a given SQL agent instance is never called concurrently.

Comment on lines +83 to +95
async def __call__(self, **_kwargs: Any) -> str:
"""Fetch all configured ontologies (cached), concatenated for the LLM."""
if self._cached is not None:
return self._cached
if not self._ontologies:
return "No ontologies are configured for this agent."
# Fetch all ontologies concurrently — each fetch is independent; order is
# preserved by gather, so the concatenation is deterministic.
blocks = await asyncio.gather(
*(self._fetch_one(name, folder) for name, folder in self._ontologies)
)
self._cached = "\n\n".join(blocks)
return self._cached
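
The comment in the snippet above relies on `asyncio.gather` returning results in the order the awaitables were passed, regardless of which finishes first. A small self-contained check of that property, with hypothetical `fetch_one` and `fetch_all` helpers and artificial delays in place of real fetches:

```python
import asyncio


async def fetch_one(name: str, delay: float) -> str:
    """Simulate a fetch that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"[{name}]"


async def fetch_all(specs: list[tuple[str, float]]) -> str:
    # gather runs the fetches concurrently but returns results in input order.
    blocks = await asyncio.gather(*(fetch_one(n, d) for n, d in specs))
    return "\n\n".join(blocks)


def demo_order() -> str:
    # "slow" is listed first and finishes last; it still comes first in the output.
    return asyncio.run(fetch_all([("slow", 0.02), ("fast", 0.0)]))
```

Because the order follows the input list rather than completion time, the concatenated text is the same on every run.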
Comment on lines +28 to +30
tool = create_datafabric_query_tool(resource, MagicMock()) # type: ignore[arg-type]

assert tool.coroutine._ontologies == [("library", "f1")]
Comment on lines +36 to +38
tool = create_datafabric_query_tool(resource, MagicMock()) # type: ignore[arg-type]

assert tool.coroutine._ontologies == [("finance", "f2")]
Comment on lines +44 to +46
tool = create_datafabric_query_tool(resource, MagicMock()) # type: ignore[arg-type]

assert tool.coroutine._ontologies == []

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.

Comment on lines 158 to 162
return {
"messages": tool_messages,
"iteration_count": state.iteration_count + len(last.tool_calls),
- "last_tool_success": all_succeeded,
+ "last_tool_success": any_succeeded,
}
# Inner toolset: always execute_sql; optionally an LLM-decided
# fetch_ontology tool when one or more ontologies are configured.
inner_tools: list[BaseTool] = [self._execute_sql_tool]
if ontologies:

Contributor

EnabledNewLlmClients <- check the feature flag implementation of this to ensure our feature is behind a feature flag.

@sankalp-uipath sankalp-uipath Jun 29, 2026

Author

yes, will add it.

# fetch_ontology tool when one or more ontologies are configured.
inner_tools: list[BaseTool] = [self._execute_sql_tool]
if ontologies:
inner_tools.append(

Contributor

This doesn't update the subgraph, correct?

entity set) as ``ontologySet`` items. Each carries its own ``folderId``, so
it is fetched from its own folder.
"""
items = getattr(resource, "ontology_set", None) or []

Contributor

Same as other PR. ontology_set?

Author

yes

@sankalp-uipath sankalp-uipath force-pushed the feat/datafabric-ontology-fetch-tool branch from 8b04daa to 86e5912 Compare June 29, 2026 20:13
Comment on lines +60 to +66
def test_fetch_ontology_bound_only_when_ontologies(make_graph):
without = make_graph(None)
assert "execute_sql" in without._tools_by_name
assert "fetch_ontology" not in without._tools_by_name

with_onto = make_graph([("library", None)])
assert "fetch_ontology" in with_onto._tools_by_name

Contributor

nit: splitting this test into two (should bind when present / should not bind when absent) is trivial and lets you know instantly what failed from the test name alone, without checking the assertion message.
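
One way the suggested split could look. In the PR, make_graph is a pytest fixture; here `FakeGraph` and a plain `make_graph` function are hypothetical stand-ins so the sketch runs on its own.

```python
class FakeGraph:
    """Hypothetical stand-in for the sub-graph under test."""

    def __init__(self, ontologies):
        self._tools_by_name = {"execute_sql": object()}
        if ontologies:
            # Bind the optional tool only when ontologies are configured.
            self._tools_by_name["fetch_ontology"] = object()


def make_graph(ontologies):
    """Plain factory standing in for the PR's pytest fixture."""
    return FakeGraph(ontologies)


def test_fetch_ontology_not_bound_without_ontologies():
    graph = make_graph(None)
    assert "execute_sql" in graph._tools_by_name
    assert "fetch_ontology" not in graph._tools_by_name


def test_fetch_ontology_bound_with_ontologies():
    graph = make_graph([("library", None)])
    assert "fetch_ontology" in graph._tools_by_name
```

With two tests, a failure report names the broken case directly.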

Comment on lines +161 to +164
# An ontology context is not a standalone tool — it only grounds the Data
# Fabric entity tool, which gathers it via resolve_context_ontologies.
if resource.context_type == AgentContextType.DATA_FABRIC_ONTOLOGY:
return None

Contributor

If it is not a standalone tool at runtime, I think it is confusing to model it as a top level resource at design time. So far, all "resource nodes" in a lowcode agent (either standalone or part of flow), are independently executable and show up in traces. This is now a different paradigm, it is an optional helper tool that will be part of another tool's subgraph.

That being said, this only applies to how it's modeled today. If we do plan to expand ontology support in the future so that ontologies can actually be queried (via something like SPARQL statements, for instance), then defining them at the top level (at least in the package mapping) is better for future-proofing. For now, we can figure out a less confusing design-time experience.

Author

Yes, we plan to expand ontology support and make it a primary design-time experience, i.e., the user will select the ontologies and the entities will then be resolved internally. That is why we made it a top-level resource as part of iterative development.

Comment on lines +156 to +173
lines.append("## Available Ontology (authoritative semantic schema)")
lines.append("")
lines.append(
f"This agent has a semantic ontology attached for these entities: "
f"{names}. It is the authoritative source for the exact column names, "
"value formats (date formats, codes, zero-padding), allowed values, "
"and the relationships between entities — richer and more reliable "
"than the field list below, which omits value formats and semantics."
)
lines.append("")
lines.append(
"**Before writing any SQL, call the `fetch_ontology` tool once** to "
"load it, then base your column names, filter values, and joins on "
"what it says. The entity tables below are a quick reference only; "
"the ontology is the source of truth when they disagree."
)
lines.append("")

Contributor

nit: could be cleaner to have this as a single formatted string depending on names instead of individually applying each line like this.

Applicable to the existing sql_expert_system_prompt as well, but that one wasn't introduced by this PR
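
A sketch of the single-formatted-string style the nit suggests, in place of repeated lines.append calls. The prompt wording is abbreviated from the snippet above, and `ontology_section` is a hypothetical function name.

```python
from textwrap import dedent


def ontology_section(names: str) -> str:
    """Render the ontology section as one dedented template."""
    return dedent(
        f"""\
        ## Available Ontology (authoritative semantic schema)

        This agent has a semantic ontology attached for these entities: {names}.
        It is the authoritative source for column names, value formats, allowed
        values, and relationships between entities.
        """
    )
```

Keeping the section in one template makes its final layout visible at a glance, at the cost of needing `dedent` to strip the source indentation.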

Author

Fixed, please review. Also there are some changes linked to your other comment in the data fabric prompt builder (adding ontology text in the prompt).

Comment on lines +167 to +170
# When short-circuiting to END, return ONLY the terminal-success
# ToolMessages so the outer agent's result is the query rows — not a
# co-issued fetch_ontology's OWL. On a non-terminal turn keep all messages
# so the inner LLM can use them on its next pass.

Contributor

Isn't concurrent execution of an ontology retrieval and a data service query an anomaly? It doesn't seem correct. Why not mechanically enforce ontology retrieval and inject the result into the context? When is it useful for the LLM to choose not to fetch the ontology?

@sankalp-uipath sankalp-uipath Jul 1, 2026

Author

Agreed. We did this earlier to support future use cases where the LLM queries the ontology (for example, using SPARQL) instead of receiving the complete ontology.
But I agree that right now it makes more sense to mechanically inject it into the system prompt.
I have made the changes; please review again.

Comment on lines +50 to +58
ontologies: list[tuple[str, str | None]] = []
for resource in resources:
if (
isinstance(resource, AgentContextResourceConfig)
and resource.is_datafabric_ontology
):
for item in resource.ontology_set or []:
ontologies.append((item.name, item.folder_key))
return ontologies

Contributor

If I understand correctly, we implicitly assume all ontologies will apply to this data service entity context. Shouldn't the link be defined more explicitly? I.e., either:
a) when defining a Data Service Context resource, you can also specify one or more ontologies, or
b) when defining the Ontology Context resource, you specify the list of entities it describes.

@sankalp-uipath sankalp-uipath Jul 1, 2026

Author

I am currently adding the R2RML mapping, which will let the LLM node implicitly resolve the entities from the ontologies at agent runtime (this is in a separate PR that is currently in progress).

@sankalp-uipath sankalp-uipath force-pushed the feat/datafabric-ontology-fetch-tool branch from fbb0bea to 9a4a187 Compare July 1, 2026 09:01
@sankalp-uipath sankalp-uipath force-pushed the feat/datafabric-ontology-fetch-tool branch from 9a4a187 to a35807b Compare July 1, 2026 10:04
@sonarqubecloud

sonarqubecloud Bot commented Jul 1, 2026

Quality Gate failed

Failed conditions
72.7% Coverage on New Code (required ≥ 90%)

See analysis details on SonarQube Cloud

@sankalp-uipath sankalp-uipath changed the title from "feat(datafabric): add fetch_ontology tool to DF inner SQL agent" to "feat(datafabric): inject ontology schema into inner SQL agent system prompt" Jul 1, 2026
5 participants