feat(datafabric): inject ontology schema into inner SQL agent system prompt #911
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,13 @@ | ||
| """Data Fabric tool module for entity-based SQL queries.""" | ||
|
|
||
| from .datafabric_tool import ( | ||
| DATAFABRIC_ONTOLOGY_FF, | ||
| create_datafabric_query_tool, | ||
| resolve_context_ontologies, | ||
| ) | ||
|
|
||
| __all__ = [ | ||
| "DATAFABRIC_ONTOLOGY_FF", | ||
| "create_datafabric_query_tool", | ||
| "resolve_context_ontologies", | ||
| ] |
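A minimal sketch of what the newly exported `resolve_context_ontologies` helper (defined in `datafabric_tool`, next file) is expected to return. `OntologyItem`, `ContextResource`, and `resolve_pairs` are hypothetical stand-ins written for this illustration; they are not the real `AgentContextResourceConfig` model or the real helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OntologyItem:
    """Hypothetical stand-in for one entry of a context's ``ontology_set``."""

    name: str
    folder_key: str | None = None


@dataclass
class ContextResource:
    """Hypothetical stand-in for ``AgentContextResourceConfig``."""

    is_datafabric_ontology: bool = False
    ontology_set: list[OntologyItem] | None = None


def resolve_pairs(resources: list[object]) -> list[tuple[str, str | None]]:
    """Collect ``(name, folder_key)`` pairs from ontology contexts only."""
    pairs: list[tuple[str, str | None]] = []
    for resource in resources:
        if isinstance(resource, ContextResource) and resource.is_datafabric_ontology:
            # A missing ontology_set is treated as empty.
            for item in resource.ontology_set or []:
                pairs.append((item.name, item.folder_key))
    return pairs


resources = [
    ContextResource(is_datafabric_ontology=False),  # entity context: skipped
    ContextResource(
        is_datafabric_ontology=True,
        ontology_set=[OntologyItem("sales", "folder-a"), OntologyItem("hr")],
    ),
    "not a context resource",  # other resource types are ignored
]
pairs = resolve_pairs(resources)
```

Each pair carries its own folder key, which is what lets every ontology be fetched from its own folder later on.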
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,6 +29,34 @@ | |
|
|
||
| BASE_SYSTEM_PROMPT = "base_system_prompt" | ||
|
|
||
| # Feature flag gating the Data Fabric ontology grounding feature. Defaults off. | ||
| # Checked at every entry into the feature: ontology resolution (context_tool) | ||
| # and inner-tool binding (datafabric_subgraph). Single source of truth so the | ||
| # flag name can never drift between call sites. | ||
| DATAFABRIC_ONTOLOGY_FF = "DataFabricOntologyEnabled" | ||
|
|
||
|
|
||
| def resolve_context_ontologies( | ||
| resources: list[Any], | ||
| ) -> list[tuple[str, str | None]]: | ||
| """Gather ontologies from the agent's ontology context(s). | ||
|
|
||
| An ontology is configured in a dedicated ontology context (``contextType`` | ||
| ``datafabricontology``) whose ``ontologySet`` mirrors the entity context's | ||
| ``entitySet`` — by convention at most one such context per agent. Its | ||
| ontologies ground the Data Fabric query tool; each carries its own | ||
| ``folderId``, so it is fetched from its own folder. | ||
| """ | ||
| ontologies: list[tuple[str, str | None]] = [] | ||
| for resource in resources: | ||
| if ( | ||
| isinstance(resource, AgentContextResourceConfig) | ||
| and resource.is_datafabric_ontology | ||
| ): | ||
| for item in resource.ontology_set or []: | ||
| ontologies.append((item.name, item.folder_key)) | ||
| return ontologies | ||
|
Comment on lines +50 to +58
Contributor
If I understand correctly, we implicitly assume all ontologies will apply to this data service entity context. Shouldn't the link be defined more explicitly? I.e., either:
Author
I am currently adding the R2RML mapping, which will let the LLM node resolve the entities from the ontologies implicitly at agent runtime. That work is in progress in a separate PR. |
||
|
|
||
|
|
||
| class DataFabricTextQueryHandler: | ||
| """Manages lazy initialization and invocation of the Data Fabric sub-graph. | ||
|
|
@@ -44,11 +72,13 @@ def __init__( | |
| llm: BaseChatModel, | ||
| resource_description: str = "", | ||
| base_system_prompt: str = "", | ||
| ontologies: list[tuple[str, str | None]] | None = None, | ||
| ) -> None: | ||
| self._entity_set = entity_set | ||
| self._llm = llm | ||
| self._resource_description = resource_description | ||
| self._base_system_prompt = base_system_prompt | ||
| self._ontologies = ontologies or [] | ||
| self._compiled: CompiledStateGraph[Any] | None = None | ||
| self._init_lock = asyncio.Lock() | ||
|
|
||
|
|
@@ -65,9 +95,11 @@ async def _ensure_datafabric_graph(self) -> CompiledStateGraph[Any]: | |
| if self._compiled is not None: | ||
| return self._compiled | ||
|
|
||
| from uipath.core.feature_flags import FeatureFlags | ||
| from uipath.platform import UiPath | ||
|
|
||
| from .datafabric_subgraph import DataFabricGraph | ||
| from .ontology_fetcher import fetch_ontology_text | ||
|
|
||
| sdk = UiPath() | ||
| resolution = await sdk.entities.resolve_entity_set_async(self._entity_set) | ||
|
|
@@ -76,12 +108,23 @@ async def _ensure_datafabric_graph(self) -> CompiledStateGraph[Any]: | |
| "No Data Fabric entity schemas could be fetched. " | ||
| "Check entity identifiers and permissions." | ||
| ) | ||
| # Deterministically fetch the ontology (when configured AND the flag | ||
| # is on) and embed it in the inner system prompt — the LLM never has | ||
| # to decide to fetch it. | ||
| ontology_text = "" | ||
| if self._ontologies and FeatureFlags.is_flag_enabled( | ||
| DATAFABRIC_ONTOLOGY_FF, default=False | ||
| ): | ||
| ontology_text = await fetch_ontology_text( | ||
| resolution.entities_service, self._ontologies | ||
| ) | ||
| self._compiled = DataFabricGraph.create( | ||
| llm=self._llm, | ||
| entities=resolution.entities, | ||
| entities_service=resolution.entities_service, | ||
| resource_description=self._resource_description, | ||
| base_system_prompt=self._base_system_prompt, | ||
| ontology_text=ontology_text, | ||
| ) | ||
| return self._compiled | ||
|
|
||
|
|
@@ -144,6 +187,7 @@ def create_datafabric_query_tool( | |
| llm: BaseChatModel, | ||
| tool_name: str = "query_datafabric", | ||
| agent_config: dict[str, str] | None = None, | ||
| ontologies: list[tuple[str, str | None]] | None = None, | ||
| ) -> BaseTool: | ||
| """Create the ``query_datafabric`` agentic tool. | ||
|
|
||
|
|
@@ -153,17 +197,23 @@ def create_datafabric_query_tool( | |
| tool_name: Sanitized tool name from the resource. | ||
| agent_config: Optional dict with agent-level config. | ||
| Key ``base_system_prompt`` carries the outer agent's system prompt. | ||
| ontologies: ``(name, folder_key)`` pairs resolved from the context's | ||
| nested ``ontology_set`` (see ``resolve_context_ontologies``). | ||
| Empty/None → no ontology is fetched or injected. Resolution comes only from the | ||
| agent definition (the binding), never from process env. | ||
| """ | ||
| config = agent_config or {} | ||
| entity_set = [ | ||
| DataFabricEntityItem.model_validate(item.model_dump(by_alias=True)) | ||
| for item in (resource.entity_set or []) | ||
| ] | ||
| ontologies = ontologies or [] | ||
| handler = DataFabricTextQueryHandler( | ||
| entity_set=entity_set, | ||
| llm=llm, | ||
| resource_description=resource.description or "", | ||
| base_system_prompt=config.get(BASE_SYSTEM_PROMPT, ""), | ||
| ontologies=ontologies, | ||
| ) | ||
| entity_lines = [] | ||
| for e in entity_set: | ||
|
|
||
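The gating in `_ensure_datafabric_graph` above can be sketched in isolation: the ontology is fetched only when ontologies are configured and the feature flag is on, and the graph is built once on first use. `LazyGraphHandler`, `fake_fetch`, and the `flag_enabled` callable are hypothetical stand-ins, and the sketch assumes the `asyncio.Lock` guards the build with a re-check inside the lock, which the visible hunks do not show.

```python
import asyncio


class LazyGraphHandler:
    """Hypothetical miniature of the handler: builds its graph once, on first use."""

    def __init__(self, ontologies, flag_enabled, fetch):
        self._ontologies = ontologies or []
        self._flag_enabled = flag_enabled  # callable standing in for the feature-flag check
        self._fetch = fetch  # async callable standing in for fetch_ontology_text
        self._compiled = None
        self._init_lock = asyncio.Lock()
        self.fetch_calls = 0

    async def ensure_graph(self):
        if self._compiled is not None:
            return self._compiled
        async with self._init_lock:
            # Re-check inside the lock so concurrent first calls build only once.
            if self._compiled is not None:
                return self._compiled
            ontology_text = ""
            # Fetch only when ontologies are configured AND the flag is on.
            if self._ontologies and self._flag_enabled():
                self.fetch_calls += 1
                ontology_text = await self._fetch(self._ontologies)
            # A dict stands in for the compiled sub-graph.
            self._compiled = {"ontology_text": ontology_text}
            return self._compiled


async def fake_fetch(ontologies):
    await asyncio.sleep(0)  # yield control, as a real network call would
    return " + ".join(name for name, _ in ontologies)


async def demo():
    on = LazyGraphHandler([("sales", None)], lambda: True, fake_fetch)
    off = LazyGraphHandler([("sales", None)], lambda: False, fake_fetch)
    empty = LazyGraphHandler(None, lambda: True, fake_fetch)
    # Three concurrent first calls should still trigger a single fetch.
    await asyncio.gather(on.ensure_graph(), on.ensure_graph(), on.ensure_graph())
    return (
        (await on.ensure_graph())["ontology_text"], on.fetch_calls,
        (await off.ensure_graph())["ontology_text"], off.fetch_calls,
        (await empty.ensure_graph())["ontology_text"], empty.fetch_calls,
    )


results = asyncio.run(demo())
```

With the flag off or no ontologies configured, the graph is still built, only with an empty `ontology_text`, so the feature stays inert by default.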
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,77 @@ | ||
| """Fetches ontology OWL schemas from Data Fabric for prompt injection. | ||
|
|
||
| A Data Fabric context may attach one or more ontologies (mirroring the entity | ||
| set). This module fetches each configured ontology's OWL via the SDK | ||
| (``EntitiesService.get_ontology_file_async``) and returns them concatenated, | ||
| ready to embed in the inner SQL agent's system prompt. | ||
|
|
||
| Fetching is deterministic — done once when the sub-graph is built — rather than | ||
| an LLM-decided tool call, so the model always has the ontology in context. | ||
| Ontology names/folders are pinned from configuration, never supplied by the LLM. | ||
| """ | ||
|
|
||
| import asyncio | ||
| import logging | ||
|
|
||
| from uipath.platform.entities import EntitiesService | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| # Defensive cap per ontology so a malformed/oversized OWL can't blow up the | ||
| # prompt/token budget. | ||
| _MAX_OWL_BYTES = 1_000_000 | ||
|
|
||
|
|
||
| def _notation_label(media_type: str) -> str: | ||
| """Best-effort label for the OWL serialization (Turtle or OFN).""" | ||
| mt = (media_type or "").lower() | ||
| if "turtle" in mt or mt.endswith("ttl"): | ||
| return "Turtle" | ||
| if "functional" in mt or "ofn" in mt: | ||
| return "OWL Functional Notation" | ||
| return "Turtle or OWL Functional Notation" | ||
|
|
||
|
|
||
| async def _fetch_one( | ||
| entities_service: EntitiesService, name: str, folder_key: str | None | ||
| ) -> str: | ||
| try: | ||
| data = await entities_service.get_ontology_file_async(name, "owl", folder_key) | ||
| owl = data.get("content") or "" | ||
| media_type = data.get("mediaType") or "" | ||
| if len(owl.encode("utf-8")) > _MAX_OWL_BYTES: | ||
| raise ValueError(f"Ontology '{name}' OWL exceeds the size limit.") | ||
| except Exception as e: | ||
| logger.warning("Ontology fetch failed for %r: %s", name, e) | ||
| return ( | ||
| f"Ontology '{name}' is unavailable ({type(e).__name__}). " | ||
| "Proceed using the entity schemas in the system prompt." | ||
| ) | ||
| notation = _notation_label(media_type) | ||
| return f"--- ONTOLOGY: {name} ({notation}) ---\n{owl}\n--- END ONTOLOGY: {name} ---" | ||
|
|
||
|
|
||
| async def fetch_ontology_text( | ||
| entities_service: EntitiesService, | ||
| ontologies: list[tuple[str, str | None]], | ||
| ) -> str: | ||
| """Fetch and concatenate the OWL of every configured ontology. | ||
|
|
||
| Args: | ||
| entities_service: Authenticated SDK service used for the REST call. | ||
| ontologies: ``(name, folder_key)`` pairs to fetch (pinned from config). | ||
|
|
||
| Returns: | ||
| The concatenated ontology text ready for prompt injection, or ``""`` when | ||
| no ontologies are configured. Individual fetch failures degrade to a | ||
| short "unavailable, use entity schemas" note rather than raising, so a | ||
| missing ontology never fails the run. | ||
| """ | ||
| if not ontologies: | ||
| return "" | ||
| # Fetch concurrently — each fetch is independent; gather preserves order so | ||
| # the concatenation is deterministic. | ||
| blocks = await asyncio.gather( | ||
| *(_fetch_one(entities_service, name, folder) for name, folder in ontologies) | ||
| ) | ||
| return "\n\n".join(blocks) |
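The degradation behaviour described in the docstring above can be exercised with a stub. This is a simplified sketch: `FakeEntitiesService` is a hypothetical stand-in for the SDK's `EntitiesService`, the size cap and notation label are omitted, and the fallback message is shortened.

```python
import asyncio


class FakeEntitiesService:
    """Hypothetical stand-in for the SDK service; not the real EntitiesService."""

    def __init__(self, files):
        self._files = files

    async def get_ontology_file_async(self, name, fmt, folder_key):
        await asyncio.sleep(0)  # simulate an awaitable network call
        if name not in self._files:
            raise KeyError(name)
        return {"content": self._files[name], "mediaType": "text/turtle"}


async def fetch_block(service, name, folder_key):
    """Return one delimited ontology block, or a short note if the fetch fails."""
    try:
        data = await service.get_ontology_file_async(name, "owl", folder_key)
        owl = data.get("content") or ""
    except Exception as e:
        # Degrade to a note instead of raising, so one failure never fails the run.
        return f"Ontology '{name}' is unavailable ({type(e).__name__})."
    return f"--- ONTOLOGY: {name} ---\n{owl}\n--- END ONTOLOGY: {name} ---"


async def fetch_all(service, ontologies):
    if not ontologies:
        return ""
    # gather returns results in argument order, so the output is deterministic.
    blocks = await asyncio.gather(
        *(fetch_block(service, name, folder) for name, folder in ontologies)
    )
    return "\n\n".join(blocks)


service = FakeEntitiesService({"sales": ":Order a owl:Class ."})
text = asyncio.run(fetch_all(service, [("sales", None), ("missing", "folder-x")]))
```

Because every failure is caught inside the per-ontology coroutine, `asyncio.gather` never sees an exception here and the successful blocks are always returned alongside the fallback notes.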
If it is not a standalone tool at runtime, I think it is confusing to model it as a top-level resource at design time. So far, all "resource nodes" in a low-code agent (either standalone or part of a flow) are independently executable and show up in traces. This is a different paradigm: an optional helper tool that is part of another tool's subgraph.
That said, this only applies to how it is modeled today. If we do plan to expand ontology support so that ontologies can actually be queried (via something like SPARQL statements, for instance), then it will be better for future-proofing to define them at the top level (at least in the package mapping). We can figure out a less confusing design-time experience for now.
Yes, we plan to expand ontology support and make it a primary design-time experience, i.e., the user will select the ontologies and the entities will then be resolved internally. That is why we decided to make it a top-level resource as part of iterative development.