
feat: Data Fabric native entity write tool (P1 LDO writes)#947

Draft
UIPath-Harshit wants to merge 6 commits into main from worktree-agent-aaa2a776

@UIPath-Harshit UIPath-Harshit commented Jun 27, 2026

Summary

Adds a write_datafabric tool alongside the existing query_datafabric read tool, enabling agents to perform structured CRUD (insert/update/delete) against native Data Fabric entities. Writes use structured mutation intent delegated to EntitiesService native CRUD — no LLM-generated DML.

Includes the optional OWL ontology compiler layer from write RFC v2.

P1 LDO writes coded-agent POC. Status: blocked on customer discovery (DS-8541) — draft for implementation review while discovery completes.

Two layers

1. Metadata-driven writes (always on)

Everything structural is derived from entity metadata — no ontology required:

  • is_entity_writable — native-only (excludes federated/ChoiceSet/system)
  • derive_writable_fields — filters system/hidden/PK/attachment; surfaces ChoiceSet bindings
  • validate_mutation_intent — entity allowlist, required-field + field-allowlist checks, record_id rules
  • WriteExecutor — insert/update/delete via EntitiesService
  • build_write_tool_description — NL intermediate representation (replaces raw OWL injection)
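
The checks listed for validate_mutation_intent can be sketched as follows. This is a minimal illustration, not the PR's code: the `EntityWriteSchemaSketch` shape, the function signature, and the exact record_id rules (forbidden on insert, required on update/delete) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class EntityWriteSchemaSketch:
    # Hypothetical, simplified stand-in for the PR's EntityWriteSchema.
    name: str
    writable_fields: Set[str]
    required_fields: Set[str] = field(default_factory=set)


def validate_mutation_intent_sketch(
    schemas: Dict[str, EntityWriteSchemaSketch],
    entity: str,
    operation: str,
    fields: Optional[Dict[str, Any]] = None,
    record_id: Optional[str] = None,
) -> List[str]:
    """Return validation errors; an empty list means the intent is accepted."""
    fields = fields or {}
    if operation not in {"insert", "update", "delete"}:
        return [f"unsupported operation: {operation}"]

    # Entity allowlist: only entities with a write schema may be mutated.
    schema = schemas.get(entity)
    if schema is None:
        return [f"entity not in allowlist: {entity}"]

    errors: List[str] = []

    # record_id rules (assumed): forbidden on insert, required otherwise.
    if operation == "insert" and record_id is not None:
        errors.append("record_id must not be set for insert")
    if operation in {"update", "delete"} and not record_id:
        errors.append(f"record_id is required for {operation}")

    # Field allowlist: reject any field that is not writable.
    if operation != "delete":
        unknown = sorted(set(fields) - schema.writable_fields)
        if unknown:
            errors.append(f"fields not writable: {', '.join(unknown)}")

    # Required-field check applies to inserts only (assumed).
    if operation == "insert":
        missing = sorted(schema.required_fields - set(fields))
        if missing:
            errors.append(f"missing required fields: {', '.join(missing)}")

    return errors
```

Returning a list of errors instead of raising on the first one lets the tool report every problem to the agent in a single round trip; whether the PR does this is not stated.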

2. Ontology layer (optional, graceful fallback)

The OWL ontology is the authoring/storage format, compiled into a structured CompiledOntology — NOT injected as raw OWL (research shows LLMs reason poorly over raw OWL Turtle, 3-10% accuracy).

  • compiled_ontology.py: CompiledOntology (entity_access, measure_fields, state_fields, reference_fields, hitl_operations, entity_relationships)
  • ontology_compiler.py: compile_ontology(owl_turtle) via rdflib. Extracts what metadata can't: allowed operations per entity, field semantics (state/measure/reference), HITL-on-destructive markers, entity relationships. Supports both ontology dialects (.ttl subClassOf+actions, and RFC a df:WritableEntity+df:allowsOperation).
  • Wired into DataFabricWriteHandler via best-effort get_ontology_file_async. This platform method does not yet exist (only on a feature branch) — so the handler degrades gracefully to metadata-only (compiled_ontology stays None). The build does not break.
  • validate_mutation_intent gains optional compiled_ontology — rejects operations not in entity_access. State-transition validation deferred to v3.
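
The graceful fallback described above might look roughly like this. It is a sketch under assumptions: the helper name `fetch_ontology_or_none` is hypothetical, `get_ontology_file_async` is assumed to take no arguments, and the compiler is passed in as a plain callable so the example stays independent of rdflib.

```python
import asyncio
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


async def fetch_ontology_or_none(
    entities_service: Any,
    compile_fn: Callable[[str], Any],
) -> Optional[Any]:
    """Best-effort fetch + compile; None signals the metadata-only path."""
    # The platform method may be absent, so look it up defensively.
    fetch = getattr(entities_service, "get_ontology_file_async", None)
    if fetch is None:
        logger.debug("get_ontology_file_async unavailable; metadata-only path")
        return None
    try:
        owl_turtle = await fetch()  # assumed zero-argument call
        return compile_fn(owl_turtle) if owl_turtle else None
    except Exception:
        # Any fetch or compile failure degrades instead of breaking the tool.
        logger.warning("ontology fetch/compile failed; metadata-only path", exc_info=True)
        return None
```

Because the caller only ever sees a compiled object or None, validation code can treat "no ontology" and "ontology unavailable" identically.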

Modified

| File | Change |
| --- | --- |
| models.py | DataFabricWriteInput, WriteResult, WritableFieldInfo, EntityWriteSchema, EntityWriteOperation |
| datafabric_tool.py | DataFabricWriteHandler (lazy resolution + ontology compile), create_datafabric_tools() |
| datafabric_prompt_builder.py | build_write_context() |
| context_tool.py / tool_factory.py | route to tool-list return; HITL propagation |
| pyproject.toml | rdflib>=7.0.0 |

Testing — 109 tests passing

  • test_write_validation.py (35) — writability, field derivation, validation, ontology-constrained ops
  • test_write_integration.py (21) — tool creation, args schema, HITL, federated rejection
  • test_write_schema_builder.py (17) — NL description generation
  • test_ontology_compiler.py (23) — refund + order-management + RFC dialects, graceful/malformed paths
  • test_refund_agent_integ.py (12) — contact-center refund hero case
  • test_write_executor.py (6) — CRUD via EntitiesService mock

Also validated end-to-end on staging via CLI (df-agent-os/tests/integ_refund_agent.sh). The compiler was verified against the actual design ontology df-agent-os/roadmap/p1-owl-write-extension.ttl; that check also exposed a Turtle syntax bug in the artifact (adjacent-string-literal concatenation), which has been fixed.

Open questions (RFC §10)

  1. Is ontologySet required for writes, or is metadata-only the permanent fallback?
  2. Measure fields (additive semantics): runtime read-modify-write vs LLM responsibility via SOP?
  3. ChoiceSet value validation at write time — pending live value resolution.

🤖 Generated with Claude Code

Adds a write_datafabric tool alongside the existing query_datafabric read
tool for structured CRUD (insert/update/delete) against native Data Fabric
entities. Writes use structured mutation intent delegated to EntitiesService
native CRUD — no LLM-generated DML.

Key components:
- DataFabricWriteInput / WriteResult / EntityWriteSchema models
- is_entity_writable: native-only (excludes federated, ChoiceSet, system)
- derive_writable_fields: filters system/hidden/PK/attachment fields,
  surfaces ChoiceSet bindings
- validate_mutation_intent: entity allowlist, required-field and
  field-allowlist checks, record_id requirements per operation
- WriteExecutor: insert/update/delete via EntitiesService
- build_write_tool_description: NL intermediate representation for the
  tool description (replaces raw OWL injection per write RFC v2)
- DataFabricWriteHandler: lazy entity resolution; writability enforced
  after async resolution since entity_type/external_fields are only on
  resolved Entity objects
- create_datafabric_tools: returns [read_tool, write_tool]
- HITL: require_conversational_confirmation propagated for conversational
  agents

87 tests including the contact-center refund hero case (read 4 entities,
decide, write RefundRequest + update Order/CustomerRisk/Contact).
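
As an illustration of the writability helpers listed above, here is a minimal sketch. The metadata classes, the `entity_type` values, and the rule that a non-empty `external_fields` marks an entity as federated are all assumptions made for the example; the real Entity model may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FieldMetaSketch:
    # Hypothetical flattened field metadata.
    name: str
    is_system: bool = False
    is_hidden: bool = False
    is_primary_key: bool = False
    is_attachment: bool = False
    choice_set: Optional[str] = None


@dataclass
class EntityMetaSketch:
    name: str
    entity_type: str  # assumed values: "native", "choiceset", "system"
    external_fields: Tuple[str, ...] = ()  # assumed: non-empty means federated
    fields: Tuple[FieldMetaSketch, ...] = ()


def is_entity_writable_sketch(entity: Any) -> bool:
    """Native-only: excludes federated, ChoiceSet, and system entities."""
    # getattr keeps the check safe on unresolved or partial objects.
    entity_type = str(getattr(entity, "entity_type", "")).lower()
    federated = bool(getattr(entity, "external_fields", None))
    return entity_type == "native" and not federated


def derive_writable_fields_sketch(entity: EntityMetaSketch) -> List[Dict[str, Any]]:
    """Drop system/hidden/PK/attachment fields; surface ChoiceSet bindings."""
    return [
        {"name": f.name, "choice_set": f.choice_set}
        for f in entity.fields
        if not (f.is_system or f.is_hidden or f.is_primary_key or f.is_attachment)
    ]
```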

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@sonarqubecloud


Quality Gate failed

Failed conditions
89.8% Coverage on New Code (required ≥ 90%)

See analysis details on SonarQube Cloud

UIPath-Harshit and others added 5 commits June 27, 2026 11:46
Builds the ontology layer from write RFC v2. The OWL ontology is the
authoring/storage format; it is compiled into a CompiledOntology
intermediate representation (NOT injected as raw OWL — research shows
LLMs reason poorly over raw OWL Turtle).

- compiled_ontology.py: CompiledOntology model (entity_access,
  measure_fields, state_fields, reference_fields, hitl_operations,
  entity_relationships) per RFC §5.2
- ontology_compiler.py: compile_ontology(owl_turtle) via rdflib.
  Supports both ontology dialects — the .ttl dialect (rdfs:subClassOf
  df:WritableEntity + action-derived ops + df:hasField) and the RFC
  dialect (a df:WritableEntity + df:allowsOperation). Resilient to
  partial annotations; raises OntologyCompileError only on malformed
  Turtle.
- write_validation.py: validate_mutation_intent gains optional
  compiled_ontology — rejects operations not in entity_access. State
  transition validation deferred to v3 (documented TODO).
- datafabric_tool.py: DataFabricWriteHandler best-effort fetches +
  compiles the ontology via get_ontology_file_async. The method is
  absent from the current platform package, so this degrades gracefully
  to the metadata-only path (compiled_ontology stays None) — the build
  does not break.
- rdflib>=7.0.0 added to dependencies.

23 new tests (refund + order-management dialects, RFC dialect, graceful
paths). 109 datafabric_tool tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scripts/run_agent_with_ontology.py bridges the gap until the platform
ships ontology storage/fetch: it monkeypatches
EntitiesService.get_ontology_file_async (currently absent) to return a
user-supplied .ttl, which activates the real _maybe_compile_ontology
path in DataFabricWriteHandler — the ontology is compiled and used in
write validation + tool description exactly as it will be once the
platform method lands.

Usage:
  python scripts/run_agent_with_ontology.py \
      --ontology PATH.ttl --entity-set PATH.json --prompt "..." \
      [--model NAME] [--system-prompt PATH.txt] [--dry-run]

--dry-run compiles the ontology and prints the extracted facts
(entity_access, hitl_operations, state/reference/measure fields,
relationships) WITHOUT network. The live run needs UiPath auth env
vars + real tenant entity ids.
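
For readers who want to see the flag surface concretely, the usage line above maps onto an argparse parser along these lines. This is a sketch only: the flag names come from the usage text, while treating the unbracketed flags as required and the defaults shown are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented usage line (assumed semantics)."""
    parser = argparse.ArgumentParser(prog="run_agent_with_ontology.py")
    # Unbracketed in the usage text, so treated as required here.
    parser.add_argument("--ontology", required=True, help="path to the .ttl ontology")
    parser.add_argument("--entity-set", required=True, help="path to the entity-set JSON")
    parser.add_argument("--prompt", required=True, help="user prompt for the agent")
    # Bracketed in the usage text, so optional.
    parser.add_argument("--model", default=None, help="model name")
    parser.add_argument("--system-prompt", default=None, help="path to a system prompt file")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="compile the ontology and print extracted facts without network access",
    )
    return parser
```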

Companions:
- sample_refund_entity_set.json (hero-case entities, placeholder ids)
- sample_refund_sop.txt (refund SOP from RFC §4.3)
- README_run_agent_with_ontology.md (mechanism + offline/real run steps)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two write-path bugs found while validating the ontology POC end-to-end
against live staging, both fixed:

1. Read schema stripped the Id primary key (write_validation system-field
   filter), so the NL-to-SQL model (a) invented a non-existent 'rowid'
   column for ORDER BY on paginated reads (FQS 400) and (b) never returned
   Id, leaving the agent no record_id for updates/deletes.
   Fix: retain the primary key for WRITABLE entities in the read schema
   (SELECT it, ORDER BY it); keep other system fields hidden; read-only
   entities unchanged. P3 collision guard for user/CSV fields sharing a
   system field name. Harden is_entity_writable with getattr.

2. Write executor called the CRUD endpoint with the entity NAME, but
   .../EntityService/entity/{id}/insert requires the GUID id ("not valid"
   400). Fix: handler maps entity name -> id before executing, restores the
   friendly name on the result.
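
The second fix (translate name to GUID before the CRUD call, then restore the friendly name) can be sketched as below. The function name, the result dictionary shape, and the `execute` callable are hypothetical stand-ins for the handler and WriteExecutor.

```python
from typing import Any, Callable, Dict


def execute_with_entity_id(
    name_to_id: Dict[str, str],
    entity_name: str,
    execute: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Call the CRUD executor with the GUID, report back the friendly name."""
    entity_id = name_to_id.get(entity_name)
    if entity_id is None:
        # Fail before any network call when the name cannot be resolved.
        return {
            "entity": entity_name,
            "success": False,
            "error": f"unknown entity: {entity_name}",
        }
    result = dict(execute(entity_id))  # the endpoint expects the GUID id
    result["entity"] = entity_name     # restore the name the agent used
    return result
```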

Verified on staging (dataservicetest/DataFabricFQS): with both fixes the
refund flow's insert + 3 updates all persist (read-back confirmed). The
ontology compiles, activates, and correctly governs tool selection
(RefundRequest insert-only; Order/Risk/Contact update; Customer read-only,
never written).

POC harness (scripts/):
- poc_refund_setup.sh / poc_refund_teardown.sh — create+seed / delete the
  5 refund entities, emit ontology + entity-set (referenceKey=GUID) + ids
- poc_refund_drive.py — drive the real write handler with the ontology
  active, verify by read-back (deterministic; no LLM)
- run_agent_with_ontology.py — full LLM-in-the-loop variant; gains
  --agenthub-config (LLM-gateway licensing OpCode; without it the gateway
  403s) and recursion_limit
- POC_README.md — env setup + the three run levels + the known agent-loop
  gap (create_agent does not auto-execute the terminal write batch; that is
  runtime plumbing, not the ontology/write tool)

Tests: 740 passed. New: read-schema PK retention (writable vs read-only,
other-system-fields-hidden, collision-not-duplicated, rowid-free ORDER BY)
and name->id translation for CRUD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t, debug, read-flow wiring

Extends the ontology layer per review feedback:

1. df:ReadableEntity is now first-class. CompiledOntology gains
   `known_entities` (every df:entityKey the ontology declares) plus
   is_known / is_writable / is_read_only helpers. Previously a read-only
   entity was indistinguishable from one the ontology never mentioned —
   both were merely absent from entity_access.

2. Read-only is enforced, not advisory. validate_mutation_intent rejects
   a write to an entity the ontology knows but grants no write ops; the
   write handler prunes such entities from write_schemas so they never
   appear in the write tool description. (Verified: Customer is excluded
   and a direct update is rejected.)

3. Debug output. CompiledOntology.to_human_readable() + module-level
   format_ontology_debug(owl, compiled) render the raw OWL Turtle and a
   human-readable IR (entities + access modes, measure/state/reference
   field semantics, relationships). Logged at DEBUG in the fetch/compile
   path and printed by both POC scripts during a run.

4. Ontology wired into the READ flow (reads still go through the existing
   NL-to-SQL path — ontology enriches, does not restrict). Shared
   maybe_fetch_and_compile_ontology helper used by both handlers; the read
   handler threads CompiledOntology into DataFabricGraph.create ->
   datafabric_prompt_builder, which emits an "## Ontology Context" section
   (access modes, relationships, FK/reference targets, state-value sources)
   for schema linking (P5).
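
Points 1 and 2 above can be illustrated with a small model. This is a sketch, not the PR's CompiledOntology: only `entity_access` and `known_entities` are modeled, the free functions are hypothetical, and letting entities the ontology never mentions fall through to metadata-only rules is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set


@dataclass
class CompiledOntologySketch:
    entity_access: Dict[str, Set[str]] = field(default_factory=dict)
    known_entities: Set[str] = field(default_factory=set)

    def is_known(self, entity: str) -> bool:
        return entity in self.known_entities or entity in self.entity_access

    def is_writable(self, entity: str) -> bool:
        return bool(self.entity_access.get(entity))

    def is_read_only(self, entity: str) -> bool:
        # Known to the ontology, but granted no write operations.
        return self.is_known(entity) and not self.is_writable(entity)


def check_operation(
    ontology: Optional[CompiledOntologySketch], entity: str, operation: str
) -> Optional[str]:
    """Return an error message, or None when the ontology raises no objection."""
    if ontology is None or not ontology.is_known(entity):
        return None  # assumed: metadata-only rules apply
    allowed = ontology.entity_access.get(entity, set())
    if not allowed:
        return f"{entity} is read-only in the ontology"
    if operation not in allowed:
        return f"{operation} is not allowed on {entity}"
    return None


def prune_read_only(
    ontology: Optional[CompiledOntologySketch], write_schemas: Dict[str, Any]
) -> Dict[str, Any]:
    """Drop read-only entities so they never reach the write tool description."""
    if ontology is None:
        return dict(write_schemas)
    return {n: s for n, s in write_schemas.items() if not ontology.is_read_only(n)}
```

With the refund ontology's access modes (RefundRequest insert-only, Order update, Customer read-only), `check_operation` rejects a Customer update and `prune_read_only` removes Customer from the schemas.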

Also: poc_refund_drive.py verification read-back now addresses entities by
GUID (get_record_async requires the id, not the name).

Validated live on staging (dataservicetest/addyTest): debug IR shows
Customer READ-ONLY; Customer pruned from writes + write rejected; refund
flow insert + 3 updates persist, 4/4 verified. 752 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Makes the full LLM-in-the-loop refund flow persist writes end-to-end
(verified on staging dataservicetest/DataFabricFQS: insert RefundRequest +
update Order/CustomerRisk/Contact all success, read-back confirmed).

Root cause of the prior "writes planned but not dispatched": the write tool
hardcoded `require_conversational_confirmation: True`, whose tool-node gate
calls request_approval -> @durable_interrupt, suspending the graph for human
approval. In a non-conversational/coded agent (no human/checkpointer) the
graph suspended at the first write and ainvoke returned without executing it.

- datafabric_tool.py: drop the unconditional
  `require_conversational_confirmation` from the write tool metadata. HITL
  confirmation is still applied per-resource for conversational agents by
  tool_factory; it is no longer forced on coded agents (where it can only
  deadlock). Deterministic guardrails remain: writability checks, ontology
  op-validation, field allowlist, read-only enforcement.
- run_agent_with_ontology.py: add --trace (DEBUG logging surfaces the inner
  NL->SQL generated SQL per read), --api-flavor (default chat-completions),
  and print tool RESPONSES (not just calls) so reads/writes are visible.
- test: assert the write tool no longer hardcodes the confirmation flag.
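
The change to the confirmation flag amounts to making it conditional on the agent type. A minimal sketch, assuming a hypothetical metadata builder and an `is_conversational` switch (in the PR the per-resource decision is made by tool_factory, whose interface is not shown here):

```python
from typing import Any, Dict


def build_write_tool_metadata(is_conversational: bool) -> Dict[str, Any]:
    """Attach the HITL flag only where a human can actually approve."""
    metadata: Dict[str, Any] = {"tool_name": "write_datafabric"}
    if is_conversational:
        # Conversational agents have a human in the loop to answer the prompt.
        metadata["require_conversational_confirmation"] = True
    # Coded agents get no flag: with no human or checkpointer, the approval
    # interrupt would suspend the graph before the write runs.
    return metadata
```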

752 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>