feat: Data Fabric native entity write tool (P1 LDO writes)#947
Draft
UIPath-Harshit wants to merge 6 commits into
Adds a write_datafabric tool alongside the existing query_datafabric read tool for structured CRUD (insert/update/delete) against native Data Fabric entities. Writes use structured mutation intent delegated to EntitiesService native CRUD, with no LLM-generated DML.

Key components:
- DataFabricWriteInput / WriteResult / EntityWriteSchema models
- is_entity_writable: native-only (excludes federated, ChoiceSet, system)
- derive_writable_fields: filters system/hidden/PK/attachment fields, surfaces ChoiceSet bindings
- validate_mutation_intent: entity allowlist, required-field and field-allowlist checks, record_id requirements per operation
- WriteExecutor: insert/update/delete via EntitiesService
- build_write_tool_description: NL intermediate representation for the tool description (replaces raw OWL injection per write RFC v2)
- DataFabricWriteHandler: lazy entity resolution; writability enforced after async resolution since entity_type/external_fields are only on resolved Entity objects
- create_datafabric_tools: returns [read_tool, write_tool]
- HITL: require_conversational_confirmation propagated for conversational agents

87 tests including the contact-center refund hero case (read 4 entities, decide, write RefundRequest + update Order/CustomerRisk/Contact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
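To make the validate_mutation_intent checks listed above concrete, here is a minimal sketch. It is not the PR's implementation: `MutationIntent` and the reduced `EntityWriteSchema` are hypothetical stand-ins, and the rule that an insert must not carry a record_id is an assumption (the commit message only says record_id requirements differ per operation).

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class EntityWriteSchema:
    """Hypothetical stand-in: writable and required fields for one entity."""
    writable_fields: set[str]
    required_fields: set[str] = field(default_factory=set)


@dataclass
class MutationIntent:
    """Hypothetical stand-in for the structured write request."""
    entity: str
    operation: str  # "insert" | "update" | "delete"
    fields: dict[str, Any] = field(default_factory=dict)
    record_id: Optional[str] = None


def validate_mutation_intent(
    intent: MutationIntent, schemas: dict[str, EntityWriteSchema]
) -> list[str]:
    """Return validation errors; an empty list means the intent is valid."""
    schema = schemas.get(intent.entity)
    if schema is None:  # entity allowlist
        return [f"entity '{intent.entity}' is not writable"]
    if intent.operation not in ("insert", "update", "delete"):
        return [f"unsupported operation '{intent.operation}'"]

    errors: list[str] = []
    # record_id rules (assumed): update/delete need one, insert must not have one.
    if intent.operation in ("update", "delete") and not intent.record_id:
        errors.append(f"{intent.operation} requires record_id")
    if intent.operation == "insert" and intent.record_id:
        errors.append("insert must not specify record_id")

    # Field allowlist applies to operations that carry field values.
    if intent.operation != "delete":
        unknown = sorted(set(intent.fields) - schema.writable_fields)
        if unknown:
            errors.append(f"fields not writable: {unknown}")
    # Required-field check applies only when creating a record.
    if intent.operation == "insert":
        missing = sorted(schema.required_fields - set(intent.fields))
        if missing:
            errors.append(f"missing required fields: {missing}")
    return errors
```

Returning a list of errors rather than raising on the first one lets the agent see every problem with an intent in a single tool response.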
Builds the ontology layer from write RFC v2. The OWL ontology is the authoring/storage format; it is compiled into a CompiledOntology intermediate representation (NOT injected as raw OWL: research shows LLMs reason poorly over raw OWL Turtle).

- compiled_ontology.py: CompiledOntology model (entity_access, measure_fields, state_fields, reference_fields, hitl_operations, entity_relationships) per RFC §5.2
- ontology_compiler.py: compile_ontology(owl_turtle) via rdflib. Supports both ontology dialects: the .ttl dialect (rdfs:subClassOf df:WritableEntity + action-derived ops + df:hasField) and the RFC dialect (a df:WritableEntity + df:allowsOperation). Resilient to partial annotations; raises OntologyCompileError only on malformed Turtle.
- write_validation.py: validate_mutation_intent gains optional compiled_ontology and rejects operations not in entity_access. State transition validation deferred to v3 (documented TODO).
- datafabric_tool.py: DataFabricWriteHandler best-effort fetches + compiles the ontology via get_ontology_file_async. The method is absent from the current platform package, so this degrades gracefully to the metadata-only path (compiled_ontology stays None); the build does not break.
- rdflib>=7.0.0 added to dependencies.

23 new tests (refund + order-management dialects, RFC dialect, graceful paths). 109 datafabric_tool tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
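The graceful-degradation behaviour described for DataFabricWriteHandler can be illustrated as follows. This is only a sketch: `maybe_fetch_ontology`, both service classes, and the single `entity_set_id` argument are assumptions, since the real signature of get_ontology_file_async is not shown in this PR.

```python
import asyncio
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


async def maybe_fetch_ontology(entities_service: Any, entity_set_id: str) -> Optional[str]:
    """Return OWL Turtle text, or None when the platform cannot supply it."""
    fetch = getattr(entities_service, "get_ontology_file_async", None)
    if fetch is None:
        # Method absent from the installed platform package: metadata-only path.
        return None
    try:
        return await fetch(entity_set_id)
    except Exception:
        # Best-effort: a failed fetch must not break tool construction.
        logger.debug("ontology fetch failed; continuing without it", exc_info=True)
        return None


class ServiceWithoutOntology:
    """Hypothetical service that predates ontology support."""


class ServiceWithOntology:
    """Hypothetical service that exposes the ontology file."""

    async def get_ontology_file_async(self, entity_set_id: str) -> str:
        return "@prefix df: <http://example.org/df#> ."


missing = asyncio.run(maybe_fetch_ontology(ServiceWithoutOntology(), "set-1"))
present = asyncio.run(maybe_fetch_ontology(ServiceWithOntology(), "set-1"))
```

With this shape the caller only has to handle two cases: a Turtle string to compile, or `None`, in which case compiled_ontology stays None.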
scripts/run_agent_with_ontology.py bridges the gap until the platform
ships ontology storage/fetch: it monkeypatches
EntitiesService.get_ontology_file_async (currently absent) to return a
user-supplied .ttl, which activates the real _maybe_compile_ontology
path in DataFabricWriteHandler — the ontology is compiled and used in
write validation + tool description exactly as it will be once the
platform method lands.
Usage:
  python scripts/run_agent_with_ontology.py \
    --ontology PATH.ttl --entity-set PATH.json --prompt "..." \
    [--model NAME] [--system-prompt PATH.txt] [--dry-run]

--dry-run compiles the ontology and prints the extracted facts
(entity_access, hitl_operations, state/reference/measure fields,
relationships) WITHOUT network. The live run needs UiPath auth env
vars + real tenant entity ids.
Companions:
- sample_refund_entity_set.json (hero-case entities, placeholder ids)
- sample_refund_sop.txt (refund SOP from RFC §4.3)
- README_run_agent_with_ontology.md (mechanism + offline/real run steps)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
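The monkeypatch bridge described in this commit might look roughly like the sketch below. The `EntitiesService` class here is a local stand-in rather than the platform class, `patch_ontology_fetch` is a hypothetical helper name, and the patched method ignores its arguments because the eventual platform signature is not known.

```python
import asyncio
import tempfile
from pathlib import Path


class EntitiesService:
    """Hypothetical stand-in for the platform class, which lacks the method today."""


def patch_ontology_fetch(service_cls: type, ttl_path: Path) -> None:
    """Attach get_ontology_file_async to service_cls, serving a local .ttl file."""
    ttl_text = ttl_path.read_text(encoding="utf-8")

    async def get_ontology_file_async(self, *args, **kwargs) -> str:
        # Arguments are ignored: the real signature is not known yet.
        return ttl_text

    setattr(service_cls, "get_ontology_file_async", get_ontology_file_async)


# Demonstrate with a throwaway ontology file.
with tempfile.TemporaryDirectory() as tmp:
    ttl = Path(tmp) / "refund.ttl"
    ttl.write_text("# placeholder ontology\n", encoding="utf-8")
    patch_ontology_fetch(EntitiesService, ttl)

served = asyncio.run(EntitiesService().get_ontology_file_async("any-id"))
```

Because the file is read once at patch time, the patched method keeps working after the source file is gone, which keeps a dry run free of further I/O.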
Two write-path bugs found while validating the ontology POC end-to-end
against live staging, both fixed:
1. Read schema stripped the Id primary key (write_validation system-field
filter), so the NL-to-SQL model (a) invented a non-existent 'rowid'
column for ORDER BY on paginated reads (FQS 400) and (b) never returned
Id, leaving the agent no record_id for updates/deletes.
Fix: retain the primary key for WRITABLE entities in the read schema
(SELECT it, ORDER BY it); keep other system fields hidden; read-only
entities unchanged. P3 collision guard for user/CSV fields sharing a
system field name. Harden is_entity_writable with getattr.
2. Write executor called the CRUD endpoint with the entity NAME, but
.../EntityService/entity/{id}/insert requires the GUID id ("not valid"
400). Fix: handler maps entity name -> id before executing, restores the
friendly name on the result.
Verified on staging (dataservicetest/DataFabricFQS): with both fixes the
refund flow's insert + 3 updates all persist (read-back confirmed). The
ontology compiles, activates, and correctly governs tool selection
(RefundRequest insert-only; Order/Risk/Contact update; Customer read-only,
never written).
POC harness (scripts/):
- poc_refund_setup.sh / poc_refund_teardown.sh — create+seed / delete the
5 refund entities, emit ontology + entity-set (referenceKey=GUID) + ids
- poc_refund_drive.py — drive the real write handler with the ontology
active, verify by read-back (deterministic; no LLM)
- run_agent_with_ontology.py — full LLM-in-the-loop variant; gains
--agenthub-config (LLM-gateway licensing OpCode; without it the gateway
403s) and recursion_limit
- POC_README.md — env setup + the three run levels + the known agent-loop
gap (create_agent does not auto-execute the terminal write batch; that is
runtime plumbing, not the ontology/write tool)
Tests: 740 passed. New: read-schema PK retention (writable vs read-only,
other-system-fields-hidden, collision-not-duplicated, rowid-free ORDER BY)
and name->id translation for CRUD.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
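Fix 2 above (translate the entity name to its GUID before the CRUD call, then restore the friendly name on the result) can be sketched like this. `WriteResult` is reduced to three fields, and `execute_with_id` and the executor callback are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class WriteResult:
    """Hypothetical, reduced stand-in for the write result model."""
    entity: str
    operation: str
    success: bool


def execute_with_id(
    entity_name: str,
    operation: str,
    name_to_id: dict[str, str],
    execute: Callable[[str, str], WriteResult],
) -> WriteResult:
    """Call execute with the entity id, then report under the friendly name."""
    entity_id = name_to_id.get(entity_name)
    if entity_id is None:
        # Unknown name: fail without touching the CRUD endpoint.
        return WriteResult(entity_name, operation, success=False)
    result = execute(entity_id, operation)
    return replace(result, entity=entity_name)


calls: list[str] = []


def fake_execute(entity_id: str, operation: str) -> WriteResult:
    """Test double that records which identifier reached the executor."""
    calls.append(entity_id)
    return WriteResult(entity_id, operation, success=True)


guid = "00000000-0000-0000-0000-000000000001"  # placeholder id
result = execute_with_id("RefundRequest", "insert", {"RefundRequest": guid}, fake_execute)
```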
…t, debug, read-flow wiring

Extends the ontology layer per review feedback:

1. df:ReadableEntity is now first-class. CompiledOntology gains `known_entities` (every df:entityKey the ontology declares) plus is_known / is_writable / is_read_only helpers. Previously a read-only entity was indistinguishable from one the ontology never mentioned: both were merely absent from entity_access.
2. Read-only is enforced, not advisory. validate_mutation_intent rejects a write to an entity the ontology knows but grants no write ops; the write handler prunes such entities from write_schemas so they never appear in the write tool description. (Verified: Customer is excluded and a direct update is rejected.)
3. Debug output. CompiledOntology.to_human_readable() + module-level format_ontology_debug(owl, compiled) render the raw OWL Turtle and a human-readable IR (entities + access modes, measure/state/reference field semantics, relationships). Logged at DEBUG in the fetch/compile path and printed by both POC scripts during a run.
4. Ontology wired into the READ flow (reads still go through the existing NL-to-SQL path; the ontology enriches, it does not restrict). Shared maybe_fetch_and_compile_ontology helper used by both handlers; the read handler threads CompiledOntology into DataFabricGraph.create -> datafabric_prompt_builder, which emits an "## Ontology Context" section (access modes, relationships, FK/reference targets, state-value sources) for schema linking (P5).

Also: poc_refund_drive.py verification read-back now addresses entities by GUID (get_record_async requires the id, not the name).

Validated live on staging (dataservicetest/addyTest): debug IR shows Customer READ-ONLY; Customer pruned from writes + write rejected; refund flow insert + 3 updates persist, 4/4 verified.

752 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
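The distinction between "read-only" and "never mentioned" in points 1 and 2 can be sketched with a reduced model. This `CompiledOntology` keeps only the two access-related attributes, and `prune_write_schemas` is a hypothetical helper name; leaving entities the ontology never mentions untouched is an assumption about the metadata-only fallback.

```python
from dataclasses import dataclass, field


@dataclass
class CompiledOntology:
    """Reduced, hypothetical version of the IR: only the access-related parts."""
    entity_access: dict[str, set[str]] = field(default_factory=dict)
    known_entities: set[str] = field(default_factory=set)

    def is_known(self, entity: str) -> bool:
        return entity in self.known_entities

    def is_writable(self, entity: str) -> bool:
        return bool(self.entity_access.get(entity))

    def is_read_only(self, entity: str) -> bool:
        # Declared by the ontology, but granted no write operations.
        return self.is_known(entity) and not self.is_writable(entity)


def prune_write_schemas(
    schemas: dict[str, object], ontology: CompiledOntology
) -> dict[str, object]:
    """Drop read-only entities so they never reach the write tool description."""
    return {name: s for name, s in schemas.items() if not ontology.is_read_only(name)}


ontology = CompiledOntology(
    entity_access={"RefundRequest": {"insert"}, "Order": {"update"}},
    known_entities={"RefundRequest", "Order", "Customer"},
)
# "AuditLog" is a made-up entity the ontology never mentions.
pruned = prune_write_schemas({"Order": ..., "Customer": ..., "AuditLog": ...}, ontology)
```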
Makes the full LLM-in-the-loop refund flow persist writes end-to-end (verified on staging dataservicetest/DataFabricFQS: insert RefundRequest + update Order/CustomerRisk/Contact all success, read-back confirmed).

Root cause of the prior "writes planned but not dispatched": the write tool hardcoded `require_conversational_confirmation: True`, whose tool-node gate calls request_approval -> @durable_interrupt, suspending the graph for human approval. In a non-conversational/coded agent (no human/checkpointer) the graph suspended at the first write and ainvoke returned without executing it.

- datafabric_tool.py: drop the unconditional `require_conversational_confirmation` from the write tool metadata. HITL confirmation is still applied per-resource for conversational agents by tool_factory; it is no longer forced on coded agents (where it can only deadlock). Deterministic guardrails remain: writability checks, ontology op-validation, field allowlist, read-only enforcement.
- run_agent_with_ontology.py: add --trace (DEBUG logging surfaces the inner NL->SQL generated SQL per read), --api-flavor (default chat-completions), and print tool RESPONSES (not just calls) so reads/writes are visible.
- test: assert the write tool no longer hardcodes the confirmation flag.

752 tests pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
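The change in flag handling amounts to making the confirmation conditional on the agent type. The sketch below is illustrative only: `build_write_tool_metadata` and the `tool_type` key are hypothetical, and in the PR the per-resource decision is made by tool_factory rather than by a single function like this.

```python
def build_write_tool_metadata(is_conversational: bool) -> dict[str, object]:
    """Build write-tool metadata, requesting confirmation only when a human can answer."""
    metadata: dict[str, object] = {"tool_type": "datafabric_write"}  # hypothetical key
    if is_conversational:
        # A human is in the loop, so suspending for approval can be resumed.
        metadata["require_conversational_confirmation"] = True
    # Coded agents get no flag: with no human or checkpointer, an approval
    # interrupt would suspend the graph and the write would never run.
    return metadata


coded = build_write_tool_metadata(is_conversational=False)
conversational = build_write_tool_metadata(is_conversational=True)
```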


Summary
Adds a `write_datafabric` tool alongside the existing `query_datafabric` read tool, enabling agents to perform structured CRUD (insert/update/delete) against native Data Fabric entities. Writes use structured mutation intent delegated to `EntitiesService` native CRUD, with no LLM-generated DML.

Includes the optional OWL ontology compiler layer from write RFC v2.

P1 LDO writes coded-agent POC. Status: blocked on customer discovery (DS-8541); this is a draft for implementation review while discovery completes.

Two layers

1. Metadata-driven writes (always on)

Everything structural is derived from entity metadata; no ontology is required:

- `is_entity_writable`: native-only (excludes federated/ChoiceSet/system)
- `derive_writable_fields`: filters system/hidden/PK/attachment; surfaces ChoiceSet bindings
- `validate_mutation_intent`: entity allowlist, required-field + field-allowlist checks, record_id rules
- `WriteExecutor`: insert/update/delete via EntitiesService
- `build_write_tool_description`: NL intermediate representation (replaces raw OWL injection)

2. Ontology layer (optional, graceful fallback)

The OWL ontology is the authoring/storage format, compiled into a structured `CompiledOntology`, NOT injected as raw OWL (research shows LLMs reason poorly over raw OWL Turtle, 3-10% accuracy).

- `compiled_ontology.py`: `CompiledOntology` (entity_access, measure_fields, state_fields, reference_fields, hitl_operations, entity_relationships)
- `ontology_compiler.py`: `compile_ontology(owl_turtle)` via rdflib. Extracts what metadata can't: allowed operations per entity, field semantics (state/measure/reference), HITL-on-destructive markers, entity relationships. Supports both ontology dialects (`.ttl` `subClassOf` + actions, and RFC `a df:WritableEntity` + `df:allowsOperation`).
- `DataFabricWriteHandler`: fetches and compiles the ontology via best-effort `get_ontology_file_async`. This platform method does not yet exist (only on a feature branch), so the handler degrades gracefully to metadata-only (`compiled_ontology` stays `None`). The build does not break.
- `validate_mutation_intent` gains optional `compiled_ontology` and rejects operations not in `entity_access`. State-transition validation deferred to v3.

Modified

- `models.py`: `DataFabricWriteInput`, `WriteResult`, `WritableFieldInfo`, `EntityWriteSchema`, `EntityWriteOperation`
- `datafabric_tool.py`: `DataFabricWriteHandler` (lazy resolution + ontology compile), `create_datafabric_tools()`
- `datafabric_prompt_builder.py`: `build_write_context()`
- `context_tool.py` / `tool_factory.py`
- `pyproject.toml`: `rdflib>=7.0.0`

Testing: 109 tests passing

- `test_write_validation.py` (35): writability, field derivation, validation, ontology-constrained ops
- `test_write_integration.py` (21): tool creation, args schema, HITL, federated rejection
- `test_write_schema_builder.py` (17): NL description generation
- `test_ontology_compiler.py` (23): refund + order-management + RFC dialects, graceful/malformed paths
- `test_refund_agent_integ.py` (12): contact-center refund hero case
- `test_write_executor.py` (6): CRUD via EntitiesService mock

Also validated end-to-end on staging via CLI (`df-agent-os/tests/integ_refund_agent.sh`). The compiler was verified against the actual design ontology `df-agent-os/roadmap/p1-owl-write-extension.ttl`; a Turtle syntax bug in that artifact (adjacent-string-literal concatenation) was fixed along the way.

Open questions (RFC §10)

- Is `ontologySet` required for writes, or is metadata-only the permanent fallback?

🤖 Generated with Claude Code
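For reviewers, the field-derivation rule listed under "Metadata-driven writes" can be sketched as below. `FieldMeta` and its attribute names are hypothetical; the real entity metadata model may expose these flags differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldMeta:
    """Hypothetical per-field metadata; attribute names are assumptions."""
    name: str
    is_system: bool = False
    is_hidden: bool = False
    is_primary_key: bool = False
    is_attachment: bool = False
    choice_set: Optional[str] = None  # name of the bound ChoiceSet, if any


def derive_writable_fields(fields: list[FieldMeta]) -> dict[str, Optional[str]]:
    """Map each writable field name to its ChoiceSet binding (None when unbound)."""
    return {
        f.name: f.choice_set
        for f in fields
        if not (f.is_system or f.is_hidden or f.is_primary_key or f.is_attachment)
    }


# Illustrative field list; names are made up for the example.
order_fields = [
    FieldMeta("Id", is_primary_key=True),
    FieldMeta("CreatedBy", is_system=True),
    FieldMeta("Status", choice_set="OrderStatus"),
    FieldMeta("Amount"),
    FieldMeta("Receipt", is_attachment=True),
]
writable = derive_writable_fields(order_fields)
```

Surfacing the ChoiceSet name next to the field lets the tool description tell the model which values are acceptable without a separate lookup.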