UiPath · ajay-kesavan · May 21, 2026 · May 21, 2026 · May 22, 2026 · May 24, 2026
diff --git a/packages/uipath/pyproject.toml b/packages/uipath/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "uipath"
-version = "2.10.70"
+version = "2.10.72"
 description = "Python SDK and CLI for UiPath Platform, enabling programmatic interaction with automation services, process management, and deployment tools."
 readme = { file = "README.md", content-type = "text/markdown" }
 requires-python = ">=3.11"

diff --git a/packages/uipath/samples/classifier_demo/README.md b/packages/uipath/samples/classifier_demo/README.md
@@ -0,0 +1,136 @@
+# Classifier aggregator end-to-end demo
+
+A minimal intent-classification agent that exercises the new
+classification **aggregator** end-to-end. Use this as the test fixture for
+both SDK-only validation (Path A below) and Studio Web full-stack validation
+(Path B).
+
+## What's here
+
+```
+classifier_demo/
+├── main.py                       # 3-class keyword classifier
+├── uipath.json
+├── pyproject.toml
+├── bindings.json
+└── evaluations/
+    ├── eval-sets/
+    │   └── main.json             # 9 datapoints, 3 per class; 1 per class deliberately misclassified
+    └── evaluators/
+        └── intent_match.json     # ExactMatch on agent_output.intent + classification aggregator
+```
+
+There is **one** evaluator. `intent_match` is an `ExactMatchEvaluator` whose
+`evaluatorConfig` carries an `aggregators: [{ name: "classification", classes: [...] }]`
+entry. Per datapoint, the evaluator emits a 1.0/0.0 score and an
+`ExactMatchJustification` whose `aggregators` field passes the config
+unchanged to the downstream consumer (the C# layer in Studio Web), which builds
+a confusion matrix and precision / recall / F-score across the dataset.
+
+## Path A — SDK only (real run, ~30 seconds)
+
+```bash
+cd packages/uipath
+uv sync --all-extras
+
+cd samples/classifier_demo
+uv run --project ../.. uipath eval main main.json --no-report --output-file /tmp/out.json
+```
+
+Expected: a results table with a single `intent_match` column averaging 0.667
+(6/9 correct).
+
+To see the metadata payload that lands in the backend's
+`CodedEvaluatorScore.Justification`:
+
+```bash
+python3 -c "
+import json
+with open('/tmp/out.json') as f: d = json.load(f)
+for r in d['evaluationSetResults'][0]['evaluationRunResults']:
+    print(r['evaluatorName'], r['result'].get('details'))
+"
+```
+
+You should see entries like:
+
+```
+intent_match  {'expected': 'book', 'actual': 'book', 'aggregators': [{'name': 'classification', 'classes': ['book', 'cancel', 'reschedule']}]}
+```
+
+The `aggregators` list is identical on every datapoint by design — it's the
+mechanism by which the per-datapoint records carry the class set to the C#
+post-pass without requiring a separate evaluator-snapshot lookup.
+
+## Path B — Full Studio Web stack (real UI, click Run, see panel)
+
+The pieces below assume you have a local KinD cluster running per
+`Agents/LOCAL_DEVELOPMENT.md`.
+
+### Prereqs
+- Docker installed and running
+- `make` available
+- Azure CLI authenticated session (`az login`)
+- Azure DevOps PAT exported as `AZURE_DEVOPS_PAT`
+- GitHub NPM registry token exported as `GH_NPM_REGISTRY_TOKEN`
+- Azure access token exported as `AZURE_ACCESS_TOKEN` (for the python worker build)
+- `cloud-provider-kind` binary (used for the local KinD cluster)
+
+### Steps
+
+1. **Point python-eval-worker at the local SDK branch.** The published
+   `uipath` package on PyPI doesn't yet have the classification aggregator.
+   Edit `Agents/python-eval-worker/pyproject.toml`:
+
+   ```toml
+   [tool.uv.sources]
+   uipath = { path = "../../uipath-python/packages/uipath", editable = true }
+   ```
+
+   Then, from `Agents/`, run `cd python-eval-worker && uv lock && uv sync`.
+
+2. **Bring up the local KinD cluster** (from `Agents/`):
+   ```bash
+   make create-kind-cluster
+   kubectl get nodes
+   sudo ./bin/cloud-provider-kind &      # in a separate shell or background
+   make up
+   make deploy
+   ```
+
+3. **Build the backend with the classifier changes:**
+   ```bash
+   git checkout feat/eval-classifier-backend       # in Agents repo
+   # Re-trigger the helm/skaffold deploy for the backend
+   make deploy
+   ```
+
+4. **Build the frontend with the UI changes:**
+   ```bash
+   git checkout feat/eval-dataset-evaluators-ui    # in Agents repo
+   make deploy                                     # same deploy command rebuilds the frontend image
+   ```
+
+5. **Open Studio Web** (URL surfaced by the deploy output), create an agent
+   project, upload the eval-set + evaluator JSONs from this directory (or
+   author them in the UI — the evaluator picker exposes an
+   "Aggregators" section on ExactMatch where the classification aggregator
+   can be attached with its class list), and click Run.
+
+6. **Verify** the Aggregations panel renders between the run header and the
+   datapoint table, with the confusion matrix matching what Path A's Python
+   payload encodes (macro F1 ≈ 0.667 on this fixture).
+
+### Open questions for the team owning local dev
+
+- Does the existing PAT / token set get refreshed automatically by the dev tooling, or do contributors need to rotate them periodically?
+- Is there a simpler "local-only" path that bypasses the KinD cluster (e.g. docker-compose) for changes that don't touch K8s manifests?
+- What's the standard pattern for pointing the python worker at a non-PyPI uipath build? The `[tool.uv.sources]` override above is the standard uv path — confirm there's no Helm/skaffold complication.
+
+## Companion PRs
+
+| Repo | Branch | PR | What |
+|---|---|---|---|
+| uipath-python | `feat/eval-classifier-evaluator` | [#1674](https://github.com/UiPath/uipath-python/pull/1674) | SDK `ExactMatch.aggregators` + `LegacyExactMatch.aggregators` |
+| Agents | `feat/eval-classifier-backend` | [#5313](https://github.com/UiPath/Agents/pull/5313) | C# math + activity + envelope storage |
+| Agents | `feat/eval-dataset-evaluators-ui` | [#5306](https://github.com/UiPath/Agents/pull/5306) | Frontend picker + Aggregations panel |
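Reviewer note: the README says the downstream C# layer builds a confusion matrix and precision / recall / F-score, and quotes macro F1 ≈ 0.667 for this fixture. The sketch below is a plain-Python cross-check of that number, not the backend's implementation. The `PAIRS` list is an assumption derived by hand from applying the keyword rules in `main.py` to the nine utterances in `main.json`, and the helper names are hypothetical.

```python
from collections import Counter

CLASSES = ["book", "cancel", "reschedule"]

# (expected, actual) for the nine datapoints, in eval-set order.
# Derived by hand from main.py's keyword rules; not read from a real run.
PAIRS = [
    ("book", "book"), ("book", "book"), ("book", "cancel"),
    ("cancel", "cancel"), ("cancel", "cancel"), ("cancel", "reschedule"),
    ("reschedule", "reschedule"), ("reschedule", "reschedule"), ("reschedule", "book"),
]


def confusion_matrix(pairs, classes):
    """Rows are expected classes, columns are predicted classes."""
    counts = Counter(pairs)
    return [[counts[(exp, act)] for act in classes] for exp in classes]


def per_class_metrics(matrix, classes):
    """Precision, recall and F1 for each class, one-vs-rest."""
    metrics = {}
    for i, label in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                    # rest of the row
        fp = sum(row[i] for row in matrix) - tp     # rest of the column
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


matrix = confusion_matrix(PAIRS, CLASSES)
metrics = per_class_metrics(matrix, CLASSES)
print(matrix)                        # [[2, 1, 0], [0, 2, 1], [1, 0, 2]]
print(round(macro_f1(metrics), 3))   # 0.667
```

Each class has two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3 per class. That is why accuracy (6/9) and macro F1 coincide on this fixture.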
diff --git a/packages/uipath/samples/classifier_demo/bindings.json b/packages/uipath/samples/classifier_demo/bindings.json
@@ -0,0 +1,4 @@
+{
+  "version": "2.0",
+  "resources": []
+}
diff --git a/packages/uipath/samples/classifier_demo/evaluations/eval-sets/main.json b/packages/uipath/samples/classifier_demo/evaluations/eval-sets/main.json
@@ -0,0 +1,163 @@
+{
+  "version": "1.0",
+  "id": "classifier-demo-eval-set",
+  "name": "Classifier demo eval set",
+  "evaluatorRefs": [
+    "intent_match"
+  ],
+  "evaluations": [
+    {
+      "id": "book-1",
+      "name": "book — straightforward",
+      "inputs": {
+        "utterance": "I want to book a table for two"
+      },
+      "expectedOutput": {
+        "intent": "book"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "book"
+          }
+        }
+      }
+    },
+    {
+      "id": "book-2",
+      "name": "book — schedule keyword",
+      "inputs": {
+        "utterance": "Please schedule an appointment"
+      },
+      "expectedOutput": {
+        "intent": "book"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "book"
+          }
+        }
+      }
+    },
+    {
+      "id": "book-3",
+      "name": "book — agent misclassifies (utterance triggers cancel keyword)",
+      "inputs": {
+        "utterance": "I had to cancel my last attempt but I want to reserve a slot now"
+      },
+      "expectedOutput": {
+        "intent": "book"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "book"
+          }
+        }
+      }
+    },
+    {
+      "id": "cancel-1",
+      "name": "cancel — straightforward",
+      "inputs": {
+        "utterance": "Please cancel my reservation"
+      },
+      "expectedOutput": {
+        "intent": "cancel"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "cancel"
+          }
+        }
+      }
+    },
+    {
+      "id": "cancel-2",
+      "name": "cancel — void synonym",
+      "inputs": {
+        "utterance": "I want to void the order"
+      },
+      "expectedOutput": {
+        "intent": "cancel"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "cancel"
+          }
+        }
+      }
+    },
+    {
+      "id": "cancel-3",
+      "name": "cancel — agent misclassifies (utterance has 'move' which triggers reschedule)",
+      "inputs": {
+        "utterance": "I need to move past this and cancel everything"
+      },
+      "expectedOutput": {
+        "intent": "cancel"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "cancel"
+          }
+        }
+      }
+    },
+    {
+      "id": "reschedule-1",
+      "name": "reschedule — straightforward",
+      "inputs": {
+        "utterance": "I want to reschedule the meeting"
+      },
+      "expectedOutput": {
+        "intent": "reschedule"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "reschedule"
+          }
+        }
+      }
+    },
+    {
+      "id": "reschedule-2",
+      "name": "reschedule — move synonym",
+      "inputs": {
+        "utterance": "Can we move the slot to tomorrow"
+      },
+      "expectedOutput": {
+        "intent": "reschedule"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "reschedule"
+          }
+        }
+      }
+    },
+    {
+      "id": "reschedule-3",
+      "name": "reschedule — agent misclassifies (falls through to default 'book')",
+      "inputs": {
+        "utterance": "Different timing please"
+      },
+      "expectedOutput": {
+        "intent": "reschedule"
+      },
+      "evaluationCriterias": {
+        "intent_match": {
+          "expectedOutput": {
+            "intent": "reschedule"
+          }
+        }
+      }
+    }
+  ]
+}
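Reviewer note: because the aggregator's class list lives in the evaluator config while the labels live in the eval set, a typo in either file would only show up as an odd confusion matrix. A minimal sanity check could look like the sketch below. It assumes the eval-set shape shown in this diff (`evaluations[].expectedOutput.intent`). The function name and the `typo-1` entry are hypothetical and exist only to show the failure case.

```python
from collections import Counter


def summarize_expected_intents(eval_set: dict, classes: list[str]):
    """Count expected labels and collect ids whose label is not a declared class."""
    counts = Counter()
    unknown_ids = []
    for item in eval_set["evaluations"]:
        label = item["expectedOutput"]["intent"]
        counts[label] += 1
        if label not in classes:
            unknown_ids.append(item["id"])
    return counts, unknown_ids


# Trimmed in-memory stand-in for main.json; "typo-1" is NOT in the real fixture.
sample = {
    "evaluations": [
        {"id": "book-1", "expectedOutput": {"intent": "book"}},
        {"id": "cancel-1", "expectedOutput": {"intent": "cancel"}},
        {"id": "typo-1", "expectedOutput": {"intent": "canel"}},
    ]
}

counts, unknown_ids = summarize_expected_intents(
    sample, ["book", "cancel", "reschedule"]
)
print(dict(counts))   # {'book': 1, 'cancel': 1, 'canel': 1}
print(unknown_ids)    # ['typo-1']
```

To run it against the real fixture, load `main.json` with `json.load` and pass the `classes` list from `intent_match.json`. The expectation would be three datapoints per class and an empty `unknown_ids`.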
diff --git a/packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_match.json b/packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_match.json
@@ -0,0 +1,21 @@
+{
+  "version": "1.0",
+  "id": "intent_match",
+  "description": "Per-datapoint ExactMatch on the agent's `intent` output. The attached classification aggregator carries the class list to the downstream backend, which builds a confusion matrix and precision/recall/F-score across the dataset.",
+  "evaluatorTypeId": "uipath-exact-match",
+  "evaluatorConfig": {
+    "name": "intent_match",
+    "targetOutputKey": "intent",
+    "caseSensitive": false,
+    "negated": false,
+    "defaultEvaluationCriteria": {
+      "expectedOutput": "book"
+    },
+    "aggregators": [
+      {
+        "name": "classification",
+        "classes": ["book", "cancel", "reschedule"]
+      }
+    ]
+  }
+}
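Reviewer note: to make the per-datapoint contract concrete, here is a minimal sketch of the record this config is described as producing: a 1.0/0.0 score plus a details payload that repeats the `aggregators` list verbatim. The field names mirror the sample output quoted in the README. The function itself is a hypothetical stand-in, not the SDK's `ExactMatchEvaluator`.

```python
AGGREGATORS = [
    {"name": "classification", "classes": ["book", "cancel", "reschedule"]}
]


def exact_match_record(expected, actual, aggregators, case_sensitive=False):
    """Hypothetical helper: score one datapoint and attach the aggregator config."""
    lhs, rhs = str(expected), str(actual)
    if not case_sensitive:
        # Mirrors "caseSensitive": false in the evaluator config.
        lhs, rhs = lhs.lower(), rhs.lower()
    return {
        "score": 1.0 if lhs == rhs else 0.0,
        "details": {
            "expected": expected,
            "actual": actual,
            # Same list on every datapoint, so a consumer can recover the
            # class set from any single record.
            "aggregators": aggregators,
        },
    }


hit = exact_match_record("book", "Book", AGGREGATORS)
miss = exact_match_record("cancel", "reschedule", AGGREGATORS)
print(hit["score"], miss["score"])   # 1.0 0.0
print(miss["details"]["aggregators"][0]["classes"])
```

The downstream pass only needs the `(expected, actual)` pair and the class list from each record, which is why the config is repeated per datapoint rather than looked up separately.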
diff --git a/packages/uipath/samples/classifier_demo/main.py b/packages/uipath/samples/classifier_demo/main.py
@@ -0,0 +1,42 @@
+"""Tiny intent-classification agent for the classification aggregator demo.
+
+Returns the intent of an utterance by whole-token keyword match, first hit wins:
+  - reschedule  (a token equal to "reschedule" / "move")
+  - cancel      (a token equal to "cancel" / "void")
+  - book        (a token equal to "book" / "reserve" / "schedule", or no match)
+
+Three datapoints are deliberately misclassified so the run-level
+classification metrics (precision/recall/F-score) are non-trivial.
+"""
+
+from dataclasses import dataclass
+
+
+@dataclass
+class IntentInput:
+    utterance: str
+
+
+@dataclass
+class IntentOutput:
+    intent: str
+
+
+BOOK_KEYWORDS = {"book", "reserve", "schedule"}
+CANCEL_KEYWORDS = {"cancel", "void"}
+RESCHEDULE_KEYWORDS = {"reschedule", "move"}
+
+
+async def main(input: IntentInput) -> IntentOutput:
+    """Classify the utterance into book / cancel / reschedule."""
+    text = input.utterance.lower()
+    tokens = set(text.split())
+
+    if tokens & RESCHEDULE_KEYWORDS:
+        return IntentOutput(intent="reschedule")
+    if tokens & CANCEL_KEYWORDS:
+        return IntentOutput(intent="cancel")
+    if tokens & BOOK_KEYWORDS:
+        return IntentOutput(intent="book")
+    # Fallback to "book" — deliberately wrong-ish so the matrix is interesting.
+    return IntentOutput(intent="book")
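Reviewer note: `text.split()` matches whole whitespace-separated tokens, so a keyword with trailing punctuation ("cancel.") falls through to the `book` fallback. The nine fixture utterances contain no punctuation, so the demo results are unaffected, but anyone extending the dataset may trip over this. The sketch below contrasts the current tokenization with a regex-based alternative. `classify`, `whitespace_tokens`, and `word_tokens` are hypothetical names for illustration, and the regex variant is a suggestion, not what the PR ships.

```python
import re

BOOK_KEYWORDS = {"book", "reserve", "schedule"}
CANCEL_KEYWORDS = {"cancel", "void"}
RESCHEDULE_KEYWORDS = {"reschedule", "move"}


def whitespace_tokens(text):
    """Tokenization used by main.py: split on whitespace only."""
    return text.split()


def word_tokens(text):
    """Alternative: runs of lowercase letters/apostrophes, so punctuation is dropped."""
    return re.findall(r"[a-z']+", text)


def classify(utterance, tokenizer):
    """Same precedence as main.py: reschedule, then cancel, then book (also fallback)."""
    tokens = set(tokenizer(utterance.lower()))
    if tokens & RESCHEDULE_KEYWORDS:
        return "reschedule"
    if tokens & CANCEL_KEYWORDS:
        return "cancel"
    return "book"


print(classify("Please cancel.", whitespace_tokens))  # book  ("cancel." is not a keyword)
print(classify("Please cancel.", word_tokens))        # cancel
```

Both tokenizers agree on the fixture's utterances. For example, "I need to move past this and cancel everything" still yields `reschedule` with either, because `move` is checked first.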
diff --git a/packages/uipath/samples/classifier_demo/pyproject.toml b/packages/uipath/samples/classifier_demo/pyproject.toml
@@ -0,0 +1,9 @@
+[project]
+name = "classifier-demo"
+version = "0.0.1"
+description = "Tiny intent-classification agent that exercises the new classification aggregator end-to-end via `uipath eval`."
+requires-python = ">=3.11"
+dependencies = ["uipath"]
+
+[dependency-groups]
+dev = ["uipath-dev"]
diff --git a/packages/uipath/samples/classifier_demo/uipath.json b/packages/uipath/samples/classifier_demo/uipath.json
@@ -0,0 +1,5 @@
+{
+  "functions": {
+    "main": "main.py:main"
+  }
+}