2 changes: 1 addition & 1 deletion packages/uipath/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "uipath"
-version = "2.10.70"
+version = "2.10.72"

Contributor

🔵 L1 — version bump + conflict

2.10.70 → 2.10.72; the comment in the original commit notes .71 was an unused dev cache-bust. Branch is CONFLICTING and this line will collide with #1632's → 2.10.68. Rebase before merge.

Contributor Author

Ack — handing the rebase + version-bump conflict back to @ajay-kesavan to resolve locally. Leaving this thread open until the rebase lands so the conversation tracks the final version number.

description = "Python SDK and CLI for UiPath Platform, enabling programmatic interaction with automation services, process management, and deployment tools."
readme = { file = "README.md", content-type = "text/markdown" }
requires-python = ">=3.11"
136 changes: 136 additions & 0 deletions packages/uipath/samples/classifier_demo/README.md
@@ -0,0 +1,136 @@
# Classifier aggregator end-to-end demo

A minimal intent-classification agent that exercises the new
classification **aggregator** end-to-end. Use this as the test fixture for
both SDK-only validation (Path A below) and Studio Web full-stack validation
(Path B).

## What's here

```
classifier_demo/
├── main.py                     # 3-class keyword classifier
├── uipath.json
├── pyproject.toml
├── bindings.json
└── evaluations/
    ├── eval-sets/
    │   └── main.json           # 9 datapoints, 3 per class, some intentionally wrong
    └── evaluators/
        └── intent_match.json   # ExactMatch on agent_output.intent + classification aggregator
```

There is **one** evaluator. `intent_match` is an `ExactMatchEvaluator` whose
`evaluatorConfig` carries an `aggregators: [{ name: "classification", classes: [...] }]`
entry. Per datapoint, the evaluator emits a 1.0/0.0 score and an
`ExactMatchJustification` whose `aggregators` field round-trips the config
through to the downstream consumer (the C# layer in Studio Web), which builds
a confusion matrix and precision / recall / F-score across the dataset.
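
As a rough illustration of that downstream math, here is a minimal Python sketch that turns per-datapoint `(expected, actual)` pairs into a confusion matrix and per-class precision / recall / F1. It is not the C# implementation; `classification_report` and the `PAIRS` list are hypothetical names, and the pairs are what the keyword rules in `main.py` produce on this fixture.

```python
from collections import Counter


def classification_report(pairs, classes):
    """Return (confusion, per_class, macro_f1) for (expected, actual) label pairs.

    Assumption: labels outside `classes` are simply ignored when counting
    false positives and false negatives.
    """
    # confusion[(expected, actual)] -> number of datapoints
    confusion = Counter(pairs)
    per_class = {}
    for c in classes:
        tp = confusion[(c, c)]
        fp = sum(confusion[(e, c)] for e in classes if e != c)
        fn = sum(confusion[(c, a)] for a in classes if a != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro average: unweighted mean of the per-class F1 scores.
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(classes)
    return confusion, per_class, macro_f1


CLASSES = ["book", "cancel", "reschedule"]
# (expected, actual) for the nine datapoints of this fixture.
PAIRS = [
    ("book", "book"), ("book", "book"), ("book", "cancel"),
    ("cancel", "cancel"), ("cancel", "cancel"), ("cancel", "reschedule"),
    ("reschedule", "reschedule"), ("reschedule", "reschedule"),
    ("reschedule", "book"),
]

confusion, per_class, macro_f1 = classification_report(PAIRS, CLASSES)
print(round(macro_f1, 3))  # 0.667
```

Each class has two true positives, one false positive, and one false negative here, so precision, recall, and F1 all equal 2/3 per class.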

## Path A — SDK only (real run, ~30 seconds)

```bash
cd packages/uipath
uv sync --all-extras

cd samples/classifier_demo
uv run --project ../.. uipath eval main main.json --no-report --output-file /tmp/out.json
```

Expected: a results table with a single `intent_match` column averaging 0.667
(6/9 correct).
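
The 6/9 figure can be checked without running `uipath eval`. The sketch below is a standalone copy of the keyword rules in `main.py` (same keyword sets, same precedence) replayed over the nine utterances from `main.json`; `classify` and `FIXTURE` are illustrative names, not part of the SDK.

```python
BOOK = {"book", "reserve", "schedule"}
CANCEL = {"cancel", "void"}
RESCHEDULE = {"reschedule", "move"}


def classify(utterance: str) -> str:
    """Mirror main.py: reschedule wins over cancel, cancel wins over book."""
    tokens = set(utterance.lower().split())
    if tokens & RESCHEDULE:
        return "reschedule"
    if tokens & CANCEL:
        return "cancel"
    # Book keywords and the fallback both map to "book".
    return "book"


# (utterance, expected intent) copied from evaluations/eval-sets/main.json
FIXTURE = [
    ("I want to book a table for two", "book"),
    ("Please schedule an appointment", "book"),
    ("I had to cancel my last attempt but I want to reserve a slot now", "book"),
    ("Please cancel my reservation", "cancel"),
    ("I want to void the order", "cancel"),
    ("I need to move past this and cancel everything", "cancel"),
    ("I want to reschedule the meeting", "reschedule"),
    ("Can we move the slot to tomorrow", "reschedule"),
    ("Different timing please", "reschedule"),
]

correct = sum(classify(u) == expected for u, expected in FIXTURE)
print(f"{correct}/{len(FIXTURE)} = {correct / len(FIXTURE):.3f}")  # 6/9 = 0.667
```

The three misses are the third datapoint of each class: `book-3` hits the `cancel` keyword first, `cancel-3` hits `move`, and `reschedule-3` matches no keyword and falls through to `book`.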

To see the metadata payload that lands in the backend's
`CodedEvaluatorScore.Justification`:

```bash
python3 -c "
import json
with open('/tmp/out.json') as f:
    d = json.load(f)
for r in d['evaluationSetResults'][0]['evaluationRunResults']:
    print(r['evaluatorName'], r['result'].get('details'))
"
```

You should see entries like:

```
intent_match {'expected': 'book', 'actual': 'book', 'aggregators': [{'name': 'classification', 'classes': ['book', 'cancel', 'reschedule']}]}
```

The `aggregators` list is identical on every datapoint by design — it's the
mechanism by which the per-datapoint records carry the class set to the C#
post-pass without requiring a separate evaluator-snapshot lookup.
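
To make the "carried on every record" idea concrete, here is a hedged Python sketch of how a consumer could recover the class set from the per-datapoint `details` payloads alone. The record shape follows the sample output above; `class_set_from_records` is a hypothetical helper, and the real C# post-pass may work differently.

```python
def class_set_from_records(details_list, aggregator_name="classification"):
    """Return the class list carried by the records, or None if absent.

    Raises ValueError if two records disagree, since the list is expected
    to be identical on every datapoint.
    """
    seen = None
    for details in details_list:
        for agg in details.get("aggregators", []):
            if agg.get("name") != aggregator_name:
                continue
            classes = list(agg["classes"])
            if seen is None:
                seen = classes
            elif classes != seen:
                raise ValueError(f"inconsistent class lists: {seen} vs {classes}")
    return seen


AGG = [{"name": "classification", "classes": ["book", "cancel", "reschedule"]}]
records = [
    {"expected": "book", "actual": "book", "aggregators": AGG},
    {"expected": "book", "actual": "cancel", "aggregators": AGG},
]
classes = class_set_from_records(records)
print(classes)  # ['book', 'cancel', 'reschedule']
```

Because every record carries the same list, any single record is enough to size the confusion matrix; the consistency check is only a guard against mixed payloads.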

## Path B — Full Studio Web stack (real UI, click Run, see panel)

The pieces below assume you have a local KinD cluster running per
`Agents/LOCAL_DEVELOPMENT.md`.

### Prereqs
- Docker installed and running
- `make` available
- Authenticated Azure CLI session (`az login`)
- Azure DevOps PAT exported as `AZURE_DEVOPS_PAT`
- GitHub NPM registry token exported as `GH_NPM_REGISTRY_TOKEN`
- Azure access token exported as `AZURE_ACCESS_TOKEN` (for the python worker build)
- `cloud-provider-kind` binary (used for the local KinD cluster)

### Steps

1. **Point python-eval-worker at the local SDK branch.** The published
   `uipath` package on PyPI doesn't yet have the classification aggregator.
   Edit `Agents/python-eval-worker/pyproject.toml`:

   ```toml
   [tool.uv.sources]
   uipath = { path = "../../uipath-python/packages/uipath", editable = true }
   ```

   Then `cd python-eval-worker && uv lock && uv sync`.

2. **Bring up the local KinD cluster** (from `Agents/`):
   ```bash
   make create-kind-cluster
   kubectl get nodes
   sudo ./bin/cloud-provider-kind &   # in a separate shell or background
   make up
   make deploy
   ```

3. **Build the backend with the classifier changes:**
   ```bash
   git checkout feat/eval-classifier-backend   # in Agents repo
   # Re-trigger the helm/skaffold deploy for the backend
   make deploy
   ```

4. **Build the frontend with the UI changes:**
   ```bash
   git checkout feat/eval-dataset-evaluators-ui   # in Agents repo
   # The same deploy command rebuilds the frontend image
   make deploy
   ```

5. **Open Studio Web** (URL surfaced by the deploy output), create an agent
   project, upload the eval-set + evaluator JSONs from this directory (or
   author them in the UI; the evaluator picker exposes an
   "Aggregators" section on ExactMatch where the classification aggregator
   can be attached with its class list), and click Run.

6. **Verify** the Aggregations panel renders between the run header and the
   datapoint table, with the confusion matrix matching what Path A's Python
   payload encodes (macro F1 ≈ 0.667 on this fixture).

### Open questions for the team owning local dev

- Does the existing PAT / token set get refreshed automatically by the dev tooling, or do contributors need to rotate them periodically?
- Is there a simpler "local-only" path that bypasses the KinD cluster (e.g. docker-compose) for changes that don't touch K8s manifests?
- What's the standard pattern for pointing the python worker at a non-PyPI uipath build? The `[tool.uv.sources]` override above is the standard uv path — confirm there's no Helm/skaffold complication.

## Companion PRs

| Repo | Branch | PR | What |
|---|---|---|---|
| uipath-python | `feat/eval-classifier-evaluator` | [#1674](https://github.com/UiPath/uipath-python/pull/1674) | SDK `ExactMatch.aggregators` + `LegacyExactMatch.aggregators` |
| Agents | `feat/eval-classifier-backend` | [#5313](https://github.com/UiPath/Agents/pull/5313) | C# math + activity + envelope storage |
| Agents | `feat/eval-dataset-evaluators-ui` | [#5306](https://github.com/UiPath/Agents/pull/5306) | Frontend picker + Aggregations panel |
4 changes: 4 additions & 0 deletions packages/uipath/samples/classifier_demo/bindings.json
@@ -0,0 +1,4 @@
{
"version": "2.0",
"resources": []
}
163 changes: 163 additions & 0 deletions packages/uipath/samples/classifier_demo/evaluations/eval-sets/main.json
@@ -0,0 +1,163 @@
{
"version": "1.0",
"id": "classifier-demo-eval-set",
"name": "Classifier demo eval set",
"evaluatorRefs": [
"intent_match"
],
"evaluations": [
{
"id": "book-1",
"name": "book — straightforward",
"inputs": {
"utterance": "I want to book a table for two"
},
"expectedOutput": {
"intent": "book"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "book"
}
}
}
},
{
"id": "book-2",
"name": "book — schedule keyword",
"inputs": {
"utterance": "Please schedule an appointment"
},
"expectedOutput": {
"intent": "book"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "book"
}
}
}
},
{
"id": "book-3",
"name": "book — agent misclassifies (utterance triggers cancel keyword)",
"inputs": {
"utterance": "I had to cancel my last attempt but I want to reserve a slot now"
},
"expectedOutput": {
"intent": "book"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "book"
}
}
}
},
{
"id": "cancel-1",
"name": "cancel — straightforward",
"inputs": {
"utterance": "Please cancel my reservation"
},
"expectedOutput": {
"intent": "cancel"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "cancel"
}
}
}
},
{
"id": "cancel-2",
"name": "cancel — void synonym",
"inputs": {
"utterance": "I want to void the order"
},
"expectedOutput": {
"intent": "cancel"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "cancel"
}
}
}
},
{
"id": "cancel-3",
"name": "cancel — agent misclassifies (utterance has 'move' which triggers reschedule)",
"inputs": {
"utterance": "I need to move past this and cancel everything"
},
"expectedOutput": {
"intent": "cancel"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "cancel"
}
}
}
},
{
"id": "reschedule-1",
"name": "reschedule — straightforward",
"inputs": {
"utterance": "I want to reschedule the meeting"
},
"expectedOutput": {
"intent": "reschedule"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "reschedule"
}
}
}
},
{
"id": "reschedule-2",
"name": "reschedule — move synonym",
"inputs": {
"utterance": "Can we move the slot to tomorrow"
},
"expectedOutput": {
"intent": "reschedule"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "reschedule"
}
}
}
},
{
"id": "reschedule-3",
"name": "reschedule — agent misclassifies (falls through to default 'book')",
"inputs": {
"utterance": "Different timing please"
},
"expectedOutput": {
"intent": "reschedule"
},
"evaluationCriterias": {
"intent_match": {
"expectedOutput": {
"intent": "reschedule"
}
}
}
}
]
}
21 changes: 21 additions & 0 deletions packages/uipath/samples/classifier_demo/evaluations/evaluators/intent_match.json
@@ -0,0 +1,21 @@
{
"version": "1.0",
"id": "intent_match",
"description": "Per-datapoint ExactMatch on the agent's `intent` output. The attached classification aggregator carries the class list to the downstream backend, which builds a confusion matrix and precision/recall/F-score across the dataset.",
"evaluatorTypeId": "uipath-exact-match",
"evaluatorConfig": {
"name": "intent_match",
"targetOutputKey": "intent",
"caseSensitive": false,
"negated": false,
"defaultEvaluationCriteria": {
"expectedOutput": "book"
},
"aggregators": [
{
"name": "classification",
"classes": ["book", "cancel", "reschedule"]
}
]
}
}
42 changes: 42 additions & 0 deletions packages/uipath/samples/classifier_demo/main.py
@@ -0,0 +1,42 @@
"""Tiny intent-classification agent for the ClassifierEvaluator demo.

Given an utterance, returns the intent label. Three intents:
- book (anything containing "book" / "reserve" / "schedule")
- cancel (anything containing "cancel" / "void")
- reschedule (anything containing "reschedule" / "move")

A few datapoints are deliberately misclassified so the run-level
classification metrics (precision/recall/F-score) come out non-trivially.
"""

from dataclasses import dataclass


@dataclass
class IntentInput:
    utterance: str


@dataclass
class IntentOutput:
    intent: str


BOOK_KEYWORDS = {"book", "reserve", "schedule"}
CANCEL_KEYWORDS = {"cancel", "void"}
RESCHEDULE_KEYWORDS = {"reschedule", "move"}


async def main(input: IntentInput) -> IntentOutput:
    """Classify the utterance into book / cancel / reschedule."""
    text = input.utterance.lower()
    tokens = set(text.split())

    if tokens & RESCHEDULE_KEYWORDS:
        return IntentOutput(intent="reschedule")
    if tokens & CANCEL_KEYWORDS:
        return IntentOutput(intent="cancel")
    if tokens & BOOK_KEYWORDS:
        return IntentOutput(intent="book")
    # Fallback to "book": deliberately wrong-ish so the matrix is interesting.
    return IntentOutput(intent="book")
9 changes: 9 additions & 0 deletions packages/uipath/samples/classifier_demo/pyproject.toml
@@ -0,0 +1,9 @@
[project]
name = "classifier-demo"
version = "0.0.1"
description = "Tiny intent-classification agent that exercises the new ClassifierEvaluator end-to-end via `uipath eval`."
requires-python = ">=3.11"
dependencies = ["uipath"]

[dependency-groups]
dev = ["uipath-dev"]
5 changes: 5 additions & 0 deletions packages/uipath/samples/classifier_demo/uipath.json
@@ -0,0 +1,5 @@
{
"functions": {
"main": "main.py:main"
}
}