feat: track AI generation outcome (kept vs discarded) and retry depth #2865
Open
ineagu wants to merge 2 commits into
Adds kept-vs-discarded outcome and regenerate retry-depth signals to the AI block via the existing oTrk accumulator. No new free-text capture; no consent bypass; preset ids allowlisted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plugin build for 56dd0f2 is ready 🛎️!
E2E Tests
Playwright Test Status: See serial and parallel matrix jobs
Performance Results
serverResponse: {"q25":445.6,"q50":453.3,"q75":482.6,"cnt":10}
firstPaint: {"q25":519.3,"q50":588.65,"q75":648.6,"cnt":10}
domContentLoaded: {"q25":3366.1,"q50":3396.75,"q75":3434.7,"cnt":10}
loaded: {"q25":3366.8,"q50":3397.25,"q75":3435.2,"cnt":10}
firstContentfulPaint: {"q25":8940.6,"q50":9018.4,"q75":9047.8,"cnt":10}
firstBlock: {"q25":13497.7,"q50":13527.65,"q75":13560.7,"cnt":10}
type: {"q25":21.35,"q50":22.88,"q75":24.9,"cnt":10}
typeWithoutInspector: {"q25":17.31,"q50":18.99,"q75":19.83,"cnt":10}
typeWithTopToolbar: {"q25":28.01,"q50":28.75,"q75":30.08,"cnt":10}
typeContainer: {"q25":12.5,"q50":13.56,"q75":14.91,"cnt":10}
focus: {"q25":98.44,"q50":102.13,"q75":105.2,"cnt":10}
inserterOpen: {"q25":35.13,"q50":36.08,"q75":38.39,"cnt":10}
inserterSearch: {"q25":11.75,"q50":12.04,"q75":12.84,"cnt":10}
inserterHover: {"q25":4.35,"q50":4.59,"q75":4.7,"cnt":20}
loadPatterns: {"q25":1464.87,"q50":1505.17,"q75":1560.56,"cnt":10}
listViewOpen: {"q25":203.76,"q50":206.44,"q75":213.36,"cnt":10}
- emit retry depth on the success path with a high-water mark (no clobber by a later reset; failed/aborted regenerations no longer inflate it)
- track 'discard' on real block removal (covers toolbar/Backspace, not just the in-panel X) using a live output ref synced from the prompt component
- reset accepted state on each new generation; record 'insert'/'replace' only after the action actually runs

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
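One possible reading of the high-water-mark behaviour described in the commit message above, as a minimal sketch. `RetryDepthTracker` and `recordSuccess` are hypothetical names introduced here for illustration; the actual change uses a `retryCount` ref inside the prompt component and may differ in detail.

```typescript
// Hypothetical helper, not the PR's code: tracks regenerate depth so that
// only successful generations count and a later reset cannot lower the
// depth that has already been observed.
class RetryDepthTracker {
    private attempts = 0;  // successful regenerations since the last fresh generation
    private highWater = 0; // largest depth seen so far

    // Assumed to be called only on the success path, so failed or aborted
    // regenerations never reach this method and cannot inflate the count.
    recordSuccess(isRegenerate: boolean): number {
        // A fresh generation resets the running count; a regenerate bumps it.
        this.attempts = isRegenerate ? this.attempts + 1 : 0;
        // The reported depth never decreases: it is the maximum observed.
        this.highWater = Math.max(this.highWater, this.attempts);
        return this.highWater;
    }
}
```

In this sketch a fresh generation after two regenerations still reports a depth of 2, which is the "no clobber by a later reset" property the commit message refers to.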
Outcome
When a user runs the AI block (content generator), we currently see that a prompt was sent, but we have no signal about what happened next: did the user actually keep the generated content, or throw it away? And how many times did they have to regenerate before they were satisfied (or gave up)?
This PR closes that gap by recording the outcome of each AI generation — kept (inserted/replaced) vs. discarded — plus a coarse retry-depth bucket for how many regenerations a session took.
The product question this answers: Is AI generation producing output people actually use, and how much retrying does it take to get there? It informs where to invest in prompt quality and model tuning — a high discard rate or a fat tail of 4+ retries flags presets that aren't pulling their weight, while a healthy kept rate validates the feature. It also lets us compare outcomes per preset (form / textTransformation / patternsPicker) so improvement effort can be targeted.
What changed
`src/blocks/blocks/content-generator/edit.js`
- `allowedPromptID()` allowlist helper that maps the preset id to one of `form|textTransformation|patternsPicker`, falling back to `other`, so an arbitrary string can never reach the tracking wire.
- `trackingKey` (`attributes?.id ?? clientId`) used only as the dedup key for the tracking sets (never sent as a value), and a `hasAccepted` ref so a later discard can't clobber a prior accept.
- […] (`replaceBlocks`) and when content is inserted into the page (`insertContentIntoPage`).
- […] `trackingKey` and `hasAcceptedRef` down to `PromptPlaceholder`.
- […] `useMemo` import.

`src/blocks/components/prompt/index.tsx`
- `allowedPromptID()` allowlist helper and a `retryBucket()` helper that buckets the regenerate count into the coarse enum `0|1|2-3|4+`.
- `retryCount` ref that resets on the first generation and increments on each regenerate; fires a retry-depth event on every generation.
- `trackingKey` and `hasAcceptedRef` props to `PromptPlaceholderProps`.
- […] `onClose`, but only when there was generated output to throw away (`resultHistory?.length > 0`) and the generation wasn't already accepted (`! hasAcceptedRef?.current`).

Telemetry event shapes added
All events are emitted via `window.oTrk?.set(...)` (the dedup key in backticks is the per-session `trackingKey`, not part of the payload):
- Outcome: kept by replacing the block
- Outcome: kept by inserting into the page
- Outcome: discarded (closed with output present, not previously accepted)
- Retry depth (fired on each generation/regeneration)
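The payloads for the events listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code. The helper names `allowedPromptID` and `retryBucket` and the field names `feature`, `featureComponent`, and `featureValue` come from this description. The bodies, the `TrackingEvent` type, the builder functions `outcomeEvent` and `retryDepthEvent`, the `shouldTrackDiscard` guard, and the use of `feature: 'ai-generation'` for outcome events are assumptions. The exact `window.oTrk?.set(...)` call and dedup key format are not reproduced.

```typescript
type PresetId = 'form' | 'textTransformation' | 'patternsPicker' | 'other';
type RetryBucket = '0' | '1' | '2-3' | '4+';
type OutcomeValue = 'replace' | 'insert' | 'discard';

// Assumed payload shape; only the three field names are taken from the PR text.
interface TrackingEvent {
    feature: 'ai-generation';
    featureComponent: string;
    featureValue: string;
}

const ALLOWED_PRESETS: readonly string[] = ['form', 'textTransformation', 'patternsPicker'];

// Collapse any unknown, missing, or non-string preset id to 'other',
// so an arbitrary string never becomes part of a payload.
function allowedPromptID(id: unknown): PresetId {
    return typeof id === 'string' && ALLOWED_PRESETS.indexOf(id) !== -1
        ? (id as PresetId)
        : 'other';
}

// Bucket the raw regenerate count into the coarse enum 0 | 1 | 2-3 | 4+.
function retryBucket(count: number): RetryBucket {
    if (count <= 0) return '0';
    if (count === 1) return '1';
    if (count <= 3) return '2-3';
    return '4+';
}

// Hypothetical builder for the three outcome events.
function outcomeEvent(presetId: unknown, value: OutcomeValue): TrackingEvent {
    return {
        feature: 'ai-generation', // assumed to match the retry-depth event
        featureComponent: `outcome-${allowedPromptID(presetId)}`,
        featureValue: value,
    };
}

// Hypothetical builder for the retry-depth event.
function retryDepthEvent(regenerateCount: number): TrackingEvent {
    return {
        feature: 'ai-generation',
        featureComponent: 'regenerate-count',
        featureValue: retryBucket(regenerateCount),
    };
}

// Discard is only recorded when there was output to throw away
// and the generation was not already accepted.
function shouldTrackDiscard(resultHistoryLength: number, hasAccepted: boolean): boolean {
    return resultHistoryLength > 0 && !hasAccepted;
}
```

Every value that reaches a payload in this sketch is drawn from a closed set of strings, which is the property the allowlist and the bucketing are meant to guarantee.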
Compliance
- `featureValue` is a fixed enum (`replace`/`insert`/`discard`, and the bucketed retry counts), and every `featureComponent` is built from an allowlisted preset id: any unknown id collapses to `other`, so freeform strings can't leak onto the wire. The retry count is bucketed (`0`/`1`/`2-3`/`4+`) rather than reported raw.
- […] `window.oTrk?.set(...)` with no `{ consent: true }` argument, so they flow through the standard `otter_blocks_logger_flag` opt-in gate (the same flag surfaced in the welcome guide and dashboard "anonymous data tracking" toggle). They are dropped entirely when the user hasn't opted in. This is deliberately different from the pre-existing `prompt` and `ai-toolbar` events in this codebase, which pass `{ consent: true }` to bypass the gate; none of the events added here use that bypass.
- `trackingKey` (`attributes?.id ?? clientId`) is used only locally as the in-memory dedup key for `set()`; it is never included in any event payload.

Test plan
A reviewer can validate in the block editor with the AI (content generator) block, with anonymous data tracking enabled (`otter_blocks_logger_flag = yes`, e.g. via the welcome guide or the dashboard toggle):
- `feature: 'ai-generation', featureComponent: 'regenerate-count'` with `featureValue` advancing `0 → 1 → 2-3 → 4+`.
- `outcome-<preset>` event with `featureValue: 'insert'`.
- `outcome-<preset>` with `featureValue: 'replace'`.
- `outcome-<preset>` with `featureValue: 'discard'`. Closing with no generated output should fire nothing.
- […] no `discard` event fires after an accept (guarded by `hasAccepted`).
- `featureComponent` is one of `outcome-form` / `outcome-textTransformation` / `outcome-patternsPicker` / `outcome-other` and never contains a raw/arbitrary id.

Observe events on the wire (network requests to the `tiTrk` tracking endpoint) or by instrumenting `window.oTrk.set`. Aggregated events surface downstream in the usual telemetry pipeline / Metabase.

Related
Part of the telemetry-expansion roadmap, following the data-logging pattern established in PR #2862 (block add/remove tracking) and reusing the same `oTrk` (`tiTrk.with('otter')`) plumbing and `otter_blocks_logger_flag` consent gate. Sibling to the existing `ai-generation` events (`prompt`, `ai-toolbar`): this PR adds the missing outcome and retry-depth dimensions on top of them.

🤖 Generated with Claude Code