feat(experimentation): environment-scoped metrics & experiment results #7674

Open
gagantrivedi wants to merge 8 commits into main from feat/experiment-metrics

@gagantrivedi
Member

What

Adds a reusable, environment-scoped Metric and wires metrics into experiments end to end, ClickHouse-native. Builds on the existing experimentation app (Experiment, WarehouseConnection).

Data model

  • Metric — environment-scoped, soft-delete. metric_type (numeric/conversion), aggregation (count/sum/mean), and a JSON definition (the recipe: event + optional filters/value/window). Immutable for now (no update endpoint).
  • ExperimentMetric — attaches a metric to an experiment with an expected_direction; unique per (experiment, metric).
  • MetricResultSnapshot — freezes computed results once an experiment completes.
  • Experiment gains exposure_event (default $flag_exposure) and control_variant.

API (gated on EXPERIMENT_FLAG + environment admin)

  • …/environments/{key}/experiment-metrics/ — metric library: list / create / retrieve / delete. (Not metrics/ — that path is taken by the usage-metrics viewset.) Deletion is blocked while attached to an active experiment.
  • …/experiments/{id}/metrics/ — attach / list / detach, with same-environment + unique-attach validation.
  • …/experiments/{id}/results/ — per-metric per-variant n/mean/variance, relative lift, confidence interval, and a per-metric verdict. Cached to a snapshot once completed.
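
The same-environment and unique-attach checks on the attach endpoint can be sketched in plain Python. This is only an illustration of the two rules named above, not the serializer code from this PR: the `Metric` and `Experiment` dataclasses, the `attach_metric` helper, the `AttachError` class, and the `"increase"` direction value are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Metric:
    # Hypothetical stand-in for the Metric model: only the fields the checks need.
    id: int
    environment_id: int


@dataclass
class Experiment:
    # Hypothetical stand-in for the Experiment model.
    id: int
    environment_id: int
    # metric id -> expected_direction for every attached metric
    attached: dict[int, str] = field(default_factory=dict)


class AttachError(ValueError):
    """Raised when an attach request fails validation."""


def attach_metric(experiment: Experiment, metric: Metric, expected_direction: str) -> None:
    # Same-environment rule: a metric can only be attached within its own environment.
    if metric.environment_id != experiment.environment_id:
        raise AttachError("metric belongs to a different environment")
    # Unique-attach rule: at most one attachment per (experiment, metric).
    if metric.id in experiment.attached:
        raise AttachError("metric is already attached to this experiment")
    experiment.attached[metric.id] = expected_direction
```

In the real API the uniqueness rule would presumably also be backed by a database constraint, so the in-memory check here only mirrors the validation message path.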

Results engine

  • query.py builds the assignment (argMin first-touch on $flag_exposure.value) + metric CTEs from a metric definition. Untrusted values are bound params; LEFT JOIN … coalesce(…,0) keeps assigned-but-inactive identities as real zeros.
  • stats.py compares variants with a Welch/z two-sample test (CI included; for a 0/1 conversion column this reduces to a two-proportion z-test).
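
As a rough illustration of the two steps above (zero-filling assigned-but-inactive identities, then comparing variants), here is a minimal standard-library sketch. It is not the code in query.py or stats.py: `VariantStats`, `summarise`, and `compare` are hypothetical names, and the real engine does the aggregation in ClickHouse rather than in Python.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist, fmean, variance


@dataclass(frozen=True)
class VariantStats:
    n: int           # identities assigned to the variant
    mean: float      # per-identity mean of the metric
    variance: float  # per-identity sample variance


def summarise(assignments: dict[str, str], values: dict[str, float]) -> dict[str, VariantStats]:
    """Per-variant n/mean/variance; identities with no metric events count as 0."""
    per_variant: dict[str, list[float]] = {}
    for identity, variant in assignments.items():
        # Mirrors LEFT JOIN ... coalesce(..., 0): assigned but inactive -> real zero.
        per_variant.setdefault(variant, []).append(values.get(identity, 0.0))
    return {
        variant: VariantStats(len(xs), fmean(xs), variance(xs) if len(xs) > 1 else 0.0)
        for variant, xs in per_variant.items()
    }


def compare(control: VariantStats, treatment: VariantStats, alpha: float = 0.05) -> dict:
    """Two-sample z-test with an unpooled (Welch-style) standard error."""
    diff = treatment.mean - control.mean
    se = math.sqrt(control.variance / control.n + treatment.variance / treatment.n)
    if se == 0:
        raise ValueError("both variants have zero variance; the test is undefined")
    z = diff / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "diff": diff,
        "relative_lift": diff / control.mean if control.mean else None,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "z": z,
        "p_value": 2 * (1 - NormalDist().cdf(abs(z))),
    }
```

For a 0/1 conversion column the per-identity variance is approximately p(1 - p), so the standard error above becomes the unpooled two-proportion form sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2), which is the reduction the description refers to.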

Scope notes (intentional cuts for v1)

  • Primary metrics only — no role/secondary/guardrail concept yet.
  • Metrics are immutable — no edit endpoint.
  • No metric validation dry-run yet.
  • Numeric count/sum/mean dedupe on a natural key to blunt at-least-once Firehose duplicates; residual collision risk documented in query.py. A per-event id in the ingest stream is the clean long-term fix.
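
To make the dedupe trade-off concrete, here is a small sketch of natural-key deduplication. The key used here (identity, event name, millisecond timestamp, value) is an assumption for illustration; the actual key is defined in query.py and is not shown in this description. `Event` and `dedupe` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    identity: str
    name: str
    timestamp_ms: int
    value: float


def dedupe(events: Iterable[Event]) -> list[Event]:
    """Keep the first event seen for each natural key, preserving input order."""
    seen: set[tuple[str, str, int, float]] = set()
    unique: list[Event] = []
    for event in events:
        key = (event.identity, event.name, event.timestamp_ms, event.value)
        if key in seen:
            continue  # at-least-once redelivery, or a genuine collision
        seen.add(key)
        unique.append(event)
    return unique


delivered = [
    Event("user-1", "purchase", 1_700_000_000_000, 9.99),
    Event("user-1", "purchase", 1_700_000_000_000, 9.99),  # redelivered duplicate
    Event("user-1", "purchase", 1_700_000_000_500, 9.99),  # distinct: later timestamp
]
print(len(dedupe(delivered)))  # prints 2
```

The residual risk is visible in the key itself: two genuinely separate events that share every key field are indistinguishable from a redelivery and collapse into one, which is why a per-event id in the ingest stream is the cleaner fix.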

Testing

  • 47 new unit tests (models, metric CRUD, attach/detach, SQL builder, stats, results, snapshot).
  • Full experimentation suite green (191 passed); mypy clean; migrations complete.

🤖 Generated with Claude Code

@gagantrivedi gagantrivedi requested review from a team as code owners June 2, 2026 09:50
@gagantrivedi gagantrivedi requested review from emyller and removed request for a team June 2, 2026 09:50
@github-actions github-actions Bot added the api, infrastructure, and feature labels and removed the infrastructure label Jun 2, 2026
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

Docker builds report

Image | Build Status | Security report
ghcr.io/flagsmith/flagsmith-e2e:pr-7674 | Finished ✅ | Skipped
ghcr.io/flagsmith/flagsmith-api-test:pr-7674 | Finished ✅ | Skipped
ghcr.io/flagsmith/flagsmith-frontend:pr-7674 | Finished ✅ | Results
ghcr.io/flagsmith/flagsmith-api:pr-7674 | Finished ✅ | Results
ghcr.io/flagsmith/flagsmith:pr-7674 | Finished ✅ | Results
ghcr.io/flagsmith/flagsmith-private-cloud:pr-7674 | Finished ✅ | Results

@gagantrivedi gagantrivedi marked this pull request as draft June 2, 2026 09:53
@gagantrivedi gagantrivedi removed the request for review from emyller June 2, 2026 09:53
@gagantrivedi gagantrivedi assigned emyller and unassigned emyller Jun 2, 2026
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.53%. Comparing base (9bdf0f2) to head (de3e8b9).
⚠️ Report is 21 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7674      +/-   ##
==========================================
+ Coverage   98.52%   98.53%   +0.01%     
==========================================
  Files        1444     1451       +7     
  Lines       55083    55469     +386     
==========================================
+ Hits        54273    54659     +386     
  Misses        810      810              

@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

Playwright Test Results

Run #17168 (attempt 1), commit 86ccf2a
  • oss - depot-ubuntu-latest-16: 1 passed (1 test across 1 suite), 33 seconds
  • oss - depot-ubuntu-latest-arm-16: 1 passed (1 test across 1 suite), 35.1 seconds
  • private-cloud - depot-ubuntu-latest-arm-16: 1 passed (1 test across 1 suite), 39.8 seconds
  • private-cloud - depot-ubuntu-latest-16: 1 passed (1 test across 1 suite), 52.5 seconds

Run #17290 (attempt 1), commit 1fd7efb
  • oss - depot-ubuntu-latest-16: 1 passed (1 test across 1 suite), 40.7 seconds
  • oss - depot-ubuntu-latest-arm-16: 1 passed (1 test across 1 suite), 37.5 seconds
  • private-cloud - depot-ubuntu-latest-16: 3 passed (3 tests across 3 suites), 32.9 seconds
  • private-cloud - depot-ubuntu-latest-arm-16: 2 passed (2 tests across 2 suites), 39.7 seconds

Run #17291 (attempt 1), commit de3e8b9
  • oss - depot-ubuntu-latest-16: 1 passed (1 test across 1 suite), 33 seconds
  • oss - depot-ubuntu-latest-arm-16: 1 passed (1 test across 1 suite), 43 seconds
  • private-cloud - depot-ubuntu-latest-16: 3 passed (3 tests across 3 suites), 34.2 seconds
  • private-cloud - depot-ubuntu-latest-arm-16: 1 passed (1 test across 1 suite), 39.9 seconds

@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

Visual Regression

19 screenshots compared. See report for details.

@gagantrivedi gagantrivedi force-pushed the feat/experiment-metrics branch from 86ccf2a to 435d91f Compare June 2, 2026 10:41
@gagantrivedi gagantrivedi force-pushed the feat/experiment-metrics branch from 435d91f to 9568226 Compare June 2, 2026 11:01
@gagantrivedi gagantrivedi force-pushed the feat/experiment-metrics branch from 9568226 to 2fea3fa Compare June 3, 2026 07:00
… attachment

Add a reusable, environment-scoped Metric and the ExperimentMetric join that
attaches metrics to experiments.

- Models: Metric (numeric; count/sum/mean/occurrence aggregations + JSON
  definition), ExperimentMetric (expected_direction; one attach per
  experiment+metric); Experiment gains exposure_event ($flag_exposure) and
  control_variant.
- Metric library CRUD under environments/{key}/experiment-metrics/, gated on
  EXPERIMENT_FLAG + environment admin. Metrics are immutable for now (no
  update); deletion blocked while attached to an active experiment.
- Attach/detach metrics under an experiment, with same-environment and
  unique-attach validation.

Results computation (ClickHouse query builder + statistics) is intentionally
kept on a separate branch; this branch is models + API only.
@gagantrivedi gagantrivedi force-pushed the feat/experiment-metrics branch from 2fea3fa to c961486 Compare June 3, 2026 07:21
Comment on lines +167 to +170
expected_direction = models.CharField(
    max_length=20,
    choices=ExpectedDirection.choices,
)
Contributor

Interesting. So you put it on the Experiment x Metric relation. I can see the case for it there, although I'm wondering whether that matches reality.

Let's say we have these metrics:

  • Conversion rate (up)
  • Average basket (up)
  • Time to activation (down)
  • First time page render (down)

Is there a world in which we'd want an experiment to push one of them in the other direction? If not, I'd put the direction on the metric and maybe allow overriding it per experiment (in v2).

Contributor

OK, I think I'm actually mixing two things:

  • the metric polarity (whether the metric is inherently better when it goes up or when it goes down) -> I think we should add this one as well
  • the experiment impact (the experiment should push it up, keep it at the same level, or push it down), especially for a guardrail => this is expected_direction, which we should keep
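
The distinction drawn above can be sketched as two independent questions asked of the same result. Everything here is hypothetical: the PR does not define a `Polarity` field, and the enum values for `ExpectedDirection` are assumed for illustration rather than taken from the model.

```python
from enum import Enum
from typing import Optional


class Polarity(Enum):
    # Property of the metric itself (hypothetical; not in this PR).
    HIGHER_IS_BETTER = "higher_is_better"
    LOWER_IS_BETTER = "lower_is_better"


class ExpectedDirection(Enum):
    # Property of the experiment-metric attachment (values assumed).
    INCREASE = "increase"
    NO_CHANGE = "no_change"
    DECREASE = "decrease"


def observed_direction(ci_low: float, ci_high: float) -> ExpectedDirection:
    """Classify a lift confidence interval; a CI spanning 0 is no detectable change."""
    if ci_low > 0:
        return ExpectedDirection.INCREASE
    if ci_high < 0:
        return ExpectedDirection.DECREASE
    return ExpectedDirection.NO_CHANGE


def is_improvement(polarity: Polarity, observed: ExpectedDirection) -> Optional[bool]:
    """Polarity question: did the metric move the way that is good for it?"""
    if observed is ExpectedDirection.NO_CHANGE:
        return None
    went_up = observed is ExpectedDirection.INCREASE
    return went_up == (polarity is Polarity.HIGHER_IS_BETTER)


def met_expectation(expected: ExpectedDirection, observed: ExpectedDirection) -> bool:
    """Expectation question: did the experiment move the metric as planned?"""
    return expected is observed
```

With "Time to activation" (lower is better) attached as a guardrail expecting no change, a lift CI of (-0.12, -0.03) is an improvement by polarity yet does not meet the stated expectation, which is why the two fields carry different information.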

@gagantrivedi
Member Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces Metric and ExperimentMetric models, along with corresponding API endpoints, serializers, permissions, audit logging, and unit tests to support experiment metrics. The review feedback highlights several critical improvements for robustness and business logic validation: using get_object_or_404 to handle missing or soft-deleted experiments cleanly, excluding soft-deleted experiments when checking for active metric attachments, adding defensive validation in ExperimentMetricSerializer (such as preventing modifications to completed experiments or attaching deleted metrics), and restricting the detachment of metrics from completed experiments.

Comment thread api/experimentation/views.py Outdated
Comment thread api/experimentation/views.py
Comment thread api/experimentation/serializers.py
Comment thread api/experimentation/views.py
Contributor

Copilot AI left a comment

Pull request overview

Adds a first-pass “metric library” for experimentation by introducing environment-scoped Metric objects and an ExperimentMetric join model, then exposing CRUD / attach / detach APIs under environment + experiment routes with auditing and permissions.

Changes:

  • Add Metric + ExperimentMetric models (with migration) and extend audit related-object types.
  • Introduce metric library endpoints (/experiment-metrics/) and experiment metric attachment endpoints (/experiments/{id}/metrics/) with permissions, serializers, and audit logs.
  • Add unit tests covering metric CRUD, immutability expectations, attach/detach flows, and basic validation.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

File | Description
api/tests/unit/experimentation/test_metric_views.py | Adds API tests for environment-scoped metric library CRUD and permission / flag gating.
api/tests/unit/experimentation/test_metric_models.py | Adds model-level tests for defaults, uniqueness, and soft-delete behavior.
api/tests/unit/experimentation/test_experiment_metric_views.py | Adds API tests for attaching, listing, detaching, and updating experiment-metric relationships.
api/experimentation/views.py | Adds MetricViewSet and ExperimentMetricViewSet and wires metric audit logging + delete-guard logic.
api/experimentation/services.py | Adds create_metric_audit_log and reuses existing feature-flag helpers.
api/experimentation/serializers.py | Adds MetricSerializer and ExperimentMetricSerializer with definition + attachment validation.
api/experimentation/permissions.py | Adds MetricPermission to gate metric library endpoints on experiment flag + env admin.
api/experimentation/models.py | Introduces MetricAggregation, ExpectedDirection, Metric, and ExperimentMetric.
api/experimentation/migrations/0005_metrics.py | Creates DB tables for Metric and ExperimentMetric.
api/experimentation/metric_urls.py | Registers the metric library router under the environment.
api/experimentation/experiment_urls.py | Adds nested routes for /experiments/{id}/metrics/ using nested routers.
api/environments/urls.py | Includes the new experiment-metrics URL module under environments.
api/audit/related_object_type.py | Adds METRIC related object type for audit logs.

Comment thread api/experimentation/views.py
Comment thread api/experimentation/views.py Outdated
Comment thread api/experimentation/serializers.py
@Zaimwa9
Contributor

Zaimwa9 commented Jun 5, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the Metric and ExperimentMetric models, along with their corresponding serializers, viewsets, nested API routing, and audit logging, to support associating versioned metrics with experiments. Comprehensive unit tests and database migrations are also included. The review feedback suggests two key improvements: adding a guard check in ExperimentMetricSerializer.validate to prevent a potential AttributeError when metric is None during partial updates, and caching the retrieved Experiment instance in ExperimentMetricViewSet._get_experiment to avoid redundant database queries.

Comment thread api/experimentation/serializers.py
Comment thread api/experimentation/views.py Outdated
Comment on lines +24 to +26
METRIC_DEFINITION_VALIDATORS: dict[int, DefinitionValidator] = {
    1: _validate_v1,
}
Contributor

The way I see it, to evolve the schema without breaking existing experiments we add a new shape and version it here.
I can see the stats engine also using the version to generate the queries. WDYT?

Member Author

Yeah, that was the idea
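
The versioning approach agreed in this thread can be sketched as a registry keyed by schema version. Only `METRIC_DEFINITION_VALIDATORS`, `DefinitionValidator`, and `_validate_v1` appear in the diff; the bodies below, the `version` key inside the definition, the `_validate_v2` shape, and the `validate_definition` dispatcher are assumptions made for illustration.

```python
from typing import Any, Callable

DefinitionValidator = Callable[[dict[str, Any]], None]


def _validate_v1(definition: dict[str, Any]) -> None:
    # Assumed v1 rule: a non-empty event name is required.
    event = definition.get("event")
    if not isinstance(event, str) or not event:
        raise ValueError("v1 definition requires a non-empty 'event'")


def _validate_v2(definition: dict[str, Any]) -> None:
    # Hypothetical future shape: several events instead of one.
    events = definition.get("events")
    if not isinstance(events, list) or not events:
        raise ValueError("v2 definition requires a non-empty 'events' list")


METRIC_DEFINITION_VALIDATORS: dict[int, DefinitionValidator] = {
    1: _validate_v1,
    2: _validate_v2,  # new shapes are added here; old versions keep validating as before
}


def validate_definition(definition: dict[str, Any]) -> None:
    """Dispatch to the validator registered for the definition's version."""
    version = definition.get("version")
    validator = METRIC_DEFINITION_VALIDATORS.get(version)
    if validator is None:
        raise ValueError(f"unsupported definition version: {version!r}")
    validator(definition)
```

A query builder could keep a parallel registry keyed by the same version number, so a stored v1 definition keeps producing v1 SQL after a v2 shape is introduced.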

@Zaimwa9 Zaimwa9 marked this pull request as ready for review June 5, 2026 10:59

Labels

api (Issue related to the REST API), feature (New feature or request)

5 participants