feat(flagd): extract evaluator into api, core, and testkit packages by aepfli · Pull Request #377 · open-feature/python-sdk-contrib

aepfli · 2026-04-08T12:10:26Z

Summary

Mirrors the Java SDK contrib architecture (PR #1696, PR #1742) by extracting the flagd evaluation logic into three independent packages:

openfeature-flagd-api (tools/openfeature-flagd-api/): Evaluator Protocol defining the contract for flag evaluation, so others can implement their own evaluator
openfeature-flagd-core (tools/openfeature-flagd-core/): Reference implementation (FlagdCore) with targeting engine and custom operators (fractional v2, sem_ver, starts_with, ends_with)
openfeature-flagd-api-testkit (tools/openfeature-flagd-api-testkit/): Compliance test suite bundling gherkin feature files from the test-harness evaluator/ directory, with pytest-bdd step definitions — installable as a package so custom evaluator implementations can run the same compliance suite

Provider refactoring

InProcessResolver now delegates evaluation to FlagdCore via an adapter pattern
Old modules (flags.py, targeting.py, custom_ops.py) are thin re-exports from core for backward compatibility
Connectors (FileWatcher, GrpcWatcher) remain unchanged
No changes to gRPC resolvers, config, or other provider functionality
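The adapter idea can be sketched as follows: connectors keep calling a store-style `update()`, and the adapter forwards the data to an evaluator and reports which keys changed. `_FlagStoreAdapter` is named in this PR's commits, but the class bodies below (`StubEvaluator`, `FlagStoreAdapterSketch`, the `on_change` callback) are assumptions made for illustration, not the provider's actual code.

```python
from typing import Any, Callable


class StubEvaluator:
    """Stand-in for FlagdCore; only the one method used below is modelled."""

    def __init__(self) -> None:
        self.flags: dict[str, Any] = {}

    def set_flags_and_get_changed_keys(self, config: dict[str, Any]) -> list[str]:
        new_flags = config.get("flags", {})
        changed = sorted(
            key
            for key in self.flags.keys() | new_flags.keys()
            if self.flags.get(key) != new_flags.get(key)
        )
        self.flags = dict(new_flags)
        return changed


class FlagStoreAdapterSketch:
    """Exposes the update() that connectors call, backed by an evaluator."""

    def __init__(
        self, evaluator: StubEvaluator, on_change: Callable[[list[str]], None]
    ) -> None:
        self.evaluator = evaluator
        self.on_change = on_change

    def update(self, flags_data: dict[str, Any]) -> None:
        # The dict is handed over as-is; no JSON serialization in between.
        changed_keys = self.evaluator.set_flags_and_get_changed_keys(flags_data)
        self.on_change(changed_keys)


events: list[list[str]] = []
adapter = FlagStoreAdapterSketch(StubEvaluator(), events.append)
adapter.update({"flags": {"a": {"state": "ENABLED"}, "b": {"state": "ENABLED"}}})
adapter.update({"flags": {"a": {"state": "DISABLED"}, "b": {"state": "ENABLED"}}})
```

Because the connector-facing method keeps its shape, FileWatcher and GrpcWatcher style callers would not need to know that the evaluation logic moved to another package.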

Fractional bucketing

flagd-core implements the v2 fractional algorithm (unsigned hash, integer arithmetic with (hash * totalWeight) >> 32)
Includes MAX_WEIGHT_SUM overflow guard, negative weight clamping, explicit bool-as-weight rejection
v1 fractional tests are deselected since the implementation is v2
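A minimal sketch of the integer-only bucketing described above, under stated assumptions: `zlib.crc32` stands in for the real hash (the commits in this PR indicate flagd-core depends on mmh3), the value of `MAX_WEIGHT_SUM` is invented because the PR does not state it, and rejecting a zero total is a choice made for this example only.

```python
import zlib

# Assumed cap for illustration; the real MAX_WEIGHT_SUM value is not given in this PR.
MAX_WEIGHT_SUM = 2**31 - 1


def bucket(key: str, variants: list[tuple[str, int]]) -> str:
    """Pick a variant with integer-only arithmetic: (hash * total_weight) >> 32."""
    weights: list[tuple[str, int]] = []
    for name, weight in variants:
        # bool is a subclass of int in Python, so it must be rejected explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int):
            raise TypeError(f"weight for {name!r} must be an integer")
        weights.append((name, max(0, weight)))  # clamp negative weights to zero

    total = sum(weight for _, weight in weights)
    if total <= 0 or total > MAX_WEIGHT_SUM:
        raise ValueError("weight sum out of range")

    # Stand-in hash producing an unsigned 32-bit integer (not the hash flagd uses).
    hashed = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    position = (hashed * total) >> 32  # always in the range [0, total)

    upper = 0
    for name, weight in weights:
        upper += weight
        if position < upper:
            return name
    raise AssertionError("unreachable: position is always below the total weight")


chosen = bucket("user-123", [("red", 25), ("blue", 75)])
```

Since the hash is below 2^32, multiplying by the total and shifting right by 32 bits maps it into `[0, total)` without any floating-point rounding, which is what makes the result reproducible across languages.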

CI & release

Added tools/* packages to the build workflow matrix (lint, mypy, tests on Python 3.10–3.14)
Added py.typed marker files for PEP 561 compliance
Replaced project.scripts with poethepoet tasks to match CI conventions
Added release-please config for all 3 new packages
Added tools/* to UV workspace members

Other changes

Updated test-harness submodule to v3.5.0 (adds evaluator/ directory with gherkin feature files)
Dropped Python 3.9 support for tools packages (aligned with rest of project)
Schemas and spec submodules kept at main (no changes)

Test plan

openfeature-flagd-api unit tests: 10 passed
openfeature-flagd-api-testkit smoke tests: 2 passed, mypy clean
openfeature-flagd-core unit tests: 27 passed
openfeature-flagd-core e2e (testkit compliance): 85 passed, 15 deselected (fractional-v1), 0 failures
Provider unit tests: no regressions
Lint: ruff check + ruff format clean
Type checking: mypy strict clean for all 3 tools packages

How to use the testkit (for custom evaluator implementations)

# conftest.py
import pytest

from openfeature.contrib.tools.flagd.testkit import load_testkit_flags
from openfeature.contrib.tools.flagd.testkit.steps import *  # noqa: F403


@pytest.fixture
def evaluator():
    # MyCustomEvaluator is a placeholder for your own Evaluator implementation.
    core = MyCustomEvaluator()
    core.set_flags(load_testkit_flags())
    return core

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces a modular architecture for the flagd provider by extracting core evaluation logic and API definitions into separate tools packages. It replaces the internal FlagStore with a new FlagdCore implementation and adds a testkit for compliance testing. I have no feedback to provide on the changes.

codecov · 2026-04-15T17:19:01Z

Codecov Report

❌ Patch coverage is 96.77419% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.09%. Comparing base (564eb68) to head (65f4052).

Files with missing lines	Patch %	Lines
...openfeature/contrib/tools/flagd/core/flagd_core.py	93.26%	7 Missing ⚠️
...trib/tools/flagd/testkit/steps/evaluation_steps.py	93.44%	4 Missing ⚠️
...openfeature/contrib/tools/flagd/core/model/flag.py	94.73%	3 Missing ⚠️
...ature/contrib/tools/flagd/core/model/flag_store.py	97.05%	1 Missing ⚠️
...e/contrib/tools/flagd/core/targeting/custom_ops.py	99.21%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #377      +/-   ##
==========================================
+ Coverage   95.91%   96.09%   +0.18%     
==========================================
  Files          30       42      +12     
  Lines        1517     1563      +46     
==========================================
+ Hits         1455     1502      +47     
+ Misses         62       61       -1


Split the flagd evaluation logic from the provider into three independent packages under tools/, mirroring the Java SDK contrib architecture (PRs #1696 and #1742):

- openfeature-flagd-api: Evaluator Protocol defining the contract for flag evaluation implementations
- openfeature-flagd-core: Reference implementation with FlagdCore class, targeting engine, and custom operators (fractional, sem_ver, starts_with, ends_with)
- openfeature-flagd-api-testkit: Compliance test suite bundling gherkin feature files from the test-harness evaluator directory

The provider's InProcessResolver now delegates to FlagdCore via an adapter pattern, keeping connector code (FileWatcher, GrpcWatcher) unchanged. Old provider modules (flags.py, targeting.py, custom_ops.py) are thin re-exports from the core package for backward compatibility.

Also updates the test-harness submodule from v2.11.1 to v3.5.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

- Implement fractional v2 bucketing algorithm (unsigned hash, integer arithmetic with bit-shift instead of percentage-based float)
- Add MAX_WEIGHT_SUM overflow guard
- Add negative weight clamping (max(0, weight))
- Add explicit bool-as-weight rejection
- Support non-string variant types (str|float|int|bool|None)
- Extract _resolve_bucket_by helper
- Bump mmh3 dependency to >=5.0.0,<6.0.0
- Drop Python 3.9: update requires-python to >=3.10 for all tools packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

- Fix ruff violations: UP007 (modern type unions), N818 (rename FlagStoreException to FlagStoreError), FURB171 (simplify membership test), PERF401 (use list comprehension), S101 (allow assert in steps)
- Add py.typed marker files for PEP 561 compliance
- Revert protobuf evaluation.v2 imports/config back to v1
- Run ruff format on all affected files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

- Add tools/openfeature-flagd-{api,core,api-testkit} to build matrix
- Replace project.scripts with poe tasks to match CI expectations
- Add poethepoet dev dependency to all tools packages
- Remove obsolete scripts.py files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

flagd-core implements the v2 fractional bucketing algorithm, so v1 test expectations don't match. Deselect @fractional-v1 tagged tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

The testkit is a test library, not a test suite. CI's `poe cov` failed with "no data collected" because tests/ was empty. Add smoke tests to verify the testkit can be imported and returns valid data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

Fix mypy errors: add return types, parameter types, use ErrorCode enum instead of string, and cast Mapping to dict for indexed assignment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

aepfli · 2026-04-15T17:53:57Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces the openfeature-flagd-api and openfeature-flagd-core packages to provide a modular evaluator implementation for flagd. It also refactors the existing openfeature-provider-flagd to utilize these new core components. I have provided feedback to improve performance by allowing flag configuration to be passed as a dictionary, avoiding redundant serialization, and suggested fixes for regex escaping and exception handling in the core implementation.

- Accept str | dict in Evaluator protocol and FlagdCore, eliminating the dict->JSON->dict roundtrip in _FlagStoreAdapter
- Fix ReferenceError handler: use exception instance, not class, and log flag.targeting instead of the function object
- Escape evaluator names in $ref regex replacement (re.escape)
- Fix backward-compat FlagStore to emit changed_keys, not all keys
- Fix README import paths (was api.testkit, should be testkit)
- Add content to flagd-core CHANGELOG.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
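To show why escaping matters for the `$ref` replacement, here is a small sketch. It assumes references appear as `{"$ref": "<name>"}` objects inside a targeting rule serialized as JSON text; `inline_refs` and the sample evaluator names are hypothetical and not taken from flagd-core.

```python
import re


def inline_refs(targeting_json: str, evaluators: dict[str, str]) -> str:
    """Replace each {"$ref": "<name>"} object with the named evaluator's JSON body."""
    for name, body in evaluators.items():
        # re.escape keeps characters such as "." or "+" in the name literal.
        pattern = re.compile(r'\{\s*"\$ref"\s*:\s*"' + re.escape(name) + r'"\s*\}')
        # A function replacement avoids backslash processing in the body.
        targeting_json = pattern.sub(lambda _match, body=body: body, targeting_json)
    return targeting_json


rule = '{"if": [{"$ref": "is.admin"}, "on", {"$ref": "isXadmin"}]}'
result = inline_refs(rule, {"is.admin": '{"==": [{"var": "role"}, "admin"]}'})
```

Without `re.escape`, the `.` in `is.admin` would match any character, so the unrelated `isXadmin` reference would be replaced as well.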

aepfli · 2026-04-15T18:30:31Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the flagd provider by extracting its core evaluation logic into three new standalone packages: openfeature-flagd-api (protocol definition), openfeature-flagd-core (reference implementation), and openfeature-flagd-api-testkit (compliance suite). The InProcessResolver has been updated to use the new FlagdCore evaluator, and existing modules have been refactored to re-export logic for backward compatibility. Review feedback highlights a potential breaking change in the FlagStore.update method signature and suggests a correction for JSON parsing in the testkit utilities.

Replace vendored feature/flag files with a hatch build hook that copies them from the test-harness submodule's evaluator/ directory. Files are gitignored and generated fresh on each build via force_include.

Also:

- Update test-harness submodule to v3.5.0 (adds @fractional-v1/v2 tags)
- Add fractional-v1 deselect to provider pytest.ini
- Remove redundant flags.py re-export from testkit
- Address review feedback (dict passthrough, ReferenceError fix, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

The hatch build hook includes files in sdist/wheel via force_include, but tests run from the source tree. Add a sync script that copies files from the test-harness submodule, called by poe before test/cov. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
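A sync step of this kind could look roughly like the sketch below. The function name, the file patterns, and the directory layout in the demonstration are assumptions; the actual script in this PR is not shown on this page.

```python
import shutil
import tempfile
from pathlib import Path


def sync_testkit_files(harness_evaluator_dir: Path, package_dir: Path) -> list[str]:
    """Copy *.feature and *.json files into the package tree; return relative paths."""
    copied: list[str] = []
    for pattern in ("*.feature", "*.json"):
        for source in sorted(harness_evaluator_dir.rglob(pattern)):
            relative = source.relative_to(harness_evaluator_dir)
            target = package_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(relative.as_posix())
    return copied


# Demonstration against a throwaway directory layout (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    harness = Path(tmp) / "test-harness" / "evaluator"
    (harness / "gherkin").mkdir(parents=True)
    (harness / "gherkin" / "fractional.feature").write_text("Feature: fractional\n")
    (harness / "flags").mkdir()
    (harness / "flags" / "testkit-flags.json").write_text('{"flags": {}}')
    copied = sync_testkit_files(harness, Path(tmp) / "pkg")
    synced_exists = (Path(tmp) / "pkg" / "gherkin" / "fractional.feature").exists()
```

Running the same copy from both the build hook and a pre-test task keeps the wheel contents and the source-tree test run in agreement.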

flagd-core e2e tests depend on testkit feature files which are generated from the test-harness submodule, not checked in. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

Explain why the replace('\"', '"') is needed — pytest-bdd preserves backslash escapes from Gherkin table cells. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>
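The effect can be reproduced in isolation. The sketch below assumes a step function receives the cell text with literal backslash-quote pairs and wants to parse it as JSON; `parse_table_cell` is a hypothetical helper, and written as a Python literal the search pattern needs an escaped backslash (`'\\"'`).

```python
import json


def parse_table_cell(cell: str) -> object:
    """Undo backslash-escaped quotes from a Gherkin table cell, then parse as JSON."""
    return json.loads(cell.replace('\\"', '"'))


raw_cell = '{\\"color\\": \\"red\\"}'  # the cell text as the step function receives it
context = parse_table_cell(raw_cell)
```

One caveat of this approach: a cell that legitimately contains an escaped quote inside a JSON string value would be altered too, so it suits test data that does not rely on that.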

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

toddbaert · 2026-04-16T20:32:09Z

I'm reviewing this, will complete tomorrow. Thanks @aepfli

toddbaert · 2026-04-17T16:56:19Z

+
+    def update(self, flags_data: dict) -> None:
+        json_str = json.dumps(flags_data)
+        changed_keys = self.evaluator.set_flags_and_get_changed_keys(json_str)


set_flags_and_get_changed_keys already accepts dict, so this serializes to JSON only for FlagdCore to immediately deserialize it. Could pass flags_data directly and skip the round-trip.

toddbaert · 2026-04-17T16:56:19Z

@@ -0,0 +1,6 @@
+class FlagStoreError(Exception):


Is this raised anywhere? I don't see it used in flagd-core. If it's part of the Evaluator contract, it should be documented (when should implementors raise it?). If not, consider removing it.

toddbaert · 2026-04-17T16:56:19Z

+    def set_flags_and_get_changed_keys(
+        self, flag_configuration: str | dict[str, typing.Any]
+    ) -> list[str]:
+        with self._lock:


This JSON parsing block is duplicated from set_flags above (lines 71-75). Consider extracting a _parse_flag_configuration helper.

toddbaert · 2026-04-17T16:56:19Z

+    default_value: T,
+    metadata: Mapping[str, float | int | str | bool],
+    reason: Reason,
+) -> FlagResolutionDetails:


nit: return type should be FlagResolutionDetails[T] (parameterized) for type safety.
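The difference is easiest to see with a self-contained generic type. `ResolutionDetailsSketch` below is a stand-in defined only for this example, not the SDK's `FlagResolutionDetails`, and `default_result` is a hypothetical function with a reduced parameter list.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class ResolutionDetailsSketch(Generic[T]):
    """Stand-in for FlagResolutionDetails; only the fields needed here."""

    value: T
    reason: str


def default_result(default_value: T, reason: str) -> ResolutionDetailsSketch[T]:
    # With the parameterized return type, a checker can infer
    # ResolutionDetailsSketch[bool] when the default is a bool.
    return ResolutionDetailsSketch(value=default_value, reason=reason)


details = default_result(False, "DEFAULT")
```

With a bare, unparameterized return annotation, a type checker would typically treat `details.value` as `Any`, so a caller mixing up value types would go unreported.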

toddbaert · 2026-04-17T17:07:56Z

This mirrors the Java api/core/testkit split, which is great. One thing worth thinking through early: this creates an inter-package dependency chain (provider -> core -> api), and that has ongoing management consequences.

The provider now depends on "openfeature-flagd-core" with no version constraint, and core depends on "openfeature-flagd-api" the same way. Any time core adds or changes an API that the provider uses, you need to coordinate: bump core, release it, then bump the provider's minimum core version. Miss that, and users can end up with a core version that's missing the method the provider calls; in Python that's a silent runtime AttributeError (no compile-time safety net like Java).

The tool.uv.sources workspace resolution means dev and CI always use the local copy, so these version mismatches won't surface until users install from PyPI. You'd need a dedicated CI job installing published versions to catch this.

Some concrete suggestions:

Pin minimum versions now: "openfeature-flagd-core>=0.1.0", "openfeature-flagd-api>=0.1.0". Establish the convention that these get bumped whenever a new core/api surface is consumed.
Consider compatible release constraints (~=0.1) during 0.x to limit blast radius, since any minor bump is potentially breaking per semver.
In Java and JS we ended up cutting a 1.0 for core specifically so we could set a meaningful upper bound and semver contract. We could even start core at 1.0 here to avoid the 0.x ambiguity from the outset; that way the provider can declare something like >=1.0,<2 and breaking changes are communicated clearly.

None of this is necessarily blocking, but the release coordination overhead is real (we've felt it in Java), and it's easier to set up the guardrails now than to retrofit them after a few broken releases.
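When weighing the constraint styles above, it may help to see what a compatible release clause admits. The sketch below re-implements the `~=` rule from PEP 440 for plain dotted numeric versions only (no pre-releases, post-releases, or epochs); `parse` and `compatible_release` are illustrative helpers, not part of any packaging library.

```python
def parse(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as "0.1.3"."""
    return tuple(int(part) for part in version.split("."))


def _pad(version: tuple[int, ...], width: int) -> tuple[int, ...]:
    """Pad with zeros so that "0.2" and "0.2.0" compare as equal."""
    return version + (0,) * (width - len(version))


def compatible_release(candidate: str, spec: str) -> bool:
    """Evaluate `candidate ~= spec` for plain numeric versions."""
    base = parse(spec)
    if len(base) < 2:
        raise ValueError("~= requires at least two version segments")
    found = parse(candidate)
    width = max(len(found), len(base))
    found_padded, base_padded = _pad(found, width), _pad(base, width)
    prefix = base[:-1]  # ~=X.Y means >=X.Y, ==X.*
    return found_padded >= base_padded and found_padded[: len(prefix)] == prefix


allows_minor_bump = compatible_release("0.2.0", "0.1")
allows_major_bump = compatible_release("1.0.0", "0.1")
patch_only = compatible_release("0.2.0", "0.1.0")
```

Under this rule `~=0.1` still accepts `0.2.0`, while `~=0.1.0` restricts installs to the `0.1.*` series, which is worth keeping in mind if the goal during 0.x is to exclude minor bumps.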

toddbaert · 2026-04-17T17:22:50Z

I pulled this locally and did a careful review. I found a few issues worth fixing I think, but fundamentally I'm close to approving. Please consider this carefully... WDYT about going right to 1.0? Or going to 1.0 very quickly after merging?

gemini-code-assist bot reviewed Apr 8, 2026


aepfli force-pushed the feat/extract-flagd-evaluator-api-core-testkit branch from cd14cf4 to 4cd2612 Compare April 15, 2026 17:18

github-actions bot assigned aepfli and federicobond Apr 15, 2026

github-actions bot requested a review from federicobond April 15, 2026 17:18

aepfli and others added 2 commits April 15, 2026 19:25

aepfli force-pushed the feat/extract-flagd-evaluator-api-core-testkit branch from ad8727a to f34c06a Compare April 15, 2026 17:26

aepfli and others added 11 commits April 15, 2026 19:32

fix: revert schemas and spec submodules to main

1c28324

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

fix: revert grpc resolvers to upstream main (keep evaluation.v2)

a976480

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

chore: remove allFlags.json test artifact

9ce6404

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

style: format test_in_process.py

fe2884b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

style: fix import sorting in testkit smoke test

8291575

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

style: simplify error_code conditional (SIM or)

ecd0ba4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

aepfli marked this pull request as ready for review April 15, 2026 17:53

aepfli requested review from a team as code owners April 15, 2026 17:53

aepfli changed the title ~~feat: extract flagd evaluator into api, core, and testkit packages~~ feat(flagd): extract evaluator into api, core, and testkit packages Apr 15, 2026

gemini-code-assist bot reviewed Apr 15, 2026


Comment thread ...openfeature-provider-flagd/src/openfeature/contrib/provider/flagd/resolvers/process/flags.py

Comment thread tools/openfeature-flagd-api-testkit/src/openfeature/contrib/tools/flagd/testkit/utils.py

aepfli and others added 5 commits April 15, 2026 20:59

style: remove unused noqa directive in hatch_build.py

ca2059a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

fix: restore FlagStore.update() return type to match supertype

65f4052

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Simon Schrottner <simon.schrottner@gmail.com>

This was referenced Apr 16, 2026

feat: add Gherkin evaluator tests using test-harness specs open-feature/flagd#1947

Open

evaluator: fractional-nested-var tests should assert error-code for invalid variant resolution open-feature/flagd-testbed#367

Open

toddbaert reviewed Apr 17, 2026
