
feat(core): Introduce Attribute-Carrying Language-Agnostic Enums #554

Merged
junrushao merged 2 commits into apache:main from
junrushao:junrushao/2026-04-17/py-enum
Apr 18, 2026


@junrushao
Member

@junrushao junrushao commented Apr 17, 2026

RFC: #553

Add first-class cross-language enum support to TVM-FFI. An enum is a registered Object type whose instances are named, frozen singletons — the same model as tvm::Op, generalised into an Enum base class usable from Python and C++ and converging on a single shared registry per type_key.

At a glance

Python

from __future__ import annotations

from typing import ClassVar

from tvm_ffi.dataclasses import Enum, auto, entry


# Pure-Python enum — fresh type_key, no C++ involvement.
class Priority(Enum, type_key="my.Priority"):
    low = auto()
    medium = auto()
    high = auto()


# Attribute-carrying enum.
class Activation(Enum, type_key="nn.Activation"):
    output_zero: bool
    is_monotonic: bool

    relu: ClassVar[Activation] = entry(output_zero=True, is_monotonic=True)
    gelu: ClassVar[Activation] = entry(output_zero=False, is_monotonic=False)
    silu: ClassVar[Activation] = entry(output_zero=False, is_monotonic=True)


# Python class binding C++-registered entries.
class Variant(Enum, type_key="testing.TestEnumVariant"):
    Alpha: ClassVar[Variant]   # bound to the C++-registered "Alpha"
    Beta:  ClassVar[Variant]   # bound to the C++-registered "Beta"


assert Activation.relu.value == 0          # auto-assigned ordinal
assert Activation.relu.name == "relu"      # auto-populated
assert Activation.relu.output_zero is True # user field
assert Activation.get("relu") is Activation.relu

# Extensible per-variant attributes, writable from anywhere.
cost = Activation.def_attr("cost", default=0)
cost[Activation.relu] = 1
cost[Activation.gelu] = 4
assert cost[Activation.silu] == 0          # default — silu was never assigned
assert Activation.silu not in cost         # distinguishes default-hit vs. set

C++

#include <tvm/ffi/enum.h>
#include <tvm/ffi/reflection/enum_def.h>

class ActivationObj : public tvm::ffi::EnumObj {
 public:
  bool output_zero;
  bool is_monotonic;
  TVM_FFI_DECLARE_OBJECT_INFO_FINAL("nn.Activation", ActivationObj, tvm::ffi::EnumObj);
};

TVM_FFI_STATIC_INIT_BLOCK() {
  namespace refl = tvm::ffi::reflection;
  refl::ObjectDef<ActivationObj>(refl::init(false))
      .def_ro("output_zero", &ActivationObj::output_zero)
      .def_ro("is_monotonic", &ActivationObj::is_monotonic);

  refl::EnumDef<ActivationObj>("relu")
      .set_attr("output_zero", true)
      .set_attr("is_monotonic", true);
  refl::EnumDef<ActivationObj>("gelu")
      .set_attr("output_zero", false)
      .set_attr("is_monotonic", false);
}

The Python and C++ halves write to the same two type_index-keyed TypeAttr columns, so a Python subclass that binds type_key="nn.Activation" sees every C++-registered entry, and any later auto()/entry(...) from Python becomes visible to C++ readers of the same columns. Entries cross FFI as ordinary ObjectRefs — no wire-format work.

Design

  • Enum instances are EnumObj subclasses. Each carries a dense auto-assigned int64_t value (0-indexed per class, declaration-order ordinal) and a String name. Both are populated at registration; neither is user-supplied.

  • Two per-class TypeAttr columns, shared across all call sites:

    • __ffi_enum_entries__: Dict<String, Enum> mapping instance name → frozen singleton.
    • __ffi_enum_attrs__: Dict<String, List<Any>> mapping attribute name → ordinal-indexed list.
  • Register-once-then-mutate. Each column is registered exactly once via TVMFFITypeRegisterAttr; every subsequent writer fetches the live container with TVMFFIGetTypeAttrColumn and mutates it in place. Distributed registration across TUs or Python modules converges on one set of containers.

  • Python variants are declared in one of four shapes, processed in Enum.__init_subclass__:

    1. name: ClassVar[Cls] = entry(**kwargs) — registers a Python-side entry and forwards kwargs to __init__.
    2. name = entry(**kwargs) (no annotation) — same as 1, for attribute-carrying enums where ClassVar is noise.
    3. name = auto() (or name: ClassVar[Cls] = auto()) — registers a variant with no extra fields; the preferred form for simple enums.
    4. Bare name: ClassVar[Cls] — binds to a C++-registered entry of the same name, or registers a blank Python entry if none exists.

    Within one class body, bare ClassVar binders resolve first (annotation order), then sentinel assignments (class-body order); auto-ordinals follow that combined order. Mixing all four forms on a single class is supported.

  • Auto-detected backend. Enum.__init_subclass__(type_key=...) routes the subclass through @c_class if the type is already registered in the FFI type system, otherwise through @py_class. There is no separate py_enum/c_enum opt-in.

  • Integer literals are rejected on the RHS. The auto-ordinal policy owns value, so ok = 0 and entry(0) would either duplicate or conflict with the auto-ordinal. auto() is the intended replacement. entry(value=...) / entry(name=...) raise TypeError at class-body time.
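The ordering rule for mixed declaration forms can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not the code in `tvm_ffi.dataclasses`: `MiniEnum` and `_Sentinel` are hypothetical, the local `entry`/`auto` only mimic the described behaviour, and the model records names in resolution order without creating any FFI objects. It assumes base classes carry no annotations of their own.

```python
from typing import ClassVar


class _Sentinel:
    """Hypothetical marker standing in for what entry()/auto() return."""

    def __init__(self, **kwargs):
        # value and name are owned by the auto-ordinal policy.
        if "value" in kwargs or "name" in kwargs:
            raise TypeError("value and name are auto-assigned")
        self.kwargs = kwargs


def entry(**kwargs):
    return _Sentinel(**kwargs)


def auto():
    return _Sentinel()


class MiniEnum:
    """Toy base class that only computes the ordinal order of variants."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        body = cls.__dict__
        annotations = getattr(cls, "__annotations__", {})
        # 1) Bare annotations (no assignment), in annotation order.
        binders = [n for n in annotations if n not in body]
        # 2) Sentinel assignments, in class-body order.
        sentinels = [n for n, v in body.items() if isinstance(v, _Sentinel)]
        cls.order = binders + sentinels


class Demo(MiniEnum):
    low = auto()
    Alpha: ClassVar["Demo"]
    relu: ClassVar["Demo"] = entry(cost=1)
    Beta: ClassVar["Demo"]


ordinals = {name: i for i, name in enumerate(Demo.order)}
```

In this model the bare binders `Alpha` and `Beta` take the first ordinals even though `low` appears earlier in the class body, which matches the rule stated above.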

New public interfaces

C++ headers

  • include/tvm/ffi/enum.h: EnumObj (int64_t value, String name, both def_ro-reflected) and Enum (nullable ObjectRef wrapper), registered under type key ffi.Enum. Plus two column-name constants kEnumEntriesAttrName (= "__ffi_enum_entries__") and kEnumAttrsAttrName (= "__ffi_enum_attrs__").
  • include/tvm/ffi/reflection/enum_def.h: refl::EnumDef<T>("name").set_attr("key", value).... Each call allocates a fresh ordinal, constructs the instance, and writes it into the per-class registry. Duplicate names for the same T raise RuntimeError. Exposes .instance() / .ordinal() for tests / advanced callers.
  • include/tvm/ffi/tvm_ffi.h transitively includes both new headers.

Python surface (tvm_ffi.dataclasses)

  • Enum — base class, decorated @dataclass_transform(field_specifiers=(Field, field, entry, auto)) so type checkers recognise entry() / auto() as dataclass-field specifiers.
  • entry(**kwargs), auto() — variant-declaration sentinels.
  • EnumAttrMap — view over the shared __ffi_enum_attrs__ column; __getitem__ / __setitem__ / __contains__ / get(default=...).
  • Per-subclass surface: Cls.get(name), Cls.entries(), Cls.def_attr(name, *, default=...), and three live class-level properties Cls.by_name (Dict[str, Enum]), Cls.by_value (List[Enum] indexed by ordinal), Cls.attr_dict (Dict[str, List[Any]]). The class-level properties are backed by an internal _ClassProperty descriptor so they work without a metaclass.
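A class-level property that works without a metaclass can be built from an ordinary non-data descriptor. The sketch below shows the general technique only; `ClassProperty`, `Color`, and `Warm` are illustrative names, and the actual `_ClassProperty` in the PR may differ in detail.

```python
class ClassProperty:
    """Read-only descriptor whose getter receives the owning class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # Accessed on the class: instance is None and owner is the class.
        # Accessed on an instance: fall back to the instance's type.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class Color:
    _entries = {"red": 0, "green": 1}

    # Each access recomputes the view from the class's current state.
    by_name = ClassProperty(lambda cls: dict(cls._entries))
    by_value = ClassProperty(lambda cls: sorted(cls._entries, key=cls._entries.get))


class Warm(Color):
    # The getter receives Warm, so the views reflect the subclass data.
    _entries = {"orange": 0}
```

One limitation of the descriptor-only approach is that assigning to `Color.by_name` replaces the descriptor instead of raising, since blocking class-level assignment would need a metaclass.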

Other user-visible changes

  • TVMFFITypeRegisterAttr rejects duplicate (type_index, attr_name) writes. This reverses a previously relaxed "silent overwrite" behaviour. The register-once-then-mutate protocol depends on this invariant, and the error message points callers at that protocol.
  • Default repr for EnumObj subclasses is <type_key>.<name> instead of the generic type_key(field1=..., field2=...) form. Rendered by ReprPrinter after the __ffi_repr__ hook check, so explicit overrides still take precedence.
  • Built-in sentinels MISSING / KWARGS now render as <MISSING> / <KWARGS> via pointer-identity dispatch, replacing the generic ffi.Object fallback.
  • C++ test-support type testing.TestEnumVariant (in src/ffi/testing/testing.cc) now extends EnumObj and registers Alpha / Beta entries with a code attribute via refl::EnumDef. This is the canonical end-to-end demonstration of the builder and is exercised by the Python test suite.

Testing

  • uv run pytest tests/python/test_dataclass_enum.py -q — 38/38 passing. Covers all four declaration forms, auto-ordinal assignment, frozen-singleton identity, rejection of entry(value=...) / entry(name=...), get / entries / by_name / by_value / attr_dict, def_attr round-trips through the unified column, direct TypeAttr verification, the C++-backed happy path against testing.TestEnumVariant, mixed C++/Python entry registration, and the repr / sentinel behaviour.
  • uv run pytest tests/python -q — 2246 passed, 16 skipped, 3 xfailed. No regressions.
  • pre-commit run --all-files — clean.

C++ GoogleTest and Rust suites were not re-run; the enum builder is exercised end-to-end from the Python tests against testing.TestEnumVariant, and no Rust bindings were touched.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds FFI-compatible enum support via @py_enum and @c_enum decorators, allowing for singleton variants and per-variant attribute maps. Key changes involve implementing EnumObject and EnumAttrMap in Python, updating Cython FFI registration, and allowing attribute overwrites in C++. Review feedback identifies a potential reference counting bug in the Cython layer, suggests optimizing variant name lookups to O(1), and recommends alphabetical sorting for exports and more descriptive test naming.

Comment thread python/tvm_ffi/cython/object.pxi Outdated
Comment thread python/tvm_ffi/dataclasses/__init__.py
Comment thread python/tvm_ffi/dataclasses/enum.py Outdated
Comment thread tests/python/test_dataclass_enum.py Outdated
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch from 772fe2f to 7c5ecdc Compare April 17, 2026 19:25
@junrushao junrushao marked this pull request as draft April 17, 2026 19:35
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch from 7c5ecdc to 1861928 Compare April 17, 2026 20:25
@junrushao junrushao changed the title from "feat(dataclasses): add @py_enum / @c_enum decorators with TypeAttr-backed entries" to "feat(dataclasses): add Enum base with __init_subclass__ + TypeAttr-backed entries" Apr 17, 2026
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch from 1861928 to ee7d757 Compare April 17, 2026 20:48
junrushao added a commit to junrushao/tvm-ffi that referenced this pull request Apr 18, 2026
…cked entries

Architecture:
- Introduce a dedicated ``tvm_ffi.dataclasses.enum`` module plus two new
  public C++ headers that jointly define a single cross-language enum
  abstraction. The Python and C++ halves agree on a fixed pair of
  TypeAttr columns and on the on-the-wire representation of each
  registry, so distributed entry registration (one TU in C++, plus one
  or more Python subclasses) converges on the same live containers.
- The C++ half lives in ``include/tvm/ffi/enum.h``: a concrete
  ``EnumObj`` (fields ``int64_t value`` + ``String name``, both
  reflected via ``def_ro``) plus its nullable ObjectRef ``Enum``.
  ``EnumObj`` is registered under the type key ``ffi.Enum`` and is the
  root of every user-defined enum class tree.
- Entry registration is driven by a new builder in
  ``include/tvm/ffi/reflection/enum_def.h``:
  ``refl::EnumDef<EnumClsObj>("Name").set_attr("key", value)...``. Each
  call allocates a fresh dense ordinal (``= len(entries)``), constructs
  the instance, and writes it into a per-type ``__ffi_enum_entries__``
  TypeAttr column storing ``Dict<String, Enum>``.
- ``EnumDef`` follows a strict "register-once-then-mutate" protocol:
  on the first call per type it registers the mutable ``Dict`` via
  ``TVMFFITypeRegisterAttr``; subsequent calls look up the existing
  ``Dict`` via ``TVMFFIGetTypeAttrColumn`` and mutate it in place. The
  same protocol governs the per-class attribute store
  (``__ffi_enum_attrs__``: ``Dict<String, List<Any>>``), where each
  attribute is a list indexed by ordinal, padded with ``None`` through
  the ordinal before the write.
- The Python half (``python/tvm_ffi/dataclasses/enum.py``) exposes an
  ``Enum`` base class registered at the same ``ffi.Enum`` key.
  Subclasses declare their FFI type via parameterised inheritance:
  ``class Foo(Enum, type_key="..."):``. ``Enum.__init_subclass__``
  auto-detects whether ``type_key`` is already in the FFI type system
  and routes the class through ``@c_class`` (C++-backed) or
  ``@py_class`` (fresh Python-only) accordingly — no separate
  ``py_enum``/``c_enum`` opt-in exists.
- Variants are declared in the class body in exactly two shapes, both
  of which go through ``__init_subclass__`` scanning and materialise
  singletons cached as class attributes (guaranteeing
  ``Cls.FOO is Cls.FOO``):
    1. ``name: ClassVar[Cls] = entry(field1=..., field2=...)`` —
       registers a Python-side entry and forwards the captured kwargs
       to the subclass's ``__init__`` as user-declared fields.
    2. ``name: ClassVar[Cls]`` (no assignment) — binds to a pre-existing
       entry with the same ``name`` from the ``__ffi_enum_entries__``
       column (typically registered in C++ via ``refl::EnumDef``), or,
       if none exists, registers a blank Python entry.
- Both ``value`` (dense ordinal in declaration order) and ``name``
  (declaration key) are auto-populated on every entry; they are never
  user-supplied. ``entry(value=...)`` and ``entry(name=...)`` raise
  ``TypeError`` at class-body time.
- Per-class class-level reflection surface (``by_name``, ``by_value``,
  ``attr_dict``) is exposed through a new ``_ClassProperty`` descriptor
  — a minimal getter descriptor that receives the owning class, letting
  class-level attribute access work without a metaclass.

Public Interfaces:
- New C++ public headers:
  * ``include/tvm/ffi/enum.h`` — ``EnumObj``/``Enum`` + two
    column-name string constants ``kEnumEntriesAttrName`` (=
    ``"__ffi_enum_entries__"``) and ``kEnumAttrsAttrName`` (=
    ``"__ffi_enum_attrs__"``). Both constants are the source of truth
    for the column names; Python mirrors them as
    ``ENUM_ENTRIES_ATTR`` / ``ENUM_ATTRS_ATTR``.
  * ``include/tvm/ffi/reflection/enum_def.h`` — ``refl::EnumDef<Obj>``
    builder with ``.set_attr(name, value)`` chaining and getters for
    ``instance()`` / ``ordinal()``.
- ``include/tvm/ffi/tvm_ffi.h`` now transitively includes both new
  headers, so consumers of the aggregate header get the enum API
  without extra work.
- New Python symbols exported from ``tvm_ffi.dataclasses``:
  ``Enum``, ``EnumAttrMap``, ``entry``. Class surface on subclasses:
  ``Cls.get(name)`` / ``Cls.entries()`` / ``Cls.def_attr(name,
  *, default=...)`` + three class-level property views ``Cls.by_name``
  (``Dict[str, Enum]``), ``Cls.by_value`` (``List[Enum]`` indexed by
  ordinal), ``Cls.attr_dict`` (``Dict[str, List[Any]]``).
- ``EnumAttrMap`` round-trips through the unified
  ``__ffi_enum_attrs__`` column instead of per-attribute columns; a
  single Dict column is shared across all attribute names on a given
  class, and each value is a List indexed by variant ordinal.
- ``@dataclass_transform(field_specifiers=(Field, field, entry))`` on
  ``Enum`` lets ``ClassVar[Cls] = entry(...)`` type-check as a proper
  field-specifier pattern under typing-aware tools.
- C++ test-support type ``testing.TestEnumVariant`` (in
  ``src/ffi/testing/testing.cc``) now extends ``EnumObj`` rather than a
  bare ``Object``; it registers two entries ``Alpha``/``Beta`` via
  ``refl::EnumDef`` with a ``code`` attribute. This is the canonical
  end-to-end demonstration of the builder.

UI/UX:
- none (library-only change; no CLI, REPL, or user-visible UI surface).

Behavioral Changes:
- ``TVMFFITypeRegisterAttr`` again raises ``RuntimeError`` on duplicate
  ``(type_index, attr_name)`` writes. This is a reversal of the
  previously-relaxed "silent overwrite" behaviour: the invariant that
  a TypeAttr slot is registered once is restored and is now load-
  bearing for the ``EnumDef`` register-once-then-mutate protocol. The
  thrown message tells callers to register a mutable container
  (``Dict``/``List``) once and mutate it in place on subsequent calls
  — exactly the pattern the enum builders use internally.
- Distributed enum-entry registration — whether across TUs or across
  Python subclasses re-binding the same type key — now converges on
  shared ``Dict``/``List`` containers. There is no "last writer wins"
  semantics; instead, each registrant atomically appends to the shared
  live containers, and duplicate ``(type, instance_name)`` collisions
  are explicit ``RuntimeError``s at ``EnumDef`` construction.
- Variant declaration forms were narrowed relative to the earlier draft
  of this feature. Bare-int sugar (``ok = 0`` auto-promotion), the
  simple-integer form ``entry(0)``, and the implicit
  ``value: int`` field synthesis are no longer supported. ``value``
  is always auto-assigned as the dense ordinal and is now a concrete
  read-only field on ``EnumObj`` itself. This is a design revision of
  work that has not yet shipped to users (the PR is still open), not a
  breaking change to a released API.

Docs:
- No user-facing RST/Markdown doc page updated in this change. The
  ``dataclass_reflection.rst`` toctree entries for ``py_class`` /
  ``c_class`` are still commented out, so no sibling ``Enum`` section
  is authored yet. The two declaration forms, the auto-assigned
  ``value``/``name`` fields, and the ``by_name``/``by_value``/
  ``attr_dict`` class-level views are fully documented in the ``Enum``
  docstring and the new C++ header Doxygen comments. Follow-up:
  publish a unified dataclass + enum reference once the broader
  dataclass doc lands.

Tests:
- Executed: ``uv run pytest tests/python/test_dataclass_enum.py`` —
  24/24 passing. Covers both declaration forms (Python-side
  ``ClassVar = entry(...)`` and bare-``ClassVar[Cls]`` binding),
  explicit rejection of ``entry(value=...)`` / ``entry(name=...)``,
  auto-ordinal assignment, frozen-singleton identity, ``Cls.get`` /
  ``Cls.entries`` / ``Cls.by_name`` / ``Cls.by_value`` /
  ``Cls.attr_dict`` surface, ``def_attr`` round-trips through the
  unified ``__ffi_enum_attrs__`` column (including missing-default
  behaviour, ``__contains__``, cross-enum foreign-variant rejection,
  and fresh-wrapper lookup via ``Cls.get``), direct TypeAttr column
  verification for both ``__ffi_enum_entries__`` and
  ``__ffi_enum_attrs__``, and the C++-backed happy path against
  ``testing.TestEnumVariant``'s ``refl::EnumDef``-registered
  ``Alpha``/``Beta`` entries with their ``code`` attribute.
- Executed: ``pre-commit run --files <all staged>`` — all hooks pass
  (ASF header, file types, end-of-file / trailing whitespace, ruff
  check/format, ty, clang-format).

Untested Edge Cases:
- C++ GoogleTest suite (``tests/cpp/``) was not re-run. The C++ delta
  touches: (a) restoring ``RegisterAttr``'s duplicate-throw — for which
  no existing test asserts either the throw or the silent-overwrite
  behaviour being reverted; (b) introducing the ``EnumObj``/``Enum``
  root and ``EnumDef`` builder, covered end-to-end by the Python
  tests against ``testing.TestEnumVariant``; (c) the test-only
  refactor of ``TestEnumVariant`` from a bare ``Object`` to an
  ``EnumObj`` subclass. Regression risk is low but formally
  unverified.
- Rust test suite (``cargo test`` under ``rust/``) was not executed.
  No Rust bindings touched; risk is low but untested.
- Cross-module TypeAttr convergence is exercised only within a single
  process. Scenarios where two independently-loaded Python modules
  register entries under the same type key from different processes or
  plugin-host isolation contexts remain uncovered.

Refs: apache#554
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch from ee7d757 to d880675 Compare April 18, 2026 03:14
junrushao added a commit to junrushao/tvm-ffi that referenced this pull request Apr 18, 2026
…cked entries

Architecture:
- Introduce a dedicated ``tvm_ffi.dataclasses.enum`` module plus two new
  public C++ headers that jointly define a single cross-language enum
  abstraction. The Python and C++ halves agree on a fixed pair of
  TypeAttr columns and on the on-the-wire representation of each
  registry, so distributed entry registration (one TU in C++, plus one
  or more Python subclasses) converges on the same live containers.
- The C++ half lives in ``include/tvm/ffi/enum.h``: a concrete
  ``EnumObj`` (fields ``int64_t value`` + ``String name``, both
  reflected via ``def_ro``) plus its nullable ObjectRef ``Enum``.
  ``EnumObj`` is registered under the type key ``ffi.Enum`` and is the
  root of every user-defined enum class tree.
- Entry registration is driven by a new builder in
  ``include/tvm/ffi/reflection/enum_def.h``:
  ``refl::EnumDef<EnumClsObj>("Name").set_attr("key", value)...``. Each
  call allocates a fresh dense ordinal (``= len(entries)``), constructs
  the instance, and writes it into a per-type ``__ffi_enum_entries__``
  TypeAttr column storing ``Dict<String, Enum>``.
- ``EnumDef`` follows a strict "register-once-then-mutate" protocol:
  on the first call per type it registers the mutable ``Dict`` via
  ``TVMFFITypeRegisterAttr``; subsequent calls look up the existing
  ``Dict`` via ``TVMFFIGetTypeAttrColumn`` and mutate it in place. The
  same protocol governs the per-class attribute store
  (``__ffi_enum_attrs__``: ``Dict<String, List<Any>>``), where each
  attribute is a list indexed by ordinal, padded with ``None`` through
  the ordinal before the write.
- The Python half (``python/tvm_ffi/dataclasses/enum.py``) exposes an
  ``Enum`` base class registered at the same ``ffi.Enum`` key.
  Subclasses declare their FFI type via parameterised inheritance:
  ``class Foo(Enum, type_key="..."):``. ``Enum.__init_subclass__``
  auto-detects whether ``type_key`` is already in the FFI type system
  and routes the class through ``@c_class`` (C++-backed) or
  ``@py_class`` (fresh Python-only) accordingly — no separate
  ``py_enum``/``c_enum`` opt-in exists.
- Variants are declared in the class body in three shapes, all of
  which go through ``__init_subclass__`` scanning and materialise
  singletons cached as class attributes (guaranteeing
  ``Cls.FOO is Cls.FOO``):
    1. ``name: ClassVar[Cls]`` (no assignment) — binds to a pre-existing
       entry with the same ``name`` from the ``__ffi_enum_entries__``
       column (typically registered in C++ via ``refl::EnumDef``), or,
       if none exists, registers a blank Python entry.
    2. ``name: ClassVar[Cls] = entry(field1=..., field2=...)`` —
       registers a Python-side entry and forwards the captured kwargs
       to the subclass's ``__init__`` as user-declared fields.
    3. ``name = auto()`` / ``name: ClassVar[Cls] = auto()`` —
       registers a Python-side entry that carries no user-declared
       fields beyond the auto-assigned ``value``/``name``. Semantically
       equivalent to ``entry()`` with no arguments, but spelled with a
       dedicated helper to keep simple enum bodies uncluttered and to
       give users a discoverable alternative to stdlib-style int sugar
       (which this module deliberately rejects — see Behavioral
       Changes).
- Both ``value`` (dense ordinal in declaration order) and ``name``
  (declaration key) are auto-populated on every entry; they are never
  user-supplied. ``entry(value=...)`` and ``entry(name=...)`` raise
  ``TypeError`` at class-body time.
- Per-class class-level reflection surface (``by_name``, ``by_value``,
  ``attr_dict``) is exposed through a new ``_ClassProperty`` descriptor
  — a minimal getter descriptor that receives the owning class, letting
  class-level attribute access work without a metaclass.

Public Interfaces:
- New C++ public headers:
  * ``include/tvm/ffi/enum.h`` — ``EnumObj``/``Enum`` + two
    column-name string constants ``kEnumEntriesAttrName`` (=
    ``"__ffi_enum_entries__"``) and ``kEnumAttrsAttrName`` (=
    ``"__ffi_enum_attrs__"``). Both constants are the source of truth
    for the column names; Python mirrors them as
    ``ENUM_ENTRIES_ATTR`` / ``ENUM_ATTRS_ATTR``.
  * ``include/tvm/ffi/reflection/enum_def.h`` — ``refl::EnumDef<Obj>``
    builder with ``.set_attr(name, value)`` chaining and getters for
    ``instance()`` / ``ordinal()``.
- ``include/tvm/ffi/tvm_ffi.h`` now transitively includes both new
  headers, so consumers of the aggregate header get the enum API
  without extra work.
- New Python symbols exported from ``tvm_ffi.dataclasses``:
  ``Enum``, ``EnumAttrMap``, ``entry``, ``auto``. ``auto`` is a
  zero-arg helper returning the same ``_EnumEntry`` sentinel as
  ``entry()``; it is listed alongside ``entry``/``field``/``Field`` in
  ``@dataclass_transform(field_specifiers=...)`` so that
  ``name = auto()`` and ``name: ClassVar[Cls] = auto()`` both
  type-check as field-specifier patterns. Class surface on subclasses:
  ``Cls.get(name)`` / ``Cls.entries()`` / ``Cls.def_attr(name,
  *, default=...)`` + three class-level property views ``Cls.by_name``
  (``Dict[str, Enum]``), ``Cls.by_value`` (``List[Enum]`` indexed by
  ordinal), ``Cls.attr_dict`` (``Dict[str, List[Any]]``).
- ``EnumAttrMap`` round-trips through the unified
  ``__ffi_enum_attrs__`` column instead of per-attribute columns; a
  single Dict column is shared across all attribute names on a given
  class, and each value is a List indexed by variant ordinal.
- ``@dataclass_transform(field_specifiers=(Field, field, entry, auto))``
  on ``Enum`` lets ``ClassVar[Cls] = entry(...)`` and
  ``name = auto()`` type-check as proper field-specifier patterns
  under typing-aware tools.
- C++ test-support type ``testing.TestEnumVariant`` (in
  ``src/ffi/testing/testing.cc``) now extends ``EnumObj`` rather than a
  bare ``Object``; it registers two entries ``Alpha``/``Beta`` via
  ``refl::EnumDef`` with a ``code`` attribute. This is the canonical
  end-to-end demonstration of the builder.

UI/UX:
- none (library-only change; no CLI, REPL, or user-visible UI surface).

Behavioral Changes:
- ``TVMFFITypeRegisterAttr`` again raises ``RuntimeError`` on duplicate
  ``(type_index, attr_name)`` writes. This is a reversal of the
  previously-relaxed "silent overwrite" behaviour: the invariant that
  a TypeAttr slot is registered once is restored and is now load-
  bearing for the ``EnumDef`` register-once-then-mutate protocol. The
  thrown message tells callers to register a mutable container
  (``Dict``/``List``) once and mutate it in place on subsequent calls
  — exactly the pattern the enum builders use internally.
- Distributed enum-entry registration — whether across TUs or across
  Python subclasses re-binding the same type key — now converges on
  shared ``Dict``/``List`` containers. There is no "last writer wins"
  semantics; instead, each registrant atomically appends to the shared
  live containers, and duplicate ``(type, instance_name)`` collisions
  are explicit ``RuntimeError``s at ``EnumDef`` construction.
- Variant declaration forms were narrowed relative to the earlier draft
  of this feature. Bare-int sugar (``ok = 0`` auto-promotion), the
  simple-integer form ``entry(0)``, and the implicit
  ``value: int`` field synthesis are no longer supported. ``value``
  is always auto-assigned as the dense ordinal and is now a concrete
  read-only field on ``EnumObj`` itself. This is a design revision of
  work that has not yet shipped to users (the PR is still open), not a
  breaking change to a released API.
- Integer-literal sugar (e.g. ``ok = 0``) is deliberately *not*
  supported. The auto-ordinal policy owns the ``value`` slot, so a
  user-supplied int would either silently duplicate that assignment
  or conflict with it; both outcomes are worse than a hard
  rejection. ``auto()`` is the intended replacement — it makes the
  "no extra fields" intent explicit without fighting the auto-
  ordinal contract. The ``entry()`` docstring cross-references
  ``auto()``, and the ``Enum`` docstring's "Declaration forms"
  section calls out this rejection inline so discoverability does
  not depend on following ``See Also`` links.

Docs:
- No user-facing RST/Markdown doc page updated in this change. The
  ``dataclass_reflection.rst`` toctree entries for ``py_class`` /
  ``c_class`` are still commented out, so no sibling ``Enum`` section
  is authored yet. All three declaration forms (bare ``ClassVar``,
  ``entry(...)``, ``auto()``), the auto-assigned ``value``/``name``
  fields, the explicit rejection of integer-literal sugar, and the
  ``by_name``/``by_value``/``attr_dict`` class-level views are fully
  documented in the ``Enum``/``entry``/``auto`` docstrings and the
  new C++ header Doxygen comments. Follow-up: publish a unified
  dataclass + enum reference once the broader dataclass doc lands.

Tests:
- Executed: ``uv run pytest tests/python/test_dataclass_enum.py -q``
  — 30/30 passing. Covers all three declaration forms (Python-side
  ``ClassVar = entry(...)``, bare-``ClassVar[Cls]`` binding, and the
  new ``auto()`` helper in both annotated and bare-assignment shapes),
  explicit rejection of ``entry(value=...)`` / ``entry(name=...)``,
  auto-ordinal assignment, frozen-singleton identity, ``Cls.get`` /
  ``Cls.entries`` / ``Cls.by_name`` / ``Cls.by_value`` /
  ``Cls.attr_dict`` surface, ``def_attr`` round-trips through the
  unified ``__ffi_enum_attrs__`` column (including missing-default
  behaviour, ``__contains__``, cross-enum foreign-variant rejection,
  and fresh-wrapper lookup via ``Cls.get``), direct TypeAttr column
  verification for both ``__ffi_enum_entries__`` and
  ``__ffi_enum_attrs__``, the C++-backed happy path against
  ``testing.TestEnumVariant``'s ``refl::EnumDef``-registered
  ``Alpha``/``Beta`` entries with their ``code`` attribute, and — new
  in this amend — six dedicated ``auto()`` tests:
  ``test_auto_basic_no_annotation``,
  ``test_auto_with_classvar_annotation``,
  ``test_auto_mixed_with_bare_classvar`` (verifies the bare-
  ``ClassVar`` binders come first in annotation order, then sentinel
  entries in class-body order, with deterministic dense ordinals),
  ``test_auto_mixed_with_entry`` (composition with attribute-carrying
  ``entry(...)`` variants on the same class),
  ``test_auto_rejects_already_registered_name`` (asserts ``auto()``
  is register-not-bind — colliding with a C++-registered entry name
  raises), and ``test_auto_returns_fresh_sentinels`` (confirms each
  call yields a distinct ``_EnumEntry`` with empty ``args``/``kwargs``).
- Executed: ``pre-commit run --files <all staged>`` — all hooks pass
  (ASF header, file types, end-of-file / trailing whitespace, ruff
  check/format, ty, clang-format).

Untested Edge Cases:
- C++ GoogleTest suite (``tests/cpp/``) was not re-run. The C++ delta
  touches: (a) restoring ``RegisterAttr``'s duplicate-throw — for which
  no existing test asserts either the throw or the silent-overwrite
  behaviour being reverted; (b) introducing the ``EnumObj``/``Enum``
  root and ``EnumDef`` builder, covered end-to-end by the Python
  tests against ``testing.TestEnumVariant``; (c) the test-only
  refactor of ``TestEnumVariant`` from a bare ``Object`` to an
  ``EnumObj`` subclass. Regression risk is low but formally
  unverified.
- Rust test suite (``cargo test`` under ``rust/``) was not executed.
  No Rust bindings touched; risk is low but untested.
- Cross-module TypeAttr convergence is exercised only within a single
  process. Scenarios where two independently-loaded Python modules
  register entries under the same type key from different processes or
  plugin-host isolation contexts remain uncovered.

Refs: apache#554
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch from d880675 to 77b83ff Compare April 18, 2026 03:38
junrushao added a commit to junrushao/tvm-ffi that referenced this pull request Apr 18, 2026
…cked entries

Architecture:
- Introduce a dedicated ``tvm_ffi.dataclasses.enum`` module plus two new
  public C++ headers that jointly define a single cross-language enum
  abstraction. The Python and C++ halves agree on a fixed pair of
  TypeAttr columns and on the on-the-wire representation of each
  registry, so distributed entry registration (one TU in C++, plus one
  or more Python subclasses) converges on the same live containers.
- The C++ half lives in ``include/tvm/ffi/enum.h``: a concrete
  ``EnumObj`` (fields ``int64_t value`` + ``String name``, both
  reflected via ``def_ro``) plus its nullable ObjectRef ``Enum``.
  ``EnumObj`` is registered under the type key ``ffi.Enum`` and is the
  root of every user-defined enum class tree.
- Entry registration is driven by a new builder in
  ``include/tvm/ffi/reflection/enum_def.h``:
  ``refl::EnumDef<EnumClsObj>("Name").set_attr("key", value)...``. Each
  call allocates a fresh dense ordinal (``= len(entries)``), constructs
  the instance, and writes it into a per-type ``__ffi_enum_entries__``
  TypeAttr column storing ``Dict<String, Enum>``.
- ``EnumDef`` follows a strict "register-once-then-mutate" protocol:
  on the first call per type it registers the mutable ``Dict`` via
  ``TVMFFITypeRegisterAttr``; subsequent calls look up the existing
  ``Dict`` via ``TVMFFIGetTypeAttrColumn`` and mutate it in place. The
  same protocol governs the per-class attribute store
  (``__ffi_enum_attrs__``: ``Dict<String, List<Any>>``), where each
  attribute is a list indexed by ordinal, padded with ``None`` through
  the ordinal before the write.
- The Python half (``python/tvm_ffi/dataclasses/enum.py``) exposes an
  ``Enum`` base class registered at the same ``ffi.Enum`` key.
  Subclasses declare their FFI type via parameterised inheritance:
  ``class Foo(Enum, type_key="..."):``. ``Enum.__init_subclass__``
  auto-detects whether ``type_key`` is already in the FFI type system
  and routes the class through ``@c_class`` (C++-backed) or
  ``@py_class`` (fresh Python-only) accordingly — no separate
  ``py_enum``/``c_enum`` opt-in exists.
- Variants are declared in the class body in three shapes, all of
  which go through ``__init_subclass__`` scanning and materialise
  singletons cached as class attributes (guaranteeing
  ``Cls.FOO is Cls.FOO``):
    1. ``name: ClassVar[Cls]`` (no assignment) — binds to a pre-existing
       entry with the same ``name`` from the ``__ffi_enum_entries__``
       column (typically registered in C++ via ``refl::EnumDef``), or,
       if none exists, registers a blank Python entry.
    2. ``name: ClassVar[Cls] = entry(field1=..., field2=...)`` —
       registers a Python-side entry and forwards the captured kwargs
       to the subclass's ``__init__`` as user-declared fields.
    3. ``name = auto()`` / ``name: ClassVar[Cls] = auto()`` —
       registers a Python-side entry that carries no user-declared
       fields beyond the auto-assigned ``value``/``name``. Semantically
       equivalent to ``entry()`` with no arguments, but spelled with a
       dedicated helper to keep simple enum bodies uncluttered and to
       give users a discoverable alternative to stdlib-style int sugar
       (which this module deliberately rejects — see Behavioral
       Changes).
- Both ``value`` (dense ordinal in declaration order) and ``name``
  (declaration key) are auto-populated on every entry; they are never
  user-supplied. ``entry(value=...)`` and ``entry(name=...)`` raise
  ``TypeError`` at class-body time.
- The class-level reflection surface (``by_name``, ``by_value``,
  ``attr_dict``) is exposed through a new ``_ClassProperty`` descriptor
  — a minimal getter descriptor that receives the owning class, letting
  class-level attribute access work without a metaclass.
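
The getter-descriptor idea in the last bullet can be sketched in plain Python. This is a minimal illustration, not the module's actual ``_ClassProperty``: the names ``ClassProperty``, ``Color``, ``Shade`` and ``_entries`` are hypothetical stand-ins, and the real descriptor reads from the FFI registry rather than a class-level dict.

```python
from typing import Any, Callable, Optional


class ClassProperty:
    """Read-only descriptor whose getter receives the owning class."""

    def __init__(self, fget: Callable[[type], Any]) -> None:
        self.fget = fget

    def __get__(self, instance: Optional[object], owner: type) -> Any:
        # Python calls this for both `Cls.attr` and `obj.attr`; in either
        # case `owner` is the class, which is all the getter needs.
        return self.fget(owner)


class Color:
    # Hypothetical stand-in for the per-class entries registry.
    _entries = {"red": 0, "green": 1}

    by_name = ClassProperty(lambda cls: dict(cls._entries))


class Shade(Color):
    # Subclasses get their own view because the getter sees `Shade`.
    _entries = {"dark": 0}
```

Because the descriptor defines only ``__get__``, assigning ``Color.by_name = ...`` would replace it rather than raise; blocking that would need a metaclass, which is the trade-off accepted for keeping the class machinery simple.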

Public Interfaces:
- New C++ public headers:
  * ``include/tvm/ffi/enum.h`` — ``EnumObj``/``Enum`` + two
    column-name string constants ``kEnumEntriesAttrName`` (=
    ``"__ffi_enum_entries__"``) and ``kEnumAttrsAttrName`` (=
    ``"__ffi_enum_attrs__"``). Both constants are the source of truth
    for the column names; Python mirrors them as
    ``ENUM_ENTRIES_ATTR`` / ``ENUM_ATTRS_ATTR``.
  * ``include/tvm/ffi/reflection/enum_def.h`` — ``refl::EnumDef<Obj>``
    builder with ``.set_attr(name, value)`` chaining and getters for
    ``instance()`` / ``ordinal()``.
- ``include/tvm/ffi/tvm_ffi.h`` now transitively includes both new
  headers, so consumers of the aggregate header get the enum API
  without extra work.
- New Python symbols exported from ``tvm_ffi.dataclasses``:
  ``Enum``, ``EnumAttrMap``, ``entry``, ``auto``. ``auto`` is a
  zero-arg helper returning the same ``_EnumEntry`` sentinel as
  ``entry()``; it is listed alongside ``entry``/``field``/``Field`` in
  ``@dataclass_transform(field_specifiers=...)`` so that
  ``name = auto()`` and ``name: ClassVar[Cls] = auto()`` both
  type-check as field-specifier patterns. Class surface on subclasses:
  ``Cls.get(name)`` / ``Cls.entries()`` / ``Cls.def_attr(name,
  *, default=...)`` + three class-level property views ``Cls.by_name``
  (``Dict[str, Enum]``), ``Cls.by_value`` (``List[Enum]`` indexed by
  ordinal), ``Cls.attr_dict`` (``Dict[str, List[Any]]``).
- ``EnumAttrMap`` round-trips through the unified
  ``__ffi_enum_attrs__`` column instead of per-attribute columns; a
  single Dict column is shared across all attribute names on a given
  class, and each value is a List indexed by variant ordinal.
- ``@dataclass_transform(field_specifiers=(Field, field, entry, auto))``
  on ``Enum`` lets ``ClassVar[Cls] = entry(...)`` and
  ``name = auto()`` type-check as proper field-specifier patterns
  under typing-aware tools.
- C++ test-support type ``testing.TestEnumVariant`` (in
  ``src/ffi/testing/testing.cc``) now extends ``EnumObj`` rather than a
  bare ``Object``; it registers two entries ``Alpha``/``Beta`` via
  ``refl::EnumDef`` with a ``code`` attribute. This is the canonical
  end-to-end demonstration of the builder.

UI/UX:
- none (library-only change; no CLI, REPL, or user-visible UI surface).

Behavioral Changes:
- ``TVMFFITypeRegisterAttr`` again raises ``RuntimeError`` on duplicate
  ``(type_index, attr_name)`` writes. This is a reversal of the
  previously-relaxed "silent overwrite" behaviour: the invariant that
  a TypeAttr slot is registered once is restored and is now load-
  bearing for the ``EnumDef`` register-once-then-mutate protocol. The
  thrown message tells callers to register a mutable container
  (``Dict``/``List``) once and mutate it in place on subsequent calls
  — exactly the pattern the enum builders use internally.
- Distributed enum-entry registration — whether across TUs or across
  Python subclasses re-binding the same type key — now converges on
  shared ``Dict``/``List`` containers. There is no "last writer wins"
  semantics; instead, each registrant atomically appends to the shared
  live containers, and duplicate ``(type, instance_name)`` collisions
  are explicit ``RuntimeError``s at ``EnumDef`` construction.
- Variant declaration forms were narrowed relative to the earlier draft
  of this feature. Bare-int sugar (``ok = 0`` auto-promotion), the
  simple-integer form ``entry(0)``, and the implicit
  ``value: int`` field synthesis are no longer supported. ``value``
  is always auto-assigned as the dense ordinal and is now a concrete
  read-only field on ``EnumObj`` itself. This is a design revision of
  work that has not yet shipped to users (the PR is still open), not a
  breaking change to a released API.
- Integer-literal sugar (e.g. ``ok = 0``) is deliberately *not*
  supported. The auto-ordinal policy owns the ``value`` slot, so a
  user-supplied int would either silently duplicate that assignment
  or conflict with it; both outcomes are worse than a hard
  rejection. ``auto()`` is the intended replacement — it makes the
  "no extra fields" intent explicit without fighting the auto-
  ordinal contract. The ``entry()`` docstring cross-references
  ``auto()``, and the ``Enum`` docstring's "Declaration forms"
  section calls out this rejection inline so discoverability does
  not depend on following ``See Also`` links.
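
The register-once-then-mutate protocol described in the first two bullets can be modelled in a few lines of plain Python. This is a sketch under stated assumptions: ``_TYPE_ATTRS``, ``register_attr``, ``get_or_register`` and ``define_entry`` are hypothetical names, the table is an ordinary dict rather than the FFI TypeAttr store, and entries map to their ordinals instead of to enum instances.

```python
from typing import Any, Dict, Tuple

# Stand-in for the TypeAttr table: one slot per (type_key, attr_name).
_TYPE_ATTRS: Dict[Tuple[str, str], Any] = {}


def register_attr(type_key: str, attr_name: str, value: Any) -> None:
    """Register a slot once; a second registration is an error."""
    slot = (type_key, attr_name)
    if slot in _TYPE_ATTRS:
        raise RuntimeError(
            f"{attr_name!r} already registered for {type_key!r}; "
            "register a mutable container once and mutate it in place"
        )
    _TYPE_ATTRS[slot] = value


def get_or_register(type_key: str, attr_name: str) -> Dict[str, Any]:
    """Return the live container, registering it on first use only."""
    slot = (type_key, attr_name)
    if slot not in _TYPE_ATTRS:
        register_attr(type_key, attr_name, {})
    return _TYPE_ATTRS[slot]


def define_entry(type_key: str, name: str) -> int:
    """Append a named entry and return its dense ordinal."""
    entries = get_or_register(type_key, "__ffi_enum_entries__")
    if name in entries:
        raise RuntimeError(f"duplicate enum entry {type_key}.{name}")
    ordinal = len(entries)  # next dense ordinal
    entries[name] = ordinal  # sketch: store the ordinal, not an instance
    return ordinal
```

Every registrant ends up holding the same dict object, so there is nothing to overwrite and a name collision surfaces as an explicit error.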

Docs:
- No user-facing RST/Markdown doc page updated in this change. The
  ``dataclass_reflection.rst`` toctree entries for ``py_class`` /
  ``c_class`` are still commented out, so no sibling ``Enum`` section
  is authored yet. All three declaration forms (bare ``ClassVar``,
  ``entry(...)``, ``auto()``), the auto-assigned ``value``/``name``
  fields, the explicit rejection of integer-literal sugar, and the
  ``by_name``/``by_value``/``attr_dict`` class-level views are fully
  documented in the ``Enum``/``entry``/``auto`` docstrings and the
  C++ header Doxygen comments — including the new Doxygen block on
  the two-arg ``EnumObj(int64_t value, String name)`` constructor
  that was missing before. Follow-up: publish a unified dataclass +
  enum reference once the broader dataclass doc lands.

Tests:
- Executed: ``uv run pytest tests/python/test_dataclass_enum.py -q``
  — 30/30 passing. Covers all three declaration forms (Python-side
  ``ClassVar = entry(...)``, bare-``ClassVar[Cls]`` binding, and the
  new ``auto()`` helper in both annotated and bare-assignment shapes),
  explicit rejection of ``entry(value=...)`` / ``entry(name=...)``,
  auto-ordinal assignment, frozen-singleton identity, ``Cls.get`` /
  ``Cls.entries`` / ``Cls.by_name`` / ``Cls.by_value`` /
  ``Cls.attr_dict`` surface, ``def_attr`` round-trips through the
  unified ``__ffi_enum_attrs__`` column (including missing-default
  behaviour, ``__contains__``, cross-enum foreign-variant rejection,
  and fresh-wrapper lookup via ``Cls.get``), direct TypeAttr column
  verification for both ``__ffi_enum_entries__`` and
  ``__ffi_enum_attrs__``, the C++-backed happy path against
  ``testing.TestEnumVariant``'s ``refl::EnumDef``-registered
  ``Alpha``/``Beta`` entries with their ``code`` attribute, and six
  dedicated ``auto()`` tests:
  ``test_auto_basic_no_annotation``,
  ``test_auto_with_classvar_annotation``,
  ``test_auto_mixed_with_bare_classvar`` (verifies the bare-
  ``ClassVar`` binders come first in annotation order, then sentinel
  entries in class-body order, with deterministic dense ordinals),
  ``test_auto_mixed_with_entry`` (composition with attribute-carrying
  ``entry(...)`` variants on the same class),
  ``test_auto_rejects_already_registered_name`` (asserts ``auto()``
  is register-not-bind — colliding with a C++-registered entry name
  raises), and ``test_auto_returns_fresh_sentinels`` (confirms each
  call yields a distinct ``_EnumEntry`` with empty ``args``/``kwargs``).
- Executed: ``pre-commit run --files <all staged>`` — all hooks pass
  (ASF header, file types, end-of-file / trailing whitespace, ruff
  check/format, ty, clang-format).
- CI-driven operational fixes applied in this amend (no behavior
  delta, no test churn):
  * Added a ``/*! \brief ... */`` Doxygen block on
    ``EnumObj(int64_t value, String name)`` in
    ``include/tvm/ffi/enum.h`` so the doc-build (Doxygen) job no
    longer rejects an undocumented public constructor at
    ``enum.h:71``.
  * Annotated the intentional RAII temporary
    ``refl::ObjectDef<TestEnumVariantObj>(refl::init(false));`` in
    ``src/ffi/testing/testing.cc`` with a
    ``// NOLINT(bugprone-unused-raii)`` suppression plus a short
    preceding comment explaining that the destructor *is* the
    payload (it registers the type), so the clang-tidy job no
    longer fires on what is by construction a correct-use pattern.
  * Centralised the Windows ``/bigobj`` flag inside the
    ``tvm_ffi_add_msvc_flags`` macro in
    ``cmake/Utils/Library.cmake`` so every MSVC target picks it up,
    and removed the now-redundant per-target
    ``target_compile_options(tvm_ffi_objs PRIVATE /bigobj)`` in
    ``CMakeLists.txt``. This fixes the Windows MSVC
    ``error C1128: number of sections exceeded object file format
    limit`` on ``src/ffi/testing/testing.cc`` (caused by heavy
    reflection-template instantiations exceeding the default COFF
    section limit) and guards against the same failure recurring on
    any other TU that grows past the threshold.

Untested Edge Cases:
- C++ GoogleTest suite (``tests/cpp/``) was not re-run. The C++ delta
  touches: (a) restoring ``RegisterAttr``'s duplicate-throw — for which
  no existing test asserts either the throw or the silent-overwrite
  behaviour being reverted; (b) introducing the ``EnumObj``/``Enum``
  root and ``EnumDef`` builder, covered end-to-end by the Python
  tests against ``testing.TestEnumVariant``; (c) the test-only
  refactor of ``TestEnumVariant`` from a bare ``Object`` to an
  ``EnumObj`` subclass. Regression risk is low but formally
  unverified.
- Rust test suite (``cargo test`` under ``rust/``) was not executed.
  No Rust bindings touched; risk is low but untested.
- Cross-module TypeAttr convergence (two independently-loaded Python
  modules registering entries under the same type key from different
  processes / plugin-host isolation contexts) is exercised only
  within a single process; multi-process scenarios remain uncovered.

Refs: apache#554
@junrushao force-pushed the junrushao/2026-04-17/py-enum branch from 77b83ff to 18ea1b4 on April 18, 2026 03:57
junrushao added a commit to junrushao/tvm-ffi that referenced this pull request Apr 18, 2026
…cked entries

Architecture:
- Introduce a dedicated ``tvm_ffi.dataclasses.enum`` module plus two new
  public C++ headers that jointly define a single cross-language enum
  abstraction. The Python and C++ halves agree on a fixed pair of
  TypeAttr columns and on the on-the-wire representation of each
  registry, so distributed entry registration (one TU in C++, plus one
  or more Python subclasses) converges on the same live containers.
- The C++ half lives in ``include/tvm/ffi/enum.h``: a concrete
  ``EnumObj`` (fields ``int64_t value`` + ``String name``, both
  reflected via ``def_ro``) plus its nullable ObjectRef ``Enum``.
  ``EnumObj`` is registered under the type key ``ffi.Enum`` and is the
  root of every user-defined enum class tree.
- Entry registration is driven by a new builder in
  ``include/tvm/ffi/reflection/enum_def.h``:
  ``refl::EnumDef<EnumClsObj>("Name").set_attr("key", value)...``. Each
  call allocates a fresh dense ordinal (``= len(entries)``), constructs
  the instance, and writes it into a per-type ``__ffi_enum_entries__``
  TypeAttr column storing ``Dict<String, Enum>``.
- ``EnumDef`` follows a strict "register-once-then-mutate" protocol:
  on the first call per type it registers the mutable ``Dict`` via
  ``TVMFFITypeRegisterAttr``; subsequent calls look up the existing
  ``Dict`` via ``TVMFFIGetTypeAttrColumn`` and mutate it in place. The
  same protocol governs the per-class attribute store
  (``__ffi_enum_attrs__``: ``Dict<String, List<Any>>``), where each
  attribute is a list indexed by ordinal, padded with ``None`` through
  the ordinal before the write.
- The Python half (``python/tvm_ffi/dataclasses/enum.py``) exposes an
  ``Enum`` base class registered at the same ``ffi.Enum`` key.
  Subclasses declare their FFI type via parameterised inheritance:
  ``class Foo(Enum, type_key="..."):``. ``Enum.__init_subclass__``
  auto-detects whether ``type_key`` is already in the FFI type system
  and routes the class through ``@c_class`` (C++-backed) or
  ``@py_class`` (fresh Python-only) accordingly — no separate
  ``py_enum``/``c_enum`` opt-in exists.
- Variants are declared in the class body in three shapes, all of
  which go through ``__init_subclass__`` scanning and materialise
  singletons cached as class attributes (guaranteeing
  ``Cls.FOO is Cls.FOO``):
    1. ``name: ClassVar[Cls]`` (no assignment) — binds to a pre-existing
       entry with the same ``name`` from the ``__ffi_enum_entries__``
       column (typically registered in C++ via ``refl::EnumDef``), or,
       if none exists, registers a blank Python entry.
    2. ``name: ClassVar[Cls] = entry(field1=..., field2=...)`` —
       registers a Python-side entry and forwards the captured kwargs
       to the subclass's ``__init__`` as user-declared fields.
    3. ``name = auto()`` / ``name: ClassVar[Cls] = auto()`` —
       registers a Python-side entry that carries no user-declared
       fields beyond the auto-assigned ``value``/``name``. Semantically
       equivalent to ``entry()`` with no arguments, but spelled with a
       dedicated helper to keep simple enum bodies uncluttered and to
       give users a discoverable alternative to stdlib-style int sugar
       (which this module deliberately rejects — see Behavioral
       Changes).
- Mixed C++/Python entry registration: on a ``type_key`` whose C++
  type was registered with ``refl::init(false)`` (i.e. has no
  ``__ffi_init__``), the Python side still supports ``entry(...)`` /
  ``auto()`` for fresh Python-side variants. New variants are
  allocated via ``__ffi_new__`` (always registered by
  ``ObjectDef``'s default creator) and populated through the
  reflected ``FFIProperty.set`` frozen-setter escape hatch, exactly
  mirroring what ``reflection::EnumDef`` does in C++. A bare
  ``ClassVar[Cls]`` binder, by contrast, still means
  "bind-to-existing" and raises a descriptive ``RuntimeError`` when
  the named entry is not present in the C++ registry — ordinarily a
  typo, but the error message also points the user at
  ``auto()``/``entry(...)`` if the intent was to add a new variant.
- Both ``value`` (dense ordinal in declaration order) and ``name``
  (declaration key) are auto-populated on every entry; they are never
  user-supplied. ``entry(value=...)`` and ``entry(name=...)`` raise
  ``TypeError`` at class-body time.
- The class-level reflection surface (``by_name``, ``by_value``,
  ``attr_dict``) is exposed through a new ``_ClassProperty`` descriptor
  — a minimal getter descriptor that receives the owning class, letting
  class-level attribute access work without a metaclass.
- Default repr for every ``EnumObj`` subclass is rendered by
  ``ReprPrinter`` (in ``src/ffi/extra/dataclass.cc``) as
  ``<type_key>.<name>`` — the common natural format for enum
  variants. The dispatch happens after the user-registered
  ``__ffi_repr__`` hook check, so explicit per-subclass repr
  overrides still take precedence. Complementarily, the built-in
  sentinels ``MISSING`` and ``KWARGS`` render as ``<MISSING>`` /
  ``<KWARGS>`` via a pointer-identity fast path ahead of any
  type-keyed lookup, replacing the generic ``ffi.Object`` framing.
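
The repr dispatch order in the last bullet (sentinel identity check, then user hook, then enum default) can be illustrated with a small Python model. This is only a sketch: the real logic lives in the C++ ``ReprPrinter``, and ``render``, ``REPR_HOOKS``, ``EnumLike`` and the plain-object sentinels below are hypothetical stand-ins.

```python
from typing import Callable, Dict

# Stand-ins for the built-in sentinels; the real ones are FFI objects.
MISSING = object()
KWARGS = object()

# Hypothetical registry of user-supplied repr hooks, keyed by exact type.
REPR_HOOKS: Dict[type, Callable[[object], str]] = {}


class EnumLike:
    type_key = "ffi.Enum"

    def __init__(self, value: int, name: str) -> None:
        self.value = value
        self.name = name


class Activation(EnumLike):
    type_key = "nn.Activation"


def render(obj: object) -> str:
    # 1. Identity fast path for sentinels, before any type-keyed lookup.
    if obj is MISSING:
        return "<MISSING>"
    if obj is KWARGS:
        return "<KWARGS>"
    # 2. A user-registered hook wins over the enum default.
    hook = REPR_HOOKS.get(type(obj))
    if hook is not None:
        return hook(obj)
    # 3. Enum default: <type_key>.<name>; fields are not shown.
    if isinstance(obj, EnumLike):
        return f"{obj.type_key}.{obj.name}"
    return repr(obj)
```

Placing the hook check ahead of the enum default is what lets a subclass keep a custom repr without opting out of the ``EnumObj`` hierarchy.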

Public Interfaces:
- New C++ public headers:
  * ``include/tvm/ffi/enum.h`` — ``EnumObj``/``Enum`` + two
    column-name string constants ``kEnumEntriesAttrName`` (=
    ``"__ffi_enum_entries__"``) and ``kEnumAttrsAttrName`` (=
    ``"__ffi_enum_attrs__"``). Both constants are the source of truth
    for the column names; Python mirrors them as
    ``ENUM_ENTRIES_ATTR`` / ``ENUM_ATTRS_ATTR``.
  * ``include/tvm/ffi/reflection/enum_def.h`` — ``refl::EnumDef<Obj>``
    builder with ``.set_attr(name, value)`` chaining and getters for
    ``instance()`` / ``ordinal()``.
- ``include/tvm/ffi/tvm_ffi.h`` now transitively includes both new
  headers, so consumers of the aggregate header get the enum API
  without extra work.
- New Python symbols exported from ``tvm_ffi.dataclasses``:
  ``Enum``, ``EnumAttrMap``, ``entry``, ``auto``. ``auto`` is a
  zero-arg helper returning the same ``_EnumEntry`` sentinel as
  ``entry()``; it is listed alongside ``entry``/``field``/``Field`` in
  ``@dataclass_transform(field_specifiers=...)`` so that
  ``name = auto()`` and ``name: ClassVar[Cls] = auto()`` both
  type-check as field-specifier patterns. Class surface on subclasses:
  ``Cls.get(name)`` / ``Cls.entries()`` / ``Cls.def_attr(name,
  *, default=...)`` + three class-level property views ``Cls.by_name``
  (``Dict[str, Enum]``), ``Cls.by_value`` (``List[Enum]`` indexed by
  ordinal), ``Cls.attr_dict`` (``Dict[str, List[Any]]``).
- ``EnumAttrMap`` round-trips through the unified
  ``__ffi_enum_attrs__`` column instead of per-attribute columns; a
  single Dict column is shared across all attribute names on a given
  class, and each value is a List indexed by variant ordinal.
- ``@dataclass_transform(field_specifiers=(Field, field, entry, auto))``
  on ``Enum`` lets ``ClassVar[Cls] = entry(...)`` and
  ``name = auto()`` type-check as proper field-specifier patterns
  under typing-aware tools.
- C++ test-support type ``testing.TestEnumVariant`` (in
  ``src/ffi/testing/testing.cc``) now extends ``EnumObj`` rather than a
  bare ``Object``; it registers two entries ``Alpha``/``Beta`` via
  ``refl::EnumDef`` with a ``code`` attribute. This is the canonical
  end-to-end demonstration of the builder.
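
The unified attribute column (``Dict[str, List[Any]]``, one list per attribute name, indexed by variant ordinal) can be sketched as follows. The helpers ``set_attr`` and ``get_attr`` are hypothetical, and treating a ``None`` slot as "unset" when applying the default is an assumption of this sketch rather than documented ``EnumAttrMap`` behaviour.

```python
from typing import Any, Dict, List


def set_attr(attrs: Dict[str, List[Any]], attr_name: str,
             ordinal: int, value: Any) -> None:
    """Write `value` at `ordinal`, padding the column with None first."""
    column = attrs.setdefault(attr_name, [])
    while len(column) <= ordinal:
        column.append(None)  # pad through the ordinal before the write
    column[ordinal] = value


def get_attr(attrs: Dict[str, List[Any]], attr_name: str,
             ordinal: int, default: Any = None) -> Any:
    """Read the slot at `ordinal`; fall back to `default` when unset."""
    column = attrs.get(attr_name, [])
    if ordinal < len(column) and column[ordinal] is not None:
        return column[ordinal]
    return default
```

Keeping one dict per class means a new attribute name adds a key rather than a new TypeAttr column, and a variant registered later only widens the lists that are actually written for it.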

UI/UX:
- none (library-only change; no CLI, REPL, or user-visible UI surface).

Behavioral Changes:
- ``TVMFFITypeRegisterAttr`` again raises ``RuntimeError`` on duplicate
  ``(type_index, attr_name)`` writes. This is a reversal of the
  previously-relaxed "silent overwrite" behaviour: the invariant that
  a TypeAttr slot is registered once is restored and is now load-
  bearing for the ``EnumDef`` register-once-then-mutate protocol. The
  thrown message tells callers to register a mutable container
  (``Dict``/``List``) once and mutate it in place on subsequent calls
  — exactly the pattern the enum builders use internally.
- Distributed enum-entry registration — whether across TUs or across
  Python subclasses re-binding the same type key — now converges on
  shared ``Dict``/``List`` containers. There is no "last writer wins"
  semantics; instead, each registrant atomically appends to the shared
  live containers, and duplicate ``(type, instance_name)`` collisions
  are explicit ``RuntimeError``s at ``EnumDef`` construction.
- Variant declaration forms were narrowed relative to the earlier draft
  of this feature. Bare-int sugar (``ok = 0`` auto-promotion), the
  simple-integer form ``entry(0)``, and the implicit
  ``value: int`` field synthesis are no longer supported. ``value``
  is always auto-assigned as the dense ordinal and is now a concrete
  read-only field on ``EnumObj`` itself. This is a design revision of
  work that has not yet shipped to users (the PR is still open), not a
  breaking change to a released API.
- Integer-literal sugar (e.g. ``ok = 0``) is deliberately *not*
  supported. The auto-ordinal policy owns the ``value`` slot, so a
  user-supplied int would either silently duplicate that assignment
  or conflict with it; both outcomes are worse than a hard
  rejection. ``auto()`` is the intended replacement — it makes the
  "no extra fields" intent explicit without fighting the auto-
  ordinal contract. The ``entry()`` docstring cross-references
  ``auto()``, and the ``Enum`` docstring's "Declaration forms"
  section calls out this rejection inline so discoverability does
  not depend on following ``See Also`` links.
- ``repr(variant)`` on any ``EnumObj`` subclass now produces
  ``<type_key>.<name>`` instead of the generic
  ``type_key(field1=..., field2=...)`` form that ``ReprPrinter``
  would otherwise derive from reflected fields. Attribute-carrying
  variants still surface their fields via attribute access; they
  just no longer leak into the default repr, which is much easier
  to read in REPL output and in nested containers like
  ``by_name``/``by_value``.
- ``repr(core.MISSING)`` and ``repr(core.KWARGS)`` now render as
  ``<MISSING>`` / ``<KWARGS>`` (angle-bracket sentinel convention)
  rather than the generic ``ffi.Object`` fallback. Pointer-identity
  dispatch keeps the change free of type-lookup overhead.
- Bare ``ClassVar[Cls]`` binders on a cxx-backed enum that name no
  existing C++ entry now raise a descriptive ``RuntimeError`` listing
  the known entries and the ``ClassVar`` / ``auto()`` / ``entry(...)``
  syntax, instead of falling through MRO to the ``Enum`` base's
  ``init=False`` ``TypeError`` guard with its misleading "cannot be
  constructed directly" message.
- A cxx-backed enum (``type_key`` already registered in C++) now
  accepts mixed registration: bare ``ClassVar[Cls]`` binders bind to
  existing C++ entries, while ``entry(...)`` / ``auto()`` sentinels
  register *new* Python-side entries whose ordinals continue the
  dense sequence past the C++ count. Previously only bare-binding
  was allowed; new Python entries were rejected.
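
The mixed-registration rules in the last two bullets can be modelled roughly as below. This is a sketch, not the module's code: ``build_entries`` is a hypothetical helper, entries map names to ordinals instead of instances, and the error wording is illustrative only.

```python
from typing import Dict, Iterable


def build_entries(type_key: str, existing: Dict[str, int],
                  binders: Iterable[str],
                  new_names: Iterable[str]) -> Dict[str, int]:
    """Bind bare-ClassVar names, then register auto()/entry() names."""
    entries = dict(existing)
    for name in binders:
        # A bare binder means bind-to-existing; an unknown name is an error.
        if name not in entries:
            known = ", ".join(sorted(entries)) or "<none>"
            raise RuntimeError(
                f"{type_key} has no entry named {name!r} "
                f"(known entries: {known}); use auto() or entry(...) "
                "to declare a new variant"
            )
    for name in new_names:
        # auto()/entry(...) is register-not-bind; collisions are errors.
        if name in entries:
            raise RuntimeError(f"{type_key}.{name} is already registered")
        entries[name] = len(entries)  # continue the dense sequence
    return entries
```

With two pre-registered entries, a newly declared variant receives ordinal 2, which matches the "ordinals continue the dense sequence past the C++ count" behaviour described above.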

Docs:
- No user-facing RST/Markdown doc page updated in this change. The
  ``dataclass_reflection.rst`` toctree entries for ``py_class`` /
  ``c_class`` are still commented out, so no sibling ``Enum`` section
  is authored yet. All three declaration forms (bare ``ClassVar``,
  ``entry(...)``, ``auto()``), the auto-assigned ``value``/``name``
  fields, the explicit rejection of integer-literal sugar, and the
  ``by_name``/``by_value``/``attr_dict`` class-level views are fully
  documented in the ``Enum``/``entry``/``auto`` docstrings and the
  C++ header Doxygen comments — including the new Doxygen block on
  the two-arg ``EnumObj(int64_t value, String name)`` constructor
  that was missing before. Follow-up: publish a unified dataclass +
  enum reference once the broader dataclass doc lands.

Tests:
- Executed: ``uv run pytest tests/python/test_dataclass_enum.py -q``
  — 38/38 passing. Covers all three declaration forms (Python-side
  ``ClassVar = entry(...)``, bare-``ClassVar[Cls]`` binding, and the
  ``auto()`` helper in both annotated and bare-assignment shapes),
  explicit rejection of ``entry(value=...)`` / ``entry(name=...)``,
  auto-ordinal assignment, frozen-singleton identity, ``Cls.get`` /
  ``Cls.entries`` / ``Cls.by_name`` / ``Cls.by_value`` /
  ``Cls.attr_dict`` surface, ``def_attr`` round-trips through the
  unified ``__ffi_enum_attrs__`` column (including missing-default
  behaviour, ``__contains__``, cross-enum foreign-variant rejection,
  and fresh-wrapper lookup via ``Cls.get``), direct TypeAttr column
  verification for both ``__ffi_enum_entries__`` and
  ``__ffi_enum_attrs__``, the C++-backed happy path against
  ``testing.TestEnumVariant``'s ``refl::EnumDef``-registered
  ``Alpha``/``Beta`` entries with their ``code`` attribute, six
  dedicated ``auto()`` tests
  (``test_auto_basic_no_annotation``,
  ``test_auto_with_classvar_annotation``,
  ``test_auto_mixed_with_bare_classvar`` — bare ``ClassVar`` binders
  come first in annotation order, then sentinel entries in class-body
  order, with deterministic dense ordinals;
  ``test_auto_mixed_with_entry`` — composition with attribute-carrying
  ``entry(...)`` variants on the same class;
  ``test_auto_rejects_already_registered_name`` — asserts ``auto()``
  is register-not-bind and colliding with a C++-registered entry
  name raises; ``test_auto_returns_fresh_sentinels`` — each call
  yields a distinct ``_EnumEntry`` with empty ``args``/``kwargs``),
  and eight new tests covering the repr and mixed-registration
  behaviour: ``test_default_repr_python_backed``,
  ``test_default_repr_cxx_backed``,
  ``test_default_repr_in_nested_container`` (Dict/List recursion
  through ``by_name`` / ``by_value``),
  ``test_default_repr_with_attribute_carrying_variant``,
  ``test_missing_and_kwargs_sentinel_repr``,
  ``test_cxx_backed_binder_typo_raises_descriptive_error`` (asserts
  the error names the unknown entry, the type key, the known C++
  entries, and the ``ClassVar`` syntax),
  ``test_cxx_backed_mixed_entries_via_auto`` (bare ``ClassVar``
  binders coexist with ``auto()`` entries on
  ``testing.TestEnumVariant``; new ordinals extend past the C++
  count), and ``test_cxx_backed_python_entry_accepts_def_attr``
  (``def_attr`` writes widen the attrs column to cover new
  Python-side ordinals while C++ entries retain defaults).
- Executed: ``pre-commit run --files <all staged>`` — all hooks pass
  (ASF header, file types, end-of-file / trailing whitespace, ruff
  check/format, ty, clang-format).
- Executed: full Python suite ``uv run pytest tests/python -q`` —
  2246 passed, 16 skipped, 3 xfailed. No regressions from the repr
  change or the mixed-registration path.
- CI-driven operational fixes applied in this amend (no behavior
  delta, no test churn):
  * Added a ``/*! \brief ... */`` Doxygen block on
    ``EnumObj(int64_t value, String name)`` in
    ``include/tvm/ffi/enum.h`` so the doc-build (Doxygen) job no
    longer rejects an undocumented public constructor at
    ``enum.h:71``.
  * Annotated the intentional RAII temporary
    ``refl::ObjectDef<TestEnumVariantObj>(refl::init(false));`` in
    ``src/ffi/testing/testing.cc`` with a
    ``// NOLINT(bugprone-unused-raii)`` suppression plus a short
    preceding comment explaining that the destructor *is* the
    payload (it registers the type), so the clang-tidy job no
    longer fires on what is by construction a correct-use pattern.
  * Centralised the Windows ``/bigobj`` flag inside the
    ``tvm_ffi_add_msvc_flags`` macro in
    ``cmake/Utils/Library.cmake`` so every MSVC target picks it up,
    and removed the now-redundant per-target
    ``target_compile_options(tvm_ffi_objs PRIVATE /bigobj)`` in
    ``CMakeLists.txt``. This fixes the Windows MSVC
    ``error C1128: number of sections exceeded object file format
    limit`` on ``src/ffi/testing/testing.cc`` (caused by heavy
    reflection-template instantiations exceeding the default COFF
    section limit) and guards against the same failure recurring on
    any other TU that grows past the threshold.

Untested Edge Cases:
- C++ GoogleTest suite (``tests/cpp/``) was not re-run. The C++ delta
  touches: (a) restoring ``RegisterAttr``'s duplicate-throw — for which
  no existing test asserts either the throw or the silent-overwrite
  behaviour being reverted; (b) introducing the ``EnumObj``/``Enum``
  root and ``EnumDef`` builder, covered end-to-end by the Python
  tests against ``testing.TestEnumVariant``; (c) the test-only
  refactor of ``TestEnumVariant`` from a bare ``Object`` to an
  ``EnumObj`` subclass; (d) the ``ReprPrinter`` additions for
  ``EnumObj`` subclasses and MISSING/KWARGS — verified via the
  Python suite but not from a C++ GoogleTest. Regression risk is
  low but formally unverified.
- Rust test suite (``cargo test`` under ``rust/``) was not executed.
  No Rust bindings touched; risk is low but untested.
- Cross-module TypeAttr convergence (two independently-loaded Python
  modules registering entries under the same type key from different
  processes / plugin-host isolation contexts) is exercised only
  within a single process; multi-process scenarios remain uncovered.

Refs: apache#554

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@junrushao force-pushed the junrushao/2026-04-17/py-enum branch from 18ea1b4 to 7d15159 on April 18, 2026 07:20
junrushao added a commit to junrushao/tvm-ffi that referenced this pull request Apr 18, 2026
…cked entries

Architecture:
- Introduce a dedicated ``tvm_ffi.dataclasses.enum`` module plus two new
  public C++ headers that jointly define a single cross-language enum
  abstraction. The Python and C++ halves agree on a fixed pair of
  TypeAttr columns and on the on-the-wire representation of each
  registry, so distributed entry registration (one TU in C++, plus one
  or more Python subclasses) converges on the same live containers.
- The C++ half lives in ``include/tvm/ffi/enum.h``: a concrete
  ``EnumObj`` (fields ``int64_t value`` + ``String name``, both
  reflected via ``def_ro``) plus its nullable ObjectRef ``Enum``.
  ``EnumObj`` is registered under the type key ``ffi.Enum`` and is the
  root of every user-defined enum class tree.
- Entry registration is driven by a new builder in
  ``include/tvm/ffi/reflection/enum_def.h``:
  ``refl::EnumDef<EnumClsObj>("Name").set_attr("key", value)...``. Each
  call allocates a fresh dense ordinal (``= len(entries)``), constructs
  the instance, and writes it into a per-type ``__ffi_enum_entries__``
  TypeAttr column storing ``Dict<String, Enum>``.
- ``EnumDef`` follows a strict "register-once-then-mutate" protocol:
  on the first call per type it registers the mutable ``Dict`` via
  ``TVMFFITypeRegisterAttr``; subsequent calls look up the existing
  ``Dict`` via ``TVMFFIGetTypeAttrColumn`` and mutate it in place. The
  same protocol governs the per-class attribute store
  (``__ffi_enum_attrs__``: ``Dict<String, List<Any>>``), where each
  attribute is a list indexed by ordinal; before each write the list is
  padded with ``None`` up to the target ordinal.
- The Python half (``python/tvm_ffi/dataclasses/enum.py``) exposes an
  ``Enum`` base class registered at the same ``ffi.Enum`` key.
  Subclasses declare their FFI type via parameterised inheritance:
  ``class Foo(Enum, type_key="..."):``. ``Enum.__init_subclass__``
  auto-detects whether ``type_key`` is already in the FFI type system
  and routes the class through ``@c_class`` (C++-backed) or
  ``@py_class`` (fresh Python-only) accordingly — no separate
  ``py_enum``/``c_enum`` opt-in exists.
- Variants are declared in the class body in three shapes, all of
  which go through ``__init_subclass__`` scanning and materialise
  singletons cached as class attributes (guaranteeing
  ``Cls.FOO is Cls.FOO``):
    1. ``name: ClassVar[Cls]`` (no assignment) — binds to a pre-existing
       entry with the same ``name`` from the ``__ffi_enum_entries__``
       column (typically registered in C++ via ``refl::EnumDef``), or,
       if none exists, registers a blank Python entry.
    2. ``name: ClassVar[Cls] = entry(field1=..., field2=...)`` —
       registers a Python-side entry and forwards the captured kwargs
       to the subclass's ``__init__`` as user-declared fields.
    3. ``name = auto()`` / ``name: ClassVar[Cls] = auto()`` —
       registers a Python-side entry that carries no user-declared
       fields beyond the auto-assigned ``value``/``name``. Semantically
       equivalent to ``entry()`` with no arguments, but spelled with a
       dedicated helper to keep simple enum bodies uncluttered and to
       give users a discoverable alternative to stdlib-style int sugar
       (which this module deliberately rejects — see Behavioral
       Changes).
- Mixed C++/Python entry registration: on a ``type_key`` whose C++
  type was registered with ``refl::init(false)`` (i.e. has no
  ``__ffi_init__``), the Python side still supports ``entry(...)`` /
  ``auto()`` for fresh Python-side variants. New variants are
  allocated via ``__ffi_new__`` (always registered by
  ``ObjectDef``'s default creator) and populated through the
  reflected ``FFIProperty.set`` frozen-setter escape hatch, exactly
  mirroring what ``reflection::EnumDef`` does in C++. A bare
  ``ClassVar[Cls]`` binder, by contrast, still means
  "bind-to-existing" and raises a descriptive ``RuntimeError`` when
  the named entry is not present in the C++ registry — ordinarily a
  typo, but the error message also points the user at
  ``auto()``/``entry(...)`` if the intent was to add a new variant.
- Both ``value`` (dense ordinal in declaration order) and ``name``
  (declaration key) are auto-populated on every entry; they are never
  user-supplied. ``entry(value=...)`` and ``entry(name=...)`` raise
  ``TypeError`` at class-body time.
- The class-level reflection surface (``by_name``, ``by_value``,
  ``attr_dict``) is exposed through a new ``_ClassProperty`` descriptor:
  a minimal getter descriptor that receives the owning class, so
  class-level attribute access works without a metaclass.
- Default repr for every ``EnumObj`` subclass is rendered by
  ``ReprPrinter`` (in ``src/ffi/extra/dataclass.cc``) as
  ``<type_key>.<name>`` — the common natural format for enum
  variants. The dispatch happens after the user-registered
  ``__ffi_repr__`` hook check, so explicit per-subclass repr
  overrides still take precedence. Complementarily, the built-in
  sentinels ``MISSING`` and ``KWARGS`` render as ``<MISSING>`` /
  ``<KWARGS>`` via a pointer-identity fast path ahead of any
  type-keyed lookup, replacing the generic ``ffi.Object`` framing.

Public Interfaces:
- New C++ public headers:
  * ``include/tvm/ffi/enum.h`` — ``EnumObj``/``Enum`` + two
    column-name string constants ``kEnumEntriesAttrName`` (=
    ``"__ffi_enum_entries__"``) and ``kEnumAttrsAttrName`` (=
    ``"__ffi_enum_attrs__"``). Both constants are the source of truth
    for the column names; Python mirrors them as
    ``ENUM_ENTRIES_ATTR`` / ``ENUM_ATTRS_ATTR``.
  * ``include/tvm/ffi/reflection/enum_def.h`` — ``refl::EnumDef<Obj>``
    builder with ``.set_attr(name, value)`` chaining and getters for
    ``instance()`` / ``ordinal()``.
- ``include/tvm/ffi/tvm_ffi.h`` now transitively includes both new
  headers, so consumers of the aggregate header get the enum API
  without extra work.
- New Python symbols exported from ``tvm_ffi.dataclasses``:
  ``Enum``, ``EnumAttrMap``, ``entry``, ``auto``. ``auto`` is a
  zero-arg helper returning the same ``_EnumEntry`` sentinel as
  ``entry()``; it is listed alongside ``entry``/``field``/``Field`` in
  ``@dataclass_transform(field_specifiers=...)`` so that
  ``name = auto()`` and ``name: ClassVar[Cls] = auto()`` both
  type-check as field-specifier patterns. Class surface on subclasses:
  ``Cls.get(name)`` / ``Cls.entries()`` / ``Cls.def_attr(name,
  *, default=...)`` + three class-level property views ``Cls.by_name``
  (``Dict[str, Enum]``), ``Cls.by_value`` (``List[Enum]`` indexed by
  ordinal), ``Cls.attr_dict`` (``Dict[str, List[Any]]``).
- ``EnumAttrMap`` round-trips through the unified
  ``__ffi_enum_attrs__`` column instead of per-attribute columns; a
  single Dict column is shared across all attribute names on a given
  class, and each value is a List indexed by variant ordinal.
- ``@dataclass_transform(field_specifiers=(Field, field, entry, auto))``
  on ``Enum`` lets ``ClassVar[Cls] = entry(...)`` and
  ``name = auto()`` type-check as proper field-specifier patterns
  under typing-aware tools.
- C++ test-support type ``testing.TestEnumVariant`` (in
  ``src/ffi/testing/testing.cc``) now extends ``EnumObj`` rather than a
  bare ``Object``; it registers two entries ``Alpha``/``Beta`` via
  ``refl::EnumDef`` with a ``code`` attribute. This is the canonical
  end-to-end demonstration of the builder.
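
For readers unfamiliar with class-level properties, the ``by_name`` / ``by_value`` views can be approximated with an ordinary non-data descriptor. The sketch below is an assumption about the general technique, not the module's ``_ClassProperty`` source; ``ClassProperty``, ``Color`` and ``_entries`` are hypothetical names.

```python
from typing import Any, Callable, Dict, List


class ClassProperty:
    """Read-only descriptor whose getter receives the owning class."""

    def __init__(self, fget: Callable[[type], Any]) -> None:
        self.fget = fget

    def __get__(self, instance: Any, owner: type) -> Any:
        # `owner` is the class the attribute was looked up on, whether the
        # access came through the class itself or through an instance.
        return self.fget(owner)


def _by_name(cls: type) -> Dict[str, int]:
    return dict(cls._entries)


def _by_value(cls: type) -> List[str]:
    # Names ordered by their ordinal, so the list index equals the ordinal.
    return sorted(cls._entries, key=cls._entries.__getitem__)


class Color:
    _entries: Dict[str, int] = {"red": 0, "green": 1, "blue": 2}

    by_name = ClassProperty(_by_name)
    by_value = ClassProperty(_by_value)


assert Color.by_value[1] == "green"
assert Color.by_name["blue"] == 2
```

A plain descriptor like this only intercepts reads. Assigning ``Color.by_name = ...`` would replace the descriptor, and guarding against that is the part that would need a metaclass.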

UI/UX:
- none (library-only change; no CLI, REPL, or user-visible UI surface).

Behavioral Changes:
- ``TVMFFITypeRegisterAttr`` again raises ``RuntimeError`` on duplicate
  ``(type_index, attr_name)`` writes. This reverses the previously
  relaxed "silent overwrite" behaviour: the invariant that a TypeAttr
  slot is registered exactly once is restored, and the ``EnumDef``
  register-once-then-mutate protocol now depends on it. The error
  message tells callers to register a mutable container
  (``Dict``/``List``) once and mutate it in place on subsequent calls,
  which is exactly the pattern the enum builders use internally.
- Distributed enum-entry registration — whether across TUs or across
  Python subclasses re-binding the same type key — now converges on
  shared ``Dict``/``List`` containers. There is no "last writer wins"
  semantics; instead, each registrant atomically appends to the shared
  live containers, and duplicate ``(type, instance_name)`` collisions
  are explicit ``RuntimeError``s at ``EnumDef`` construction.
- Variant declaration forms were narrowed relative to the earlier draft
  of this feature. Bare-int sugar (``ok = 0`` auto-promotion), the
  simple-integer form ``entry(0)``, and the implicit
  ``value: int`` field synthesis are no longer supported. ``value``
  is always auto-assigned as the dense ordinal and is now a concrete
  read-only field on ``EnumObj`` itself. This is a design revision of
  work that has not yet shipped to users (the PR is still open), not a
  breaking change to a released API.
- Integer-literal sugar (e.g. ``ok = 0``) is deliberately *not*
  supported. The auto-ordinal policy owns the ``value`` slot, so a
  user-supplied int would either silently duplicate that assignment
  or conflict with it; both outcomes are worse than a hard
  rejection. ``auto()`` is the intended replacement — it makes the
  "no extra fields" intent explicit without fighting the auto-
  ordinal contract. The ``entry()`` docstring cross-references
  ``auto()``, and the ``Enum`` docstring's "Declaration forms"
  section calls out this rejection inline so discoverability does
  not depend on following ``See Also`` links.
- ``repr(variant)`` on any ``EnumObj`` subclass now produces
  ``<type_key>.<name>`` instead of the generic
  ``type_key(field1=..., field2=...)`` form that ``ReprPrinter``
  would otherwise derive from reflected fields. Attribute-carrying
  variants still surface their fields via attribute access; they
  just no longer leak into the default repr, which is much easier
  to read in REPL output and in nested containers like
  ``by_name``/``by_value``.
- ``repr(core.MISSING)`` and ``repr(core.KWARGS)`` now render as
  ``<MISSING>`` / ``<KWARGS>`` (angle-bracket sentinel convention)
  rather than the generic ``ffi.Object`` fallback. Pointer-identity
  dispatch keeps the change free of type-lookup overhead.
- Bare ``ClassVar[Cls]`` binders on a cxx-backed enum that name no
  existing C++ entry now raise a descriptive ``RuntimeError`` listing
  the known entries and the ``ClassVar`` / ``auto()`` / ``entry(...)``
  syntax, instead of falling through MRO to the ``Enum`` base's
  ``init=False`` ``TypeError`` guard with its misleading "cannot be
  constructed directly" message.
- A cxx-backed enum (``type_key`` already registered in C++) now
  accepts mixed registration: bare ``ClassVar[Cls]`` binders bind to
  existing C++ entries, while ``entry(...)`` / ``auto()`` sentinels
  register *new* Python-side entries whose ordinals continue the
  dense sequence past the C++ count. Previously only bare-binding
  was allowed; new Python entries were rejected.
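
The declaration rules above (auto-assigned ``value``/``name``, rejection of ``entry(value=...)``, singleton caching, and the ``<type_key>.<name>`` repr) can be illustrated with a toy ``__init_subclass__`` scanner. This is a pure-Python sketch under simplifying assumptions: ``MiniEnum`` and ``_Entry`` are hypothetical, nothing is registered with the FFI type system, instances are not frozen, and bare ``ClassVar`` binders are not handled.

```python
from typing import Any, Dict


class _Entry:
    """Sentinel captured in the class body (hypothetical analogue of _EnumEntry)."""

    def __init__(self, **kwargs: Any) -> None:
        for reserved in ("value", "name"):
            if reserved in kwargs:
                raise TypeError(f"{reserved!r} is auto-assigned; do not pass it to entry()")
        self.kwargs = kwargs


def entry(**kwargs: Any) -> _Entry:
    return _Entry(**kwargs)


def auto() -> _Entry:
    # Equivalent to entry() with no arguments.
    return _Entry()


class MiniEnum:
    _type_key = "mini.Enum"
    _by_name: Dict[str, "MiniEnum"] = {}

    def __init_subclass__(cls, type_key: str, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._type_key = type_key
        cls._by_name = {}
        # vars(cls) preserves class-body order, so ordinals follow declaration order.
        for attr, raw in list(vars(cls).items()):
            if not isinstance(raw, _Entry):
                continue
            inst = object.__new__(cls)
            inst.value = len(cls._by_name)  # dense ordinal
            inst.name = attr                # declaration key
            for field_name, field_value in raw.kwargs.items():
                setattr(inst, field_name, field_value)
            cls._by_name[attr] = inst
            setattr(cls, attr, inst)        # cache the singleton on the class

    def __repr__(self) -> str:
        return f"{self._type_key}.{self.name}"


class Activation(MiniEnum, type_key="nn.Activation"):
    relu = entry(output_zero=True)
    gelu = entry(output_zero=False)
    identity = auto()


assert Activation.relu is Activation._by_name["relu"]
assert Activation.identity.value == 2
assert repr(Activation.gelu) == "nn.Activation.gelu"
```

Since ``entry()`` validates its keyword arguments eagerly, a reserved name fails while the class body is still executing, before any variant has been materialised.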

Docs:
- No user-facing RST/Markdown doc page updated in this change. The
  ``dataclass_reflection.rst`` toctree entries for ``py_class`` /
  ``c_class`` are still commented out, so no sibling ``Enum`` section
  is authored yet. All three declaration forms (bare ``ClassVar``,
  ``entry(...)``, ``auto()``), the auto-assigned ``value``/``name``
  fields, the explicit rejection of integer-literal sugar, and the
  ``by_name``/``by_value``/``attr_dict`` class-level views are fully
  documented in the ``Enum``/``entry``/``auto`` docstrings and the
  C++ header Doxygen comments — including the new Doxygen block on
  the two-arg ``EnumObj(int64_t value, String name)`` constructor
  that was missing before. Follow-up: publish a unified dataclass +
  enum reference once the broader dataclass doc lands.

Tests:
- Executed: ``uv run pytest tests/python/test_dataclass_enum.py -q``
  — 38/38 passing. Covers all three declaration forms (Python-side
  ``ClassVar = entry(...)``, bare-``ClassVar[Cls]`` binding, and the
  ``auto()`` helper in both annotated and bare-assignment shapes),
  explicit rejection of ``entry(value=...)`` / ``entry(name=...)``,
  auto-ordinal assignment, frozen-singleton identity, ``Cls.get`` /
  ``Cls.entries`` / ``Cls.by_name`` / ``Cls.by_value`` /
  ``Cls.attr_dict`` surface, ``def_attr`` round-trips through the
  unified ``__ffi_enum_attrs__`` column (including missing-default
  behaviour, ``__contains__``, cross-enum foreign-variant rejection,
  and fresh-wrapper lookup via ``Cls.get``), direct TypeAttr column
  verification for both ``__ffi_enum_entries__`` and
  ``__ffi_enum_attrs__``, the C++-backed happy path against
  ``testing.TestEnumVariant``'s ``refl::EnumDef``-registered
  ``Alpha``/``Beta`` entries with their ``code`` attribute, six
  dedicated ``auto()`` tests
  (``test_auto_basic_no_annotation``,
  ``test_auto_with_classvar_annotation``,
  ``test_auto_mixed_with_bare_classvar`` — bare ``ClassVar`` binders
  come first in annotation order, then sentinel entries in class-body
  order, with deterministic dense ordinals;
  ``test_auto_mixed_with_entry`` — composition with attribute-carrying
  ``entry(...)`` variants on the same class;
  ``test_auto_rejects_already_registered_name`` — asserts ``auto()``
  is register-not-bind and colliding with a C++-registered entry
  name raises; ``test_auto_returns_fresh_sentinels`` — each call
  yields a distinct ``_EnumEntry`` with empty ``args``/``kwargs``),
  and seven new tests covering the repr and mixed-registration
  behaviour: ``test_default_repr_python_backed``,
  ``test_default_repr_cxx_backed``,
  ``test_default_repr_in_nested_container`` (Dict/List recursion
  through ``by_name`` / ``by_value``),
  ``test_default_repr_with_attribute_carrying_variant``,
  ``test_missing_and_kwargs_sentinel_repr``,
  ``test_cxx_backed_binder_typo_raises_descriptive_error`` (asserts
  the error names the unknown entry, the type key, the known C++
  entries, and the ``ClassVar`` syntax),
  ``test_cxx_backed_mixed_entries_via_auto`` (bare ``ClassVar``
  binders coexist with ``auto()`` entries on
  ``testing.TestEnumVariant``; new ordinals extend past the C++
  count), and ``test_cxx_backed_python_entry_accepts_def_attr``
  (``def_attr`` writes widen the attrs column to cover new
  Python-side ordinals while C++ entries retain defaults).
- Executed: ``pre-commit run --files <all staged>`` — all hooks pass
  (ASF header, file types, end-of-file / trailing whitespace, ruff
  check/format, ty, clang-format).
- Executed: full Python suite ``uv run pytest tests/python -q`` —
  2246 passed, 16 skipped, 3 xfailed. No regressions from the repr
  change or the mixed-registration path.
- CI-driven operational fixes applied in this amend (no behavior
  delta, no test churn):
  * Added a ``/*! \brief ... */`` Doxygen block on
    ``EnumObj(int64_t value, String name)`` in
    ``include/tvm/ffi/enum.h`` so the doc-build (Doxygen) job no
    longer rejects an undocumented public constructor at
    ``enum.h:71``.
  * Annotated the intentional RAII temporary
    ``refl::ObjectDef<TestEnumVariantObj>(refl::init(false));`` in
    ``src/ffi/testing/testing.cc`` with a
    ``// NOLINT(bugprone-unused-raii)`` suppression plus a short
    preceding comment explaining that the destructor *is* the
    payload (it registers the type), so the clang-tidy job no
    longer fires on what is by construction a correct-use pattern.
  * Centralised the Windows ``/bigobj`` flag inside the
    ``tvm_ffi_add_msvc_flags`` macro in
    ``cmake/Utils/Library.cmake`` so every MSVC target picks it up,
    and removed the now-redundant per-target
    ``target_compile_options(tvm_ffi_objs PRIVATE /bigobj)`` in
    ``CMakeLists.txt``. This fixes the Windows MSVC
    ``error C1128: number of sections exceeded object file format
    limit`` on ``src/ffi/testing/testing.cc`` (caused by heavy
    reflection-template instantiations exceeding the default COFF
    section limit) and guards against the same failure recurring on
    any other TU that grows past the threshold.

Untested Edge Cases:
- C++ GoogleTest suite (``tests/cpp/``) was not re-run. The C++ delta
  touches: (a) restoring ``RegisterAttr``'s duplicate-throw — for which
  no existing test asserts either the throw or the silent-overwrite
  behaviour being reverted; (b) introducing the ``EnumObj``/``Enum``
  root and ``EnumDef`` builder, covered end-to-end by the Python
  tests against ``testing.TestEnumVariant``; (c) the test-only
  refactor of ``TestEnumVariant`` from a bare ``Object`` to an
  ``EnumObj`` subclass; (d) the ``ReprPrinter`` additions for
  ``EnumObj`` subclasses and MISSING/KWARGS — verified via the
  Python suite but not from a C++ GoogleTest. Regression risk is
  low but formally unverified.
- Rust test suite (``cargo test`` under ``rust/``) was not executed.
  No Rust bindings touched; risk is low but untested.
- Cross-module TypeAttr convergence (two independently-loaded Python
  modules registering entries under the same type key from different
  processes / plugin-host isolation contexts) is exercised only
  within a single process; multi-process scenarios remain uncovered.

Refs: apache#554
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch 2 times, most recently from 9cd16d7 to 9339fe1 Compare April 18, 2026 07:39
@junrushao junrushao marked this pull request as ready for review April 18, 2026 08:21
@junrushao junrushao changed the title from "feat(dataclasses): add Enum base with __init_subclass__ + TypeAttr-backed entries" to "feat(core): Introduce Attribute-Carrying Language-Agnostic Enums" Apr 18, 2026
Comment thread include/tvm/ffi/enum.h
@junrushao junrushao force-pushed the junrushao/2026-04-17/py-enum branch from 9339fe1 to b89c2f9 Compare April 18, 2026 21:06
@junrushao
Member Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces cross-language enum support to the TVM FFI, providing C++ base classes, registration utilities, and a Python Enum dataclass for shared registries. The review feedback identifies a critical memory management bug in the Cython bindings where an incorrect DecRef call on a non-owning view could lead to memory corruption. Additionally, the feedback points out a misleading docstring regarding attribute registration, suggests optimizing the entries() iterator by leveraging the by_value property, and notes a potential inconsistency in EnumAttrMap's membership checks when handling None values.

Comment thread python/tvm_ffi/cython/object.pxi Outdated
Comment on lines +764 to +765
if temp.type_index >= kTVMFFIStaticObjectBegin and temp.v_obj != NULL:
    TVMFFIObjectDecRef(<TVMFFIObjectHandle>temp.v_obj)
Contributor

critical

This TVMFFIObjectDecRef call is likely to cause memory corruption or use-after-free. TVMFFIPyPyObjectToFFIAny with TVMFFIPyArgSetterFactory_ produces a non-owning view of the object (it does not increment the reference count). Calling DecRef in the finally block will decrement the reference count of the original object passed as value, potentially destroying it prematurely. The CAny.__init__ implementation in the same file correctly uses TVMFFIAnyViewToOwnedAny to acquire ownership from such a view without a corresponding DecRef on the view itself.

        # Remove the DecRef call as temp is a non-owning view
        pass

Comment thread python/tvm_ffi/cython/object.pxi Outdated
Comment thread python/tvm_ffi/dataclasses/enum.py Outdated
Comment thread python/tvm_ffi/dataclasses/enum.py
- `_register_type_attr`: drop the spurious `TVMFFIObjectDecRef` in the
  `finally` block.  `TVMFFIPyPyObjectToFFIAny` produces a non-owning
  view; `TVMFFITypeRegisterAttr` stores via an `AnyView`-to-`Any`
  assignment that incref's internally, so no caller-side refcount
  management is needed.  Update the docstring to reflect the new
  "raises on duplicate" behavior of the underlying C++ registrar.
- `Enum.entries`: simplify to `iter(cls.by_value)` (already
  ordinal-indexed) instead of a sort-on-every-call.
- `EnumAttrMap.__setitem__`: reject `None` with `TypeError` — `None`
  is the column's "unset" sentinel (matching C++
  `EnumDef::set_attr` `Any(nullptr)` padding), so an explicit
  `attr[variant] = None` would be indistinguishable from unset and
  silently break `__contains__` / `__getitem__`.  Document the
  restriction in `def_attr` and `EnumAttrMap`.
- Add regression test for the `None`-rejection behavior.
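
A small pure-Python model of why `None` has to be rejected on write: once `None` doubles as the padding value, membership can only be decided by comparing against it. `AttrColumn` is a hypothetical stand-in for one attribute's list inside `__ffi_enum_attrs__`; it is keyed by ordinal rather than by variant, and the `get` helper is an assumption made for the sketch.

```python
from typing import Any, List


class AttrColumn:
    """One attribute's values, indexed by variant ordinal; None means unset."""

    def __init__(self, default: Any = None) -> None:
        self._values: List[Any] = []
        self._default = default

    def __setitem__(self, ordinal: int, value: Any) -> None:
        if value is None:
            # Storing None would be indistinguishable from padding.
            raise TypeError("None is reserved as the 'unset' marker")
        while len(self._values) <= ordinal:
            self._values.append(None)  # pad up to the target ordinal
        self._values[ordinal] = value

    def __contains__(self, ordinal: int) -> bool:
        return 0 <= ordinal < len(self._values) and self._values[ordinal] is not None

    def get(self, ordinal: int) -> Any:
        """Return the stored value, or the column default when unset."""
        return self._values[ordinal] if ordinal in self else self._default


cost = AttrColumn(default=0)
cost[2] = 5
assert 2 in cost and 1 not in cost
assert cost.get(1) == 0
```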

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@junrushao
Member Author

Thanks for the review! Addressed all four in fe191101:

1. _register_type_attr TVMFFIObjectDecRef (critical) — confirmed and removed. TVMFFIPyPyObjectToFFIAny + TVMFFIPyArgSetterFactory_ produces a non-owning view (same pattern as CAny.__init__ at object.pxi:818-825, which converts via TVMFFIAnyViewToOwnedAny precisely because temp isn't owned). On the store side, RegisterTypeAttr at src/ffi/object.cc:293 does AnyView value_view = AnyView::CopyFromTVMFFIAny(*value) and then slot = value_view (line 330) — the Any::operator=(AnyView) assignment incref's, so the table holds its own owning reference and no caller-side refcount management is needed.

2. _register_type_attr docstring (medium) — fixed. Docstring now says "raises RuntimeError if already registered" and directs callers to the "register a mutable container once, mutate in place" pattern, matching src/ffi/object.cc:324-329.

3. Enum.entries() inefficiency (medium): fixed. Simplified to return iter(cls.by_value); by_value is already ordinal-indexed.

4. EnumAttrMap.__contains__ None-sentinel ambiguity (medium) — enforced in __setitem__. None is reserved as the column's "unset" sentinel to match C++ EnumDef::set_attr's Any(nullptr) padding (enum_def.h:124), so attr[variant] = None now raises TypeError rather than silently desyncing __contains__ / __getitem__. Documented the restriction in def_attr and EnumAttrMap; added test_def_attr_rejects_none_write as a regression test.

All 2247 existing Python tests still pass.

@junrushao junrushao merged commit 6066f60 into apache:main Apr 18, 2026
9 checks passed
Development

Successfully merging this pull request may close these issues.

[RFC] Cross-Language Enum Support (Enum base class)

3 participants