Skip to content

feat_query_normalizer_typed_pipeline: Typed normalizer pipeline (ordered step list) + JS snippet + smart-quote contractions #404

@SoundMindsAI

Description

@SoundMindsAI

Why

  • Problem: Phase 1 ships query normalization as a single reserved Categorical param (query_normalizer) with exactly four hard-coded bundles (none, lowercase, lowercase+trim, lowercase+trim+expand_contractions), an English-only ASCII-apostrophe contraction dictionary, and a Python-only PR-body reference snippet. Three follow-on capabilities are deferred in Phase 1's decision log until operator signal motivates them: operators cannot compose arbitrary ordered step sequences (e.g., [trim, lowercase, expand_contractions] without the bundles' fixed nesting); operators with Node/Bun query layers must hand-translate the Python snippet (D-6); and queries carrying the Unicode right-single-quote U+2019 () miss contraction expansion that ASCII U+0027 (') queries get (D-7).
  • Outcome: A new typed search-space member NormalizerPipelineParam lets a template declare an ordered list of normalization steps; the Optuna loop samples over the powerset of declared steps and proposes the winning sequence. The four Phase 1 bundles become desugar aliases the validator expands into step sequences, so Phase 1's wire contract continues to validate unchanged. The PR body offers both a Python and a JS/TypeScript reference snippet for the winning sequence, and the contraction matcher expands both ASCII and smart-quote apostrophes.

…(see linked file for the rest)

Status

Definition of done

Spec defines AC-1: normalize_pipeline applies steps in canonical order, AC-2: NormalizerPipelineParam parses via the discriminator + rejects duplicates, AC-3: Cardinality counts the powerset, AC-4: Sampler suggests over powerset labels, AC-5: Bundle backward-compat (I-3), AC-6: Smart-quote expansion (FR-3), AC-7: PR body emits both snippets, AC-8: Three-way snippet parity (I-2), AC-9: Frontend builder row (FR-6), AC-10: compute_default_params empty-pipeline default (FR-7), AC-12: normalizer_pipeline rejected under a non-reserved key (D-10), AC-13: New (non-bundle) winning label is fully handled, AC-11: End-to-end — typed pipeline study against the live stack. Each must pass before merge.

Artifacts

How to execute — ⚠️ BLOCKED on #403

Do NOT /impl-execute until #403 (feat_query_normalization_tuning) merges — the spec calls Phase 1 a total blocker. The feature_spec.md + implementation_plan.md are cross-model-reviewed design-ahead artifacts. Once #403 ships:

/impl-execute docs/00_overview/planned_features/02_mvp2/feat_query_normalizer_typed_pipeline/implementation_plan.md --all

Notes

This issue is part of the MVP2 backlog issue-coverage sweep (2026-06-02) — every active MVP2 folder should have a tracking issue so external contributors can discover the work without grep-ing the planned-features tree. If you pick this up, drop a comment so others don't duplicate; if you find the linked idea/spec stale, run /idea-preflight first to refresh it.

Metadata

Metadata

Assignees

No one assigned

    Labels

    blockedCannot start — gated on another issue/condition (e.g. design-ahead Phase N)mvp2MVP2 backlog itemtype/featureFeature — new product capability

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions