[pcre] Self-referencing backreferences not supported (e.g. (a\1?){4}) #39

@jbachorik

Summary

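Groups that reference themselves (e.g., (a\1?)) require the backreference to be resolved while the referenced group is still open. In PCRE, such a reference fails the first time the group is entered, because the group is still unset, and on later iterations of a repeated group it matches whatever the previous iteration captured. A normal backreference appears after its group has closed, so this case is fundamentally different and is not currently supported.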

Failing PCRE Tests (Category 4 — 3 tests)

  • ^(a\1?)(a\1?)(a\2?)(a\3?)$ — should match aaaa, aaaaaa
  • ^(a\1?){4}$ — should match aaaa
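
To make the expected results concrete, the first pattern can be unrolled by hand. The sketch below is not Reggie code: the class and method names are hypothetical, and the matcher is hard-coded for ^(a\1?)(a\1?)(a\2?)(a\3?)$ only. It assumes the PCRE rule that a reference to an unset group fails to match, so inside group 1 the optional \1 can only match the empty string.

```java
import java.util.Arrays;

// Hypothetical, hand-unrolled matcher for ^(a\1?)(a\1?)(a\2?)(a\3?)$.
final class UnrolledSelfRefSketch {

    // Candidate end offsets for "a" followed by an optional copy of ref,
    // in greedy order. ref == null models an unset group: the reference
    // itself fails, so only the empty alternative of '?' remains.
    static int[] ends(String s, int pos, String ref) {
        if (pos >= s.length() || s.charAt(pos) != 'a') {
            return new int[0];
        }
        int afterA = pos + 1;
        if (ref != null && s.startsWith(ref, afterA)) {
            return new int[] { afterA + ref.length(), afterA };
        }
        return new int[] { afterA };
    }

    // Returns the four captured groups, or null if the input does not match.
    static String[] match(String s) {
        for (int e1 : ends(s, 0, null)) {          // group 1: \1 is still unset
            String g1 = s.substring(0, e1);
            for (int e2 : ends(s, e1, g1)) {       // group 2: \1 is now set
                String g2 = s.substring(e1, e2);
                for (int e3 : ends(s, e2, g2)) {   // group 3: reads \2
                    String g3 = s.substring(e2, e3);
                    for (int e4 : ends(s, e3, g3)) { // group 4: reads \3
                        if (e4 == s.length()) {    // the $ anchor
                            return new String[] { g1, g2, g3, s.substring(e3, e4) };
                        }
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(match("aaaa")));   // [a, a, a, a]
        System.out.println(Arrays.toString(match("aaaaaa"))); // [a, aa, a, aa]
        System.out.println(Arrays.toString(match("aaa")));    // null
    }
}
```

The nested loops stand in for backtracking: each group first tries the longer alternative and falls back to the bare "a" when the rest of the pattern cannot match.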

Expected gain: +3 PCRE conformance tests

Root Cause

Quantifiers currently do not implement "last-iteration semantics": they do not update the group's captured value at the end of each iteration, before the next iteration begins. Without this, \1 inside a quantified group reads the group value from the previous complete match instead of the value captured by the iteration that has just finished.

Fix requires implementing per-iteration group capture updates in RecursiveDescentBytecodeGenerator.visitQuantifier().
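
As an illustration of what a per-iteration capture update means, here is a minimal backtracking sketch hard-coded for ^(a\1?){n}$. It is not the bytecode that RecursiveDescentBytecodeGenerator emits, and the class and method names are hypothetical. The point it shows is that the capture from iteration k is passed into iteration k+1, and is dropped again when the matcher backtracks.

```java
// Hypothetical sketch of last-iteration semantics for ^(a\1?){n}$.
final class LastIterationSketch {

    static boolean matches(String input, int n) {
        return iterate(input, 0, n, null);
    }

    // pos:       current offset in the input
    // remaining: iterations of the group still to run
    // prev:      what group 1 captured in the previous iteration (null = unset)
    private static boolean iterate(String input, int pos, int remaining, String prev) {
        if (remaining == 0) {
            return pos == input.length();           // the $ anchor
        }
        if (pos >= input.length() || input.charAt(pos) != 'a') {
            return false;
        }
        int afterA = pos + 1;
        // Greedy branch: \1? consumes a copy of the previous iteration's capture.
        if (prev != null && input.startsWith(prev, afterA)) {
            int end = afterA + prev.length();
            // The capture is updated before the next iteration starts.
            if (iterate(input, end, remaining - 1, input.substring(pos, end))) {
                return true;
            }
        }
        // Backtrack: \1? matches the empty string, so the capture is just "a".
        return iterate(input, afterA, remaining - 1, input.substring(pos, afterA));
    }

    public static void main(String[] args) {
        System.out.println(matches("aaaa", 4));        // true
        System.out.println(matches("aaaaaaaaaa", 4));  // true  (1 + 2 + 3 + 4 characters)
        System.out.println(matches("aaaaaaaaa", 4));   // false (no combination sums to 9)
        System.out.println(matches("aaa", 4));         // false
    }
}
```

Passing the capture as a parameter makes the restore-on-backtrack behaviour automatic in this sketch; a generated matcher that stores captures in mutable slots would presumably need to save and restore them explicitly.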

Implementation Notes

  • Tracked in doc/plans/pcre-conformance-roadmap.md as Phase 4, item 4.1
  • Difficulty: Very High — may require architectural changes
  • Tests are skipped by default; run with -Dreggie.test.knownFailures=true
  • File: RecursiveDescentBytecodeGenerator.java
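
For reference, a flag passed as -Dreggie.test.knownFailures=true is visible to test code as a JVM system property. The sketch below shows one way a test could consult it; only the property name comes from this issue, while the class, the helper names, and the gating logic are assumptions and may differ from what the Reggie test suite actually does.

```java
// Hypothetical gate for known-failure tests; not taken from the Reggie sources.
final class KnownFailureGate {

    // Property name as given in this issue.
    static final String PROPERTY = "reggie.test.knownFailures";

    // True only when the JVM was started with -Dreggie.test.knownFailures=true
    // (Boolean.getBoolean compares the property value to "true", ignoring case).
    static boolean knownFailuresEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    // Runs the test body only when known failures are enabled; returns whether it ran.
    static boolean runIfEnabled(Runnable testBody) {
        if (!knownFailuresEnabled()) {
            return false;   // skipped by default
        }
        testBody.run();
        return true;
    }

    public static void main(String[] args) {
        boolean ran = runIfEnabled(() -> System.out.println("running known-failure test"));
        System.out.println(ran ? "executed" : "skipped");
    }
}
```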

Labels: bug, enhancement
