[pcre] Nested groups with literal digits and backreferences produce wrong captures #34

@jbachorik

Summary

Patterns that combine deeply nested capturing groups, a backreference followed by a literal digit (separated only by an empty group), and multiple independent groups fail to produce the correct capture values.

Failing PCRE Test

  • Pattern: (cat(a(ract|tonic)|erpillar)) \1()2(3)
  • Input: cataract cataract23
  • Expected: group 1 = cataract, group 2 = aract, group 3 = ract, group 4 = (empty string), group 5 = 3
  • Actual: wrong values or match failure

Expected gain: +2 PCRE conformance tests (Category 9)
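
As a cross-check of the expected captures, the failing case can be run through an independent regex engine. The sketch below uses Python's standard `re` module, which is not PCRE; it is assumed here only because it handles nested groups, `\1`, and empty groups the same way for this particular pattern. The names `PATTERN` and `SUBJECT` are illustrative.

```python
import re

# Pattern and input copied from the failing PCRE test in this issue.
PATTERN = r"(cat(a(ract|tonic)|erpillar)) \1()2(3)"
SUBJECT = "cataract cataract23"

# fullmatch requires the whole subject to be consumed by the pattern.
match = re.fullmatch(PATTERN, SUBJECT)

if match is None:
    print("no match")
else:
    # Groups are numbered by the position of their opening parenthesis.
    for number, value in enumerate(match.groups(), start=1):
        print(f"group {number} = {value!r}")
```

Comparing this output against the engine under test, group by group, should show which capture diverges first.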

Root Cause

The backref \1 is followed by the empty group () and then the literal 2, and this sequence is likely parsed or matched incorrectly (for example, treated like \12 instead of \1, an empty group, and a literal 2). Additionally, the empty group () and the subsequent literal group (3) may not be tracked correctly in the current group-capture logic.
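
To make the suspected ambiguity concrete, here is a minimal, hypothetical tokenizer sketch. It is not this project's parser. It assumes a simplified rule in which all decimal digits directly after a backslash form one backreference number, and it ignores PCRE's octal fallback, character classes, and alternation (so `|` is treated as a literal). The function name `tokenize` and the token shapes are invented for illustration.

```python
DIGITS = "0123456789"

def tokenize(pattern):
    """Split a pattern into (kind, value) tokens.

    Groups are numbered in the order their '(' appears. A backslash
    followed by digits becomes a single backreference token.
    """
    tokens = []
    group_count = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in DIGITS:
            # Consume every digit that directly follows the backslash.
            j = i + 1
            while j < len(pattern) and pattern[j] in DIGITS:
                j += 1
            tokens.append(("backref", int(pattern[i + 1:j])))
            i = j
        elif ch == "(":
            group_count += 1
            tokens.append(("open", group_count))
            i += 1
        elif ch == ")":
            tokens.append(("close", None))
            i += 1
        else:
            tokens.append(("literal", ch))
            i += 1
    return tokens

# The empty group ends the digit run, so '2' stays a literal.
print(tokenize(r"\1()2(3)"))
# Without the empty group, the digits fuse into one backreference number.
print(tokenize(r"\12"))
```

Under this assumed rule, `\1()2` yields a backreference to group 1, an empty group, and a literal `2`, while `\12` yields a single reference to group 12. A parser that drops or reorders the empty group would shift the numbering of the groups that follow it.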

Implementation Notes

Metadata

Labels: bug
