Summary
The pattern "([^\\"]+|\\.)*" fails to extract the correct value for group 1. The tagged DFA tracks groups character by character, but it does not correctly handle an alternation containing + inside a * quantifier.
Failing PCRE Test
- Pattern: "([^\\"]+|\\.)*"
- Input: "1234\"5678"
- Expected: group 1 = 5678 (last iteration capture)
- Actual: wrong group value
Expected gain: +1 PCRE conformance test (Category 9)
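The expected value can be cross-checked against a backtracking engine. The sketch below uses java.util.regex as a stand-in reference, which is an assumption: it is not PCRE and not this project's engine, but it also reports the last iteration's capture for a repeated group. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastIterationCapture {
    // The regex from the failing test, escaped as a Java string literal.
    // The regex itself reads: "([^\\"]+|\\.)*"
    static final String REGEX = "\"([^\\\\\"]+|\\\\.)*\"";

    /** Returns group 1 as java.util.regex reports it, or null if the input does not match. */
    static String referenceGroup1(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The input from the failing test: "1234\"5678"
        String input = "\"1234\\\"5678\"";
        // Iterations of the group: 1234, then \", then 5678.
        System.out.println(referenceGroup1(input)); // prints 5678
    }
}
```

The group runs three times on this input, and only the bounds of the third run (5678) should survive in group 1.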
Root Cause
DFA_UNROLLED_WITH_GROUPS tracks group capture positions per character, not per quantifier iteration. When an outer * quantifier drives an inner alternation containing +, the resets at iteration boundaries are lost, so the final capture for the group ends up pointing to the wrong position.
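To make the per-iteration idea concrete, here is a hand-written matcher for this one pattern that records where each iteration of the group starts and commits the group bounds only when that iteration ends. It is a hypothetical sketch under simplifying assumptions (whole-input match, "." treated as any character). It is not the code in DFAUnrolledBytecodeGenerator.java, and the names IterationCaptureSketch, matchGroup1 and group1 are invented for illustration.

```java
final class IterationCaptureSketch {
    /**
     * Matches "([^\\"]+|\\.)*" against the whole input.
     * Returns {start, end} of group 1's last iteration, {-1, -1} if the
     * group never ran, or null if the input does not match.
     */
    static int[] matchGroup1(String s) {
        int n = s.length();
        if (n < 2 || s.charAt(0) != '"') return null;
        int pos = 1;
        int groupStart = -1, groupEnd = -1;
        while (pos < n) {
            char c = s.charAt(pos);
            if (c == '"') break;              // closing quote: the * loop is done
            int iterStart = pos;              // iteration boundary: remember the start
            if (c == '\\') {
                if (pos + 1 >= n) return null; // backslash with nothing after it
                pos += 2;                      // second alternative: \\.
            } else {
                // first alternative: [^\\"]+ consumed greedily
                while (pos < n && s.charAt(pos) != '\\' && s.charAt(pos) != '"') pos++;
            }
            // Commit once per iteration, overwriting the previous iteration's bounds.
            groupStart = iterStart;
            groupEnd = pos;
        }
        if (pos != n - 1) return null;        // closing quote must be the last character
        return new int[] {groupStart, groupEnd};
    }

    /** Convenience wrapper: the text of group 1, or null if no match or no iteration. */
    static String group1(String s) {
        int[] g = matchGroup1(s);
        return (g == null || g[0] < 0) ? null : s.substring(g[0], g[1]);
    }

    public static void main(String[] args) {
        System.out.println(group1("\"1234\\\"5678\"")); // prints 5678
    }
}
```

The point of the sketch is the placement of the two assignments at the end of the loop body: the start position belongs to the current iteration, so an earlier iteration's start can never be paired with a later iteration's end.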
Implementation Notes
- doc/plans/pcre-conformance-roadmap.md as Phase 2, item 2.5
- DFAUnrolledBytecodeGenerator.java (tagged DFA group tracking)