Commit a59b863

gh-148762: speed up SRE_AT_BEGINNING_LINE regexes
`SRE(search)` has an early exit for `SRE_AT_BEGINNING` and `SRE_AT_BEGINNING_STRING`, but no fast-forward for `SRE_AT_BEGINNING_LINE`. As a result, a regex of the following form is slow:

```
re.compile("^foo", re.MULTILINE)
```

The current implementation loops over the input character by character and calls `SRE(match)` at every position, which is a rather expensive function call.

This commit:

* ensures `SRE(match)` is only called right after a newline;
* optimizes fast-forwarding to the next newline by calling `memchr` in the UCS1 case.

This can lead to 10x or even 100x speedups in the no-match case with long lines, without adding overhead for short lines.

Signed-off-by: Harmen Stoppels <harmenstoppels@gmail.com>
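To make the call-count argument concrete, here is a minimal standalone sketch in C. It does not use CPython internals: `toy_match` is a hypothetical stand-in for `SRE(match)` that only understands the pattern `^lit` in multiline mode, and `match_calls`, `search_every_pos`, and `search_line_starts` are illustrative names. `search_every_pos` mimics the old loop (one matcher call per position), while `search_line_starts` mimics the new one (a matcher call only at the start and right after each `'\n'`, with `memchr` skipping the rest of the line).

```c
#include <stddef.h>
#include <string.h>

/* Number of matcher invocations, so the two search loops can be compared. */
static size_t match_calls;

/* Hypothetical stand-in for SRE(match) on the pattern "^lit" with
   re.MULTILINE: succeeds only if p is at the start of the text or right
   after '\n', and the text at p begins with lit. */
static int toy_match(const char *begin, const char *p, const char *end,
                     const char *lit)
{
    size_t n = strlen(lit);
    match_calls++;
    if (p != begin && p[-1] != '\n')
        return 0;                       /* "^" fails in mid-line */
    return (size_t)(end - p) >= n && memcmp(p, lit, n) == 0;
}

/* Old strategy: try the matcher at every position, including end.
   Returns the match offset, or -1 if there is none. */
static long search_every_pos(const char *s, size_t len, const char *lit)
{
    const char *p = s, *end = s + len;
    for (;;) {
        if (toy_match(s, p, end, lit))
            return (long)(p - s);
        if (p == end)
            return -1;
        p++;
    }
}

/* New strategy: try the matcher at the start, then only right after each
   '\n', letting memchr skip over the rest of the line. */
static long search_line_starts(const char *s, size_t len, const char *lit)
{
    const char *p = s, *end = s + len;
    for (;;) {
        if (toy_match(s, p, end, lit))
            return (long)(p - s);
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)
            return -1;                  /* no further line starts */
        p = nl + 1;                     /* may equal end; "^" can match there */
    }
}
```

On the input `"hello world\nanother line\nfoo\n"` with the literal `"foo"`, both functions report offset 25, but the first makes 26 matcher calls and the second only 3. The gap grows with line length, which fits the commit's observation that long lines benefit most.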
1 parent 7ce737e commit a59b863

2 files changed (+37 -6 lines changed)

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Multiline regexes starting with a caret, such as ``re.compile("^foo",
+re.MULTILINE)``, now run significantly faster.

Modules/_sre/sre_lib.h

Lines changed: 35 additions & 6 deletions
@@ -1854,12 +1854,41 @@ SRE(search)(SRE_STATE* state, SRE_CODE* pattern)
             state->start = state->ptr = ptr = end;
             return 0;
         }
-        while (status == 0 && ptr < end) {
-            ptr++;
-            RESET_CAPTURE_GROUP();
-            TRACE(("|%p|%p|SEARCH\n", pattern, ptr));
-            state->start = state->ptr = ptr;
-            status = SRE(match)(state, pattern, 0);
+        if (pattern[0] == SRE_OP_AT && pattern[1] == SRE_AT_BEGINNING_LINE) {
+            /* skip to line boundaries */
+            while (status == 0 && ptr < end) {
+                ptr++;
+                if (!SRE_IS_LINEBREAK((int) ptr[-1])) {
+#if SIZEOF_SRE_CHAR == 1
+                    ptr = (SRE_CHAR *)memchr(ptr, '\n', end - ptr);
+                    if (!ptr) {
+                        break;
+                    }
+#else
+                    while (ptr < end && !SRE_IS_LINEBREAK((int) *ptr)) {
+                        ptr++;
+                    }
+                    if (ptr >= end) {
+                        break;
+                    }
+#endif
+                    /* advance to after the new line character */
+                    ptr++;
+                }
+                RESET_CAPTURE_GROUP();
+                TRACE(("|%p|%p|SEARCH\n", pattern, ptr));
+                state->start = state->ptr = ptr;
+                status = SRE(match)(state, pattern, 0);
+            }
+        }
+        else {
+            while (status == 0 && ptr < end) {
+                ptr++;
+                RESET_CAPTURE_GROUP();
+                TRACE(("|%p|%p|SEARCH\n", pattern, ptr));
+                state->start = state->ptr = ptr;
+                status = SRE(match)(state, pattern, 0);
+            }
         }
     }
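The `#if SIZEOF_SRE_CHAR == 1` / `#else` split can be sketched in isolation as below. This is an illustrative sketch, not CPython code: `next_line_start_ucs1`, `next_line_start_ucs2`, and `next_line_start_ucs4` are hypothetical helpers that return the position just after the next `'\n'` (or `NULL` if there is none), generated once per code-unit width in the same spirit as `sre_lib.h` being compiled for each `SRE_CHAR` size. A likely reason `memchr` is reserved for 1-byte units is that it compares single bytes: in a 2- or 4-byte buffer, a `0x0A` byte can belong to an unrelated code unit such as U+0A41, so a byte search could stop at a position that is not a line break.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1-byte code units (the SIZEOF_SRE_CHAR == 1 branch). Searching from p is
   equivalent to the diff's "ptr++, test ptr[-1], then memchr from ptr". */
static const unsigned char *
next_line_start_ucs1(const unsigned char *p, const unsigned char *end)
{
    const unsigned char *nl = memchr(p, '\n', (size_t)(end - p));
    return nl != NULL ? nl + 1 : NULL;  /* position after '\n', or none */
}

/* Wider code units (the #else branch): a plain per-unit scan, since a
   byte-wise memchr could hit a 0x0A byte inside another code unit. */
#define DEFINE_NEXT_LINE_START(NAME, CHAR_T)                      \
    static const CHAR_T *NAME(const CHAR_T *p, const CHAR_T *end) \
    {                                                             \
        while (p < end && *p != '\n')                             \
            p++;                                                  \
        return p < end ? p + 1 : NULL;                            \
    }

DEFINE_NEXT_LINE_START(next_line_start_ucs2, uint16_t)
DEFINE_NEXT_LINE_START(next_line_start_ucs4, uint32_t)
```

For example, with `uint16_t u[] = {0x0A41, 'x', '\n', 'y'}`, a byte-wise `memchr(u, '\n', sizeof u)` finds a match inside `u[0]`, whereas `next_line_start_ucs2(u, u + 4)` correctly returns `u + 3`.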
