gh-151788: Speed up http.server directory listing by using os.scandir() by mjbommar · Pull Request #151789 · python/cpython

mjbommar · 2026-06-20T13:01:16Z

Issue: Speed up SimpleHTTPRequestHandler.list_directory() by using os.scandir() #151788

SimpleHTTPRequestHandler.list_directory() called os.path.isdir() (a stat) and os.path.islink() (an lstat) for every entry — two stat-family syscalls per file. That is wasted work on any filesystem and dominates listing time for large directories; on network filesystems such as NFS, where each call is a round-trip, it becomes severe.

This switches to os.scandir(), whose DirEntry objects report the entry type from the directory read itself (POSIX d_type / NFS READDIRPLUS), eliminating the per-entry stats in the common case. CPython already made this exact migration for os.walk(), glob, and pathlib.Path.iterdir() (gh-117727); http.server was simply missed.
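The difference between the two patterns can be sketched as follows. This is a minimal illustration, not the patch itself: `classify_with_listdir`, `classify_with_scandir`, and `demo` are hypothetical names, and the real method also builds HTML and handles errors.

```python
import os
import tempfile


def classify_with_listdir(path):
    """Old pattern: os.path.isdir() and os.path.islink() each query the filesystem per entry."""
    result = {}
    for name in os.listdir(path):
        fullname = os.path.join(path, name)
        result[name] = (os.path.isdir(fullname), os.path.islink(fullname))
    return result


def classify_with_scandir(path):
    """New pattern: DirEntry usually answers from data returned by the directory read."""
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            result[entry.name] = (entry.is_dir(), entry.is_symlink())
    return result


def demo():
    """Build a tiny directory and classify it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "subdir"))
        with open(os.path.join(tmp, "file.txt"), "w") as f:
            f.write("x")
        return classify_with_listdir(tmp), classify_with_scandir(tmp)
```

Both helpers should return the same mapping for ordinary files and directories; only the number of filesystem queries differs.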

Behavior preserved

DirEntry.is_dir() / is_symlink() match os.path.isdir / os.path.islink semantics — same follow-symlinks behavior and same return-False-on-error behavior — verified across real dirs/files, symlink-to-dir (still rendered with @ but linked with /), symlink-to-file, and broken symlinks. The existing Lib/test/test_httpservers.py suite passes unchanged (92/92), including the undecodable/unencodable filename cases that exercise surrogate-escaped names.
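A small check along these lines can reproduce the symlink comparison. It is a sketch under the assumption that the platform allows unprivileged `os.symlink()`; `compare_symlink_semantics` and the file names are hypothetical and not part of the test suite.

```python
import os
import tempfile


def compare_symlink_semantics():
    """Return {name: (scandir_view, os_path_view)} where each view is (is_dir, is_symlink)."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "realdir"))
        with open(os.path.join(tmp, "realfile"), "w") as f:
            f.write("x")
        # Three symlink flavours: to a directory, to a file, and to a missing target.
        os.symlink(os.path.join(tmp, "realdir"), os.path.join(tmp, "link_to_dir"))
        os.symlink(os.path.join(tmp, "realfile"), os.path.join(tmp, "link_to_file"))
        os.symlink(os.path.join(tmp, "missing"), os.path.join(tmp, "broken_link"))

        views = {}
        with os.scandir(tmp) as entries:
            for entry in entries:
                full = os.path.join(tmp, entry.name)
                views[entry.name] = (
                    (entry.is_dir(), entry.is_symlink()),
                    (os.path.isdir(full), os.path.islink(full)),
                )
        return views
```

For every entry the two tuples should be equal; a symlink to a directory reports `(True, True)`, which is why it is displayed with `@` but linked with `/`.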

Benchmark

Directory with 1000 files + 1000 dirs:

| Metric | Value |
| --- | --- |
| stat-family syscalls (`strace`), old | 4088 |
| stat-family syscalls, new | 88 (constant interpreter startup; per-entry loop ≈ 0) |
| local filesystem wall-clock | ~10× faster |
| emulated NFS (per-`stat` latency injected) | listing drops from seconds to ~2 ms |

Caveat (no overselling): the large NFS figures assume the filesystem reports entry types in the directory read (d_type; local filesystems and NFS with READDIRPLUS, the Linux default). If a mount returns DT_UNKNOWN, os.scandir() falls back to one cached lstat per entry — still fewer calls than today, and never worse.
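A rough way to repeat the local wall-clock comparison is sketched below. It is not the script used for the figures above: `make_tree`, `list_old`, `list_new`, and `benchmark` are hypothetical names, the syscall counts would still need `strace`, and timings will vary by machine and filesystem.

```python
import os
import tempfile
import timeit


def make_tree(root, n_files=1000, n_dirs=1000):
    """Populate root with empty files and empty directories."""
    for i in range(n_files):
        with open(os.path.join(root, f"file{i}"), "w"):
            pass
    for i in range(n_dirs):
        os.mkdir(os.path.join(root, f"dir{i}"))


def list_old(path):
    """listdir plus isdir/islink per entry (the pre-change pattern)."""
    rows = []
    for name in os.listdir(path):
        fullname = os.path.join(path, name)
        rows.append((name, os.path.isdir(fullname), os.path.islink(fullname)))
    return rows


def list_new(path):
    """scandir with DirEntry type checks (the post-change pattern)."""
    with os.scandir(path) as entries:
        return [(e.name, e.is_dir(), e.is_symlink()) for e in entries]


def benchmark(n_files=1000, n_dirs=1000, repeat=5):
    """Return (best_old_seconds, best_new_seconds, results_match)."""
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, n_files, n_dirs)
        old = min(timeit.repeat(lambda: list_old(root), number=1, repeat=repeat))
        new = min(timeit.repeat(lambda: list_new(root), number=1, repeat=repeat))
        same = sorted(list_old(root)) == sorted(list_new(root))
    return old, new, same
```

Calling `benchmark()` and comparing the first two values gives a local ratio; on a mount that returns DT_UNKNOWN the gap should shrink, as the caveat describes.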

This PR was prepared with AI assistance (Claude Code). I reviewed the change and benchmarks and can explain it in my own words.

…candir

SimpleHTTPRequestHandler.list_directory() called os.path.isdir() and os.path.islink() for every entry, issuing two stat-family syscalls per file. This is wasted work on any filesystem and dominates listing time for large directories; on network filesystems such as NFS, where each call is a round-trip, it becomes severe.

Use os.scandir(), whose DirEntry objects report the type from the directory read itself (d_type / READDIRPLUS), eliminating the per-entry stats in the common case and never doing more work than before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

bedevere-app · 2026-06-20T14:36:58Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

- Wrap entry.is_dir()/is_symlink() in try/except OSError, falling back to False to exactly mirror os.path.isdir()/islink() (matches os.walk()).
- Reword the NEWS entry to avoid speedup-multiplier claims; describe it as improving list_directory() on systems with slow stat calls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
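The OSError fallback described in this commit could look roughly like the helper below. This is a sketch only: `entry_flags` is a hypothetical name, and the actual change in `Lib/http/server.py` may inline the checks rather than use a helper.

```python
import os


def entry_flags(entry):
    """Return (is_dir, is_symlink) for a DirEntry-like object.

    Any OSError is treated as False, mirroring how os.path.isdir() and
    os.path.islink() report False instead of raising.
    """
    try:
        is_dir = entry.is_dir()
    except OSError:
        is_dir = False
    try:
        is_symlink = entry.is_symlink()
    except OSError:
        is_symlink = False
    return is_dir, is_symlink
```

Keeping the two checks in separate `try` blocks means a failure in one does not discard the result of the other.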

- Remove the performance rationale from the inline comment; keep only the note explaining the OSError fallback (correctness, not performance).
- Reword the NEWS entry to state the mechanism without magnitude claims.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mjbommar · 2026-06-21T10:50:45Z

I have made the requested changes; please review again

bedevere-app · 2026-06-21T10:50:50Z

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

picnixz · 2026-06-21T11:08:25Z

Please fix the CI first before asking for a rereview.

Drop the :meth: cross-reference to list_directory, which is not a documented method and fails the docs "new NEWS nits" check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mjbommar · 2026-06-21T11:39:20Z

@picnixz sorry, thought this was just test flake. The expected checks are all green now other than your review

- Drop the local `name` variable in favor of entry.name.
- Reword the OSError comment per review.
- Move the "Append /" comment above the is_dir check.
- Reword the NEWS entry per review (fix call -> calls).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

picnixz · 2026-06-21T12:37:55Z

+                displayname = entry.name + "/"
+                linkname = entry.name + "/"
+            if is_symlink:
+                displayname = entry.name + "@"


Can you avoid having an LLM doing those suggestions please?

Sure, AFK today but can do that tonight. Emacs doesn't work so well on phone. You want PR clean without Claude?

What I (personally) want usually is:

- human interaction (I don't want to communicate with an agent, that's out of the question. I don't care whether the text has been written by the AI as long as it's not a copy-paste without any human review before sending it, but I don't want an automated conversation)

- commits that are always reviewed by a human before even committing them; in general this means that the AI is giving you the rough idea or contents but you are responsible for cleaning it up, and addressing things that could help the next review. In this case, I considered the name = entry.name to be unnecessary, so reducing the diff only required keeping the same code and inlining that access. It didn't mean rewriting all the occurrences with entry.name.

Our policy is that AI is a tool that can be used, but it's not something meant to generate PRs for users who would later take credit for them (I'm not saying you are, I'm saying that's just how AI PRs usually go: most of the time, people just want to have a contribution and so they simply leave their agent in automode).

So, the problem generally is not the AI, it's more that whenever there is an LLM, that LLM tends to choose things that are either useless or weird, and that would have been avoided if a human had directly written the lines. If you prefer writing your code through Claude, it's ok, but keep in mind that reviewers are not responsible for guiding the AI, the contributor is. If we keep telling their agent through our review "no, not like that", it's as if we were directly using an agent in the least efficient way, and hence we are both wasting our time.

Ah. FWIW, Claude pointed out that your suggestion resulted in a NameError at string concat below, so it proposed this as easiest way to avoid name global and I accepted it

Oh, my bad actually it should have been displayname = linkname = name = entry.name. But yeah, in this case, I would have preferred to know why it decided to do so. And then I would have suggested:

displayname = linkname = name = entry.name

instead of the two lines. That's why I prefer that you tell me this first before changing otherwise I would get confused.

So to reiterate: there is nothing wrong with using Claude (the only thing I don't like about it is the long summaries in every PR and trivial things it reports like "all tests pass", which is something I would expect for a PR, otherwise it doesn't make sense). It's just that having a 3rd actor usually creates more back and forth between the reviewer and the reviewee.

picnixz · 2026-06-21T13:17:19Z

-            fullname = os.path.join(path, name)
-            displayname = linkname = name
+        for entry in entries:
+            displayname = linkname = entry.name


Suggested change

-            displayname = linkname = entry.name
+            displayname = linkname = name = entry.name

picnixz · 2026-06-21T13:17:28Z

+                displayname = entry.name + "/"
+                linkname = entry.name + "/"
+            if is_symlink:
+                displayname = entry.name + "@"


Suggested change

-                displayname = entry.name + "/"
-                linkname = entry.name + "/"
-            if is_symlink:
-                displayname = entry.name + "@"
+                displayname = name + "/"
+                linkname = name + "/"
+            if is_symlink:
+                displayname = name + "@"
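Taken together, the two suggested changes bind `name` once per entry so that the later concatenations do not raise a NameError. A sketch of how the resulting naming logic might read is given below; `render_names` and `demo` are hypothetical, the sorting is illustrative, and the HTML generation and OSError handling from `Lib/http/server.py` are omitted.

```python
import os
import tempfile


def render_names(path):
    """Return (displayname, linkname) pairs as the listing would label them."""
    rows = []
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name.lower()):
            # Chained assignment binds all three names to the same string.
            displayname = linkname = name = entry.name
            # Append / for directories or @ for symbolic links.
            if entry.is_dir():
                displayname = name + "/"
                linkname = name + "/"
            if entry.is_symlink():
                # A link to a directory displays with @ but still links with /.
                displayname = name + "@"
            rows.append((displayname, linkname))
    return rows


def demo():
    """Label one file and one directory in a temporary tree."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "docs"))
        with open(os.path.join(tmp, "a.txt"), "w"):
            pass
        return render_names(tmp)
```

Without the `name =` binding in the chained assignment, the `name + "/"` lines would fail as described in the thread above.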


bedevere-app Bot added the awaiting review label Jun 20, 2026

picnixz requested changes Jun 20, 2026


bedevere-app Bot removed the awaiting review label Jun 20, 2026

bedevere-app Bot added the awaiting changes label Jun 20, 2026

mjbommar and others added 2 commits June 20, 2026 14:55

bedevere-app Bot added awaiting change review and removed awaiting changes labels Jun 21, 2026

bedevere-app Bot requested a review from picnixz June 21, 2026 10:50

pythongh-151788: Fix NEWS entry doc reference

1138200
