Skip to content

gh-151788: Speed up http.server directory listing by using os.scandir()#151789

Open
mjbommar wants to merge 5 commits into
python:mainfrom
mjbommar:fix/httpserver-scandir-listing
Open

gh-151788: Speed up http.server directory listing by using os.scandir()#151789
mjbommar wants to merge 5 commits into
python:mainfrom
mjbommar:fix/httpserver-scandir-listing

Conversation

@mjbommar

Copy link
Copy Markdown
Contributor

SimpleHTTPRequestHandler.list_directory() called os.path.isdir() (a stat) and os.path.islink() (an lstat) for every entry — two stat-family syscalls per file. That is wasted work on any filesystem and dominates listing time for large directories; on network filesystems such as NFS, where each call is a round-trip, it becomes severe.

This switches to os.scandir(), whose DirEntry objects report the entry type from the directory read itself (POSIX d_type / NFS READDIRPLUS), eliminating the per-entry stats in the common case. CPython already made this exact migration for os.walk(), glob, and pathlib.Path.iterdir() (gh-117727); http.server was simply missed.

Behavior preserved

DirEntry.is_dir() / is_symlink() match os.path.isdir / os.path.islink semantics — same follow-symlinks behavior and same return-False-on-error behavior — verified across real dirs/files, symlink-to-dir (still rendered with @ but linked with /), symlink-to-file, and broken symlinks. The existing Lib/test/test_httpservers.py suite passes unchanged (92/92), including the undecodable/unencodable filename cases that exercise surrogate-escaped names.

Benchmark

Directory with 1000 files + 1000 dirs:

value
stat-family syscalls (strace), old 4088
stat-family syscalls, new 88 (constant interpreter startup; per-entry loop ≈ 0)
local filesystem wall-clock ~10× faster
emulated NFS (per-stat latency injected) listing drops from seconds to ~2 ms

Caveat (no overselling): the large NFS figures assume the filesystem reports entry types in the directory read (d_type; local filesystems and NFS with READDIRPLUS, the Linux default). If a mount returns DT_UNKNOWN, os.scandir() falls back to one cached lstat per entry — still fewer calls than today, and never worse.


This PR was prepared with AI assistance (Claude Code). I reviewed the change and benchmarks and can explain it in my own words.

…candir

SimpleHTTPRequestHandler.list_directory() called os.path.isdir() and
os.path.islink() for every entry, issuing two stat-family syscalls per
file. This is wasted work on any filesystem and dominates listing time
for large directories; on network filesystems such as NFS, where each
call is a round-trip, it becomes severe.

Use os.scandir(), whose DirEntry objects report the type from the
directory read itself (d_type / READDIRPLUS), eliminating the per-entry
stats in the common case and never doing more work than before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread Lib/http/server.py Outdated
Comment thread Misc/NEWS.d/next/Library/2026-06-20-09-00-26.gh-issue-151788.Gr8IxK.rst Outdated
Comment thread Lib/http/server.py Outdated
@bedevere-app

bedevere-app Bot commented Jun 20, 2026

Copy link
Copy Markdown

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

mjbommar and others added 2 commits June 20, 2026 14:55
- Wrap entry.is_dir()/is_symlink() in try/except OSError, falling back to
  False to exactly mirror os.path.isdir()/islink() (matches os.walk()).
- Reword the NEWS entry to avoid speedup-multiplier claims; describe it as
  improving list_directory() on systems with slow stat calls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove the performance rationale from the inline comment; keep only the
  note explaining the OSError fallback (correctness, not performance).
- Reword the NEWS entry to state the mechanism without magnitude claims.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mjbommar

Copy link
Copy Markdown
Contributor Author

I have made the requested changes; please review again

@bedevere-app

bedevere-app Bot commented Jun 21, 2026

Copy link
Copy Markdown

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

@bedevere-app bedevere-app Bot requested a review from picnixz June 21, 2026 10:50
@picnixz

picnixz commented Jun 21, 2026

Copy link
Copy Markdown
Member

Please fix the CI first before asking for a rereview.

Drop the :meth: cross-reference to list_directory, which is not a
documented method and fails the docs "new NEWS nits" check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mjbommar

Copy link
Copy Markdown
Contributor Author

@picnixz sorry, thought this was just test flake. The expected checks are all green now other than your review

Comment thread Lib/http/server.py Outdated
Comment thread Lib/http/server.py Outdated
Comment thread Lib/http/server.py
Comment thread Misc/NEWS.d/next/Library/2026-06-20-09-00-26.gh-issue-151788.Gr8IxK.rst Outdated
- Drop the local `name` variable in favor of entry.name.
- Reword the OSError comment per review.
- Move the "Append /" comment above the is_dir check.
- Reword the NEWS entry per review (fix call -> calls).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread Lib/http/server.py
Comment on lines +916 to +919
displayname = entry.name + "/"
linkname = entry.name + "/"
if is_symlink:
displayname = entry.name + "@"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you avoid having an LLM doing those suggestions please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, AFK today but can do that tonight. Emacs doesn't work so well on phone. You want PR clean without Claude?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I (personally) want usually is:

  • human interaction (I don't want to communicate with an agent, that's out the question. I don't care whether the text has been written by the AI as long as it's not a C/C without any human review before sending it but I don't want an automated conversation)
  • commits that are always reviewed by a human before even committing them; in general this means that the AI is giving you the rough idea or contents but you are responsible for cleaning it up, and addressing things that could help the next review. In this case, I considered the name = entry.name to be un-necessary so reducing the diff only required to keep the same code and inlining that access. It didn't mean to rewrite all the occurrences with entry.name.

Our policy is that AI is a tool that can be used but it's not something meant to generate PRs for users that would later take credit for (I'm not saying you are, I'm saying that just how AI PRs usually go: most of the time, people just want to have a contribution and so they simply leave their agent in automode).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the problem generally is not the AI, it's more that whenever there is an LLM, that LLM tends to choose things that are either useless, weird, and that would have been avoided if a human had directly written the lines. If you prefer writing your code through Claude, it's ok, but keep in mind that reviewers are not responsible for guiding the AI, it's the contributor. If we keep telling their agent through our review "no, not like that", it's as if we were directly using an agent in the least efficient way, and hence we are both losing our time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. FWIW, Claude pointed out that your suggestion resulted in a NameError at string concat below, so it proposed this as easiest way to avoid name global and I accepted it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad actually it should have been displayname = linkname = name = entry.name. But yeah, in this case, I would have preferred to know why it decided to do so. And then I would have suggested:

displayname = linkname = name = entry.name

instead of the two lines. That's why I prefer that you tell me this first before changing otherwise I would get confused.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to reiterate: there is nothing wrong with using Claude (the only think I don't like about it is the long summaries in every PR and trivial things it reports like "all tests pass" which is something I would expect for a PR otherwise it doesn't make sense). It's just that having a 3rd actor usually creates more back and forth between the reviewer and the reviewee.

Comment thread Lib/http/server.py
fullname = os.path.join(path, name)
displayname = linkname = name
for entry in entries:
displayname = linkname = entry.name

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
displayname = linkname = entry.name
displayname = linkname = name = entry.name

Comment thread Lib/http/server.py
Comment on lines +916 to +919
displayname = entry.name + "/"
linkname = entry.name + "/"
if is_symlink:
displayname = entry.name + "@"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
displayname = entry.name + "/"
linkname = entry.name + "/"
if is_symlink:
displayname = entry.name + "@"
displayname = name + "/"
linkname = name + "/"
if is_symlink:
displayname = name + "@"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants