resolving a specific performance issue (hang) during startup.#259
resolving a specific performance issue (hang) during startup.#259pfichtner wants to merge 3 commits into
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds a network timeout and makes log file initialization lazy to prevent startup hangs in restricted or flaky network environments. Sequence diagram for lazy log file initializationsequenceDiagram
actor Pytest
participant Approvaltests
participant ApprovedFilesLog
Pytest->>Approvaltests: import approvaltests
Note over Approvaltests: No clear_log_file called at import
Pytest->>ApprovedFilesLog: log(approved_file)
ApprovedFilesLog->>ApprovedFilesLog: get_approved_files_log()
ApprovedFilesLog->>ApprovedFilesLog: exists()
alt [log file does not exist]
ApprovedFilesLog->>ApprovedFilesLog: clear_log_file()
end
ApprovedFilesLog->>ApprovedFilesLog: get_approved_files_log()
ApprovedFilesLog->>ApprovedFilesLog: open(mode="a")
ApprovedFilesLog-->>Pytest: append approved_file
Sequence diagram for download_script_from_common_repo_if_needed with timeoutsequenceDiagram
participant Caller
participant LogCommons
participant Requests
Caller->>LogCommons: download_script_from_common_repo_if_needed(script_name_with_suffix)
alt [script not present]
LogCommons->>Requests: get(url, timeout=5)
alt [response.ok]
LogCommons->>LogCommons: script_path.write_text(response.text)
else [response not ok]
LogCommons-->>Caller: return False
end
else [script already present]
LogCommons-->>Caller: return False
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Deferring
clear_log_file()intolog()changes semantics from 'clear once per process import' to 'never clear if the file already exists', meaning logs will now accumulate across runs; if the intent is just to avoid side effects at import, consider clearing on the firstlog()call unconditionally (or via a per-process flag) rather than only when the file is absent. - Adding
timeout=5torequests.getis helpful, but you may also want to catchrequests.Timeout/requests.RequestExceptionaround the call so that a timeout or network error doesn’t surface as an exception to callers that merely import or use logging, and optionally make the timeout duration configurable if different environments need different values.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Deferring `clear_log_file()` into `log()` changes semantics from 'clear once per process import' to 'never clear if the file already exists', meaning logs will now accumulate across runs; if the intent is just to avoid side effects at import, consider clearing on the first `log()` call unconditionally (or via a per-process flag) rather than only when the file is absent.
- Adding `timeout=5` to `requests.get` is helpful, but you may also want to catch `requests.Timeout`/`requests.RequestException` around the call so that a timeout or network error doesn’t surface as an exception to callers that merely import or use logging, and optionally make the timeout duration configurable if different environments need different values.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This reverts commit 8cb85d1.
|
Does the |
Yes, the environment variable provides a helpful immediate workaround for an individual user who knows it exists and it worked for me personally. However, I believe the PR is still essential for the library's long-term health and the community's user experience for several reasons:
|
|
Just for clarification, I was not clear enough. The issue is that approvaltests is not 'opt-in' at the project level; once installed, its top-level network calls penalize every pytest run in the environment. Even a project with zero ApprovalTests will hang for several minutes on startup because pytest automatically imports the plugin to check for configuration, hitting the OS-level TCP timeout on the synchronous GitHub requests. My guess is that the solution isn't to force someone to set an approvaltests-specific environment variable in every project that uses pytest, just because approvaltests has been installed globally. |
Description
This PR fixes a significant hang (2-5 minutes) when starting
pytestin environments with restricted, air-gapped, or flaky network access.The issue was that
approvaltestsattempted to download maintenance scripts from GitHub immediately upon being imported (triggered bypytestplugin discovery), without specifying a network timeout. This forced the process to wait for the operating system's default TCP timeout (often several minutes) before proceeding.The solution
timeout=5to therequests.getcall inlog_commons.py. This prevents indefinite hangs if the network is available but the destination (raw.githubusercontent.com) is unreachable or dropping packets.2. Lazy Initialization: Removed top-level calls toclear_log_file()inapproved_file_log.pyandfailed_comparison_log.py. These actions are now deferred and called lazily only whenlog()is first invoked. This ensures that theimport approvaltestsoperation is side-effect-free and does not initiate network I/O.Affected tests:
No existing tests are negatively affected. These changes improve the robustness of the library's internal logging and maintenance script management without changing the core verification logic.
Summary by Sourcery
Mitigate startup hangs by making approvaltests logging initialization lazy and adding a network timeout when downloading maintenance scripts.
Bug Fixes:
- Avoid side effects during module import by deferring log file initialization until the first log write.Enhancements: