
resolving a specific performance issue (hang) during startup. #259

Open
pfichtner wants to merge 3 commits into approvals:main from pfichtner:patch-1


@pfichtner pfichtner commented May 21, 2026

Description

This PR fixes a significant hang (2-5 minutes) when starting pytest in environments with restricted, air-gapped, or flaky network access.

The issue was that approvaltests attempted to download maintenance scripts from GitHub immediately upon being imported (triggered by pytest plugin discovery), without specifying a network timeout. This forced the process to wait for the operating system's default TCP timeout (often several minutes) before proceeding.

The solution

  1. Network Timeouts: Added an explicit timeout=5 to the requests.get call in log_commons.py. This prevents indefinite hangs if the network is available but the destination (raw.githubusercontent.com) is unreachable or dropping packets.
  2. Lazy Initialization: Removed top-level calls to clear_log_file() in approved_file_log.py and failed_comparison_log.py. These actions are now deferred and called lazily only when log() is first invoked. This ensures that the import approvaltests operation is side-effect-free and does not initiate network I/O.
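
The lazy initialization in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: LOG_PATH and the approvaltests_sketch directory are hypothetical stand-ins for the library's real log location, and only the names clear_log_file() and log() come from the description above.

```python
import tempfile
from pathlib import Path

# Hypothetical log location, used only for this sketch.
LOG_PATH = Path(tempfile.gettempdir()) / "approvaltests_sketch" / "approved_files.log"


def clear_log_file() -> None:
    """Create the log file, or truncate it if it already exists."""
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    LOG_PATH.write_text("")


def log(approved_file: str) -> None:
    """Append one entry, creating the log file only when it is first needed."""
    # No module-level call to clear_log_file(): importing this module does
    # not touch the log file. Initialization happens here instead.
    if not LOG_PATH.exists():
        clear_log_file()
    with LOG_PATH.open(mode="a") as f:
        f.write(f"{approved_file}\n")
```

With this shape, a file left over from an earlier run is appended to rather than cleared, because clear_log_file() only runs when the file is missing.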

Affected tests:
No existing tests are negatively affected. These changes improve the robustness of the library's internal logging and maintenance script management without changing the core verification logic.

Summary by Sourcery

Mitigate startup hangs by making approvaltests logging initialization lazy and adding a network timeout when downloading maintenance scripts.

Bug Fixes:

  • Prevent long hangs in restricted or flaky network environments by adding a timeout to the maintenance script download request.
  • Avoid side effects during module import by deferring log file initialization until the first log write.

Enhancements:

  • Ensure approved and failed comparison log files are created and cleared lazily on first use rather than at import time.


sourcery-ai Bot commented May 21, 2026

Reviewer's Guide

Adds a network timeout and makes log file initialization lazy to prevent startup hangs in restricted or flaky network environments.

Sequence diagram for lazy log file initialization

sequenceDiagram
    actor Pytest
    participant Approvaltests
    participant ApprovedFilesLog

    Pytest->>Approvaltests: import approvaltests
    Note over Approvaltests: No clear_log_file called at import

    Pytest->>ApprovedFilesLog: log(approved_file)
    ApprovedFilesLog->>ApprovedFilesLog: get_approved_files_log()
    ApprovedFilesLog->>ApprovedFilesLog: exists()
    alt [log file does not exist]
        ApprovedFilesLog->>ApprovedFilesLog: clear_log_file()
    end
    ApprovedFilesLog->>ApprovedFilesLog: get_approved_files_log()
    ApprovedFilesLog->>ApprovedFilesLog: open(mode="a")
    ApprovedFilesLog-->>Pytest: append approved_file

Sequence diagram for download_script_from_common_repo_if_needed with timeout

sequenceDiagram
    participant Caller
    participant LogCommons
    participant Requests

    Caller->>LogCommons: download_script_from_common_repo_if_needed(script_name_with_suffix)
    alt [script not present]
        LogCommons->>Requests: get(url, timeout=5)
        alt [response.ok]
            LogCommons->>LogCommons: script_path.write_text(response.text)
        else [response not ok]
            LogCommons-->>Caller: return False
        end
    else [script already present]
        LogCommons-->>Caller: return False
    end

File-Level Changes

Change: Make log file initialization lazy so imports are side-effect-free and do not trigger I/O until logging is used.
Details:
  • Remove top-level log file clearing that ran on module import.
  • Initialize approved files log file on first log() call by checking for file existence and clearing/creating it if missing.
  • Initialize failed comparison log file on first log() call by checking for file existence and clearing/creating it if missing.
Files:
  • approvaltests/internals/logs/approved_file_log.py
  • approvaltests/internals/logs/failed_comparison_log.py

Change: Harden remote script download against network issues by adding an explicit timeout.
Details:
  • Add timeout=5 seconds to the requests.get call that downloads maintenance scripts from GitHub.
  • Keep existing behavior of downloading and writing scripts when the response is successful while preventing long OS-level TCP timeouts.
Files:
  • approvaltests/internals/logs/log_commons.py


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • Deferring clear_log_file() into log() changes semantics from 'clear once per process import' to 'never clear if the file already exists', meaning logs will now accumulate across runs; if the intent is just to avoid side effects at import, consider clearing on the first log() call unconditionally (or via a per-process flag) rather than only when the file is absent.
  • Adding timeout=5 to requests.get is helpful, but you may also want to catch requests.Timeout/requests.RequestException around the call so that a timeout or network error doesn’t surface as an exception to callers that merely import or use logging, and optionally make the timeout duration configurable if different environments need different values.
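
Both suggestions above can be sketched together. This is an illustrative sketch under stated assumptions, not the library's implementation: LazyLog, fetch_text, download_script_if_needed, and DEFAULT_TIMEOUT_SECONDS are hypothetical names, and the standard library's urllib stands in for requests so the example stays self-contained. With requests, the equivalent would be wrapping requests.get(url, timeout=...) in a try block that catches requests.RequestException.

```python
import urllib.request
from pathlib import Path
from typing import Callable

DEFAULT_TIMEOUT_SECONDS = 5.0  # mirrors the timeout=5 proposed in this PR


class LazyLog:
    """Clears its file on the first log() call of the process, then appends."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._cleared = False  # per-process flag; starts False in every new interpreter

    def log(self, entry: str) -> None:
        if not self._cleared:
            self._path.parent.mkdir(parents=True, exist_ok=True)
            self._path.write_text("")  # unconditional clear, even if the file exists
            self._cleared = True
        with self._path.open(mode="a") as f:
            f.write(entry + "\n")


def fetch_text(url: str, timeout: float) -> str:
    """Download a URL as text, giving up after `timeout` seconds."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def download_script_if_needed(
    script_path: Path,
    url: str,
    timeout: float = DEFAULT_TIMEOUT_SECONDS,
    fetch: Callable[[str, float], str] = fetch_text,
) -> bool:
    """Return True only if the script was downloaded and written."""
    if script_path.exists():
        return False
    try:
        text = fetch(url, timeout)
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError,
        # so a dead or slow network yields False instead of an exception.
        return False
    script_path.write_text(text)
    return True
```

The fetch parameter exists only so the download can be replaced in tests without touching the network; the timeout parameter covers the reviewer's note about making the duration configurable.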

@JayBazuzi
Contributor

Does the APPROVALTESTS_DISABLE_SCRIPT_DOWNLOADS environment variable help you in your situation?

@pfichtner
Author

> Does the APPROVALTESTS_DISABLE_SCRIPT_DOWNLOADS environment variable help you in your situation?

Yes, the environment variable is a helpful immediate workaround for an individual user who knows it exists, and it worked for me personally. However, I believe the PR is still essential for the library's long-term health and the community's user experience, for several reasons:

  1. Network Resilience: Even when users want the scripts to download, relying on the OS default TCP timeout (which can be several minutes) is dangerous. Adding a sensible, configurable timeout (5s by default) makes the library much more robust.
  2. CI/CD Efficiency: In CI environments, the hang could make test runs take much longer than necessary without anyone noticing.
  3. Sane Defaults: New users shouldn't have to 'discover' an obscure environment variable to prevent a 5-minute startup hang. The library should be 'fast by default' without requiring manual environment configuration.

@pfichtner
Author

To clarify, since my earlier comment was not clear enough:

The issue is that approvaltests is not 'opt-in' at the project level; once installed, its top-level network calls penalize every pytest run in the environment. Even a project with zero ApprovalTests will hang for several minutes on startup because pytest automatically imports the plugin to check for configuration, hitting the OS-level TCP timeout on the synchronous GitHub requests.
So even projects that do not use approvaltests at all have to set APPROVALTESTS_DISABLE_SCRIPT_DOWNLOADS, simply because the approvaltests initialization code runs during pytest's own initialization and may block for minutes.

My guess is that the solution isn't to force someone to set an approvaltests-specific environment variable in every project that uses pytest, just because approvaltests has been installed globally.
