feat: add timeout and URL scheme validation to load_web_page#4887
feat: add timeout and URL scheme validation to load_web_page#4887cchinchilla-dev wants to merge 3 commits intogoogle:mainfrom
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Hi @cchinchilla-dev , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share. |
|
Hi @Jacksunwei , can you please review this |
|
@rohityan, @Jacksunwei — Just following up to see if anything else is needed from my end. Happy to make any adjustments. |
|
I independently reproduced this issue and can confirm the behavior described. Tested on Windows with Python 3.11 against three failure modes — non-routable IP, invalid DNS, and a slow endpoint. All three result in unhandled exceptions with connect timeout=None confirmed in the traceback. Happy to share full reproduction logs if helpful for the review. |
Link to Issue or Description of Change
Problem
load_web_page()callsrequests.get()without atimeoutparameter. If the target server is unresponsive, the agent hangs indefinitely. Additionally, the function accepts any URL scheme (file://,ftp://, etc.) and does not handleTimeoutorConnectionErrorexceptions.Solution
timeout=10(via module-level_DEFAULT_TIMEOUT_SECONDS) torequests.get()httporhttpsbefore making the requestrequests.exceptions.Timeoutandrequests.exceptions.ConnectionErrorwith descriptive error messagesDesign note: The timeout is exposed as a module-level constant rather than a function parameter to avoid exposing it to the LLM's function calling schema. It can be overridden via
load_web_page._DEFAULT_TIMEOUT_SECONDS = 30if needed.Testing Plan
Unit Tests:
11 new tests added in
tests/unittests/tools/test_load_web_page.py:test_invalid_scheme_file- rejectsfile://URLstest_invalid_scheme_ftp- rejectsftp://URLstest_invalid_scheme_empty- rejects malformed URLstest_timeout_returns_error_message- handlesTimeoutexceptiontest_connection_error_returns_error_message- handlesConnectionErrorexceptiontest_successful_request- verifies successful fetch with mocked BeautifulSouptest_failed_request_non_200- handles non-200 status codestest_timeout_parameter_passed- verifies timeout=10 is passed to requests.get()test_allow_redirects_false- verifies SSRF protection is preservedtest_http_scheme_accepted- acceptshttp://URLstest_https_scheme_accepted- acceptshttps://URLsManual End-to-End (E2E) Tests:
N/A - This is an internal hardening change. The function signature is unchanged.
Checklist
Additional context
This complements the existing SSRF protection (
allow_redirects=False) already present in the function.