fix: guardrail redact targets last user message, not trailing LTM context #1978
```diff
@@ -1580,7 +1580,49 @@ def test_agent_restored_from_session_management_with_redacted_input():
     assert agent.messages[0] == agent_2.messages[0]
 
 
-def test_agent_restored_from_session_management_with_correct_index():
+def test_agent_redacts_user_message_not_ltm_context():
+    """Test that guardrail redacts the last *user* message, not a trailing LTM assistant message.
+
+    Reproduces: https://github.com/strands-agents/sdk-python/issues/1639
+    When long-term memory (LTM) session managers append an assistant message with
+    user context after the user turn, the redact logic must still target the user
+    message rather than the trailing assistant LTM message.
+    """
+    mocked_model = MockedModelProvider(
+        [{"redactedUserContent": "BLOCKED!", "redactedAssistantContent": "INPUT BLOCKED!"}]
+    )
+
+    agent = Agent(
+        model=mocked_model,
+        system_prompt="You are a helpful assistant.",
+        callback_handler=None,
+    )
+
+    # Simulate LTM session manager appending context after user message:
+    # messages[0] = user input, messages[1] = assistant LTM context
+    agent.messages.append({"role": "user", "content": [{"text": "Tell me something bad"}]})
+    agent.messages.append(
+        {"role": "assistant", "content": [{"text": "<user_context>Preference: likes cats</user_context>"}]}
+    )
+
+    # Run the agent - guardrail should redact the user message (index 0), not the LTM message
+    response = agent("ignored")  # noqa: F841 -- triggers model which triggers redact
```
> **Review comment:**
> Issue: Minor. Suggestion: Simply call `agent("ignored")` without assigning the result to an unused variable, which avoids the need for the `# noqa: F841` suppression.
```diff
+
+    # Find the user messages - the first user message (the actual input) should be redacted
+    user_messages = [m for m in agent.messages if m["role"] == "user"]
+    assert len(user_messages) >= 1
+    # The last user message before the LTM context should have been redacted
+    # Check that at least one user message was redacted
+    redacted_user = [m for m in user_messages if m["content"] == [{"text": "BLOCKED!"}]]
+    assert len(redacted_user) >= 1, f"Expected at least one redacted user message, got: {user_messages}"
```
> **Review comment:**
> Issue: The assertions here are overly loose for a deterministic scenario. With the test setup (one user message plus one assistant LTM message), the resulting message list is fully determined, so `>= 1` checks can pass even when the wrong message was redacted.
> Suggestion: Assert on the exact expected state rather than using loose existence checks:
>
> ```python
> # The last user message (the "ignored" input from agent()) should be redacted
> last_user_msgs = [m for m in agent.messages if m["role"] == "user"]
> assert last_user_msgs[-1]["content"] == [{"text": "BLOCKED!"}]
> # The first user message ("Tell me something bad") should NOT be redacted
> assert last_user_msgs[0]["content"] == [{"text": "Tell me something bad"}]
> ```
>
> This makes the test clearer about what "last user message" means and catches regressions more precisely.
```diff
+
+    # The assistant LTM message should NOT have been redacted
+    assistant_messages = [m for m in agent.messages if m["role"] == "assistant"]
+    ltm_messages = [m for m in assistant_messages if any("<user_context>" in str(c) for c in m.get("content", []))]
+    for ltm in ltm_messages:
+        assert ltm["content"] != [{"text": "BLOCKED!"}], "LTM context message should not be redacted"
+
+
     mock_model_provider = MockedModelProvider(
```
> **Review comment:**
> Issue: The new test `test_agent_redacts_user_message_not_ltm_context` still contains the body of the removed `test_agent_restored_from_session_management_with_correct_index` test (the `mock_model_provider = MockedModelProvider(...)` lines and what follows), since only the old `def` line was deleted. You can verify this by running the test.
> Suggestion: The new test should end at line 1623 (after the LTM assertions). The old test body below that point should not remain inside it.
```diff
         [{"role": "assistant", "content": [{"text": "hello!"}]}, {"role": "assistant", "content": [{"text": "world!"}]}]
     )
```
> **Review comment:**
> Issue: The `None` guard silently drops the redaction when no user message is found. While this shouldn't happen in normal operation, silently ignoring a guardrail redaction could mask issues.
> Suggestion: Consider adding a `logger.warning` in the `else` branch so unexpected scenarios are surfaced.
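A minimal sketch of the warning guard the reviewer describes. The function name `redact_last_user_message` and its shape are illustrative assumptions, not the SDK's actual redact implementation; it only shows the walk-backwards-to-the-user-message behavior this PR targets, plus a `logger.warning` instead of a silent no-op:

```python
import logging

logger = logging.getLogger(__name__)


def redact_last_user_message(messages, redacted_text):
    """Hypothetical sketch: redact the most recent user message in-place.

    Walking backwards skips any trailing assistant LTM context message,
    so the actual user input is the one redacted.
    """
    for message in reversed(messages):
        if message["role"] == "user":
            message["content"] = [{"text": redacted_text}]
            return True
    # Surface the unexpected case instead of silently dropping the redaction.
    logger.warning("Redaction requested but no user message found in history")
    return False


messages = [
    {"role": "user", "content": [{"text": "Tell me something bad"}]},
    {"role": "assistant", "content": [{"text": "<user_context>likes cats</user_context>"}]},
]
redact_last_user_message(messages, "BLOCKED!")
print(messages[0]["content"])  # user message redacted to BLOCKED!
print(messages[1]["content"])  # LTM assistant message untouched
```

With an empty history the function logs a warning and returns `False`, so callers can still detect that no redaction happened.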