fix: guardrail redact targets last user message, not trailing LTM context#1978
fix: guardrail redact targets last user message, not trailing LTM context#1978giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
Conversation
When long-term memory (LTM) session managers like AgentCoreMemorySessionManager append an assistant message containing user context after the user turn, the guardrail redaction logic incorrectly redacted the LTM context instead of the actual user input. Root cause: the redact handler used `self.messages[-1]` which assumes the last message is the user's input. With LTM enabled, the message list looks like: [0] user: 'Tell me something bad' ← should be redacted [1] assistant: '<user_context>...</user_context>' ← was being redacted The fix replaces `self.messages[-1]` with a reverse search for the last message with `role == 'user'`, matching the pattern already used by `_find_last_user_text_message_index()` in the Bedrock model for guardrail_latest_message wrapping. Closes strands-agents#1639
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| assert ltm["content"] != [{"text": "BLOCKED!"}], "LTM context message should not be redacted" | ||
|
|
||
|
|
||
| mock_model_provider = MockedModelProvider( |
There was a problem hiding this comment.
Issue: The new test test_agent_redacts_user_message_not_ltm_context accidentally absorbed the entire body of the existing test_agent_restored_from_session_management_with_correct_index function. The diff replaced the old def line but the old function's body (lines 1626-1648) now runs as dead code inside the new test.
You can verify this by running the test — it prints hello!world! which comes from the old test's MockedModelProvider and agent("Hello!") calls that are now orphaned code within your new test function.
Suggestion: The new test should end at line 1623 (after the LTM assertions). The old test_agent_restored_from_session_management_with_correct_index function definition needs to be preserved as a separate function. This likely happened because the PR branch was forked from a slightly different base. Please rebase onto main and ensure the new test is inserted before the existing test_agent_restored_from_session_management_with_correct_index, not replacing it.
| # The last user message before the LTM context should have been redacted | ||
| # Check that at least one user message was redacted | ||
| redacted_user = [m for m in user_messages if m["content"] == [{"text": "BLOCKED!"}]] | ||
| assert len(redacted_user) >= 1, f"Expected at least one redacted user message, got: {user_messages}" |
There was a problem hiding this comment.
Issue: The assertions here are overly loose for a deterministic scenario. With the test setup (one user message + one assistant LTM message → agent("ignored") adds another user message → guardrail redacts the last user), you know exactly which message should be redacted and the exact final state of agent.messages.
Suggestion: Assert on the exact expected state rather than using >= 1 checks. For example:
# The last user message (the "ignored" input from agent()) should be redacted
last_user_msgs = [m for m in agent.messages if m["role"] == "user"]
assert last_user_msgs[-1]["content"] == [{"text": "BLOCKED!"}]
# The first user message ("Tell me something bad") should NOT be redacted
assert last_user_msgs[0]["content"] == [{"text": "Tell me something bad"}]This makes the test clearer about what "last user message" means and catches regressions more precisely.
| ) | ||
|
|
||
| # Run the agent — guardrail should redact the user message (index 0), not the LTM message | ||
| response = agent("ignored") # noqa: F841 -- triggers model which triggers redact |
There was a problem hiding this comment.
Issue: Minor — response is assigned but unused, requiring a noqa suppression.
Suggestion: Simply call agent("ignored") without assigning to a variable, which eliminates the need for the noqa comment.
| ) | ||
| if self._session_manager: | ||
| self._session_manager.redact_latest_message(self.messages[-1], self) | ||
| if last_user_msg is not None: |
There was a problem hiding this comment.
Issue: The None guard silently drops the redaction when no user message is found. While this shouldn't happen in normal operation, silently ignoring a guardrail redaction could mask issues.
Suggestion: Consider adding a logger.warning in the else branch so unexpected scenarios are surfaced:
if last_user_msg is not None:
...
else:
logger.warning("no user message found to redact | guardrail redaction skipped")|
Assessment: Request Changes The core fix in However, the test file has a critical issue that must be resolved before merge. Review Details
The fix itself is clean and the approach is sound 👍 |
Issue
Closes #1639
Problem
When guardrail redaction is enabled (
guardrail_redact_input=True) together with a long-term memory (LTM) session manager likeAgentCoreMemorySessionManager, the redact logic incorrectly modifies the LTM context message instead of the user's input.The LTM session manager appends an assistant message after the user turn:
The redact handler used
self.messages[-1], which blindly picked the last message regardless of role.Root Cause
In
agent.py, the guardrail redaction code assumedself.messages[-1]is always the user's input:With LTM enabled,
messages[-1]is the assistant's context message, not the user's input.Solution
Replaced
self.messages[-1]with a reverse search for the last message withrole == 'user':This matches the pattern already used by
_find_last_user_text_message_index()in the Bedrock model forguardrail_latest_messagewrapping.Testing
test_agent_redacts_user_message_not_ltm_context: Simulates the LTM scenario with a trailing assistant context message, verifies the user message is redacted and the LTM context is preservedChanges
src/strands/agent/agent.py: Changed guardrail redact handler to find last user-role messagetests/strands/agent/test_agent.py: Added test for LTM + guardrail interaction