Skip to content

fix: guardrail redact targets last user message, not trailing LTM context#1978

Open
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-redact-last-user-message
Open

fix: guardrail redact targets last user message, not trailing LTM context#1978
giulio-leone wants to merge 1 commit intostrands-agents:mainfrom
giulio-leone:fix/guardrail-redact-last-user-message

Conversation

@giulio-leone
Copy link
Copy Markdown
Contributor

Issue

Closes #1639

Problem

When guardrail redaction is enabled (guardrail_redact_input=True) together with a long-term memory (LTM) session manager like AgentCoreMemorySessionManager, the redact logic incorrectly modifies the LTM context message instead of the user's input.

The LTM session manager appends an assistant message after the user turn:

messages[0]: {role: 'user',      content: [{text: 'Tell me something bad'}]}      ← should be redacted
messages[1]: {role: 'assistant', content: [{text: '<user_context>...</user_context>'}]}  ← was being redacted

The redact handler used self.messages[-1], which blindly picked the last message regardless of role.

Root Cause

In agent.py, the guardrail redaction code assumed self.messages[-1] is always the user's input:

self.messages[-1]['content'] = self._redact_user_content(
    self.messages[-1]['content'], ...
)

With LTM enabled, messages[-1] is the assistant's context message, not the user's input.

Solution

Replaced self.messages[-1] with a reverse search for the last message with role == 'user':

last_user_msg = next(
    (m for m in reversed(self.messages) if m['role'] == 'user'),
    None,
)

This matches the pattern already used by _find_last_user_text_message_index() in the Bedrock model for guardrail_latest_message wrapping.

Testing

  • Added test_agent_redacts_user_message_not_ltm_context: Simulates the LTM scenario with a trailing assistant context message, verifies the user message is redacted and the LTM context is preserved
  • All 8 guardrail-related tests pass
  • All 113 agent tests pass

Changes

  • src/strands/agent/agent.py: Changed guardrail redact handler to find last user-role message
  • tests/strands/agent/test_agent.py: Added test for LTM + guardrail interaction

⚠️ This reopens #1884 which was accidentally closed due to fork deletion.

When long-term memory (LTM) session managers like
AgentCoreMemorySessionManager append an assistant message containing
user context after the user turn, the guardrail redaction logic
incorrectly redacted the LTM context instead of the actual user input.

Root cause: the redact handler used `self.messages[-1]` which assumes
the last message is the user's input.  With LTM enabled, the message
list looks like:

  [0] user: 'Tell me something bad'       ← should be redacted
  [1] assistant: '<user_context>...</user_context>'  ← was being redacted

The fix replaces `self.messages[-1]` with a reverse search for the
last message with `role == 'user'`, matching the pattern already used
by `_find_last_user_text_message_index()` in the Bedrock model for
guardrail_latest_message wrapping.

Closes strands-agents#1639
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/strands/agent/agent.py 80.00% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

assert ltm["content"] != [{"text": "BLOCKED!"}], "LTM context message should not be redacted"


mock_model_provider = MockedModelProvider(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The new test test_agent_redacts_user_message_not_ltm_context accidentally absorbed the entire body of the existing test_agent_restored_from_session_management_with_correct_index function. The diff replaced the old def line but the old function's body (lines 1626-1648) now runs as dead code inside the new test.

You can verify this by running the test — it prints hello!world! which comes from the old test's MockedModelProvider and agent("Hello!") calls that are now orphaned code within your new test function.

Suggestion: The new test should end at line 1623 (after the LTM assertions). The old test_agent_restored_from_session_management_with_correct_index function definition needs to be preserved as a separate function. This likely happened because the PR branch was forked from a slightly different base. Please rebase onto main and ensure the new test is inserted before the existing test_agent_restored_from_session_management_with_correct_index, not replacing it.

# The last user message before the LTM context should have been redacted
# Check that at least one user message was redacted
redacted_user = [m for m in user_messages if m["content"] == [{"text": "BLOCKED!"}]]
assert len(redacted_user) >= 1, f"Expected at least one redacted user message, got: {user_messages}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The assertions here are overly loose for a deterministic scenario. With the test setup (one user message + one assistant LTM message → agent("ignored") adds another user message → guardrail redacts the last user), you know exactly which message should be redacted and the exact final state of agent.messages.

Suggestion: Assert on the exact expected state rather than using >= 1 checks. For example:

# The last user message (the "ignored" input from agent()) should be redacted
last_user_msgs = [m for m in agent.messages if m["role"] == "user"]
assert last_user_msgs[-1]["content"] == [{"text": "BLOCKED!"}]

# The first user message ("Tell me something bad") should NOT be redacted
assert last_user_msgs[0]["content"] == [{"text": "Tell me something bad"}]

This makes the test clearer about what "last user message" means and catches regressions more precisely.

)

# Run the agent — guardrail should redact the user message (index 0), not the LTM message
response = agent("ignored") # noqa: F841 -- triggers model which triggers redact
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Minor — response is assigned but unused, requiring a noqa suppression.

Suggestion: Simply call agent("ignored") without assigning to a variable, which eliminates the need for the noqa comment.

)
if self._session_manager:
self._session_manager.redact_latest_message(self.messages[-1], self)
if last_user_msg is not None:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: The None guard silently drops the redaction when no user message is found. While this shouldn't happen in normal operation, silently ignoring a guardrail redaction could mask issues.

Suggestion: Consider adding a logger.warning in the else branch so unexpected scenarios are surfaced:

if last_user_msg is not None:
    ...
else:
    logger.warning("no user message found to redact | guardrail redaction skipped")

@github-actions
Copy link
Copy Markdown

Assessment: Request Changes

The core fix in agent.py is correct and well-motivated — replacing self.messages[-1] with a reverse search for the last user-role message is the right approach and aligns with existing patterns in the codebase. The PR description is thorough and clearly explains the problem and solution.

However, the test file has a critical issue that must be resolved before merge.

Review Details
  • Critical — Test absorbs existing test: The new test_agent_redacts_user_message_not_ltm_context accidentally replaced the def line of the existing test_agent_restored_from_session_management_with_correct_index and absorbed its body as dead code. This deletes an existing test and runs unrelated assertions silently inside the new test. This is likely a rebase issue — please rebase onto the latest main and ensure the new test is inserted cleanly alongside the existing one.
  • Testing: The test assertions could be more precise — using exact expected state rather than >= 1 checks would make the test more robust and readable.
  • Observability: Consider logging a warning when no user message is found for redaction, rather than silently skipping.

The fix itself is clean and the approach is sound 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] guardrail_redact_input override ltm_msg instead of the last user message

2 participants