-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Azure AI Evaluation's RedTeam.scan() method decodes **all** encoded attack prompts when storing the result files #47228
Copy link
Copy link
Open
Labels
EvaluationIssues related to the client library for Azure AI EvaluationIssues related to the client library for Azure AI EvaluationService AttentionWorkflow: This issue is responsible by Azure service team.Workflow: This issue is responsible by Azure service team.customer-reportedIssues that are reported by GitHub users external to the Azure organization.Issues that are reported by GitHub users external to the Azure organization.needs-team-attentionWorkflow: This issue needs attention from Azure service team or SDK teamWorkflow: This issue needs attention from Azure service team or SDK teamquestionThe issue doesn't require a change to the product in order to be resolved. Most issues start as thatThe issue doesn't require a change to the product in order to be resolved. Most issues start as that
Metadata
Metadata
Labels
EvaluationIssues related to the client library for Azure AI EvaluationIssues related to the client library for Azure AI EvaluationService AttentionWorkflow: This issue is responsible by Azure service team.Workflow: This issue is responsible by Azure service team.customer-reportedIssues that are reported by GitHub users external to the Azure organization.Issues that are reported by GitHub users external to the Azure organization.needs-team-attentionWorkflow: This issue needs attention from Azure service team or SDK teamWorkflow: This issue needs attention from Azure service team or SDK teamquestionThe issue doesn't require a change to the product in order to be resolved. Most issues start as thatThe issue doesn't require a change to the product in order to be resolved. Most issues start as that
Type
Fields
Give feedbackNo fields configured for issues without a type.
Name: azure-ai-evaluation
Version: 1.16.5
Describe the bug
Azure AI Evaluation's
RedTeam.scan()method decodes all encoded attack prompts before storing them inevaluation_results.jsonandresults.json, regardless of encoding strategy (flip, base64, morse, etc.). This makes it impossible to verify what the target agent actually received. Theattack_techniquemetadata is correctly set, but the conversation payloads inattack_details[].conversationare decoded back to their original form, losing fidelity regarding the actual attack surface.Screenshot
Captured prompts received by target: flip-encoded - entries 3 and 4
corresponding evaluation_results.json
To Reproduce
Steps to reproduce the behavior:
RedTeamscan targeting any remote agent/endpoint with multiple encoding-based attack strategies (e.g.,[AttackStrategy.Flip, AttackStrategy.Base64, AttackStrategy.Morse])await red_team.scan(target=callback, attack_strategies=[AttackStrategy.Flip, ...])evaluation_results.json[attack_details][].conversation[].contentExpected behavior
"...edocne txet si siht", it should appear in evaluation results as flip-encoded"...aW52YWxpZCBiYXNlNjQ=", it should appear as base64-encoded".... . .-.. .-.. ---", it should appear in morseattack_details[].conversationshould contain the exact payloads that were sent to the targetCurrent behavior
evaluation_results.jsonevaluation_results.json[attack_details][].conversationattack_techniquelabels are correct (flip, base64, morse), but the conversation content doesn't match the encodingImpact
Additional context