feat(async): Add daily task to archive stale Slack channels #206
rgibert wants to merge 10 commits into
Add a django-q2 scheduled task that runs daily to find bot-created Slack channels where the workspace retention policy has deleted all message history. For each stale channel, posts an archival notice and archives the channel.

Changes:
- Add archive_channel() method and limit param to get_channel_history on SlackService
- Add archive_stale_channels task with per-channel error isolation
- Self-disables the schedule if Slack client is unavailable
- Add is_archived to get_channel_info return dict
- Data migration to register the schedule

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/SeyXSRtESuCIjsSjmOGgO3MM_MZHcZftkU8Td81uFP8

Three bugs fixed:

1. get_channel_history(limit=1) swallowed exceptions and returned [], which the task treated as "no messages" and archived active channels. Now exceptions propagate and the per-channel handler skips the channel.
2. The archival notice was posted before calling archive_channel. If archiving failed, the notice remained in the channel misleading users. Now the notice timestamp is captured and the message is deleted on failed archive.
3. A failed archive after a successful notice post would permanently prevent future archiving -- the bot's own notice became "history" causing the channel to be skipped on every subsequent run. Fixed by deleting the notice on failure (issue 2 fix).

Also adds SlackService.delete_message() wrapping chat_delete.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/Sq3ianFbApEXKqxUP2-TSgUOWGcOaf_XYnSDxYWDsKo

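The first fix depends on telling "the lookup failed" apart from "the channel is empty". A minimal sketch of that distinction follows; `find_stale_channels`, `fetch_history`, and `fake_history` are hypothetical names used for illustration, not the PR's actual code.

```python
from typing import Any, Callable


def find_stale_channels(
    channel_ids: list[str],
    fetch_history: Callable[[str], list[dict[str, Any]]],
) -> tuple[list[str], list[str]]:
    """Return (stale, skipped) channel IDs.

    fetch_history is a hypothetical stand-in for the history lookup.
    It is assumed to raise on API failure instead of returning [].
    """
    stale: list[str] = []
    skipped: list[str] = []
    for channel_id in channel_ids:
        try:
            messages = fetch_history(channel_id)
        except Exception:
            # A failed lookup says nothing about activity, so skip the channel.
            skipped.append(channel_id)
            continue
        if not messages:
            stale.append(channel_id)
    return stale, skipped


def fake_history(channel_id: str) -> list[dict[str, Any]]:
    """Simulated Slack responses for three illustrative channels."""
    if channel_id == "C_ERROR":
        raise ConnectionError("simulated Slack outage")
    if channel_id == "C_ACTIVE":
        return [{"ts": "1.0", "text": "still here"}]
    return []


stale, skipped = find_stale_channels(
    ["C_EMPTY", "C_ACTIVE", "C_ERROR"], fake_history
)
```

Only `C_EMPTY` ends up as an archive candidate here. If the lookup swallowed the error and returned `[]`, `C_ERROR` would be indistinguishable from it.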
Add a 2-second delay between channels to stay under Slack's Tier 3 rate limit (~50 req/min). Without this, workspaces with >25 incident channels would routinely hit rate limits on every daily run.

Also skip archiving when post_message fails (returns None) instead of archiving the channel with no warning to users.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/tRJBNFSwD_KABR2S1OuvR2T7oKVr_Hwgtfk-OUCIRyw

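A rough sketch of the pacing and the skip-on-failed-notice rule, assuming the delay sits between consecutive channels. The function name, its parameters, and the injectable `sleep` are illustrative assumptions; the real task may place the delay differently.

```python
import time
from typing import Callable, Optional

DELAY_SECONDS = 2.0  # spacing taken from the commit message


def archive_paced(
    channel_ids: list[str],
    post_notice: Callable[[str], Optional[str]],
    archive: Callable[[str], object],
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Archive channels one at a time, pausing between channels.

    post_notice and archive are hypothetical callables standing in for
    the Slack service methods; post_notice returns None on failure.
    """
    archived: list[str] = []
    for index, channel_id in enumerate(channel_ids):
        if index > 0:
            sleep(DELAY_SECONDS)  # pause before every channel except the first
        notice_ts = post_notice(channel_id)
        if notice_ts is None:
            # No notice was delivered, so do not archive without warning users.
            continue
        archive(channel_id)
        archived.append(channel_id)
    return archived


pauses: list[float] = []
result = archive_paced(
    ["C1", "C2", "C3"],
    post_notice=lambda cid: None if cid == "C2" else "1700000000.000100",
    archive=lambda cid: True,
    sleep=pauses.append,  # record pauses instead of actually sleeping
)
```

With a 2-second gap, any Slack method called once per channel runs at most 30 times per minute, which stays below the ~50 req/min figure quoted above.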
Two issues fixed:

1. If delete_message failed after a failed archive, the bot's own notice remained as the only message in the channel, causing every future run to skip it permanently. Now the history check uses limit=5 and filters out bot messages (by bot_id), so leftover bot notices don't prevent future archival attempts.
2. If archive_channel raised a non-SlackApiError (e.g. ConnectionError), the exception hit the outer handler which never called delete_message. Restructured so the archive attempt has its own try/except that always reaches the notice cleanup on any failure mode.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/niq15OUwbdrxpEKAdaoIszEmbWDRe369PsqC1_qlA9Q

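The restructuring in the second item can be pictured roughly as follows. This is a sketch under assumptions: `archive_or_clean_up` and its callable parameters are hypothetical, and the actual task may signal failure differently (for example by raising).

```python
from typing import Callable


def archive_or_clean_up(
    channel_id: str,
    notice_ts: str,
    archive: Callable[[str], object],
    delete_notice: Callable[[str, str], object],
) -> bool:
    """Try to archive; on any failure, attempt to remove the notice.

    archive and delete_notice are hypothetical stand-ins for the
    service's archive and delete calls.
    """
    try:
        archived = bool(archive(channel_id))
    except Exception:
        # Covers API errors and transport errors such as ConnectionError.
        archived = False
    if archived:
        return True
    try:
        delete_notice(channel_id, notice_ts)
    except Exception:
        # Best effort only: a leftover notice is tolerated because the
        # history check ignores the bot's own messages.
        pass
    return False


deleted: list[tuple[str, str]] = []


def failing_archive(channel_id: str) -> bool:
    raise ConnectionError("simulated network failure")


outcome = archive_or_clean_up(
    "C1",
    "1700000000.000100",
    failing_archive,
    lambda cid, ts: deleted.append((cid, ts)),
)
```

Giving the archive call its own `try/except` means the cleanup path is reached for every failure mode, not only for the exception type the outer handler anticipated.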
The previous bot_id filter ignored messages from any bot, which could skip messages from other integrations that legitimately indicate channel activity. Now uses auth_test to resolve the bot's own identity and only filters messages matching that specific bot_id. Messages from other bots are treated as real activity.

Also adds a cached bot_id property to SlackService backed by auth_test, called once per task run.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/l1u4EK0km7aDME7RkCQJB5vxCzc3c61oPc4oOSn7rws

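One way to picture the cached identity lookup and the narrower filter is shown below. `FakeSlackClient`, `SlackServiceSketch`, and `activity_messages` are illustrative names, and the stub only mimics the `bot_id` field that Slack's `auth.test` returns for bot tokens; the PR's actual property may be implemented differently.

```python
from functools import cached_property
from typing import Any, Optional


class FakeSlackClient:
    """Stub client that counts auth_test calls (illustration only)."""

    def __init__(self) -> None:
        self.auth_test_calls = 0

    def auth_test(self) -> dict[str, Any]:
        self.auth_test_calls += 1
        return {"ok": True, "bot_id": "B_FIRETOWER"}


class SlackServiceSketch:
    """Hypothetical service wrapper exposing a cached bot_id."""

    def __init__(self, client: Any) -> None:
        self.client = client

    @cached_property
    def bot_id(self) -> Optional[str]:
        # Resolved on first access, then reused for the rest of the run.
        return self.client.auth_test().get("bot_id")


def activity_messages(
    messages: list[dict[str, Any]], own_bot_id: str
) -> list[dict[str, Any]]:
    """Keep everything except messages posted by our own bot."""
    return [m for m in messages if m.get("bot_id") != own_bot_id]


client = FakeSlackClient()
service = SlackServiceSketch(client)
history = [
    {"ts": "1.0", "bot_id": "B_FIRETOWER", "text": "archival notice"},
    {"ts": "2.0", "bot_id": "B_OTHER", "text": "deploy finished"},
    {"ts": "3.0", "user": "U123", "text": "thanks"},
]
own_id = service.bot_id
kept = activity_messages(history, own_id) if own_id is not None else history
second_lookup = service.bot_id  # served from the cache, no extra API call
```

The message from `B_OTHER` survives the filter, so another integration's post still counts as activity.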
Filter to only terminal-status incidents (Done, Cancelled), fetch full channel history instead of only the 5 most recent messages, and check thread replies for human activity before archiving.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/et_CT8X1CKbstT_wBeP1lKRqDdU55TAd5nPbC_Yfhbc

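The thread-reply check could look something like the sketch below. It assumes parent messages carry Slack's `reply_count` field and that a hypothetical `fetch_replies` callable returns the thread's messages (parent included); the function name and the early-exit logic are illustrative, not taken from the PR.

```python
from typing import Any, Callable


def has_other_activity(
    messages: list[dict[str, Any]],
    own_bot_id: str,
    fetch_replies: Callable[[str], list[dict[str, Any]]],
) -> bool:
    """Return True if anyone other than our bot posted, including in threads."""
    for message in messages:
        if message.get("bot_id") != own_bot_id:
            return True  # top-level message from a human or another bot
        if message.get("reply_count", 0) > 0:
            # Our own message has a thread; look for replies from others.
            replies = fetch_replies(message["ts"])
            if any(r.get("bot_id") != own_bot_id for r in replies):
                return True
    return False


threads = {
    "1.0": [
        {"ts": "1.0", "bot_id": "B_OWN", "text": "archival notice"},
        {"ts": "1.1", "user": "U123", "text": "please keep this channel"},
    ]
}
with_reply = has_other_activity(
    [{"ts": "1.0", "bot_id": "B_OWN", "reply_count": 1}],
    "B_OWN",
    lambda ts: threads[ts],
)
without_reply = has_other_activity(
    [{"ts": "2.0", "bot_id": "B_OWN"}],
    "B_OWN",
    lambda ts: [],
)
```

A reply under the bot's own notice is enough to keep the channel, even though the top-level history contains only bot messages.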
Move the None bot_id guard to an early return instead of folding it into the list comprehension where it caused every message to match, preventing any channel from being archived.

Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/3m8ZgzizSqg0HSivytiWzMW0Yuh3h10sQe0WTQSV0lc

Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 2fae4e2. Configure here.
        raise RuntimeError(
            f"archive_channel returned False for {channel_id}"
        )
    archived += 1
History failures trigger wrongful archive
High Severity
The archive_stale_channels task may incorrectly archive Slack channels. SlackService.get_channel_history and get_thread_replies silently return empty or partial results on Slack API errors. This causes the task to misinterpret channels as having no human activity, leading to premature archival of channels with active discussions.
            )
            return response.get("messages", [])
        messages: list[dict[str, Any]] = []
        cursor: str | None = None
Bug: The paginated path in get_channel_history swallows exceptions and returns [], causing the archive_stale_channels task to incorrectly archive channels on API failure.
Severity: HIGH
Suggested Fix
The try...except block in the paginated loop of get_channel_history should re-raise the exception after logging it. This will allow the calling archive_stale_channels task to catch the exception and correctly skip the channel, preventing it from being archived due to a transient API error. The non-paginated path already propagates exceptions, and the paginated path should behave consistently.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: src/firetower/integrations/services/slack.py#L464-L467
Potential issue: When `get_channel_history` is called without a `limit`, it uses a
paginated approach. This path contains a `try...except` block that catches all
exceptions from the `conversations_history` API call, logs them, and returns an empty
list of messages. The calling task, `archive_stale_channels`, interprets this empty list
as a sign of inactivity and proceeds to archive the channel. This is incorrect behavior,
as a transient API error (like a timeout or rate limit) could cause a recently active
channel to be archived. The intended behavior is for the exception to propagate so the
task can skip the channel instead of archiving it.
Also affects:
src/firetower/incidents/tasks.py:134~134
src/firetower/integrations/services/slack.py:503~506
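The suggested fix (log, then re-raise inside the paginated loop) might be sketched as follows. `fetch_full_history` and the stub clients are hypothetical. The keyword arguments and the `response_metadata.next_cursor` field follow Slack's `conversations.history` cursor pagination, and the page size of 200 is an arbitrary assumption.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def fetch_full_history(client: Any, channel_id: str) -> list[dict[str, Any]]:
    """Page through a channel's history, propagating any API failure."""
    messages: list[dict[str, Any]] = []
    cursor: Optional[str] = None
    while True:
        try:
            response = client.conversations_history(
                channel=channel_id, cursor=cursor, limit=200
            )
        except Exception:
            logger.exception("conversations_history failed for %s", channel_id)
            raise  # let the caller skip the channel instead of seeing []
        messages.extend(response.get("messages", []))
        cursor = (response.get("response_metadata") or {}).get("next_cursor")
        if not cursor:
            return messages


class PagedClient:
    """Stub returning two pages of messages (illustration only)."""

    def conversations_history(self, channel: str, cursor: Optional[str], limit: int):
        if cursor is None:
            return {
                "messages": [{"ts": "3.0"}, {"ts": "2.0"}],
                "response_metadata": {"next_cursor": "page2"},
            }
        return {"messages": [{"ts": "1.0"}], "response_metadata": {"next_cursor": ""}}


class FailingClient:
    """Stub simulating a transient API failure."""

    def conversations_history(self, channel: str, cursor: Optional[str], limit: int):
        raise TimeoutError("simulated timeout")


all_messages = fetch_full_history(PagedClient(), "C1")
```

Because the exception propagates, a caller with per-channel error handling can skip the channel, and a partial or empty list is never mistaken for inactivity.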


Add a django-q2 scheduled task that runs daily to find bot-created Slack channels where the workspace retention policy has deleted all message history. For each stale channel, posts an archival notice and archives the channel.
The task queries ExternalLink records (type=SLACK) to find channels Firetower created, checks each via conversations_info and conversations_history(limit=1), and archives any channel with zero remaining messages. Per-channel errors are isolated so one failure doesn't abort the run. If the Slack client is unavailable (no bot token), the task disables its own schedule.
SlackService changes: added archive_channel(), a limit parameter to get_channel_history (avoids paginating all messages when we only need to check existence), and is_archived to get_channel_info.
Resolves RELENG-20
Agent transcript: https://claudescope.sentry.dev/share/onjkTdIu9bD3cgBvGExA2meMCtHFG8S8cpHdLO1DEk8