From 3856d3454f32c7b6da216b52656cc6620eea4860 Mon Sep 17 00:00:00 2001
From: StreetLevelTech1
Date: Thu, 21 May 2026 10:07:10 +0100
Subject: [PATCH] Add architecture and agentic readiness review report

---
 docs/architecture_review_2026-05-20.md | 139 +++++++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/architecture_review_2026-05-20.md

diff --git a/docs/architecture_review_2026-05-20.md b/docs/architecture_review_2026-05-20.md
new file mode 100644
index 00000000..61862bcb
--- /dev/null
+++ b/docs/architecture_review_2026-05-20.md
@@ -0,0 +1,139 @@
+# StrideBot Architecture & Agentic Readiness Review (2026-05-20)
+
+## Scope
+Review axes:
+- architecture
+- AI-agent design
+- async/task orchestration
+- Telegram scaling
+- security
+- prompt/memory
+- code smells
+- performance
+- production readiness
+- refactor suggestions
+- proactive workflow support
+- agentic evolution
+- memory future-proofing
+- tool/data abstraction
+- event system for continuous intelligence
+- repo scalability for multi-agent finance platform
+
+## Executive Summary
+- **Production-readiness score:** **6.8/10**.
+- **Strengths:** clear module boundaries, scheduler intelligence primitives, defensive sanitization for Telegram HTML, persistent DB tables for cooldown/content dedupe.
+- **Key risks:** synchronous I/O inside async runtime paths, in-memory rate limiting/caps that do not scale horizontally, fragmented state between process memory and database, and limited abstraction for tool/provider orchestration.
+- **Agentic trajectory:** currently **assisted-automation**, not full agentic autonomy. The scheduler and breaking signal pipeline provide a good foundation but lack durable planning/memory loops.
+
+## 1) Architecture Review
+### What is working
+1. Central runtime entry and explicit wiring in `bot/bot.py` keep startup control in one place.
+2. Intelligence and scheduled behavior are separated in `bot/scheduler.py`.
+3. Persistent state exists for cooldowns and content hashes in `bot/database.py` (`scheduler_cooldowns`, `posted_content`) to reduce repeat notifications across restarts.
+
+### Architectural pressure points
+1. **Tight coupling between orchestration and providers:** modules import each other directly (`scheduler -> ai/crypto/db`), which raises the blast radius of any change.
+2. **Shared mutable process state:** rate-limit maps, AI cap counters, and cooldown caches are in-memory globals; this undermines multi-instance reliability.
+3. **Mixed concerns inside single files:** `scheduler.py` owns signal logic, formatting, image rendering, and Telegram delivery in one module.
+
+## 2) AI-Agent Design Review
+- Current AI layer is primarily request/response prompt assembly and tier-routing (`bot/ai.py`).
+- There is no explicit agent loop with:
+  - goal decomposition,
+  - tool planning policy,
+  - durable episodic memory,
+  - self-critique/evaluation pass,
+  - action-state reconciliation.
+- The scheduler can proactively publish, but it does not yet behave like a persistent decision-making agent.
+
+**Conclusion:** architecture supports **proactive posting**, but not robust **agentic reasoning workflows** yet.
+
+## 3) Async & Task Orchestration Analysis
+### Risks
+1. `clear_webhook()` wraps an async call in `asyncio.run` during the module import/start sequence; `asyncio.run` raises `RuntimeError` when an event loop is already running, and the call couples startup order to network I/O.
+2. `ai.py` uses synchronous Groq and Tavily client calls (`Groq`, `TavilyClient`) from functions consumed by async bot handlers; this can block the event loop and reduce throughput under load.
+3. Database layer is sync (`psycopg2` / `sqlite3`) and called widely from async handler paths, risking latency spikes and head-of-line blocking.
+
+### Orchestration maturity
+- Good: cooldown thresholds and queue constants for intelligence signals in `scheduler.py` indicate deliberate background pipeline behavior.
+- Missing: task-level isolation (worker queues), bounded concurrency controls (semaphores), and cancellation-aware retries around independent fan-out operations.
+
+## 4) Telegram Bot Scaling Concerns
+1. In-memory rate limiting map (`_user_timestamps`) is per-process only; with multiple replicas, a user whose requests land on different replicas can exceed the intended limit.
+2. Global daily AI cap is process-local (`_daily_ai_requests`); it is counted and reset per instance, not cluster-wide.
+3. Admin alerting swallows exceptions silently in `notify_admin`, which can hide prolonged telemetry failure.
+4. Logging to a local rotating file is fragile on ephemeral container storage unless logs are also shipped to a central sink.
+
+## 5) Security Audit
+### Positives
+- Environment variable enforcement for critical credentials.
+- URL sanitization restricts anchors to `http(s)` and escapes quotes.
+- HTML escaping helper for Telegram content.
+
+### Risks
+1. AI/tool output ingestion paths rely on best-effort sanitization; no centralized output policy engine for markdown/html templating invariants.
+2. Import-time side effects (network + signal wiring + webhook clear) increase startup attack/failure surface.
+3. Fallback to SQLite in production-like environments can create silent divergence in SQL behavior and constraints if Postgres is unavailable.
+
+## 6) Prompt & Memory Critique
+- Prompt stack is modular (`prompts/system.py`, `tiers.py`, `analysis.py`, `scheduler.py`), which is good.
+- Memory is primarily short conversation history + DB history; lacks semantic memory indexing, retrieval policy, and confidence/recency weighting.
+- History trimming is based on a character budget with no salience logic, so it can drop critical commitments or user preferences.
+
+**Future-proof verdict:** memory system is **serviceable short-term**, **not future-proof** for multi-agent personalized finance intelligence.
+
+## 7) Code Smells
+1. Large monolithic scheduler module with mixed layers (rendering, intelligence, transport).
+2. Global mutable state spread across modules.
+3. Broad `except Exception` blocks in critical paths reduce observability granularity.
+4. Partial duplication of compatibility helpers and legacy aliases that make behavior contracts fuzzy over time.
+
+## 8) Performance Bottlenecks
+1. Blocking provider/database calls in async handlers.
+2. Potential repeated external calls without shared cache layers for frequently requested market data.
+3. Heavy prompt payloads/history concatenation per request without structured caching for system/tier scaffolding.
+
+## 9) Production Readiness Scorecard
+- Reliability: **7/10**
+- Async safety: **5/10**
+- Security posture: **7/10**
+- Scalability: **6/10**
+- Observability: **6/10**
+- Agentic evolution readiness: **6/10**
+
+**Overall:** **6.8/10**
+
+## 10) Exact Refactor Suggestions (Incremental, Low-Risk)
+1. Add `bot/providers/` abstraction layer:
+   - `market_provider.py`, `llm_provider.py`, `news_provider.py` interfaces.
+   - Keep existing modules as adapters first; no behavior changes.
+2. Add async-safe DB access facade:
+   - Keep `database.py` API, but route handler calls through an `asyncio.to_thread` wrapper initially.
+3. Move process-local limits to DB/Redis-backed counters:
+   - rate limits, daily cap, and signal cooldowns should be cross-instance consistent.
+4. Split scheduler by concern:
+   - `scheduler/signals.py`, `scheduler/rendering.py`, `scheduler/delivery.py`, `scheduler/jobs.py`.
+5. Introduce task orchestration primitives:
+   - bounded semaphore per external provider,
+   - standardized timeout/retry policy,
+   - `asyncio.gather` for independent upstream calls.
+6. Implement memory service boundary:
+   - `memory/store.py` (episodic), `memory/retrieval.py` (ranked recall), `memory/profile.py` (user preference/state).
+7. Add structured event bus contracts:
+   - typed events for `market_signal_detected`, `narrative_shift`, `whale_activity`, `research_task_created`.
+
+## 11) Proactive Workflow & Agentic Evolution Verdict
+- **Supports proactive workflows today?** Partially yes (scheduled publishing + signal triggers).
+- **Evolving toward agentic behavior?** Yes, but still early-stage.
+- **Needed for true agentic platform:** durable planner/executor loop, tool registry abstraction, long-horizon memory, event-sourced intelligence graph, and multi-worker orchestration.
+
+## 12) Multi-Agent Finance Platform Scalability
+- Current folder layout is understandable but still single-runtime centric.
+- To scale into multi-agent finance architecture, add top-level domains:
+  - `agents/` (planner, researcher, monitor, publisher)
+  - `events/` (schemas + bus)
+  - `providers/` (data/tool adapters)
+  - `memory/` (profiles, embeddings, episodic logs)
+  - `workflows/` (pipelines and policies)
+
+This can be done incrementally while preserving current behavior.
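
Section 10, suggestion 2 proposes keeping the synchronous `database.py` API and routing handler calls through `asyncio.to_thread`. A minimal sketch of that facade follows, assuming Python 3.9+ (where `asyncio.to_thread` exists). The function names, the `users` table, and the `"free"` default are hypothetical stand-ins and are not taken from the repository.

```python
import asyncio
import sqlite3


def get_user_tier_sync(db_path: str, user_id: int) -> str:
    """Hypothetical stand-in for a blocking function in bot/database.py."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumed schema, created here only so the sketch is self-contained.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, tier TEXT)"
        )
        row = conn.execute(
            "SELECT tier FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else "free"
    finally:
        conn.close()


async def get_user_tier(db_path: str, user_id: int) -> str:
    """Async facade: runs the blocking call in a worker thread.

    The event loop stays free to serve other handlers while the query runs.
    """
    return await asyncio.to_thread(get_user_tier_sync, db_path, user_id)
```

The connection is opened and closed inside the worker thread, which sidesteps `sqlite3`'s same-thread check. This is a stopgap: the default thread pool is bounded, so a native async driver may still be worth evaluating later.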
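
Section 10, suggestion 5 lists three orchestration primitives: a bounded semaphore per provider, a standard timeout/retry policy, and `asyncio.gather` for independent upstream calls. The sketch below combines them under stated assumptions. `call_with_policy`, its defaults, and the two fake upstream coroutines are hypothetical; only the `asyncio` calls are real library API.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_policy(
    factory: Callable[[], Awaitable[T]],
    limiter: asyncio.Semaphore,
    timeout_s: float = 5.0,
    retries: int = 2,
    backoff_s: float = 0.1,
) -> T:
    """Run factory() under a concurrency cap, a timeout, and bounded retries.

    Only timeouts and connection errors are retried. asyncio.CancelledError
    is not caught, so task cancellation propagates immediately.
    """
    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            async with limiter:  # at most N in-flight calls per provider
                return await asyncio.wait_for(factory(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < retries:
                # Exponential backoff between attempts.
                await asyncio.sleep(backoff_s * (2 ** attempt))
    assert last_error is not None
    raise last_error


async def demo():
    # Semaphore is created inside the running loop for older Python versions.
    limiter = asyncio.Semaphore(2)
    attempts = {"n": 0}

    async def flaky_price() -> float:  # hypothetical upstream call
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise ConnectionError("simulated transient failure")
        return 42.0

    async def headlines() -> list:  # hypothetical upstream call
        return ["headline"]

    # Independent upstream calls fan out concurrently.
    price, news = await asyncio.gather(
        call_with_policy(flaky_price, limiter, backoff_s=0.01),
        call_with_policy(headlines, limiter),
    )
    return price, news, attempts["n"]
```

In practice each external provider would likely get its own semaphore so a slow provider cannot starve the others.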
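
Section 10, suggestion 3 moves process-local limits into a shared store so that all replicas see the same counters. The sketch below is a fixed-window counter, with `sqlite3` standing in for the shared Postgres or Redis backend. The `rate_limits` table, its columns, and `allow_request` are hypothetical names, and a fixed window is a technique chosen for brevity, not necessarily the algorithm the bot uses today.

```python
import sqlite3
import time
from typing import Optional


def init_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per (user, time window).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rate_limits ("
        " user_id INTEGER NOT NULL,"
        " window_start INTEGER NOT NULL,"
        " hits INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, window_start))"
    )


def allow_request(
    conn: sqlite3.Connection,
    user_id: int,
    limit: int,
    window_s: int,
    now: Optional[float] = None,
) -> bool:
    """Count one request and report whether it fits within the window limit."""
    now = time.time() if now is None else now
    window_start = int(now // window_s)
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT OR IGNORE INTO rate_limits (user_id, window_start, hits)"
            " VALUES (?, ?, 0)",
            (user_id, window_start),
        )
        conn.execute(
            "UPDATE rate_limits SET hits = hits + 1"
            " WHERE user_id = ? AND window_start = ?",
            (user_id, window_start),
        )
        hits = conn.execute(
            "SELECT hits FROM rate_limits"
            " WHERE user_id = ? AND window_start = ?",
            (user_id, window_start),
        ).fetchone()[0]
    return hits <= limit
```

Because the count lives in the store and not in a module-level dict, every replica that shares the database enforces the same limit. Rows for expired windows would need periodic cleanup, which is omitted here.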
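
Section 10, suggestion 7 asks for typed event contracts. One way to read it is sketched below with frozen dataclasses and a small in-process bus. The two event names come from the report, but every field name and the `EventBus` class are hypothetical. A production bus would probably be async and backed by a durable queue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, List


@dataclass(frozen=True)
class MarketSignalDetected:
    symbol: str        # hypothetical field
    change_pct: float  # hypothetical field


@dataclass(frozen=True)
class WhaleActivity:
    symbol: str        # hypothetical field
    amount_usd: float  # hypothetical field


class EventBus:
    """Minimal synchronous bus that dispatches on the event's exact type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable[[Any], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, event_type: type, handler: Callable[[Any], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Any) -> int:
        """Deliver the event to its subscribers and return how many received it."""
        handlers = self._handlers.get(type(event), [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

Frozen dataclasses make each event immutable and comparable by value, so signal detection, rendering, and delivery can be tested separately against the same contract.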