139 changes: 139 additions & 0 deletions docs/architecture_review_2026-05-20.md
@@ -0,0 +1,139 @@
# StrideBot Architecture & Agentic Readiness Review (2026-05-20)

## Scope
Review axes:
- architecture
- AI-agent design
- async/task orchestration
- Telegram scaling
- security
- prompt/memory
- code smells
- performance
- production readiness
- refactor suggestions
- proactive workflow support
- agentic evolution
- memory future-proofing
- tool/data abstraction
- event system for continuous intelligence
- repo scalability for multi-agent finance platform

## Executive Summary
- **Production-readiness score:** **6.8/10**.
- **Strengths:** clear module boundaries, scheduler intelligence primitives, defensive sanitization for Telegram HTML, persistent DB tables for cooldown/content dedupe.
- **Key risks:** synchronous I/O inside async runtime paths, in-memory rate limiting/caps that do not scale horizontally, fragmented state between process memory and database, and limited abstraction for tool/provider orchestration.
- **Agentic trajectory:** currently **assisted-automation**, not full agentic autonomy. The scheduler and breaking signal pipeline provide a good foundation but lack durable planning/memory loops.

## 1) Architecture Review
### What is working
1. Central runtime entry and explicit wiring in `bot/bot.py` keeps startup control in one place.
2. Intelligence and scheduled behavior are separated in `bot/scheduler.py`.
3. Persistent state exists for cooldowns and content hashes in `bot/database.py` (`scheduler_cooldowns`, `posted_content`) to reduce repeat notifications across restarts.

### Architectural pressure points
1. **Tight coupling between orchestration and providers:** modules import each other directly (`scheduler -> ai/crypto/db`), so a change in any provider module can ripple into the scheduler and widens the blast radius of every change.
2. **Shared mutable process state:** rate-limit maps, AI cap counters, and cooldown caches are in-memory globals; this undermines multi-instance reliability.
3. **Mixed concerns inside single files:** `scheduler.py` owns signal logic, formatting, image rendering, and Telegram delivery in one module.
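
One way to loosen the `scheduler -> ai/crypto/db` coupling noted above is to have signal logic depend on a small interface and receive the concrete provider as an argument. The following is a minimal sketch under stated assumptions: `MarketProvider`, `StaticMarketProvider`, `get_price`, and `detect_price_move` are hypothetical names for illustration and do not exist in the repository.

```python
from dataclasses import dataclass
from typing import Protocol


class MarketProvider(Protocol):
    """Hypothetical interface; the real provider surface may differ."""

    def get_price(self, symbol: str) -> float: ...


@dataclass
class StaticMarketProvider:
    """Test double that serves prices from a dict instead of a network call."""

    prices: dict[str, float]

    def get_price(self, symbol: str) -> float:
        return self.prices[symbol]


def detect_price_move(
    provider: MarketProvider, symbol: str, reference: float, threshold_pct: float
) -> bool:
    """Return True when the price moved at least threshold_pct away from reference."""
    current = provider.get_price(symbol)
    change_pct = abs(current - reference) / reference * 100.0
    return change_pct >= threshold_pct
```

Because the signal function only sees the interface, it can be exercised without importing any provider module, and swapping a provider does not touch scheduler code.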

## 2) AI-Agent Design Review
- Current AI layer is primarily request/response prompt assembly and tier-routing (`bot/ai.py`).
- There is no explicit agent loop with:
  - goal decomposition,
  - tool planning policy,
  - durable episodic memory,
  - self-critique/evaluation pass,
  - action-state reconciliation.
- The scheduler can proactively publish, but it does not yet behave like a persistent decision-making agent.

**Conclusion:** architecture supports **proactive posting**, but not robust **agentic reasoning workflows** yet.

## 3) Async & Task Orchestration Analysis
### Risks
1. `clear_webhook()` wraps an async call in `asyncio.run` during the module import/start sequence. `asyncio.run` raises `RuntimeError` when called from a thread that already has a running event loop, and the call couples startup order to network I/O.
2. `ai.py` uses synchronous Groq and Tavily client calls (`Groq`, `TavilyClient`) from functions consumed by async bot handlers; each call blocks the event loop until it returns, which reduces throughput under load.
3. Database layer is sync (`psycopg2` / `sqlite3`) and called widely from async handler paths, risking latency spikes and head-of-line blocking.
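
A low-risk mitigation for the blocking calls listed above is to push each synchronous call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below assumes a hypothetical synchronous function `fetch_user_count` standing in for a `database.py` call; the table name and function names are illustrative only.

```python
import asyncio
import sqlite3


def fetch_user_count(db_path: str) -> int:
    """Stand-in for a synchronous database.py function (hypothetical)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY)")
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        conn.close()


async def fetch_user_count_async(db_path: str) -> int:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(fetch_user_count, db_path)


async def main() -> int:
    return await fetch_user_count_async(":memory:")
```

The connection is opened and closed inside the worker thread, which avoids sharing a `sqlite3` connection across threads. This keeps the existing synchronous API intact; a native async driver would be a later, larger step.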

### Orchestration maturity
- Good: cooldown thresholds and queue constants for intelligence signals in `scheduler.py` indicate deliberate background pipeline behavior.
- Missing: task-level isolation (worker queues), bounded concurrency controls (semaphores), and cancellation-aware retries around independent fan-out operations.
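
The missing primitives named above (bounded concurrency, timeouts, retries that do not swallow cancellation, and fan-out) can be combined in a small helper. This is a sketch, not the project's implementation: `call_with_policy`, `fetch_all`, and `fake_fetch` are hypothetical names, and the retryable exception set is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_policy(
    factory: Callable[[], Awaitable[T]],
    semaphore: asyncio.Semaphore,
    timeout_s: float = 5.0,
    retries: int = 2,
    backoff_s: float = 0.1,
) -> T:
    """Run one upstream call under a concurrency cap, a timeout, and bounded retries."""
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            async with semaphore:
                return await asyncio.wait_for(factory(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            # CancelledError is a BaseException, so it is not caught here
            # and task cancellation propagates immediately.
            last_error = exc
            if attempt < retries:
                await asyncio.sleep(backoff_s * (2 ** attempt))
    assert last_error is not None
    raise last_error


async def fetch_all() -> list[str]:
    semaphore = asyncio.Semaphore(2)  # at most two in-flight calls to this provider

    async def fake_fetch(name: str) -> str:
        await asyncio.sleep(0.01)  # stands in for a network round trip
        return f"{name}:ok"

    names = ["prices", "news", "whales"]
    # gather runs the independent calls concurrently and preserves input order.
    return await asyncio.gather(
        *(call_with_policy(lambda n=n: fake_fetch(n), semaphore) for n in names)
    )
```

The semaphore is released before the backoff sleep, so a retrying call does not hold a concurrency slot while it waits.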

## 4) Telegram Bot Scaling Concerns
1. The in-memory rate-limiting map (`_user_timestamps`) is per-process only; with multiple replicas, each replica counts only the requests it receives, so a user's effective limit grows with the number of replicas.
2. Global daily AI cap is process-local (`_daily_ai_requests`) and resets by instance/date, not cluster-wide.
3. Admin alerting swallows exceptions silently in `notify_admin`, which can hide prolonged telemetry failure.
4. Logging to a local rotating file is fragile in containers with ephemeral storage: logs are lost when the container is replaced unless they are also shipped to a centralized sink.
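
To illustrate how the per-process limits above could become cluster-wide, the sketch below keys a fixed-window counter on user and time window and delegates the increment to a shared store. `CounterStore`, `InMemoryCounterStore`, and `allow_request` are hypothetical names; the in-memory store is only a stand-in for a Redis- or Postgres-backed adapter, which this sketch does not implement.

```python
import time
from typing import Callable, Protocol


class CounterStore(Protocol):
    """Hypothetical shared-counter interface for a Redis or Postgres adapter."""

    def incr(self, key: str, ttl_s: int) -> int: ...


class InMemoryCounterStore:
    """Single-process stand-in used only for tests and local runs."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: dict[str, tuple[int, float]] = {}

    def incr(self, key: str, ttl_s: int) -> int:
        now = self._clock()
        count, expires_at = self._data.get(key, (0, now + ttl_s))
        if now >= expires_at:  # entry expired, start a fresh counter
            count, expires_at = 0, now + ttl_s
        count += 1
        self._data[key] = (count, expires_at)
        return count


def allow_request(
    store: CounterStore, user_id: int, limit: int, window_s: int, now: float
) -> bool:
    """Fixed-window limiter: every replica increments the same key per window."""
    window = int(now // window_s)
    key = f"rl:{user_id}:{window}"
    return store.incr(key, ttl_s=window_s) <= limit
```

With a shared backing store, every replica increments the same key, so the limit holds regardless of which instance handles the update. The same pattern would apply to the daily AI cap with a date-based key.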

## 5) Security Audit
### Positives
- Environment variable enforcement for critical credentials.
- URL sanitization restricts anchors to `http(s)` and escapes quotes.
- HTML escaping helper for Telegram content.
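
For readers unfamiliar with the sanitization pattern described above, here is a minimal sketch of an anchor builder that only emits links for `http(s)` URLs and escapes quotes in the attribute. `safe_anchor` is a hypothetical name; the repository's actual helpers may be structured differently.

```python
import html
from urllib.parse import urlsplit


def safe_anchor(url: str, label: str) -> str:
    """Return a Telegram-HTML anchor for http(s) URLs, else only the escaped label."""
    cleaned = url.strip()
    scheme = urlsplit(cleaned).scheme.lower()
    if scheme not in ("http", "https"):
        return html.escape(label)
    # quote=True also escapes " and ', so the URL cannot break out of href.
    href = html.escape(cleaned, quote=True)
    return f'<a href="{href}">{html.escape(label)}</a>'
```

Rejecting by scheme allowlist, rather than blocking known-bad schemes, means unexpected schemes such as `javascript:` or `data:` fall through to plain text by default.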

### Risks
1. AI/tool output ingestion paths rely on best-effort sanitization; no centralized output policy engine for markdown/html templating invariants.
2. Import-time side effects (network + signal wiring + webhook clear) increase startup attack/failure surface.
3. Falling back to SQLite in production-like environments when Postgres is unavailable can silently introduce differences in SQL behavior and constraint enforcement.

## 6) Prompt & Memory Critique
- Prompt stack is modular (`prompts/system.py`, `tiers.py`, `analysis.py`, `scheduler.py`), which is good.
- Memory is primarily short conversation history + DB history; lacks semantic memory indexing, retrieval policy, and confidence/recency weighting.
- History trimming is based on a character budget rather than on message salience, so it can drop critical commitments or user preferences simply because they are old.
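
As an illustration of what salience-aware trimming could look like, the sketch below keeps flagged messages first and then fills the remaining character budget with the newest unflagged ones. `Message`, its `pinned` flag, and `trim_history` are hypothetical; how a message gets marked as salient is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag for commitments or stated preferences


def trim_history(messages: list[Message], budget_chars: int) -> list[Message]:
    """Keep pinned messages first, then the newest unpinned ones that still fit."""
    keep: set[int] = set()
    used = 0
    # Pass 1: pinned messages, newest first, so salient facts survive trimming.
    # Pass 2: remaining messages, newest first, taking each one that still fits.
    for want_pinned in (True, False):
        for index in range(len(messages) - 1, -1, -1):
            msg = messages[index]
            if msg.pinned is want_pinned and used + len(msg.text) <= budget_chars:
                keep.add(index)
                used += len(msg.text)
    # Restore chronological order for the prompt.
    return [messages[i] for i in sorted(keep)]
```

The budget stays character-based, so this can replace a pure recency trim without changing prompt size limits.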

**Future-proof verdict:** memory system is **serviceable short-term**, **not future-proof** for multi-agent personalized finance intelligence.

## 7) Code Smells
1. Large monolithic scheduler module with mixed layers (rendering, intelligence, transport).
2. Global mutable state spread across modules.
3. Broad `except Exception` blocks in critical paths reduce observability granularity.
4. Partial duplication of compatibility helpers and legacy aliases that make behavior contracts fuzzy over time.

## 8) Performance Bottlenecks
1. Blocking provider/database calls in async handlers.
2. Potential repeated external calls without shared cache layers for frequently requested market data.
3. Heavy prompt payloads/history concatenation per request without structured caching for system/tier scaffolding.
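
A shared cache for frequently requested market data, as suggested by the second point above, can start as a small time-based cache in front of the upstream call. This is a sketch under assumptions: `TTLCache` and `get_or_fetch` are hypothetical names, and the class is neither thread-safe nor size-bounded.

```python
import time
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny time-based cache; not thread-safe and unbounded, so illustrative only."""

    def __init__(
        self, ttl_s: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[str, tuple[float, V]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], V]) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]  # still fresh: skip the upstream call
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

Injecting the clock keeps the expiry logic testable without sleeping. In a multi-replica deployment this cache would still be per-process, so it reduces upstream calls per instance rather than globally.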

## 9) Production Readiness Scorecard
- Reliability: **7/10**
- Async safety: **5/10**
- Security posture: **7/10**
- Scalability: **6/10**
- Observability: **6/10**
- Agentic evolution readiness: **6/10**

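**Overall:** **6.8/10** (note: the unweighted mean of the six axis scores above is about 6.2, so the overall figure is not a simple average).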

## 10) Exact Refactor Suggestions (Incremental, Low-Risk)
1. Add `bot/providers/` abstraction layer:
   - `market_provider.py`, `llm_provider.py`, `news_provider.py` interfaces.
   - Keep existing modules as adapters first; no behavior changes.
2. Add async-safe DB access facade:
   - Keep `database.py` API, but route handler calls through `asyncio.to_thread` wrapper initially.
3. Move process-local limits to DB/Redis-backed counters:
   - rate limits, daily cap, and signal cooldowns should be cross-instance consistent.
4. Split scheduler by concern:
   - `scheduler/signals.py`, `scheduler/rendering.py`, `scheduler/delivery.py`, `scheduler/jobs.py`.
5. Introduce task orchestration primitives:
   - bounded semaphore per external provider,
   - standardized timeout/retry policy,
   - `asyncio.gather` for independent upstream calls.
6. Implement memory service boundary:
   - `memory/store.py` (episodic), `memory/retrieval.py` (ranked recall), `memory/profile.py` (user preference/state).
7. Add structured event bus contracts:
   - typed events for `market_signal_detected`, `narrative_shift`, `whale_activity`, `research_task_created`.
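
The typed event contracts in suggestion 7 could be sketched as frozen dataclasses plus a minimal in-process bus. The event names follow the list above; the field names (`symbol`, `change_pct`, `amount_usd`) and the `EventBus` class are assumptions for illustration, not an existing API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MarketSignalDetected:
    symbol: str
    change_pct: float


@dataclass(frozen=True)
class WhaleActivity:
    symbol: str
    amount_usd: float


class EventBus:
    """In-process, synchronous bus keyed by event type."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[object], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> int:
        """Deliver the event to handlers of its exact type; return how many ran."""
        handlers = self._handlers.get(type(event), [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

Starting with an in-process bus keeps behavior unchanged while fixing the event schemas; the same dataclasses could later be serialized onto a durable queue.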

## 11) Proactive Workflow & Agentic Evolution Verdict
- **Supports proactive workflows today?** Partially yes (scheduled publishing + signal triggers).
- **Evolving toward agentic behavior?** Yes, but still early-stage.
- **Needed for true agentic platform:** durable planner/executor loop, tool registry abstraction, long-horizon memory, event-sourced intelligence graph, and multi-worker orchestration.

## 12) Multi-Agent Finance Platform Scalability
- Current folder layout is understandable but still single-runtime centric.
- To scale into a multi-agent finance architecture, add top-level domains:
  - `agents/` (planner, researcher, monitor, publisher)
  - `events/` (schemas + bus)
  - `providers/` (data/tool adapters)
  - `memory/` (profiles, embeddings, episodic logs)
  - `workflows/` (pipelines and policies)

This can be done incrementally while preserving current behavior.