
RepoMind 🧠

Talk to any GitHub repository like it's a senior engineer who knows every file.


Live Demo

API (Cloud Run): https://repomind-backend-340292594504.us-central1.run.app/docs

Pre-indexed repo: fastapi/fastapi — 2526 files, 12514 chunks


What Is This?

RepoMind is a codebase-aware AI assistant. Paste any public GitHub URL and ask questions about the code in plain English. Get answers with direct citations to exact files and line numbers.

It also includes a research experiment comparing two architectures for code understanding (their core difference is sketched in code below):

  • RAG (Retrieval-Augmented Generation) — find relevant chunks, answer from those
  • Long-Context — send the entire repo to Gemini 1.5 Pro (2M token context window)

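In code, the difference comes down to how each engine assembles context for the model. A minimal sketch, assuming an illustrative Chunk type and ignoring prompt templates (the real engines live in backend/app/engines/):

from dataclasses import dataclass

@dataclass
class Chunk:
    path: str   # source file the chunk came from
    text: str   # the chunk's content

def build_rag_context(top_chunks: list[Chunk]) -> str:
    # RAG: the prompt holds only the k chunks nearest to the question's
    # embedding, i.e. a small, targeted slice of the repo.
    return "\n\n".join(f"# {c.path}\n{c.text}" for c in top_chunks)

def build_long_context(all_files: list[Chunk]) -> str:
    # Long-context: the prompt holds every file, leaning on Gemini 1.5 Pro's
    # 2M-token window instead of retrieval.
    return "\n\n".join(f"# {c.path}\n{c.text}" for c in all_files)
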
  • Read the non-technical explanation (docs/plain-english/)
  • Read the architecture deep-dive (docs/technical/)
  • Read the research experiment (EXPERIMENT.md)


Quick Start (Local Development)

Prerequisites

  • Docker + Docker Compose
  • Node.js 20+
  • A Google AI Studio API key (get one free)

1. Clone & configure

git clone https://github.com/saiprasanth-git/repomind
cd repomind

# Copy environment template
cp backend/.env.example backend/.env

# Edit backend/.env and add your API keys
nano backend/.env

2. Start the backend + database

docker compose up

This starts:

  • PostgreSQL 16 with pgvector on port 5432
  • FastAPI backend on port 8000
  • Swagger API docs at http://localhost:8000/docs
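
Once all three are up, a quick smoke test from Python (these health endpoints are listed under API Endpoints below):

import requests

print(requests.get("http://localhost:8000/health").json())     # liveness
print(requests.get("http://localhost:8000/health/db").json())  # database connectivity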

3. Start the frontend

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:5173


Project Structure

repomind/
├── backend/                    # Python + FastAPI
│   ├── app/
│   │   ├── api/routes/         # HTTP endpoints
│   │   ├── core/               # Config, settings
│   │   ├── db/                 # Database setup
│   │   ├── engines/            # RAG + Long-Context engines  [Phase 2]
│   │   ├── ingestion/          # Clone → Parse → Chunk → Embed → Store
│   │   ├── models/             # SQLAlchemy ORM models
│   │   └── schemas/            # Pydantic request/response types
│   ├── tests/                  # Pytest test suite
│   ├── Dockerfile
│   └── requirements.txt
│
├── frontend/                   # React + TypeScript + Vite  [Phase 3]
│   └── src/
│       ├── components/         # UI components
│       ├── pages/              # Route-level pages
│       ├── services/           # API client
│       └── stores/             # Zustand state
│
├── experiments/                # Research experiment  [Phase 4]
│   ├── scripts/                # Eval harness
│   ├── data/                   # Test questions + ground truth
│   └── results/                # Metrics, charts, analysis
│
├── docs/
│   ├── plain-english/          # Non-technical documentation
│   └── technical/              # Architecture, API reference, experiment
│
└── docker-compose.yml          # Local development environment

API Endpoints (Phase 1)

Method   Endpoint                       Description
GET      /health                        Liveness check
GET      /health/db                     Database connectivity check
GET      /api/v1/repos                  List all indexed repositories
POST     /api/v1/repos                  Start indexing a new repo
GET      /api/v1/repos/{id}             Get repo details
GET      /api/v1/repos/{id}/status      Poll ingestion progress
GET      /api/v1/repos/{id}/tree        Get file tree
DELETE   /api/v1/repos/{id}             Delete a repo
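
For example, kicking off ingestion and polling until it completes might look like the following; the request and response field names ("url", "id", "status") here are assumptions, so check the Swagger docs at /docs for the real schema:

import time
import requests

BASE = "http://localhost:8000"

# Start indexing ("url" is an assumed field name; see /docs for the schema).
repo = requests.post(f"{BASE}/api/v1/repos",
                     json={"url": "https://github.com/fastapi/fastapi"}).json()
repo_id = repo["id"]  # assumed response field

# Poll ingestion progress until a terminal state (assumed status values).
while True:
    status = requests.get(f"{BASE}/api/v1/repos/{repo_id}/status").json()
    print(status)
    if status.get("status") in ("ready", "failed"):
        break
    time.sleep(5)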

Query and patch endpoints added in Phase 2.


Architecture

GitHub URL
    │
    ▼
┌─────────────────────────────────────────┐
│          Ingestion Pipeline              │
│  Clone → Parse → Chunk → Embed → Store  │
└───────────────────┬─────────────────────┘
                    │
                    ▼
         PostgreSQL + pgvector
                    │
         ┌──────────┴──────────┐
         │                     │
    RAG Engine          Long-Context Engine
    (retrieval)         (full-repo context)
         │                     │
         └──────────┬──────────┘
                    │
                Gemini 1.5 Pro
                    │
              Answer + Sources
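
A condensed sketch of what each pipeline stage does; embed() and store() are hypothetical stand-ins, and the real implementation (including a smarter parser) lives in backend/app/ingestion/:

import subprocess
from pathlib import Path

def clone(url: str, dest: str = "/tmp/repo") -> Path:
    # Clone: shallow-clone the repo to keep ingestion fast.
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)
    return Path(dest)

def chunk(source: Path, max_lines: int = 80) -> list[str]:
    # Chunk: naive fixed-size splitting, for illustration only.
    lines = source.read_text(errors="ignore").splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def ingest(url: str) -> None:
    repo = clone(url)
    for path in repo.rglob("*.py"):        # Parse: walk source files
        for piece in chunk(path):
            vector = embed(piece)          # Embed: hypothetical embedding call
            store(path, piece, vector)     # Store: hypothetical pgvector insert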

Research Experiment

We ran 50 questions across 5 real open-source repositories to answer:

When does long-context prompting beat RAG? When does RAG win?

Key findings:

  • RAG wins on speed (3-5x faster) and cost (8-12x cheaper)
  • Long-context wins on accuracy for questions requiring cross-file reasoning
  • The crossover point sits at roughly 150k tokens of relevant context
  • Question type matters more than repo size

Full methodology and results: EXPERIMENT.md
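
The harness reduces to a loop over questions and engines; the engine interface (ask) and result fields below are assumptions, not the actual code in experiments/scripts/:

import time

def run_eval(questions: list[dict], engines: dict) -> list[dict]:
    # Run every question through every engine, recording latency per answer.
    results = []
    for q in questions:
        for name, engine in engines.items():
            start = time.perf_counter()
            answer = engine.ask(q["question"], repo_id=q["repo_id"])  # hypothetical API
            results.append({
                "engine": name,
                "question_id": q["id"],
                "latency_s": round(time.perf_counter() - start, 2),
                "answer": answer,
            })
    return results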


Built With

  • Stack: Python, FastAPI, React, TypeScript, PostgreSQL, pgvector, LangChain, Gemini 1.5 Pro, Docker, GCP Cloud Run

License

MIT — use it, learn from it, build on it.
