Real-time meeting intelligence — transcription, screen reading, AI summaries, and a chatbot that knows your meeting.
FloatNote is a desktop-first meeting assistant that quietly runs in the background while you work. It captures your microphone, reads your screen during presentations, and turns everything into searchable, queryable meeting memory — powered by local Whisper transcription and HuggingFace LLMs.
| Feature | Description |
|---|---|
| 🎤 Live Transcription | Streams audio from your mic through OpenAI Whisper (base model) in real-time |
| 🖥️ Screen OCR | Captures slide content as it changes, extracting text and keywords automatically |
| 🧠 AI Summarization | Generates meeting summaries via BART/Pegasus on HuggingFace Inference API (or local fallback) |
| 💬 Meeting Chatbot | Ask questions about any past meeting — answers grounded in a FAISS vector store via RAG |
| 🗃️ Persistent Storage | All transcripts, OCR captures, and action items saved to SQLite via async SQLAlchemy |
| ⚡ Action Item Extraction | NLP pipeline (spaCy) detects tasks and assignees from spoken text |
| 🖥️ Electron Desktop App | Optional Electron wrapper for a native windowed experience |
FloatNote/
├── backend/
│   ├── main.py                      # FastAPI app + WebSocket server
│   ├── requirements.txt
│   ├── ai_modules/
│   │   ├── stt/
│   │   │   └── whisper_engine.py    # Audio capture + Whisper transcription
│   │   ├── ocr/
│   │   │   ├── ocr_processor.py     # Screen capture + Tesseract OCR
│   │   │   └── keyword_filter.py    # Keyword post-processing
│   │   ├── summarizer/
│   │   │   └── summarizer.py        # HuggingFace summarization (BART/Pegasus)
│   │   ├── chatbot/
│   │   │   └── chatbot.py           # LangChain RAG chatbot (FAISS + Qwen LLM)
│   │   └── utils/
│   │       └── nlp_processor.py     # spaCy NLP pipeline
│   └── database/
│       ├── models.py                # SQLAlchemy models (Meeting, Transcript, ActionItem)
│       ├── crud.py                  # Async database operations
│       └── view_db.py               # Database viewer utility
└── frontend/
    ├── react-app/                   # Vite + React 19 + Tailwind CSS UI
    │   └── src/App.jsx              # Main dashboard (WebSocket client)
    └── electron/
        └── main.js                  # Electron wrapper (loads localhost:5173)
- Python 3.10+
- Node.js 18+
- Tesseract OCR (for screen reading)
git clone https://github.com/Parth-Gupta-github/FloatNote.git
cd FloatNote
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r backend/requirements.txt
⚠️ First run downloads the Whisper `base` model (~150 MB) and the spaCy `en_core_web_sm` model automatically.
Windows:
Download the installer from the Tesseract at UB Mannheim wiki, or install via winget:

winget install UB-Mannheim.TesseractOCR

Then verify the path in `backend/ai_modules/ocr/ocr_processor.py`:

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

Create a `.env` file inside `backend/`:
# Required for AI summarization and chatbot
HUGGINGFACEHUB_API_TOKEN=hf_...
# Required for keyword filtering (Groq LLM)
GROQ_API_KEY=gsk_...

💡 A HuggingFace API token is required for summarization and the chatbot. Get one free at huggingface.co/settings/tokens.
💡 A Groq API key is required for LLM-powered keyword filtering. Without it, keywords fall back to simple deduplication.
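
The deduplication fallback mentioned above might look roughly like the sketch below. This is an illustration, not the code in `keyword_filter.py`: the function name `dedupe_keywords` is hypothetical, and comparing keywords case-insensitively is an assumption.

```python
from typing import Iterable, List


def dedupe_keywords(keywords: Iterable[str]) -> List[str]:
    """Drop blank and repeated keywords, keeping first-seen order."""
    seen = set()
    unique = []
    for keyword in keywords:
        cleaned = keyword.strip()
        key = cleaned.casefold()  # assumption: "Roadmap" and "roadmap" are the same keyword
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


print(dedupe_keywords(["roadmap", "Q3", "Roadmap", " q3 ", "budget"]))
```

A fallback like this needs no network access, so keyword output stays available when `GROQ_API_KEY` is missing.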
.\.venv\Scripts\Activate.ps1
python backend/main.py

The server starts at http://localhost:8000 and immediately begins listening to your microphone.
cd frontend/react-app
npm install
npm run dev

Open http://localhost:5173 in your browser.
cd frontend/electron
npm install
npm start

| Method | Endpoint | Description |
|---|---|---|
| WS | `/ws` | Real-time audio + OCR stream (connects and starts a meeting) |
| GET | `/meetings/latest/summary` | Summarize the most recent meeting |
| GET | `/meetings/{id}/summary` | Summarize a specific meeting by ID |
| POST | `/meetings/latest/chat` | Ask a question about the latest meeting |
| POST | `/meetings/{id}/chat` | Ask a question about a specific meeting |
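
As a rough client-side sketch, the chat endpoints listed above could be called from Python with only the standard library. The helper names `build_chat_request` and `ask` are hypothetical. The example assumes the request body is a JSON object with a `question` field and that the server replies with JSON; the response shape is not documented here.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default backend address from this README


def build_chat_request(question: str, meeting_id: Optional[int] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the latest meeting or for a specific meeting ID."""
    target = "latest" if meeting_id is None else str(meeting_id)
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/meetings/{target}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, meeting_id: Optional[int] = None) -> dict:
    """Send the question and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(build_chat_request(question, meeting_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `ask("What action items were assigned to me?")` would query the latest meeting, and passing `meeting_id=42` would target that meeting instead.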
{
  "question": "What action items were assigned to me?"
}

{
  "type": "connected",
  "meeting_id": 42
}

{
  "text": "Let's align on the Q3 roadmap.",
  "keywords": ["roadmap", "Q3"],
  "actions": [{ "task": "Share roadmap draft", "assignee": "MIC" }],
  "ocr": { "text": "Slide: Roadmap Overview", "keywords": ["roadmap"] },
  "meeting_id": 42
}

| Component | Default Model | Configurable |
|---|---|---|
| Transcription | `openai/whisper-base` (local) | Change model size in `whisper_engine.py` |
| Summarization | `facebook/bart-large-cnn` (HF API) | `HF_SUMMARIZER_REPO_ID` env var |
| Chatbot LLM | `Qwen/Qwen2.5-7B-Instruct` (HF API) | `HUGGINGFACE_CHAT_MODEL` env var |
| Keyword Filtering | `llama-3.3-70b-versatile` (Groq API) | Hardcoded in `keyword_filter.py` |
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` (local) | Hardcoded in `chatbot.py` |
| NLP / Action Items | `en_core_web_sm` (spaCy, local) | — |
Supported summarizer models:
- facebook/bart-large-cnn
- google/pegasus-xsum
- sshleifer/distilbart-cnn-12-6
FloatNote uses SQLite (backend/database/meeting_assistant.db) with async SQLAlchemy.
meetings
id, title, start_time, summary
transcripts
id, meeting_id → meetings.id, timestamp, text, keywords, source (MIC / OCR / SPEAKER_xx)
action_items
id, meeting_id → meetings.id, description, assignee, status
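
To make the schema above concrete, here is a small sketch that recreates the three tables in an in-memory SQLite database and queries action items for one meeting. The column names come from the listing above; the SQL types, the `pending` status value, and the helper `action_items_for` are assumptions made for illustration, and the real definitions live in `models.py`.

```python
import sqlite3

# Column names follow the schema above; the SQL types are assumptions.
SCHEMA = """
CREATE TABLE meetings (
    id INTEGER PRIMARY KEY, title TEXT, start_time TEXT, summary TEXT
);
CREATE TABLE transcripts (
    id INTEGER PRIMARY KEY,
    meeting_id INTEGER REFERENCES meetings(id),
    timestamp TEXT, text TEXT, keywords TEXT, source TEXT
);
CREATE TABLE action_items (
    id INTEGER PRIMARY KEY,
    meeting_id INTEGER REFERENCES meetings(id),
    description TEXT, assignee TEXT, status TEXT
);
"""


def action_items_for(conn, meeting_id):
    """Return (description, assignee, status) rows for one meeting."""
    rows = conn.execute(
        "SELECT description, assignee, status FROM action_items "
        "WHERE meeting_id = ? ORDER BY id",
        (meeting_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")  # throwaway database, not the real file
conn.executescript(SCHEMA)
conn.execute("INSERT INTO meetings (id, title) VALUES (42, 'Q3 planning')")
conn.execute(
    "INSERT INTO action_items (meeting_id, description, assignee, status) "
    "VALUES (42, 'Share roadmap draft', 'MIC', 'pending')"  # 'pending' is an assumed status
)
print(action_items_for(conn, 42))
```

The same query should work against `backend/database/meeting_assistant.db` if the real column names match the listing.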
To inspect the database directly:
python backend/database/view_db.py

| Variable | Default | Description |
|---|---|---|
| `HUGGINGFACEHUB_API_TOKEN` | — | Required. HF API token |
| `GROQ_API_KEY` | — | Required. Groq API token for keyword filtering |
| `HUGGINGFACE_CHAT_MODEL` | `Qwen/Qwen2.5-7B-Instruct` | Chat LLM repo ID |
| `HF_SUMMARIZER_REPO_ID` | `facebook/bart-large-cnn` | Summarizer model repo ID |
| `ENABLE_OCR` | `true` | Enable/disable screen capture |
| `OCR_INTERVAL_SECONDS` | `3.0` | How often to poll for screen changes |
| `OCR_CHANGE_THRESHOLD` | `0.02` | Minimum pixel-change ratio to trigger OCR |
| `HOST` | `0.0.0.0` | Backend bind host |
| `PORT` | `8000` | Backend bind port |
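
The optional variables and their defaults could be read along the lines of the sketch below. `Settings`, `load_settings`, and `should_run_ocr` are hypothetical names, not taken from the backend. The accepted spellings of "true" and the use of `>=` for the threshold comparison are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    enable_ocr: bool
    ocr_interval_seconds: float
    ocr_change_threshold: float
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the optional variables, falling back to the documented defaults."""
    # Assumption: any of these spellings counts as "true".
    enable = env.get("ENABLE_OCR", "true").strip().lower() in {"1", "true", "yes", "on"}
    return Settings(
        enable_ocr=enable,
        ocr_interval_seconds=float(env.get("OCR_INTERVAL_SECONDS", "3.0")),
        ocr_change_threshold=float(env.get("OCR_CHANGE_THRESHOLD", "0.02")),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
    )


def should_run_ocr(change_ratio: float, settings: Settings) -> bool:
    """Trigger OCR only when enough of the screen changed (assumed inclusive)."""
    return settings.enable_ocr and change_ratio >= settings.ocr_change_threshold
```

With the defaults, a frame in which 5% of pixels changed would trigger OCR, while a 1% change would not.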
Backend
- FastAPI + Uvicorn — async web server + WebSockets
- OpenAI Whisper — local speech-to-text
- Tesseract OCR + pytesseract — screen reading
- Groq API (`llama-3.3-70b-versatile`) — LLM-powered keyword filtering
- LangChain + FAISS — RAG chatbot
- HuggingFace Inference API — summarization + chat LLM
- spaCy — action item extraction + NLP
- SQLAlchemy (async) + SQLite — database
Frontend
- React 19 + Vite — UI framework
- Tailwind CSS — styling
- Electron — optional desktop wrapper
- Windows-only OCR path — the Tesseract path in `ocr_processor.py` defaults to a Windows path. Linux/macOS users must update it or ensure `tesseract` is on `PATH`.
- Single monitor — OCR captures monitor index `1` by default. Adjust `monitor_index` in `OCRProcessor` for multi-monitor setups.
- Max 3 WebSocket clients — concurrent client connections are capped to prevent resource exhaustion.
- HF API latency — summarization and chat responses depend on HuggingFace Inference API availability and may be slow on free tier.
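
For the Windows-only OCR path limitation, one possible workaround is to look for `tesseract` on `PATH` first and only then fall back to the Windows default. The sketch below is not part of `ocr_processor.py`; `resolve_tesseract_cmd` is a hypothetical helper built on the standard-library `shutil.which`.

```python
import shutil
from typing import Callable, Optional

# Windows default path from the setup instructions.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Prefer a tesseract binary found on PATH, else the Windows default path."""
    found = which("tesseract")
    return found if found else WINDOWS_DEFAULT


print(resolve_tesseract_cmd())
```

The returned value could then be assigned to `pytesseract.pytesseract.tesseract_cmd` in place of the hardcoded path.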