FloatNote 🎙️

Real-time meeting intelligence — transcription, screen reading, AI summaries, and a chatbot that knows your meeting.

FloatNote is a desktop-first meeting assistant that quietly runs in the background while you work. It captures your microphone, reads your screen during presentations, and turns everything into searchable, queryable meeting memory — powered by local Whisper transcription and HuggingFace LLMs.

✨ Features

| Feature | Description |
| --- | --- |
| 🎤 Live Transcription | Streams audio from your mic through OpenAI Whisper (`base` model) in real time |
| 🖥️ Screen OCR | Captures slide content as it changes, extracting text and keywords automatically |
| 🧠 AI Summarization | Generates meeting summaries via BART/Pegasus on HuggingFace Inference API (or local fallback) |
| 💬 Meeting Chatbot | Ask questions about any past meeting; answers are grounded in a FAISS vector store via RAG |
| 🗃️ Persistent Storage | All transcripts, OCR captures, and action items saved to SQLite via async SQLAlchemy |
| ⚡ Action Item Extraction | NLP pipeline (spaCy) detects tasks and assignees from spoken text |
| 🖥️ Electron Desktop App | Optional Electron wrapper for a native windowed experience |

🏗️ Architecture

FloatNote/
├── backend/
│   ├── main.py                        # FastAPI app + WebSocket server
│   ├── requirements.txt
│   ├── ai_modules/
│   │   ├── stt/
│   │   │   └── whisper_engine.py      # Audio capture + Whisper transcription
│   │   ├── ocr/
│   │   │   ├── ocr_processor.py       # Screen capture + Tesseract OCR
│   │   │   └── keyword_filter.py      # Keyword post-processing
│   │   ├── summarizer/
│   │   │   └── summarizer.py          # HuggingFace summarization (BART/Pegasus)
│   │   ├── chatbot/
│   │   │   └── chatbot.py             # LangChain RAG chatbot (FAISS + Qwen LLM)
│   │   └── utils/
│   │       └── nlp_processor.py       # spaCy NLP pipeline
│   └── database/
│       ├── models.py                  # SQLAlchemy models (Meeting, Transcript, ActionItem)
│       ├── crud.py                    # Async database operations
│       └── view_db.py                 # Database viewer utility
└── frontend/
    ├── react-app/                     # Vite + React 19 + Tailwind CSS UI
    │   └── src/App.jsx                # Main dashboard (WebSocket client)
    └── electron/
        └── main.js                    # Electron wrapper (loads localhost:5173)

🚀 Setup & Installation

Prerequisites

Python 3.10+
Node.js 18+
Tesseract OCR (for screen reading)

1. Clone the repo

git clone https://github.com/Parth-Gupta-github/FloatNote.git
cd FloatNote

2. Install Python dependencies

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r backend/requirements.txt

⚠️ First run downloads the Whisper base model (~150MB) and the spaCy en_core_web_sm model automatically.

3. Install Tesseract OCR

Windows:

Download the installer from the Tesseract at UB Mannheim wiki, or install via winget:

winget install UB-Mannheim.TesseractOCR

Then verify the path in backend/ai_modules/ocr/ocr_processor.py:

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
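
If the hardcoded Windows path does not match your machine, one option is to look for `tesseract` on `PATH` first and only then fall back to the default install location. The sketch below is an illustration, not FloatNote code: `resolve_tesseract_cmd` is a hypothetical helper name, and the way its result is wired into `ocr_processor.py` is an assumption.

```python
import shutil
from pathlib import Path
from typing import Optional

# Default install location quoted in this README; adjust if yours differs.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(default: str = WINDOWS_DEFAULT) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found.

    Hypothetical helper: prefers an executable on PATH, then the default path.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if Path(default).is_file():
        return default
    return None


cmd = resolve_tesseract_cmd()
if cmd is None:
    print("Tesseract not found; install it or set the path manually.")
else:
    # In ocr_processor.py this value would be assigned to
    # pytesseract.pytesseract.tesseract_cmd (assumed wiring).
    print(f"Using Tesseract at: {cmd}")
```

Checking `PATH` first means the same code can work on Linux and macOS, where Tesseract is usually installed through a package manager.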

4. Configure environment variables

Create a .env file inside backend/:

# Required for AI summarization and chatbot
HUGGINGFACEHUB_API_TOKEN=hf_...

# Needed only for LLM-powered keyword filtering (Groq); without it, keywords fall back to simple deduplication
GROQ_API_KEY=gsk_...

💡 A HuggingFace API token is required for summarization and the chatbot. Get one free at huggingface.co/settings/tokens.

💡 A Groq API key is required for LLM-powered keyword filtering. Without it, keywords fall back to simple deduplication.
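
To give a feel for what "simple deduplication" could look like, here is a minimal sketch. It assumes case-insensitive matching and order preservation; the actual fallback in `keyword_filter.py` may behave differently, and `dedupe_keywords` is a hypothetical name.

```python
from typing import Iterable, List


def dedupe_keywords(keywords: Iterable[str]) -> List[str]:
    """Drop blank and repeated keywords, comparing case-insensitively.

    The first spelling seen is kept and the original order is preserved.
    """
    seen = set()
    unique = []
    for raw in keywords:
        word = raw.strip()
        key = word.casefold()
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(word)
    return unique


print(dedupe_keywords(["roadmap", "Roadmap", "Q3", " q3 ", ""]))
# → ['roadmap', 'Q3']
```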

5. Start the backend

.\.venv\Scripts\Activate.ps1
python backend/main.py

The server starts at http://localhost:8000 and immediately begins listening to your microphone.

6. Start the frontend

cd frontend/react-app
npm install
npm run dev

Open http://localhost:5173 in your browser.

7. (Optional) Run as Electron desktop app

cd frontend/electron
npm install
npm start

📡 API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| `WS` | `/ws` | Real-time audio + OCR stream (connects and starts a meeting) |
| `GET` | `/meetings/latest/summary` | Summarize the most recent meeting |
| `GET` | `/meetings/{id}/summary` | Summarize a specific meeting by ID |
| `POST` | `/meetings/latest/chat` | Ask a question about the latest meeting |
| `POST` | `/meetings/{id}/chat` | Ask a question about a specific meeting |

Chat request body

{
  "question": "What action items were assigned to me?"
}
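
As a rough illustration of calling the chat endpoints from Python, the sketch below builds the request with the standard library only. The base URL comes from the setup steps; `build_chat_request` and `ask` are hypothetical helper names, and since the response schema is not documented here, the body is returned as raw text.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default backend address from the setup steps


def build_chat_request(question: str, meeting_id: Optional[int] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the chat endpoint (latest meeting if no ID)."""
    target = "latest" if meeting_id is None else str(meeting_id)
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/meetings/{target}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, meeting_id: Optional[int] = None) -> str:
    """Send the question to a running backend and return the raw response text."""
    req = build_chat_request(question, meeting_id)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

With the backend running, `ask("What action items were assigned to me?")` queries the latest meeting, and `ask("Who owns the roadmap draft?", meeting_id=42)` targets a specific one.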

WebSocket message format (incoming)

{
  "type": "connected",
  "meeting_id": 42
}

{
  "text": "Let's align on the Q3 roadmap.",
  "keywords": ["roadmap", "Q3"],
  "actions": [{ "task": "Share roadmap draft", "assignee": "MIC" }],
  "ocr": { "text": "Slide: Roadmap Overview", "keywords": ["roadmap"] },
  "meeting_id": 42
}
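
A client has to tell the two message shapes above apart. One way to do that, sketched here under the assumption that only the connection message carries a `type` field, is to parse each frame into a small dataclass. The class and function names are hypothetical, and treating `keywords`, `actions`, and `ocr` as optional is a guess.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Connected:
    """Sent once after the socket opens; carries the new meeting's ID."""
    meeting_id: int


@dataclass
class TranscriptUpdate:
    """A transcribed utterance plus any keywords, actions, and OCR data."""
    meeting_id: int
    text: str
    keywords: List[str] = field(default_factory=list)
    actions: List[dict] = field(default_factory=list)
    ocr: Optional[dict] = None


def parse_message(raw: str) -> Union[Connected, TranscriptUpdate]:
    """Decode one JSON frame into the matching message type."""
    data = json.loads(raw)
    if data.get("type") == "connected":
        return Connected(meeting_id=data["meeting_id"])
    return TranscriptUpdate(
        meeting_id=data["meeting_id"],
        text=data.get("text", ""),
        keywords=data.get("keywords", []),
        actions=data.get("actions", []),
        ocr=data.get("ocr"),
    )
```

Calling `parse_message('{"type": "connected", "meeting_id": 42}')` yields a `Connected` instance, while any frame without that `type` is treated as a transcript update.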

🤖 AI Models

| Component | Default Model | Configurable |
| --- | --- | --- |
| Transcription | `openai/whisper-base` (local) | Change model size in `whisper_engine.py` |
| Summarization | `facebook/bart-large-cnn` (HF API) | `HF_SUMMARIZER_REPO_ID` env var |
| Chatbot LLM | `Qwen/Qwen2.5-7B-Instruct` (HF API) | `HUGGINGFACE_CHAT_MODEL` env var |
| Keyword Filtering | `llama-3.3-70b-versatile` (Groq API) | Hardcoded in `keyword_filter.py` |
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` (local) | Hardcoded in `chatbot.py` |
| NLP / Action Items | `en_core_web_sm` (spaCy, local) | n/a |

Supported summarizer models:

facebook/bart-large-cnn
google/pegasus-xsum
sshleifer/distilbart-cnn-12-6

🗄️ Database Schema

FloatNote uses SQLite (backend/database/meeting_assistant.db) with async SQLAlchemy.

meetings
  id, title, start_time, summary

transcripts
  id, meeting_id → meetings.id, timestamp, text, keywords, source (MIC / OCR / SPEAKER_xx)

action_items
  id, meeting_id → meetings.id, description, assignee, status

To inspect the database directly:

python backend/database/view_db.py
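
For ad hoc queries, the standard `sqlite3` module is enough. The sketch below recreates the three tables in memory so it runs without the real database; the column types are assumptions (the README lists only column names), and `transcript_lines` is a hypothetical helper. To query real data, point `sqlite3.connect` at `backend/database/meeting_assistant.db` and skip the schema and insert steps.

```python
import sqlite3

# Column names follow the schema above; the types are assumed for illustration.
SCHEMA = """
CREATE TABLE meetings (
    id INTEGER PRIMARY KEY, title TEXT, start_time TEXT, summary TEXT
);
CREATE TABLE transcripts (
    id INTEGER PRIMARY KEY,
    meeting_id INTEGER REFERENCES meetings(id),
    timestamp TEXT, text TEXT, keywords TEXT, source TEXT
);
CREATE TABLE action_items (
    id INTEGER PRIMARY KEY,
    meeting_id INTEGER REFERENCES meetings(id),
    description TEXT, assignee TEXT, status TEXT
);
"""


def transcript_lines(conn: sqlite3.Connection, meeting_id: int) -> list:
    """Return one '[SOURCE] text' line per transcript row, in insertion order."""
    rows = conn.execute(
        "SELECT source, text FROM transcripts WHERE meeting_id = ? ORDER BY id",
        (meeting_id,),
    )
    return [f"[{source}] {text}" for source, text in rows]


def demo() -> list:
    """Build an in-memory copy of the schema with two sample rows and query it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO meetings (id, title) VALUES (42, 'Roadmap sync')")
    conn.executemany(
        "INSERT INTO transcripts (meeting_id, text, source) VALUES (?, ?, ?)",
        [(42, "Let's align on the Q3 roadmap.", "MIC"),
         (42, "Slide: Roadmap Overview", "OCR")],
    )
    return transcript_lines(conn, 42)


for line in demo():
    print(line)
```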

⚙️ Configuration Reference

| Variable | Default | Description |
| --- | --- | --- |
| `HUGGINGFACEHUB_API_TOKEN` | (none) | Required. HF API token for summarization and the chatbot |
| `GROQ_API_KEY` | (none) | Groq API key. Needed only for LLM-powered keyword filtering; without it, keywords fall back to simple deduplication |
| `HUGGINGFACE_CHAT_MODEL` | `Qwen/Qwen2.5-7B-Instruct` | Chat LLM repo ID |
| `HF_SUMMARIZER_REPO_ID` | `facebook/bart-large-cnn` | Summarizer model repo ID |
| `ENABLE_OCR` | `true` | Enable/disable screen capture |
| `OCR_INTERVAL_SECONDS` | `3.0` | How often to poll for screen changes |
| `OCR_CHANGE_THRESHOLD` | `0.02` | Minimum pixel-change ratio to trigger OCR |
| `HOST` | `0.0.0.0` | Backend bind host |
| `PORT` | `8000` | Backend bind port |

🛠️ Tech Stack

Backend

FastAPI + Uvicorn — async web server + WebSockets
OpenAI Whisper — local speech-to-text
Tesseract OCR + pytesseract — screen reading
Groq API (llama-3.3-70b-versatile) — LLM-powered keyword filtering
LangChain + FAISS — RAG chatbot
HuggingFace Inference API — summarization + chat LLM
spaCy — action item extraction + NLP
SQLAlchemy (async) + SQLite — database

Frontend

React 19 + Vite — UI framework
Tailwind CSS — styling
Electron — optional desktop wrapper

🐛 Known Issues & Limitations

Windows-only OCR path — the Tesseract path in ocr_processor.py defaults to a Windows path. Linux/macOS users must update it or ensure tesseract is on PATH.
Single monitor — OCR captures monitor index 1 by default. Adjust monitor_index in OCRProcessor for multi-monitor setups.
Max 3 WebSocket clients — concurrent client connections are capped to prevent resource exhaustion.
HF API latency — summarization and chat responses depend on HuggingFace Inference API availability and may be slow on free tier.
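
To illustrate the client cap mentioned above, here is a minimal sketch of a connection limiter. It is an assumption about how such a cap could work, not the code in `main.py`: `ConnectionLimiter` is a hypothetical class, and a real server would call it from the WebSocket accept and disconnect handlers.

```python
class ConnectionLimiter:
    """Track active clients and refuse new ones once the cap is reached."""

    def __init__(self, max_clients: int = 3):
        self.max_clients = max_clients
        self._active = set()

    def try_add(self, client_id: str) -> bool:
        """Register a client; return False if the cap is already reached."""
        if client_id in self._active:
            return True  # already registered, nothing to do
        if len(self._active) >= self.max_clients:
            return False
        self._active.add(client_id)
        return True

    def remove(self, client_id: str) -> None:
        """Free a slot when a client disconnects (no error if unknown)."""
        self._active.discard(client_id)


limiter = ConnectionLimiter(max_clients=3)
accepted = [limiter.try_add(f"client-{i}") for i in range(4)]
print(accepted)  # → [True, True, True, False]
```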
