🗞️ Personalized Daily News Summarizer (RAG + LLM)

This project is a personalized news summarization app that uses Retrieval-Augmented Generation (RAG) to fetch, filter, and summarize daily news articles according to a user's interests and preferred summary style.

Check out the full report and deployment here!

🔍 Overview

- Pulls recent news articles using public APIs or RSS feeds (src/fetch_urls.py)
- Scrapes the full text of those articles (scrape_full_articles.py)
- Does not chunk articles, because chunking caused context issues within them; news articles tend to be short, so the context window is not a limiting factor (src/embed_articles.py)
- Retrieves only the most relevant articles based on your preferences
- Summarizes them using a customizable LLM-based summarizer (src/query_and_summarize.py)
- Re-runs the data-fetching scripts (fetch_urls -> embed_articles) automatically at midnight UTC to keep the data current (.github/workflows/refresh_news.yml)
- Collects user feedback (👍/👎 or rewrite requests) to adapt over time (src/query_and_summarize_personalized.py)
- Delivers clean, readable digests in your preferred tone and format (src/rlhf_finetune.py)
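
The retrieval step above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, the brute-force ranking stands in for a FAISS index, and the sample articles are invented. What it does show is the design choice described above: each article gets a single vector, with no chunking.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; an assumed stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, articles: list[dict], k: int = 2) -> list[dict]:
    """Rank whole articles (one vector per article, no chunking) against the query."""
    q = embed(query)
    ranked = sorted(articles, key=lambda art: cosine(q, embed(art["text"])), reverse=True)
    return ranked[:k]


# Invented sample data, for illustration only.
articles = [
    {"title": "Rates held", "text": "The central bank held interest rates steady as inflation cooled."},
    {"title": "Rover update", "text": "The Mars rover collected new rock samples this week."},
    {"title": "Election news", "text": "Voters head to the polls in the national election."},
]

top = retrieve("interest rates and inflation", articles, k=1)
print(top[0]["title"])
```

Swapping `embed` for a real embedding model and `retrieve` for a vector index would keep the same shape: one vector in, a ranked list of whole articles out.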

💻 Example Usage

1. News Summariser

*Note: all prompts were selected from top headlines at the time this project was created.

Example 1: Politics	Example 2: Economy	Example 3: Science

2. Custom Tone Adaptation

The following guidelines were used to tune the model's summary delivery:

Factual with Context: The summary must include key facts along with context, rather than just bare information.
Slightly Casual with Personality: The tone should have a hint of casualness and personality without sacrificing professionalism.
Data-Driven: The summary must cite clear data points or factual evidence.
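
One way guidelines like these could be passed to a summarizer is through a prompt template. The sketch below is an assumption about how that might look, not the project's actual template: `TONE_GUIDELINES` and `build_prompt` are hypothetical names, and the prompt wording is illustrative.

```python
# Guideline names come from the list above; the rule wording is paraphrased.
TONE_GUIDELINES = {
    "Factual with Context": "Include key facts along with context, not just bare information.",
    "Slightly Casual with Personality": "Add a hint of casualness and personality without sacrificing professionalism.",
    "Data-Driven": "Cite clear data points or factual evidence.",
}


def build_prompt(article_text: str, guidelines: dict[str, str] = TONE_GUIDELINES) -> str:
    """Assemble a summarization prompt that lists each tone guideline as a rule."""
    rules = "\n".join(f"- {name}: {rule}" for name, rule in guidelines.items())
    return (
        "Summarize the news article below.\n"
        f"Follow these tone guidelines:\n{rules}\n\n"
        f"Article:\n{article_text}\n\n"
        "Summary:"
    )


prompt = build_prompt("Some article text.")
print(prompt)
```

Keeping the guidelines in a dictionary means a different user style only needs a different dictionary, not a rewritten template.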

🔨 Roadblocks + Solutions

Roadblock	Solution
No inherent 'politics' category in NewsAPI	Created custom category using keyword search via the everything endpoint
LLM Output irrelevant to user query	Retrieved articles via semantic similarity (FAISS), applied a similarity threshold, and returned a fallback "unsure" answer if no context was strong enough
Bias transparency	Potential addition(s): Keep database of sources with bias scores, train new agent to scan for bias and generate score.
Dated articles	Potential addition(s): Implement system to weight articles via date, or omit after certain timeframe.

*Note: none of the potential additions have been implemented yet; they are listed to acknowledge gaps in the current application.
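
As an illustration of the first row in the table, the sketch below builds a keyword query for NewsAPI's `everything` endpoint. The keyword list and the `build_everything_url` helper are hypothetical (the project's actual keywords are not stated here), and the code only constructs the URL rather than sending a request.

```python
from urllib.parse import urlencode

NEWSAPI_EVERYTHING = "https://newsapi.org/v2/everything"

# Hypothetical keywords standing in for a custom 'politics' category.
POLITICS_KEYWORDS = ["election", "parliament", "congress", "senate"]


def build_everything_url(keywords: list[str], api_key: str, page_size: int = 20) -> str:
    """Build a request URL that matches articles containing any of the keywords."""
    params = {
        "q": " OR ".join(keywords),   # OR-joined keyword search
        "language": "en",
        "sortBy": "publishedAt",      # newest articles first
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWSAPI_EVERYTHING}?{urlencode(params)}"


url = build_everything_url(POLITICS_KEYWORDS, api_key="YOUR_API_KEY")
print(url)
```

The resulting URL can be fetched with any HTTP client; the placeholder key needs to be replaced with a real one first.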

⚙️ Features

🔎 Semantic Article Retrieval using vector similarity (FAISS / Chroma)
🤖 Custom Summarization Styles (bullet points, casual, academic, etc.)
📰 News API or RSS Integration (e.g., NewsAPI, The Guardian)
📥 User Preference Profiles (topics, style, summary length)
🧠 LLM Integration using OpenAI and transformer models
📊 Feedback Logging for tone/style fine-tuning [In Progress...]
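
Since feedback logging is still in progress, the following is only a sketch of one possible shape for it, assuming feedback is appended to a CSV file. The column names, the `log_feedback` helper, and the sample values are all hypothetical.

```python
import csv
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log schema; not taken from the project.
FIELDS = ["timestamp", "user_id", "article_url", "rating", "rewrite_request"]


def log_feedback(path: Path, user_id: str, article_url: str,
                 rating: str, rewrite_request: str = "") -> None:
    """Append one feedback row, writing the header if the file is new."""
    if rating not in {"up", "down"}:
        raise ValueError("rating must be 'up' or 'down'")
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "article_url": article_url,
            "rating": rating,
            "rewrite_request": rewrite_request,
        })


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "feedback.csv"
    log_feedback(log_path, "user_1", "https://example.com/a", "up")
    log_feedback(log_path, "user_1", "https://example.com/b", "down", "shorter, please")
    with log_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

print(len(rows), rows[1]["rewrite_request"])
```

An append-only log like this keeps the raw signal intact, so any later tone or style fine-tuning can decide how to weight thumbs and rewrite requests.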

🛠️ Tech Stack

Component	Tool
Language	Python
Retrieval	FAISS
Embeddings	sentence-transformers / OpenAI Embeddings
LLMs	transformers (t5-base)
Summarization	LangChain / Custom Prompt Templates
UI (Optional)	Flask
Data Storage	JSON / CSV

📥 Example Usage

python main.py --user_profile config/user_1.json
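
For context, here is a minimal sketch of how an entry point could parse the `--user_profile` flag and load the profile. Only the flag name comes from the command above; the profile keys (based on the topics, style, and summary length mentioned under Features), the default values, and the helper names are assumptions.

```python
import argparse
import json
from pathlib import Path

# Assumed profile fields and defaults; the real schema may differ.
DEFAULT_PROFILE = {"topics": [], "style": "bullet points", "summary_length": "short"}


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Personalized daily news summarizer")
    parser.add_argument("--user_profile", type=Path, required=True,
                        help="Path to a JSON user profile")
    return parser.parse_args(argv)


def with_defaults(profile: dict) -> dict:
    """Fill in any missing profile keys with default values."""
    return {**DEFAULT_PROFILE, **profile}


def load_profile(path: Path) -> dict:
    """Read a JSON profile from disk and apply defaults."""
    with path.open(encoding="utf-8") as f:
        return with_defaults(json.load(f))


def main(argv=None) -> dict:
    args = parse_args(argv)
    return load_profile(args.user_profile)
```

In a real script, `main()` would be called under an `if __name__ == "__main__":` guard and its result passed on to the retrieval and summarization steps.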

Roadblocks and Solutions

Problem: retrieval returned three articles, but only one was relevant. Solution: extract the distance score for each result and determine a relevance threshold.
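
The thresholding idea can be sketched as follows. This assumes distance scores where smaller means closer (as with an L2 index; for a similarity score the comparison would flip). The threshold of 0.8, the article IDs, the distances, and the function names are made up for illustration and would need tuning on real data.

```python
FALLBACK = "unsure"  # returned when no retrieved article is close enough


def filter_by_distance(hits: list[tuple[str, float]], max_distance: float) -> list[str]:
    """Keep only article IDs whose distance is within the threshold."""
    return [doc_id for doc_id, dist in hits if dist <= max_distance]


def select_context(hits: list[tuple[str, float]], max_distance: float):
    """Return the relevant article IDs, or the fallback if none qualify."""
    relevant = filter_by_distance(hits, max_distance)
    return relevant if relevant else FALLBACK


# Three retrieved articles with invented distances; only the first is close.
hits = [("article_1", 0.42), ("article_2", 1.31), ("article_3", 1.57)]

print(select_context(hits, max_distance=0.8))   # keeps only article_1
print(select_context(hits, max_distance=0.1))   # nothing close enough
```

A practical way to choose the threshold is to inspect the distances for a handful of queries with known relevant articles and pick a cutoff that separates the two groups.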

scottpitcher/personalised_news_summariser_rag_llm