Built a complete sales intelligence system for a fictional Indian e-commerce company using MySQL for data modeling and analysis, and Excel for interactive dashboards. The project covers the full analyst workflow — schema design, data generation, SQL analysis, and executive-level reporting.
Dataset: 10,000 customers | 500 products | 43,722 orders | 76,378 order line items
Time Period: January 2022 – December 2024
Tools: MySQL 8.0, Excel, Python (data generation)
| Problem | Analysis Done | Key Finding |
|---|---|---|
| Which customers are about to churn? | RFM Segmentation | 729 customers classified as "Lost" with avg spend of ₹3L+ |
| How much revenue is leaking? | Returns & Cancellation Analysis | 21% monthly revenue lost — ₹11Cr/month average |
| Which categories are most profitable? | Margin Analysis | Home & Kitchen has higher margin (46.67%) than Electronics (42.33%) despite lower revenue |
| Are we retaining customers? | Cohort Retention Analysis | Only ~10% customers return after Month 1 |
| Which payment methods drive revenue? | Payment Method Analysis | UPI dominates at 41% of total revenue |
- Database: MySQL 8.0
- Analysis: SQL (CTEs, Window Functions, Subqueries)
- Visualization: Microsoft Excel (Pivot Tables, Charts, Conditional Formatting)
- Data Generation: Python (Faker, Pandas, NumPy)
customers orders order_items products
----------- ---------- ----------- ----------
customer_id → customer_id item_id product_id
customer_name order_id → order_id ← product_id
email order_date product_id product_name
city status quantity category
state payment_method unit_price cost_price
gender discount_pct returned selling_price
age delivery_date stock_quantity
signup_date
is_active
WITH monthly_revenue AS (
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS order_month,
SUM(oi.quantity * oi.unit_price * (1 - o.discount_pct / 100)) AS revenue -- discount adjusted revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'Completed' -- only count delivered orders
GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
)
SELECT
order_month,
ROUND(revenue, 2) AS total_revenue,
ROUND((revenue - LAG(revenue) OVER (ORDER BY order_month))
/ LAG(revenue) OVER (ORDER BY order_month) * 100, 2) AS mom_growth_pct -- month over month growth %
FROM monthly_revenue
ORDER BY order_month;WITH rfm_base AS (
SELECT
c.customer_id,
c.customer_name,
DATEDIFF('2024-12-31', MAX(o.order_date)) AS recency_days, -- days since last order
COUNT(DISTINCT o.order_id) AS frequency, -- total orders placed
ROUND(SUM(oi.quantity * oi.unit_price), 2) AS monetary -- total amount spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'Completed'
GROUP BY c.customer_id, c.customer_name
),
rfm_scores AS (
SELECT *,
NTILE(5) OVER (ORDER BY recency_days ASC) AS r_score, -- lower recency = better
NTILE(5) OVER (ORDER BY frequency DESC) AS f_score, -- higher frequency = better
NTILE(5) OVER (ORDER BY monetary DESC) AS m_score -- higher spend = better
FROM rfm_base
)
SELECT *,
CASE
WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champion'
WHEN r_score >= 3 AND f_score >= 3 THEN 'Loyal Customer'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customer'
WHEN r_score <= 2 AND f_score >= 3 THEN 'At Risk'
WHEN r_score = 1 AND f_score = 1 THEN 'Lost'
ELSE 'Needs Attention'
END AS segment
FROM rfm_scores;WITH first_order AS (
SELECT
customer_id,
MIN(order_date) AS cohort_date,
DATE_FORMAT(MIN(order_date), '%Y-%m') AS cohort_month -- first purchase month
FROM orders
WHERE status = 'Completed'
GROUP BY customer_id
),
cohort_data AS (
SELECT
f.customer_id,
f.cohort_month,
TIMESTAMPDIFF(MONTH, f.cohort_date, o.order_date) AS month_number -- months since first order
FROM first_order f
JOIN orders o ON f.customer_id = o.customer_id
WHERE o.status = 'Completed'
)
SELECT
cohort_month,
month_number,
COUNT(DISTINCT customer_id) AS active_customers
FROM cohort_data
WHERE month_number BETWEEN 0 AND 11
GROUP BY cohort_month, month_number
ORDER BY cohort_month, month_number;| Sheet | Content |
|---|---|
| Executive Summary | KPI cards + 3 business recommendations |
| Revenue Trends | Line chart — monthly revenue with MoM growth |
| Customer Segments | RFM Pie chart — 6 behavioral segments |
| Category Performance | Horizontal bar chart — profit by category |
| Cohort Heatmap | Green-Red retention heatmap — 36 cohorts × 12 months |
| Payment Analysis | Donut chart — revenue split by payment method |
| Leakage Analysis | Red area chart — monthly revenue leakage % |
1. Retention Crisis
Only ~10% of customers return after their first purchase. The cohort heatmap shows a sharp drop from Month_0 to Month_1 across all cohorts. Recommendation: implement a post-purchase loyalty program and targeted Month-1 re-engagement email campaign.
2. Revenue Leakage
An average of 21% monthly revenue is lost to returns and cancellations — approximately ₹11 Crore per month. Peak leakage was 27.57% in August 2022. Electronics category drives the highest return volume (2,341 orders). Recommendation: stricter return policy on Electronics + quality checks.
3. Hidden Profit Opportunity
Home & Kitchen has the highest profit margin (46.67%) but ranks 2nd in revenue. Increasing marketing spend on this category can improve overall profitability without needing volume increase in Electronics.
ecommerce-sales-intelligence/
│
├── sql/
│ ├── 01_schema.sql # database schema with indexes
│ ├── 02_data_generation.py # synthetic data generator
│ ├── 03_import_data.sql # CSV import scripts
│ └── 04_phase2_queries.sql # all 8 business SQL queries
│
└── dashboard/
└── ECommerce_Sales_Intelligence_Dashboard.xlsx
- Install MySQL 8.0 and run
01_schema.sqlto create the database - Run
python 02_data_generation.pyto generate CSV files - Run
03_import_data.sqlto load data into MySQL - Open
04_phase2_queries.sqlto explore all analyses - Open the Excel dashboard for visualizations
Built as a portfolio project to demonstrate end-to-end data analytics skills — schema design, SQL querying, customer segmentation, and business intelligence reporting.