Skip to content

Kalpana-Chaurasiya21/Project_MySQL_Excel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

E-Commerce Sales Intelligence Dashboard

End-to-End Data Analytics Project | MySQL + Excel


Project Overview

Built a complete sales intelligence system for a fictional Indian e-commerce company using MySQL for data modeling and analysis, and Excel for interactive dashboards. The project covers the full analyst workflow — schema design, data generation, SQL analysis, and executive-level reporting.

Dataset: 10,000 customers | 500 products | 43,722 orders | 76,378 order line items
Time Period: January 2022 – December 2024
Tools: MySQL 8.0, Excel, Python (data generation)


Business Problems Solved

Problem Analysis Done Key Finding
Which customers are about to churn? RFM Segmentation 729 customers classified as "Lost" with avg spend of ₹3L+
How much revenue is leaking? Returns & Cancellation Analysis 21% monthly revenue lost — ₹11Cr/month average
Which categories are most profitable? Margin Analysis Home & Kitchen has higher margin (46.67%) than Electronics (42.33%) despite lower revenue
Are we retaining customers? Cohort Retention Analysis Only ~10% customers return after Month 1
Which payment methods drive revenue? Payment Method Analysis UPI dominates at 41% of total revenue

Technical Stack

  • Database: MySQL 8.0
  • Analysis: SQL (CTEs, Window Functions, Subqueries)
  • Visualization: Microsoft Excel (Pivot Tables, Charts, Conditional Formatting)
  • Data Generation: Python (Faker, Pandas, NumPy)

Database Schema

customers        orders           order_items      products
-----------      ----------       -----------      ----------
customer_id  →   customer_id      item_id          product_id
customer_name    order_id     →   order_id     ←   product_id
email            order_date       product_id       product_name
city             status           quantity         category
state            payment_method   unit_price       cost_price
gender           discount_pct     returned         selling_price
age              delivery_date                     stock_quantity
signup_date
is_active

SQL Analyses Built

1. Monthly Revenue Trend with MoM Growth

WITH monthly_revenue AS (
    SELECT
        DATE_FORMAT(o.order_date, '%Y-%m') AS order_month,
        SUM(oi.quantity * oi.unit_price * (1 - o.discount_pct / 100)) AS revenue -- discount adjusted revenue
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.status = 'Completed' -- only count delivered orders
    GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
)
SELECT
    order_month,
    ROUND(revenue, 2) AS total_revenue,
    ROUND((revenue - LAG(revenue) OVER (ORDER BY order_month))
        / LAG(revenue) OVER (ORDER BY order_month) * 100, 2) AS mom_growth_pct -- month over month growth %
FROM monthly_revenue
ORDER BY order_month;

2. RFM Customer Segmentation

WITH rfm_base AS (
    SELECT
        c.customer_id,
        c.customer_name,
        DATEDIFF('2024-12-31', MAX(o.order_date)) AS recency_days, -- days since last order
        COUNT(DISTINCT o.order_id) AS frequency,                   -- total orders placed
        ROUND(SUM(oi.quantity * oi.unit_price), 2) AS monetary     -- total amount spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.status = 'Completed'
    GROUP BY c.customer_id, c.customer_name
),
rfm_scores AS (
    SELECT *,
        NTILE(5) OVER (ORDER BY recency_days ASC) AS r_score, -- lower recency = better
        NTILE(5) OVER (ORDER BY frequency DESC)   AS f_score, -- higher frequency = better
        NTILE(5) OVER (ORDER BY monetary DESC)    AS m_score  -- higher spend = better
    FROM rfm_base
)
SELECT *,
    CASE
        WHEN r_score >= 4 AND f_score >= 4 AND m_score >= 4 THEN 'Champion'
        WHEN r_score >= 3 AND f_score >= 3                  THEN 'Loyal Customer'
        WHEN r_score >= 4 AND f_score <= 2                  THEN 'New Customer'
        WHEN r_score <= 2 AND f_score >= 3                  THEN 'At Risk'
        WHEN r_score = 1  AND f_score = 1                   THEN 'Lost'
        ELSE 'Needs Attention'
    END AS segment
FROM rfm_scores;

3. Cohort Retention Analysis

WITH first_order AS (
    SELECT
        customer_id,
        MIN(order_date) AS cohort_date,
        DATE_FORMAT(MIN(order_date), '%Y-%m') AS cohort_month -- first purchase month
    FROM orders
    WHERE status = 'Completed'
    GROUP BY customer_id
),
cohort_data AS (
    SELECT
        f.customer_id,
        f.cohort_month,
        TIMESTAMPDIFF(MONTH, f.cohort_date, o.order_date) AS month_number -- months since first order
    FROM first_order f
    JOIN orders o ON f.customer_id = o.customer_id
    WHERE o.status = 'Completed'
)
SELECT
    cohort_month,
    month_number,
    COUNT(DISTINCT customer_id) AS active_customers
FROM cohort_data
WHERE month_number BETWEEN 0 AND 11
GROUP BY cohort_month, month_number
ORDER BY cohort_month, month_number;

Excel Dashboard — 7 Sheets

Sheet Content
Executive Summary KPI cards + 3 business recommendations
Revenue Trends Line chart — monthly revenue with MoM growth
Customer Segments RFM Pie chart — 6 behavioral segments
Category Performance Horizontal bar chart — profit by category
Cohort Heatmap Green-Red retention heatmap — 36 cohorts × 12 months
Payment Analysis Donut chart — revenue split by payment method
Leakage Analysis Red area chart — monthly revenue leakage %

Key Business Insights

1. Retention Crisis
Only ~10% of customers return after their first purchase. The cohort heatmap shows a sharp drop from Month_0 to Month_1 across all cohorts. Recommendation: implement a post-purchase loyalty program and targeted Month-1 re-engagement email campaign.

2. Revenue Leakage
An average of 21% monthly revenue is lost to returns and cancellations — approximately ₹11 Crore per month. Peak leakage was 27.57% in August 2022. Electronics category drives the highest return volume (2,341 orders). Recommendation: stricter return policy on Electronics + quality checks.

3. Hidden Profit Opportunity
Home & Kitchen has the highest profit margin (46.67%) but ranks 2nd in revenue. Increasing marketing spend on this category can improve overall profitability without needing volume increase in Electronics.


Project Structure

ecommerce-sales-intelligence/
│
├── sql/
│   ├── 01_schema.sql                    # database schema with indexes
│   ├── 02_data_generation.py            # synthetic data generator
│   ├── 03_import_data.sql               # CSV import scripts
│   └── 04_phase2_queries.sql            # all 8 business SQL queries
│
└── dashboard/
    └── ECommerce_Sales_Intelligence_Dashboard.xlsx

How to Run

  1. Install MySQL 8.0 and run 01_schema.sql to create the database
  2. Run python 02_data_generation.py to generate CSV files
  3. Run 03_import_data.sql to load data into MySQL
  4. Open 04_phase2_queries.sql to explore all analyses
  5. Open the Excel dashboard for visualizations

Built as a portfolio project to demonstrate end-to-end data analytics skills — schema design, SQL querying, customer segmentation, and business intelligence reporting.

About

End-to-end data analytics projects using MySQL and Excel | RFM Segmentation | Cohort Analysis | Sales Intelligence Dashboard

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors