Skip to content

Rudresh99/SQL_DataWarehouse_Analytics_Project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

77 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Warehouse and Analytics Project

Welcome to the Data Warehouse and Analytics Project repository! 🚀 This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.


🏗️ Data Architecture

The data architecture for this project follows Medallion Architecture Bronze, Silver, and Gold layers: Data Architecture

  1. Bronze Layer: Stores raw data as-is from the source systems. Data is ingested from CSV Files into MYSQL Database.
  2. Silver Layer: This layer includes data cleansing, standardization, and normalization processes to prepare data for analysis.
  3. Gold Layer: Houses business-ready data modeled into a star schema required for reporting and analytics.

📖 Project Overview

This project involves:

  1. Data Architecture: Designing a Modern Data Warehouse Using Medallion Architecture Bronze, Silver, and Gold layers.
  2. ETL Pipelines: Extracting, transforming, and loading data from source systems into the warehouse.
  3. Data Modeling: Developing fact and dimension tables optimized for analytical queries.
  4. Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights.

🎯 This repository is an excellent resource for professionals and students looking to showcase expertise in:

  • SQL Development
  • Data Architect
  • Data Engineering
  • ETL Pipeline Developer
  • Data Modeling
  • Data Analytics

🚀 Project Requirements

Building the Data Warehouse (Data Engineering)

Objective

Develop a modern data warehouse using MySQL Workbench to consolidate sales data, enabling analytical reporting and informed decision-making.

Specifications

  • Data Sources : Import data from two source systems (ERP and CRM) provided as CSV files.
  • Data Quality: Cleanse and resolve data quality issues prior to analysis.
  • Integration: Combine both sources into a single, user-friendly data model designed for analytical queries.
  • Scope: Focus on the latest dataset only; historization of data is not required.
  • Documentation: Provide clear documentation of the data model to support both business stakeholders and analytics teams.

BI: Analytics & Reporting (Data Analysis)

SQL Data Analytics Repository

This is a collection of SQL scripts for data exploration, analytics, and reporting.

📊 Overview

Designed for data analysts and BI professionals, the repository provides practical SQL queries to efficiently explore, segment, and analyze data stored in relational databases.

Each script focuses on a specific analytical theme while demonstrating SQL best practices and real-world querying techniques.

🔑 Analytical Techniques Covered

It covers a wide range of analytical techniques, including:

  • Database exploration
  • Measures and metrics
  • Time-based trend analysis
  • Cumulative analytics
  • Segmentation
  • Other essential SQL-based analyses

Data Analysis Roadmap


📂 Repository Structure for Data Warehouse Project

data-warehouse-project/
│
├── datasets/                           # Raw datasets used for the project (ERP and CRM data)
│
├── docs/                               # Project documentation and architecture details                   
│   ├── data_architecture.drawio        # Draw.io file shows the project's architecture
│   ├── data_catalog.md                 # Catalog of datasets, including field descriptions and metadata
│   ├── data_flow.drawio                # Draw.io file for the data flow diagram
│   ├── data_models.drawio              # Draw.io file for data models (star schema)
│   ├── naming-conventions.md           # Consistent naming guidelines for tables, columns, and files
│
├── scripts/                            # SQL scripts for ETL and transformations
│   ├── bronze/                         # Scripts for extracting and loading raw data
│   ├── silver/                         # Scripts for cleaning and transforming data
│   ├── gold/                           # Scripts for creating analytical models
│
├── tests/                              # Test scripts and quality files
│
├── README.md                           # Project overview and instructions
├── LICENSE                             # License information for the repository
├── .gitignore                          # Files and directories to be ignored by Git
└── requirements.txt                    # Dependencies and requirements for the project

🛠️ Important Links & Tools:

Notion Link: http://incredible-yew-798.notion.site/Data-Warehouse-Project-With-Baraa-380645718ee480e98bb7e027648158fd

🌟 Special Thanks - Baraa Khatib Salkini 🙌🙌

For explaining all SQL concept and project in details and touching the real world Data Engineering and Data analysis projects like data warehousing, data segmentation, part to whole analysis and many more.

🛡️ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

🌟 About Me

Hi there 👋

I'm Rudresh Joshi,

I'm a Enthusiastic Data Engineer with 5+ years of experience designing and building scalable data platforms and ETL pipelines. I enjoy solving complex data problems by combining data engineering best practices with cloud technologies and modern analytics tools.

About

Building a modern data warehouse using Medallion Architecture with MySQL Workbench , including ETL Process, data modelling, and analytics.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages