Components of a Star Schema
-
Fact Table:
- The central table in the star schema.
- Contains quantitative data (measures or metrics) such as sales, revenue, or quantity.
- Each row in the fact table represents a specific event or transaction.
- Connected to dimension tables via foreign keys.
- A fact table for a retail store might store sales transactions:
Fact_Sales:Transaction_ID,Date_ID,Product_ID,Customer_ID,Store_ID,Quantity_Sold,Total_Amount.
-
Dimension Tables:
- Surround the fact table like the points of a star.
- Contain descriptive attributes (context or metadata) related to the facts.
- Used to filter, group, or label the data in the fact table.
Dim_Date:Date_ID,Date,Month,Quarter,Year,Day_of_Week.Dim_Product:Product_ID,Product_Name,Category,Brand,Price.Dim_Customer:Customer_ID,Customer_Name,City,State,Phone_Number.Dim_Store:Store_ID,Store_Name,City,Manager_Name.
Example: Star Schema for a Retail Store
Star Schema
Fact Table
Fact_Sales
| Transaction_ID | Date_ID | Product_ID | Customer_ID | Store_ID | Quantity_Sold | Total_Amount |
|---|---|---|---|---|---|---|
| 101 | 20250101 | 1 | 1001 | 501 | 2 | 5000 |
| 102 | 20250102 | 2 | 1002 | 502 | 1 | 1500 |
| 103 | 20250103 | 3 | 1003 | 503 | 5 | 1000 |
Dimension Tables
-
Dim_Date:Date_ID Date Month Quarter Year Day_of_Week 20250101 2025-01-01 January Q1 2025 Wednesday 20250102 2025-01-02 January Q1 2025 Thursday 20250103 2025-01-03 January Q1 2025 Friday -
Dim_Product:Product_ID Product_Name Category Brand Price 1 Saree Sarees Nalli 2500 2 Kurta-Pajama Ethnic Wear Fashion India 1500 3 Turmeric Powder Spices ABC 200 -
Dim_Customer:Customer_ID Customer_Name City State Phone_Number 1001 Priya Sharma Mumbai Maharashtra 9876543210 1002 Rajesh Patel Ahmedabad Gujarat 8765432109 1003 Anjali Singh Delhi Delhi 7654321098 -
Dim_Store:Store_ID Store_Name City Manager_Name 501 Nalli Silks Chennai Ramesh Kumar 502 Fashion India Bengaluru Sunita Reddy 503 ABC Spices Delhi Amit Sharma
How the Star Schema Works
-
Querying Data:
- Suppose you want to find the total sales of sarees in Mumbai for January 2025.
- The query would join the
Fact_Salestable with theDim_Date,Dim_Product, andDim_Customertables using their respective keys. - Example SQL Query:
-
Benefits:
- Simplified Queries: The star schema makes it easy to write and understand queries.
- Improved Performance: Optimized for read-heavy analytical workloads.
- Scalability: Can handle large datasets efficiently.
Advantages of Star Schema
- Query Performance: Fewer joins are required compared to normalized schemas, leading to faster queries.
- Simplicity: Easy to design and understand, even for non-technical users.
- Flexibility: New dimensions can be added without disrupting existing queries.
- Business-Friendly: Aligns well with how business users think about data (e.g., sales, customers, products).
Disadvantages of Star Schema
- Data Redundancy: Dimension tables may contain redundant data (e.g., repeating city names in
Dim_Customer). - Limited Flexibility for Complex Relationships: Not suitable for scenarios requiring many-to-many relationships between dimensions.
- Storage Overhead: Large dimension tables can consume significant storage space.