Ad – 728Γ—90
πŸ› οΈ Projects

E-Commerce Schema – SQL Project with Joins & Aggregation

In this intermediate project you'll design and build a realistic e-commerce database β€” users, products, categories, orders, and order line items β€” then write real analytical queries: top-selling products, revenue by category, customers with multiple orders, average order value, and monthly sales trends. This schema is also used in the follow-up Analytics Queries project.

⏱️ 60 min 🎯 Intermediate πŸ“… Updated 2026

Project Overview

An e-commerce platform needs to track:

  • Users β€” registered customers with email and signup date.
  • Categories β€” product taxonomy (Electronics, Books, Clothing, etc.).
  • Products β€” items for sale with price, category, and stock.
  • Orders β€” a customer's checkout event with a total and status.
  • Order Items β€” the individual lines within an order (product, quantity, unit price at time of purchase).

Schema Design

TablePKKey ColumnsForeign Keys
categoriesidname, parent_id (self-referential)parent_id β†’ categories
usersidemail, name, created_atβ€”
productsidname, price, category_id, stockcategory_id β†’ categories
ordersiduser_id, status, total, created_atuser_id β†’ users
order_items(order_id, product_id)quantity, unit_priceorder_id β†’ orders, product_id β†’ products

Create Tables

SQL – DDL
CREATE TABLE categories (
  id        INT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES categories(id)
);

CREATE TABLE users (
  id         INT PRIMARY KEY,
  name       VARCHAR(100) NOT NULL,
  email      VARCHAR(150) UNIQUE NOT NULL,
  created_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
  id          INT PRIMARY KEY,
  name        VARCHAR(200)  NOT NULL,
  category_id INT           NOT NULL,
  price       DECIMAL(10,2) NOT NULL,
  stock       INT           NOT NULL DEFAULT 0,
  created_at  TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (category_id) REFERENCES categories(id),
  CHECK (price >= 0),
  CHECK (stock >= 0)
);

CREATE TABLE orders (
  id         INT PRIMARY KEY,
  user_id    INT           NOT NULL,
  status     VARCHAR(20)   NOT NULL DEFAULT 'pending',
  total      DECIMAL(10,2) NOT NULL DEFAULT 0,
  created_at TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id),
  CHECK (status IN ('pending','processing','shipped','delivered','cancelled'))
);

CREATE TABLE order_items (
  order_id   INT           NOT NULL,
  product_id INT           NOT NULL,
  quantity   INT           NOT NULL,
  unit_price DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (order_id, product_id),
  FOREIGN KEY (order_id)   REFERENCES orders(id)   ON DELETE CASCADE,
  FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT,
  CHECK (quantity > 0)
);

-- Index FK columns for JOIN performance
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_orders_user       ON orders(user_id);
CREATE INDEX idx_orders_created    ON orders(created_at);
CREATE INDEX idx_items_product     ON order_items(product_id);

Sample Data

SQL – INSERT
INSERT INTO categories (id, name, parent_id) VALUES
  (1, 'Electronics', NULL),
  (2, 'Laptops',     1),
  (3, 'Phones',      1),
  (4, 'Books',       NULL),
  (5, 'Clothing',    NULL);

INSERT INTO users (id, name, email, created_at) VALUES
  (1, 'Alice',   'alice@example.com',  '2023-01-10 09:00:00'),
  (2, 'Bob',     'bob@example.com',    '2023-03-22 14:30:00'),
  (3, 'Carol',   'carol@example.com',  '2023-06-15 11:00:00'),
  (4, 'Dave',    'dave@example.com',   '2023-09-01 08:00:00'),
  (5, 'Eve',     'eve@example.com',    '2024-01-05 16:00:00');

INSERT INTO products (id, name, category_id, price, stock) VALUES
  (1,  'MacBook Pro 16"',    2, 2499.00, 15),
  (2,  'Dell XPS 15',        2, 1899.00, 20),
  (3,  'iPhone 15 Pro',      3,  999.00, 50),
  (4,  'Samsung Galaxy S24', 3,  849.00, 45),
  (5,  'Clean Code (book)',  4,   35.00, 100),
  (6,  'Dune (book)',        4,   18.00, 80),
  (7,  'T-Shirt (Medium)',   5,   25.00, 200),
  (8,  'Jeans (32x32)',      5,   65.00, 150),
  (9,  'iPad Pro 12.9"',     1,  1099.00, 30),
  (10, 'Wireless Keyboard',  1,   129.00, 75);

INSERT INTO orders (id, user_id, status, total, created_at) VALUES
  (1, 1, 'delivered',  2534.00, '2024-01-15 10:00:00'),
  (2, 1, 'delivered',  1899.00, '2024-02-20 11:00:00'),
  (3, 2, 'delivered',  1048.00, '2024-01-25 09:30:00'),
  (4, 2, 'shipped',     107.00, '2024-03-05 14:00:00'),
  (5, 3, 'delivered',  2499.00, '2024-02-10 08:00:00'),
  (6, 3, 'processing',  194.00, '2024-03-18 15:00:00'),
  (7, 4, 'delivered',   848.00, '2024-03-01 12:00:00'),
  (8, 5, 'pending',    1228.00, '2024-03-20 16:30:00');

INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
  (1,  1, 1, 2499.00), (1, 10, 1,  129.00),  -- Alice: MacBook + Keyboard
  (2,  2, 1, 1899.00),                        -- Alice: Dell XPS
  (3,  3, 1,  999.00), (3,  5, 1,   35.00), (3, 7, 1, 14.00),
  (4,  6, 2,   18.00), (4,  7, 2,   25.00), (4, 10, 1, 21.00),
  (5,  1, 1, 2499.00),                        -- Carol: MacBook
  (6,  5, 2,   35.00), (6,  6, 2,   18.00), (6, 8, 1, 65.00), (6, 7, 1, 25.00),
  (7,  4, 1,  849.00),
  (8,  9, 1, 1099.00), (8,  5, 2,   35.00), (8, 7, 2, 25.00), (8, 6, 1, 9.00);
Ad – 336Γ—280

JOIN Queries

Full order details with line items

SQL
SELECT
  o.id                           AS order_id,
  u.name                         AS customer,
  p.name                         AS product,
  c.name                         AS category,
  oi.quantity,
  oi.unit_price,
  oi.quantity * oi.unit_price    AS line_total,
  o.status,
  o.created_at
FROM orders o
JOIN users       u  ON u.id  = o.user_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products    p  ON p.id  = oi.product_id
JOIN categories  c  ON c.id  = p.category_id
ORDER BY o.id, p.name;

Customers with more than one order

SQL
SELECT
  u.name,
  u.email,
  COUNT(o.id)      AS order_count,
  SUM(o.total)     AS lifetime_value
FROM users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name, u.email
HAVING COUNT(o.id) > 1
ORDER BY lifetime_value DESC;

Analytics Queries

Top-selling products by revenue

SQL
SELECT
  p.name                                         AS product,
  c.name                                         AS category,
  SUM(oi.quantity)                               AS units_sold,
  SUM(oi.quantity * oi.unit_price)               AS total_revenue,
  RANK() OVER (ORDER BY SUM(oi.quantity * oi.unit_price) DESC) AS revenue_rank
FROM order_items oi
JOIN products   p ON p.id = oi.product_id
JOIN categories c ON c.id = p.category_id
JOIN orders     o ON o.id = oi.order_id
WHERE o.status != 'cancelled'
GROUP BY p.id, p.name, c.name
ORDER BY total_revenue DESC;

Revenue by category

SQL
SELECT
  c.name                                   AS category,
  COUNT(DISTINCT o.id)                     AS order_count,
  SUM(oi.quantity * oi.unit_price)         AS revenue,
  ROUND(
    100.0 * SUM(oi.quantity * oi.unit_price) /
    SUM(SUM(oi.quantity * oi.unit_price)) OVER (), 2
  ) AS pct_of_total
FROM order_items oi
JOIN products   p ON p.id = oi.product_id
JOIN categories c ON c.id = p.category_id
JOIN orders     o ON o.id = oi.order_id
WHERE o.status != 'cancelled'
GROUP BY c.id, c.name
ORDER BY revenue DESC;

Average order value and order frequency per customer

SQL
SELECT
  u.name,
  COUNT(o.id)                AS total_orders,
  ROUND(AVG(o.total), 2)    AS avg_order_value,
  ROUND(SUM(o.total), 2)    AS total_spent,
  MIN(o.created_at::DATE)   AS first_order,
  MAX(o.created_at::DATE)   AS last_order
FROM users u
LEFT JOIN orders o ON o.user_id = u.id AND o.status != 'cancelled'
GROUP BY u.id, u.name
ORDER BY total_spent DESC NULLS LAST;

Monthly sales trend

SQL – PostgreSQL
SELECT
  DATE_TRUNC('month', o.created_at)   AS month,
  COUNT(DISTINCT o.id)                AS orders,
  COUNT(DISTINCT o.user_id)           AS unique_customers,
  SUM(o.total)                        AS revenue,
  ROUND(AVG(o.total), 2)             AS avg_order_value
FROM orders o
WHERE o.status != 'cancelled'
GROUP BY 1
ORDER BY 1;

Summary

πŸ“‹ Summary

  • Designed a 5-table normalized e-commerce schema: categories, users, products, orders, order_items.
  • Used composite PK on order_items, CHECK constraints on status/price/stock, ON DELETE CASCADE for items.
  • Indexed all FK columns and the orders.created_at column for time-series queries.
  • Wrote 4-table JOINs to produce full order detail reports.
  • Analytical queries: top-selling products with window RANK, revenue by category with percentage share, monthly trend with DATE_TRUNC.
  • This dataset is used as-is in the next project: Analytics Queries.

FAQ

Why store unit_price in order_items instead of using products.price? +

Product prices change over time. If you look up the current price from the products table when querying historical orders, the order total will appear wrong after any price update. Storing unit_price in order_items captures the price at the moment of purchase β€” this is a standard e-commerce schema pattern. The products table price is used only for the current listing price; the order_items price is the historical record.

How does ON DELETE CASCADE on order_items work? +

If you delete a row from orders, all related rows in order_items are automatically deleted. This is appropriate here because order items have no meaning without the parent order. The opposite direction (products) uses ON DELETE RESTRICT β€” you cannot delete a product that has been ordered, preserving the historical record. Always think carefully about which direction CASCADE is appropriate.

What is the purpose of the parent_id column in categories? +

parent_id creates a self-referential hierarchy in the categories table. "Laptops" (parent_id = 1) and "Phones" (parent_id = 1) are subcategories of "Electronics" (parent_id = NULL). This allows multiple levels of categorization without creating separate tables for each level. To query the full category hierarchy, use a recursive CTE β€” the same pattern as the employee hierarchy in the Exercises page.

How should I handle currency in a real system? +

For a production system: (1) Store all amounts as integers in the smallest currency unit (cents/pence) to avoid floating-point rounding errors β€” $9.99 stored as 999. (2) Add a currency column or a global currency setting. (3) DECIMAL(10,2) is acceptable for learning projects and small-scale apps, but be aware of rounding at scale. Never use FLOAT for money in any production system β€” floating-point precision issues will eventually cause calculation errors.