Geek Logbook

Tech sea log book

Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL

It is common to assume that joining two small tables will produce a small result set. In practice, this assumption frequently fails. Even tables with only a handful of rows can generate unexpectedly large outputs due to cardinality effects.

This article explains why that happens, using a minimal reproducible example.


The Setup

Consider two tables:

Table A

A
A
B
C

Table B





A
A
C
B
NULL

Both tables are small. At first glance, one might expect a join to return approximately four rows. That intuition is incorrect.

SELECT *
FROM A
INNER JOIN B
  ON A.col = B.col;

What Happens?

  • A in Table A appears twice
  • A in Table B appears twice
  • SQL performs a pairwise match

Therefore:

  • Each A in A matches each A in B
  • Result: 2 × 2 = 4 rows for A alone

Additionally:

  • B matches once
  • C matches once
  • NULL does not match anything (because NULL = NULL is not true in SQL)

Final Row Count

ValueMatches
A4
B1
C1
Total6 rows

Two small tables. Six rows in the result. This is not an anomaly — it is expected relational algebra behavior.


Why This Happens

SQL JOINs are based on set theory and relational algebra, not row-by-row pairing.

When keys are not unique:

  • The join becomes a many-to-many relationship
  • SQL produces a partial Cartesian product

The formula is:

Rows produced per key = count_in_A × count_in_B

If a value appears:

  • 3 times in A
  • 5 times in B

You get: 3 × 5 = 15 rows

Even if both tables are small.


NULL Behavior

In SQL:

NULL = NULL

evaluates to UNKNOWN, not TRUE.

Therefore:

  • NULL values never match in a standard equality join
  • They silently disappear in INNER JOINs

To match NULLs intentionally:

ON A.col IS NOT DISTINCT FROM B.col

(Supported in PostgreSQL and some engines.)


Diagnosing Cardinality Problems

Before joining, always inspect key uniqueness.

Check duplicates

SELECT col, COUNT(*)
FROM B
GROUP BY col
HAVING COUNT(*) > 1;

Determine relationship type

  • One-to-one
  • One-to-many
  • Many-to-many

Only one-to-one joins preserve row counts.


Common Fixes

1. Deduplicate Before Joining

WITH b_dedup AS (
SELECT DISTINCT col
FROM B
)
SELECT *
FROM A
LEFT JOIN b_dedup
ON A.col = b_dedup.col;

Use when duplicates are irrelevant.


2. Select a Canonical Record

When duplicates are meaningful but you need one row:

WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY col ORDER BY updated_at DESC) AS rn
FROM B
)
SELECT *
FROM A
LEFT JOIN ranked r
ON A.col = r.col
AND r.rn = 1;

This enforces deterministic selection.


3. Redesign the Data Model

If a many-to-many relationship exists, model it explicitly with:

  • Junction tables
  • Aggregations
  • Precomputed summaries

Do not rely on accidental uniqueness.


Conceptual Takeaway

A JOIN does not “attach rows.”
It combines sets according to relational algebra rules.

Small input does not guarantee small output.
Cardinality drives row growth.

Understanding join multiplicity is fundamental in:

  • Data warehousing
  • Analytics engineering
  • OLTP systems
  • Dimensional modeling
  • Query optimization
Tags: