Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL
It is common to assume that joining two small tables will produce a small result set. In practice, this assumption frequently fails. Even tables with only a handful of rows can generate unexpectedly large outputs due to cardinality effects.
This article explains why that happens, using a minimal reproducible example.
The Setup
Consider two tables:
Table A
A
A
B
C
Table B
A
A
C
B
NULL
Both tables are small. At first glance, one might expect a join to return approximately four rows. That intuition is incorrect.
SELECT *
FROM A
INNER JOIN B
ON A.col = B.col;
What Happens?
Ain Table A appears twiceAin Table B appears twice- SQL performs a pairwise match
Therefore:
- Each
Ain A matches eachAin B - Result: 2 × 2 = 4 rows for A alone
Additionally:
Bmatches onceCmatches onceNULLdoes not match anything (becauseNULL = NULLis not true in SQL)
Final Row Count
| Value | Matches |
|---|---|
| A | 4 |
| B | 1 |
| C | 1 |
| Total | 6 rows |
Two small tables. Six rows in the result. This is not an anomaly — it is expected relational algebra behavior.
Why This Happens
SQL JOINs are based on set theory and relational algebra, not row-by-row pairing.
When keys are not unique:
- The join becomes a many-to-many relationship
- SQL produces a partial Cartesian product
The formula is:
Rows produced per key = count_in_A × count_in_B
If a value appears:
- 3 times in A
- 5 times in B
You get: 3 × 5 = 15 rows
Even if both tables are small.
NULL Behavior
In SQL:
NULL = NULL
evaluates to UNKNOWN, not TRUE.
Therefore:
NULLvalues never match in a standard equality join- They silently disappear in INNER JOINs
To match NULLs intentionally:
ON A.col IS NOT DISTINCT FROM B.col
(Supported in PostgreSQL and some engines.)
Diagnosing Cardinality Problems
Before joining, always inspect key uniqueness.
Check duplicates
SELECT col, COUNT(*)
FROM B
GROUP BY col
HAVING COUNT(*) > 1;
Determine relationship type
- One-to-one
- One-to-many
- Many-to-many
Only one-to-one joins preserve row counts.
Common Fixes
1. Deduplicate Before Joining
WITH b_dedup AS (
SELECT DISTINCT col
FROM B
)
SELECT *
FROM A
LEFT JOIN b_dedup
ON A.col = b_dedup.col;
Use when duplicates are irrelevant.
2. Select a Canonical Record
When duplicates are meaningful but you need one row:
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY col ORDER BY updated_at DESC) AS rn
FROM B
)
SELECT *
FROM A
LEFT JOIN ranked r
ON A.col = r.col
AND r.rn = 1;
This enforces deterministic selection.
3. Redesign the Data Model
If a many-to-many relationship exists, model it explicitly with:
- Junction tables
- Aggregations
- Precomputed summaries
Do not rely on accidental uniqueness.
Conceptual Takeaway
A JOIN does not “attach rows.”
It combines sets according to relational algebra rules.
Small input does not guarantee small output.
Cardinality drives row growth.
Understanding join multiplicity is fundamental in:
- Data warehousing
- Analytics engineering
- OLTP systems
- Dimensional modeling
- Query optimization