Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL

By - Geek Logbook
Posted on 2026-02-20
Posted in Programming

Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL

It is common to assume that joining two small tables will produce a small result set. In practice, this assumption frequently fails. Even tables with only a handful of rows can generate unexpectedly large outputs due to cardinality effects.

This article explains why that happens, using a minimal reproducible example.

The Setup

Consider two tables:

Table A

A
A
B
C

Table B

A
A
C
B
NULL

Both tables are small. At first glance, one might expect a join to return approximately four rows. That intuition is incorrect.

SELECT *
FROM A
INNER JOIN B
  ON A.col = B.col;

What Happens?

A in Table A appears twice
A in Table B appears twice
SQL performs a pairwise match

Therefore:

Each A in A matches each A in B
Result: 2 × 2 = 4 rows for A alone

Additionally:

B matches once
C matches once
NULL does not match anything (because NULL = NULL is not true in SQL)

Final Row Count

Value	Matches
A	4
B	1
C	1
Total	6 rows

Two small tables. Six rows in the result. This is not an anomaly — it is expected relational algebra behavior.

Why This Happens

SQL JOINs are based on set theory and relational algebra, not row-by-row pairing.

When keys are not unique:

The join becomes a many-to-many relationship
SQL produces a partial Cartesian product

The formula is:

Rows produced per key = count_in_A × count_in_B

If a value appears:

3 times in A
5 times in B

You get: 3 × 5 = 15 rows

Even if both tables are small.

NULL Behavior

In SQL:

NULL = NULL

evaluates to UNKNOWN, not TRUE.

Therefore:

NULL values never match in a standard equality join
They silently disappear in INNER JOINs

To match NULLs intentionally:

ON A.col IS NOT DISTINCT FROM B.col

(Supported in PostgreSQL and some engines.)

Diagnosing Cardinality Problems

Before joining, always inspect key uniqueness.

Check duplicates

SELECT col, COUNT(*)
FROM B
GROUP BY col
HAVING COUNT(*) > 1;

Determine relationship type

One-to-one
One-to-many
Many-to-many

Only one-to-one joins preserve row counts.

Common Fixes

1. Deduplicate Before Joining

WITH b_dedup AS (
  SELECT DISTINCT col
  FROM B
)
SELECT *
FROM A
LEFT JOIN b_dedup
  ON A.col = b_dedup.col;

Use when duplicates are irrelevant.

2. Select a Canonical Record

When duplicates are meaningful but you need one row:

WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY col ORDER BY updated_at DESC) AS rn
  FROM B
)
SELECT *
FROM A
LEFT JOIN ranked r
  ON A.col = r.col
 AND r.rn = 1;

This enforces deterministic selection.

3. Redesign the Data Model

If a many-to-many relationship exists, model it explicitly with:

Junction tables
Aggregations
Precomputed summaries

Do not rely on accidental uniqueness.

Conceptual Takeaway

A JOIN does not “attach rows.”
It combines sets according to relational algebra rules.

Small input does not guarantee small output.
Cardinality drives row growth.

Understanding join multiplicity is fundamental in:

Data warehousing
Analytics engineering
OLTP systems
Dimensional modeling
Query optimization

Tags:SQL

Geek Logbook

Recent Posts

Categories

Archives

Why Small Tables Can Explode: Understanding JOIN Cardinality in SQL

The Setup

Table A

Table B

What Happens?

Final Row Count

Why This Happens

NULL Behavior

Diagnosing Cardinality Problems

Check duplicates

Determine relationship type

Common Fixes

1. Deduplicate Before Joining

2. Select a Canonical Record

3. Redesign the Data Model

Conceptual Takeaway

Previous Article

Next Article