
Database normalization is the process of structuring tables to reduce data redundancy and prevent data anomalies. Without it, a single schema mistake can make your data inconsistent in ways that are difficult to detect and expensive to fix.
This guide walks through the three most commonly used normal forms - 1NF, 2NF, and 3NF - with concrete before-and-after table examples. If you are building your database fundamentals, pair this with Relational Databases Explained: Tables, Rows, and Keys, Primary Key vs Foreign Key Explained with Examples, and What Are Database Indexes? How They Improve Performance. For the full roadmap, use the Database Fundamentals series as the pillar page and the Database tag as the category page.
Table of Contents
Open Table of Contents
- What Is Database Normalization?
- Why Normalization Matters: Data Anomalies
- First Normal Form (1NF): Atomic Values
- Second Normal Form (2NF): No Partial Dependencies
- Third Normal Form (3NF): No Transitive Dependencies
- BCNF: One Step Further
- When to Denormalize
- Normalization at a Glance
- Interview Questions
- Conclusion
- References
- YouTube Videos
What Is Database Normalization?
Database normalization is a systematic approach to organizing columns and tables in a relational database. The goal is to eliminate redundant data (storing the same fact more than once) and ensure data dependencies are logical (only storing related data in a table).
The process is built on a series of rules called normal forms. Each normal form is cumulative - to be in 2NF, a table must already satisfy 1NF. To be in 3NF, it must already satisfy 2NF.
flowchart TD
A[Unnormalized Table] --> B[1NF\nAtomic values + unique rows]
B --> C[2NF\nNo partial dependencies]
C --> D[3NF\nNo transitive dependencies]
D --> E[BCNF\nEvery determinant is a candidate key]
In most production systems, reaching 3NF is the practical target.
Why Normalization Matters: Data Anomalies
Without normalization, poorly structured tables cause three types of anomalies.
Insert Anomaly
You cannot add data without also adding unrelated data.
| order_id | customer_name | customer_city | product_name | price |
|---|---|---|---|---|
| 101 | Alice | New York | Keyboard | 49.99 |
| 102 | Bob | Chicago | Monitor | 299.99 |
If you want to add a new customer before they place an order, you have no row to put them in because order_id is required.
Update Anomaly
Changing one fact requires updating multiple rows.
If Alice moves from New York to Austin, you must update every row where customer_name = 'Alice'. Missing even one row makes the data inconsistent.
Delete Anomaly
Deleting one fact accidentally removes another.
If you delete order 101 (the only order from Alice), you lose Alice’s customer record entirely.
All three anomalies share one root cause: unrelated facts are stored in the same table.
First Normal Form (1NF): Atomic Values
A table is in First Normal Form when:
- Every column holds atomic (indivisible) values - no lists, sets, or comma-separated strings in a cell.
- Every row is unique (there is a primary key).
Violation Example
| order_id | customer | products |
|---|---|---|
| 101 | Alice | Keyboard, Mouse |
| 102 | Bob | Monitor |
The products column holds multiple values in one cell. This violates 1NF.
1NF Fix: One Value Per Cell
Split the multi-value column into separate rows.
| order_id | customer | product |
|---|---|---|
| 101 | Alice | Keyboard |
| 101 | Alice | Mouse |
| 102 | Bob | Monitor |
Now every cell holds one atomic value and you can use (order_id, product) as a composite primary key. The table is in 1NF.
SQL for a 1NF Table
CREATE TABLE order_items (
order_id INT,
customer VARCHAR(100),
product VARCHAR(100),
PRIMARY KEY (order_id, product)
);
Second Normal Form (2NF): No Partial Dependencies
A table is in Second Normal Form when:
- It is already in 1NF.
- Every non-key column depends on the entire primary key, not just part of it.
Partial dependency only applies to tables with a composite primary key.
Violation Example
| order_id | product_id | product_name | quantity |
|---|---|---|---|
| 101 | P01 | Keyboard | 2 |
| 101 | P02 | Mouse | 1 |
| 102 | P01 | Keyboard | 3 |
The composite primary key is (order_id, product_id).
quantitydepends on bothorder_idandproduct_id- correct.product_namedepends only onproduct_id- a partial dependency. This violates 2NF.
If the keyboard’s name changes, you must update every row with product_id = P01.
2NF Fix: Move Partial Dependencies to Their Own Table
order_items (fact about each order line):
| order_id | product_id | quantity |
|---|---|---|
| 101 | P01 | 2 |
| 101 | P02 | 1 |
| 102 | P01 | 3 |
products (fact about each product):
| product_id | product_name |
|---|---|
| P01 | Keyboard |
| P02 | Mouse |
Now product_name lives in one place. Updating it updates it everywhere.
SQL for 2NF Tables
CREATE TABLE products (
product_id CHAR(3) PRIMARY KEY,
product_name VARCHAR(100) NOT NULL
);
CREATE TABLE order_items (
order_id INT,
product_id CHAR(3) REFERENCES products(product_id),
quantity INT NOT NULL,
PRIMARY KEY (order_id, product_id)
);
Third Normal Form (3NF): No Transitive Dependencies
A table is in Third Normal Form when:
- It is already in 2NF.
- Every non-key column depends only on the primary key, not on another non-key column.
A dependency of the form primary_key → column_A → column_B is called a transitive dependency. It violates 3NF because column_B depends on column_A, not directly on the primary key.
Violation Example
| employee_id | employee_name | department_id | department_name |
|---|---|---|---|
| E01 | Alice | D10 | Engineering |
| E02 | Bob | D20 | Marketing |
| E03 | Carol | D10 | Engineering |
The primary key is employee_id.
department_iddepends onemployee_id- correct.department_namedepends ondepartment_id, not onemployee_iddirectly - a transitive dependency. This violates 3NF.
If Engineering is renamed to Platform Engineering, every employee row with department_id = D10 must be updated.
3NF Fix: Move Transitive Dependencies to Their Own Table
employees (fact about each employee):
| employee_id | employee_name | department_id |
|---|---|---|
| E01 | Alice | D10 |
| E02 | Bob | D20 |
| E03 | Carol | D10 |
departments (fact about each department):
| department_id | department_name |
|---|---|
| D10 | Engineering |
| D20 | Marketing |
Now renaming a department requires updating exactly one row.
SQL for 3NF Tables
CREATE TABLE departments (
department_id CHAR(3) PRIMARY KEY,
department_name VARCHAR(100) NOT NULL
);
CREATE TABLE employees (
employee_id CHAR(3) PRIMARY KEY,
employee_name VARCHAR(100) NOT NULL,
department_id CHAR(3) REFERENCES departments(department_id)
);
BCNF: One Step Further
Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. A table is in BCNF when every determinant (every column that determines another column’s value) is a candidate key.
Most tables in 3NF are already in BCNF. The edge cases that separate the two are uncommon in practice (they arise from overlapping composite candidate keys). For most application databases, reaching 3NF is sufficient.
When to Denormalize
Normalization is a design best practice, not an absolute rule. There are situations where intentional denormalization makes sense:
Read-Heavy Analytical Workloads
Joining many normalized tables on every query can be expensive at scale. Data warehouses and reporting systems often denormalize into wide flat tables or use pre-aggregated summary tables to reduce join cost.
Caching and Derived Data
Storing a computed field (such as a running total) alongside the source data avoids recalculating it on every read. The tradeoff is that writes must keep the derived column in sync.
Performance Bottlenecks with Clear Measurement
If a profiled query is consistently slow because of a join and you have exhausted indexing options, denormalizing that specific path with careful documentation is a reasonable tradeoff. Do not denormalize speculatively.
The rule of thumb: normalize first, denormalize only where you have measured evidence that the join is the bottleneck.
Normalization at a Glance
| Normal Form | What It Eliminates | Key Requirement |
|---|---|---|
| 1NF | Repeating groups, multi-values | Atomic values; every row uniquely identified |
| 2NF | Partial dependencies | Non-key columns depend on the entire composite key |
| 3NF | Transitive dependencies | Non-key columns depend only on the primary key |
| BCNF | Remaining functional anomalies | Every determinant is a candidate key |
Interview Questions
1. What is database normalization and why is it important?
Normalization organizes a database to reduce redundancy and prevent update, insert, and delete anomalies. It produces schemas where each fact is stored once and changes propagate without inconsistency.
2. What is the difference between 1NF, 2NF, and 3NF?
1NF requires atomic values and a unique primary key. 2NF additionally requires that every non-key column depends on the full composite primary key (no partial dependencies). 3NF additionally requires that every non-key column depends directly on the primary key with no transitive dependencies through another non-key column.
3. What is a partial dependency?
A partial dependency occurs in a table with a composite primary key when a non-key column depends on only part of that key. It violates 2NF and is fixed by moving the partially dependent column to a separate table keyed by the relevant part of the composite key.
4. What is a transitive dependency?
A transitive dependency occurs when a non-key column depends on another non-key column rather than directly on the primary key. It violates 3NF and is fixed by extracting the transitively dependent columns into their own table.
5. When would you denormalize a database?
Denormalization is appropriate when profiling reveals that join cost is a real bottleneck, typically in read-heavy analytical systems or reporting queries. Always normalize first and denormalize only where you have measured evidence, not as a premature optimization.
6. What is the difference between 3NF and BCNF?
Both eliminate transitive dependencies, but BCNF is stricter: it requires every determinant to be a candidate key. Most 3NF tables are also in BCNF; the distinction matters only in rare schemas with overlapping composite candidate keys.
Conclusion
Database normalization is a foundational design skill. Starting with 1NF eliminates multi-valued cells. Moving to 2NF removes partial dependencies that cause redundant updates. Reaching 3NF eliminates transitive dependencies and produces a schema where each fact lives in exactly one place.
In practice, design to 3NF by default. Introduce denormalization deliberately and only where you have measurement data showing that the normalized design is a genuine bottleneck.
References
- Database Normalization: 1NF, 2NF, 3NF & BCNF Examples - DigitalOcean
- Database Normalization - Normal Forms 1NF 2NF 3NF Table Examples - freeCodeCamp
- Normalization in SQL (1NF-5NF): A Beginner’s Guide - DataCamp
YouTube Videos
-
“Database Normalization - 1NF, 2NF, 3NF, 4NF” - Decomplexify [https://www.youtube.com/watch?v=GFQaEYEc8_8]
-
“Database Normalization Explained | 1NF 2NF 3NF BCNF” - Socratica [https://www.youtube.com/watch?v=UrYLYV7WSHM]
-
“Learn Database Normalization - 1NF, 2NF, 3NF, 4NF, 5NF” - Database Star [https://www.youtube.com/watch?v=J-drts33N8g]