
#JourneyToCTA Diaries Part 7 - Normalization vs De-Normalization

  • Shreyas Dhond
  • 1 day ago
  • 7 min read

In the previous blog, we discussed the design strategy for architecting a data model that aligns with business needs, leverages Salesforce-native capabilities, and remains scalable as the system grows.


A key part of this strategy is first aligning business entities conceptually and translating them into a logical blueprint. Only after the logical model is clearly defined should those entities be mapped to Salesforce constructs such as core data models for Sales and Service, supporting data models, and where necessary, custom objects. This approach ensures that the platform implementation reflects the business domain rather than forcing business processes to fit predefined technical structures.


In this blog, we will focus on an important aspect of logical data modelling: normalization versus denormalization. We will explore the role of the three normal forms in data model design, the benefits they provide in maintaining data integrity and reducing redundancy, and how these principles support different architectural use cases.


As part of this discussion, we will also examine the common relationship types available in Salesforce and how normalized and denormalized patterns are represented within the platform’s data model. Understanding how these concepts translate into Salesforce relationships is essential for designing solutions that balance data integrity, performance, and scalability.


Normalization


In simple terms, normalization is the process of organizing data efficiently based on its logical relationships. The primary goal of normalization is to reduce data redundancy within a data model. Redundant data not only consumes unnecessary storage but can also negatively impact system performance. It increases the amount of data that must be processed during create, read, update, and delete (CRUD) operations and can slow down queries as datasets grow.


Normalization is structured around a set of principles known as normal forms. For Salesforce architects, the three most relevant forms to understand in practice are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Each form introduces rules that progressively improve data integrity and reduce duplication.


First Normal Form (1NF)


A database is considered to be in First Normal Form when all fields contain atomic values, meaning the data stored in each field cannot be logically divided into smaller meaningful components.


A common example is an address. If an address is stored as a single string such as “123 Main St, New York, NY, 321212, USA” the value is considered non-atomic, because it combines multiple elements into a single field. In contrast, if the address is broken down into separate components—street, city, state, postal code, and country—each field contains a single atomic value and therefore satisfies the 1NF requirement.

| Customer ID | Address |
| --- | --- |
| 1 | 902 Marquette Avenue, Minneapolis, MN, 123232, USA |
| 2 | 123 Main St, New York, NY, 321212, USA |
| 3 | 415 Mission Street, 3rd Floor, San Francisco, California 94105, United States |
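To make the distinction concrete, here is a minimal Python sketch (the field names and sample data are illustrative, not from any Salesforce API) contrasting a non-atomic address string with its atomic, 1NF-friendly representation:

```python
# A non-atomic address: several logical components packed into one field.
non_atomic = {"customer_id": 2, "address": "123 Main St, New York, NY, 321212, USA"}

# The same data in 1NF: each field holds a single atomic value.
atomic = {
    "customer_id": 2,
    "street": "123 Main St",
    "city": "New York",
    "state": "NY",
    "postal_code": "321212",
    "country": "USA",
}

# With atomic fields, reading a single component is trivial...
print(atomic["state"])  # NY

# ...whereas the non-atomic form forces fragile string parsing that breaks
# as soon as an address deviates from the expected comma layout.
street, city, state, postal_code, country = (
    part.strip() for part in non_atomic["address"].split(",")
)
print(state)  # NY
```

The parsing step works only because this particular string happens to have exactly five comma-separated parts, which is precisely the fragility that atomic fields avoid.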

Another rule of 1NF is that tables should not contain repeating groups of similar data. Each column should represent a single attribute rather than multiple occurrences of the same type of information.


A simple Salesforce example can be seen with address fields on the Account object. The object contains both Billing Address and Shipping Address, which are essentially repeated structures representing the same type of information. Because of these repeating groups, the Account object would not strictly satisfy 1NF under traditional database normalization rules.

| Account ID | Account Name | Billing Street | Billing City | Billing State | Billing Country | Shipping Street | Shipping City | Shipping State | Shipping Country |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 001A0000001XYZ | Acme Corp | 123 Main St | New York | NY | USA | 456 Market St | Boston | MA | USA |
| 001A0000002XYZ | Global Telecom | 78 King St | London | London | UK | 78 King St | London | London | UK |
| 001A0000003XYZ | Northern Energy | 900 Bay St | Toronto | ON | Canada | 25 Industrial Rd | Calgary | AB | Canada |

To fully normalize this design, address information would be separated into a dedicated Address entity with its own unique identifier. The Account record would then reference the appropriate address records using lookup relationships. This approach eliminates duplication and allows a single address record to be reused across multiple entities when appropriate.

| Account ID | Account Name | Billing Address ID | Shipping Address ID |
| --- | --- | --- | --- |
| 001A0000001XYZ | Acme Corp | ADD001 | ADD002 |
| 001A0000002XYZ | Global Telecom | ADD003 | ADD004 |
| 001A0000003XYZ | Northern Energy | ADD005 | ADD006 |

| Address ID | Address Type | Street | City | State | Country |
| --- | --- | --- | --- | --- | --- |
| ADD001 | Billing | 123 Main St | New York | NY | USA |
| ADD002 | Shipping | 456 Market St | Boston | MA | USA |
| ADD003 | Billing | 78 King St | London | London | UK |
| ADD004 | Shipping | 78 King St | London | London | UK |
| ADD005 | Billing | 900 Bay St | Toronto | ON | Canada |
| ADD006 | Shipping | 25 Industrial Rd | Calgary | AB | Canada |
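Under this normalized design, reading an account's address means resolving the lookup IDs against the Address table, much as a relationship query would on-platform. A minimal sketch (plain Python dicts and illustrative IDs standing in for objects):

```python
# Normalized tables: accounts reference shared address records by ID,
# so each address is stored exactly once.
accounts = [
    {"id": "001A0000001XYZ", "name": "Acme Corp",
     "billing_address_id": "ADD001", "shipping_address_id": "ADD002"},
]
addresses = {
    "ADD001": {"type": "Billing", "street": "123 Main St", "city": "New York",
               "state": "NY", "country": "USA"},
    "ADD002": {"type": "Shipping", "street": "456 Market St", "city": "Boston",
               "state": "MA", "country": "USA"},
}

# Resolve the lookups: the "join" is a dictionary access on the address ID.
for account in accounts:
    billing = addresses[account["billing_address_id"]]
    shipping = addresses[account["shipping_address_id"]]
    print(account["name"], billing["city"], shipping["city"])
    # Acme Corp New York Boston
```

The cost of the extra lookup at read time is what buys the single point of update: changing an address record changes it for every account that references it.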

While strict normalization like this is not always implemented in Salesforce due to usability and performance considerations, understanding these principles helps architects make informed decisions when designing scalable data models.


Second Normal Form (2NF)


A database is considered to be in Second Normal Form (2NF) when it already satisfies the requirements of First Normal Form (1NF) and all non-key (non-prime) attributes are fully functionally dependent on the primary key.


In simpler terms, this means that every non-key field in a table must depend entirely on the table’s primary key and not on part of the key or on another attribute.


Continuing with the previous Account and Address example, the Account object includes columns such as Billing Address and Shipping Address. In this structure, the type of address (Billing or Shipping) is implicitly embedded in the column name itself. At the same time, the address entity may also capture the address type as a field in the Address table.


This creates a form of redundancy because the address type becomes indirectly tied to the Account record through the column definition, rather than being determined solely by the Address record itself.


To bring the design into Second Normal Form, the schema needs to be adjusted so that address type is stored only within the Address table and not embedded in the Account table structure. The Account table would simply reference address records, while the Address table would include an Address Type field (for example, Billing, Shipping, or Mailing).

| Account ID | Account Name |
| --- | --- |
| 001A0000001XYZ | Acme Corp |
| 001A0000002XYZ | Global Telecom |
| 001A0000003XYZ | Northern Energy |

| Address ID | Account ID | Address Type | Street | City | State | Country |
| --- | --- | --- | --- | --- | --- | --- |
| ADD001 | 001A0000001XYZ | Billing | 123 Main St | New York | NY | USA |
| ADD002 | 001A0000001XYZ | Shipping | 456 Market St | Boston | MA | USA |
| ADD003 | 001A0000002XYZ | Billing | 78 King St | London | London | UK |
| ADD004 | 001A0000002XYZ | Shipping | 78 King St | London | London | UK |
| ADD005 | 001A0000003XYZ | Billing | 900 Bay St | Toronto | ON | Canada |
| ADD006 | 001A0000003XYZ | Shipping | 25 Industrial Rd | Calgary | AB | Canada |

With this design, the address type becomes dependent only on the Address record, eliminating the partial dependency and ensuring the model satisfies 2NF.
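Because the type now lives on the Address row itself, retrieving a given account's billing address means filtering the Address table rather than reading a hard-coded column. A hedged sketch (illustrative data; on-platform this would be a relationship query):

```python
# Address type is an attribute of the Address row, not of the schema.
addresses = [
    {"id": "ADD001", "account_id": "001A0000001XYZ", "type": "Billing", "city": "New York"},
    {"id": "ADD002", "account_id": "001A0000001XYZ", "type": "Shipping", "city": "Boston"},
    {"id": "ADD003", "account_id": "001A0000002XYZ", "type": "Billing", "city": "London"},
]

def address_for(account_id, address_type):
    """Return the first address of the given type for an account, or None.

    Supporting a new type (e.g. 'Mailing') requires no schema change on
    Account; it is just a new value in the Address Type field.
    """
    return next(
        (a for a in addresses
         if a["account_id"] == account_id and a["type"] == address_type),
        None,
    )

print(address_for("001A0000001XYZ", "Shipping")["city"])  # Boston
```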


Third Normal Form (3NF)


A database is considered to be in Third Normal Form when it already satisfies 2NF and its tables contain no transitively dependent attributes, meaning no non-key attribute depends on another non-key attribute rather than directly on the primary key. In the example above, if the Account table stored attributes such as Billing State that are already captured in the Address table, the state would be determined by the related address rather than by the Account's primary key, putting the design in violation of 3NF.

| Account ID | Account Name | Billing State | Shipping State |
| --- | --- | --- | --- |
| 001A0000001XYZ | Acme Corp | NY | MA |
| 001A0000002XYZ | Global Telecom | London | London |
| 001A0000003XYZ | Northern Energy | ON | AB |
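The transitive dependency above can be removed by deriving the state from the related Address record at read time instead of copying it onto Account. A minimal sketch (illustrative data; on-platform a formula field or relationship query would typically play this role):

```python
addresses = {
    "ADD001": {"type": "Billing", "state": "NY"},
    "ADD002": {"type": "Shipping", "state": "MA"},
}
# 3NF: Account does NOT store Billing State as its own column.
account = {"id": "001A0000001XYZ", "name": "Acme Corp",
           "billing_address_id": "ADD001"}

def billing_state(account):
    """Derive the state through the relationship; it depends on the
    address, not on the Account's primary key, so it is never duplicated."""
    return addresses[account["billing_address_id"]]["state"]

print(billing_state(account))  # NY
```

If the address record's state later changes, the derived value changes with it, with no second copy on Account to fall out of sync.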


Denormalization


Denormalization can be considered the opposite of normalization. While normalization focuses on reducing redundancy and organizing data into well-structured relational tables, denormalization intentionally introduces controlled redundancy into the data model.


The primary goal of denormalization is to improve performance and simplify data access, particularly for read-heavy operations such as reporting, analytics, and complex queries. By storing related data together, the system can reduce the need for joins or multi-object queries, which can improve query performance and reduce processing overhead during data retrieval. The table below is a denormalized version of the Account table above: it makes analytics easy to perform, but it is hard to keep updated when address changes occur in a separate Address table.

| Account ID | Account Name | Billing Street | Billing City | Billing State | Billing Postal Code | Billing Country | Shipping Street | Shipping City | Shipping State | Shipping Postal Code | Shipping Country |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 001A0000001XYZ | Acme Corp | 123 Main St | New York | NY | 10001 | USA | 456 Market St | Boston | MA | 02108 | USA |
| 001A0000002XYZ | Global Telecom | 78 King St | London | London | EC1A 1BB | UK | 78 King St | London | London | EC1A 1BB | UK |
| 001A0000003XYZ | Northern Energy | 900 Bay St | Toronto | ON | M5S 2B1 | Canada | 25 Industrial Rd | Calgary | AB | T2P 1J9 | Canada |
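A denormalized read model like this is typically produced by flattening the normalized tables once and then serving reads from the flat copy. A minimal sketch (dict-based tables and illustrative IDs, not a Salesforce API):

```python
# Normalized source tables.
accounts = [
    {"id": "001A0000001XYZ", "name": "Acme Corp",
     "billing_address_id": "ADD001", "shipping_address_id": "ADD002"},
]
addresses = {
    "ADD001": {"street": "123 Main St", "city": "New York", "state": "NY",
               "postal_code": "10001", "country": "USA"},
    "ADD002": {"street": "456 Market St", "city": "Boston", "state": "MA",
               "postal_code": "02108", "country": "USA"},
}

def flatten(account):
    """Join Account to both of its addresses and emit one wide,
    read-optimized row with no lookups left to resolve."""
    row = {"account_id": account["id"], "account_name": account["name"]}
    for role, key in (("billing", "billing_address_id"),
                      ("shipping", "shipping_address_id")):
        for field, value in addresses[account[key]].items():
            row[f"{role}_{field}"] = value
    return row

denormalized = [flatten(a) for a in accounts]
print(denormalized[0]["billing_city"], denormalized[0]["shipping_city"])
# New York Boston
```

Reports can now scan a single flat structure, at the price of the flat rows going stale whenever the source Address table changes.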

However, denormalization introduces trade-offs. Because the same data may exist in multiple places, there is an additional responsibility to keep redundant data consistent and synchronized. Updates to a record may require changes across several objects or fields to maintain data integrity. This adds complexity to write operations and may require automation, triggers, or integration logic to ensure the replicated data remains aligned.
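That synchronization burden can be sketched as a small update handler: when a normalized address record changes, every denormalized copy must be refreshed. In Salesforce this would typically be a trigger or flow; the function below is only an illustration with hypothetical field names:

```python
# Denormalized rows keep their own copy of address fields, tagged with the
# address record they were copied from (illustrative structure).
denormalized_rows = [
    {"account_name": "Acme Corp", "billing_address_id": "ADD001",
     "billing_city": "New York", "billing_state": "NY"},
]

def on_address_update(address_id, new_fields):
    """Propagate an address change to every denormalized copy.

    This is the write-side cost of denormalization: one logical update
    fans out into N physical updates that must all be kept consistent.
    """
    for row in denormalized_rows:
        if row.get("billing_address_id") == address_id:
            for field, value in new_fields.items():
                row[f"billing_{field}"] = value

on_address_update("ADD001", {"city": "Brooklyn", "state": "NY"})
print(denormalized_rows[0]["billing_city"])  # Brooklyn
```

If any of the fan-out updates fails partway through, the copies disagree, which is why such logic usually needs transactional or retry safeguards.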


In practice, denormalization is often used selectively. Architects balance the benefits of faster reads and simplified reporting against the operational overhead required to maintain consistency. When applied thoughtfully, denormalization can significantly improve system performance for high-volume transactional systems and analytics-driven use cases.

Normalization vs Denormalization


Normalization

  • Organizes data based on logical relationships to reduce redundancy.

  • Data is separated into multiple related tables/objects.

  • Improves data integrity and consistency by ensuring each piece of data exists in only one place.

  • Changes to data are easier to manage because updates occur in a single location.

  • Requires joins or relationships to retrieve related data.

  • Based on normal forms (1NF, 2NF, 3NF) that progressively remove redundancy and enforce functional dependencies.

  • Well suited for transactional systems where data accuracy and consistency are critical.


Denormalization

  • Intentionally introduces controlled redundancy into the data model.

  • Stores related data within the same table/object to simplify data retrieval.

  • Improves read performance and query efficiency, especially for reporting and analytics.

  • Reduces the need for complex joins across objects.

  • Requires additional logic or automation to keep redundant data synchronized.

  • Increases storage usage and update complexity.

  • Often used in reporting, analytics, and high-read performance scenarios.


Architectural Perspective (Salesforce Context)

  • Normalization helps maintain clean data models and integrity.

  • Denormalization helps improve performance and reporting efficiency.

  • Effective Salesforce architecture typically requires a balance of both approaches depending on the use case.






Copyright © 2024 SFDCShred
