Is It a Best Practice to Normalize a SQL Database?

Problem scenario
There are sources that recommend database normalization as a best practice for relational databases. Is it always a best practice to normalize databases?

There is no clear answer as the sources vary.

A heavily voted-up answer on StackOverflow says that denormalization for OLAP performance is something to be avoided. We find recognition of this "best practice" on a venerated engineering blog:

Keeping data normalized is considered a best practice in MySQL. As we mentioned before, however, when using a NoSQL database like Cassandra, denormalizing data often improves query performance.

The above sources have legitimacy, but authoritative sources dispute that normalization is a best practice. No one should dispute that there are proponents and opponents of normalization of SQL/relational databases as a best practice.

The opponents of normalization as a best practice say that you need to have denormalized data for OLAP databases (for reporting to reduce the expensive operation of a JOIN in SQL, and to avoid aggregate queries which can be time-intensive) and for datawarehouses. Datawarehouses are usually not updated that frequently; it is common for batch jobs to happen monthly with datawarehouses. The work to normalize such a database may be economically prohibitive, and the benefits may be negligible or have a negative value.

Kimball cubes are widely used in the I.T. industry; they have changed the database and BI verticals. The man behind the namesake of Kimball cubes says this on his company's website:

In general, dimensional designers must resist the normalization urges caused by years of operational database designs and instead denormalize the many-to-one fixed depth hierarchies into separate attributes on a flattened dimension row.

NetFlix and Meta (formerly known as Facebook) both use denormalization. A Microsoft MVP says that a datawarehouse best practice is to denormalize data.

In a dataware house the best practice is to denormalize the data to increase the performanze [sic] of ad-hoc reporting; with the prize of a higher data volume.

Other examples suggesting denormalization is beneficial (and indicated a "best practice") are the following:

If you want to read about the controversy of "best practices" in general, we recommend this posting.

Leave a comment

Your email address will not be published.