Problem scenario
There are sources that recommend database normalization as a best practice for relational databases. Is it always a best practice to normalize databases?
Answer
There is no clear answer; reputable sources disagree.
A heavily voted-up answer on Stack Overflow says that denormalizing for OLAP performance is something to be avoided. We find recognition of this "best practice" on a well-regarded engineering blog:
Keeping data normalized is considered a best practice in MySQL. As we mentioned before, however, when using a NoSQL database like Cassandra, denormalizing data often improves query performance.
https://engineeringblog.yelp.com/2016/08/how-we-scaled-our-ad-analytics-with-cassandra.html
The sources above are legitimate, but other authoritative sources dispute that normalization is always a best practice. What no one should dispute is that normalization of SQL/relational databases as a "best practice" has both proponents and opponents.
The opponents of normalization as a best practice say that OLAP databases and data warehouses need denormalized data: for reporting, denormalization reduces the cost of expensive SQL JOINs and avoids time-intensive aggregate queries (for example, by pre-computing summary rows). Data warehouses are usually not updated frequently; monthly batch loads are common. The work to normalize such a database may be economically prohibitive, and the benefits may be negligible or even negative.
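To make the JOIN-avoidance argument concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema and names (customers, orders, orders_flat, region, amount) are hypothetical and not drawn from the sources above; the point is only that the same regional report needs a JOIN plus an aggregate against the normalized schema, while the flattened table trades redundant storage for a single-table query.

```python
# Minimal sketch (hypothetical schema and data): normalized tables vs. a
# denormalized reporting table for the same regional revenue report.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normalized design: customer attributes live only in the customers table.
cur.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders    (order_id    INTEGER PRIMARY KEY,
                            customer_id INTEGER REFERENCES customers(customer_id),
                            amount      REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders    VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# Reporting against the normalized schema requires a JOIN plus an aggregate.
normalized_report = cur.execute("""
    SELECT c.region, SUM(o.amount)
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Denormalized design: the region is copied onto every fact row (redundant,
# wider rows), so the same report is a single-table scan with no JOIN.
cur.executescript("""
    CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY,
                              region   TEXT,
                              amount   REAL);
    INSERT INTO orders_flat
        SELECT o.order_id, c.region, o.amount
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id;
""")
flat_report = cur.execute("""
    SELECT region, SUM(amount) FROM orders_flat GROUP BY region ORDER BY region
""").fetchall()

assert normalized_report == flat_report   # same answer, cheaper access path
print(flat_report)                        # [('APAC', 75.0), ('EMEA', 150.0)]
```

A pre-aggregated summary table (say, one row per region per day) would go a step further and avoid the GROUP BY at query time as well, which is the other denormalization pattern mentioned above.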
Kimball cubes are widely used in the IT industry; they have changed the database and BI verticals. Ralph Kimball, for whom they are named, says this on his company's website:
In general, dimensional designers must resist the normalization urges caused by years of operational database designs and instead denormalize the many-to-one fixed depth hierarchies into separate attributes on a flattened dimension row.
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/denormalized-flattened-dimension/
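For illustration only, here is a minimal sketch of that flattened-dimension idea, again in Python's sqlite3 and with hypothetical table and column names that are not taken from the Kimball Group site: a fixed-depth many-to-one hierarchy (product, subcategory, category) is collapsed into ordinary attribute columns on one wide dimension row instead of being snowflaked into separate lookup tables.

```python
# Minimal sketch (hypothetical names): a snowflaked product hierarchy vs.
# a flattened dimension row in the style described above.
import sqlite3

con = sqlite3.connect(":memory:")

# Normalized (snowflaked) form: three tables, so every report that groups
# or filters by category needs two extra joins.
con.executescript("""
    CREATE TABLE category    (category_id      INTEGER PRIMARY KEY,
                              category_name    TEXT);
    CREATE TABLE subcategory (subcategory_id   INTEGER PRIMARY KEY,
                              subcategory_name TEXT,
                              category_id      INTEGER REFERENCES category(category_id));
    CREATE TABLE product     (product_key      INTEGER PRIMARY KEY,
                              product_name     TEXT,
                              subcategory_id   INTEGER REFERENCES subcategory(subcategory_id));
""")

# Flattened dimension: the same hierarchy denormalized onto one wide row,
# so BI tools can group or filter by any level without joins.
con.executescript("""
    CREATE TABLE dim_product (
        product_key      INTEGER PRIMARY KEY,
        product_name     TEXT,
        subcategory_name TEXT,   -- deliberately repeated on every product row
        category_name    TEXT    -- deliberately repeated on every product row
    );
""")
```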
Netflix and Meta (formerly known as Facebook) both use denormalization. A Microsoft MVP says that a data warehouse best practice is to denormalize data:
In a dataware house the best practice is to denormalize the data to increase the performanze [sic] of ad-hoc reporting; with the prize of a higher data volume.
https://social.technet.microsoft.com/Forums/en-US/8c54d1d2-5fbd-4b62-a07f-16c34a863668/is-denormalization-best-practices?forum=transactsql
Other examples suggesting that denormalization is beneficial (and, in some cases, calling it a "best practice") include the following:
- https://stackoverflow.com/a/4404043
- https://www.sqlteam.com/articles/denormalize-for-performance
- https://www.linkedin.com/pulse/best-practices-snowflake-implementation-mainak-sarkar
If you want to read about the controversy over "best practices" in general, we recommend this posting.