Problem scenario
You are trying to select a database that is right for your needs. You will not be doing many transactions. The database does not need to support regular ETL or OLTP. You are going to have a large database, and you want it to compress to the smallest size possible. In some cases a column-oriented database better than a row-oriented database. What type of database should you use for this situation?
Solution
A columnar database. Here are the three reasons:
- A single disk-read to retrieve a row is achievable with smaller tables. With very large amounts of data however, some tables in row-oriented databases need to be partitioned. Columnar databases scale to large sizes more easily *.
- Columns have uniform data types; this is optimal for compression. Column-oriented databases leverage run-length encoding **. Run-length encoding is like deduplication conceptually. Run-length encoding will store a sequence (or series) of data, and such sequences can be stored by their count.
- As there are no ETL jobs (per the problem scenario), a columnar database is superior. ETL jobs involve adding rows. Adding rows is inefficient for column-oriented databases because you have to write to many different columns. OLTP activity is better supported by row-oriented databases ***. If you will not be doing many ETL jobs or online transactional processing, column-oriented databases have benefits that row-oriented databases do not have.
* https://www.predictiveanalyticstoday.com/top-wide-columnar-store-databases/
*** https://www.omnisci.com/technical-glossary/columnar-database
FFR
- To learn more about run-length encoding, see https://stackabuse.com/run-length-encoding/
- Low cardinality in columns (meaning that values were often repeated) would increase the propensity to choose a columnar database because of run-length encoding.
- ClickHouse is a column-oriented database, and it is very fast for many use cases (taken from clickhouse.com) . The company that maintains it recently (in 2021) obtained a large amount of funding (taken from this external website). As of January of 2022, Alibaba, Bloomberg, Microsoft, and Tesla have adopted ClickHouse (according to this external page).