Many Ways to Optimize an OLTP Database Running PostgreSQL

Updated on 2/1/22
Tuning relational databases and related processes can be somewhat simple or very complex. OLTP databases have many processes governing the usage of resources. Optimizing these and keeping locks granular and efficient in accord with business requirements is key to having high performance and reliability.

Studying the business requirements carefully can allow you to redesign the database and the SQL queries to make things happen more efficiently. You could rewrite stored procedures to involve less computational complexity.

We cannot cover every possible way of enhancing the performance of your PostgreSQL database given the multitude of idiosyncrasies that could exist. We hope these suggestions give you some ideas of where to begin.

1. Find which query or operation is slow. It may be that your PostgreSQL OLTP database is performing acceptably for most operations, but one query is performing badly. If you want more assistance in determining which query is having a problem, see this external page.
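
If you have the pg_stat_statements extension available, a query along the following lines can surface the statements consuming the most time. This is a minimal sketch: the threshold and LIMIT are arbitrary choices, and the timing column names below assume PostgreSQL 13 or later (older versions use total_time and mean_time instead).

    -- Requires the pg_stat_statements extension
    -- (shared_preload_libraries = 'pg_stat_statements').
    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

    -- Show the ten statements that have consumed the most execution time.
    SELECT query,
           calls,
           total_exec_time,
           mean_exec_time
    FROM   pg_stat_statements
    ORDER  BY total_exec_time DESC
    LIMIT  10;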

2. Place important indexes of heavily used tables on solid state disks. You do this by manually configuring tablespaces that are mounted on such disks. We appreciate that cost constraints can keep you from using solid state disks exclusively. Indexes with rapidly changing statistics should be placed on the fastest disks available.
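
As a rough sketch, assuming an SSD is mounted at /mnt/ssd1 and the directory is owned by the postgres user (both hypothetical, as are the object names), a tablespace can be created there and an index placed in it:

    -- The mount point and object names below are hypothetical.
    CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd1/pgdata';

    -- Place a heavily used index on the SSD-backed tablespace.
    CREATE INDEX idx_orders_customer_id
        ON orders (customer_id)
        TABLESPACE fast_ssd;

    -- An existing index can be moved instead:
    -- ALTER INDEX idx_orders_customer_id SET TABLESPACE fast_ssd;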

3. Use the autovacuum feature. Tables with many updates and deletes will change rapidly. The old versions of updated rows and deleted rows are called "dead rows" or "dead tuples." The autovacuum feature will automatically clean up these "dead tuples." To read more see this posting.
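
Autovacuum is enabled by default, but it can also be tuned per table. The sketch below (the table name and thresholds are hypothetical choices) makes autovacuum fire more aggressively on a heavily updated table:

    -- Vacuum after roughly 5% of the table's rows are dead
    -- (the default scale factor is 20%), and analyze after roughly 2%.
    ALTER TABLE orders
        SET (autovacuum_vacuum_scale_factor = 0.05,
             autovacuum_analyze_scale_factor = 0.02);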

4. Avoid aggregate SQL functions (MIN, MAX, and COUNT) if possible.

5. Avoid SELECT *. Try to select only the columns you need.
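
As a small illustration of both points, with hypothetical table and column names: select just the columns a report needs, and when business rules tolerate an approximate row count, the planner's estimate in pg_class can stand in for an exact COUNT(*).

    -- Select only the columns you need instead of SELECT *.
    SELECT order_id, status
    FROM   orders
    WHERE  customer_id = 42;

    -- An approximate row count from the catalog, instead of an exact COUNT(*).
    SELECT reltuples::bigint AS approximate_rows
    FROM   pg_class
    WHERE  relname = 'orders';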

6. Utilize indexes properly. Indexes on columns with many distinct values (high cardinality) are used more efficiently than indexes on columns with few distinct values. Multi-column indexes can be created so that the index key tends to be unique; DML operations involving such indexes are the most efficient.

If you have SQL statements that operate on only a portion of the rows, partial indexes can be ideal for overall performance. A partial index is defined with a WHERE clause so that it covers only the matching rows, and SQL statements whose filters match that clause can use it. These indexes take up less space and need to be updated less frequently than normal indexes.
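
A sketch of both ideas, with hypothetical table and column names: a multi-column index can make the key nearly unique, and a partial index can cover only the rows that a hot query actually touches.

    -- Multi-column index whose key tends to be unique.
    CREATE INDEX idx_orders_customer_created
        ON orders (customer_id, created_at);

    -- Partial index covering only rows still being worked on;
    -- queries filtering on status = 'open' can use it.
    CREATE INDEX idx_orders_open
        ON orders (customer_id)
        WHERE status = 'open';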

In PostgreSQL 11 and later, covering indexes can satisfy all the columns requested by predefined SELECT statements. By including only the necessary columns in a covering index, such queries can be answered from the index alone (an index-only scan), avoiding visits to the table itself and optimizing performance. Indexes whose trees are not excessively complex are more useful than overly complex ones, given how rows are looked up. A properly designed covering index keeps the search key small and carries the extra, non-searched columns in its leaf entries, so the index's tree is leveraged optimally. To learn more about covering indexes, see this posting.
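
A minimal covering-index sketch for PostgreSQL 11 or later, again with hypothetical names; the INCLUDE columns are stored in the index's leaf entries so a matching SELECT can be answered by an index-only scan.

    CREATE INDEX idx_orders_customer_covering
        ON orders (customer_id)
        INCLUDE (status, total_amount);

    -- A query such as this can then be satisfied from the index alone:
    SELECT status, total_amount
    FROM   orders
    WHERE  customer_id = 42;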

If a table's statistics are not updated when the table is updated, the query planner will rely on stale metadata, and a PostgreSQL database with outdated statistics will perform poorly. In some cases, statistics that are not refreshed often enough defeat an index's purpose of improving performance and can even make it counterproductive. Statistics must be kept up to date, but updating them too frequently can itself degrade performance.

Do not create excessive numbers of indexes. For OLTP operations, every index on a table must be maintained, and its statistics updated, with each DML command, so unnecessary indexes will slow your database's performance. The indexes themselves also consume disk space.
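
The statistics views can help identify candidates for removal. This query against the standard pg_stat_user_indexes view lists indexes that have never been used for an index scan since statistics were last reset; treat the results as a starting point for investigation, not an automatic drop list.

    -- Indexes with zero scans since the statistics were last reset.
    SELECT schemaname,
           relname      AS table_name,
           indexrelname AS index_name,
           idx_scan
    FROM   pg_stat_user_indexes
    WHERE  idx_scan = 0
    ORDER  BY schemaname, relname;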

7. Normalize the database. Some databases start out denormalized because the database architect did not expect the application to become popular or grow. Normalized tables are usually less convenient for humans to read directly than denormalized tables, but highly normalized tables involve less locking during OLTP operations. PostgreSQL is ACID-compliant, and normalization facilitates this. (In some instances, if there are joins across large datasets, you may want to keep duplicative data in denormalized form. OLTP databases usually benefit from normalization, but hypothetically denormalized tables could help specific read-heavy applications.)

8. Do not run expensive queries during peak hours. Full database backups during peak business hours are not recommended. If you are not sure whether a SQL statement is expensive, use the EXPLAIN command. EXPLAIN estimates the cost of a query without actually running it, which is advantageous because it relies on statistics and does not place a significant load on your database. EXPLAIN ANALYZE will actually run the query and provide details of what is happening behind the scenes. These can be helpful when trying to learn more about what your stored procedures will do to the database. To learn about the health of your database in a public cloud, you may want to learn more about Azure Metrics, Cloud SQL (in GCP), which provides metrics data with instance information, or Amazon RDS monitoring.
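
A minimal sketch with a hypothetical query: EXPLAIN shows the planner's estimated cost without running the statement, while EXPLAIN ANALYZE executes it and reports actual row counts and timings.

    -- Estimated plan and cost only; the query is not executed.
    EXPLAIN
    SELECT order_id, status FROM orders WHERE customer_id = 42;

    -- Executes the query and reports actual row counts and timings.
    EXPLAIN ANALYZE
    SELECT order_id, status FROM orders WHERE customer_id = 42;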

9. If you need additional disk space (e.g., for solid state disks, which are very fast), you may want to analyze the data types chosen in your tables. Sometimes business requirements allow you to maintain the necessary precision with smaller data types that do not accommodate the same range of values. If you can maintain the required accuracy and redesign tables to use columns that are just large enough to accommodate the expected growth of new data, these smaller data types can keep table sizes down. Refactoring database tables can be a big task. If there is ample RAM and lots of disk I/O activity, you may want to increase the default work_mem value. For adjusting memory settings for PostgreSQL specifically, you may want to view this external site.
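
As a sketch with hypothetical names and values: a column can be narrowed when the business range permits it, and work_mem can be raised for a session (or cluster-wide) when sorts and hashes are spilling to disk.

    -- Narrow a column when the business range allows it
    -- (smallint holds values up to 32,767).
    ALTER TABLE orders
        ALTER COLUMN quantity TYPE smallint;

    -- Raise work_mem for the current session while testing a heavy sort.
    SET work_mem = '64MB';

    -- Or persist it cluster-wide (takes effect after a configuration reload).
    ALTER SYSTEM SET work_mem = '64MB';
    SELECT pg_reload_conf();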

10. If you have ample disk space, create materialized views or temporary tables to query from. Complex computations can be expensive, and keeping their results in a materialized view or temporary table can reduce CPU and RAM utilization. If you have configured tablespaces appropriately, materialized views can also reduce I/O contention and deadlocks. Well-designed materialized views can enhance your database performance significantly.
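
A minimal materialized-view sketch with hypothetical tables and columns: the expensive aggregation is computed once and refreshed on a schedule rather than on every read. REFRESH ... CONCURRENTLY, which requires a unique index on the view, lets readers keep querying during the refresh.

    CREATE MATERIALIZED VIEW daily_sales AS
    SELECT order_date, SUM(total_amount) AS revenue
    FROM   orders
    GROUP  BY order_date;

    -- A unique index is required for REFRESH ... CONCURRENTLY.
    CREATE UNIQUE INDEX idx_daily_sales_date ON daily_sales (order_date);

    -- Refresh periodically (e.g., from cron) without blocking readers.
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;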

11. The query planner has limitations. If statistics are not being updated as rapidly as the database tables are being updated, the query planner will not be operating in the most efficient manner.

PostgreSQL assigns default cost constants (some integers, some floating-point values) to many operations, and these serve as weights for the query planner. Through scrutiny of your system, research, and/or trial and error, you can optimize the query planner by granularly modifying more than a dozen of these constants. This could increase the efficiency of your system. The default values are used unless you change them, but they are not necessarily the most appropriate for your workload and hardware.

For example, the query planner relies on estimated costs for fetching pages (sections of a database table). Fetching pages sequentially is a different operation from fetching them at random. If your database tables tend to be in memory, setting a lower value for random_page_cost may help the query planner; likewise, if you are in a memory-constrained environment, setting a higher value for random_page_cost may make the query planner's cost estimates more accurate.
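
A sketch of adjusting this constant; the value 1.1 is a common choice for SSD-backed or mostly cached databases, but it is an assumption you should validate against your own workload.

    -- Default is 4.0; lower it when random reads are nearly as cheap
    -- as sequential ones.
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();

    -- Or experiment per session before committing to the change.
    SET random_page_cost = 1.1;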

To learn more about these different values (other than the cost of fetching pages), see this page.

12. From a pragmatic perspective, network latency can be a factor in database performance. Optimizing networks to minimize congestion and packet collisions is beyond the scope of this posting. The operating system environment that supports PostgreSQL can also impact database performance; tuning and architecting servers is likewise beyond the scope of this posting.

13. For full-text indexing, SQL databases are not always the best fit, and NoSQL options may be better suited. Elasticsearch or Apache Lucene may be what helps you search text files.

14. If your OLTP database uses a user-defined function or stored procedure that relies on procedural logic (e.g., looping through each row), see if you can rewrite it to use a CASE expression. In SQL Server, CASE statements are much faster (according to the SQL Server 2008 Bible).
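
A sketch of the idea with a hypothetical table: instead of a PL/pgSQL loop that updates one row at a time, a single set-based UPDATE with a CASE expression does the same work in one pass.

    -- Replaces a row-by-row procedural loop with one set-based statement.
    UPDATE orders
    SET    shipping_fee = CASE
                              WHEN total_amount >= 100 THEN 0
                              WHEN total_amount >= 50  THEN 5
                              ELSE 10
                          END;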

15. If the database is large, use partitioning and/or sharding. "Sharding and partitioning are both about breaking up a large data set into smaller subsets. The difference is that sharding implies the data is spread across multiple computers while partitioning does not." (This quote was taken from https://hazelcast.com/glossary/sharding/.) To learn more about partitioning or sharding (which can be complicated), see these pages; a minimal partitioning sketch follows the list:

https://www.percona.com/blog/2019/05/24/an-overview-of-sharding-in-postgresql-and-how-it-relates-to-mongodbs/
https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding
https://pgdash.io/blog/postgres-11-sharding.html
https://www.highgo.ca/2020/03/12/horizontal-scalability-sharding-in-postgresql-core-missing-pieces-of-the-puzzle/
https://medium.com/swlh/beginners-guide-to-table-partitioning-in-postgresql-5a014229042
https://severalnines.com/database-blog/guide-partitioning-data-postgresql
https://www.enterprisedb.com/blog/postgres-table-partitioning
https://www.enterprisedb.com/postgres-tutorials/how-use-table-partitioning-scale-postgresql
https://www.postgresql.org/docs/10/ddl-partitioning.html
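
A minimal declarative-partitioning sketch (PostgreSQL 10 or later), with hypothetical table names, columns, and date ranges:

    CREATE TABLE orders (
        order_id     bigint        NOT NULL,
        customer_id  bigint        NOT NULL,
        created_at   timestamptz   NOT NULL,
        total_amount numeric(10,2)
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE orders_2022_01 PARTITION OF orders
        FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');

    CREATE TABLE orders_2022_02 PARTITION OF orders
        FOR VALUES FROM ('2022-02-01') TO ('2022-03-01');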

16. Memory buffers act as a cache; this setting in PostgreSQL is known as shared_buffers. To optimize OLTP operations, adjust the shared_buffers size up or down for your workload (roughly 25% of system RAM is a common starting point). A configuration sketch follows the links below. To learn more, see these postings:
https://www.postgresql.org/docs/9.1/runtime-config-resource.html
https://redfin.engineering/how-to-boost-postgresql-cache-performance-8db383dc2d8f
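
A configuration sketch; the 4GB figure assumes a machine with roughly 16GB of RAM and is only a starting point. Note that changing shared_buffers requires a server restart, not just a reload.

    -- Roughly 25% of system RAM is a common starting point
    -- (here, 16GB of RAM is assumed).
    ALTER SYSTEM SET shared_buffers = '4GB';
    -- Restart PostgreSQL for this setting to take effect.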

17. Configure checkpoints to happen at longer intervals.

"A checkpoint is a point in the transaction log sequence at which all data files have been updated to reflect the information in the log. All data files will be flushed to disk." (This was taken from https://www.postgresql.org/docs/9.2/sql-checkpoint.html.)

By default, checkpoints happen every five minutes on systems with a low load. If they happen less frequently, you should get better performance. The max_wal_size parameter also influences when checkpoints occur. "The general rule is: increasing this [max_wal_size] value will lead to more space consumption, but improve write performance at the same time." This optimization tip was adopted, and its last sentence quoted, from https://www.cybertec-postgresql.com/en/postgresql-what-is-a-checkpoint/
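
A sketch of spacing checkpoints out; the values are illustrative assumptions, and a larger max_wal_size trades disk space for fewer, less disruptive checkpoints.

    -- Defaults: checkpoint_timeout is 5 minutes, max_wal_size is 1GB.
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    ALTER SYSTEM SET max_wal_size = '4GB';
    -- Spread checkpoint I/O over most of the interval.
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();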
