Configuring PostgreSQL for Pretty Good Performance

PostgreSQL is a highly customizable relational database management system (RDBMS) with a dazzling array of configuration options. Fortunately, especially in recent versions of PostgreSQL, most of these settings do not need to be changed. The trick is to figure out which options do need to be changed and what values are most appropriate.

A few minutes spent adjusting your postgresql.conf file can result in major performance improvements. Further improvements may be possible with sufficient time and thought, but in this article I’ll focus on how to quickly get your settings into the right ballpark.

One of the most important methods of improving PostgreSQL performance, especially on large data sets, is to increase the parameter shared_buffers, which controls the amount of memory PostgreSQL uses for its private buffer cache. Because PostgreSQL also makes use of the system buffer cache, and for other reasons, experience has shown that it’s usually best to set this parameter to about 25% of system memory on UNIX and Linux systems, but not more than about 8GB. (Sometimes 8GB is already too much.)

On Windows, even smaller values are appropriate, between 256MB and 512MB. Raising this parameter may cause PostgreSQL not to start, because most operating system distributions have very conservative limits on the amount of System V shared memory that can be allocated by default. Fortunately, the PostgreSQL documentation gives detailed instructions on how to increase these limits, and the process is typically not difficult.
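As a concrete sketch, consider a hypothetical dedicated Linux server with 16GB of RAM (an assumed machine size, used purely for illustration here and in the examples below). Its postgresql.conf might include a line like this:

    shared_buffers = 4GB        # roughly 25% of 16GB of system memory

Note that a change to shared_buffers takes effect only after the server is restarted.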

Aside from shared_buffers, the most important memory-allocation parameter is work_mem. The default value of 1MB allows any sort, hash join, or materialize operation to use up to 1MB of physical memory; larger operations fall back to a less efficient algorithm that spills data to disk. Raising this value can dramatically improve the performance of certain queries, but it's important not to overdo it. In the worst case, every connection could be performing several sorts simultaneously, so with 100 simultaneous connections and work_mem set to 10MB you could easily consume several gigabytes of memory. If that exceeds what the machine can spare, the system will begin swapping, and performance will drop off dramatically. Reasonable values are typically between 4MB and 64MB, depending on the size of your machine, how many concurrent connections you expect, and the complexity of your queries.
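For the same hypothetical 16GB server handling a moderate number of connections, a conservative setting might be:

    work_mem = 16MB             # per sort/hash operation, per connection

For an occasional large reporting query, it is usually better to raise work_mem for just that session (with SET work_mem) than to increase the global default.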

While you’re setting work_mem, it’s also a good idea to set the related parameter maintenance_work_mem, which controls the amount of physical memory PostgreSQL will attempt to use for maintenance operations, such as routine vacuuming and index creation. As with work_mem, it’s important not to set this parameter so high that your system begins swapping, but because it’s rare to have more than 3 or 4 maintenance operations in progress simultaneously, it isn’t necessary to be quite so careful about the value. A good rule of thumb is to set this to about 5% of system memory, but not more than about 512MB. Larger values won’t necessarily improve performance.
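Continuing the 16GB example, 5% of memory would be around 800MB, so the 512MB ceiling applies:

    maintenance_work_mem = 512MB   # speeds up VACUUM and CREATE INDEX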

Finally, you should increase the default value of wal_buffers, which defaults to 64kB. Although even this very small setting does not always cause a problem, there are situations where it can result in extra fsync calls, and degrade overall system throughput. Increasing this value to 1MB or so can alleviate this problem. On very busy systems, an even higher value may be needed, up to a maximum of about 16MB. Like shared_buffers, this parameter increases PostgreSQL’s initial shared memory allocation, so if bumping it up causes the system not to start, you’ll need to increase the operating system limit.
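In postgresql.conf, that might look like:

    wal_buffers = 1MB           # the 64kB default can cause extra fsync calls; 16MB is a practical ceiling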

Tuning Checkpoint Parameters

Once you’ve got the memory-related parameters set correctly, the next thing you should think about doing is tuning your checkpoint parameters. Increasing the checkpoint_segments parameter, which defaults to 3, can dramatically improve performance during bulk data loads. A reasonable starting value is 30. Once you’ve increased this parameter, it also makes sense to increase checkpoint_completion_target, which defaults to 0.5, to 0.9; this will decrease the performance impact of checkpointing on a busy system (but is ineffective for small values of checkpoint_segments, which is why the default is 0.5).
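For a busy system or one that performs bulk data loads, a reasonable starting point is:

    checkpoint_segments = 30           # default is 3
    checkpoint_completion_target = 0.9 # spread checkpoint I/O over more of the interval; default is 0.5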

Finally, increasing checkpoint_timeout from 5 minutes to a larger value, such as 15 minutes, can reduce the I/O load on your system, especially when using large values for shared_buffers. The downside of making these adjustments is that your system will use a modest amount of additional disk space, and will take longer to recover in the event of a crash. However, for most users, this is a small price to pay for a significant performance improvement.
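The corresponding line might read:

    checkpoint_timeout = 15min   # default is 5min; longer intervals mean less I/O but longer crash recovery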

With memory allocation and checkpoint parameters out of the way, it’s time to adjust your planner parameters. These settings are generally not quite so important as the memory allocation and checkpoint parameters, but they can definitely make a difference, especially on more complex queries. The parameters random_page_cost and seq_page_cost control the planner’s estimate of how expensive it will be to obtain each database page. The default values assume very little caching, so it’s frequently a good idea to reduce them. Even if your database is significantly larger than physical memory, you might want to try setting these parameters to 2 and 1 (rather than the default values of 4 and 1) to see whether you get better query plans that way. If your database fits entirely within memory, you can lower these values much more, perhaps to 0.1 and 0.1. Never set random_page_cost less than seq_page_cost, but consider setting them equal (or very close to equal) if your database fits mostly or entirely within memory.
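As a sketch, a server whose data is reasonably well cached might use:

    seq_page_cost = 1.0          # both values can drop to 0.1 if the database fits entirely in memory
    random_page_cost = 2.0       # default is 4; never set below seq_page_cost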

You should also configure the parameter effective_cache_size. Despite being measured in megabytes, this parameter does not allocate any memory. Instead, it is used by the query planner to estimate certain caching effects. When this parameter is set too low, the planner may decide not to use an index even when it would be beneficial to do so. An appropriate value is approximately 75% of physical memory.
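On the hypothetical 16GB machine, that works out to roughly:

    effective_cache_size = 12GB  # about 75% of RAM; a planner hint only, no memory is allocated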

Finally, for best performance, it’s a good idea to consider setting the synchronous_commit parameter to off. When this parameter is turned off, an unexpected crash or power failure could result in the loss of a transaction that was reported to the client as committed. For financial or other mission-critical applications, this is unacceptable, and the default value of on should be retained. However, many web applications can tolerate the loss of a few seconds’ worth of updates in the event of a crash, and the performance gain from changing this setting can be massive.
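If your application can tolerate that trade-off, the setting is simply:

    synchronous_commit = off     # a crash may lose the last few seconds of committed transactions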

Going Farther

If you’re building a database server from scratch, there are many things that you can do to improve performance and reliability above and beyond adjusting parameters in postgresql.conf. Greg Smith’s book PostgreSQL 9.0 High Performance is the definitive guide to building a high-performance PostgreSQL system, and the chapter on database hardware (PDF) is freely available as a sample.

It covers not only how to maximize performance, but also steps you should take to maximize the reliability of your database on either new or existing hardware. If you’re serious about maximizing PostgreSQL performance, both that chapter and the rest of the book are highly recommended reading. If, however, you’re a casual PostgreSQL user or just trying the product out, the suggestions above will get you much of the performance benefit in a fraction of the time.