mardi 3 février 2015

Why is Postgres on RDS maxing out on CPU every few hours?


Using Amazon RDS, we are running ETL scripts to migrate our data. However, every few hours there is a huge CPU spike. I'm fairly new to Postgres, but I think this might have something to do with our shared buffers being large.


Here are the ETL specs (per ETL):



50 records inserted / second
pool of 1000 connections


Here are the server's specs:



Amazon R3.8XL
244 GB RAM
32 vCPU
3TB SSD
10 GiB Network Performance
No Multi-AZ (yet)


Here are the main & modified PG parameter group settings:



checkpoint_completion_target = 0.9
checkpoint_segments = 16
effective_cache_size = {DBInstanceClassMemory/10923} (8kb)
maintenance_work_mem = {DBInstanceClassMemory/16384} (kb)
max_connections = {DBInstanceClassMemory/12582880}
max_locks_per_transaction = 64
shared_buffers = {DBInstanceClassMemory/32768} (8kb)
work_mem = {DBInstanceClassMemory/20480000} (kb)


In this case, DBInstanceClassMemory is approximately 244,000,000,000 bytes. The (8kb) means the value is taken as blocks of 8kb, so shared_buffers = 244000000000/32768*8 = 60 mb. All of the changes made were based on this article, and I set the effective_cache_size to 75% because (as you will see below) the memory doesn't seem to be getting fully utilized.


Here is a screenshot of our database server's stats over a 6 hour time span: enter image description here


The graph on the top left shows the CPU spikes, and you can see the correlated drop in Write IOPS (the graph below it).


What can be the reason for these CPU spikes? They almost completely freeze queries by the ETL (taking upwards of 3 minutes for queries that would usually take less than a second).





Aucun commentaire:

Enregistrer un commentaire