THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Maria Zakourdaev

My AWS Aurora Postgres instance is 100% CPU. What’s next?

I had to investigate today situation when my AWS Aurora PostgreSQL instance CPU was 100%.

I have started to search for a root cause at the database statistic views level. There is a view pg_stat_activity which shows information related to the current activity of each process, such as host ip, last transaction starttime and waitstats information. There were no long running transactions or long waits and unfortunately this view does not have any counters on CPU or memory usage per process. Boom.

Another way to track down performance issues is to use extension pg_stat_statements to see execution statistics. This view rows are per query and provide information about how many times query was executed, query execution time, number of rows retrieved etc. Again, no counters related to memory or cpu usage.

Some use the below query (source here) that is based on query total_time, assuming that the query that runs longer uses more cpu.

SELECT substring(query, 1, 50) as query, round(total_time::numeric, 2) AS total_time, calls, 

rows, round(total_time::numeric / calls, 2) AS avg_time, 

round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu 

FROM pg_stat_statements 

ORDER BY total_time 

DESC LIMIT 10;

I am not sure it’s true in all cases but this might point us to the root cause.

I could have used plperlu extension that can show percentage of CPU and memory used by particular session but it is not supported by AWS RDS.


Postgresql is a process-based system, it starts new process for each database connection. This is why you can see database connection memory and cpu usage using OS facilities only.

If we are using RDS, we have no access to OS level and cannot run top to take a look at all processes.

Today I have discovered advanced monitoring for RDS instances to monitor OS processes:

After enabling this monitoring, we can chose OS process list in the below drop down:

Using the above, you can monitor all PostgreSQL processes and their resource consumption !

Pid 6723 had some sensitive information and I had to clean it. The above screenshot was taken after the CPU peak was over which is why the numbers are low.

Now I can go back to pg_stat_activity and check which host and which application is using the specific connection and see executed queries and the waitstats:

select * from pg_stat_activity where pid = 6723

Unfortunately pg_stat_activity does not show an active statement but only the top-level one. And there is no way to join between pg_stat_activity and pg_stat_statements to match pid of the connection with query history that we can see in pg_stat_statements.

We are halfway through our problem. We now know which processes are consuming CPU but do not really know which queries they have executed.

However, since we have the hostname and the application that stand behind problematic connections, we can go to the application developers, check together the query patterns that they execute and try to understand why their connections are so CPU intensive.

I would appreciate hearing your thoughts on this

Yours

Maria

Published Friday, October 26, 2018 10:38 PM by Maria Zakourdaev

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(required) 
Submit

About Maria Zakourdaev

The shortest word in the English language that contains the letters: abcdef is… feedback! Let me know if I have touched something that is alive in the cosmos.
Privacy Statement