Shawn McCoy is a database engineer in Amazon Web Services.

As a PostgreSQL database owner, you know that critical maintenance operations are handled by the autovacuum process. The effectiveness of this background process is constantly being improved with each release. For most PostgreSQL users, the default settings work just fine. However, some workloads and usage patterns require customized parameters.

I'll cover two areas in this post: first, what you can do to monitor the transaction ID health of your Amazon RDS for PostgreSQL database, and second, common problems I've helped customers work through once an issue is identified.

A PostgreSQL database can have about two billion "in-flight" unvacuumed transactions before PostgreSQL takes dramatic action to avoid data loss. If the number of unvacuumed transactions reaches (2^31 - 10,000,000), the log starts warning that vacuuming is needed. If the number reaches (2^31 - 1,000,000), PostgreSQL sets the database to read-only mode and requires an offline, single-user, standalone vacuum. This vacuum requires multiple hours or days of downtime, depending on database size. A very detailed explanation of transaction ID wraparound can be found in the PostgreSQL documentation. We now know that around 2.1 billion unvacuumed transactions is "where the world ends," to quote the source code comment. There are several reasons a transaction can become this old, but let's first look at how you can be alerted to this pending problem.

Although transaction ID age can be monitored with manual queries, we have introduced an Amazon CloudWatch metric, MaximumUsedTransactionIDs, to make setting up this kind of alert easier for you. To generate this metric, the Amazon RDS agent runs this query:

SELECT max(age(datfrozenxid)) FROM pg_database

From my experience working with customers, I recommend an alarm when this metric reaches a value of 1 billion. For some situations, a low-severity warning at 500 million might also be useful.

Let's take a look at configuring a CloudWatch alarm, as described in the CloudWatch documentation:

1. In the AWS Management Console, choose CloudWatch, then choose Alarms in the navigation pane.
2. Choose Create Alarm to create a new alarm.
3. Select the MaximumUsedTransactionIDs metric and press Return to bring up a list of your instances. If you have several instances, you can group them together and get a single alarm; for this example, I select just one.
4. After selecting your target instance, choose Next, then set up the threshold (1000000000) and the notification email.

In my example, I've already exceeded the threshold, so I receive the alarm email right away. The same message also shows up in the CloudWatch dashboard.

Setting this threshold to 1 billion should give you plenty of time to investigate the problem; how much time the fix takes depends on the problem's cause. The default autovacuum_freeze_max_age value is 200 million, so if the age of the oldest transaction is 1 billion, autovacuum is having a problem keeping transaction age near that 200 million target. It's a luxury to have extra time to fix these issues; there is nothing more frustrating than trying to fix this type of problem while keeping the system under normal workloads.

Let's say that the monitor has sent the alarm email and you know that there is a problem. I'll walk through the steps I take in identifying solutions to the most common problems I've helped customers resolve. One thing I like to do after I have identified the problem is to execute a verbose vacuum command (VACUUM VERBOSE) and see how long the action takes. With this information, you can start to diagnose what is happening. This approach can help you get out of immediate trouble and plan your long-term fix.
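The wraparound cutoffs quoted earlier come straight from 32-bit transaction ID arithmetic. A minimal sketch of the numbers, with the recommended 1 billion alarm shown for comparison:

```python
# Transaction IDs are 32-bit; roughly half the space (2**31) lies
# "in the past", so that is the hard budget for unvacuumed transactions.
XID_BUDGET = 2**31  # 2,147,483,648

WARN_CUTOFF = XID_BUDGET - 10_000_000      # log warnings begin here
READ_ONLY_CUTOFF = XID_BUDGET - 1_000_000  # database forced read-only here

ALARM_THRESHOLD = 1_000_000_000  # recommended CloudWatch alarm value

# Headroom left between the alarm firing and forced read-only mode:
headroom = READ_ONLY_CUTOFF - ALARM_THRESHOLD

print(WARN_CUTOFF)       # 2137483648
print(READ_ONLY_CUTOFF)  # 2146483648
print(headroom)          # 1146483648
```

Alarming at 1 billion leaves well over a billion transactions of headroom before the forced read-only cutoff, which is what buys you investigation time.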
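If you want to check the same numbers the metric is built from by hand, you can run the agent's query yourself. Here is a sketch from Python; the psycopg2 driver and the connection settings are my assumptions, not part of the post, while the per-database breakdown is standard PostgreSQL:

```python
# Same query the Amazon RDS agent runs for MaximumUsedTransactionIDs:
MAX_AGE_QUERY = "SELECT max(age(datfrozenxid)) FROM pg_database"

# Per-database breakdown, handy for spotting which database is lagging:
PER_DB_QUERY = """
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC
"""

def oldest_xid_age(conn):
    """Return the cluster-wide maximum transaction ID age."""
    with conn.cursor() as cur:
        cur.execute(MAX_AGE_QUERY)
        return cur.fetchone()[0]

if __name__ == "__main__":
    import psycopg2  # assumed driver; host and user below are placeholders
    conn = psycopg2.connect(host="my-instance.example.com",
                            dbname="postgres", user="postgres")
    print(oldest_xid_age(conn))
```

Comparing the per-database ages against autovacuum_freeze_max_age tells you whether one database in particular is falling behind.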
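The console walkthrough can also be scripted. A sketch of the equivalent alarm via boto3's put_metric_alarm; the instance identifier and SNS topic ARN are placeholders, while the metric name, namespace, and 1 billion threshold come from the post itself:

```python
# CloudWatch alarm parameters mirroring the console steps above.
ALARM_PARAMS = {
    "AlarmName": "rds-txid-wraparound-warning",
    "Namespace": "AWS/RDS",
    "MetricName": "MaximumUsedTransactionIDs",
    "Statistic": "Maximum",
    "Period": 300,            # evaluate five-minute datapoints
    "EvaluationPeriods": 1,
    "Threshold": 1_000_000_000,
    "ComparisonOperator": "GreaterThanThreshold",
    # Placeholder instance identifier:
    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "my-instance"}],
    # Placeholder SNS topic that delivers the notification email:
    "AlarmActions": ["arn:aws:sns:us-east-1:111122223333:my-alert-topic"],
}

if __name__ == "__main__":
    import boto3  # requires AWS credentials in the environment
    boto3.client("cloudwatch").put_metric_alarm(**ALARM_PARAMS)
```

Scripting the alarm makes it easy to apply the same early warning to every instance in a fleet.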