ACE (the Anti-Chaos Engine)
In an eventually consistent system (like a pgEdge cluster), nodes can go out of sync (diverge) due to replication lag, network partitioning, or node failure. Differences between databases can also arise from planned or unplanned downtime or unexpected cloud provider failures. The Anti-Chaos Engine (ACE) reconciles such differences across databases by performing efficient table comparisons and reporting the differences it finds in output formats that are both machine- and user-friendly. You can also use ACE to repair the discrepancies by specifying the node that is considered your source of truth and instructing ACE to update the other nodes to match it.
You can use ACE functions at the command line or from an API endpoint. Scheduling options for ACE make it convenient to include ACE in the routine maintenance that keeps your pgEdge cluster healthy.
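The compare-then-repair idea described above can be illustrated with a small, self-contained Python sketch. This is not ACE's implementation or API: the in-memory dictionaries stand in for a table on two nodes, and the helper names (`block_hash`, `diff_tables`, `repair_from_source`) and the `block_size` parameter are hypothetical, chosen only for illustration. The sketch assumes that hashing blocks of rows is one reasonable way to make a comparison efficient, so that matching blocks are skipped without a row-by-row check.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Row = Tuple                # a row is a tuple of column values
Table = Dict[int, Row]     # primary key -> row (stand-in for one node's table)


def block_hash(table: Table, keys: List[int]) -> str:
    """Hash the rows for a block of primary keys; a missing row hashes as None."""
    digest = hashlib.sha256()
    for key in keys:
        digest.update(repr((key, table.get(key))).encode())
    return digest.hexdigest()


def diff_tables(a: Table, b: Table, block_size: int = 2
                ) -> Dict[int, Tuple[Optional[Row], Optional[Row]]]:
    """Return {pk: (row_on_a, row_on_b)} for every key whose rows differ."""
    keys = sorted(set(a) | set(b))
    diffs = {}
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        if block_hash(a, block) == block_hash(b, block):
            continue  # the whole block matches, so skip the row-by-row check
        for key in block:
            if a.get(key) != b.get(key):
                diffs[key] = (a.get(key), b.get(key))
    return diffs


def repair_from_source(source: Table, target: Table, diffs) -> None:
    """Make the target match the source of truth for each differing key."""
    for key in diffs:
        if key in source:
            target[key] = source[key]   # insert a missing row or overwrite a stale one
        else:
            target.pop(key, None)       # the row exists only on the target: remove it


# Two diverged copies of the same table; n1 is treated as the source of truth.
n1 = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 30)}
n2 = {1: ("alice", 10), 2: ("bob", 25), 4: ("dave", 40)}

diffs = diff_tables(n1, n2)
repair_from_source(n1, n2, diffs)
```

In this example the comparison reports keys 2, 3, and 4 as different, and after the repair step `n2` holds the same rows as `n1`. Note that a repair like this deletes rows that exist only on the target, which is why choosing the source-of-truth node deserves care.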
Common Use Cases for ACE
Planned Maintenance: If you need to take a database node down for administrative purposes, its state may fall behind the state of other nodes. When you bring the node online again, you can use ACE to compare node states and bring the node up to speed.
Node Failure: A node can go down unexpectedly, for example because of a network failure; when it recovers, use ACE to repair the node’s state and return it to the cluster.
Network Partition: A pgEdge database cluster is often spread across multiple regions and data centers. In a deployment of that scale, network partitions can occur and leave groups of nodes temporarily unable to replicate to one another. You can use ACE to return a cluster to a consistent state if partition-related irregularities disrupt it.