Replication Group Member Recovery
This page provides recovery instructions based on the specific issue that you are experiencing.
Failover Recovery Overview
If the primary (write) database fails unexpectedly, for example because of a power or network failure, Porta automatically fails over to the backup database and continues operating. Here is a high-level overview of the failover process:
Primary failure
When a failure is detected, the remaining databases elect a new primary database.
Failover to backup
The backup database is now given the MEMBER_ROLE of PRIMARY and the ability to write data. Porta detects the new primary and establishes a connection to it, and operations proceed as usual.
Failed machine begins recovery
If the failed machine regains connectivity, it attempts to rejoin the group and begins recovering the data it missed during the failure. During this period, it has a MEMBER_STATE of RECOVERING.
Failed machine is fully operational again
When the failed machine has fully recovered, it has a MEMBER_STATE of ONLINE and is fully operational again. However, the MEMBER_ROLE of PRIMARY remains with the backup machine until that machine leaves the group, whether through failure, shutdown, or other means.
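The stages above can be read off a member's MEMBER_STATE and MEMBER_ROLE values. The following is a minimal sketch of that reading, not part of the Porta bundle: the function name and the returned descriptions are hypothetical, and the SECONDARY role value used in the example is an assumption about how a non-primary member is reported.

```python
def describe_member(state, role):
    """Map a member's MEMBER_STATE / MEMBER_ROLE pair to a failover stage.

    Hypothetical helper for illustration; the descriptions paraphrase
    the overview and are not output produced by Porta.
    """
    if state == "RECOVERING":
        return "rejoined the group and is catching up on missed data"
    if state == "ONLINE" and role == "PRIMARY":
        return "fully operational and accepting writes"
    if state == "ONLINE":
        return "fully operational, but not the primary"
    return "not operational; manual recovery may be needed"


# After a completed failover, the former primary is ONLINE again,
# while the PRIMARY role stays with the backup machine.
# "SECONDARY" is an assumed value for a non-primary role.
print("main:", describe_member("ONLINE", "SECONDARY"))
print("backup:", describe_member("ONLINE", "PRIMARY"))
```

The point the sketch makes is that ONLINE alone does not mean a machine has taken the primary role back.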
Manual Recovery Steps
There are some cases in which manual intervention is required after a failure or shutdown. Below are some common issues and their recovery steps.
All/Most Databases Are Down and Won’t Come Online Correctly
When all databases are down and won’t come online correctly, as in the case of a site power failure, we need to manually bootstrap the group. See: Bootstrap the Replication Group
Single Database Failing to Rejoin Group
- Check the status of all members in the group
  - Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database.
  - In the porta-database folder, double click view-ALL-group-repl-status.bat
- If the other machines have MEMBER_STATE of ONLINE and there is a member with MEMBER_ROLE of PRIMARY, then we can try simply restarting replication on the machine that is failing to rejoin the group: Restart Replication
- If the other machines have MEMBER_STATE of ONLINE and there is NOT a member with MEMBER_ROLE of PRIMARY, then bootstrap a machine that is not ONLINE: Bootstrap Replication
  - After bootstrapping the machine:
    - On another machine, Restart Replication to ensure it joins the group
    - On the last machine, Restart Replication to ensure it joins the group
- If one of the above steps does not work, then we can try resetting the database container that is failing to rejoin the group:
  - Reset the Database
  - ! WARNING: Resetting will wipe the database data on the machine !
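The decision steps above can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the dictionary keys, and the host names are hypothetical, and the rows are assumed to be copied by hand from the output of view-ALL-group-repl-status.bat. The first branch is not part of this procedure; it points to the multi-machine outage case covered elsewhere on this page.

```python
def choose_recovery_action(members, failing_host):
    """Suggest the first recovery action for a member that will not rejoin.

    members: list of dicts with hypothetical keys "host", "state", "role",
    mirroring MEMBER_STATE and MEMBER_ROLE from the status output.
    """
    others = [m for m in members if m["host"] != failing_host]
    others_online = bool(others) and all(m["state"] == "ONLINE" for m in others)
    has_primary = any(m["role"] == "PRIMARY" for m in members)

    if not others_online:
        # Not the single-member case: more than one machine is affected.
        return "Bootstrap the Replication Group"
    if has_primary:
        # A healthy group exists; restart replication on the failing machine.
        return "Restart Replication"
    # Others are ONLINE but nobody is PRIMARY: bootstrap the machine that is
    # not ONLINE, then restart replication on the remaining machines.
    return "Bootstrap Replication"


# Hypothetical status rows for a three-machine group.
status = [
    {"host": "main", "state": "ONLINE", "role": "PRIMARY"},
    {"host": "arbiter", "state": "ONLINE", "role": "SECONDARY"},
    {"host": "backup", "state": "OFFLINE", "role": ""},
]
print(choose_recovery_action(status, "backup"))
```

If the suggested action does not bring the member back, the remaining option from the list is to reset the database, which wipes the data on that machine.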
Every Machine Only Registers Itself in the Group
If each machine only sees itself in the replication group with a MEMBER_STATE of OFFLINE, this likely means that there was no existing group to join. This can happen if the machines are not able to communicate with each other, or if the group replication process was not bootstrapped on any of the machines.
To fix this, we need to bootstrap the replication group: Bootstrap the Replication Group
A Machine is Listed as UNREACHABLE
If the other machines are listed as ONLINE, then restart replication on the machine that is listed as UNREACHABLE: Restart Replication
If the restart fails to fix the issue, then we need to Bootstrap the Replication Group.
A Joining Member Creates Its Own Group
If a joining member creates its own group, this likely means that the group_replication_bootstrap_group configuration was not set to OFF on the joining member. We will need to turn this off in the configuration file and restart replication on the machine: Turn Off Default Bootstrap.
Note: A member that has created its own group may be prevented from joining the existing group, even after bootstrap is turned off. If this happens, we may need to Bootstrap the Replication Group. If the machine is unable to join the group after bootstrapping, then we may need to Reset the Database.
For more troubleshooting details and error messages, see the Group Replication Troubleshooting document.
Common Replication Recovery Actions
Stop Replication
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions
- In the actions folder, double click STOP-repl.bat
- Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Let the process run until finished.
Start Replication
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions
- In the actions folder, double click START-repl.bat
- Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Let the process run until finished.
Restart Replication
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions
- In the actions folder, double click restart-repl.bat
- Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Let the process run until finished.
Bootstrap Replication
Use this procedure to bootstrap a single member.
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions
- In the actions folder, double click bootstrap.bat
- Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Let the process run until finished.
- If prompted for migration, enter n.
- If prompted for seeding, enter n.
Bootstrap the Replication Group
Use this procedure to bootstrap the replication group on all machines, such as when a primary member does not exist.
- On the main machine, Stop Replication
- On the backup machine, Stop Replication
- On the arbiter machine, Stop Replication
- On the main machine, Bootstrap Replication
- On the backup machine, Start Replication
- On the arbiter machine, Start Replication
After these steps, all members should be running correctly and ONLINE.
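The order of these steps matters: every machine stops replication first, exactly one machine bootstraps, and the rest start replication so they join the group it created. A minimal sketch of that ordering follows; the function name and the tuple format are hypothetical and only model the checklist. The actual work is still done by running the .bat helpers on each machine.

```python
def bootstrap_group_plan(machines=("main", "backup", "arbiter")):
    """Return the ordered (machine, action) steps for a full group bootstrap.

    The first machine in the sequence is the one that bootstraps the group;
    all others join it afterwards.
    """
    first, *rest = machines
    # Stop replication everywhere before anything is bootstrapped.
    plan = [(machine, "Stop Replication") for machine in machines]
    # Exactly one machine creates the group.
    plan.append((first, "Bootstrap Replication"))
    # The remaining machines join the newly created group.
    plan.extend((machine, "Start Replication") for machine in rest)
    return plan


for number, (machine, action) in enumerate(bootstrap_group_plan(), start=1):
    print(f"{number}. On the {machine} machine: {action}")
```

Keeping the bootstrap on a single machine is what prevents two machines from each creating a separate group.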
Reset Database
!! WARNING: THIS WILL WIPE DATA ON THE MACHINE !!
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions
- In the actions folder, double click create-backup.bat to create a backup of the database.
- In the actions folder, navigate to the caution folder.
- In the caution folder, double click reset.bat
- Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Enter or confirm additional machine information.
- Let the process run until finished.
Turn Off Default Bootstrap
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions, then caution
- In the caution folder, double click set-bootstrap-OFF.bat
- Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Let the process run until finished.
- Navigate up to the actions folder and Restart Replication to rejoin the existing group.
Turn On Default Bootstrap
- Navigate to the desktop’s porta-onprem-bundle folder, then to porta-helpers, then porta-database, then actions, then caution
- In the caution folder, double click set-bootstrap-ON.bat
. - Enter or confirm the current Windows user.
- Enter or confirm the default WSL user.
- Enter or confirm the current machine type.
- Enter or confirm the current machine IP address.
- Let the process run until finished.
- Navigate up to the actions folder and Restart Replication.