I must admit I think very little about the VMware Horizon ADAM database, but this week I was forced to when a customer called to say they were having problems upgrading their Horizon connection servers from 7.5 to 7.6. Specifically, the error stated that “LDAP is not ready for an upgrade.”
The Root Cause – reverting to an earlier VM snapshot?
VMware snapshots are great! I love them, I do, especially for their immense CYA potential: when you're messing around with updates, registry settings, or whatever else and you hose a VM, you can use the snapshot to easily revert to where you started.
Note, however, that snapshots should be used with caution! All seasoned VMware administrators should know that snapshots are not meant for long-term backup or data retention. Used that way, unattended snapshots keep growing and can consume all free space within a datastore, causing every VM with virtual disks in that datastore to crash. But even if your datastore space doesn't fill up, restoring old snapshots can lead to problems that may not be evident immediately.
Though I don’t know how old the snapshot was, the “first” Horizon connection broker in the environment had been reverted to an earlier snapshot. Since Horizon connection servers use an ADAM database to store and replicate configuration information, I’d say it’s highly likely that the revert caused the replication errors seen between the connection servers.
As for resolving this issue, I found Guy Brand’s post VMware Snapshot and Recovery: Fix AD Replication the most useful. Though the post details the steps to resolve AD replication issues after reverting an AD domain controller to an earlier snapshot, the problem between the Horizon connection servers was virtually identical, both in its symptoms and, ultimately, in its resolution.
From what I’ll call VCS01, I ran the command repadmin /showrepl localhost:389 DC=vdi,DC=vmware,DC=int and saw that the source server was rejecting replication requests.
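If you have several connection servers to check, the /showrepl output can be scanned instead of read by eye. Below is a minimal, hypothetical Python helper (the names find_rejections and check_file are made up for illustration). It assumes you saved the output of the command above to a plain ASCII/UTF-8 text file, for example by redirecting from cmd.exe, and it simply searches for the "rejecting replication requests" wording. The exact message text is an assumption and may vary between Windows versions.

```python
# Assumed wording of the failure that repadmin /showrepl prints when the
# source server refuses to replicate; adjust if your output differs.
REJECTION_TEXT = "rejecting replication requests"


def find_rejections(showrepl_output: str) -> list[str]:
    """Return each line of saved /showrepl output that mentions the rejection."""
    return [
        line.strip()
        for line in showrepl_output.splitlines()
        if REJECTION_TEXT in line.lower()
    ]


def check_file(path: str) -> bool:
    """Read a saved /showrepl capture, print any rejection lines, and return True if none."""
    # Assumes an ASCII/UTF-8 capture; a UTF-16 file (e.g. PowerShell 5 '>') will not match.
    with open(path, encoding="utf-8", errors="replace") as f:
        hits = find_rejections(f.read())
    for hit in hits:
        print(hit)
    return not hits
```

For example, check_file("showrepl.txt") returns False and prints the offending lines if any replication partner is rejecting requests.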
Next, I used repadmin to check the status of inbound and outbound replication by running repadmin /options localhost, and, just as Guy’s post describes, both inbound and outbound replication were disabled on VCS01. To re-enable them, I ran the following commands on VCS01:
- repadmin /options localhost -DISABLE_OUTBOUND_REPL
- repadmin /options localhost -DISABLE_INBOUND_REPL
- repadmin /options localhost (to verify that inbound and outbound replication were re-enabled)
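The options check can be scripted too. This is a minimal Python sketch, not a VMware or Microsoft tool. It assumes the repadmin /options localhost output contains a line starting with "Current DSA Options:" followed by space-separated flag names. From that line it reports which of the two replication-disabling flags are set and returns the matching repadmin commands from the list. The helper names parse_dsa_options and replication_fixes are hypothetical.

```python
# Flags that, when present in the DSA options, block replication
# (listed in the same order as the commands above).
DISABLE_FLAGS = ("DISABLE_OUTBOUND_REPL", "DISABLE_INBOUND_REPL")


def parse_dsa_options(output: str) -> set[str]:
    """Extract flag names from the 'Current DSA Options:' line (assumed format)."""
    for line in output.splitlines():
        if line.strip().startswith("Current DSA Options:"):
            return set(line.split(":", 1)[1].split())
    return set()


def replication_fixes(output: str, server: str = "localhost") -> list[str]:
    """Return the repadmin commands needed to clear any disable flags that are set."""
    flags = parse_dsa_options(output)
    return [
        f"repadmin /options {server} -{flag}"
        for flag in DISABLE_FLAGS
        if flag in flags
    ]


# Illustrative input only, not captured from a real server.
sample = "Current DSA Options: DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL"
for cmd in replication_fixes(sample):
    print(cmd)
```

With both flags present, this prints the same two commands listed above. Once replication is re-enabled, it returns an empty list.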
I then connected to VCS02 to check its repadmin options and found no issues.
You can also use repadmin to force/test replication between the Horizon connection servers. The syntax, as detailed in VMware KB1021805, is as follows:
repadmin /replicate localCSFQDN:port remoteCSFQDN:port DC=vdi,DC=vmware,DC=int
To test replication between the Horizon connection servers, I executed the following commands:
- From VCS01: repadmin /replicate vcs01:389 vcs02:389 DC=vdi,DC=vmware,DC=int
- From VCS02: repadmin /replicate vcs02:389 vcs01:389 DC=vdi,DC=vmware,DC=int
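With more than two connection servers, every pair should be tested in both directions. The following Python sketch generates the /replicate command for each ordered pair, following the syntax quoted above from KB1021805. The server names and port 389 are the placeholders from this example, the helper names are hypothetical, and each command is meant to be run on the server named first (the local connection server).

```python
from itertools import permutations

# Naming context used by the Horizon ADAM instance, as in the commands above.
NAMING_CONTEXT = "DC=vdi,DC=vmware,DC=int"


def replicate_command(local: str, remote: str, port: int = 389) -> str:
    """Build 'repadmin /replicate local:port remote:port <NC>' for one direction."""
    return f"repadmin /replicate {local}:{port} {remote}:{port} {NAMING_CONTEXT}"


def all_replicate_commands(servers: list[str], port: int = 389) -> list[tuple[str, str]]:
    """Return (server to run on, command) for every ordered pair of servers."""
    return [
        (local, replicate_command(local, remote, port))
        for local, remote in permutations(servers, 2)
    ]


for host, cmd in all_replicate_commands(["vcs01", "vcs02"]):
    print(f"From {host}: {cmd}")
```

For the two servers here, this reproduces exactly the two commands listed above. For n servers, it produces n × (n − 1) commands.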
Fortunately, there were no replication errors.
Though I cannot state with 100% certainty that reverting to an earlier snapshot caused this issue, I am about 99.999% sure it did. With the replication errors between the Horizon connection servers resolved, the upgrade to Horizon 7.6 proceeded without any further issues.
Snapshots are awesome and have gotten me out of several jams, but beware, they can also cause issues should you revert to what I’ll call an “out-of-date” snapshot, especially when dealing with systems that replicate to and between one another.