Passive node cluster service issues following Windows Updates



I recently came across the following issue and found very little documentation online about it, so I thought I'd add it here to help others. Following the installation of some Windows Update security patches on a passive SQL cluster node, the cluster service refused to start. Event IDs 7031, 1073, 1173 and 1123 were all logged in the system event log.

Event Type: Warning
Event Source: ClusSvc
Event Category: Node Mgr
Event ID: 1123
Date: 01/11/2010
Time: 13:21:56
User: N/A
Computer: PASSIVENODE
Description:
The node lost communication with cluster node 'ACTIVENODE' on network 'Heartbeat (Left)'.
--------------------------------
Event Type: Error
Event Source: ClusSvc
Event Category: Membership Mgr
Event ID: 1173
Date: 01/11/2010
Time: 13:22:39
User: N/A
Computer: PASSIVENODE
Description:
Cluster service is shutting down because the membership engine detected a membership event while trying to join the server cluster. Shutting down is the normal response to this type of event. Cluster service will restart per the Service Manager's recovery actions.
--------------------------------
Event Type: Error
Event Source: ClusSvc
Event Category: Startup/Shutdown
Event ID: 1073
Date: 01/11/2010
Time: 13:22:39
User: N/A
Computer: PASSIVENODE
Description:
Cluster service was halted to prevent an inconsistency within the server cluster. The error code was 5890.

After numerous network traces and diagnostics, I discovered that one of the Windows updates previously installed (KB975467) had updated a file named MSV1_0.dll on the passive node. This caused a version mismatch between the nodes, which was the cause of the problems. The active node had file version 5.2.3790.4587, whereas the passive node had 5.2.3790.3959.
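For anyone wanting to check a set of nodes for this kind of mismatch, here is a minimal Python sketch of the comparison. It assumes you have already collected the file version string from each node by hand (for example from the file's Properties dialog); the function names are my own, hypothetical ones, and the two version strings are the ones reported in this post. Versions are compared as tuples of integers rather than as text, because a plain string comparison would mis-order builds with different digit counts.

```python
def parse_version(text):
    """Turn a dotted file version such as '5.2.3790.4587' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def find_mismatches(versions_by_node):
    """Return {node: version} for every node that is not on the newest version seen."""
    parsed = {node: parse_version(v) for node, v in versions_by_node.items()}
    newest = max(parsed.values())
    return {node: versions_by_node[node] for node, v in parsed.items() if v != newest}


# MSV1_0.dll file versions as reported in this post (collected manually).
msv1_0_versions = {
    "ACTIVENODE": "5.2.3790.4587",
    "PASSIVENODE": "5.2.3790.3959",
}

mismatches = find_mismatches(msv1_0_versions)
print(mismatches)
```

An empty result means every node reports the same version; anything else lists the nodes that differ from the newest build.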

Before discovering the mismatch we had already evicted the passive node from the cluster, so I can't say whether just uninstalling KB975467 from the passive node would have resolved the issue. Most likely, it would.


Below is my full set of troubleshooting/resolution steps.

  • Suspected a network issue, so collected a Network Monitor trace for the heartbeat NIC; it came out clean.
  • Evicted NODE B from the cluster and ran cluster.exe node /forcecleanup
  • Tried adding NODE B to the cluster, but the Cluster Service failed to start during the join process.
  • Confirmed the following registry entries were the same on both nodes:

Verified that HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\LMCompatibilityLevel is set to 2.

Verified that HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\lmcompatibilitylevel and restrictanonymous are the same on all nodes.

Verified that HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0\ntlmminclientsec and ntlmminserversec are the same on all nodes.

  • Verified the Cluster Service Account Password used was correct and not expired.
  • Checked HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Cluster Server: ClusterInstallationState was set to 1 on the passive node and 2 on the active node.
  • Confirmed the subnet masks for all cluster interfaces on both nodes were configured correctly.
  • Investigated the installed KBs and noticed a mismatch in the file version of MSV1_0.dll between the cluster nodes.
  • Tried replacing the .dll file with the copy from NODE A, but that did not help.
  • Uninstalled KB975467 from NODE B, and the file version reverted to 5.2.3790.4530.
  • Successfully re-added the passive node to the cluster and restarted the cluster service.
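
The registry checks in the steps above come down to comparing the same values on each node. Below is a rough Python sketch of that comparison, assuming the values have already been read from each node and written down as plain dictionaries. The helper name and the shortened key paths are hypothetical, used only for illustration; the values are the ones mentioned in this post. Names are lowercased before comparing because registry value names are not case sensitive.

```python
def diff_settings(node_a, node_b):
    """Return {name: (value_a, value_b)} for every setting that differs.

    Names are lowercased first, since registry value names are case-insensitive.
    A setting missing on one node shows up as None on that side.
    """
    a = {name.lower(): value for name, value in node_a.items()}
    b = {name.lower(): value for name, value in node_b.items()}
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Values noted in this post; key paths are shortened for readability.
active = {
    r"Lsa\LMCompatibilityLevel": 2,
    r"Cluster Server\ClusterInstallationState": 2,
}
passive = {
    r"Lsa\lmcompatibilitylevel": 2,
    r"Cluster Server\ClusterInstallationState": 1,
}

differences = diff_settings(active, passive)
print(differences)
```

With the values above, only ClusterInstallationState is reported as different, which matches what I saw after the node had been evicted.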
Hope this helps someone else out!

Comments:

  1. Thanks. But uninstalling KB975467 not helping. Replacing library msv1_0.dll from active node to passive node is helping after uninstalling KB.

    Sorry for my English. :)

  2. IIRC uninstalling the KB should revert the version of this DLL for you. As you say though, manually copying and registering will achieve the same. Good news though, glad it's sorted. HTH
