NLB Convergence Issue Where Connections Switch to Host 2 While the Primary Host 1 Is Still Running

I'm running Network Load Balancing (NLB) in a two-server configuration
(which has been running for years) to manage incoming FIX connections
from the Internet. We have an application running on each server, with
server/host 1 as the primary and affinity set to host 1. To me this
means, and has always meant, that as long as host 1 is up and the
application is running, the FIX connections shouldn't switch to host 2.

Today, for the second time this year, we experienced a problem where
one of the hosts (host 2) left the cluster and then rejoined it within
seconds, causing the FIX application to partially switch to host 2.

My questions are as follows:
1. What might cause a hiccup where a host leaves and rejoins a cluster
within seconds?
2. What might cause some of the connections on host 1 to switch to
host 2 on reconvergence, when host 1 is set as the primary for the
application (which was still up and running) and affinity is set to
host 1?

Here is the output of the nlb display command.

C:\>nlb display
WLBS Cluster Control Utility V2.4 (c) 1997-2003 Microsoft Corporation.
Cluster 192.168.1.100

=== Configuration: ===

Current time = 4/24/2008 9:11:07 AM
ParametersVersion = 4
VirtualNICName =
AliveMsgPeriod = 1000
AliveMsgTolerance = 5
NumActions = 100
NumPackets = 200
NumAliveMsgs = 66
ClusterNetworkAddress = 03-bf-3f-fb-19-59
ClusterName = nycluster.indii.com
ClusterIPAddress = 192.168.1.100
ClusterNetworkMask = 255.255.255.0
DedicatedIPAddress = 192.168.1.85
DedicatedNetworkMask = 255.255.255.0
HostPriority = 1
ClusterModeOnStart = STOPPED
PersistedStates = SUSPENDED
DescriptorsPerAlloc = 512
MaxDescriptorAllocs = 512
TCPConnectionTimeout = 60
IPSecConnectionTimeout = 86400
FilterICMP = DISABLED
ScaleSingleClient = 0
NBTSupportEnable = 1
MulticastSupportEnable = 1
MulticastARPEnable = 1
MaskSourceMAC = 1
IGMPSupport = DISABLED
IPtoMcastIP = ENABLED
McastIPAddress = 0.0.0.0
NetmonAliveMsgs = 0
EffectiveVersion = V2.4
IPChangeDelay = 60000
IPToMACEnable = 1
ConnectionCleanupDelay = 300000
RemoteControlEnabled = 1
RemoteControlUDPPort = 2504
RemoteControlCode = 0x35C32A0A
RemoteMaintenanceEnabled = 0x0
CurrentVersion = V2.4
InstallDate = 0x4032A38C
VerifyDate = 0x0
NumberOfRules = 12
BDATeaming = DISABLED
TeamID =
Master = DISABLED
ReverseHash = DISABLED
IdentityHeartbeatPeriod = 10000
IdentityHeartbeatEnabled = ENABLED
PortRules
Virtual IP addr  Start  End   Prot  Mode      Pri  Load   Affinity

192.168.1.90        80    80  TCP   Multiple       Equal  S
192.168.1.90      4202  4205  TCP   Multiple       Equal  S
192.168.1.90      6501  6505  TCP   Single    1
192.168.1.91        80    80  TCP   Multiple       Equal  S
192.168.1.91      4202  4205  TCP   Multiple       Equal  S
192.168.1.93        80    80  TCP   Multiple       Equal  N
192.168.1.93      8085  8099  TCP   Single    1
192.168.1.94        25    25  Both  Single    2
192.168.1.94        80    80  TCP   Multiple       Equal  S
192.168.1.94       110   110  Both  Single    2
192.168.1.94      8080  8080  Both  Single    2
192.168.1.94      8443  8443  TCP   Single    2


=== Event messages: ===

#301 ID: 0x4007001C Type: 4 Category: 0 Time: 4/24/2008 7:55:35 AM
NLB Cluster 192.168.1.89 : Host 1 converged with host(s) 1,2 as part of the cluster.
000C0000 005A0004 00000000 4007001C 00000000 00000000 00000000 00000000
00000000 00000000 00061550 00000000 00000000

#299 ID: 0x4007003F Type: 4 Category: 0 Time: 4/24/2008 7:55:29 AM
NLB Cluster 192.168.1.89 : Initiating convergence on host 1. Reason: Host 2 is joining the cluster.
000C0000 005A0004 00000000 4007003F 00000000 00000000 00000000 00000000
00000000 00000000 00060A7D 00000000 00000000

#297 ID: 0x4007001D Type: 4 Category: 0 Time: 4/24/2008 7:55:07 AM
NLB Cluster 192.168.1.89 : Host 1 converged as DEFAULT host with host(s) 1 as part of the cluster.
000C0000 005A0004 00000000 4007001D 00000000 00000000 00000000 00000000
00000000 00000000 00061532 00000000 00000000

#295 ID: 0x40070045 Type: 4 Category: 0 Time: 4/24/2008 7:55:02 AM
NLB Cluster 192.168.1.89 : Initiating convergence on host 1. Reason: Host 2 is leaving the cluster.
000C0000 005A0004 00000000 40070045 00000000 00000000 00000000 00000000
00000000 00000000 00060BD1 00000000 00000000

#287 ID: 0x40070045 Type: 4 Category: 0 Time: 4/24/2008 7:54:58 AM
NLB Cluster 192.168.1.89 : Initiating convergence on host 1. Reason: Host 2 is leaving the cluster.
000C0000 005A0004 00000000 40070045 00000000 00000000 00000000 00000000
00000000 00000000 00060BD1 00000000 00000000

#221 ID: 0x4007001D Type: 4 Category: 0 Time: 4/13/2008 12:04:35 PM
NLB Cluster 192.168.1.89 : Host 1 converged as DEFAULT host with host(s) 1,2 as part of the cluster.
000C0000 005A0004 00000000 4007001D 00000000 00000000 00000000 00000000
00000000 00000000 00061532 00000000 00000000

#219 ID: 0x4007004B Type: 4 Category: 0 Time: 4/13/2008 12:04:30 PM
NLB Cluster 192.168.1.89 : Current NLB host state successfully updated in the registry.
000C0000 005A0004 00000000 4007004B 00000000 00000000 00000000 00000000
00000000 00000000 000803DD 00000000 00000000

#217 ID: 0x40070005 Type: 4 Category: 0 Time: 4/13/2008 12:04:30 PM
NLB Cluster 192.168.1.89 : Cluster mode started with host ID 1.
000C0000 005A0004 00000000 40070005 00000000 00000000 00000000 00000000
00000000 00000000 000532BE 00000000 00000000

#215 ID: 0x4007003F Type: 4 Category: 0 Time: 4/13/2008 12:04:30 PM
NLB Cluster 192.168.1.89 : Initiating convergence on host 1. Reason: Host 1 is joining the cluster.
000C0000 005A0004 00000000 4007003F 00000000 00000000 00000000 00000000
00000000 00000000 0006082A 00000000 00000000

#213 ID: 0x4007002E Type: 4 Category: 0 Time: 4/13/2008 12:04:30 PM
NLB Cluster 192.168.1.89 : START remote control request received from 192.168.1.89:1048.
000C0000 005A0004 00000000 4007002E 00000000 00000000 00000000 00000000
00000000 00000000 00053773 00000000 00000000


=== IP configuration: ===


Windows IP Configuration

Host Name . . . . . . . . . . . . : INNYWPP01
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Unknown
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No

Ethernet adapter Team #1 - SFT OUTSIDE (Red (L), Black (B)) 63.251.25.85:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) Advanced Network Services Virtual Adapter #2
Physical Address. . . . . . . . . : 00-09-6B-F1-9D-92
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.1.94
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 192.168.1.93
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 192.168.1.91
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 192.168.1.90
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 192.168.1.89
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 192.168.1.85
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.1.65
DNS Servers . . . . . . . . . . . : 216.52.94.1
216.52.94.33

Ethernet adapter Team #2 - SFT INSIDE (Blue (T), Green (R)) 172.18.10.85:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) Advanced Network Services Virtual Adapter
Physical Address. . . . . . . . . : 00-09-6B-F1-9D-93
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 192.168.168.5
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 172.18.11.85
Subnet Mask . . . . . . . . . . . : 255.255.254.0
IP Address. . . . . . . . . . . . : 172.18.10.85
Subnet Mask . . . . . . . . . . . : 255.255.254.0
Default Gateway . . . . . . . . . :

=== Current state: ===

Host 1 has entered a converging state 3 time(s) since joining the cluster
and the last convergence completed at approximately: 4/24/2008 7:55:35 AM
Host 1 converged as DEFAULT with the following host(s) as part of the cluster:
1, 2
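
To put a number on "within seconds", the timestamps in the event messages can be lined up against the heartbeat settings from the configuration dump. What follows is only a minimal Python sketch and not part of the nlb output: the EVENTS list is hand-copied from the 4/24/2008 entries above, the helper names are hypothetical, and it assumes the usual reading of the two parameters, namely that a host is treated as missing once AliveMsgTolerance consecutive heartbeats, sent AliveMsgPeriod milliseconds apart, have not been received.

```python
from datetime import datetime

# Hand-copied from the "Event messages" section (oldest first).
EVENTS = [
    ("4/24/2008 7:54:58 AM", "Initiating convergence: host 2 is leaving"),
    ("4/24/2008 7:55:02 AM", "Initiating convergence: host 2 is leaving"),
    ("4/24/2008 7:55:07 AM", "Host 1 converged as DEFAULT with host(s) 1"),
    ("4/24/2008 7:55:29 AM", "Initiating convergence: host 2 is joining"),
    ("4/24/2008 7:55:35 AM", "Host 1 converged with host(s) 1,2"),
]

# Values from the "Configuration" section.
ALIVE_MSG_PERIOD_MS = 1000
ALIVE_MSG_TOLERANCE = 5


def parse_time(stamp):
    """Parse the timestamp format used in the nlb event listing."""
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


def offsets_seconds(events):
    """Seconds elapsed from the first event to each event in the list."""
    start = parse_time(events[0][0])
    return [int((parse_time(stamp) - start).total_seconds())
            for stamp, _ in events]


def heartbeat_window_seconds(period_ms, tolerance):
    """Assumed detection window: missed heartbeats times heartbeat period."""
    return period_ms * tolerance / 1000.0


if __name__ == "__main__":
    window = heartbeat_window_seconds(ALIVE_MSG_PERIOD_MS, ALIVE_MSG_TOLERANCE)
    print(f"Assumed heartbeat detection window: {window:.0f} s")
    for offset, (stamp, text) in zip(offsets_seconds(EVENTS), EVENTS):
        print(f"+{offset:>3} s  {stamp}  {text}")
```

With the values above, the assumed detection window is 5 seconds and the whole episode, from the first "leaving" event to the final convergence with hosts 1 and 2, spans 37 seconds.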


