Routing interprocess connections between NLB cluster members



This question is going to seem a bit strange, but hopefully someone can
help me. I am trying to use NLB in a small, two server (Win2003 server)
farm in a publicly accessible perimeter network. I have the usual
complement of IIS web and FTP servers, as well as a whole cadre of homegrown
TCP/IP sockets based services. Some of these applications communicate
between themselves using TCP/IP socket-based sessions. It's this last
group, which is sending data between two NLB apps on the same cluster, that
is giving me fits.

I am using NLB in a form of poor-man's automated backup, rather than to
balance traffic across the servers. I have specific port rules set up for
each virtual IP address. I run duplicate services (active and started)
on both servers, but the port rules are set up in single host mode.
The port rules, priorities and apps are set so that only one of the two
machines normally services the traffic for each specific service. The
priority on the second machine is set lower, so that if the primary server
for a service is shut down or fails, the NLB cluster reconverges, and
traffic automatically will transfer to the backup server. With NLB, we
don't have to do anything special (like starting backup services, changing
IP addresses across servers, etc.) to keep things running on a near
continuous basis. The need for this is real as our offices are only staffed
for 10 hours each day, but we have processing that runs 24x7. We are trying
to make failover as automatic (and cheap) as possible.
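
To make the intended failover behaviour concrete, here is a minimal Python
sketch of how a single host mode port rule picks its handler. It is only a
model, not NLB itself. The host names SERVER-A and SERVER-B are hypothetical,
and the sketch assumes the usual NLB convention that the lowest priority
number wins.

```python
from typing import Dict, Optional, Set


def handling_host(priorities: Dict[str, int], alive: Set[str]) -> Optional[str]:
    """Return the live host that should handle a single host mode rule.

    Assumption: the lowest number is the highest priority.
    Returns None when no cluster member is alive.
    """
    candidates = {host: prio for host, prio in priorities.items() if host in alive}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical rule: SERVER-A is the primary, SERVER-B is the backup.
rule = {"SERVER-A": 1, "SERVER-B": 2}

normal = handling_host(rule, {"SERVER-A", "SERVER-B"})
after_failure = handling_host(rule, {"SERVER-B"})
print(normal, after_failure)
```

With both hosts up, the primary handles the rule. Once the primary drops out
and the cluster reconverges, the backup takes over without any change to the
services themselves.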

The inter-process communications TCP/IP sessions are the ones presenting me
with a problem. I want them to automatically fail over to the remaining
server (per the port rules and priority). For NLB to do its thing, though,
all traffic for the servers must be routed through the network switch which
services all servers in the cluster. Take a look at the excerpt from the
route table on one of the NLB servers, below. The virtual IP addresses of
the NLB apps are in the 192.168.12.xx range. Look closely at the entry for
192.168.12.11. Note that the gateway and interface IP addresses are both
set to the loopback address. The same is true of all of the NLB addresses.
If I try to start an inter-process session to one of these NLB addresses,
the TCP/IP stack sees that the loopback address is the
destination and routes it internally, never sending it to a NIC or switch
for load balancing. My problem occurs if an app tries to connect to a
service which is currently being processed on the opposite machine. Instead
of going through the load balancing (via the switch and NIC) and being
processed by the single-mode machine with the highest priority, it will
instead be swallowed by the TCP/IP stack (because of the loopback address)
and it will be processed by the service on the local machine. Some of these
services are not truly "stateless" and I need them to only be processed on
the machine with the highest priority. To net this out, I need a way to
force the transaction out to the switch so that it gets seen by all the NLB
servers (and only handled by the one with highest priority).
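
As a rough illustration of the lookup described above, the following Python
sketch runs a longest-prefix match over a few rows copied from the route
table excerpt in this post. It is a simplified model of route selection
(most specific mask first, then lowest metric), not the actual Windows
implementation, and the function name lookup is made up for the example.

```python
import ipaddress
from typing import Optional, Tuple

# (destination, netmask, gateway, interface, metric), taken from the excerpt.
ROUTES = [
    ("0.0.0.0", "0.0.0.0", "192.168.12.2", "192.168.12.81", 20),
    ("127.0.0.0", "255.0.0.0", "127.0.0.1", "127.0.0.1", 1),
    ("192.168.12.0", "255.255.255.0", "192.168.12.81", "192.168.12.81", 20),
    ("192.168.12.11", "255.255.255.255", "127.0.0.1", "127.0.0.1", 20),
    ("192.168.12.81", "255.255.255.255", "127.0.0.1", "127.0.0.1", 20),
]


def lookup(dest: str) -> Optional[Tuple[str, str]]:
    """Return (gateway, interface) of the most specific matching route."""
    addr = ipaddress.ip_address(dest)
    best_key = None
    best_route = None
    for net, mask, gateway, interface, metric in ROUTES:
        network = ipaddress.ip_network(f"{net}/{mask}")
        if addr not in network:
            continue
        # Prefer the longest prefix; break ties with the lowest metric.
        key = (network.prefixlen, -metric)
        if best_key is None or key > best_key:
            best_key = key
            best_route = (gateway, interface)
    return best_route


print(lookup("192.168.12.11"))  # NLB virtual IP: matches the /32 loopback route
print(lookup("192.168.12.2"))   # router: matches the /24 route on the NIC
```

In this model the virtual IP always resolves to the loopback host route
because a /32 entry beats the /24 subnet entry, which is why the connection
never leaves the machine.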

I tried setting a static route table entry on each server, setting the
gateway addresses for all NLB virtual addresses to the gateway router. This
sent the traffic to the router, which then ping-ponged it back through the
switch, but it caused the router to send an ICMP redirect along with every
packet it routed (telling the server it has a closer, more direct route
to the address and to stop bothering the router). This process creates a
ton of extra traffic for the router, which it shouldn't really have to
handle, since the switch could very well have handled the packets. Now my
problem is the switches I am saddled with are just dumb, unmanaged switches
without VLANs or IP addresses on their ports. I can't set a static route to
send the packet only to the switch because it doesn't have an IP
address.

The route command I used to set up the static route was:

route -p add 192.168.12.0 mask 255.255.255.0 192.168.12.2 metric 1

[the router's address is 192.168.12.2]

I am considering setting up the following route command (actually, a series
of commands, one command for each NLB virtual IP address):

route -p add 192.168.12.11 mask 255.255.255.0 192.168.12.11 metric 1

This is similar to the entry in the route table for the 192.168.12.0 network
(the 192.168.12.81 gateway IP address is the first IP on the NIC). I'm
not sure whether this will actually route the traffic out to the switch,
though. I have a feeling that it will only be serviced by the NIC driver
and that the network switch will never get the packet (so it can forward it
to all the NLB servers for marshalling).
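
One thing worth checking before trying the per-address command is whether the
destination and mask agree with each other. As far as I understand it, route
expects (destination AND mask) to equal the destination, so a single-host
destination would need a 255.255.255.255 mask. That is my assumption about
route's validation, not something I have confirmed on these servers. A small
Python sketch of the check (the helper name is made up):

```python
import ipaddress


def dest_matches_mask(dest: str, mask: str) -> bool:
    """True when no bits of dest fall outside the mask."""
    d = int(ipaddress.ip_address(dest))
    m = int(ipaddress.ip_address(mask))
    return (d & m) == d


# Host address paired with a /24 mask, as in the command I am considering.
print(dest_matches_mask("192.168.12.11", "255.255.255.0"))
# Same host address paired with a host (/32) mask.
print(dest_matches_mask("192.168.12.11", "255.255.255.255"))
# The subnet route I used earlier.
print(dest_matches_mask("192.168.12.0", "255.255.255.0"))
```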

Has anyone else come across this same problem and solved it? Specific
example statements showing a solution would be appreciated.


ROUTE TABLE EXCERPT:

Active Routes:
Network Destination  Netmask          Gateway        Interface      Metric
0.0.0.0              0.0.0.0          192.168.12.2   192.168.12.81  20
127.0.0.0            255.0.0.0        127.0.0.1      127.0.0.1      1
192.168.12.0         255.255.255.0    192.168.12.81  192.168.12.81  20
192.168.12.11        255.255.255.255  127.0.0.1      127.0.0.1      20
192.168.12.12        255.255.255.255  127.0.0.1      127.0.0.1      20
192.168.12.13        255.255.255.255  127.0.0.1      127.0.0.1      20
192.168.12.21        255.255.255.255  127.0.0.1      127.0.0.1      20
192.168.12.81        255.255.255.255  127.0.0.1      127.0.0.1      20
192.168.12.255       255.255.255.255  192.168.12.81  192.168.12.81  20
255.255.255.255      255.255.255.255  192.168.12.81  192.168.12.81  1
... OMITTED ...
Default Gateway: 192.168.12.2

