Re: working cluster, but can't ping nodes reliably



First:
If you have a "default gateway" configured on the heartbeat IP interface,
remove it on both nodes.
(This might need a reboot of the cluster nodes; I'm not sure.)
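
A quick way to check this is to scan saved `ipconfig /all` output for adapters that have a default gateway set. The following is only a rough sketch: the function name `gateways_by_adapter` is made up for illustration, the sample text is trimmed from the output quoted further down, and the parsing assumes the classic English-language `ipconfig` layout.

```python
import re

def gateways_by_adapter(ipconfig_text):
    """Map adapter name -> default gateway string ('' if none is set)."""
    result = {}
    adapter = None
    for line in ipconfig_text.splitlines():
        # Section header, e.g. "Ethernet adapter Private:"
        header = re.match(r"\s*Ethernet adapter (.+):\s*$", line)
        if header:
            adapter = header.group(1)
            continue
        # Gateway line, e.g. "Default Gateway . . . . : 152.144.155.254"
        gw = re.match(r"\s*Default Gateway[ .]*:\s*(.*)$", line)
        if gw and adapter is not None:
            result[adapter] = gw.group(1).strip()
    return result

# Trimmed sample in the same layout as the output quoted in this thread.
sample = """\
Ethernet adapter Private:
   IP Address. . . . . . . . . . . . : 10.51.0.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter Public:
   IP Address. . . . . . . . . . . . : 152.144.155.231
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 152.144.155.254
"""

for name, gateway in gateways_by_adapter(sample).items():
    print(name, "->", gateway or "(no default gateway)")
```

An empty gateway on the private (heartbeat) adapter is what you want to see; only the public adapter should list one.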

Second:
If the range configured for your private/heartbeat network is very large, it
can include IP addresses from your public network. Reconfigure your
heartbeat network to use a smaller range that does not overlap with any of
your public/company-wide networks.
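
If you want to verify the overlap on paper first, Python's standard `ipaddress` module can do the comparison. This is a minimal sketch: the helper name `networks_overlap` is just for illustration, the first pair of values is taken from the ipconfig output quoted below, and the second heartbeat value is a hypothetical misconfiguration, not something from your setup.

```python
import ipaddress

def networks_overlap(a, b):
    """Return True if the two address/mask strings describe overlapping networks."""
    # strict=False lets us pass a host address plus its mask,
    # exactly as ipconfig reports them.
    net_a = ipaddress.ip_network(a, strict=False)
    net_b = ipaddress.ip_network(b, strict=False)
    return net_a.overlaps(net_b)

public = "152.144.155.231/255.255.255.0"

# Values as shown in the quoted ipconfig output: separate /24 ranges.
print(networks_overlap("10.51.0.1/255.255.255.0", public))   # False

# Hypothetical heartbeat range with a mask that is far too wide.
print(networks_overlap("152.144.0.1/255.255.0.0", public))   # True
```

Any pair that comes back True means the heartbeat range needs to be shrunk or moved.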



Then test again and let us know.

rgds,
edwin.

"billd" <billd@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:03504B02-29D8-467F-81FF-2B7D0761AA92@xxxxxxxxxxxxxxxx
Further information.

I have just taken a small switch. I plugged the two cluster public
interfaces into it and I plugged my workstation into it. Everything worked
fine. I went to a workstation on the main network and tried to ping the two
nodes by IP number. No replies, so there are no address conflicts. I
uplinked the switch back into the main network. The problem came back: the
nodes won't ping.

I can't say I'm much better off because of this. All I know is that, for some
reason, when I plug this rig into my main network, something goes haywire.

"billd" wrote:

This is a long shot I think, but I am not sure what to do next, maybe
someone can think of something...

I have a two-node Dell cluster. It runs SQL 2005.

Each node's public interface has an IP address, .231 and .232.
Each node's private interface has an IP address, 10.51.100.1 and
10.51.100.2.

The cluster's administrative address is .230. All of the public numbers
are on a /24 subnet.

SQL Server runs on this cluster; it has an IP address of .234.

So, now for the weirdness:

If I ping .231 I get the following

oh joy... now it's working....

Pinging squealer1.emtex.com [152.144.155.231] with 32 bytes of data:

Reply from 152.144.155.231: bytes=32 time<1ms TTL=128
Reply from 152.144.155.231: bytes=32 time<1ms TTL=128
Reply from 152.144.155.231: bytes=32 time<1ms TTL=128
Reply from 152.144.155.231: bytes=32 time<1ms TTL=128

Usually this doesn't work and I get the same thing as when I ping
squealer2:


Pinging squealer2.emtex.com [152.144.155.232] with 32 bytes of data:

Reply from 152.144.155.232: bytes=32 time<1ms TTL=128
Request timed out.
Request timed out.
Request timed out.

Ping statistics for 152.144.155.232:
Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

The first ping works and then the rest fail.


This is all from my workstation .200

The administrative and SQL IP addresses both ping fine from my workstation.
They work fine whichever node is active. I can ping squealer2 from
squealer1 fine. Other workstations on the network are similar but, seemingly
at random, can only ping either squealer1 or squealer2. No one can ping both.
I have tried plugging them into different switches and flushing ARP:
everything other than actually changing the IP addresses, as that would cause
mayhem for a while. In my experience, anyway, it seems to take quite a while
for the cluster to figure out who it is again and I always seem to do
something in the wrong order... and I'm not convinced that it would fix this
problem.

I post ipconfig /all from both machines below:


L:\>ipconfig /all

Windows IP Configuration

Host Name . . . . . . . . . . . . : squealer1
Primary Dns Suffix . . . . . . . : emtex.com
Node Type . . . . . . . . . . . . : Unknown
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : emtex.com

Ethernet adapter Private:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-14-22-23-D2-81
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 10.51.0.1
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :

Ethernet adapter Public:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-14-22-23-D2-80
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 152.144.155.234
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 152.144.155.230
Subnet Mask . . . . . . . . . . . : 255.255.255.0
IP Address. . . . . . . . . . . . : 152.144.155.231
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 152.144.155.254
DNS Servers . . . . . . . . . . . : 152.144.155.222
152.144.155.245

L:\>

and


Ethernet adapter Private:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Network Connection #2
Physical Address. . . . . . . . . : 00-14-22-23-D2-2A
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 10.51.0.2
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . :

Ethernet adapter Public:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-14-22-23-D2-29
DHCP Enabled. . . . . . . . . . . : No
IP Address. . . . . . . . . . . . : 152.144.155.232
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 152.144.155.254
DNS Servers . . . . . . . . . . . : 152.144.155.222
152.144.155.245

node 1 is obviously active in this scenario.

I can't see anything obviously wrong with this. I have another cluster, which
has more services, in the same network, with the same basic configuration.
It's running SQL and web servers on one node and file services on the other
node, active/active with a SAN storage device, but the basic configuration is
the same as far as node IP numbers etc.

Has anyone seen this sort of behaviour? I don't get any network address
conflict messages anywhere. I might try to isolate it to its own segment
later to see if it could be any form of IP address conflict, but I don't
think it is.

Any ideas?



