Re: DCpromo issue. Health check on AD and group policy.




"Garry Starck-MCITP Enterprise Admin" <vjsparx@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message news:86CEE86F-5BAF-4FBC-92D2-CA7129D83BBE@xxxxxxxxxxxxxxxx
Hi Meinholf and Hello IT Team Queensbridge.bham.sch.uk

Since Repadmin was not looking great, to say the least, check the FRS and AD
event logs on the other intrasite DCs for failures creating connection
objects with NED. I am presuming (really hoping) that NED was recently
promo'd out and in again.

I recall this issue from when I removed a DC via DCPROMO and, within 20
minutes, DCpromo'd the new hardware in under the exact same name. The
GUID/CNAME records in DNS were 100% right for the new DC, but every DC,
whether intrasite or intersite, that was a direct replication partner of the
renewed DC simply would not allow the new DC to create new inbound
connection objects (not even via manual methods). Every DC that had been a
replication partner of the DC before its removal just continued to
replicate via KCC auto-generated connection objects to another preferred
bridgehead. I found nothing on the internet to help, so what I did next was
use repadmin's expert commands (listed by repadmin /experthelp) and the
following cowboy trick (in the lab first; luckily I managed to reproduce the
exact problem):

/delrepsto <Naming Context> <DC> <Reps-To DC> <Reps-To DC GUID>
Where:
<Naming Context> e.g. DC=TESTDOM,DC=LOCAL
<DC> each DC that was a previous replication partner; run the command once
for each of them
<Reps-To DC> this will most definitely be NED in every run of the command on
each old partner
<Reps-To DC GUID> the old GUID. Check the FRS/AD event logs on each intrasite
DC to see whether an event shows the old GUID/CNAME. That is the GUID you
supply here.
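
If it helps, the per-partner runs described above could be generated rather than typed by hand. This is only a rough Python sketch, not the script I actually used: the partner names DC01/DC02 and the GUID are made-up placeholders, the argument order simply follows the /delrepsto syntax quoted above, and it only prints the command lines so they can be reviewed before anything is run.

```python
import uuid


def build_delrepsto_commands(naming_context, old_partners, reps_to_dc, stale_guids):
    """Return one repadmin command line per (stale GUID, old partner) pair.

    Argument order follows the syntax quoted in the post:
    /delrepsto <Naming Context> <DC> <Reps-To DC> <Reps-To DC GUID>
    """
    commands = []
    for guid in stale_guids:
        # uuid.UUID raises ValueError on a malformed GUID, so typos fail early.
        canonical = str(uuid.UUID(guid))
        for partner in old_partners:
            commands.append(
                f'repadmin /delrepsto "{naming_context}" {partner} {reps_to_dc} {canonical}'
            )
    return commands


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for line in build_delrepsto_commands(
        "DC=TESTDOM,DC=LOCAL",
        ["DC01", "DC02"],
        "NED",
        ["11111111-2222-3333-4444-555555555555"],
    ):
        print(line)
```

Printing instead of executing is deliberate: with more than one outdated GUID in play, it is worth eyeballing every line first.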

Now, I scripted this because the forest has over 200 DCs and, due to lack of
RAM / performance on most DCs, the KCC was not auto-generating connection
objects. 90% of the DCs used this DC as a bridgehead (manually set, since we
were still on 2000 AD with its hidden agenda; we had switched the KCC and
ISTG off and every connection object was manual, which is how I know that
not even manual object creation does the trick).

To add to my misery, when I spotted the errors after the new DC's promo, I
dcpromo'd it out again, and then there were 2 wrong, outdated GUIDs to
remove. I don't think the /delrepsto <Naming Context> <DC> <Reps-To DC>
<Reps-To DC GUID> way is complex; you just get GUIDs burnt into your retinas
if it is done manually. But you are small, so if this pie-in-the-sky theory
is right, each intrasite DC should have some event log entries, hopefully
showing the antiquated GUIDs. Since each other site has one or more DCs,
generally only one needs attention: the bridgehead which the KCC selects.
The KCC does the KCC thing every 15 minutes and will auto-generate the new
"true" connection objects at those intervals.
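
To save the retinas, the antiquated GUIDs could be pulled out of exported event log text with a small script. A minimal Python sketch, assuming the events have been saved as plain text; the sample event wording and both GUIDs below are invented for illustration and are not real FRS/AD event output.

```python
import re

# Matches the usual 8-4-4-4-12 hexadecimal GUID layout.
GUID_RE = re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b")


def find_stale_guids(event_text, current_guid):
    """Return every GUID in event_text except the DC's current one."""
    found = {match.lower() for match in GUID_RE.findall(event_text)}
    found.discard(current_guid.lower())
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical event text and GUIDs, for illustration only.
    sample = (
        "Could not resolve 11111111-2222-3333-4444-555555555555._msdcs.testdom.local\n"
        "Replicating with AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE._msdcs.testdom.local\n"
    )
    print(find_stale_guids(sample, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"))
```

Anything it returns is only a candidate; compare it against the GUID the new DC actually registered in DNS before treating it as outdated.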

Also, who's the RID master, is he UP?

Root cause analysis of my issue (a bit of a thumb suck): I had just arrived
at the client's site and had never seen such a momentous amount of lingering
objects in AD. Maybe that contributed, but I doubted it. I then thought it
through as a self-inflicted issue: I took the HDDs out of the old DC and
added them to the new server so as to mirror the OS and current configs, and
then promoted it in within 20 minutes. By that stage the mirrors had
completed their sync and I had pulled the old HDDs out. You may think this
is trivial, but in my VM labs I often promo one out and then straight back
in, and I have noticed similar issues in the event logs. Apparently the
now-member server keeps its AD settings, and what you should do is promo it
first into another new dummy.junk domain, then promo it out and reboot. All
the "so-called" domain history is then gone from the registry etc. I do not
know what exactly is documented around that issue; maybe some of the MVPs
can comment / drill me / thrill me.

Regards


Very interesting, and very VERY plausible. I've seen this happen before, years ago in a 2000 domain, and without running numerous tests I realized it before it got too far, when replication was failing. The cause was that, given the replication intervals, the removal of the old DC's replication references had not yet reached the other sites before the new machine was promoted into the domain with the same name. Since this was a 2000 domain, there was no /forceremoval switch to work with, not that it would probably have worked anyway, because of the identical names and two GUIDs. I pulled out the old DC, ran a Metadata Cleanup, manually cleaned out DNS, Sites & Services, etc., then blew away the machine and reinstalled it, but did not re-promote it until I had waited a day and run replmon, etc., to monitor all DCs and make sure there were no replication references left.

As for registry settings, the only entry I am aware of is the product type entry, which says whether it's a DC or not (HKLM\SYSTEM\CCS\Control\ProductOptions - on a server the only values would be either LanmanNT or ServerNT). Everything else, such as the GUID, is in the AD database. But then again, there are the machine's TCP reg entries, as well as the netlogon reg entry, which registers the GUID into DNS and the AD database; when the machine is demoted, that reg entry should get removed, as well as the DNS registration.
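
For anyone who wants to check that product type entry on a rebuilt or demoted box, here is a small Python sketch. It assumes the value name is ProductType under the ProductOptions key (CCS being CurrentControlSet), and the role descriptions are my own wording; the registry read only works on Windows, so it is kept separate from the lookup.

```python
import sys

# ProductType values and what they indicate about the machine's role.
PRODUCT_TYPE_ROLES = {
    "WinNT": "workstation",
    "ServerNT": "member or standalone server",
    "LanmanNT": "domain controller",
}


def describe_product_type(product_type):
    """Translate a ProductType registry string into a readable role."""
    return PRODUCT_TYPE_ROLES.get(product_type, "unknown")


def read_product_type():
    """Read ProductType from the local registry (Windows only)."""
    import winreg  # imported here so the module still loads on other platforms

    key_path = r"SYSTEM\CurrentControlSet\Control\ProductOptions"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "ProductType")
    return value


if __name__ == "__main__":
    if sys.platform == "win32":
        current = read_product_type()
        print(current, "->", describe_product_type(current))
    else:
        print("Registry check skipped: not running on Windows.")
```

A server that still reports LanmanNT after a demotion would be a sign that the demotion did not complete cleanly.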

So if this is the case, and a /forceremoval doesn't work, I would think to unplug it, run Metadata Cleanup, and rebuild the machine from scratch.

But then again, I've seen other cases with similar symptoms, including one where the customer updated one of their SonicWall routers with new firmware that changed the MTU from 1500 to 1492. It took me two days to figure this one out. Apparently, from researching it, LDAP/RPC traffic fails at anything less than 1500 MTU. We put the old firmware back on and replication started once again. This is one reason I advise customers not to use an ADSL service for a corporate link.
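
A quick way to test for an MTU drop like that is a ping with the Don't Fragment bit set and a payload sized to just fit the expected MTU. The Python sketch below only does the arithmetic and builds the Windows ping command line; it assumes plain IPv4 with a 20-byte header and no options, and the host name DC01 is a placeholder.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(host, mtu):
    """Build a Windows ping command: -f sets Don't Fragment, -l sets payload size."""
    return ["ping", "-f", "-n", "1", "-l", str(max_ping_payload(mtu)), host]


if __name__ == "__main__":
    # DC01 is a hypothetical replication partner across the WAN link.
    for mtu in (1500, 1492):
        print(mtu, " ".join(df_ping_command("DC01", mtu)))
```

If the 1500-byte probe fails with a "needs to be fragmented" message while the 1492-byte one gets through, something on the path has a smaller MTU than the DCs assume.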

Then again, it could be a simple firewall rule blocking necessary ports, but I'm starting to think not because of the DNS issue I saw in the DNSLint report.
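
To rule the firewall in or out quickly, a simple TCP connect test against the usual AD ports can be run from one DC toward another. This Python sketch is an assumption-laden starting point: the port list covers only the well-known TCP ports, it does not test UDP or the dynamic RPC ports that replication also needs, and the target name NED is taken from this thread.

```python
import socket

# Well-known TCP ports a DC normally listens on.
AD_TCP_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    3268: "Global Catalog",
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dc_ports(host, ports=AD_TCP_PORTS):
    """Return a {port: reachable} mapping for the given host."""
    return {port: tcp_port_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, reachable in check_dc_ports("NED").items():
        state = "open" if reachable else "blocked or closed"
        print(f"{port:>5} {AD_TCP_PORTS[port]:<20} {state}")
```

All ports showing open does not prove replication will work, since the endpoint mapper hands out dynamic ports, but a blocked 135 or 389 would point straight at the firewall.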

Waiting to see what the dcdiag and netdiag output has to say...

But I like your theory, and it may well be the case. We'll need IT Team Queensbridge.bham.sch.uk to elaborate on what occurred before we can make a determination.

Cheers!

Ace
