Re: active directory replication



[FYI: You need a better (newsreader OR) quoting mechanism.]

"rodge" <rodge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:B3613203-91F5-48F2-9E21-E75F98739FF7@xxxxxxxxxxxxxxxx
> I'm sure you have every opportunity to gain the knowledge needed for
> everything you do, but in my situation, our network admin was fired and I
> was
> put into the position having very little knowledge or experience with
> active
> directory. I also have not been given any opportunity to get the training
> I
> need other than a very helpful technet subscription, part of which pays
> for
> me to ask you questions.

Not sure what the above is about, but I am almost entirely
self-taught myself; TechNet is good (I have probably been its prime
evangelist over the years), but the subscription is no longer
essential since the built-in help is so good and the MS web site
has almost everything that is on the TechNet CD.

> Now given that info, I am going to try and respond to your comments one by
> one.
>
> "Reports "about slowness" are seldom a reason to take
> drastic action, BUT they are a reason for investigation
> perhaps...."
>
> I have actually been to every one of these sites many times over the past
> 6
> months and there is definitely a slowness issue and it is not slow all of
> the
> time, the slowness seems to recover after 2 hours.

Recover AFTER two hours? Not go bad (briefly) every two hours?

Again, "Slowness" without specifics is IMPOSSIBLE to
fix through any sort of reliable procedure (e.g., you might
get LUCKY by accident if you throw money at the problem
but you wouldn't know without trying it.)

Slow in what way? How specifically? Slow in comparison
with what?

> Ironically(and possibly
> unrelated) that's how often each site is setup to replicate. I am not sure
> how to determine when replication will time out if not succeeding, i.e. if
> something is causing packets to continually drop and no response is sent
> back
> to the main site, but I'm going to guess it may be 2 hours.

Monitoring with Netmon will work even if it isn't optimum.


> I have been able to determine that this is being caused by replication.
> "How? When? What are the actual symptoms?"
>
> Based on the times the slowness hits and lasts until, and the following
> eventids in the event viewer of the main office domain controller, the
> primary holder of all 5 fsmo roles: 1566, 1311, 1865, and 13508.

No, you haven't even come close to determining the cause.

Example: Replication every two hours would cause a problem
for a (limited) interval every two hours, not a problem that LASTED
two hours.

Unless the problem were continuous, in which case you wouldn't
see a two hour period, just trouble all of the time.

> "What do your sites and SITE LINKS look like?"
>
> what do they look like? Could you be a little more specific? They are
> "setup" with a site for each branch office, i.e. Ballenger is site one and
> contains the domain controller called ballenger. This site has its own
> subnet and is connected by a cisco 2600 router over our WAN. Each site has
> a
> site link to the main office domain controller, maindc.

What are the Schedule (24 hours/7 days?) and the Frequency (2 hours)?

> "On the face of it, neither of these changes will change the
> AMOUNT of replication to each site. Certainly not "picking"
> the bridgehead server which will just make the replication
> less reliable, although it MIGHT reduce the load on THAT
> bridgehead DC (but this only makes sense if you are doing a
> LOT of replication.)
>
> Not sure what you mean by on the face of it, but I think that there are
> two
> main types of replication we are dealing with here, scheduled replication
> and
> immediate(security) replication.

But changing the things you mentioned (Bridgehead DC) will
NOT affect the amount of replication ("on the face of it" meant
there is NO reason to expect to see that.)

Immediate replication is almost NIL for reasonable size domains;
but you still haven't given us the size of the domain, the bandwidth
on these lines (you did indicate they were LESS than 512 Kbps),
nor the AVAILABLE bandwidth.....

> While I am not even claiming to be an
> expert, it appears to me that the replication I am having issues with is
> the
> scheduled replication.

IF replication is an issue, which is possible but pretty much a
random guess so far...

> If I am right, then the solution of making regions
> into sites would certainly cut down on replication across the WAN to the
> main
> site because instead of the maindc replicating with each dc on the WAN, he
> would only replicate with the bridgehead servers I appoint for each
> region.

There are no "regions" in AD, you would have to join sites or set
your own CONNECTIONS (not Bridgehead servers).

You do NOT want a Site to span a (slow) WAN line.
[In fact, you almost never want a Site to span a fast WAN line either.]

Are these Sites joined better to EACH other than to the Main Site?

If so, your SiteLinks are wrong. (You have described a Central-HUB
type SiteLink topology which SHOULD be used for a comparable
PHYSICAL WAN topology.)

By replicating to other Sites (than Main) you might overcome SOME
types of bottlenecks into the MainSite, but you don't seem to have a
problem with MAIN, but rather with the branch sites/locations.

Eventually, all of those changes have to replicate with Main anyway.
Generally your SiteLinks should follow your physical WAN links but
you claimed some (unspecified) type of Full Mesh. What specifically?

> Those bridgehead servers would then replicate changes to the other dc's in
> their site. I also believe that the same would be true of
> immediate(security)
> related replication as well.

> "No, since you didn't describe the network except to say "full mesh"
> which IMPLIES that all sites have the same bandwidth to each
> other but doesn't really state that."
>
> Actually, fully meshed network simply means that every node has a direct
> connection to every other node, it doesn't mean bandwidth is the same.

So there is a physical WAN link between EACH pair of sites?
Doubtful. (That would be SUM of 1...24 or 300 WAN links.)

My guess is you have some sort of FrameRelay but even then you
probably don't have 300 virtual circuits (enabled.)
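
For what it's worth, the link count above can be checked with a
quick sketch. The function names are made up for illustration, and
it assumes 25 sites in total, as in the original report:

```python
def full_mesh_links(sites: int) -> int:
    """Point-to-point links needed to connect every pair of sites."""
    return sites * (sites - 1) // 2


def hub_and_spoke_links(sites: int) -> int:
    """Links needed when every other site connects only to one hub."""
    return sites - 1


print(full_mesh_links(25))      # 300, the same as the sum of 1..24
print(hub_and_spoke_links(25))  # 24
```

A true full mesh grows with the square of the number of sites,
which is why a physical full mesh of 25 sites is doubtful.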

> Our
> slowest sites(10 of them) connect at 256K, we also have 384K and 512K
> sites.
> All sites need to get to the internet through the main office.

You are far more likely to be experiencing some sort of
problem which the Internet usage is causing.

That doesn't explain the "2 hour duration" but it is far more likely.

Have you even considered monitoring the traffic to see if bandwidth
usage is EVEN AN ISSUE (it probably is but you don't know until
you measure) and WHAT traffic that is? (It might be AD replication
but you don't have any evidence of that YET.)


> Each site has
> a dc that runs ad integrated DNS, maindc provides dns for everyone. Each
> dc
> lists maindc as primary dns and itself as secondary.

How many Users and Computers (roughly)? Hundreds, thousands,
etc?

Are you registering and de-registering thousands of portables
in DNS FREQUENTLY?

> "What do your SITE LINKS look like? (Which Sites and
> Cost, Schedule, Frequency?)"
>
> All sites are setup the same, they all link to maindc, cost = 100,
> replication interval = 180.

Then you aren't even replicating every 2 hours, but RATHER
every THREE HOURS.

> I've never looked at the schedule before until
> just now, but I took a look at the change schedule button of two site
> links
> and noticed that one site has weekdays from 8 AM until 8 PM as replication
> not available.

So it won't replicate during the DAY (on weekdays) and it will
TEND to show a replication SPIKE after 8 PM.

> Not sure why that is setup that way and not sure what issues
> this will cause, but this is a 256K site and a site that has major
> slowness
> during the day.

And so replication is NOT the problem for that daytime
"slowness" since this SiteLink doesn't replicate during the day.
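
A rough way to check a slowness report against that schedule is
sketched below. It is a simplified model that assumes the only
blocked window is weekdays from 8 AM to 8 PM, as described for that
one site link; the function name is invented, and this is not how
AD itself evaluates the schedule:

```python
from datetime import datetime


def replication_allowed(when: datetime) -> bool:
    """Simplified model: replication is blocked on weekdays 08:00-20:00."""
    is_weekday = when.weekday() < 5          # Monday=0 ... Friday=4
    in_blocked_hours = 8 <= when.hour < 20   # 8 AM up to (not including) 8 PM
    return not (is_weekday and in_blocked_hours)


# A Wednesday at 10 AM falls inside the blocked window.
print(replication_allowed(datetime(2024, 1, 3, 10, 0)))  # False
# The same Wednesday at 9 PM is outside it.
print(replication_allowed(datetime(2024, 1, 3, 21, 0)))  # True
```

If the slowness reports for that site fall where this returns
False, scheduled replication over that SiteLink cannot be the cause.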


> "In general, your site links should follow your PHYSICAL
> connections, and use only your "best" physical connections
> OR use costs to PREFER those "best" WANS."
>
> Okay, and that is how it was setup. I wasn't here when it was setup, but I
> did have some issues last year and worked with a tech from Microsoft over
> the
> phone over a period of 3 days to clean everything up, so I did get some
> valuable insight from him as to how things "should" be setup. I am not
> real
> familiar with costs,

Costs are pretty irrelevant with Hub and Spoke since there wouldn't
be "multiple paths" -- you use Costs to get the KCC to prefer one
SiteLink over another when there is more than one choice.

If every site links ONLY to Main then they are going to replicate there
anyway (as long as MainDC is available.)

> my only real exposure to this was one Microsoft webcast
> which briefly touched on the fact that for slower WAN links there are ways
> to
> cut down on replication through costs.

Not really. It doesn't "cut down" but rather PREFERS one WAN
over another when there are multiple choices (for complete
replication.)

When it REPLICATES over a line, you get pretty much the
same traffic no matter the costs (the costs just ENCOURAGE
the KCC to use one SiteLink rather than another.)
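
A toy model of that idea: among the candidate routes, the lowest
total cost wins. This is only a sketch of the principle, NOT the
KCC's actual algorithm, and the route names and cost values are
made up:

```python
from typing import Dict, List


def preferred_route(routes: Dict[str, List[int]]) -> str:
    """Return the name of the route with the lowest total SiteLink cost."""
    return min(routes, key=lambda name: sum(routes[name]))


# Hypothetical example with two possible routes between two branches.
multi_path = {"direct_link": [300], "via_main": [100, 100]}
print(preferred_route(multi_path))     # via_main (200 < 300)

# Hub and spoke: only one route exists, so the cost changes nothing.
hub_and_spoke = {"via_main": [100]}
print(preferred_route(hub_and_spoke))  # via_main
```

Either way the same changes still have to cross whichever line is
chosen, so cost selects a route but does not reduce the traffic.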

> "How many users/computers do you have? How much
> (available) bandwidth?"
>
> Approximately 300-350 users and computers, excluding servers. I do not
> know
> how to calculate available bandwidth.

You cannot "calculate" available bandwidth (in most real world
cases); you must rather MEASURE it.

Use NetMon or another traffic monitor.
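
Once you have measured byte counts, turning them into used and
available bandwidth is simple arithmetic. The sketch below assumes
you have two readings of a byte counter for the WAN link (from a
capture or a router interface counter) taken some seconds apart;
the function and parameter names are invented for illustration:

```python
from typing import Tuple


def link_usage_kbps(bytes_start: int, bytes_end: int,
                    seconds: float, link_kbps: float) -> Tuple[float, float]:
    """Return (used_kbps, available_kbps) for one measurement interval."""
    used_kbps = (bytes_end - bytes_start) * 8 / 1000 / seconds
    available_kbps = max(link_kbps - used_kbps, 0.0)
    return used_kbps, available_kbps


# Made-up reading: 1,500,000 bytes moved in 60 seconds on a 256 kbps line.
used, available = link_usage_kbps(0, 1_500_000, 60, 256)
print(used, available)  # 200.0 56.0
```

Take readings during a "slow" period and during a normal period so
you have something to compare.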

> "If you aren't changing a LOT of users, why is replication
> hurting you?"
>
> if I knew the answer to that question, I wouldn't need to post here.

It's not a replication problem MOST LIKELY.

With 300 users you could replicate the ENTIRE AD and you
probably wouldn't see much of a problem. (300 users x 4000 bytes
is about 1.2 MB, or roughly 12 Mbits with generous rounding, which
would take about 12000 kbits / 250 kbps == 50 SECONDS -- double it
for computers, which is a big overestimate, double it again "just
because", and you still get the WHOLE database in under 5 minutes.)

And of course AD doesn't replicate an entire account (4k),
but rather (some few hundred bytes of) the CHANGED portion.

Only a DC promotion would replicate the entire AD -- and
that should still come from another LOCAL DC unless it
were the first DC in the Site.
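
The back-of-envelope estimate above can be worked through as a
sketch. It assumes roughly 4000 bytes per account (the "4k"
figure), a 250 kbps line as a round number for the 256K sites, and
a generous 10 bits per byte to allow for overhead; the 10 is my
assumption to match the 12 Mbits figure, not a measured value:

```python
def transfer_seconds(objects: int, bytes_per_object: int,
                     link_kbps: float, bits_per_byte: int = 10) -> float:
    """Rough time to push `objects` full-size objects over the link."""
    total_kbits = objects * bytes_per_object * bits_per_byte / 1000
    return total_kbits / link_kbps


users_only = transfer_seconds(300, 4000, 250)
print(users_only)      # 48.0 seconds, i.e. "about 50 SECONDS"
print(users_only * 4)  # 192.0 seconds after doubling twice
```

Even the doubled-twice figure is only a few minutes for the WHOLE
database, and normal replication sends far less than that.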


> Are you using DFS (across the WANS)?
>
> I believe this is true. I believe the sysvol folder is replicated and it
> does contain a huge amount of logon batch files(scripts), which could
> possibly be getting changed some, but I wouldn't think enough to have an
> effect this great, but I am very unfamiliar with dfs.

DFS is MUCH more likely your problem.

Look at this: If you have multi-Megabyte files and a user changes
one that is DFS replicated then you would have VASTLY more
replication from this than from AD.

> Delays when you create new accounts (computers and users), or
> the (increased) need to 'reach across the WAN when resetting
> remote user passwords etc.'.
> (That is, delays in replication will mean that such changes don't
> immediately propagate from wherever they are done to where
> they may be needed.)
>
> I received a book from Microsoft that states something a little contrary
> to
> what you say here. It says that security related items(unlocking locked
> accounts, resetting passwords, etc) are replicated immediately. So,
> doesn't
> that mean that scheduling replication at night would not really affect
> those
> types of things?

Mostly "urgent replication" is for NOTIFICATION (based)
replication which means "Same Site" -- unless you setup
urgent replication between sites (advanced registry change.)

Password changes DO try to replicate to the PDC Emulator
EVEN across sites without regard to schedule but then they
replicate FROM there normally so it isn't domain wide.

Even if all 300 users changed their password EVERY DAY
this would probably go unnoticed.

> "This implies you don't have a strong grasp of SiteLinks and
> replication so first tell us about your Site and ESPECIALLY
> your Subnets, SiteLinks including your Costs, Schedule, and
> Frequency settings."
>
> Honestly, if I had a strong grasp of anything, I wouldn't be here. What
> would you like to know about the subnets? The branch sites use class c
> subnets, everything is a 10.0.(some number that identifies the site, i.e.
> 49,
> 49, etc.).machine ip 255.255.255.0, I've mentioned the other info earlier.

Numbers don't matter (except for examples) -- you gave me
most of what I wanted above (frequency and schedule) and
you said you had a subnet for each Site, but seeing that would
be the only "numbers" that really matter AND ONLY if you
had made some weird mistake in setting up the Site<-->Subnets.

> "IF you have setup your Sites and Site Links correctly then
> most replication issues are DNS based.
>
> IF I have then how about some help with troubleshooting dns issues,
> PLEASE.

YOU IGNORED that from my previous message:

You should also run DCDiag on each DC (and capture the
output to a text file where you search for FAIL, WARN,
and ERROR.)

Correct or report here all problems you find with DCDiag.

That will (largely) TROUBLESHOOT DNS and AD replication.
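
If searching the captured output by hand gets tedious, something
like the sketch below would do it. It assumes you have already
saved the DCDiag output to a text file; the function names and the
file name in the usage note are hypothetical, and it just matches
FAIL, WARN, and ERROR at the start of a word, in any case:

```python
import re
from typing import Iterable, List, Tuple

# Matches FAIL/FAILED, WARN/WARNING, ERROR/ERRORS in any letter case.
PROBLEM = re.compile(r"\b(FAIL|WARN|ERROR)", re.IGNORECASE)


def find_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions a problem."""
    return [(number, line.rstrip())
            for number, line in enumerate(lines, start=1)
            if PROBLEM.search(line)]


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Scan a saved text file of captured output for problem lines."""
    with open(path, errors="replace") as handle:
        return find_problems(handle)
```

For example, scan_file("maindc-dcdiag.txt") would list the line
numbers worth reading first; you still have to read the context
around each hit.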

You probably don't have a DNS issue, and most likely not an
AD replication PERFORMANCE problem -- at least not based
on your reports so far.

If you want to learn the MOST IMPORTANT skill in troubleshooting
it is to be VERY SPECIFIC.

If you don't think you are "getting your question answered" on
this 'issue' then read this again CAREFULLY, and note that
most of the problem in helping you is NOT your "technical
knowledge of AD/Microsoft etc" but the lack of SPECIFICITY
in the report.

And for performance problems that means MEASURE, as
well as INSPECT the traffic (if it's a net issue). (Netmon or
something similar.)

No one can fix, "It's slow...". Not even if they are standing there
next to you.

They would first have to figure out what "It's slow" means.

Then isolate the components to see which one is causing or
can improve the situation.


--
Herb Martin, MCSE, MVP
Accelerated MCSE
http://www.LearnQuick.Com
[phone number on web site]


>> "rodge" <rodge@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
>> news:5FF70C8C-7F3F-48AE-9052-C76F04A13EB4@xxxxxxxxxxxxxxxx
>> > We are using a mix of windows 2000 and 2003 servers for domain
>> > controllers.
>> > We have 25 remote sites on a fully meshed network. I am hearing
>> > complaints
>> > about slowness throughout the day from all remote sites.
>>
>> Reports "about slowness" are seldom a reason to take
>> drastic action, BUT they are a reason for investigation
>> perhaps....
>>
>> > I have been able to
>> > determine that this is being caused by replication.
>>
>> How? When? What are the actual symptoms?
>>
>> What do your sites and SITE LINKS look like?
>>
>>
>> > I believe I have 2
>> > solutions, but want to make sure of any issues before I attempt to
>> > implement.
>> > The way our sites were set up has 25 sites in sites and services. I
>> > believe
>> > it may be more advantageous to combine some of the sites into regions
>> > and
>> > appoint ip bridgehead servers for the new sites(regions).
>>
>> On the face of it, neither of these changes will change the
>> AMOUNT of replication to each site. Certainly not "picking"
>> the bridgehead server which will just make the replication
>> less reliable, although it MIGHT reduce the load on THAT
>> bridgehead DC (but this only makes sense if you are doing a
>> LOT of replication.)
>>
>> > This should cut down on a great deal of traffic.
>>
>> Why and how do you think this will happen?
>>
>> > Currently all sites do not have equal
>> > bandwidths, but by year end, all sites will have 512 connectivity to
>> > the
>> > main
>> > office. Does this sound like a valid solution?
>>
>> No, since you didn't describe the network except to say "full mesh"
>> which IMPLIES that all sites have the same bandwidth to each
>> other but doesn't really state that.
>>
>> What do your SITE LINKS look like? (Which Sites and
>> Cost, Schedule, Frequency?)
>>
>> In general, your site links should follow your PHYSICAL
>> connections, and use only your "best" physical connections
>> OR use costs to PREFER those "best" WANS.
>>
>> How many users/computers do you have? How much
>> (available) bandwidth?
>>
>> If you aren't changing a LOT of users, why is replication
>> hurting you?
>>
>> Are you using DFS (across the WANS)?
>>
>> > Also, I noticed in reading
>> > that it is common to have scheduled replication during non-peak hours.
>>
>> THAT might make more sense if you truly have a replication
>> problem...
>>
>> > We
>> > have very little activity at night, so I was wondering what sort of
>> > issues
>> > I
>> > would run into if I scheduled replication to occur only at night?
>>
>> Delays when you create new accounts (computers and users), or
>> the (increased) need to 'reach across the WAN when resetting
>> remote user passwords etc.'.
>>
>> (That is, delays in replication will mean that such changes don't
>> immediately propagate from wherever they are done to where
>> they may be needed.)
>>
>> > I am also not sure how to go about this.
>>
>> This implies you don't have a strong grasp of SiteLinks and
>> replication so first tell us about your Site and ESPECIALLY
>> your Subnets, SiteLinks including your Costs, Schedule, and
>> Frequency settings.
>>
>> You should also run DCDiag on each DC (and capture the
>> output to a text file where you search for FAIL, WARN,
>> and ERROR.)
>>
>> Correct or report here all problems you find with DCDiag.
>>
>> IF you have setup your Sites and Site Links correctly then
>> most replication issues are DNS based.
>>
>> --
>> Herb Martin, MCSE, MVP
>> Accelerated MCSE
>> http://www.LearnQuick.Com
>> [phone number on web site]
>>
>>
>>

