Re: Deadlock between Distribution Agent and Distribution Agent Cle
- From: "Patrick Ikhifa" <patisi@xxxxxxxxxxx>
- Date: Sun, 1 Jan 2006 22:27:03 -0800
Your last comment maybe the cue. If indeed it is stored procedure
replication, then you have no way to control the impact the execution of a
stored procedure at the subscriber will have. Especially if the proc touches
on many tables. Where as the sequence is garanteed at the publisher it may
not be at the subcriber. If as you say you do not have much control over the
application because it is a third party application, then you maybe stuck.
However, what is not clear is whether the subscriber is also involved in the
third party application. If it is not, I would look at transactional, one
way, replication to the subscriber and the possibility of moving the
sbscription DB to another server if that is an option as well. HTH.
"Andrew Pike" <AndrewPike@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:386390FB-46C9-4C93-BC8C-24DC56F13434@xxxxxxxxxxxxxxxx
> I'm back in the office on Wednesday; I'll try to run a trace then.
>
> However, returning to the original SQL involved in the deadlock as given
> below:
>
> select @max_xact_seqno = max(xact_seqno) from MSrepl_commands
> (READPAST)
> where
> publisher_database_id = @publisher_database_id and
> command_id = 1 and
> type <> -2147483611
>
> Assuming this is executed at the default isolation level of READ COMMITTED
> (I don't have access to a server at the moment to check), the shared locks
it
> obtains will be released at the end of its execution. Therefore why would
> long running replicated transactions/stored procedure executions have
> an impact? This part of the stored procedure sp_MSget_repl_commands is
> near to the top of the code and is used to find the most recent command
> written by the Log Reader for replication to subscribers. This is then
used
> as an upper bound for further queries.
>
> Kind Regards
> Andrew Pike
> --
> SQL Server DBA
> Accenture UK
>
>
>
> "Hilary Cotter" wrote:
>
> > The complexity of the procs on both sides (they can be different, they
just
> > have to have the same name) and the indexes in place on both sides will
have
> > an impact on performance, and hence the amount of time it takes to apply
the
> > transactions/commands/execute the stored procedures.
> >
> > You might want to run profiler and watch for recompiles, duration, CPU
and
> > IO.
> >
> > --
> > Hilary Cotter
> > Looking for a SQL Server replication book?
> > http://www.nwsu.com/0974973602.html
> >
> > Looking for a FAQ on Indexing Services/SQL FTS
> > http://www.indexserverfaq.com
> >
> > "Andrew Pike" <AndrewPike@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> > news:91AC92FD-A6BA-4E61-9A72-7E82F83AAEA4@xxxxxxxxxxxxxxxx
> > > Our current replication topology involves a single subscriber - in
fact
> > > the
> > > subscription database is located on the same instance as the
publication
> > > database.
> > > All of the articles contained within the publication are stored
procedure
> > > executions.
> > > This is a component of a 3rd party application that I cannot alter,
> > > although
> > > I may be
> > > able to suggest improvements to the vendor.
> > >
> > > Kind Regards
> > > Andrew Pike
> > > --
> > > SQL Server DBA
> > > Accenture UK
> > >
> > >
> > >
> > > "Hilary Cotter" wrote:
> > >
> > >> The commands are shared by multiple subscribers. So when the commands
are
> > >> distributed to all subscribers they are marked for deletion. For a
single
> > >> subscriber your conclusions are accurate.
> > >>
> > >> Note that the delivered commands in distribution db can vary widely
as
> > >> some
> > >> transactions affect more rows than others. Then some transactions
take
> > >> longer to be applied on the subscriber, especially text ones. You
might
> > >> want
> > >> to look at replicating the execution of stored procedures. If a large
> > >> part
> > >> of your DML affects a large number of rows (in other words a single
proc
> > >> will affect more than a singleton) you can get substantial
performance
> > >> improvements by doing so.
> > >>
> > >> --
> > >> Hilary Cotter
> > >> Looking for a SQL Server replication book?
> > >> http://www.nwsu.com/0974973602.html
> > >>
> > >> Looking for a FAQ on Indexing Services/SQL FTS
> > >> http://www.indexserverfaq.com
> > >>
> > >> "Andrew Pike" <AndrewPike@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in message
> > >> news:7FE114CB-2234-4053-B5D1-CBD47EBF0484@xxxxxxxxxxxxxxxx
> > >> > Thanks for the speedy reply Hilary,
> > >> >
> > >> > The subscription in question is a named push subscription -
anonymous
> > >> > pull
> > >> > subscriptions are not enabled.
> > >> >
> > >> > I ran SELECT * FROM distribution.dbo.MSdistribution_status as
> > >> > requested.
> > >> > There were two rows in the result set, both of which had a value of
0
> > >> > for
> > >> > UndelivCmdsInDistDB, while the values of DelivCmdsInDistDB were 2
and
> > >> > 7.
> > >> > A
> > >> > couple of minutes later this query returned no rows - the Cleanup
job
> > >> > had
> > >> > run
> > >> > I presume. Would I be correct in stating that the sum of the
> > >> > DelivCmdsInDistDB column will correlate with the number of commands
> > >> > purged
> > >> > by
> > >> > the Cleanup job?
> > >> > The job history of the last execution reads:
> > >> >
> > >> > Removed 9 replicated transactions consisting of 9 statements in 0
> > >> > seconds
> > >> > (0
> > >> > rows/sec)
> > >> >
> > >> > It certainly looks that way.
> > >> > These don't look to be particularly high numbers to me, although it
is
> > >> > just
> > >> > past midnight here! The deadlock does occur at many different
times
> > >> > though.
> > >> > The job history that relates to the last occurrence of the deadlock
is
> > >> > as
> > >> > follows:
> > >> >
> > >> > Removed 31 replicated transactions consisting of 9669 statements in
2
> > >> > seconds (4850 rows/sec)
> > >> >
> > >> > You're right about the READPAST hint, it's there to prevent
blocking
> > >> > caused
> > >> > by the Log Reader Agent. I suppose the PAGLOCK hint has the
> > >> > alternative
> > >> > of
> > >> > ROWLOCK, although as you suggest Microsoft have almost certainly
tested
> > >> > this
> > >> > to the nth degree. ROWLOCK would undoubtedly incur much greater
lock
> > >> > resources given very large volumes of replicated commands.
> > >> >
> > >> > Kind Regards,
> > >> > Andrew Pike
> > >> > --
> > >> > SQL Server DBA
> > >> > Accenture UK
> > >> >
> > >> >
> > >> >
> > >> > "Hilary Cotter" wrote:
> > >> >
> > >> >> Do you have anonymous subscribers or named. With named subscribers
the
> > >> >> distribution clean up agent cleans up more aggressively and you
may
> > >> >> see
> > >> >> problems like this when a subscriber has been offline for some
time.
> > >> >>
> > >> >> First off issue a select * from
distribution.dbo.MSdistribution_status
> > >> >> to
> > >> >> see how many undelivered vs delivered commands there are. If there
are
> > >> >> a
> > >> >> high number of delivered commands, I would stop the SQL Server
Agent
> > >> >> and
> > >> >> run
> > >> >> the distribution clean up agent manually.
> > >> >>
> > >> >> I can't comment on why the decision was made to implement the two
> > >> >> types
> > >> >> of
> > >> >> locks, but in general MS has done a lot of research to deliver
optimal
> > >> >> performance. For example the 27 in sp_MSadd_repl_commands27 comes
> > >> >> from
> > >> >> tests that they did to find the optimal number of commands to send
to
> > >> >> the
> > >> >> distribution database in a batch from the log reader agent. And
yes,
> > >> >> they
> > >> >> tested a range of commands to find which offered best performance.
> > >> >>
> > >> >> It looks like the readpast is to prevent locking, and the page
lock is
> > >> >> to
> > >> >> prevent a table lock.
> > >> >>
> > >> >> --
> > >> >> Hilary Cotter
> > >> >> Looking for a SQL Server replication book?
> > >> >> http://www.nwsu.com/0974973602.html
> > >> >>
> > >> >> Looking for a FAQ on Indexing Services/SQL FTS
> > >> >> http://www.indexserverfaq.com
> > >> >>
> > >> >> "Andrew Pike" <AndrewPike@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote in
message
> > >> >> news:CFB322D7-1D17-4064-AB36-6338A49C90A4@xxxxxxxxxxxxxxxx
> > >> >> > This is occurring regularly on SQL Server 2000 build 878.
> > >> >> >
> > >> >> > The problem is a deadlock in the Distribution database. The
> > >> >> > Distribution
> > >> >> > Agent spid is executing the SELECT statement below:
> > >> >> >
> > >> >> > select @max_xact_seqno = max(xact_seqno) from MSrepl_commands
> > >> >> > (READPAST)
> > >> >> > where
> > >> >> > publisher_database_id = @publisher_database_id and
> > >> >> > command_id = 1 and
> > >> >> > type <> -2147483611
> > >> >> >
> > >> >> > which is found in sp_MSget_repl_commands. It holds an Intent
Shared
> > >> >> > page
> > >> >> > lock on a data page in the MSrepl_commands table.
> > >> >> >
> > >> >> > The Distribution Agent Cleanup spid is found to be running the
> > >> >> > command
> > >> >> > below:
> > >> >> >
> > >> >> > DELETE MSrepl_commands WITH (PAGLOCK) where
> > >> >> > publisher_database_id = @publisher_database_id and
> > >> >> > xact_seqno <= @max_xact_seqno
> > >> >> >
> > >> >> > located in the stored procedure sp_MSdelete_publisherdb_trans.
This
> > >> >> > spid
> > >> >> > holds an exclusive page lock on another data page in
> > >> >> > MSrepl_commands.
> > >> >> >
> > >> >> > Both spids then attempt to obtain the same lock type on the page
> > >> >> > which
> > >> >> > is
> > >> >> > locked by the other.
> > >> >> >
> > >> >> > The Distribution Agent runs continuously and the Cleanup job is
> > >> >> > scheduled
> > >> >> > for every 10 minutes. The Publication, Distribution and
> > >> >> > Subscription
> > >> >> > databases are all on the same instance (3rd party vendor
solution,
> > >> >> > not
> > >> >> > mine!)
> > >> >> > in an active/active Win2003 cluster configuration. The articles
are
> > >> >> > all
> > >> >> > stored procedure executions.
> > >> >> >
> > >> >> > Has anybody else seen this deadlock? Is it just a timing issue?
> > >> >> > Why
> > >> >> > is
> > >> >> > the
> > >> >> > PAGLOCK hint used in sp_MSdelete_publisherdb_trans as above?
> > >> >> > (I can't find any articles which correlate exactly to this
problem)
> > >> >> >
> > >> >> > Kind Regards
> > >> >> > Andrew Pike
> > >> >> > --
> > >> >> > SQL Server DBA
> > >> >> > Accenture UK
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >>
> > >>
> > >>
> >
> >
> >
.
- Follow-Ups:
- Re: Deadlock between Distribution Agent and Distribution Agent Cle
- From: Andrew Pike
- Re: Deadlock between Distribution Agent and Distribution Agent Cle
- References:
- Re: Deadlock between Distribution Agent and Distribution Agent Cle
- From: Andrew Pike
- Re: Deadlock between Distribution Agent and Distribution Agent Cle
- Prev by Date: Re: SQL Server 2005 Sep CTP Web Synchronization Problem
- Next by Date: Re: Transaction replication question
- Previous by thread: Re: Deadlock between Distribution Agent and Distribution Agent Cle
- Next by thread: Re: Deadlock between Distribution Agent and Distribution Agent Cle
- Index(es):
Relevant Pages
|