Re: Jobs don't run and are stuck with request pending

From: Scott (Scott_at_discussions.microsoft.com)
Date: 11/15/04


Date: Mon, 15 Nov 2004 05:54:06 -0800

How specifically would you like us to capture the trace?

"Sue Hoegemeier" wrote:

> There are fixes in MS03-031 as well as the security patches.
> It's a good idea to install it even though I doubt it has
> anything to do with your issues.
> All the threading looks fine. The jobs are in the job cache
> and are fine. It's just something at the end of running the
> jobs that hangs up.
> I'd still suspect some contention somewhere. Or something is
> erroring out. It will likely be a pain with all the jobs you
> have running but you probably need to run a trace on the
> jobs after getting everything running again - even if it's
> for just one run on some of the jobs. You'll need to use
> that to track down where the issues are. You could try
> catching whatever is going on or additional info by querying
> sysprocesses as well while you run a trace. Watch for wait
> times and blocking in sysprocesses. Watch for errors and
> capture both starting and completing statements in the trace
> so you can see if something times out or never completes.
>
> -Sue
>
> On Thu, 11 Nov 2004 14:06:50 -0800, "Scott"
> <Scott@discussions.microsoft.com> wrote:
>
> >1) Yes we are on SP3a for 2000 Enterprise.
> >
> >2) No we didn't have the patch. We will install tonight but it just looks
> >like a security patch and does not look like it is related to the problem. ???
> >
> >3) Here is the output for the dbcc command when the jobs are "stalled":
> >
> >
> >Scheduler ID 0
> > num users 6
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> > cntxt switches 3975785
> > cntxt switches(idle) 9990418
> >Scheduler ID 1
> > num users 6
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> > cntxt switches 1.01E+07
> > cntxt switches(idle) 2.31E+07
> >Scheduler ID 2
> > num users 7
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> > cntxt switches 7623701
> > cntxt switches(idle) 1.17E+07
> >Scheduler ID 3
> > num users 6
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> > cntxt switches 5559495
> > cntxt switches(idle) 1.04E+07
> >Scheduler ID 4
> > num users 5
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> > cntxt switches 1.28E+07
> > cntxt switches(idle) 1.75E+07
> >Scheduler ID 5
> > num users 5
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> > cntxt switches 1.34E+07
> > cntxt switches(idle) 1.52E+07
> >Scheduler ID 6
> > num users 5
> > num runnable 1
> > num workers 11
> > idle workers 9
> > work queued 0
> > cntxt switches 1.46E+07
> > cntxt switches(idle) 1.98E+07
> >Scheduler ID 7
> > num users 4
> > num runnable 0
> > num workers 11
> > idle workers 10
> > work queued 0
> > cntxt switches 2.49E+07
> > cntxt switches(idle) 1.77E+07
> >Scheduler Switches 0
> >Total Work 1.34E+07
> >
> >
> >It looks okay to me. What is your interpretation?
> >Is there anything else we can check?
> >
> >
> >
> >"Sue Hoegemeier" wrote:
> >
> >> It looks pretty much like you are hitting a max number of jobs, but
> >> you aren't getting any errors on running out of threads on
> >> the subsystem or running out of worker threads on the
> >> server. Are you on all the latest service packs? Did you
> >> apply MS03-031?
> >> Did you check worker threads on the server itself? You can
> >> use dbcc sqlperf(umsstats)
> >>
> >> -Sue
> >>
> >> On Thu, 11 Nov 2004 12:24:05 -0800, "Scott"
> >> <Scott@discussions.microsoft.com> wrote:
> >>
> >> >They all got set to a date in the future.
> >> >Of the 5 jobs 1 ran and 4 never ran.
> >> >And the one that started running made another of the jobs (that were
> >> >running) stop.
> >> >
> >> >
> >> >
> >> >"Sue Hoegemeier" wrote:
> >> >
> >> >> On the jobs that are no longer running, what happens if you
> >> >> update the schedule with:
> >> >> sp_update_jobschedule
> >> >> @job_name = 'YourJob',
> >> >> @name = 'ScheduleNameforJob'
> >> >> Does the next run time get set to a time in the future vs.
> >> >> in the past? And does the job then run on the next run time?
> >> >>
> >> >> -Sue
> >> >>
> >> >> On Thu, 11 Nov 2004 08:41:01 -0800, "Scott"
> >> >> <Scott@discussions.microsoft.com> wrote:
> >> >>
> >> >> >1) Event viewer was okay. log files were not full.
> >> >> >
> >> >> >2) Jobs are NOT set to run on idle CPU
> >> >> >
> >> >> >3) Everything about the job is enabled. If we disable other jobs the ones
> >> >> >that were not running start to run.
> >> >> >
> >> >> >4) There were no errors in the Agent log file.
> >> >> >
> >> >> >5) We turned on verbose option and saw that the jobs requested to run:
> >> >> >
> >> >> >Job Application ID: XXX Report Generation has been requested to run by
> >> >> >Schedule XXX (Run Every Minute)
> >> >> >
> >> >> >But did not see the "being logged to sysjobhistory" message when it
> >> >> >should have completed running.
> >> >> >
> >> >> >Note that we turned on the verbose option, then turned on the jobs. For each job
> >> >> >the request to run message happens once and never again.
> >> >> >
> >> >> >
> >> >> >
> >> >> >"Sue Hoegemeier" wrote:
> >> >> >
> >> >> >> When you say "The jobs that don't run have a next scheduled
> >> >> >> run time as the time they were supposed to run" do you mean
> >> >> >> dates and times in the past?
> >> >> >> A few other odd things to check: Make sure the Windows event
> >> >> >> logs aren't filled up and no longer logging. Make sure the
> >> >> >> jobs aren't set to run only on idle cpu conditions. Make
> >> >> >> sure everything in the job is enabled - job, schedules.
> >> >> >> For other issues you would typically find something in the
> >> >> >> SQL Agent log (SQLAgent.out file). You may want to turn on
> >> >> >> more verbose logging in Agent - from properties select
> >> >> >> Include execution trace messages. You only want to keep that
> >> >> >> on for troubleshooting though, especially if you have a lot
> >> >> >> of jobs on the box.
> >> >> >>
> >> >> >> -Sue
> >> >> >>
> >> >> >> On Wed, 10 Nov 2004 10:59:05 -0800, "Scott"
> >> >> >> <Scott@discussions.microsoft.com> wrote:
> >> >> >>
> >> >> >> >Okay I ran the sp_help_job several times
> >> >> >> >All the jobs are 1's and 4's. The ones that don't run are 4's which seems
> >> >> >> >odd to me considering that if you try to start it manually it says there is
> >> >> >> >already a request.
> >> >> >> >
> >> >> >> >
> >> >> >> >The jobs that don't run have a next scheduled run time as the time they were
> >> >> >> >supposed to run
> >> >> >> >
> >> >> >> >
> >> >> >> >"Sue Hoegemeier" wrote:
> >> >> >> >
> >> >> >> >> The T-SQL job subsystem defaults to 20. I doubt that is the
> >> >> >> >> issue as well - I'm pretty sure something gets logged in the
> >> >> >> >> SQL Agent log when you run out of job worker threads.
> >> >> >> >> You need to check the execution status of the jobs using
> >> >> >> >> sp_help_job. If there are execution statuses of 2, then it's
> >> >> >> >> related to worker threads. If there are execution statuses
> >> >> >> >> of 7, then the jobs are getting hung up performing something
> >> >> >> >> - sometimes it's on emailing results and mail gets hung up.
> >> >> >> >> There can be other reasons as well. The other thing to check
> >> >> >> >> would be the next scheduled run time. If the scheduler gets
> >> >> >> >> confused, this would be listed as not available or something
> >> >> >> >> like that. Can't remember the exact wording.
> >> >> >> >>
> >> >> >> >> -Sue
> >> >> >> >>
> >> >> >> >> On Wed, 10 Nov 2004 08:02:05 -0800, "Scott"
> >> >> >> >> <Scott@discussions.microsoft.com> wrote:
> >> >> >> >>
> >> >> >> >> >I doubt that it is the TSQL running out of worker threads since it is currently
> >> >> >> >> >set at 200 and there are no more than 100 jobs running at any one time. But I
> >> >> >> >> >will verify.
> >> >> >> >> >I believe the default was 20 or 25 before I added the registry entry and
> >> >> >> >> >increased it to 200.
> >> >> >> >> >
> >> >> >> >> >The job history does not show anything out of the ordinary. If the job was
> >> >> >> >> >running and then stopped running it just shows the times that it ran. If the
> >> >> >> >> >job is created and does not run the first time then there is no job history to
> >> >> >> >> >look at.
> >> >> >> >> >
> >> >> >> >> >There is no blocking or locks. Another thing that I tried is changing the
> >> >> >> >> >transaction isolation level on the sp running to do dirty reads and this did
> >> >> >> >> >not help either.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >Thanks again Sue for your suggestions.
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >"Sue Hoegemeier" wrote:
> >> >> >> >> >
> >> >> >> >> >> If it's actually the subsystem running out of worker
> >> >> >> >> >> threads, you could verify this by executing:
> >> >> >> >> >> sp_help_job @execution_status = 2
> >> >> >> >> >> If it is a worker thread issue, you have to increase the
> >> >> >> >> >> value of threads for the appropriate job subsystem depending
> >> >> >> >> >> on the jobs steps and what subsystem is used.
> >> >> >> >> >> If it's not related to threads, did you check the job
> >> >> >> >> >> history and check for any blocking?
> >> >> >> >> >>
> >> >> >> >> >> -Sue
> >> >> >> >> >>
> >> >> >> >> >> On Wed, 10 Nov 2004 07:13:05 -0800, "Scott"
> >> >> >> >> >> <Scott@discussions.microsoft.com> wrote:
> >> >> >> >> >>
> >> >> >> >> >> >1) Yes I rebooted the system to make sure after adding the registry entry
> >> >> >> >> >> >for the worker threads
> >> >> >> >> >> >
> >> >> >> >> >> >2) Verified that the worker threads were set in SQL system after the reboot.
> >> >> >> >> >> >
> >> >> >> >> >> >3) I don't know if that subsystem is where the problem is. It is just one
> >> >> >> >> >> >thing that I tried.
> >> >> >> >> >> >
> >> >> >> >> >> >4) Nothing in the agent log file.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> >"Sue Hoegemeier" wrote:
> >> >> >> >> >> >
> >> >> >> >> >> >> Did you restart SQL Agent after you created the registry key
> >> >> >> >> >> >> and changed the max worker threads in the registry for TSQL?
> >> >> >> >> >> >> Did you verify the max worker thread setting with
> >> >> >> >> >> >> sp_enum_sqlagent_subsystems? Is that the job subsystem that
> >> >> >> >> >> >> is being maxed out on threads?
> >> >> >> >> >> >> Anything in the SQL Agent log?
> >> >> >> >> >> >>
> >> >> >> >> >> >> -Sue
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Wed, 10 Nov 2004 06:09:08 -0800, "Scott"
> >> >> >> >> >> >> <Scott@discussions.microsoft.com> wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> >Hi,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >I have a strange SQL Server 2K problem that I am looking for suggestions on
> >> >> >> >> >> >> >how to resolve it.
> >> >> >> >> >> >> >In the SQL job scheduler I have 178 jobs. Of these around 80 of them are
> >> >> >> >> >> >> >scheduled to run every minute. We started to experience a problem where
> >> >> >> >> >> >> >some of these jobs do not run (about 4 jobs). They get stuck in a pending
> >> >> >> >> >> >> >request status.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >System is an 8 CPU Win2K with SQL Server 2K SP3a Enterprise
> >> >> >> >> >> >> >CPU utilization never goes over 36% and memory (4GB) and disk I/O all look
> >> >> >> >> >> >> >fine.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >What I have tried:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >1) I have increased the worker threads to 100 and 200 with no apparent
> >> >> >> >> >> >> >difference.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >2) Deleting the job and re-creating it is hit or miss. If the job does start
> >> >> >> >> >> >> >running after I have recreated it then another one of the 80 jobs stops. If
> >> >> >> >> >> >> >it doesn’t start running it goes into a pending state and trying to start the
> >> >> >> >> >> >> >job returns the following error:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >Error 22020: SQL Server Agent Error: Request to run job XXX refused because
> >> >> >> >> >> >> >the job already has a pending request from Schedule XXX.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >3) I created a very simple job “select getdate()” and that is hit or miss
> >> >> >> >> >> >> >too. If it works then one of the other jobs stops running. If it does not run
> >> >> >> >> >> >> >it just stays in the pending state and attempts to start it return an error
> >> >> >> >> >> >> >like the one described in #2.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >4) When the job is stuck in pending I can run the scheduled job(s) in the
> >> >> >> >> >> >> >Query Analyzer without any problems. (These jobs that run every minute only
> >> >> >> >> >> >> >take 1-5 seconds to run each)
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >5) I can disable a couple of jobs and then the pending ones start to run.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >It seems to me with what I have tried that these problems point to some kind
> >> >> >> >> >> >> >of threshold that I am hitting.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >Does anybody know what problem we are running into?
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >Thanks,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> >Scott
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >>
> >>
>
>
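
For anyone reading the DBCC SQLPERF(UMSSTATS) output quoted above, the reasoning behind "all the threading looks fine" can be made explicit by totalling the per-scheduler counters. Across the eight schedulers posted, that comes to 94 workers, 85 of them idle, and 0 work queued, so the server itself does not appear to be short of worker threads. The Python sketch below is only an illustration and is not part of the original thread: the function names parse_umsstats and summarize are made up for this example, and it assumes the output has been pasted as plain text in the "name value" layout shown in the post. Only schedulers 0 and 6 are included in the sample to keep it short.

```python
# Sample copied from the DBCC SQLPERF(UMSSTATS) output quoted in the thread
# (schedulers 0 and 6 only; paste the full output to total all schedulers).
SAMPLE = """\
> >Scheduler ID 0
> > num users 6
> > num runnable 0
> > num workers 12
> > idle workers 11
> > work queued 0
> >Scheduler ID 6
> > num users 5
> > num runnable 1
> > num workers 11
> > idle workers 9
> > work queued 0
"""


def parse_umsstats(text):
    """Return a list of per-scheduler dicts keyed by statistic name.

    Hypothetical helper: assumes each line is "<statistic name> <value>",
    optionally prefixed with mail quote markers ('>').
    """
    schedulers = []
    for raw in text.splitlines():
        line = raw.lstrip("> \t").rstrip()  # drop quote markers and indent
        if not line:
            continue
        name, _, value = line.rpartition(" ")
        name = name.strip()
        if not name:
            continue  # a single token with no value; ignore it
        if name == "Scheduler ID":
            schedulers.append({"id": int(value)})
        elif schedulers and not name.startswith(("Scheduler", "Total")):
            # Per-scheduler counter; values such as 1.01E+07 parse as floats.
            schedulers[-1][name] = float(value)
    return schedulers


def summarize(schedulers):
    """Total the counters that matter for spotting worker-thread starvation."""
    def total(key):
        return int(sum(s.get(key, 0) for s in schedulers))

    return {
        "workers": total("num workers"),
        "idle": total("idle workers"),
        "queued": total("work queued"),
        "runnable": total("num runnable"),
    }


summary = summarize(parse_umsstats(SAMPLE))
print(summary)  # {'workers': 23, 'idle': 20, 'queued': 0, 'runnable': 1}
```

A "queued" total that stays above zero while "idle" is at or near zero would suggest the server is running out of worker threads. In the output posted here the queue is empty and most workers are idle, which is consistent with looking elsewhere (blocking, waits, or errors at the end of the job run) for the cause.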


