Re: DMX Query response time

Tech Tip: Click here to run a free scan for Windows Errors and optimize PC performance

From: Poch Reyes (Reyes_at_discussions.microsoft.com)
Date: 09/16/04


Date: Wed, 15 Sep 2004 22:47:05 -0700

Hi Peter,

Given this limitation of SQL Server data mining, can you suggest ways to
improve query performance? Do we upgrade our server processor/RAM? Do we
split processing of requests to different machines with their own copies of
trained models - that way we balance request processing to multiple machines?

Thanks,

Poch

"Peter Kim [MS]" wrote:

> Yes, but mostly training time rather than query performance. Setting
> MINIMUM_LEAF_CASES to lower and/or COMPLEXITY_PENALTY to a lower number
> effectively makes the tree learn more details (i.e., more splits) which
> causes training time longer. The prediction is obtained by traversing the
> tree. As the tree gets longer, the traversal time gets longer. However, the
> impact for the prediction should be ignorable compared to training.
>
> --
> Peter Kim
> This posting is provided "AS IS" with no warranties, and confers no rights.
>
> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
> news:%23c8Gv6wmEHA.632@TK2MSFTNGP12.phx.gbl...
> > Hi Peter,
> >
> > Generally, would playing around with the setting of MINIMUM_LEAF_CASES
> > and/or COMPLEXITY_PENALTY improve/degrade DMX query performance? Model
> > Tranining time?
> >
> > Thanks?
> >
> > Poch
> >
> > "Peter Kim [MS]" <peterkim@online.microsoft.com> wrote in message
> > news:eKOmr85iEHA.1048@tk2msftngp13.phx.gbl...
> >> We have not published any paper on the optimizations, but I'll try to
> >> answer your questions here.
> >> In SQL Server 2000, the patterns (contents) found during training in the
> >> server are remoted to the client and cached into client memory until it's
> >> not used any longer. So, the prediction happens in the client; in your
> >> case, inside the SQL server since you're invoking the Analysis Services
> >> client component (MSOLAP) inside the SQL Server engine. High disk
> >> activity may be introduced by the SQL server to deal with temp result
> >> between MSOLAP and its query processor (it's just my guess), but DM
> >> prediction wouldn't cause any disk IO since all prediction happens in
> >> memory and your prediction input is on-the-fly using SELECT.
> >>
> >> Yes, 64bit SQL Server 2000 uses the same code path in 32bit. So, the
> >> limitation that I mentioned below remains the same for both 32bit and
> >> 64bit.
> >>
> >> Yes, SQL Server 2005 Beta2 include all algorithms in SQL 2000 and more.
> >>
> >> --
> >> Peter Kim
> >> This posting is provided "AS IS" with no warranties, and confers no
> >> rights.
> >>
> >> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
> >> news:uOjq4TxiEHA.2992@TK2MSFTNGP12.phx.gbl...
> >>> Hi Peter,
> >>>
> >>> In terms of implementation, are traned models stored in memory? I'm just
> >>> wondering because whenever we the query below we notice high disk
> >>> activities, CPU utilization is about 50%. What's the best step to take
> >>> to improve the query performance?
> >>>
> >>> The limitation you mentioned below, will this still be the case if we go
> >>> 64-bit SQL Server 2000?
> >>>
> >>> Worse case, to meet or come even close to our target we may need to run
> >>> Yukon Beta 2 mining - are all the alogorithms available in Beta 2?
> >>>
> >>> Poch
> >>>
> >>> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
> >>> news:uS5yy9liEHA.1656@TK2MSFTNGP09.phx.gbl...
> >>>> Hi Peter,
> >>>>
> >>>> Do you have data mining optimization papers you can share with us? We
> >>>> are planning to setup a SQL Server 2005 beta box to check out the
> >>>> difference in query/processing times...
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Poch
> >>>>
> >>>> "Peter Kim [MS]" <peterkim@online.microsoft.com> wrote in message
> >>>> news:%23C6bNScbEHA.2520@TK2MSFTNGP12.phx.gbl...
> >>>>>I see two potential sources of slow response time, but unfortunately
> >>>>>neither of issues is addressed fully until SQL 2005 beta2 (which will
> >>>>>be available shortly).
> >>>>>
> >>>>> 1. I guess you had to use OPENROWSET() from SQL to MSOLAP in order to
> >>>>> make your procedure reusable in the server as SPROC that accepts
> >>>>> parameters. This would introduce another expensive layer of rowset
> >>>>> traffic between the SQL engine and the provider. In SQL Server 2005,
> >>>>> DMX does support parameterized query and SPROC as well. But, no such
> >>>>> feature was available in SQL Server 2000. The only option may be to
> >>>>> move the SPROC logic to middle-tier or client app so that you may be
> >>>>> able to use MSOLAP directly.
> >>>>>
> >>>>> 2. In SQL Server 2000, the optimization for TopCount() in DMX was
> >>>>> limited. It can be very slow when you have lots of distinct keys in
> >>>>> the nested table (distinct Service in your app). You could try run the
> >>>>> DMX query directly against MSOLAP to figure out what the pure overhead
> >>>>> of DMX is. In SQL Server 2005 beta2, a better optmization has been
> >>>>> applied and the performance has improved in order of magnitude in this
> >>>>> type of query.
> >>>>>
> >>>>> --
> >>>>> Peter Kim
> >>>>> This posting is provided "AS IS" with no warranties, and confers no
> >>>>> rights.
> >>>>>
> >>>>> "Noldz" <enoldz@hotmail.com> wrote in message
> >>>>> news:2f4bb01c46d37$362f87d0$a501280a@phx.gbl...
> >>>>>> ALTER PROC sp_CrossSell @MobileNo Varchar(15),
> >>>>>> @Service Varchar(100) = ''
> >>>>>> AS
> >>>>>> DECLARE @Services AS VARCHAR(8000)
> >>>>>> DECLARE @DMX NVARCHAR(4000)
> >>>>>>
> >>>>>> EXECUTE dbo.sp_GetTexterLastServiceV2 @MobileNo,
> >>>>>> @Service ,@OutServices = @Services OUTPUT
> >>>>>>
> >>>>>> --//THIS WAS DONE IN THIS MANNER BECAUSE
> >>>>>> --//OPENROWSET Function does not accept variables for its
> >>>>>> arguments.
> >>>>>> SET @DMX = 'SELECT A.* FROM
> >>>>>> OPENROWSET(
> >>>>>> ''MSOLAP'',
> >>>>>> ''Provider=MSOLAP;Data Source=Proj88A;Mining
> >>>>>> Location=D:\proj88a\msanalysissrvcs\data\Proj88_DM'',
> >>>>>> ''SELECT FLATTENED
> >>>>>> TOPCOUNT(
> >>>>>> (SELECT Service, $Support as
> >>>>>> [Support] ,100 * $probability as [Confidence], 100 *
> >>>>>> $adjustedProbability as Lift
> >>>>>> FROM PREDICT([KeyHits],
> >>>>>> INCLUDE_STATISTICS, EXCLUSIVE)
> >>>>>> ), [Lift], 5)
> >>>>>> FROM [Cross Selling Service] PREDICTION
> >>>>>> JOIN
> >>>>>> (SELECT (@Services) AS ServiceList) AS
> >>>>>> Input
> >>>>>> ON [Cross Selling Service].[KeyHits].
> >>>>>> [Service] = Input.ServiceList.[Service]'') AS A'
> >>>>>>
> >>>>>> --print @service
> >>>>>> SET @DMX = REPLACE(@DMX, '@Services' , @Services)
> >>>>>>
> >>>>>> --PRINT @DMX
> >>>>>> EXECUTE SP_EXECUTESQL @DMX
> >>>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>Can you send a sample query?
> >>>>>>>
> >>>>>>>--
> >>>>>>>
> >>>>>>>-Jamie MacLennan
> >>>>>>>SQL Server Data Mining
> >>>>>>>This posting is provided "AS IS" with no warranties, and
> >>>>>> confers no rights.
> >>>>>>>"Noldz" <enoldz@hotmail.com> wrote in message
> >>>>>>>news:2dc9001c46a2f$df9a3910$a301280a@phx.gbl...
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Is there a way to make a DMX Query response time
> >>>>>> faster?
> >>>>>>>> Currently our query response time is 0.6 sec/hit. Our
> >>>>>>>> requirement is 200 hits/sec.
> >>>>>>>>
> >>>>>>>> Noldz
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>.
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>
>



Relevant Pages

  • Re: question about decision trees algorithm
    ... SQL Server Data Mining ... What about Bayesian score for SCORE_METHOD parameter. ... I am comparing MS Decicion Trees algorithm to Weka J48 ... information gainmethod for tree building. ...
    (microsoft.public.sqlserver.datamining)
  • Re: Desicion trees performance
    ... (to build more complex trees) ... minimum_support (to have greater representation of cases in the leaves); ... SQL Server Data Mining ... I used all default parameters and the tree has 10 levels. ...
    (microsoft.public.sqlserver.datamining)