Re: DMX Query response time

From: Peter Kim [MS] (peterkim_at_online.microsoft.com)
Date: 09/16/04

  • Next message: Jamie MacLennan \(MS\): "Re: DMX Query response time"
    Date: Thu, 16 Sep 2004 10:19:23 -0700
    
    

    In SQL 2000, the prediction is done in client-side. You don't need to have
    multiple servers with copies of the model for that. Having multiple clients
    (where your app that talks to the server through MSOLAP is located) will
    increase your prediction throughput although an individual query response
    time doesn't change.

    In SQL 2005, all predictions are performed in the server. Also, numerous
    optimization has been applied, specially for the scenario that you have.
    Besides, the programmability and integration with other components that SQL
    2005 has added is tremendous. SQL 2005 is now at beta2 stage and available
    to download for MSDN subscribers. It expected to be released early next
    year.

    -- 
    Peter Kim
    This posting is provided "AS IS" with no warranties, and confers no rights.
    "Poch Reyes" <Poch Reyes@discussions.microsoft.com> wrote in message 
    news:9A55FBB0-ADC1-418A-9361-F79A2FA59888@microsoft.com...
    > Hi Peter,
    >
    > Given this limitation of SQL Server data mining, can you suggest ways to
    > improve query performance? Do we upgrade our server processor/RAM? Do we
    > split processing of requests to different machines with their own copies 
    > of
    > trained models - that way we balance request processing to multiple 
    > machines?
    >
    > Thanks,
    >
    > Poch
    >
    > "Peter Kim [MS]" wrote:
    >
    >> Yes, but mostly training time rather than query performance. Setting
    >> MINIMUM_LEAF_CASES to lower and/or COMPLEXITY_PENALTY to a lower number
    >> effectively makes the tree learn more details (i.e., more splits) which
    >> causes training time longer. The prediction is obtained by traversing the
    >> tree. As the tree gets longer, the traversal time gets longer. However, 
    >> the
    >> impact for the prediction should be ignorable compared to training.
    >>
    >> -- 
    >> Peter Kim
    >> This posting is provided "AS IS" with no warranties, and confers no 
    >> rights.
    >>
    >> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
    >> news:%23c8Gv6wmEHA.632@TK2MSFTNGP12.phx.gbl...
    >> > Hi Peter,
    >> >
    >> > Generally, would playing around with the setting of MINIMUM_LEAF_CASES
    >> > and/or COMPLEXITY_PENALTY improve/degrade DMX query performance? Model
    >> > Tranining time?
    >> >
    >> > Thanks?
    >> >
    >> > Poch
    >> >
    >> > "Peter Kim [MS]" <peterkim@online.microsoft.com> wrote in message
    >> > news:eKOmr85iEHA.1048@tk2msftngp13.phx.gbl...
    >> >> We have not published any paper on the optimizations, but I'll try to
    >> >> answer your questions here.
    >> >> In SQL Server 2000, the patterns (contents) found during training in 
    >> >> the
    >> >> server are remoted to the client and cached into client memory until 
    >> >> it's
    >> >> not used any longer. So, the prediction happens in the client; in your
    >> >> case, inside the SQL server since you're invoking the Analysis 
    >> >> Services
    >> >> client component (MSOLAP) inside the SQL Server engine. High disk
    >> >> activity may be introduced by the SQL server to deal with temp result
    >> >> between MSOLAP and its query processor (it's just my guess), but DM
    >> >> prediction wouldn't cause any disk IO since all prediction happens in
    >> >> memory and your prediction input is on-the-fly using SELECT.
    >> >>
    >> >> Yes, 64bit SQL Server 2000 uses the same code path in 32bit. So, the
    >> >> limitation that I mentioned below remains the same for both 32bit and
    >> >> 64bit.
    >> >>
    >> >> Yes, SQL Server 2005 Beta2 include all algorithms in SQL 2000 and 
    >> >> more.
    >> >>
    >> >> -- 
    >> >> Peter Kim
    >> >> This posting is provided "AS IS" with no warranties, and confers no
    >> >> rights.
    >> >>
    >> >> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
    >> >> news:uOjq4TxiEHA.2992@TK2MSFTNGP12.phx.gbl...
    >> >>> Hi Peter,
    >> >>>
    >> >>> In terms of implementation, are traned models stored in memory? I'm 
    >> >>> just
    >> >>> wondering because whenever we the query below we notice high disk
    >> >>> activities, CPU utilization is about 50%. What's the best step to 
    >> >>> take
    >> >>> to improve the query performance?
    >> >>>
    >> >>> The limitation you mentioned below, will this still be the case if we 
    >> >>> go
    >> >>> 64-bit SQL Server 2000?
    >> >>>
    >> >>> Worse case, to meet or come even close to our target we may need to 
    >> >>> run
    >> >>> Yukon Beta 2 mining - are all the alogorithms available in Beta 2?
    >> >>>
    >> >>> Poch
    >> >>>
    >> >>> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
    >> >>> news:uS5yy9liEHA.1656@TK2MSFTNGP09.phx.gbl...
    >> >>>> Hi Peter,
    >> >>>>
    >> >>>> Do you have data mining optimization papers you can share with us? 
    >> >>>> We
    >> >>>> are planning to setup a SQL Server 2005 beta box to check out the
    >> >>>> difference in query/processing times...
    >> >>>>
    >> >>>> Thanks,
    >> >>>>
    >> >>>> Poch
    >> >>>>
    >> >>>> "Peter Kim [MS]" <peterkim@online.microsoft.com> wrote in message
    >> >>>> news:%23C6bNScbEHA.2520@TK2MSFTNGP12.phx.gbl...
    >> >>>>>I see two potential sources of slow response time, but unfortunately
    >> >>>>>neither of issues is addressed fully until SQL 2005 beta2 (which 
    >> >>>>>will
    >> >>>>>be available shortly).
    >> >>>>>
    >> >>>>> 1. I guess you had to use OPENROWSET() from SQL to MSOLAP in order 
    >> >>>>> to
    >> >>>>> make your procedure reusable in the server as SPROC that accepts
    >> >>>>> parameters. This would introduce another expensive layer of rowset
    >> >>>>> traffic between the SQL engine and the provider. In SQL Server 
    >> >>>>> 2005,
    >> >>>>> DMX does support parameterized query and SPROC as well. But, no 
    >> >>>>> such
    >> >>>>> feature was available in SQL Server 2000. The only option may be to
    >> >>>>> move the SPROC logic to middle-tier or client app so that you may 
    >> >>>>> be
    >> >>>>> able to use MSOLAP directly.
    >> >>>>>
    >> >>>>> 2. In SQL Server 2000, the optimization for TopCount() in DMX was
    >> >>>>> limited. It can be very slow when you have lots of distinct keys in
    >> >>>>> the nested table (distinct Service in your app). You could try run 
    >> >>>>> the
    >> >>>>> DMX query directly against MSOLAP to figure out what the pure 
    >> >>>>> overhead
    >> >>>>> of DMX is. In SQL Server 2005 beta2, a better optmization has been
    >> >>>>> applied and the performance has improved in order of magnitude in 
    >> >>>>> this
    >> >>>>> type of query.
    >> >>>>>
    >> >>>>> -- 
    >> >>>>> Peter Kim
    >> >>>>> This posting is provided "AS IS" with no warranties, and confers no
    >> >>>>> rights.
    >> >>>>>
    >> >>>>> "Noldz" <enoldz@hotmail.com> wrote in message
    >> >>>>> news:2f4bb01c46d37$362f87d0$a501280a@phx.gbl...
    >> >>>>>> ALTER       PROC sp_CrossSell @MobileNo Varchar(15),
    >> >>>>>> @Service Varchar(100) = ''
    >> >>>>>> AS
    >> >>>>>> DECLARE @Services AS VARCHAR(8000)
    >> >>>>>> DECLARE @DMX NVARCHAR(4000)
    >> >>>>>>
    >> >>>>>> EXECUTE dbo.sp_GetTexterLastServiceV2 @MobileNo,
    >> >>>>>> @Service ,@OutServices = @Services OUTPUT
    >> >>>>>>
    >> >>>>>> --//THIS WAS DONE IN THIS MANNER BECAUSE
    >> >>>>>> --//OPENROWSET Function does not accept variables for its
    >> >>>>>> arguments.
    >> >>>>>> SET @DMX = 'SELECT A.* FROM
    >> >>>>>> OPENROWSET(
    >> >>>>>> ''MSOLAP'',
    >> >>>>>> ''Provider=MSOLAP;Data Source=Proj88A;Mining
    >> >>>>>> Location=D:\proj88a\msanalysissrvcs\data\Proj88_DM'',
    >> >>>>>> ''SELECT FLATTENED
    >> >>>>>> TOPCOUNT(
    >> >>>>>>   (SELECT Service, $Support as
    >> >>>>>> [Support] ,100 * $probability as [Confidence], 100 *
    >> >>>>>> $adjustedProbability as Lift
    >> >>>>>>    FROM PREDICT([KeyHits],
    >> >>>>>> INCLUDE_STATISTICS, EXCLUSIVE)
    >> >>>>>>   ), [Lift], 5)
    >> >>>>>> FROM [Cross Selling Service] PREDICTION
    >> >>>>>> JOIN
    >> >>>>>>   (SELECT (@Services) AS ServiceList) AS
    >> >>>>>> Input
    >> >>>>>> ON [Cross Selling Service].[KeyHits].
    >> >>>>>> [Service] = Input.ServiceList.[Service]'') AS A'
    >> >>>>>>
    >> >>>>>> --print @service
    >> >>>>>> SET @DMX = REPLACE(@DMX, '@Services' , @Services)
    >> >>>>>>
    >> >>>>>> --PRINT @DMX
    >> >>>>>> EXECUTE SP_EXECUTESQL @DMX
    >> >>>>>>
    >> >>>>>>>-----Original Message-----
    >> >>>>>>>Can you send a sample query?
    >> >>>>>>>
    >> >>>>>>>-- 
    >> >>>>>>>
    >> >>>>>>>-Jamie MacLennan
    >> >>>>>>>SQL Server Data Mining
    >> >>>>>>>This posting is provided "AS IS" with no warranties, and
    >> >>>>>> confers no rights.
    >> >>>>>>>"Noldz" <enoldz@hotmail.com> wrote in message
    >> >>>>>>>news:2dc9001c46a2f$df9a3910$a301280a@phx.gbl...
    >> >>>>>>>> Hi,
    >> >>>>>>>>
    >> >>>>>>>> Is there a way to make a DMX Query response time
    >> >>>>>> faster?
    >> >>>>>>>> Currently our query response time is 0.6 sec/hit. Our
    >> >>>>>>>> requirement is 200 hits/sec.
    >> >>>>>>>>
    >> >>>>>>>> Noldz
    >> >>>>>>>>
    >> >>>>>>>
    >> >>>>>>>
    >> >>>>>>>.
    >> >>>>>>>
    >> >>>>>
    >> >>>>>
    >> >>>>
    >> >>>>
    >> >>>
    >> >>>
    >> >>
    >> >>
    >> >
    >> >
    >>
    >>
    >> 
    

  • Next message: Jamie MacLennan \(MS\): "Re: DMX Query response time"

    Relevant Pages

    • Re: DMX Query response time
      ... in SQL 2000, prediction is done on the client _by default_. ... SQL Server Data Mining ...
      (microsoft.public.sqlserver.datamining)
    • Re: DataSet.GetChanges() in RowChanged(DataRowAction.Add)
      ... have you considered SQL Express and use ... > I realize now that I didn't describe well how the client application is ... > Framework installed on the client machine, but not any SQL Server). ... > 20 tables in different relations with eachother in the database, ...
      (microsoft.public.dotnet.framework.adonet)
    • Re: SBS 2003 and Sql Server ~ Client Install
      ... I've found the client tools and have now successfully installed them. ... Access front end to a SQL database, Excel can be a front end, A word mail merge document could access a SQL database via ODBC. ... We are trying to install the client software. ... We understand that a Northwind training database is available for SQL Server. ...
      (microsoft.public.backoffice.smallbiz2000)
    • Re: SBS 2003 and Sql Server ~ Client Install
      ... I've found the client tools and have now ... A word mail merge document could access a SQL database via ... We want to install to gain an understanding. ... Once we've managed to install the SQL Server Client we can start to ...
      (microsoft.public.backoffice.smallbiz2000)
    • Re: TDS vs. TCP
      ... I start capturing the network traffic between SQL ... Server and the client. ... I don't see any TDS packets but the communication between SQL ... stand-a-lone server and the client, ...
      (microsoft.public.sqlserver.clustering)