Re: DMX Query response time
From: Poch Reyes (pochreyes_at_hotmail.com)
Date: 09/20/04
- Next message: Jamie MacLennan \(MS\): "Re: DMX Query response time"
- Previous message: Noldz: "Re: DMX Query response time"
- In reply to: Noldz: "Re: DMX Query response time"
- Next in thread: Jamie MacLennan \(MS\): "Re: DMX Query response time"
- Reply: Jamie MacLennan \(MS\): "Re: DMX Query response time"
- Messages sorted by: [ date ] [ thread ]
Date: Mon, 20 Sep 2004 19:25:31 +0800
Hi Peter and Jamie,
Apart from implementing multiple clients in our recommendation system to
improve our throughput, our client is very much interested in checking out
what SQL Server 2005 can offer, to the point that they might use it in
production if they are satisfied with the throughput and features (quality
of mining output). How confident are you guys in SQL Server 2005 Beta 2
code?
Thanks,
Poch
"Noldz" <noldz@discussions.microsoft.com> wrote in message
news:DCE2573B-12BE-406F-A573-A927A5833623@microsoft.com...
> We are planning to test our Recommendation System to SQL 2005 . Is the
> migration from SQL & Analysis Server 2000 to 2005 is as easy as backup &
> restores?
>
> "Peter Kim [MS]" wrote:
>
>> In SQL 2000, the prediction is done in client-side. You don't need to
>> have
>> multiple servers with copies of the model for that. Having multiple
>> clients
>> (where your app that talks to the server through MSOLAP is located) will
>> increase your prediction throughput although an individual query response
>> time doesn't change.
>>
>> In SQL 2005, all predictions are performed in the server. Also, numerous
>> optimization has been applied, specially for the scenario that you have.
>> Besides, the programmability and integration with other components that
>> SQL
>> 2005 has added is tremendous. SQL 2005 is now at beta2 stage and
>> available
>> to download for MSDN subscribers. It expected to be released early next
>> year.
>>
>> --
>> Peter Kim
>> This posting is provided "AS IS" with no warranties, and confers no
>> rights.
>>
>> "Poch Reyes" <Poch Reyes@discussions.microsoft.com> wrote in message
>> news:9A55FBB0-ADC1-418A-9361-F79A2FA59888@microsoft.com...
>> > Hi Peter,
>> >
>> > Given this limitation of SQL Server data mining, can you suggest ways
>> > to
>> > improve query performance? Do we upgrade our server processor/RAM? Do
>> > we
>> > split processing of requests to different machines with their own
>> > copies
>> > of
>> > trained models - that way we balance request processing to multiple
>> > machines?
>> >
>> > Thanks,
>> >
>> > Poch
>> >
>> > "Peter Kim [MS]" wrote:
>> >
>> >> Yes, but mostly training time rather than query performance. Setting
>> >> MINIMUM_LEAF_CASES to lower and/or COMPLEXITY_PENALTY to a lower
>> >> number
>> >> effectively makes the tree learn more details (i.e., more splits)
>> >> which
>> >> causes training time longer. The prediction is obtained by traversing
>> >> the
>> >> tree. As the tree gets longer, the traversal time gets longer.
>> >> However,
>> >> the
>> >> impact for the prediction should be ignorable compared to training.
>> >>
>> >> --
>> >> Peter Kim
>> >> This posting is provided "AS IS" with no warranties, and confers no
>> >> rights.
>> >>
>> >> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
>> >> news:%23c8Gv6wmEHA.632@TK2MSFTNGP12.phx.gbl...
>> >> > Hi Peter,
>> >> >
>> >> > Generally, would playing around with the setting of
>> >> > MINIMUM_LEAF_CASES
>> >> > and/or COMPLEXITY_PENALTY improve/degrade DMX query performance?
>> >> > Model
>> >> > Tranining time?
>> >> >
>> >> > Thanks?
>> >> >
>> >> > Poch
>> >> >
>> >> > "Peter Kim [MS]" <peterkim@online.microsoft.com> wrote in message
>> >> > news:eKOmr85iEHA.1048@tk2msftngp13.phx.gbl...
>> >> >> We have not published any paper on the optimizations, but I'll try
>> >> >> to
>> >> >> answer your questions here.
>> >> >> In SQL Server 2000, the patterns (contents) found during training
>> >> >> in
>> >> >> the
>> >> >> server are remoted to the client and cached into client memory
>> >> >> until
>> >> >> it's
>> >> >> not used any longer. So, the prediction happens in the client; in
>> >> >> your
>> >> >> case, inside the SQL server since you're invoking the Analysis
>> >> >> Services
>> >> >> client component (MSOLAP) inside the SQL Server engine. High disk
>> >> >> activity may be introduced by the SQL server to deal with temp
>> >> >> result
>> >> >> between MSOLAP and its query processor (it's just my guess), but DM
>> >> >> prediction wouldn't cause any disk IO since all prediction happens
>> >> >> in
>> >> >> memory and your prediction input is on-the-fly using SELECT.
>> >> >>
>> >> >> Yes, 64bit SQL Server 2000 uses the same code path in 32bit. So,
>> >> >> the
>> >> >> limitation that I mentioned below remains the same for both 32bit
>> >> >> and
>> >> >> 64bit.
>> >> >>
>> >> >> Yes, SQL Server 2005 Beta2 include all algorithms in SQL 2000 and
>> >> >> more.
>> >> >>
>> >> >> --
>> >> >> Peter Kim
>> >> >> This posting is provided "AS IS" with no warranties, and confers no
>> >> >> rights.
>> >> >>
>> >> >> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
>> >> >> news:uOjq4TxiEHA.2992@TK2MSFTNGP12.phx.gbl...
>> >> >>> Hi Peter,
>> >> >>>
>> >> >>> In terms of implementation, are traned models stored in memory?
>> >> >>> I'm
>> >> >>> just
>> >> >>> wondering because whenever we the query below we notice high disk
>> >> >>> activities, CPU utilization is about 50%. What's the best step to
>> >> >>> take
>> >> >>> to improve the query performance?
>> >> >>>
>> >> >>> The limitation you mentioned below, will this still be the case if
>> >> >>> we
>> >> >>> go
>> >> >>> 64-bit SQL Server 2000?
>> >> >>>
>> >> >>> Worse case, to meet or come even close to our target we may need
>> >> >>> to
>> >> >>> run
>> >> >>> Yukon Beta 2 mining - are all the alogorithms available in Beta 2?
>> >> >>>
>> >> >>> Poch
>> >> >>>
>> >> >>> "Poch Reyes" <pochreyes@hotmail.com> wrote in message
>> >> >>> news:uS5yy9liEHA.1656@TK2MSFTNGP09.phx.gbl...
>> >> >>>> Hi Peter,
>> >> >>>>
>> >> >>>> Do you have data mining optimization papers you can share with
>> >> >>>> us?
>> >> >>>> We
>> >> >>>> are planning to setup a SQL Server 2005 beta box to check out the
>> >> >>>> difference in query/processing times...
>> >> >>>>
>> >> >>>> Thanks,
>> >> >>>>
>> >> >>>> Poch
>> >> >>>>
>> >> >>>> "Peter Kim [MS]" <peterkim@online.microsoft.com> wrote in message
>> >> >>>> news:%23C6bNScbEHA.2520@TK2MSFTNGP12.phx.gbl...
>> >> >>>>>I see two potential sources of slow response time, but
>> >> >>>>>unfortunately
>> >> >>>>>neither of issues is addressed fully until SQL 2005 beta2 (which
>> >> >>>>>will
>> >> >>>>>be available shortly).
>> >> >>>>>
>> >> >>>>> 1. I guess you had to use OPENROWSET() from SQL to MSOLAP in
>> >> >>>>> order
>> >> >>>>> to
>> >> >>>>> make your procedure reusable in the server as SPROC that accepts
>> >> >>>>> parameters. This would introduce another expensive layer of
>> >> >>>>> rowset
>> >> >>>>> traffic between the SQL engine and the provider. In SQL Server
>> >> >>>>> 2005,
>> >> >>>>> DMX does support parameterized query and SPROC as well. But, no
>> >> >>>>> such
>> >> >>>>> feature was available in SQL Server 2000. The only option may be
>> >> >>>>> to
>> >> >>>>> move the SPROC logic to middle-tier or client app so that you
>> >> >>>>> may
>> >> >>>>> be
>> >> >>>>> able to use MSOLAP directly.
>> >> >>>>>
>> >> >>>>> 2. In SQL Server 2000, the optimization for TopCount() in DMX
>> >> >>>>> was
>> >> >>>>> limited. It can be very slow when you have lots of distinct keys
>> >> >>>>> in
>> >> >>>>> the nested table (distinct Service in your app). You could try
>> >> >>>>> run
>> >> >>>>> the
>> >> >>>>> DMX query directly against MSOLAP to figure out what the pure
>> >> >>>>> overhead
>> >> >>>>> of DMX is. In SQL Server 2005 beta2, a better optmization has
>> >> >>>>> been
>> >> >>>>> applied and the performance has improved in order of magnitude
>> >> >>>>> in
>> >> >>>>> this
>> >> >>>>> type of query.
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Peter Kim
>> >> >>>>> This posting is provided "AS IS" with no warranties, and confers
>> >> >>>>> no
>> >> >>>>> rights.
>> >> >>>>>
>> >> >>>>> "Noldz" <enoldz@hotmail.com> wrote in message
>> >> >>>>> news:2f4bb01c46d37$362f87d0$a501280a@phx.gbl...
>> >> >>>>>> ALTER PROC sp_CrossSell @MobileNo Varchar(15),
>> >> >>>>>> @Service Varchar(100) = ''
>> >> >>>>>> AS
>> >> >>>>>> DECLARE @Services AS VARCHAR(8000)
>> >> >>>>>> DECLARE @DMX NVARCHAR(4000)
>> >> >>>>>>
>> >> >>>>>> EXECUTE dbo.sp_GetTexterLastServiceV2 @MobileNo,
>> >> >>>>>> @Service ,@OutServices = @Services OUTPUT
>> >> >>>>>>
>> >> >>>>>> --//THIS WAS DONE IN THIS MANNER BECAUSE
>> >> >>>>>> --//OPENROWSET Function does not accept variables for its
>> >> >>>>>> arguments.
>> >> >>>>>> SET @DMX = 'SELECT A.* FROM
>> >> >>>>>> OPENROWSET(
>> >> >>>>>> ''MSOLAP'',
>> >> >>>>>> ''Provider=MSOLAP;Data Source=Proj88A;Mining
>> >> >>>>>> Location=D:\proj88a\msanalysissrvcs\data\Proj88_DM'',
>> >> >>>>>> ''SELECT FLATTENED
>> >> >>>>>> TOPCOUNT(
>> >> >>>>>> (SELECT Service, $Support as
>> >> >>>>>> [Support] ,100 * $probability as [Confidence], 100 *
>> >> >>>>>> $adjustedProbability as Lift
>> >> >>>>>> FROM PREDICT([KeyHits],
>> >> >>>>>> INCLUDE_STATISTICS, EXCLUSIVE)
>> >> >>>>>> ), [Lift], 5)
>> >> >>>>>> FROM [Cross Selling Service] PREDICTION
>> >> >>>>>> JOIN
>> >> >>>>>> (SELECT (@Services) AS ServiceList) AS
>> >> >>>>>> Input
>> >> >>>>>> ON [Cross Selling Service].[KeyHits].
>> >> >>>>>> [Service] = Input.ServiceList.[Service]'') AS A'
>> >> >>>>>>
>> >> >>>>>> --print @service
>> >> >>>>>> SET @DMX = REPLACE(@DMX, '@Services' , @Services)
>> >> >>>>>>
>> >> >>>>>> --PRINT @DMX
>> >> >>>>>> EXECUTE SP_EXECUTESQL @DMX
>> >> >>>>>>
>> >> >>>>>>>-----Original Message-----
>> >> >>>>>>>Can you send a sample query?
>> >> >>>>>>>
>> >> >>>>>>>--
>> >> >>>>>>>
>> >> >>>>>>>-Jamie MacLennan
>> >> >>>>>>>SQL Server Data Mining
>> >> >>>>>>>This posting is provided "AS IS" with no warranties, and
>> >> >>>>>> confers no rights.
>> >> >>>>>>>"Noldz" <enoldz@hotmail.com> wrote in message
>> >> >>>>>>>news:2dc9001c46a2f$df9a3910$a301280a@phx.gbl...
>> >> >>>>>>>> Hi,
>> >> >>>>>>>>
>> >> >>>>>>>> Is there a way to make a DMX Query response time
>> >> >>>>>> faster?
>> >> >>>>>>>> Currently our query response time is 0.6 sec/hit. Our
>> >> >>>>>>>> requirement is 200 hits/sec.
>> >> >>>>>>>>
>> >> >>>>>>>> Noldz
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>.
>> >> >>>>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >>
>>
>>
>>
- Next message: Jamie MacLennan \(MS\): "Re: DMX Query response time"
- Previous message: Noldz: "Re: DMX Query response time"
- In reply to: Noldz: "Re: DMX Query response time"
- Next in thread: Jamie MacLennan \(MS\): "Re: DMX Query response time"
- Reply: Jamie MacLennan \(MS\): "Re: DMX Query response time"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|