RE: Better approach for common business problem ? (long post)
From: Tom VdP (TomVdP_at_discussions.microsoft.com)
Date: 10/09/04
- Next message: Tom VdP: "Re: Using today's date"
- Previous message: Karen Middleton: "How to programmatically load data into a localcube"
- In reply to: Chris Webb: "RE: Better approach for common business problem ? (long post)"
- Next in thread: Chris Webb: "RE: Better approach for common business problem ? (long post)"
- Reply: Chris Webb: "RE: Better approach for common business problem ? (long post)"
- Messages sorted by: [ date ] [ thread ]
Date: Sat, 9 Oct 2004 09:13:05 -0700
Hi Chris, Jamie, rest of the world,
Sorry for not responding earlier. Here is a roundup of the situation.
First of all, a bit of info about the front-end. The front-end is written
in Excel VBA because the customer wants quite a bit of specific behaviour and
layout. Rows and columns can be collapsed/expanded, context menus appear on
right-click that use the specific cell-location info (e.g. filtering, linking
to another report, sort options), popup windows (mainly for entering report
parameters), ... The front-end creates the necessary MDX dynamically.
This development costed about 2 manmonths of work.
Cubes:
There are 5 cubes, 2 virtual cubes, largest cube has 7 dimensions, 5
measures, 2 calculated measures. Dimension security in about 30 groups. The
amount of data is very little: 2 million fact rows. The cubes fit in 10Mb
files.
Necessity for optimisation:
Some reports return a lot of data: 12.000 rows x 120 columns. Loading this
in Excel is slow, but on top of that getting the data out of the cube also
takes a quite some time: up to 2 minutes. The returned cells are sums over
variable periods, so these must be calculated ad hoc. Loading the data and
formatting takes another minute or so.
About the optimisation proces:
First of all, I did read and tried to apply the Performance Guide. So the
tweaking with connection string options and so was done. Also the usage
based optimisation and other settings server side were applied. Latest
service packs applied everywhere.
Extra things I did because of the hints Chris gave:
1) MDX: use the NECJ version instead of the Filter() version
2) MDX : work away the Extract()
Indeed, I think I got confused by the examples in BOL and the fact that NECJ
must not necessarily return a typical crossjoined set. It can be a set
consisting of members of 1 dimension only.
--> result : no measurable differences
3) Performance monitor: measure DSN Used & DSN Requested.
VERY interesting! I noticed quite some room for uptimisation here!
So I added aggregations manually via the Partition Manager tool (for those
looking for it: it is available in the SQLServer2000 Resource Kit).
All queries that were logged had the same level DSN Used & DSN Requested.
--> result: disappointing! again no measurable difference.
(Sidenote question: how do you log ALL DSN Used & DSN Requested ? The
perfmon only polls once/second. And the querylog.mdb does not contain DSN
Used. What a mistake!)
4) Partition the cubes
Here I must admit that I still do not grasp the finesses of correct
partitioning... I partioned over months (-> 284 partitions!?)
--> result: much worse performance.
What I did not yet investigate is Cache Policy & Cache Ratio.
So all in all a disappointing result. Yet I must say that I am very glad I
did all this, because it makes me feel more confident when facing the
customer.
Still pending: why is it so much slower in a client/server environment than
in a single-pc setup ? Could it be because of the dimension security ?
Many thanks (especially to Chris!)
Tom
"Chris Webb" wrote:
> Oh, and one other important point - make sure you have SP3 PTS installed on
> your client, it can sometimes make a big difference with performance.
>
> "Chris Webb" wrote:
>
> > Hi Tom,
> >
> > First of all, thanks for the detailed explanation. It makes it much easier
> > to suggest solutions...
> >
> > On the modelling side, you might be interested to read the following posting
> > on how I generally approach this kind of problem:
> > http://groups.google.co.uk/groups?hl=en&lr=&ie=UTF-8&selm=Ow%23V1FezCHA.1644%40TK2MSFTNGP12
> > With a little bit of work you would be able to adapt this to your own
> > scenario, which might allow you to run these kind of queries from any
> > front-end; was this requirement the reason why you built your own front-end?
> >
> > Also, does your customer dimension have any levels or is it just a flat
> > list? If the latter, I would strongly recommend putting in two or three
> > levels to group the customers even if you don't have a business need for them
> > and end up making them invisible. It should also help.
> >
> > Regarding MDX, the most efficient queries are always going to be the ones
> > using NONEMPTYCROSSJOIN rather than FILTER. Looking at your second query, you
> > might find that performance improves if you get rid of the unnecessary
> > EXTRACTs and put the contents of your WHERE clause into the NECJ function
> > too. Here's what it might look like (nb might be syntax errors!):
> >
> > with
> > set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]'
> > set [PrevSelectedMonths] as
> >
> > 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])'
> >
> > member [Measures].[SumPeriod]
> > as 'sum([SelectedMonths], [Measures].[Amount])'
> >
> > member [Measures].[SumPeriod Y-1]
> > as 'sum([PrevSelectedMonths], [Measures].[Amount])'
> >
> > set [y-axis]
> > as 'Hierarchize(
> > NonEmptyCrossjoin([DimClient].Members,{[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1))'
> >
> > set [x-axis-products]
> > as 'Hierarchize(
> > Generate(
> > NonEmptyCrossjoin([DimProduct].[SubProduct].members,
> > {[SelectedMonths]+[PrevSelectedMonths]},{[Measures].[Amount]},{[DimSomeOtherDimension].[SomeMember]},1)
> > ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent})'
> >
> > select
> > {
> > Crossjoin([x-axis-products], {[Measures].[SumPeriod],
> > [Measures].[SumPeriod Y-1]})
> > } on columns,
> > {
> > [y-axis]
> > } on rows
> > from TheCube
> > where ([DimSomeOtherDimension].[SomeMember])
> >
> >
> > However, I'm not sure that any of the above will make a real difference to
> > performance, which is your biggest issue. If you've read the AS Performance
> > Guide you've probably already got a good partitioning strategy (by month I
> > assume) and run the Usage-Based Optimisation wizard. However, I think the
> > main reason you've got poor performance is that your queries run at the
> > Client level, which has 12000 members and is unlikely to have any
> > aggregations associated with it because of the 1/3 rule. In this case it
> > might be worth building your own aggregations at the Client level using the
> > Partition Manager tool available in the SQL 2000 Resource Kit, and seeing if
> > that improves matters (although it would increase the size of the cube and
> > processing times). For hints on how to know which aggregations to build when
> > you're doing it manually, have a look at this thread:
> > http://groups.google.co.uk/groups?hl=en&lr=&ie=UTF-8&threadm=9BB53680-AF5D-4A33-BA54-71FC3A7E255F%40microsoft.com&rnum=1&prev=/groups%3Fq%3DWebb%2BDSN%2BRequested%26hl%3Den%26lr%3D%26ie%3DUTF-8%26group%3Dmicrosoft.public.sqlserver.olap%26selm%3D9BB53680-AF5D-4A33-BA54-71FC3A7E255F%2540microsoft.com%26rnum%3D1
> >
> > ...and the following, which contains a better explanation of how to
> > understand the query log:
> > http://www.mosha.com/msolap/articles/querylog.htm
> >
> > Because you have control over your front-end, it might also be worth playing
> > around with some connection string properties. The infamous Execution
> > Location/Default Isolation Mode pairing is always worth a try, as is trying
> > different values for Large Level Threshold. Also, in your case, try values of
> > 0 to 7 for the Cache Policy property, and try values of 0.01 for CacheRatio
> > and CacheRatio2.
> >
> > Finally, my last observation is that any report where you expect to get up
> > to 12000 rows and 120 columns is always going to perform slowly, and is more
> > data than anyone can take in. It needs to be broken down into more digestible
> > chunks --> smaller queries!
> >
> > HTH,
> >
> > Chris
> >
> >
> >
> > "Tom VdP" wrote:
> >
> > >
> > > [ This article is also available in a better readable format here:
> > > http://users.pandora.be/see/mdx/news.htm ]
> > >
> > >
> > > Given a cube with the following properties:
> > > - a measure Amount
> > > - a time dimension DimTime with a granularity of months (only Years and
> > > Months, no lower levels)
> > > - a client dimension DimClient
> > > - a product dimension DimProduct (consisting of a level Product and a level
> > > SubProduct)
> > > - a few other dimensions
> > >
> > > Consider the following requirement:
> > > A report must be made that shows the total Amount in a given period compared
> > > to the total Amount in the same period one year ago for each
> > >
> > > client for each product.
> > >
> > > The report has the following layout:
> > > (hopefully you read this in a non-proportional font, if not consider the
> > > link mentioned above)
> > >
> > >
> > > Product A SubProduct A Product B
> > > ..
> > > SumPeriod SumPeriod Y-1 SumPeriod SumPeriod Y-1 SumPeriod
> > > SumPeriod Y-1 .. ..
> > > All DimClient x y x y x
> > > y x y
> > > Client 1 x y x y x
> > > y x y
> > > Client 2 x y x y x
> > > y x y
> > > .. x y x y x
> > > y x y
> > >
> > >
> > > In pseudo-MDX: each x above = Sum(DimTime.startmonth:DimTime.endmonth,
> > > Measures.[Amount]).
> > > The start and end dates of the period are parameters: they are entered by
> > > the user and injected into the MDX. This makes it impossible to
> > >
> > > aggregate the sums beforehand.
> > >
> > > The report should only include those clients that have at least one
> > > (sub)product. Also only those (sub)products that are associated with
> > >
> > > at least one client should be shown.
> > >
> > > Because of the nature of this report a lot of data will be presented to the
> > > user. The dimension DimClient contains 12000 elements,
> > >
> > > DimProducts contains 60 elements. Depending on the chosen period and
> > > possibly other slicers (among other dimensions) the report may
> > >
> > > return up to 12000 rows and 120 columns.
> > >
> > > This looks like a pretty standard report to me, yet I could not find an
> > > MDX-pattern for this type of problem on the net. Neither did I
> > >
> > > find a hint in Spofford's book.
> > >
> > > Here are the 2 approaches I devised myself.
> > >
> > >
> > > MDX 1 - Selection based on Filter()
> > > -----------------------------------
> > >
> > > with
> > > set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]'
> > > set [PrevSelectedMonths] as
> > >
> > > 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])'
> > >
> > > member [Measures].[SumPeriod]
> > > as 'sum([SelectedMonths], [Measures].[Amount])'
> > >
> > > member [Measures].[SumPeriod Y-1]
> > > as 'sum([PrevSelectedMonths], [Measures].[Amount])'
> > >
> > > member [Measures].[AllClientPeriodSums] as '([DimClient].[All
> > > DimClient],[Measures].[SumPeriod])+([DimClient].[All
> > >
> > > DimClient],[Measures].[SumPeriod Y-1])'
> > >
> > > select
> > > {
> > > Crossjoin(
> > > Hierarchize(
> > > Filter(
> > > [DimProduct].[SubProduct].members + [DimProduct].[Product].members
> > > ,not(isempty([Measures].[AllClientPeriodSums]))))
> > > ,{[Measures].[SumPeriod],[Measures].[SumPeriod Y-1]})
> > > } on columns,
> > > {
> > > Filter([DimClient].members,([Measures].[SumPeriod]<>0 or
> > > [Measures].[SumPeriod Y-1]<>0))
> > > } on rows
> > > from TheCube
> > > where ([DimTime].[2004],[DimSomeOtherDimension].[SomeMember])
> > >
> > >
> > > The above in words:
> > > - Filter all Products and SubProducts for which the "All" level in DimClient
> > > has a non-empty value for the sums over the sets
> > >
> > > SelectedMonths or PrevSelectedMonths. If that value is non-empty, at least
> > > 1 client has a value for that (sub)product, so that
> > >
> > > (sub)product must be retained.
> > > - Filter all Clients based on the test if the sums are <> 0. (this should
> > > be: not(isempty(SumPeriod) or isempty(SumPeriod Y-1)), but due
> > >
> > > to extra requirements of our users it is implemented like this).
> > >
> > >
> > >
> > > MDX 2 - Selection based on NonEmptyCrossjoin()
> > > ----------------------------------------------
> > >
> > > with
> > > set [SelectedMonths] as '[DimTime].&[2004_01]:[DimTime].&[2004_08]'
> > > set [PrevSelectedMonths] as
> > >
> > > 'ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_01]):ParallelPeriod([DimTime].[Year],1,[DimTime].&[2004_08])'
> > >
> > > member [Measures].[SumPeriod]
> > > as 'sum([SelectedMonths], [Measures].[Amount])'
> > >
> > > member [Measures].[SumPeriod Y-1]
> > > as 'sum([PrevSelectedMonths], [Measures].[Amount])'
> > >
> > > set [y-axis]
> > > as 'Hierarchize(
> > > Extract(
> > >
> > > NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimClient].Members,{[Measures].[Amount]},2)
> > > ,[DimClient]))'
> > >
> > > set [x-axis-products]
> > > as 'Hierarchize(
> > > Generate(
> > > Extract(
> > >
> > > NonEmptyCrossjoin([SelectedMonths]+[PrevSelectedMonths],[DimProduct].[SubProduct].members,{[Measures].[Amount]},2)
> > > ,[DimProduct])
> > > ,{[DimProduct].CurrentMember,[DimProduct].CurrentMember.Parent}))'
> > >
> > > select
> > > {
> > > Crossjoin([x-axis-products], {[Measures].[SumPeriod],
> > > [Measures].[SumPeriod Y-1]})
> > > } on columns,
> > > {
> > > [y-axis]
> > > } on rows
> > > from TheCube
> > > where ([DimSomeOtherDimension].[SomeMember])
> > >
> > >
> > > The above in words:
> > > - Find non-empty crossings between the period sets and clients based on the
> > > Amount measure. From this set, extract the Clients and
> > >
> > > hierarchize. These are the clients that go on the Y-axis of the report.
> > > - Find non-empty crossings between the period sets and subproducts based on
> > > the Amount measure. Extract the Product members from that
> > >
> > > set. Add the parents, i.e. the Products and hierarchize. These are the
> > > products that will go on the X-axis of the report.
> > >
> > >
> > > Questions
> > > ---------
> > >
> > > 1) Both solutions are rather slow. Is there an MDX 3 I should implement
> > > instead ? I there a better approach ? I am rather new to MDX
> > >
> > > and I find it dazzling at times...
> > >
> > > 2) For some slicer elements (the [DimSomeOtherDimension].[SomeMember] above)
> > > the report is up to 60% slower than without any slicer
> > >
> > > element at all! Without the where-clause in the MDX the report executes
> > > faster than for MDX with a where-clause that limits the data to
> > >
> > > some 70%. What is happening ??
> > >
> > > 3) Is there a way to incorporate via cube-design a better/faster solution ?
> > > Keep in mind that both start and end months are variables.
> > >
> > > 4) [May be unrelated to this particular report] I noticed that even on a
> > > medium sized desktop pc that runs Analysis Services locally the
> > >
> > > performance is way better than in a client/server setup with a high-end
> > > desktop and a high-end server... This has me baffled. The
> > >
> > > network usage is low and never a bottleneck. The server is dedicated and
> > > there are no other users executing reports. We use Excel VBA
> > >
> > > with ADO MD to get to our data. I did follow all steps in the Microsoft
> > > whitepaper on Analysis Services performance.
> > >
> > > Many, many thanks for reading up to this point!
> > > Many more for all helpful information and remarks.
> > > Tom
> > >
> > >
- Next message: Tom VdP: "Re: Using today's date"
- Previous message: Karen Middleton: "How to programmatically load data into a localcube"
- In reply to: Chris Webb: "RE: Better approach for common business problem ? (long post)"
- Next in thread: Chris Webb: "RE: Better approach for common business problem ? (long post)"
- Reply: Chris Webb: "RE: Better approach for common business problem ? (long post)"
- Messages sorted by: [ date ] [ thread ]
Relevant Pages
|