Re: How useful is Sequence Clustering algorithm?


I have indeed tested the algorithm in more detail and have identified the
circumstances in which it produces recommendations with probability 1E-14
or the internal error. This occurs whenever I set the
MAXIMUM_SEQUENCE_STATES parameter to a value less than the number of
distinct pages (non-sequence attributes).
I was under the assumption that the MAXIMUM_SEQUENCE_STATES parameter
tells the algorithm the max value of the sequence attribute (and that it
should be below 100 for meaningful models), while MAXIMUM_STATES is
the number of distinct values for the non-sequence attribute. Am I getting
this wrong or what?

AFAIK you are completely right about the parameters. If the
MAXIMUM_SEQUENCE_STATES parameter is lower than the actual number of states,
then feature selection is invoked to use the most representative states
only. It seems like you are hitting a bug in this feature selection. Try
to report this at MS Connect, as it seems you have narrowed down the problem.

Anyway, now I set the value of both parameters well above the actual
numbers and the algorithm seems to predict pretty well.
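
For reference, a minimal sketch of how both parameters could be set when
the model is created in DMX. Only the model name [SC model] comes from this
thread; the column names and the two parameter values are made up for
illustration and would have to be replaced with numbers above the actual
state counts in your data.

```
-- Hypothetical structure: one case per session, with a nested table
-- holding the ordered page visits.
CREATE MINING MODEL [SC model]
(
    [Session ID] LONG KEY,
    [Page Visits] TABLE PREDICT
    (
        [Sequence ID] LONG KEY SEQUENCE,
        [Page] TEXT DISCRETE PREDICT
    )
)
-- Illustrative values only, chosen to sit above the real state counts.
USING Microsoft_Sequence_Clustering
(
    MAXIMUM_SEQUENCE_STATES = 3000,
    MAXIMUM_STATES = 3000
)
```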

I have one follow-up question, though - is there any way to optimize the
performance of AS for making singleton predictions? It takes a couple of
seconds on average to get a recommendation from the model I described above
(Intel Pentium 4, 3.2 GHz, 4 GB RAM), and this time rockets to 2 minutes
and more with models that have around 2,000 distinct values of the
non-sequence attribute.

You can try to warm up the AS cache by reading the model content in
advance, before you start predictions, with a DMX query like
SELECT * FROM [SC model].CONTENT
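
Once the content query has run, a singleton prediction against the same
model might look roughly like the sketch below. It assumes a nested table
named [Page Visits] with [Sequence ID] and [Page] columns, and the page
names are placeholders; none of these appear in the thread, so adjust them
to the actual model definition.

```
-- Ask for the single most likely next page, given two visited pages.
SELECT PredictSequence([Page Visits], 1)
FROM [SC model]
NATURAL PREDICTION JOIN
(
    SELECT
    (
        SELECT 1 AS [Sequence ID], 'home' AS [Page]
        UNION
        SELECT 2 AS [Sequence ID], 'products' AS [Page]
    ) AS [Page Visits]
) AS t
```

Timing this query before and after the warm-up should show whether the
cache is what dominates the response time.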

--
Dejan Sarka
http://blogs.solidq.com/EN/dsarka/

