Re: How to interpret CASES in Microsoft Clustering Alg

From: Jamie MacLennan (MS) (jamiemac_at_online.microsoft.com)
Date: 06/15/04


Date: Tue, 15 Jun 2004 09:35:28 -0700

The Microsoft Clustering algorithm is a probabilistic clustering algorithm.
This means that each case (a row in your data) is assigned to every cluster
with some probability. For example, Case 1 could be in cluster 1 with 45%
probability, cluster 2 with 25% probability, cluster 3 with 30% probability
and cluster 4 with 0% probability. If you asked "what cluster is case 1 in?"
the answer would be "Cluster 1", the most likely cluster. When you ask
"How many cases are in cluster 1?" you get the sum, over all cases, of each
case's probability of being in cluster 1. That is why you get a non-integer
number of cases. (This is called "soft clustering".)
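
To make the arithmetic concrete, here is a minimal Python sketch of how a
soft count arises. It is not the algorithm's actual implementation. Case 1
uses the probabilities from the example above; the other cases and the
function names are invented for illustration.

```python
# Hypothetical membership probabilities: one entry per case, one value per cluster.
# Case 1 uses the numbers from the example above; the other rows are invented.
memberships = {
    "Case 1": [0.45, 0.25, 0.30, 0.00],
    "Case 2": [0.10, 0.80, 0.10, 0.00],
    "Case 3": [0.60, 0.05, 0.05, 0.30],
}


def soft_counts(memberships):
    """Sum each cluster's membership probability over all cases."""
    n_clusters = len(next(iter(memberships.values())))
    totals = [0.0] * n_clusters
    for probs in memberships.values():
        for k, p in enumerate(probs):
            totals[k] += p
    return totals


def most_likely_cluster(probs):
    """Return the 1-based number of the cluster with the highest probability."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1


counts = soft_counts(memberships)
print([round(c, 2) for c in counts])               # fractional "Cases" per cluster
print(most_likely_cluster(memberships["Case 1"]))  # cluster number for Case 1
```

Each case's probabilities sum to 1, so the soft counts across all clusters
add up to the total number of cases, even though each cluster's own count
is usually fractional.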

If you want a count of the cases most likely to be in each cluster (the
"hard-clustering" number), you can get it with a prediction query, e.g.:

SELECT t.CaseID, Cluster() FROM MyClusterModel
PREDICTION JOIN
<source-data-query> AS t
ON <mapping>

Then you can count the cases per cluster in SQL, e.g. with a COUNT grouped
by the cluster column.
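
If the query results are handled outside SQL, the same tally can be done in
a few lines of Python. This is only a sketch: the rows below are made-up
stand-ins for what the prediction query returns (a case ID and its most
likely cluster), and hard_counts is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical result rows of the prediction query: (CaseID, most likely cluster).
# The IDs and labels are invented for illustration.
rows = [
    (1, "Cluster 1"),
    (2, "Cluster 2"),
    (3, "Cluster 1"),
    (4, "Cluster 3"),
    (5, "Cluster 1"),
]


def hard_counts(rows):
    """Count how many cases were assigned to each cluster."""
    return Counter(cluster for _case_id, cluster in rows)


hard = hard_counts(rows)
for cluster, n in sorted(hard.items()):
    print(cluster, n)  # always whole numbers, unlike the soft counts
```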

-- 
-Jamie MacLennan
SQL Server Data Mining
This posting is provided "AS IS" with no warranties, and confers no rights.
"Dariusz Jankowski" <Dariusz Jankowski@discussions.microsoft.com> wrote in
message news:AACBCA4B-2C09-4FA3-9061-06A133D547CD@microsoft.com...
> Hi,
>
> I have a problem. I analyze data using the Microsoft Clustering Algorithm.
> When I browse the results, the Cases column shows a non-integer count of cases.
> For example:
>
> In Node ALL:
> Value        Cases    Probability
> All clust.   107      100%
> false        72       67,29%
> true         35       32,71%
> missing      0        0,00%
>
> Cluster1:
> Value        Cases    Probability
> All clust.   17,78    100%
> false        15,45    86,88%
> true         2,33     13,12%
> missing      0        0,00%
>
>
> Why do I get 17,78 and not 18 or 17?
> How should I interpret it?
>
>
> Thanks for any answer,
>
> Dariusz Jankowski

