Re: Clustering wrong results in SQL 2005
- From: Vereenka <vereenka@xxxxxxxxx>
- Date: Wed, 13 Feb 2008 00:45:27 +0100
Dejan Sarka wrote:
Hi!
Could you check whether you get the same result with hard clustering, i.e. with the K-means algorithm? The default Expectation-Maximization algorithm, also known as soft clustering, actually assigns each case to each cluster with some probability, while K-means assigns each case to one cluster with probability 1. Maybe you can check the probabilities for each cluster with the ClusterProbability DMX function?
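To make the soft vs. hard distinction concrete, here is a minimal Python sketch. It is not how the Microsoft Clustering algorithm is implemented; it assumes spherical Gaussian clusters with equal variance and equal priors, and the function names and the `sigma` parameter are made up for illustration.

```python
import math


def squared_distance(point, centroid):
    """Squared Euclidean distance between a case and a cluster centroid."""
    return sum((p - c) ** 2 for p, c in zip(point, centroid))


def soft_assign(point, centroids, sigma=1.0):
    """EM-style membership: one probability per cluster, summing to 1.

    Assumption: equal-variance spherical Gaussians with equal priors.
    """
    weights = [
        math.exp(-squared_distance(point, c) / (2 * sigma ** 2))
        for c in centroids
    ]
    total = sum(weights)
    return [w / total for w in weights]


def hard_assign(point, centroids):
    """K-means-style membership: probability 1 for the nearest centroid.

    On an exact tie the first (lowest-index) centroid wins.
    """
    distances = [squared_distance(point, c) for c in centroids]
    nearest = distances.index(min(distances))
    return [1.0 if i == nearest else 0.0 for i in range(len(centroids))]


centroids = [(0.0, 0.0), (2.0, 0.0)]
midpoint = (1.0, 0.0)  # equally far from both centroids

print(soft_assign(midpoint, centroids))  # both clusters get the same share
print(hard_assign(midpoint, centroids))  # all weight goes to one cluster
```

A case that sits exactly between two centroids gets an even split under soft assignment, while hard assignment still has to pick a single cluster.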
Hi,
I've built 3 additional models to cover all clustering methods.
The results are the same within each pair, i.e. for the EM models and for the K-means models.
In every model I still get a 50% probability for P2 in cluster 1 in NODE_DISTRIBUTION, even for K-means. The other Ps are assigned to one cluster with probability 1, just as you wrote ;)
Maybe the SUPPORT in NODE_DISTRIBUTION could be helpful. I noticed it was higher than NODE_SUPPORT, but I don't know how it was calculated. The support for the whole node is equal to the number of cases assigned to the cluster, but what about the SUPPORT in NODE_DISTRIBUTION? E.g. I received SUPPORT=6 for P1=Existing while the number of cases in this cluster was only 3. Also, the support for P3=Missing was 6 while the total number of cases containing P3 was only 5.
The ClusterProbability results are clear for K-means if my query covers existing cases. It has difficulties with cases built as extensions of historical ones; that is specific to this algorithm, isn't it?
In the case of EM, if ClusterProbability is equal for several clusters, Cluster() returns the one with the lowest node ID (the number in the node name).
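The tie-breaking I observed can be written down as a small Python sketch. This only mirrors the behavior seen in my tests, not any documented rule of Cluster(); the function name `pick_cluster` and the sample probabilities are made up for illustration.

```python
def pick_cluster(probabilities):
    """Return the node ID with the highest probability.

    `probabilities` maps a numeric node ID to its cluster probability.
    When several clusters share the highest probability, the lowest
    node ID wins (the behavior observed in this post, assumed here).
    """
    best = max(probabilities.values())
    # Exact equality is intentional: only true ties fall through to min().
    return min(node for node, p in probabilities.items() if p == best)


# Hypothetical probabilities for one case across three clusters.
print(pick_cluster({2: 0.5, 1: 0.5, 3: 0.0}))
```

With a 50/50 split between clusters 1 and 2, the sketch returns cluster 1, which matches what I see for the tied cases.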
What would you suggest? The clustering functionality looks promising, but I can't load real data until I understand "the P2" effect ;)
Ver.