Re: Training Decision Tree

Thank you :) What I mean is that I have two datasets: one to train the
decision tree and one to evaluate it. I want to use that specific
dataset to evaluate the decision tree, not one selected randomly from
SQL Server. I am using the C4.5 algorithm and I have a training dataset and a
test dataset. I want to use these two in SQL Server so I can compare the
C4.5 algorithm with the decision tree algorithm of SQL Server (using
entropy, because C4.5 is an entropy-based classifier).
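Since the comparison depends on both algorithms scoring splits by entropy, here is a rough sketch of how an entropy-based split score is computed for a discrete attribute. This is not SQL Server's internal code, and its exact formula is not given here. The sketch computes information gain and the gain ratio that C4.5 uses to rank splits (gain divided by the entropy of the partition sizes). The function names are hypothetical. Continuous attributes, missing values and pruning, which C4.5 also handles, are left out.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_scores(labels, attribute_values):
    """Return (information gain, gain ratio) for splitting `labels`
    on a discrete attribute, given one attribute value per row."""
    n = len(labels)
    # Partition the class labels by attribute value.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the child partitions after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A split that produces a single partition carries no information.
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio
```

For example, splitting four rows labeled yes, yes, no, no on an attribute whose values are a, a, b, b separates the classes perfectly, so both the gain and the gain ratio come out at 1.0. If the labels were yes, no, yes, no instead, the same split would have a gain of 0.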

Ok, so you have two datasets. You can use the Lift Chart and/or the Classification
Matrix to measure the performance of the algorithm based on its predictions on the
test data set. Check the Lift Chart topic in Books Online to see if this suits you
(ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/uas9/html/ab77eca1-bd48-4fef-b27f-ff5b648e0501.htm).
In Microsoft Decision Trees, you can control a couple of things through algorithm
parameters. You can use entropy as the score method for a split if you set the
SCORE_METHOD parameter to 1. Check the Microsoft Decision Trees Algorithm
topic in Books Online
(ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/uas9/html/95ffe66f-c261-4dc5-ad57-14d2d73205ff.htm).
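Both charts score a trained model against rows it was not trained on. The sketch below shows what a classification matrix tabulates. It is not the SSAS implementation. It counts predicted-versus-actual pairs for a holdout set and derives accuracy from them. The function names and the data are hypothetical, and putting predicted values on the rows is a layout choice made for this sketch.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count (predicted, actual) pairs: each key is one cell of the matrix,
    with predicted values as rows and actual values as columns."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(predicted, actual))

def accuracy(actual, predicted):
    """Fraction of test rows whose predicted class matches the actual class."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Hypothetical holdout data: class labels from the separate test set, and the
# predictions a model trained on the training set produced for those rows.
test_actual = ["yes", "yes", "no", "no", "yes"]
test_predicted = ["yes", "no", "no", "no", "yes"]

matrix = classification_matrix(test_actual, test_predicted)
for (pred, act), count in sorted(matrix.items()):
    print(f"predicted={pred:<3} actual={act:<3} count={count}")
print("accuracy:", accuracy(test_actual, test_predicted))
```

To compare two algorithms like for like, you could tabulate C4.5's predictions on the same test rows in the same way and compare the resulting matrices.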

--
Dejan Sarka
http://www.solidqualitylearning.com/blogs/


This is what I tried to do. First I created a mining structure using the test data set. I changed the parameters and checked the performance in the lift chart.
Then I created an SSIS package with an OLE DB Source that reads the validation data set, and I used it as the input to a data mining query. I ran the package and browsed the decision tree. The problem is that I got the same results as the first time (when I used the test data set), which should not be possible.
What am I doing wrong?