Re: Question about decision tree algorithm in sqlserver2000



Thanks, it really helps. Following your explanation,
if my target is discrete, the algorithm will
build a classification tree. How can I force
the algorithm to build a regression tree
even when my target is discrete (true, false)? Is there a parameter for this, or does the algorithm
do the job automatically?

The algorithm (like some others in SQL Server 2005) is actually a family of algorithms. Depending on how you model the target, it builds either a classification tree or a regression tree. If you model your target as discrete or discretized, the algorithm will build a classification tree (histograms in nodes/leaves).
If your target, on the other hand, is continuous (numeric or dateTime), the algorithm will build regression trees that have regression formulae in nodes and leaves.
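
To make the difference between the two kinds of leaves concrete, here is a minimal Python sketch. It is not the SQL Server implementation; the function names `classification_leaf` and `regression_leaf` are hypothetical, and the regression leaf is assumed to use a single regressor fitted by ordinary least squares.

```python
from collections import Counter


def classification_leaf(targets):
    """Histogram leaf: share of each discrete target state among the cases."""
    counts = Counter(targets)
    total = len(targets)
    return {state: n / total for state, n in counts.items()}


def regression_leaf(xs, ys):
    """Formula leaf: least-squares fit y = intercept + slope * x (one regressor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Discrete target (buys / does not buy): the leaf holds a histogram.
histogram = classification_leaf([True, False, False, False])

# Continuous target: the leaf holds a formula instead.
intercept, slope = regression_leaf([1, 2, 3], [3, 5, 7])
```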

The algorithm actually does a better job of providing high lift than a high true rate. It is optimized for accuracy across the whole range of target values (i.e. high lift) and not for a single target state (which would favor high true positives). You can measure the lift using the Accuracy Chart view in the mining model viewer.
Note that, if your target is continuous (i.e. a regression tree), then the lift chart is replaced by a scatter plot.
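
For readers who want to check lift outside the viewer, a minimal Python sketch follows. It assumes you have exported a predicted probability and the actual outcome for each test case; the function name `lift_at` and the sample numbers are illustrative only.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of cases (ranked by score) over the base rate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Ten test cases, two actual buyers (1 = buys), scored by a model.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

lift_top_20 = lift_at(scores, labels, 0.2)  # both buyers are in the top 20%
lift_all = lift_at(scores, labels, 1.0)     # whole population equals the base rate
```

A lift of 1 means the model does no better than contacting customers at random.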

bogdan


Thanks for your kind reply.
I still have some other questions about the details of the dtree algorithm. Is it a classification tree or a regression tree? If my goal is high lift instead
of a high true rate, is the dtree algorithm
suitable?

Hello

1. If only 5% or fewer of the samples in my train/test
data are positive, will it affect the performance of the decision tree algorithm?
Yes, it will most likely affect the performance of the algorithm. To
improve the accuracy of the model, I'd suggest sampling your data so that
the distributions are more balanced. An article on performing stratified
sampling is available here:
http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/2615.aspx
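
As a rough illustration of rebalancing (not the method from the linked article), the Python sketch below keeps every positive case and randomly downsamples the negatives. The function name `downsample_majority`, the `buys` column, and the 5% example data are assumptions made for this sketch.

```python
import random


def downsample_majority(rows, label_key, positive, ratio=1.0, seed=0):
    """Keep all positive rows and at most `ratio` negatives per positive."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    positives = [row for row in rows if row[label_key] == positive]
    negatives = [row for row in rows if row[label_key] != positive]
    keep = min(len(negatives), int(len(positives) * ratio))
    sample = positives + rng.sample(negatives, keep)
    rng.shuffle(sample)
    return sample


# 100 customers, only 5% of them buyers.
rows = [{"id": i, "buys": i < 5} for i in range(100)]

balanced = downsample_majority(rows, "buys", True)           # 5 buyers + 5 non-buyers
milder = downsample_majority(rows, "buys", True, ratio=3.0)  # 5 buyers + 15 non-buyers
```

Rebalance only the training data and keep the test data at its natural distribution, so that accuracy and lift measured on it reflect what you will see in production.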


2. Can I set some parameter to affect the penalty of FP (false positive)
and FN (false negative)? The business impacts of FP and FN are
quite different.
There is no algorithm parameter that handles this.
However, the model viewers in SQL Server 2005 allow an analysis of FP vs FN
by using the Classification Matrix view.
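
Since the algorithm has no cost parameter, one workaround outside the algorithm is to apply your own costs to the predictions afterwards. The Python sketch below is only an illustration of that idea: it assumes you have exported each test case's predicted probability and actual outcome, and the function names, costs, and sample numbers are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a two-state classification matrix."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["TP"] += 1
        elif p and not a:
            counts["FP"] += 1
        elif a:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def total_cost(counts, fp_cost, fn_cost):
    """Business cost of the errors; correct predictions are assumed free."""
    return counts["FP"] * fp_cost + counts["FN"] * fn_cost


def cheapest_threshold(probabilities, actual, fp_cost, fn_cost, thresholds):
    """Pick the probability cutoff with the lowest total cost on the test set."""
    def cost_at(threshold):
        predicted = [p >= threshold for p in probabilities]
        return total_cost(confusion_counts(actual, predicted), fp_cost, fn_cost)
    return min(thresholds, key=cost_at)


# Assumed costs: a missed buyer (FN) costs five times a wasted contact (FP).
actual = [1, 0, 0, 1, 0]
probabilities = [0.9, 0.6, 0.2, 0.4, 0.1]
best = cheapest_threshold(probabilities, actual, fp_cost=1, fn_cost=5,
                          thresholds=[0.3, 0.5, 0.7])
```

With these numbers the lowest cutoff wins, because accepting one extra false positive is cheaper than missing a buyer.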

Hope this helps

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Please do not send email directly to this alias. It is for newsgroup
purposes only.

thanks,
bogdan

<anonymous_user@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:fde7cb95-3517-466d-8dec-014602d9f9cd@xxxxxxxxxxxxxxxxxxxxxxx
I have no idea about the implementation of this decision tree algorithm, so
my questions may be silly -_-
My task is to create a decision tree to predict whether a customer will buy
our product, given some information.
1. If only 5% or fewer of the samples in my train/test
data are positive, will it affect the performance of the decision tree algorithm?
2. Can I set some parameter to affect the penalty of FP (false positive)
and FN (false negative)? The business impacts of FP and FN are
quite different.

Thanks in advance!

