Re: Question about decision tree algorithm in sqlserver2000



Hello

1. If there're only 5% or even lower positive samples in my train/test
data, will it affect the performance of decision tree algorithm?
Yes, it will most likely affect the performance of the algorithm. To
improve the accuracy of the model, I'd suggest sampling your data so that
the class distributions are more balanced. An article on performing
stratified sampling is available here:
http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/2615.aspx
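
As a rough illustration of the balancing idea (this is not the method from the linked article, just a minimal sketch in plain Python), the snippet below keeps every positive row and draws a random subset of the negatives. The function name, the "buys" column, and the 1:1 ratio are all assumptions made for the example.

```python
import random
from collections import Counter


def downsample_majority(rows, label_key, positive_value, ratio=1.0, seed=0):
    """Keep all positive rows and a random subset of the negative rows.

    ratio is the number of negatives kept per positive (hypothetical knob).
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == positive_value]
    negatives = [r for r in rows if r[label_key] != positive_value]
    # Never ask for more negatives than actually exist.
    keep = min(len(negatives), int(round(len(positives) * ratio)))
    sample = positives + rng.sample(negatives, keep)
    rng.shuffle(sample)
    return sample


# Made-up data: 1000 customers, 5% of whom bought the product.
rows = [{"id": i, "buys": 1 if i % 20 == 0 else 0} for i in range(1000)]

balanced = downsample_majority(rows, "buys", 1)
counts = Counter(r["buys"] for r in balanced)
print(counts[1], counts[0])
```

It is usually safer to balance only the training set and to evaluate on data that keeps the original 5% rate, so that the measured accuracy reflects what the model will see in practice.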


2. Can I set some parameter to affect the penalty of FP (false positive)
and FN (false negative)? Because the business impact of FP and FN is
quite different.
There is no algorithm parameter that handles this.
However, the model viewers in SQL Server 2005 allow an analysis of FP vs FN
by using the Classification Matrix view.
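
Outside the viewer, the same FP vs FN analysis can be done by hand on the model's predictions. The sketch below is an assumption-laden example, not a feature of the algorithm: it builds a classification matrix at a few probability cutoffs and weighs the errors with hypothetical business costs (FP_COST and FN_COST, the sample data, and the cutoff values are all made up).

```python
def classification_matrix(actual, scores, threshold):
    """Count TP/FP/FN/TN when a score >= threshold is predicted positive."""
    tp = fp = fn = tn = 0
    for y, s in zip(actual, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def total_cost(matrix, fp_cost, fn_cost):
    """Weigh the two error types by their (assumed) business cost."""
    return matrix["FP"] * fp_cost + matrix["FN"] * fn_cost


# Made-up actual labels and predicted probabilities of "will buy".
actual = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.2, 0.4, 0.1, 0.55, 0.7, 0.3]

# Hypothetical costs: missing a buyer is five times worse than a wasted contact.
FP_COST, FN_COST = 1.0, 5.0

thresholds = [0.3, 0.5, 0.7]
costs = {
    t: total_cost(classification_matrix(actual, scores, t), FP_COST, FN_COST)
    for t in thresholds
}
best = min(costs, key=costs.get)
print(costs, best)
```

Choosing the cutoff with the lowest total cost is one way to reflect the different impact of FP and FN without any change to how the tree itself is trained.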

Hope this helps

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Please do not send email directly to this alias. It is for newsgroup
purposes only.

thanks,
bogdan

<anonymous_user@xxxxxxxxxxxxxxxxxxxxxxx> wrote in message
news:fde7cb95-3517-466d-8dec-014602d9f9cd@xxxxxxxxxxxxxxxxxxxxxxx
I've no idea about the implementation of this decision tree algorithm, so
my questions may be silly -_-
My task is to create a decision tree to predict whether a customer will buy
our product given some information.
1. If there're only 5% or even lower positive samples in my train/test
data, will it affect the performance of decision tree algorithm?
2. Can I set some parameter to affect the penalty of FP (false positive)
and FN (false negative)? Because the business impact of FP and FN is
quite different.

Thanks in advance!

