RE: Outliers


I replied in the original thread

thanks
bogdan


>
> I'm a newbie to both data mining generally and SQL Server BI in particular.
>
> I've played with the Linear Regression and Decision Tree algorithms a fair amount, but I have a question.
>
> I've created a test database with both a discrete and a continuous attribute. The continuous attribute runs linearly within one value of the discrete attribute, and then will change and run linearly in a different direction.
>
> My whole point was to try and get the DT engine to create new nodes when it detects a change in the 'line'. This works great and it is way cool. The machine predicts everything perfectly and it works just as I would have expected.
>
> So now the caveat.
>
> Any regression-based analysis, or so it seems to me, has major problems with outliers. If I change even just a few records of data so that they do not regress well, then the predictions, as you would expect, degrade badly.
>
> I see nothing in the BI stuff about outlier removal or analysis of any kind.
>
> I am presuming that the Model Accuracy tab will be of limited use here.
>
> What can I do to detect outliers?
>
> Hope this isn't too dumb...
>
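
For what it's worth, one common way to look for outliers in the kind of data described above is to fit a straight line within each value of the discrete attribute and flag the rows whose residuals sit far from the rest. The sketch below is plain Python done outside the mining model, not a SQL Server BI feature. The names fit_line and flag_outliers, the (group, x, y) row layout, and the tol guard are all hypothetical; the 0.6745 constant and the 3.5 cutoff are the conventional "modified z-score" values. It also assumes each group has at least two distinct x values.

```python
from statistics import median

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes the group has at least two distinct x values
    return my - b * mx, b

def flag_outliers(rows, threshold=3.5, tol=1e-9):
    """rows: iterable of (group, x, y) tuples (hypothetical layout).

    Fits one line per group and returns the rows whose residual has a
    modified z-score (based on the median absolute deviation) above
    threshold. Groups whose residuals are essentially all zero are skipped.
    """
    groups = {}
    for g, x, y in rows:
        groups.setdefault(g, []).append((x, y))

    flagged = []
    for g, pts in groups.items():
        a, b = fit_line([p[0] for p in pts], [p[1] for p in pts])
        residuals = [y - (a + b * x) for x, y in pts]
        med = median(residuals)
        mad = median(abs(r - med) for r in residuals)
        if mad <= tol:
            continue  # near-perfect fit: nothing to scale the residuals by
        for (x, y), r in zip(pts, residuals):
            if abs(0.6745 * (r - med) / mad) > threshold:
                flagged.append((g, x, y))
    return flagged

# Group "A" follows y = 2x except for one altered record; "B" follows y = 20 - 3x.
rows = [("A", x, 2 * x) for x in range(10) if x != 5]
rows += [("A", 5, 100)]
rows += [("B", x, 20 - 3 * x) for x in range(10)]
print(flag_outliers(rows))
```

The median-based scale is used here instead of the standard deviation because a few bad records inflate the standard deviation and can hide themselves. Flagged rows could then be reviewed or left out before the model is retrained.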