how do I identify statistical outliers with excel?



Jeff apparently wrote:
Subject: how do I identify statistical outliers with excel?

You start by putting your inquiry in the __body__ of your posting, not
just in the subject line. Some news readers -- notably
http://www.microsoft.com/communities/newsgroups/en-us/default.aspx --
do not show the entire subject line, and they do not permit posting
responses when the body of the posting is empty.

I will offer some answers to your question if you promise not to use
this information to blindly and/or automatically exclude outliers from
your data. There are many reasons for deprecating automatic exclusion;
some people even deprecate excluding outliers manually. The primary
reason for excluding outliers is if you believe they represent data
entry errors. But some people argue that even that is not a good
reason, pointing to some unusual historical anecdotes.

Two numerical methods are commonly used to identify outliers: (1)
relying on the interquartile range (IQR); and (2) relying on the
standard deviation (SD). I prefer #1; #2 seems valid only when the
distribution is known to be "normal". In either case, it is best to
graph the data to confirm that the data are truly outliers in a
subjective sense. Often, "outliers" are merely indicative of extreme
skew in the distribution.

One way to graph the data in Excel is to select the data, then click on
the Chart Wizard on the toolbar and select the XY (Scatter) chart type.
Some writers suggest using histograms for this purpose. I disagree
because the shape of a histogram is greatly affected by the width of
the bars with respect to the data distribution, which is an art and
error-prone.

One definition of an outlier is that it is outside"x" times the
interquartile range from first or third quartile. "X" is typically 1.5
(mild) or 3 (extreme). I prefer 3 because it is more conservative. In
Excel, the following compute the first and third quartiles in A1 and A2
and the interquartile range (IQR) in B1 for the data in D1:D100, for
example:

A1: =QUARTILE(D1:D100,1)
A2: =QUARTILE(D1:D100,3)
B1: =A2 - A1

Then the following determines if D1 is an "extreme" outlier:

=if(or(D1 < A1 - 3*B1, D1 > A2 + 3*B1), "outlier?", "")

Another definition of an outlier is that it is outside "x" standard
deviations (SD) from the average. "X" is typically 3 or 4. I would
prefer 4 because it is more consistent with the definition of "extreme
outlier" above. In Excel, if A1 contains =AVERAGE(D1:D100) and B1
contains =STDEV(D1:D100), for example, the following determines if D1
is an outlier:

=if(or(D1 < A1 - 4*B1, D1 > A1 + 4*B1), "outlier?", "")

.



Relevant Pages

  • Re: Extreme values vs. outliers
    ... 2)If it occurs due to chance - Proceed to "Extreme value analysis". ... distribution and you assumed a Cauchy distribution in your analysis, ... then the extreme values are not "outliers". ... the noise has a Cauchy distribution and you assumed a Gaussian ...
    (sci.stat.math)
  • RE: Math question
    ... what you are willing to assume about your data (Excel cannot read your mind). ... routinely eliminating outliers can be dangerous. ... methods that are robust to outliers, ... faulty measurement. ...
    (microsoft.public.excel)
  • Re: Is it possible to drag points in an Excel 2007 Scatter Chart?
    ... for removing "outliers" from long lists of data. ... Outliers are easily spotted on a plot, so in order to easily delete the ... Jon Peltier, Microsoft Excel MVP ...
    (microsoft.public.excel.charting)
  • Re: how do I identify statistical outliers with excel?
    ... some people even deprecate excluding outliers manually. ... distribution is known to be "normal". ... Excel, the following compute the first and third quartiles in A1 and A2 ... Then the following determines if D1 is an "extreme" outlier: ...
    (microsoft.public.excel.misc)
  • How do I test outlier in Excel?
    ... I have datasets and would like to test for outliers. ... Now I can do al math in Excel by myself, but I wonder whether this has ... Prev by Date: ...
    (microsoft.public.excel.misc)