# how do I identify statistical outliers with excel?

*From*: joeu2004@xxxxxxxxxxx*Date*: 7 Jan 2007 03:02:21 -0800

Jeff apparently wrote:

Subject: how do I identify statistical outliers with excel?

You start by putting your inquiry in the __body__ of your posting, not

just in the subject line. Some news readers -- notably

http://www.microsoft.com/communities/newsgroups/en-us/default.aspx --

do not show the entire subject line, and they do not permit posting

responses when the body of the posting is empty.

I will offer some answers to your question if you promise not to use

this information to blindly and/or automatically exclude outliers from

your data. There are many reasons for deprecating automatic exclusion;

some people even deprecate excluding outliers manually. The primary

reason for excluding outliers is if you believe they represent data

entry errors. But some people argue that even that is not a good

reason, pointing to some unusual historical anecdotes.

Two numerical methods are commonly used to identify outliers: (1)

relying on the interquartile range (IQR); and (2) relying on the

standard deviation (SD). I prefer #1; #2 seems valid only when the

distribution is known to be "normal". In either case, it is best to

graph the data to confirm that the data are truly outliers in a

subjective sense. Often, "outliers" are merely indicative of extreme

skew in the distribution.

One way to graph the data in Excel is to select the data, then click on

the Chart Wizard on the toolbar and select the XY (Scatter) chart type.

Some writers suggest using histograms for this purpose. I disagree

because the shape of a histogram is greatly affected by the width of

the bars with respect to the data distribution, which is an art and

error-prone.

One definition of an outlier is that it is outside"x" times the

interquartile range from first or third quartile. "X" is typically 1.5

(mild) or 3 (extreme). I prefer 3 because it is more conservative. In

Excel, the following compute the first and third quartiles in A1 and A2

and the interquartile range (IQR) in B1 for the data in D1:D100, for

example:

A1: =QUARTILE(D1:D100,1)

A2: =QUARTILE(D1:D100,3)

B1: =A2 - A1

Then the following determines if D1 is an "extreme" outlier:

=if(or(D1 < A1 - 3*B1, D1 > A2 + 3*B1), "outlier?", "")

Another definition of an outlier is that it is outside "x" standard

deviations (SD) from the average. "X" is typically 3 or 4. I would

prefer 4 because it is more consistent with the definition of "extreme

outlier" above. In Excel, if A1 contains =AVERAGE(D1:D100) and B1

contains =STDEV(D1:D100), for example, the following determines if D1

is an outlier:

=if(or(D1 < A1 - 4*B1, D1 > A1 + 4*B1), "outlier?", "")

.

**Follow-Ups**:**Re: how do I identify statistical outliers with excel?***From:*Don Guillett

- Prev by Date:
**Re: how can we hide the steps of macro running** - Next by Date:
**Re: Upper case data entry** - Previous by thread:
**Re: Recovering a sheet** - Next by thread:
**Re: how do I identify statistical outliers with excel?** - Index(es):