Re: How to Trap Rogue Data?

From: Martin Brown (|||newspam|||_at_nezumi.demon.co.uk)
Date: 07/16/04

In message <5m2ef0h1c9sgqg81ukmgbdre7u7cjp5cel@4ax.com>, bill..
<b@c.com> writes
>
>I have an application that generates hourly system performance
>logfiles, which I graph to look for long-term trends.
>The metric I use varies gradually from 1% to about 15%, depending on
>various external factors such as time of day and day of week.
>
>My problem is that the logfiles sometimes hiccup and generate bad data,
>resulting in huge spikes in my curve. I have trapped the big ones
>(over 20%) in my source data, but I need something smarter so I can catch
>large deviations from the curve.
>
>Unfortunately I do not have the option to fix the application that
>generated the bad data.
>
>
>Are there any statistical functions that I can use to look for such
>deviations?

If you are sure the errors are always isolated single spikes on a
nominally smooth but noisy trend line, then the local second-derivative
estimator is a reasonable test. Set a threshold on it to decide which
points are rogue.

ABS(x[i-1] + x[i+1] - 2*x[i])
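A minimal Python sketch of this test, assuming the readings sit in a plain list and that a threshold of 5.0 suits the example values in the question quoted below (the function names and the threshold are hypothetical choices for illustration). One wrinkle: a lone spike of height h scores about 2h at the spike itself but about h at each neighbour, so the sketch flags only points whose score is also a local maximum.

```python
def second_difference(x):
    """Return |x[i-1] + x[i+1] - 2*x[i]| for interior points; None at the ends."""
    d = [None] * len(x)
    for i in range(1, len(x) - 1):
        d[i] = abs(x[i - 1] + x[i + 1] - 2 * x[i])
    return d


def flag_spikes(x, threshold):
    """Indices whose second difference exceeds threshold AND is a local maximum.

    A spike of height h also gives each neighbour a score of about h,
    so requiring a local maximum keeps the neighbours from being flagged.
    """
    d = second_difference(x)
    flagged = []
    for i in range(1, len(x) - 1):
        left = d[i - 1] if d[i - 1] is not None else 0.0   # ends have no score
        right = d[i + 1] if d[i + 1] is not None else 0.0
        if d[i] > threshold and d[i] >= left and d[i] >= right:
            flagged.append(i)
    return flagged


# Example values from the question (hourly percentages).
values = [1.2, 1.9, 2.4, 3.1, 2.6, 11.3, 3.4, 4.6, 6.3, 7.5, 9.3, 11.3, 8.8]
print(flag_spikes(values, threshold=5.0))  # [5]
```

Only index 5 (the first 11.3) is flagged; the later 11.3 scores about 4.5 and is kept, as the question wants. The threshold still has to be chosen by eye for real data.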

You really need to be sure that they *are* rogue, though. The test has no
way of knowing what you really intend. An alternative is 1-norm fitting
(minimising the sum of absolute residuals rather than the sum of squared
residuals), which will safely ignore a modest number of rogue points.
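To make 1-norm fitting concrete, here is a minimal Python sketch for a straight-line trend y = a + b*x (the linear model and the function name are assumptions for illustration; the reply does not specify a model). It relies on the standard result that some line minimising the sum of absolute residuals passes through at least two of the data points, so it simply tries every pair. That is O(n^3), fine for a few hundred points; for long logs a linear-programming formulation would be the usual route.

```python
from itertools import combinations


def l1_line_fit(xs, ys):
    """Fit y = a + b*x minimising sum(|y - a - b*x|) by trying every pair of points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical pair cannot define y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        cost = sum(abs(y - a - b * x) for x, y in zip(xs, ys))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Clean line y = 2x + 1 with one rogue point.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[4] = 100
a, b = l1_line_fit(xs, ys)
print(a, b)  # 1.0 2.0
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
```

The fit ignores the rogue value entirely, which then stands out with a residual of 91; an ordinary least-squares line would be dragged towards it.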
>
>Something that would allow me to toss any data over 5% from the trend
>line would be perfect.
>
>e.g.
>
>value
>1.2
>1.9
>2.4
>3.1
>2.6
>11.3 toss this one using NA() since it is way off the curve
>3.4
>4.6
>6.3
>7.5
>9.3
>11.3 keep this one since it is not too far off the curve
>8.8
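A minimal Python sketch of the rule asked for here, assuming "5%" means 5 percentage points (the values are already percentages) and using a 5-point running median as the trend line. The running median is a swapped-in choice, not something from the reply, though it is related to the 1-norm idea: the median of a window is the constant that minimises the sum of absolute deviations, so a single spike barely moves it. Tossed points become NaN, playing the role of NA() in the spreadsheet. Function names and parameters are hypothetical.

```python
import math
from statistics import median


def running_median(x, half_width=2):
    """Median of x[i-half_width .. i+half_width], with the window truncated at the ends."""
    n = len(x)
    return [median(x[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def toss_rogue(x, max_dev=5.0, half_width=2):
    """Replace points more than max_dev away from the running median with NaN."""
    trend = running_median(x, half_width)
    return [v if abs(v - t) <= max_dev else math.nan for v, t in zip(x, trend)]


values = [1.2, 1.9, 2.4, 3.1, 2.6, 11.3, 3.4, 4.6, 6.3, 7.5, 9.3, 11.3, 8.8]
cleaned = toss_rogue(values)
```

On the example values only the first 11.3 (about 7.9 above its local median of 3.4) becomes NaN; the later 11.3 is about 2.25 above its local median and is kept.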

I prefer to plot with all the noise displayed and to fix the problem at
source. You never know when a sensor or equipment failure might produce
real spikes in the signal that a filter would helpfully throw away.

Regards,

-- 
Martin Brown
