How to Trap Rogue Data ?

I have an application that generates hourly system performance
logfiles which I graph to look for long term trending. 
The metric I use gradually varies from 1% to about 15% depending on
various external factors - such as time of day and day of week.

My problem is that the logfiles sometime hiccup and generate bad data
resulting is huge spikes in my curve. I have trapped for the big ones
> 20% in my source data but I  need something smarter so I can catch
large deviations from the curve.

Unfortunately I do not have the option to fix the application that
generated the bad data.


Are there any statistical function that I can use to look for such
deviations?

Something that  would allow me to toss any data over 5% from the trend
line would be perfect.

eg

value
1.2
1.9
2.4
3.1
2.6
11.3  toss this one using NA() since it is way off the curve
3.4
4.6
6.3
7.5
9.3
11.3  keep this one since it is not too far off the curve
8.8






Any suggestions?

Thanks

Bill



0
bill
7/15/2004 11:08:09 PM
excel.charting 18370 articles. 0 followers. Follow

4 Replies
542 Views

Similar Articles

[PageSpeed] 0

Bill -

Not too sophisticated, but what I've sometimes done is run a sort of 
moving average in the next column, of the two or three points above and 
below the row, and in the column after that only include data that 
varies by less than X from the averages. It's easier than a trendline, 
and if your spikes are one point wide, it works fine.

- Jon
-------
Jon Peltier, Microsoft Excel MVP
Peltier Technical Services
Tutorials and Custom Solutions
http://PeltierTech.com/
_______

bill.. wrote:

> I have an application that generates hourly system performance
> logfiles which I graph to look for long term trending. 
> The metric I use gradually varies from 1% to about 15% depending on
> various external factors - such as time of day and day of week.
> 
> My problem is that the logfiles sometime hiccup and generate bad data
> resulting is huge spikes in my curve. I have trapped for the big ones
> 
>>20% in my source data but I  need something smarter so I can catch
> 
> large deviations from the curve.
> 
> Unfortunately I do not have the option to fix the application that
> generated the bad data.
> 
> 
> Are there any statistical function that I can use to look for such
> deviations?
> 
> Something that  would allow me to toss any data over 5% from the trend
> line would be perfect.
> 
> eg
> 
> value
> 1.2
> 1.9
> 2.4
> 3.1
> 2.6
> 11.3  toss this one using NA() since it is way off the curve
> 3.4
> 4.6
> 6.3
> 7.5
> 9.3
> 11.3  keep this one since it is not too far off the curve
> 8.8
> 
> 
> 
> 
> 
> 
> Any suggestions?
> 
> Thanks
> 
> Bill
> 
> 
> 

0
DOjonNOT (619)
7/16/2004 2:11:11 AM
In message <5m2ef0h1c9sgqg81ukmgbdre7u7cjp5cel@4ax.com>, bill.. 
<b@c.com> writes
>
>I have an application that generates hourly system performance
>logfiles which I graph to look for long term trending.
>The metric I use gradually varies from 1% to about 15% depending on
>various external factors - such as time of day and day of week.
>
>My problem is that the logfiles sometime hiccup and generate bad data
>resulting is huge spikes in my curve. I have trapped for the big ones
>> 20% in my source data but I  need something smarter so I can catch
>large deviations from the curve.
>
>Unfortunately I do not have the option to fix the application that
>generated the bad data.
>
>
>Are there any statistical function that I can use to look for such
>deviations?

If you are sure they are definitely limited to single spikes on a 
nominal but noisy smooth trend line then the local second derivative 
estimator is a reasonable test. Set a threshold on that to decide on 
rogue points.

ABS(x[i-1]+x[i+1]-2x[i])

You really need to be sure that they *are* rogue though. The test has no 
way of knowing what you really intend. An alternative is to use 1-Norm 
fitting which will safely ignore modest numbers of rogue points.
>
>Something that  would allow me to toss any data over 5% from the trend
>line would be perfect.
>
>eg
>
>value
>1.2
>1.9
>2.4
>3.1
>2.6
>11.3  toss this one using NA() since it is way off the curve
>3.4
>4.6
>6.3
>7.5
>9.3
>11.3  keep this one since it is not too far off the curve
>8.8

I prefer my plots with all noise displayed and to fix the problem at 
source. You never know when sensor or equipment failure might produce 
real spikes in the signal that a filter will helpfully throw away.

Regards,
-- 
Martin Brown
0
Martin
7/16/2004 8:51:45 AM
Thanks for the suggestions 

I understand averaging the previous annd next points but  what is
1-Norm fitting?


Bill

On Fri, 16 Jul 2004 09:51:45 +0100, Martin Brown
<|||newspam|||@nezumi.demon.co.uk> wrote:

>In message <5m2ef0h1c9sgqg81ukmgbdre7u7cjp5cel@4ax.com>, bill.. 
><b@c.com> writes
>>
>>I have an application that generates hourly system performance
>>logfiles which I graph to look for long term trending.
>>The metric I use gradually varies from 1% to about 15% depending on
>>various external factors - such as time of day and day of week.
>>
>>My problem is that the logfiles sometime hiccup and generate bad data
>>resulting is huge spikes in my curve. I have trapped for the big ones
>>> 20% in my source data but I  need something smarter so I can catch
>>large deviations from the curve.
>>
>>Unfortunately I do not have the option to fix the application that
>>generated the bad data.
>>
>>
>>Are there any statistical function that I can use to look for such
>>deviations?
>
>If you are sure they are definitely limited to single spikes on a 
>nominal but noisy smooth trend line then the local second derivative 
>estimator is a reasonable test. Set a threshold on that to decide on 
>rogue points.
>
>ABS(x[i-1]+x[i+1]-2x[i])
>
>You really need to be sure that they *are* rogue though. The test has no 
>way of knowing what you really intend. An alternative is to use 1-Norm 
>fitting which will safely ignore modest numbers of rogue points.
>>
>>Something that  would allow me to toss any data over 5% from the trend
>>line would be perfect.
>>
>>eg
>>
>>value
>>1.2
>>1.9
>>2.4
>>3.1
>>2.6
>>11.3  toss this one using NA() since it is way off the curve
>>3.4
>>4.6
>>6.3
>>7.5
>>9.3
>>11.3  keep this one since it is not too far off the curve
>>8.8
>
>I prefer my plots with all noise displayed and to fix the problem at 
>source. You never know when sensor or equipment failure might produce 
>real spikes in the signal that a filter will helpfully throw away.
>
>Regards,

0
bill
7/16/2004 12:15:58 PM
In message <1lhff01hbmo0pjvba9lka2c48qu4tggua4@4ax.com>, bill.. 
<b@c.com> writes
>
>Thanks for the suggestions
>
>I understand averaging the previous annd next points but  what is
>1-Norm fitting?

Classical least squares minimises sum of squares of residuals. = 2-Norm

( y[i] - F(x[i])) ^ 2

1-Norm - minimises sum of absolute magnitude of residuals

| y[i] - F(x[i] |

For the simple case of a small dataset.

2-Norm estimator:       Mean    = average of all N data values

1-Norm estimator        Median = mid point of data histogram

The same trick can be applied to polynomial fitting with robust 
statistics. It is much better at ignoring modest numbers of erratic 
outliers in noisy data.

Regards,
-- 
Martin Brown
0
Martin
7/17/2004 12:18:15 PM
Reply:

Similar Artilces:

summing two columns of different data
I need to sum a dollar total from two different types of columns. One column identifies the item and another column associates a dollar amount with that item. The items are different tree species and the dollar amounts are the pruning costs associated with a specific tree species in a given year. For example: Column A Column B Column C Species 2011 2012 Lemon tree $100 Apple tree $50 Lemon tree $150 Cherry tree $50 $50 I want to sum the total dollar amount to prune only the lemon trees in 2012. Presently, I am using a COUNTIF to sum the total numb...

data in row didnt match import type 01-01-06
while importing almost 800 accounts , i got the following error for majority of accounts though i imported only 7 accounts. 'Data in row did not match the import type' what is the solution? How can i match it? whats is the way of comparision? is it possible you are trying to import accountnames into fields where crm is expecting the actual account guid rather than name? ========================= John O'Donnell Microsoft CRM MVP http://www.microsoft.com/businesssolutions/crm/community/microsoft_crm_community.mspx "Aami" <aamir34@hotmail.com> wrote in message ...

summarizing worksheet data
Hi, I want to create what should be a relatively simple spreadsheet with, say, fifty worksheets in the workbook. Each worksheet would be identical in terms of number of rows & columns, as well as the type of data those columns contain. On the first worksheet, however, I'd like to be able to summarize the cumulative totals of all the other sheets. So, for instance, if cell a20 on worksheets 2 through 50 contained a formula that calculated a sum of the figures in a1:a19, I'd like to put a formula in a cell in the first worksheet that will total the cell a20 in all the other w...

Gathering and adding data from different rows
Hi all, I run a report at work every day that tells me how many shares of a company have been sold and from what year these shares were granted to the participant. The year and the amount of shares are on different lines. For instance, the year would be on D5 and the share amount would be on D10. This is a daily function and the data changes every day. I am required to manually scan these reports and add up the shares for each year separately. I have to separate the amounts for 2003, 1997 and 1996. I want to create a spreadsheeet that I can dump this data into and have the spreadsheet find...

Unable to update public free/busy data
With Exchange 5.5, we are getting this error periodically in Microsoft Outlook. I believe this was caused because we used to have another exchange server which is no longer. I think the free/busy folder was stored on this other server. I wasn't here when this server existed so I'm somewhat unaware of information regarding that server. I found KB article 184151 which gave some instructions on how to resolve this, but I get stuck at step 5 where it asks me to double-click on the Schedule+ Free Busy Information. "The contents of this public folder are currently unavailable. Eit...

Entering Data Into Text Boxes and Worksheet Protection
I'm wondering if there's a way to enter data into a text box once a worksheet has been protected. Currently, once the sheet is protected, there's no way to enter data into a text box. Any thoughts appreciated. Thanks. Stephan There are two different textboxes--one from the drawing toolbar and one from the control toolbox toolbar. But you can rightclick on each and choose: format control (or Format CheckBox There's a tab call protection. Each has a Lock option. (the drawing toolbar has Locked and Lock Text) After I unchecked each of them, I could use them (when I protect...

two checking accounts data downloads to just one of the accts
I have two checking accounts with our bank and when I download transactions into Microsoft Money for either account, all the transactions go to only one of the accounts set up in Money. I was prompted for the first download to our "main" account. Then when I went to download for the other account, I wasn't prompted where to download to... and all the transactions went into the first account. Did that make sense? My bank required we switch to Money from Quicken, so I am new to this. Tried to search in users guide and online for help. Help would be much appreciated - ...

Access unwanted automatic data entry
when entering data in my Access table, if i use the tab ket to advance to the next field, it often enters a number in that field. Why does it do that nad how can I make it stop! Are entering data direcctly into Access table? If so, stop doing that, unless you only do it occasionally to repair data in database and you know database design and manage it. "Pat the biologist" <Pat the biologist@discussions.microsoft.com> wrote in message news:6B724974-C937-4C5C-BDBA-1A086C0712A1@microsoft.com... > when entering data in my Access table, if i use the tab ket to advance to...

Macro to pull every Nth row of data
I have a spreadsheet of data and I need to pull every 60th row out onto another sheet. Any simple macros? Thanks. Amy How about Sub Test() Source_Sheet = "Sheet1" Target_Sheet = "Sheet2" n = 5000 ' your last line of data on Sheet1 Target_Row = 1 Sheets(Target_Sheet).Select For nCount = 1 To n Step 60 Worksheets(Source_Sheet).Cells(nCount, 1).EntireRow.Copy Worksheets(Target_Sheet).Cells(Target_Row, 1).Select Selection.PasteSpecial Paste:=xlValues, Operation:=xlNone, SkipBlanks:= _ F...

Example using MFC WinInet classes with POST of FILE and other INPUT data
I am looking for a good example for POSTing both FILE and other INPUT data to a URL. I understand that the FILE must use a multi-part encoding but I do not understand how the atcual file contents get POSTed or how the INPUT data gets POSTed also. I am guessing the INPUT data can be added as part of the query parameters to the URL. ...

reading csv data that looks like a date but isn't
Hi Can anyone please help me? I have a .csv file. One of the fields is entered as "01-04" (with the quotes). How do I get Excel to read this as a string and not interpret it as a date (which it displays as 01-Apr)? Changing the format doesn't help - the damage is done on reading the csv data. Thanks -Howard ________________________________________________________ Howard Lethbridge lethcons@ntlworld.com Hi again I should have said that I want to do this when I double-click on the file. I know I can do this if I open Excel and do Data/Get External Data/Import Text File... ...

Last cell with data in a range
I enter weekly data into a spreadsheet with a summary page at the front. After every week, when new data is entered, I want the formula at the front to use the cell with the new data instead of me having to change the existing formula. For example, my data looks like this: A B C 276 300 421 175 0 0 0 I need a formula that will automatically detect the last number >0 in column C. Thank you for your assistance! =Countif(C2:C1000,">0") so your main page formula might be something like: =...

Inserting data from an Excel file to a database
Hi, I have an auto generated excel file(.xlsx) in a particular format . I tried importing the file to the SQL server database using the query below. SELECT * INTO temp_ExcelData2 FROM OPENROWSET('Microsoft.ACE.OLEDB. 12.0','Excel 12.0 Xml;Database=C:\test \DashboardData.xlsx;HDR=No;IMEX=1','SELECT * FROM [Sheet1$]') It is showing an error as Could not locate registry entry for OLE DB provider 'Microsoft.ACE.OLEDB.12.0'. SELECT * INTO temp_ExcelData2 FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0','Excel 12.0 Xml;Database=C:\test \DashboardData.xlsx;H...

Recover exchange 2000 data
Hi Can anyone help with this please? i have an SBS2000 server (with exchange 2000) that wont boot due to the disk filling up and corrupting files. i have put in a new SBS2003 server and have it running ok but need to get the exchange data form the old server to the new..... is there a simple way to do this? like connecting the old data drive and "importing" the mail boxes or even just the emails.... thanks for any help Dan In news:58940B53-E821-4E3E-9FA3-BE988DEAD37B@microsoft.com, Dan da Box <Dan da Box@discussions.microsoft.com> typed: > Hi Can anyone help with t...

Retrieving data from Great Plains
I am attempting to integrate an application for one of our customers by retreiving data from great plains for display in the application. What I want to do is get a list of vendors from the great plains application, however, I am unsure if I should be using eConnect to do this or if it's recomended to access the GP tables directly. If table access is allowed then I'm set that's quite simple to do now that I've found the data architecture documents. However, if you are supposed to use eConnect I am completely lost. How would I go about retrieving a list of vendors using...

Update Chart Source Data
I used the "Record Macro" function to get the code shown below. When I recorded the macro, the value of A1 was 392. But when I actually ran the macro, the value of A1 was 393. As you can see, the "Record Macro" function "hard coded" the value of 392. How can I make this work such that the SeriesCollection is updated with the CURRENT value of A1? Sheets("Chart2").Select Range("A1").Select Selection.Copy Sheets("Chart1").Select ActiveChart.PlotArea.Select Application.CutCopyMode = False Act...

Data Validation in XL2007 suddenly stops functioning
I created a workbook in XL2003 that included some named ranges on Sheet2 that fed some data validation lists on Sheet1. Everything worked as expected. My colleague opened up the workbook in XL2007 and used it several times, and everything worked as expected. Then she called me, saying that it was broken. Of the 6 named ranges and related data validation columns on the other sheet, none of them were working anymore. I went over to her PC, and checked everything I could think of- I could set up working data validation test cells on Sheet2, but could not get the ones on Sheet1 to...

Global Data Catolog/Exchange
Folks I am having problems installing exchange on one of my servers. I had exchange 07 installed but it would not load in the MMC so I uninstalled it. Tried to reinstall it and get a message that the Global Data Catolog cannot be found. I have aslo noticed that I cannot access Sharepoint which is installed on the same server. Neither is in production yet but plan was to go live this week. I have uploaded 2 files from the DC: ntfrsutl.txt - result fo running ntfrsutl ds on the DC dcdiag - result of running dcdiag on the DC I have also included the IP config files from ...

Removing data from a list.
I was just wondering how could I remove duplicates from a long list, I also want to eliminate the one I searched for initially: Lets say I have: Hello Hello Hello1 Hello2 I'd like to seach for Hello, and have both of them removed. Any ideas on how I can do this ? ------------------------------------------------ ~~ Message posted from http://www.ExcelTip.com/ ~~ View and post usenet messages directly from http://www.ExcelForum.com/ Hi David, Here is a general macro I've used in the past for removing duplicate rows based on the data in a particular column. I'm sure you can ...

Pivot Table: showing data as % of subtotal
Hi, I have pivot table data in this format: Name Type Amount ABC 1 $100 2 $200 3 $300 4 $400 ABC Total $1,000 DEF 1 $150 2 $260 3 $310 4 $420 DEF Total $1,140 and so on .... .... .... I want to see the data as follows: Name Type Amount % ABC...

Appending data in cells that utilize a Dropdown list.
Hello all, I would like to append entries in cells that utilize a dropdown list. Currently, when I select (in this case names) from my dropdown list I cannot append them if I want to add more than one to a cell. If I select another name from the dropdown, it erases the first entry. Is there some way to append entries within cells that use this form of validation so that I can enter more than one in a cell from the dd list? -- Thanks, Geek using Office XP Professional Hi Geek Debra Dalgleish has an example workbook showing how to do this http://www.contextures.com/excelfiles.html...

Importing vcalendar data to multiple users
I'm running Exchange 5.5. Several times a year I have to create a large group of users. I'd like to be able to pre-populate each user's calendar with pertinent calendar events. Is there a way to import this type of data in batch mode? See if http://www.slipstick.com/calendar/holidays.htm#company helps. ME wrote: > I'm running Exchange 5.5. Several times a year I have to create a > large group of users. I'd like to be able to pre-populate each user's > calendar with pertinent calendar events. > > Is there a way to import this type of data in batch...

Sample Data Question
Does the sample data installed using the Microsoft.Crm.Tools.SampleDataWizard.exe inlude sample accounts, contacts, leads, opportunities... I need this data for testing and an none of these was installed, knowing that I installed Microsoft CRM Server 3.0 and the sample data successfully. Hi , Yes the sample data includes all you mentioned. Mike "Mr. Bean" <mkhiami@gmail.com> wrote in message news:1141117198.422438.167040@i39g2000cwa.googlegroups.com... > Does the sample data installed using the > Microsoft.Crm.Tools.SampleDataWizard.exe inlude sample accounts, > ...

Cell changes alignment after data entry
When I select a row and enter data with Excel 2002, some of the cells change alignment to centered from not centered upon hitting Enter. This problem did not occur on the same worksheet using Excel 2000. Any idea why, or how to prevent? My updates are current. What did you type into the cells? If you started with ^ (caret) and you have tools|options|transition Tab|transition navigation keys checked, you'll get this behavior (it's to match what Lotus 123 accepted). (and you don't have any event macros running and the cells aren't formatted as Centered???) Howard Mo...

data validation
my excel skills is no more than having a certain cell multiplyin another cell. scenerio: i have 2 cells, with drop down box. box2 is dependant to selections o box1. after i chose the item in box2, i would like, let say, cell A3, A4, A to display certain data about the item i selected in box2. lets say, i selected "Peter" on box2, i would like A3 to display hi "sex", A4 being "age", A5 is "height". thank -- Message posted from http://www.ExcelForum.com Hi have a look at http://www.contextures.com/xlDataVal02.html for dependent data validation lis...