Garbage date data now my headache

Okay folks I have a painful .csv file that I recieve daily that is now my 
problem. Granted I am only as good as my data so let me show you an example 
of the data:

Column A                        Column B
Date Code                      Date Code Answer
X71229                          12/29/2007
X71203330382               12/03/2007
X71023 G3                     10/23/2007
X71003DWOODS            10/03/2007
X70913 D5                     09/13/2007
X70913                          09/13/2007
20071215                       12/15/2007
20071005                       10/05/2007
81007                            08/10/2007
80211                            08/02/2011
071220                           12/20/2007

So as you can see are some of the problems with the data:
the string length of the data changes, the format of some cells are text and 
some are numbers, some of the data has letter in front and others don't, some 
have the full year typed out then month then day and others don't. the data 
is what I call garbage but like I stated it is my headache to fix.

So my goal is to try and come up with some methodology to fix it. Any takers 
ready for this headache.

-- 
GS
0
12/28/2007 5:43:01 PM
excel 39879 articles. 2 followers. Follow

4 Replies
648 Views

Similar Articles

[PageSpeed] 57

Excel, as all computer programs, is a beast of logic.  I can think of 
various techniques to clean up the first column and put those entries into a 
more manageable format, but the logic of some of your examples is beyond 
logic.  For instance, why is 80211 08/02/11 and not 02/11/2008 while 71229 
is 12/29/2007 and not 07/12/2029?  That particular twist of logic (or lack 
of) is not something that a computer can handle.  Perhaps you have some 
insight that you might be able to share that could be used by a computer to 
differentiate those values.  Perhaps some data in some other column could 
point to a particular logic pattern to use for that entry.
A cleanup of the first column would be like removing leading and trailing 
text, as well as removing the G3 and D5.  Then, with a logic pattern 
furnished by you, Excel could convert all the numbers to dates.  Post back 
if any of the above could help you.  Otto
"Gary F Shelton" <GaryFShelton@discussions.microsoft.com> wrote in message 
news:5DA74809-6A25-4116-8F2D-1CB59B946934@microsoft.com...
> Okay folks I have a painful .csv file that I recieve daily that is now my
> problem. Granted I am only as good as my data so let me show you an 
> example
> of the data:
>
> Column A                        Column B
> Date Code                      Date Code Answer
> X71229                          12/29/2007
> X71203330382               12/03/2007
> X71023 G3                     10/23/2007
> X71003DWOODS            10/03/2007
> X70913 D5                     09/13/2007
> X70913                          09/13/2007
> 20071215                       12/15/2007
> 20071005                       10/05/2007
> 81007                            08/10/2007
> 80211                            08/02/2011
> 071220                           12/20/2007
>
> So as you can see are some of the problems with the data:
> the string length of the data changes, the format of some cells are text 
> and
> some are numbers, some of the data has letter in front and others don't, 
> some
> have the full year typed out then month then day and others don't. the 
> data
> is what I call garbage but like I stated it is my headache to fix.
>
> So my goal is to try and come up with some methodology to fix it. Any 
> takers
> ready for this headache.
>
> -- 
> GS 


0
ottokmnop (389)
12/28/2007 7:29:55 PM
Otto,

Let me give a brief of what I know and then I will lead into some 
assumptions of mine. 

My data is for consumable products that have a shelf life. My company pays a 
vendor to store our product. The vendor is geographically located in Atlanta, 
Chicago, New York, Los Angeles, etc. When I get data from this vendor I go to 
their website where the data is consolidated. So what I assume is happening 
and some of the patters I am seeing is that the date data formats change when 
the their is a change in Warehouses which leads me to believe that each 
geographically located Warehouse inputs and uploads their data in different 
formats. As you pointed out some of the data is just down right garbage data 
like the 080211 stuff as that had to be a human error in someone on their end 
mistypig it... But you are right that I am going to have to use some other 
records in my data set to try and make logical patterns out of the date code 
data. I think the big thing I will need to do is get the data parsed out to 
the most logical pattern I can and then manually alter the outliers... I 
guess that I can do is work the.csv file before I try and import it into 
excel ... The other thing that is frustrating is that this vendor must make 
lots of changes to their website titles as last month they had certain title 
names and they had them in certain columns and now in December they made 
changes... So as you can imagine it makes it even harder for me try and 
standardize a process on my end... I will be working on this file some more 
and if I do get stuck I am sure I will ask some more questions...
-- 
GS


"Otto Moehrbach" wrote:

> Excel, as all computer programs, is a beast of logic.  I can think of 
> various techniques to clean up the first column and put those entries into a 
> more manageable format, but the logic of some of your examples is beyond 
> logic.  For instance, why is 80211 08/02/11 and not 02/11/2008 while 71229 
> is 12/29/2007 and not 07/12/2029?  That particular twist of logic (or lack 
> of) is not something that a computer can handle.  Perhaps you have some 
> insight that you might be able to share that could be used by a computer to 
> differentiate those values.  Perhaps some data in some other column could 
> point to a particular logic pattern to use for that entry.
> A cleanup of the first column would be like removing leading and trailing 
> text, as well as removing the G3 and D5.  Then, with a logic pattern 
> furnished by you, Excel could convert all the numbers to dates.  Post back 
> if any of the above could help you.  Otto
> "Gary F Shelton" <GaryFShelton@discussions.microsoft.com> wrote in message 
> news:5DA74809-6A25-4116-8F2D-1CB59B946934@microsoft.com...
> > Okay folks I have a painful .csv file that I recieve daily that is now my
> > problem. Granted I am only as good as my data so let me show you an 
> > example
> > of the data:
> >
> > Column A                        Column B
> > Date Code                      Date Code Answer
> > X71229                          12/29/2007
> > X71203330382               12/03/2007
> > X71023 G3                     10/23/2007
> > X71003DWOODS            10/03/2007
> > X70913 D5                     09/13/2007
> > X70913                          09/13/2007
> > 20071215                       12/15/2007
> > 20071005                       10/05/2007
> > 81007                            08/10/2007
> > 80211                            08/02/2011
> > 071220                           12/20/2007
> >
> > So as you can see are some of the problems with the data:
> > the string length of the data changes, the format of some cells are text 
> > and
> > some are numbers, some of the data has letter in front and others don't, 
> > some
> > have the full year typed out then month then day and others don't. the 
> > data
> > is what I call garbage but like I stated it is my headache to fix.
> >
> > So my goal is to try and come up with some methodology to fix it. Any 
> > takers
> > ready for this headache.
> >
> > -- 
> > GS 
> 
> 
> 
0
12/28/2007 10:11:01 PM
On Fri, 28 Dec 2007 09:43:01 -0800, Gary F Shelton
<GaryFShelton@discussions.microsoft.com> wrote:

>Okay folks I have a painful .csv file that I recieve daily that is now my 
>problem. Granted I am only as good as my data so let me show you an example 
>of the data:
>
>Column A                        Column B
>Date Code                      Date Code Answer
>X71229                          12/29/2007
>X71203330382               12/03/2007
>X71023 G3                     10/23/2007
>X71003DWOODS            10/03/2007
>X70913 D5                     09/13/2007
>X70913                          09/13/2007
>20071215                       12/15/2007
>20071005                       10/05/2007
>81007                            08/10/2007
>80211                            08/02/2011
>071220                           12/20/2007
>
>So as you can see are some of the problems with the data:
>the string length of the data changes, the format of some cells are text and 
>some are numbers, some of the data has letter in front and others don't, some 
>have the full year typed out then month then day and others don't. the data 
>is what I call garbage but like I stated it is my headache to fix.
>
>So my goal is to try and come up with some methodology to fix it. Any takers 
>ready for this headache.


The formats that you have are translatable and consistent except for the next
to last two that start with an eight.

All of the others are basically in a YMD format that is either
	yymmdd  or
	ymmdd   or
	yyyymmdd
(with extraneous information possible before and after)

The two dates that start with an eight are both in mddyy format.

How do you know that the Date Code Answer for those two entries is what it is?

If we could apply the same logic, we might have a chance.  Maybe there is
something in another field?
--ron
0
ronrosenfeld (3122)
12/29/2007 12:40:01 AM
On Fri, 28 Dec 2007 09:43:01 -0800, Gary F Shelton
<GaryFShelton@discussions.microsoft.com> wrote:

>Okay folks I have a painful .csv file that I recieve daily that is now my 
>problem. Granted I am only as good as my data so let me show you an example 
>of the data:
>
>Column A                        Column B
>Date Code                      Date Code Answer
>X71229                          12/29/2007
>X71203330382               12/03/2007
>X71023 G3                     10/23/2007
>X71003DWOODS            10/03/2007
>X70913 D5                     09/13/2007
>X70913                          09/13/2007
>20071215                       12/15/2007
>20071005                       10/05/2007
>81007                            08/10/2007
>80211                            08/02/2011
>071220                           12/20/2007
>
>So as you can see are some of the problems with the data:
>the string length of the data changes, the format of some cells are text and 
>some are numbers, some of the data has letter in front and others don't, some 
>have the full year typed out then month then day and others don't. the data 
>is what I call garbage but like I stated it is my headache to fix.
>
>So my goal is to try and come up with some methodology to fix it. Any takers 
>ready for this headache.

As I think about this more, it occurs to me that you will also need to know the
purpose of the "X".  If it is a placeholder for a "0", that would be good.

However, if it is some other kind of indicator, that might be followed by
either a five or six digit date code, then other entries become ambiguous.

For example, an entry similar to your #2:

X710101330382 could be
2007-10-10  or
1971-01-01



--ron
0
ronrosenfeld (3122)
12/29/2007 3:05:34 AM
Reply:

Similar Artilces:

Calculating Due Date
I have an Effective_Date field in my table and need to calculate due dates. If the date is before or equal to the 15th of the month, show the Effective Date, if after show the first of the next month, that would be the 1st due date. Then I need to show the second date as 30 days from the first date and the third date as 60 days from then. The other problem I'm have is in a report I want to show all of those with a date in say "October" whether it is the 1st, 2nd or 3rd due date. Can this be done with an expression, or do I need to create fields for those dates? Thanks...

Need formula to check data in 2 columns to sum 3rd column
Need a formula to enter on Report Sheet under Month as indicated from tables below. The data base worksheet has info inserted daily and the report will be on another worksheet to calculate as data is updated. Any ideas?? (Need total paid for all rows paid to Name 1 with dates in month of Jan) Data Base Worksheet Report Sheet NAME DATE PAID NAME JAN FEB Name 1 01/01/04 $200.00 Name 1 ??Paid ??Paid Name 1 01/22/04 $200.00 Name 2 ??Paid ??Paid Name 2 01/15/04 $200.00 Name 1 02/04/04 $200.00 Hi try the following =SUMPRODUCT(--(A1:A100="Name 1"),--(YEAR(B1:B100)=2004),--(MONTH(B1:...

Same proc... same data source... different results
Hi guys, A developer in our office has come to me asking for assistance, and I have no idea what can be causing this. We're running SQL Server 2005 SP3. We have a (when printer) 23 page long stored proc which gathers and formats data for a report. The proc makes use of Temp tables, table variables, FOR XML PATH, Rank(), Partion etc. She runs the proc... and stores the result set to a .txt file. She then runs it again, stores the data to a 2nd txt file. She then uses WinMerge to look at the differences between the two files, and they differ. Out of the 80,000 rows or some...

dowloading data of CSV file to excel file
Hi, I have the CSV file with more that 200000 lines and I need to import it into the excel sheet. I have tried using Import wizard and it did not work. I am using the Excel 2003. Please let me know the other way I can import the complete data to the excel. Regards, Igneshwara Reddy xl2003 only has 64k rows in a worksheet. Maybe you could use your favorite text editor and split your file into 3 smaller pieces. And then import each piece into its own sheet???? Igneshwara reddy wrote: > > Hi, > > I have the CSV file with more that 200000 lines and I need ...

Error creating dates
I can't create new dates in Microsoft Outlook Calendary When i do double click nothing happens... Any idea? Thanks in advance. Reboot your computer. --� Milly Staples [MVP - Outlook] Post all replies to the group to keep the discussion intact. All unsolicited mail sent to my personal account will be deleted without reading. After furious head scratching, Virgulilla asked: | I can't create new dates in Microsoft Outlook Calendary | | When i do double click nothing happens... | Any idea? | | Thanks in advance. ...

Graphs/Charts by country. Same Graph, different Source Data
Hi I have sales information for 5 countries, each on separate sheets ( the sheets are identical in format) . I have made graphs for the first country, and I want to copy them to the other countries, and change the references to pick up the right country information. Is there a quick way of doing this or do i have to go into source information for each series for each graph and change the sheet( country ) reference I asppreciate any answers to this as atthe moment I've spent the better part of a day changing sheet references Rgds Dan ...

Report Generation Variables for Date
I'm trying to define a filter in a report, using the last month start and last month end as low and high limits. Can somebody tell me the name of this variables, like <Today> is self explanable. -- alvaroguzman this is all I have: <MonthStart> first day of month <WeekStart> date of first day of this week <LastWeekStart> date of first day of last week <YearStart> first day of year <Now> current date/time <ReportDate> date of report "Alvaro Guzman" <aguzmanc2005@hotmail.com> wrote in message news:7C2DA569-6C0C-420C-9F49-4FC...

Converting Date to Beginning of the Month
Hello All, I have an issue with dates. I am trying to convert any date such as 11/27/2006, 3/14/2005, to essentially only the month and year. I have a formula that will only work correctly if all dates are the first of the month so I need to convert the above dates to this: 11/1/2006 and 3/1/2005. Is there a way to do this? Any help would be greatly appreciated! Thanks Try this: =Date-DAY(Date)+1 A1 = 11/27/2006 =A1-DAY(A1)+1 = 11/1/2006 Biff <carlsondaniel@gmail.com> wrote in message news:1164840724.358071.298260@j72g2000cwa.googlegroups.com... > Hello All, > > I h...

Copying data #3
Hi I need help in copying data. I have data exported from a text file and has somthing like following format - Main Grp: 600 Sub Grp: 11890 grp desc amount 7878 ABCDEMF 89887 ABCDEMF 89784 ABCDEMF 45654 ABCDEMF 56587 <<page break>>> Main Grp: 600 Sub Grp: 11900 grp desc amount 7900 ABCDEMF 56844 ABCDEMF 12546 ABCDEMF 74125 ABCDEMF 36985 so need to copy 'grp' below alongwith the lines of data within the group and 'sub grp' also alongside the data.. may be on right side after amount. I tried with Edit/Go to.. and spe...

combone data from more that one page
what is the easiest way (if possible)to conduct analysis of data from more than one page consider I have 5identical pages (one for each sales regions) with lots of products in column a and months in row 1 with data in a table is there anyway i can conduct analysis of the 5 sales regions as a whole or by area ideally i would like a pivot that had the regions as a field and the months and products so i can look at the data in anyway poss If you want to create a pivot table from the data, you should store it all on one sheet, if it won't exceed the available number of rows. Before you com...

displaying multiple dates in a table
any help/advice is much appreciated.... let me explain... i have this table w/c have multiple date/time fields... corresponding to birthdays of people. a single record of a person contains his birthday, his spouse's name & birthday, and also his sons/daughters names & birthdays... now i have already done a query in w/c i can get all persons birthday to display per month (month parameter)... i have done a query just like the one i made above, and it will display the spouses & sons/daughters birthday corresponding to the month i typed in. but the query also displaye...

Filter data from two worksheets from same excel file
i have a set of account numbers in 2 worksheets( created at diff dates). i want to find out the ones which are not there in the earlier. Both worksheets are saved in a single file. The account numbers are of 11 digits and the each sheet contains hundreds of such account numbers To compare entries in Column A of sheet2 with entries in Column A Sheet1 in your sheet2 (new column G) enter in first data row(say G2)) =COUNTIF(Sheet1!A:A,Sheet2!G2)=0 and Copy down as far as is needed FALSE = These values are on Both Sheets TRUE = These values Are NOT on Sheet1 "anu" wrote: > i...

How can I eliminate rows that contain the same data as the row ab.
How can I compare rows in Excel aand eliminate those rows that have duplicate information in them? Use data>filter>advanced filter, select unique records only -- Regards, Peo Sjoblom "mjlucas49" <mjlucas49@discussions.microsoft.com> wrote in message news:56E5DD0C-E9D0-4C25-8956-C96F285D320B@microsoft.com... > How can I compare rows in Excel aand eliminate those rows that have duplicate > information in them? ...

conditional format -- if not a date, then highlight cell
How do I enter a conditional format to highlight the cell if it's not a valid date? I tried "formula is": =if(not isdate(C10)) <>if(isdate(C10)) Your help would be greatly appreciated!! I think I'd put some data validation on the field to ensure that it's a date as well. -- HTH, Barb Reinhardt "laavista" wrote: > How do I enter a conditional format to highlight the cell if it's not a valid > date? > > I tried "formula is": > > =if(not isdate(C10)) > <>if(isdate(C10)) > ...

date and time formating in one cell
I have the following date/time stamps in one cell: 20031229152357 20031230150834 20031230150834 etc... As you can tell, the date is yyyy\m\d and time is h:mm:ss The problem I have is the numbers appear exactly as above. My question is how do I get the respective date and time to show up in the cells? Right now, I am only getting the "##############" to show in the cells. Thanks for your help, Kurt Kurt, Are you not inputting this data as time, that is 29/12/2003 -- HTH Bob Phillips ... looking out across Poole Harbour to the Purbecks (remove nothere from the email a...

Wokflow email with Date field inside
Hi, I need to sent an email with Dete field and every time CRM sent Value like this: "10-08-2008 00:00 (GMT+01:00) Sarajevo, Skopje, Warsaw, Zagreb" I want sent only Date like this: "10-08-2008" nothing more... Is it possible? Thanks for any help As well i know you can call Assembly in workflow to change this format. /aamir There's actually a Custom CRM Attribute for Time Zone. Create an Integer (Int) field, and from the Format picklist, select "Time Zone". You'll then see a "Default Value" picklist, where you can choose a default time...

Data Validation
I am using data validation from a list in Excel 2007. When I click the drop down arrow and the cell was previously blank, I do not see my options. I have to scroll up to see my choices. I realize that it is doing this because blank cells exist in my list and it jumps to the first one when I click the down arrow, I then see what is below that cell in the list which are more blank cells. Is there something that I can do so that if the cell is blank, when I click the drop down arrow, it will show me the top of the list first? I realize one work around is to leave the top row blank. But othe...

specific data points in data range
Hi all and happy new year. Is it possible to plot certain points and then a range of data on one series line and if so how ? Basically I have 4 years of data (monthly) but now want to report jan & Jul results for 2006 - 2008 and then all of 2009 data. thanks in anticipation Roy -- thanks Roy ...

Automatic data update
Dear All, I want to know whether there is such function in Ms Excel?.. For example, I have some data series consisting 100 data..and every month I will add 1 data into the series.. And I want to create a chart showing the 10 latest data... in this regard, every time I add a data, automatically (or by clicking some button)...the chart will updating the data by add the new one and remove the oldest one,... so the chart still consist the 10 latest data... what i've been doing right now is only by dragging the data source area.... and since i will deal with more than 20 charts, this ac...

How do I get total value data labels in a stacked bar chart?
I have a 3-D stacked bar chart with four series and I want to have the total value in each category be displayed in a data label. Can I do this, and if so, how?? Hi, This should help http://www.andypope.info/charts/StackColTotal.htm Cheers Andy blemerson wrote: > I have a 3-D stacked bar chart with four series and I want to have the total > value in each category be displayed in a data label. Can I do this, and if > so, how?? -- Andy Pope, Microsoft MVP - Excel http://www.andypope.info Thanks, Andy. The CFO is thrilled. Heather "Andy Pope" wrote: > Hi, >...

Need to import select Access data into Excel using two variables
I have a large amount of data in Access, more than Excel can manage. The data consists of many columns; has Col_1 as unique key 1,2,3,4,5,6,7,8 etc… and contains data that includes Col_2, Col_3, Col_4 etc…. It takes a combination of Col_2 and Col_3 to get the data I want. Col_2 has duplicate values and Col_3 makes the subset unique. One more thing… Col_3, Col_4 etc…may have blanks throughout the table and can’t be changed. I don’t own it. I want to be able to open Excel, type “Number from Col_2” into Col_A , “(Number fromCol_3) or (blank)”into Col_B and have the Col_4, Col_5 etc… d...

Access 2007 Auto update Date
I wish to update a date automatically in a form, after data has been changed. I have tried to do as the link belows shows, but I am not having any joy. I do not need the time just the date. If anyone can write what I need & give me a step by step Or a link, I would appreciate it. I have no knowledge of vba so the full vba would be appreciated. Thanks http://office.microsoft.com/en-gb/access/HA010345351033.aspx ...

Date Difference 02-24-10
Good Morning. I am trying to figure out how to calculate the number of days between two dates. I have a table callled events. That table has event date, event type, event outcome. For example a defendant will have an Advisement Date (AA as stored in the table) and say a Preliminary Hearing date (PH). When i enter information I will enter the date, event type and the event outcome will be (PD for Pending, CN for Continued, or CP for completed). For example Case Number 10-12345 will have an AA on 01/05/10 which is Completed and a Preliminary Hearing on 02/05/10 which was C...

charts with Timestamps
when plotting a chart with timestamps (9/28/05 3:05 AM) used on the X-axis, only the 'Date' part is taken into account. The 'time' part is dropped. When converting all timestamp to numbers (e.g. 38623,.12876) then of course the chart is plotted correctly. How can I have a continuous timeline where date and time are taken into account on a chart ? A time scale axis of a line chart only considers whole numbers (i.e., days, not times). You need to make an XY chart, not a line chart. Note that the terminology is misleading: either type chart can have any combination of ...

Help, please: to move data from OExpress 6 to O2000 ?
Help, please... On a pretty good Win XP PC, I need to upgrade from Outlook Express 6 to Outlook 2000. I installed Outlook and it imported all the folders/messages/identities from OE6. Outlook 2000 starts up, and all seems fine, and I'm looking at the new message that it created... "Welcome to Outlook 2000". If I try to view any other message, it locks up. I see the perpetual hourglass. It's dead. So, I installed Outlook 2000 on another XP PC, created a mail account, and it seems to work fine. I can read and send messages. Cool. How can I export all of the folders/me...