What is the easiest way of reading a line at a time through a textual CSV
file, and then extracting the comma-separated elements from each line?
Stanza | 1/21/2010 5:08:21 PM

"Stanza" <stanza@devnull.com> wrote in message
news:-OqdnbEEGM-UF8XWnZ2dnUVZ8r-dnZ2d@brightview.com...
> What is the easiest way of reading a line at a time through a textual CSV
> file, and then extracting the comma-separated elements from each line?
>
CStdioFile::ReadString will read a text file a line at a time.
The CString class provides everything you need to parse the lines.
CString::Find can be used to find the next comma. CString::Mid can be used
to extract an element.
--
Scott McPhillips [VC++ MVP]
Scott | 1/21/2010 5:20:50 PM

"Scott McPhillips [MVP]" wrote...
> "Stanza" wrote...
>> What is the easiest way of reading a line at a time through a textual CSV
>> file, and then extracting the comma-separated elements from each line?
>>
>
> CStdioFile::ReadString will read a text file a line at a time.
>
> The CString class provides everything you need to parse the lines.
> CString::Find can be used to find the next comma. CString::Mid can be
> used to extract an element.
>
You might also consider using CString::Tokenize() to parse the line.
-JJ
James | 1/21/2010 5:52:01 PM

"Stanza" <stanza@devnull.com> wrote in message
news:-OqdnbEEGM-UF8XWnZ2dnUVZ8r-dnZ2d@brightview.com...
> What is the easiest way of reading a line at a time through a textual CSV
> file, and then extracting the comma-separated elements from each line?
>
In addition to the other replies that you have received, you may wish to use
the C++ standard library regular expressions provided in the TR1 extensions.
They are quite easy to use, would likely need less code than the other
methods mentioned and, above all, are portable.
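
A short sketch of the regex route. It is written with std::regex from <regex>, the header in which the TR1 regex classes were later standardized; with a TR1-era compiler the same names would be expected in std::tr1 (an assumption about your toolchain). The function name splitWithRegex is hypothetical. It splits on every comma, so quoted fields are not handled, and a line ending in a comma does not produce a trailing empty field.

```cpp
#include <regex>
#include <string>
#include <vector>

// Split a line on commas using a regex token iterator.
std::vector<std::string> splitWithRegex(const std::string& line)
{
    static const std::regex comma(",");
    std::vector<std::string> fields;
    // Submatch index -1 selects the text between matches, i.e. the fields.
    std::sregex_token_iterator it(line.begin(), line.end(), comma, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it)
        fields.push_back(it->str());
    return fields;
}
```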
-Pete
Pete | 1/21/2010 9:34:19 PM

This is a handy tokenizer that might work for you:
http://www.codeproject.com/KB/string/ctokenex.aspx
Tom
"Stanza" <stanza@devnull.com> wrote in message
news:-OqdnbEEGM-UF8XWnZ2dnUVZ8r-dnZ2d@brightview.com...
> What is the easiest way of reading a line at a time through a textual CSV
> file, and then extracting the comma-separated elements from each line?
>
Tom | 1/22/2010 12:46:12 AM

Stanza wrote:
> What is the easiest way of reading a line at a time through a textual
> CSV file, and then extracting the comma-separated elements from each line?
"Easiest" depends on what language and framework you are using and how
you hold, store, and process the data in memory.
Assuming the C language, the traditional implementation is to use
strtok(). Here is a simple C/C++ example:
// File: d:\wc5beta\testtok.cpp
// compile with: cl testtok.cpp
#include <stdio.h>
#include <string.h>   // memset(), strtok()
#include <afx.h>      // GetLastError()
int main(int argc, char *argv[])
{
    //
    // get file name from command line
    //
    char *pfn = (argc>1)?argv[1]:NULL;
    if (!pfn) {
        printf("- syntax: testtok csv_filename\n");
        return 1;
    }
    //
    // open text file for reading
    //
    FILE *fv = fopen(pfn,"rt");
    if (!fv) {
        printf("ERROR %lu Opening file\n",GetLastError());
        return 1;
    }
    //
    // read each line using fgets() and parse
    // the "," and cr/lf (\r\n) token characters.
    //
    const char *tok = ",\r\n";
    int nLine = 0;
    char szLine[1024];
    memset(szLine,0,sizeof(szLine));   // arguments: buffer, fill value, size
    while (fgets(szLine,sizeof(szLine),fv)) {
        nLine++;
        printf("# %d | %s",nLine, szLine);
        //
        // parse the line by the tok characters
        //
        char *fld = strtok(szLine, tok);
        while(fld) {
            printf("- [%s]\n",fld);
            fld = strtok(NULL, tok);
        }
    }
    fclose(fv);
    return 0;
}
So, for example, with a testdata.csv file containing these lines:
hector santos,email1@whatever.com
stanza,email2@whatever2.com
Joe Newcomer,email3@whatever3.com
compiling and running testtok testdata.csv, you get:
# 1 | hector santos,email1@whatever.com
- [hector santos]
- [email1@whatever.com]
# 2 | stanza,email2@whatever2.com
- [stanza]
- [email2@whatever2.com]
# 3 | Joe Newcomer,email3@whatever3.com
- [Joe Newcomer]
- [email3@whatever3.com]
This is very simplistic and doesn't address many of the design issues
involved in parsing CSV-based files.
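
One issue of this kind is easy to show: strtok() treats a run of delimiters as a single delimiter, so an empty field silently disappears and the fields after it shift left. A small sketch (the helper name strtokFields is made up for illustration):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenize a copy of the line with strtok(), using the same delimiter set
// as the example program. The argument is taken by value because strtok()
// writes '\0' bytes into the buffer it scans.
std::vector<std::string> strtokFields(std::string line)
{
    std::vector<std::string> fields;
    for (char* fld = std::strtok(&line[0], ",\r\n");
         fld != nullptr;
         fld = std::strtok(nullptr, ",\r\n"))
        fields.push_back(fld);
    return fields;
}
```

For the input "a,,c" this returns two fields, "a" and "c", where a CSV reader would normally be expected to report three.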
The #1 design issue is "escaping" the token character you are using
to separate fields, in this case the comma (','), because it is
possible to have a comma within the field strings. That depends on
the data type and specifications. Maybe your program doesn't expect
them, and maybe the creator of the file will never ADD them, or
escapes them. All of this is implementation-based.
For example, the data file can have a 3rd field that is a
description-like field, OR the name field can contain commas, which
introduces the need for escaping; i.e., the data file can look
like this:
hector santos,email1@whatever.com,whatever,whatever,whatever
stanza,email2@whatever2.com,"whatever,whatever,whatever"
Joe Newcomer,email3@whatever3.com
Serlace, tom,email4@whatever4.com
So you can roll up your sleeves and use the simple C/C++ code above
as a basis, fine-tuning it to the reading requirements of your CSV by
adding token-escaping concepts, or you can use the 3rd party libraries
and functions available to do these things, in which case your
requirement will be that these libraries and functions support
escaped tokens.
Now, I purposely created the testdata.csv above in a way that would
normally be considered bad formatting and doesn't promote or help good
CSV reading. A good practice is to surround the fields with double
quotes, and that MAY be enough for escaping embedded commas. For
example, the first line has a 3rd field:
whatever,whatever,whatever
Well, if you are parsing only by comma, the field comes out as just
"whatever". So what is normally done is to use lines like the 2nd
line, where the 3rd field is quoted:
"whatever,whatever,whatever"
The same issue arises with the 4th line, where the first "expected" field is:
Serlace, tom,
and this causes your fields to be shifted and off.
There are other concepts to deal with, namely how you read the data
into memory storage, if that is needed, or whether you process each
line and then forget about it.
So a robust CSV reader has to take into account things such as:
- escaping and embedded tokens
- reading into memory
These are common design requirements here. It really isn't that hard.
I would encourage you to learn to program this yourself and gain the
rewarding experience. It covers ideas that will be common in a
programmer's life. I will say that sometimes it pays to just write a
byte stream parser that checks each possible token and delimiter,
double-quoted string, etc., instead of using strtok(). For example,
instead of the strtok block of lines, you can use something like:
char *p = szLine;
while (*p) {
    switch(*p) {
    case '\r':
        // ... add logic for this ...
        break;
    case '\n':
        // ... add logic for this ...
        break;
    case '\"':
        // ... add logic for this ...
        break;
    case ',':
        // ... add logic for this ...
        break;
    }
    p++;
}
It can be simple to complex depending on the CSV reading requirements.
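
As one possible way to fill in that switch skeleton, here is a sketch of a character-by-character parser for a single line. It rests on assumptions that are not stated in the post: fields may be wrapped in double quotes, a comma inside quotes is data, and a doubled quote ("") inside a quoted field stands for one literal quote, which is a common CSV convention. The function name parseCsvLine is hypothetical.

```cpp
#include <string>
#include <vector>

// Split one CSV line into fields, honouring double-quoted fields.
std::vector<std::string> parseCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string cur;            // field currently being built
    bool inQuotes = false;      // are we inside a quoted field?

    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        switch (c) {
        case '"':
            if (inQuotes && i + 1 < line.size() && line[i + 1] == '"') {
                cur += '"';             // doubled quote: one literal quote
                ++i;
            } else {
                inQuotes = !inQuotes;   // opening or closing quote
            }
            break;
        case ',':
            if (inQuotes) {
                cur += c;               // embedded comma is data
            } else {
                fields.push_back(cur);  // field separator
                cur.clear();
            }
            break;
        case '\r':
        case '\n':
            if (inQuotes)
                cur += c;               // line break inside quotes is data
            break;                      // outside quotes it is dropped
        default:
            cur += c;
            break;
        }
    }
    fields.push_back(cur);              // last field (possibly empty)
    return fields;
}
```

Applied to the second line of the sample data, this yields three fields, the last one being whatever,whatever,whatever with its commas intact.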
Anyway, if you just wish to get a solution, you can use one of the
many 3rd party libraries or classes that will do these things for you.
If you are using another language, the same ideas apply, but some
languages already have a good library, like .NET perhaps. It has an
excellent text I/O reader class in its collections library; see
OpenTextFieldParser(). It supports CSV reading and covers the two
important ideas above, escaping and storage.
--
HLS
Hector | 1/22/2010 3:52:07 AM

On Jan 21, 6:08 pm, "Stanza" <sta...@devnull.com> wrote:
> What is the easiest way of reading a line at a time through a textual CSV
> file, and then extracting the comma-separated elements from each line?
Look for a CSV-parsing library on the net! There are some, and I never
used any, but if I had to do CSV parsing, that's by far the first
thing I'd do.
Whoever thinks parsing CSV (in fact, ANY text) is easy, and that you
can best do it "by hand", should consider taking into account:
* quotes
* double quotes
* locale separator character (CSV is kinda poor name, D(elimiter)SV is
better)
* "header" line / '*' character at the first line
* various conventions/de facto standards on what CSV files are
* what your clients will desire WRT some/all of above.
Who, except students learning programming, wants to spend time on
that... Crap?!?! Abstain from NIH, save yourself development time and
just look for a library that fits your case.
Goran.
Goran | 1/22/2010 8:22:08 AM

Goran,
Many times, even with 3rd party libraries, you still have to learn how
to use them. Many times, the attempt to generalize does not cover all
bases. What if there is a bug? Many times with CSV, a library might
require upfront field definitions, or everything is viewed as strings.
So the "easiest" does not always mean using a 3rd party solution.
Of course the devil is in the details, and it helps when the OP
provides info, like what language and platform. If he had said .NET,
then as I mentioned, the MS .NET collection library has a pretty darn
good reader class, with the benefit of supporting OOP as well, which
allows you to create a data "class" that you pass to the line reader.
Guess what? There is still a learning curve here to understand the
interface and to use it right, as there would be with any library.
So the easiest? For me, it all depends - a simple text reader and
strtok() parser, with the escaping issues worked in, can be both very
easy and super fast! With no dependency on 3rd party QA issues.
For me, I have never come across a library or class that could handle
everything, and if it did, it required a data definition interface of
some sort - like the .NET collection class offers. If he is using
.NET, then I recommend using this class as the "easiest."
--
HLS
Hector | 1/22/2010 9:02:50 AM

Case in point.
Even with the excellent .NET text I/O class and a CSV reader wrapper,
it only offers a generalized method to parse fields. This still
requires proper setup for the conditions that might occur. It might
require specific additional logic to handle situations it does not
cover, like when fields span across multiple lines. For example:
1,2,3,4,5,"hector
, santos",6
7,8
9,10
That might be 1 data record with 10 fields.
However, even if the library allows you to do this, in my opinion,
only an experienced implementer knows what to look for and sees how
to properly address it with the library.
Here is a VB.NET test program I wrote a few years back for a VERY long
thread regarding this topic, on how to handle the situation for a
fella who had this need of fields spanning across multiple rows.
------------- CUT HERE -------------------
'--------------------------------------------------------------
' File : D:\Local\wcsdk\wcserver\dotnet\Sandbox\readcsf4.vb
' About:
'--------------------------------------------------------------
Option Strict Off
Option Explicit On
Imports System
Imports System.Diagnostics
Imports System.Console
Imports System.Reflection
Imports System.Collections.Generic
Imports System.Text
Module module1
    '
    ' Dump an object
    '
    Sub dumpObject(ByVal o As Object)
        Dim t As Type = o.GetType()
        WriteLine("Type: {0} Fields: {1}", t, t.GetFields().Length)
        For Each s As FieldInfo In t.GetFields()
            Dim ft As Type = s.FieldType()
            WriteLine("- {0,-10} {1,-15} => {2}", s.Name, ft, _
                      s.GetValue(o))
        Next
    End Sub
    '
    ' Data definition "TRecord" class, for this example
    ' 9 fields are expected per data record.
    '
    Public Class TRecord
        Public f1 As String
        Public f2 As String
        Public f3 As String
        Public f4 As String
        Public f5 As String
        Public f6 As String
        Public f7 As String
        Public f8 As String
        Public f9 As String
        Public Sub Convert(ByRef flds As List(Of String))
            Dim fi As FieldInfo() = Me.GetType().GetFields()
            Dim i As Integer = 0
            For Each s As FieldInfo In fi
                Dim tt As Type = s.FieldType()
                If (i < flds.Count) Then
                    If TypeOf (s.GetValue(Me)) Is Integer Then
                        s.SetValue(Me, CInt(flds.Item(i)))
                    Else
                        s.SetValue(Me, flds.Item(i))
                    End If
                End If
                i += 1
            Next
        End Sub
        Public Sub New()
        End Sub
        Public Sub New(ByVal flds As List(Of String))
            Convert(flds)
        End Sub
        Public Shared Narrowing Operator CType( _
                ByVal flds As List(Of String)) As TRecord
            Return New TRecord(flds)
        End Operator
        Public Shared Narrowing Operator CType( _
                ByVal flds As String()) As TRecord
            Dim sl As New List(Of String)
            For i As Integer = 1 To flds.Length
                sl.Add(flds(i - 1))
            Next
            Return New TRecord(sl)
        End Operator
    End Class
    Public Class ReaderCVS
        Public Shared data As New List(Of TRecord)
        '
        ' Read csv file with max_fields, optional eolfilter
        '
        Public Function ReadCSV( _
                ByVal fn As String, _
                Optional ByVal max_fields As Integer = 0, _
                Optional ByVal eolfilter As Boolean = True) As Boolean
            Try
                Dim tr As New TRecord
                max_fields = tr.GetType().GetFields().Length()
                data.Clear()
                Dim rdr As FileIO.TextFieldParser
                rdr = My.Computer.FileSystem.OpenTextFieldParser(fn)
                rdr.SetDelimiters(",")
                Dim flds As New List(Of String)
                While Not rdr.EndOfData()
                    Dim lines As String() = rdr.ReadFields()
                    For Each fld As String In lines
                        If eolfilter Then
                            fld = fld.Replace(vbCr, " ").Replace(vbLf, "")
                        End If
                        flds.Add(fld)
                        ' a full record has been collected: store it
                        If flds.Count = max_fields Then
                            tr = flds
                            data.Add(tr)
                            flds = New List(Of String)
                        End If
                    Next
                End While
                ' store any trailing partial record
                If flds.Count > 0 Then
                    tr = flds
                    data.Add(tr)
                End If
                rdr.Close()
                Return True
            Catch ex As Exception
                WriteLine(ex.Message)
                WriteLine(ex.StackTrace)
                Return False
            End Try
        End Function
        Public Sub Dump()
            WriteLine("------- DUMP ")
            Debug.WriteLine("Dump")
            For i As Integer = 1 To data.Count
                dumpObject(data(i - 1))
            Next
        End Sub
    End Class
    Sub main(ByVal args() As String)
        Dim csv As New ReaderCVS
        csv.ReadCSV("test1.csf")
        csv.Dump()
    End Sub
End Module
------------- CUT HERE -------------------
Mind you, the above was written 2 years ago while I was still learning
the .NET library, and I was participating in support questions to teach
myself how to do common things in the .NET environment.
Is the above simple for most beginners? I wouldn't say so, but then
again, I tend to be a "tools" writer and try to generalize a tool,
hence I spent the time to implement a data class and an object dump
function to debug it all. Not everyone needs this. Most of the time
the field types are known, so a reduction can be done, or better
yet, you can take the above, have it read the first line as the field
definition line and generalize the TRecord class to make it all dynamic.
--
HLS
Hector | 1/22/2010 9:41:50 AM

Note, if anyone is trying this out: I added the section comments after
writing the code, in the C/C++ inline // style (my default language
today). For VB.NET the comment character is a single quote. So if you
get a compiler error on a comment line, change // to a single quote.
--
HLS
Hector Santos wrote:
> Module module1
>
> //
> // Dump an object
> //
>
--
HLS
Hector | 1/22/2010 9:55:00 AM

> Guess what? There is still a learning curve here to understand the
> interface, to use it right as there would be with any library.
Perhaps. But imagine:
class CCsvRecord
{
public:
  CString operator[](size_t fieldIndex) const;
  CString operator[](LPCTSTR FieldName) const;
};
class CCsvReader
{
public:
  CCsvReader(LPCTSTR fileName);
  size_t GetRecordCount() const;
  const CCsvRecord& operator[](size_t index) const;
};
Effort in learning __that__ certainly beats effort of rolling your
own. Of course, that's provided that fits your use-case and that there
is a similar library. But that's done by Googling, newsgrouping and
reading.
> So the easiest? For me, it all depends - a simple text reader and
> strtok() parser and work in the escaping issues can be both very easy
> and super fast! with no dependency on 3rd party QA issues.
Are you suggesting that +/- long existing library, probably seen by
many internet eyes, will have quality issues and your own code just
won't? This, frankly, smacks of hubris.
Goran.
Goran | 1/22/2010 10:05:13 AM

Goran wrote:
> Effort in learning __that__ certainly beats effort of rolling your
> own. Of course, that's provided that fits your use-case and that there
> is a similar library. But that's done by Googling, newsgrouping and
> reading.
Still a learning curve for most. You know the old saying "Teach a man
how to fish...." moral.
> Are you suggesting that +/- long existing library, probably seen by
> many internet eyes, will have quality issues and your own code just
> won't? This, frankly, smacks of hubris.
Since I neither said nor implied it, the rhetorical suggestion is what
walks, talks and smells of hubris. :)
--
HLS
Hector | 1/22/2010 11:04:34 AM

On Jan 22, 12:04 pm, Hector Santos <sant9...@nospam.gmail.com> wrote:
In the interest of honesty: I, too, wrote CSV parsers (or participated
in writing them) in my time.
My bias is clear, however: was that my, or my employer's, time well
spent? I don't think so.
> Goran wrote:
> > Effort in learning __that__ certainly beats effort of rolling your
> > own. Of course, that's provided that fits your use-case and that there
> > is a similar library. But that's done by Googling, newsgrouping and
> > reading.
>
> Still a learning curve for most.
It's still better to learn to call some new functions, than do the
silly grunt work of writing e.g. strtok-s. (And that, for something as
common as CSV parsing.)
> You know the old saying "Teach a man
> how to fish...." moral.
What, as in "if I write CSV parser, I learned something?" If you are
learning and want to exercise on CSV, OK (I said that in my first
post). But otherwise, there's more to learn by looking at an existing
CSV parser than in rolling your own. Especially if one finds something
comprehensive. And even if one finds/corrects bugs in it.
Goran.
Goran | 1/22/2010 1:10:27 PM

Goran wrote:
> On Jan 22, 12:04 pm, Hector Santos <sant9...@nospam.gmail.com> wrote:
>
> In the interest of honesty: I, too, wrote CSV parsers (or participated
> in writing them) in my time.
>
> My bias is clear, however: was that my, or my employer's, time well
> spent? I don't think so.
It depends, Goran. I'd rather have someone be able to think for
himself and solve problems without having to depend too much on 3rd
party solutions. Generally, when a library is used as a "tool", like a
hammer or screwdriver, it is normally because you already know how to
use the hammer or screwdriver. IOW, if you know what you are doing,
then go ahead and get that library, rather than getting the library
because you (speaking in general) lack an understanding of the problem
it is meant to solve. Then it becomes a crutch.
I personally believe the IDE and the evolution of component (modular)
engineering have placated sound engineering thinking. The technology
was meant not only to increase productivity, but to merge disciplines
and lower the cost of expertise.
>> Still a learning curve for most.
>
> It's still better to learn to call some new functions, than do the
> silly grunt work of writing e.g. strtok-s. (And that, for something as
> common as CSV parsing.)
Yes, when you already know what you are doing. That's the mark of a
good programmer with insight into problem solving.
>> You know the old saying "Teach a man how to fish...." moral.
>
> What, as in "if I write CSV parser, I learned something?"
Sure, if you never did it before.
> If you are learning and but want to exercise on CSV, OK (I said that
> in my first post). But otherwise, there's more to learn by looking
> at an existing CSV parser than in rolling your own.
We have to respectfully agree to disagree. :) I sincerely doubt most
people will understand how to use a library if they don't have a
fundamental understanding of what to look for and how to use it.
> Especially if one finds something
> comprehensive. And even if one finds/corrects bugs in it.
Fix-after-the-fact programming. Love it! :) I have fired a well-known
developer for that mindset. Whatever happened to the QA engineering mantra?
"Getting it right... the first time!"
--
HLS
Hector | 1/22/2010 4:14:29 PM

That's one of the things that MFC really has going for it. There is a lot
of code available and you typically get source with it so, even if there is
some learning curve, you still get a jump start on getting your job done
even if you just see how it's done in the sample code.
Tom
Tom | 1/22/2010 6:40:06 PM

One thing most parsers that I've seen don't handle correctly is double
double quotes for strings, when you want to have a quote as part of the
string, like:
"This is my string "Tom" that I am using", "Next token", "Next token"
In the above, from my perspective, the parser should read the entire first
string since we didn't come to a delimiter yet, but a lot of tokenizers
choke on this sort of thing.
Tom
Tom | 1/22/2010 6:42:01 PM

I'd say, "it depends". For example, I have a program where I have
specialized parsing needs and the program needs to be really small and not
include any external code. I wrote my own specialized parser and it was a
good use of time imo. I've found that most of the parsers that are "in the
box" libraries are very limited in scope.
Tom
"Goran" <goran.pusic@gmail.com> wrote in message
news:14051f46-8820-46bd-9cc8-10705b7b402e@l19g2000yqb.googlegroups.com...
> On Jan 22, 12:04 pm, Hector Santos <sant9...@nospam.gmail.com> wrote:
>
> In the interest of honesty: I, too, wrote CSV parsers (or participated
> in writing them) in my time.
>
..
Tom | 1/22/2010 6:44:06 PM

Have you tried AfxExtractSubString()?
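
For readers without MFC at hand, a portable sketch of the same idea: pull out the N-th separator-delimited piece of a string and report failure when there is no such piece. The call shape mirrors AfxExtractSubString as I recall it (output string, full string, zero-based index, separator character), but please check the MFC documentation for the exact signature; the function below, extractSubString, is a hypothetical stand-in and not the MFC implementation. Like a plain comma split, it knows nothing about quoted fields.

```cpp
#include <string>

// Copy the index-th (zero-based) piece of `full`, split on `sep`, into `out`.
// Returns false and clears `out` when there are not enough pieces.
bool extractSubString(std::string& out, const std::string& full,
                      int index, char sep = '\n')
{
    out.clear();
    if (index < 0)
        return false;
    std::string::size_type start = 0;
    for (int i = 0; i < index; ++i) {               // skip the first `index` pieces
        const std::string::size_type pos = full.find(sep, start);
        if (pos == std::string::npos)
            return false;                           // fewer pieces than requested
        start = pos + 1;
    }
    const std::string::size_type end = full.find(sep, start);
    out = (end == std::string::npos) ? full.substr(start)
                                     : full.substr(start, end - start);
    return true;
}
```

Calling it in a loop with index 0, 1, 2, ... and ',' as the separator until it returns false walks through the fields of one line.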
"Stanza" <stanza@devnull.com> wrote in message
news:-OqdnbEEGM-UF8XWnZ2dnUVZ8r-dnZ2d@brightview.com...
> What is the easiest way of reading a line at a time through a textual CSV
> file, and then extracting the comma-separated elements from each line?
>
David | 1/22/2010 7:45:00 PM

Tom Serface wrote:
> One thing most parsers don't handle correctly, that's I've seen, is
> double double quotes for strings if you want to have a quote as part of
> the string like:
>
> "This is my string "Tom" that I am using", "Next token", "Next token"
>
> In the above, from my perspective, the parser should read the entire
> first string since we didn't come to a delimiter yet, but a lot of
> tokenizers choke on this sort of thing.
Often, it takes two to tango. A writer needs to escape tokens in
order to reach some level of sanity, e.g., borrowing the C backslash
escape \":

   "This is my string \"Tom\" that I am using"

Or use some other encoding method, e.g. HTTP escaping! :)

The above is simple if you are just delimiting by comma, but then you
have to watch for an embedded comma. For example:

   "This is my string "Tom, Hector" that I am using"

That can be easily handled if the design assumption is that each field
is double quoted. The first token:

   "This is my string "Tom,

does not end in a double quote, so you continue by concatenating
the next token:

    Hector" that I am using"

to complete the first field.

But overall, I found that unless it's really simple, it helps if you have
field type definitions known beforehand.
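
The token-concatenation idea above can be sketched in standard C++ roughly as follows. This is only an illustration under the stated design assumption (a quoted field starts with a double quote right after the delimiter and ends with one right before the next delimiter). The names splitQuotedFields, ltrim and stripOuterQuotes are hypothetical, not from any library.

```cpp
#include <string>
#include <vector>

// Drop leading spaces and tabs (hypothetical helper).
static std::string ltrim(const std::string& s)
{
    std::string::size_type i = s.find_first_not_of(" \t");
    return i == std::string::npos ? std::string() : s.substr(i);
}

// Remove one pair of enclosing double quotes, if present (hypothetical helper).
static std::string stripOuterQuotes(const std::string& s)
{
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"')
        return s.substr(1, s.size() - 2);
    return s;
}

// Split on every comma, then glue tokens back together while a field that
// opened with a double quote has not yet seen a token ending in one.
std::vector<std::string> splitQuotedFields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string pending;          // field being rebuilt
    bool inQuoted = false;        // true while waiting for the closing quote
    std::string::size_type start = 0;

    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        const std::string tok = (comma == std::string::npos)
            ? line.substr(start)
            : line.substr(start, comma - start);

        if (!inQuoted) {
            pending = ltrim(tok);
            const bool opens  = !pending.empty() && pending.front() == '"';
            const bool closes = pending.size() >= 2 && pending.back() == '"';
            inQuoted = opens && !closes;
        } else {
            pending += ',';       // put back the comma the split removed
            pending += tok;
            if (!tok.empty() && tok.back() == '"')
                inQuoted = false;
        }

        if (!inQuoted)
            fields.push_back(stripOuterQuotes(pending));
        if (comma == std::string::npos)
            break;
        start = comma + 1;
    }
    if (inQuoted)
        fields.push_back(pending);   // unterminated quote: keep the raw text
    return fields;
}
```

With the example line, the first field comes back as: This is my string "Tom, Hector" that I am using. The approach still guesses wrong when a piece inside a quoted field happens to end in a quote right before a comma, which is why an agreed escaping rule on the writer side matters.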
--
HLS
Hector | 1/22/2010 8:37:30 PM
See below...
On Thu, 21 Jan 2010 22:52:07 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>Stanza wrote:
>
>> What is the easiest way of reading a line at a time through a textual
>> CSV file, and then extracting the comma-separated elements from each line?
>
>"Easiest" depends on what language and framework you are using and how
>you hold, store, process the data in memory.
>
>Assuming C language, the traditional implementation is to use
>strtok(), is a C/C++ simple example:
>
>// File: d:\wc5beta\testtok.cpp
>
>// compile with: cl testtok.cpp
>
>#include <stdio.h>
>#include <afx.h>
>
>int main(char argc, char *argv[])
****
This should be _tmain; the first argument is int, and the second argument is _TCHAR *
argv[].
****
>{
> //
> // get file name from command line
> //
>
> char *pfn = (argc>1)?argv[1]:NULL;
****
char is so yesterday. It should not be used to teach anything any longer. TCHAR,
LPCTSTR, LPTSTR are appropriate, or for purists, WCHAR, LPWSTR, LPCWSTR. In addition,
string parsing in terms of character arrays is so obsolete; CString or std::string should
be used for any examples.
****
>
> if (!pfn) {
> printf("- syntax: testeol csv_filename\n");
> return 1;
> }
>
> //
> // open text file for reading
> //
>
> FILE *fv = fopen(pfn,"rt");
> if (!fv) {
> printf("ERROR %d Opening file\n",GetLastError());
> return 1;
> }
>
> //
> // read each line using fgets() and parse
> // the "," and cr/lf (\r\n) token characters.
> //
>
> char *tok = ",\r\n";
>
> int nLine = 0;
> char szLine[1024];
****
INSTANTLY, we see code that is completely obsolete, dangerous, and teaches away from best
practice. NEVER allocate a fixed buffer on the stack.
****
> memset(&szLine,sizeof(szLine),0);
****
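This is totally useless. Since the buffer is about to be overwritten with input, zeroing
it is silly. (The arguments are also in the wrong order: memset takes the fill value
second and the byte count third, so as written this call fills zero bytes.)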
****
> while (fgets(szLine,sizeof(szLine)-1,fv)) {
****
sizeof() is bad teaching. _countof would be appropriate, but at the VERY least, the
correct code would be in terms of
   (sizeof(szLine)/sizeof(TCHAR)) - 1
This code looks like something from K&R C programming first edition.
****
> nLine++;
> printf("# %d | %s",nLine, szLine);
****
   _tprintf(_T("# %5d | %s\n"), nLine, szLine);
Unicode-aware, keeps columns aligned, has a newline at the end.
****
>
> //
> // parse the line by the tok characters
> //
> char *fld = strtok(szLine, tok);
****
strtok is bad practice. strtok_s, or _tcstok_s, is a better choice, because these have a
separate context that can be maintained, allowing several ...tok calls to be applied at
the same time (for example, subscanning a number looking for a decimal point). The old
strtok was what could best be called a "childish" design, with a single, implicit,
internal static context pointer. I would actively teach against ever using strtok (or
even _tcstok) in any program today. If you want to have locale-specific parsing, you may
even want to look at _tcstok_s_l, which allows a locale specification.
****
> while(fld) {
> printf("- [%s]\n",fld);
> fld = strtok(NULL, tok);
> }
> }
>
> fclose(fv);
> return 0;
>}
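
The objections raised above (fixed stack buffer, strtok) can be avoided with the standard library alone. A minimal sketch, not the original poster's code: std::getline grows the std::string as needed, and std::string::find does the splitting. The names splitOnComma and readCsv are hypothetical, and quoted fields are deliberately not handled here.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, convenient for trying this out
#include <string>
#include <vector>

// Split one line on commas. Unlike strtok, consecutive commas yield
// empty fields instead of being skipped.
std::vector<std::string> splitOnComma(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Read every line from any input stream (std::ifstream, std::istringstream, ...).
std::vector<std::vector<std::string>> readCsv(std::istream& in)
{
    std::vector<std::vector<std::string>> rows;
    std::string line;                       // no fixed-size buffer
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                // tolerate CRLF read in binary mode
        rows.push_back(splitOnComma(line));
    }
    return rows;
}
```

Passing an std::ifstream opened on testdata.csv to readCsv would give one vector of fields per line. In an MFC program the same shape works with CStdioFile::ReadString and CString, as suggested earlier in the thread.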
>
>So for example testdata.csv file containing these lines:
>
>hector santos,email1@whatever.com
>stanza,email2@whatever2.com
>Joe Newcomer,email3@whatever3.com
>
>compiling and running testtok testdata.csv, you get:
>
># 1 | hector santos,email1@whatever.com
>- [hector santos]
>- [email1@whatever.com]
># 2 | stanza,email2@whatever2.com
>- [stanza]
>- [email2@whatever2.com]
># 3 | Joe Newcomer,email3@whatever3.com
>- [Joe Newcomer]
>- [email3@whatever3.com]
>
>This is very simplistic and doesn't many design issues in regards to
>parsing csv bases files.
>
>The #1 design issue is the idea of "escaping" the token character you
>are using to separate fields, in this case the comma (',') because it
>is possible to have the comma with the field strings. That depends on
>the type and data specifications. Maybe your program doesn't expect
>them and maybe the creator the file will never ADD them and/or escapes
>them. All this is implementation base.
>
>For example, the data file can have a 3rd field that is a description
>like field, OR the name field can have commas this, thus introduce the
>idea that it can escaping is requiring. i.e, the data file can look
>like this:
>
>hector santos,email1@whatever.com,whatever,whatever,whatever
>stanza,email2@whatever2.com,"whatever,whatever,whatever"
>Joe Newcomer,email3@whatever3.com
>Serlace, tom,email4@whatever4.com
>
>So you can roll up sleeves and begin to use the above simple C/C++
>code as a basis to fine tune the reading requirements for your CSV by
>adding token escaping concepts, or you can use 3rd party libraries and
>functions available to do these things, and your requirements will be
>that these 3rd party libraries and function have the features of
>escaping tokens.
>
>Now, I purposely creates the testdata.csv above that would normally be
>considered bad formatting and doesn't promote or help good csv
>reading. A good practice it surround the fields with double quotes
>and that MAY be enough for escaping embedded commas, for example,
>the first line has a 3rd field:
>
> whatever,whatever,whatever
>
>well, if you parsing only by comma, the field results in just
>"whatever". So what is normally done is use lines like the 2nd line
>where the 3rd field is quoted:
>
> "whatever,whatever,whatever"
>
>The same issue with the 4th line with the first "expected" field has:
>
> Serlace, tom,
>
>and this causes your fields to be shifted and off.
>
>There are other concepts to deal with, namely, how you are reading
>into memory storage, if needed or if your processing each line and
>forgetting about it.
>
>So writing a robust CSV reader that takes into account, such as:
>
> - escaping and embedded tokens
> - reading into memory
>
>are common design requirements here. It really isn't that hard. I
>would encourage to learn and gain the rewarding experiences to program
>this yourself. It covers ideas that will be common ideas in a
>programmers life. I will say, that sometimes it pays do to just a
>byte stream parser instead of using strtok() checking each possible
>token and delimiter, double quoted strings, etc. For example, instead
>of the strtok block of lines, you can use something like:
>
> char *p = szLine;
> while (*p) {
> switch(*p) {
> case '\r':
> ... add logic for this ...
> break;
> case '\n':
> ... add logic for this ...
> break;
> case '\"':
> ... add logic for this ...
> break;
> case ',':
> ... add logic for this ...
> break;
> }
> p++;
> }
>
>It can be simple to complex depending on the CSV reading requirements.
****
This is an overly-simplified example of the Finite State Machine recognizer pattern. For
example, you can do something like

typedef enum {S0, Sign, Digit, Decimal, Fraction} States;
States state = S0; // the initial FSM state is always called S0 for historical reasons
int sign = 1;
const TCHAR * token = NULL; // start of the token being recognized
while(*p != _T('\0'))
   {
    switch(state)
       {
        case S0:
           switch(*p)
              {
               case _T(' '):
               case _T('\t'):
               case _T('\r'):
                  p++;
                  continue;
               case _T('\n'):
                  ... handling here depends on what you have as input
                  ... if it is guaranteed to be a single line, this is just like
                  ... \r; otherwise, you terminate the parse and set up so
                  ... the next parse starts the next line in state S0
                  ... return, continue, here, as appropriate
               case _T('+'):
               case _T('-'):
                  state = Sign;
                  sign = (*p == _T('-')) ? -1 : 1;
                  token = p;
                  p++;
                  continue;
               case _T('0'):
               ...
               case _T('9'):
                  state = Digit;
                  token = p;
                  p++;
                  continue;
               case _T('.'): // note: localize this test!
                  state = Decimal;
                  token = p;
                  p++;
                  continue;
               default:
                  // report error
                  return FALSE; // or whatever your error recovery is
              }
        case Sign:
           switch(*p)
              {
               case _T('0'):
               ...
               case _T('9'):
                  state = Digit;
                  p++;
                  continue;
               case _T('.'): // note: localize this test!
                  state = Decimal;
                  p++;
                  continue;
               case _T(' '):
               case _T('\t'):
               ... other whitespace cases
                  p++;
                  continue;
               case _T(','):
                  ... handle token just parsed
                  ... return, continue, etc. as appropriate
               default:
                  // error, + or - not followed by digit or decimal pt
              }

(I have to leave for a concert at this point, leave the rest as An Exercise For The
Reader)
Overall, I find that parsers that are based on simplistic models that simply look for a
delimiter and assume that everything between the delimiters is syntactically correct are
naive, and certainly not robust enough for real programs. Generalizations include
extending this to recognize strings, quoted strings (allowing embedded commas inside the
quotes), etc. To me, correctness is essential.
joe
****
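
For readers who want to see the recognizer above carried through to the end, here is one way the remaining states could look in plain standard C++. It is a sketch under simplifying assumptions, not the code the poster had in mind: plain char instead of TCHAR, a hard-coded '.' as the decimal point, no whitespace allowed after the sign or after the number, and at least one digit required after a decimal point. The function name isNumericField is hypothetical.

```cpp
#include <string>

// Same state names as in the sketch above.
enum class State { S0, Sign, Digit, Decimal, Fraction };

static bool isDigit(char c) { return c >= '0' && c <= '9'; }

// Accepts: [spaces/tabs] [+|-] digits [. digits]   or   [spaces/tabs] [+|-] . digits
bool isNumericField(const std::string& text)
{
    State state = State::S0;
    for (char c : text) {
        switch (state) {
        case State::S0:
            if (c == ' ' || c == '\t')      { /* skip leading whitespace */ }
            else if (c == '+' || c == '-')  state = State::Sign;
            else if (isDigit(c))            state = State::Digit;
            else if (c == '.')              state = State::Decimal;
            else                            return false;
            break;
        case State::Sign:
            if (isDigit(c))                 state = State::Digit;
            else if (c == '.')              state = State::Decimal;
            else                            return false;  // sign not followed by digit or '.'
            break;
        case State::Digit:
            if (isDigit(c))                 { /* stay in Digit */ }
            else if (c == '.')              state = State::Decimal;
            else                            return false;
            break;
        case State::Decimal:
        case State::Fraction:
            if (isDigit(c))                 state = State::Fraction;
            else                            return false;  // second '.' or stray character
            break;
        }
    }
    // Only Digit and Fraction are accepting states.
    return state == State::Digit || state == State::Fraction;
}
```

A CSV reader could call something like this on each extracted field to decide whether to treat it as a number or report bad data, instead of assuming that whatever sits between two delimiters is well formed.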
>
>Anyway, if you just wish to get a solution, you can use one the many
>3rd party libraries, classes, that will do these things for you.
>
>If you using another language, the same ideas apply, but some
>languages already have a good library, like .NET perhaps. It has an
>excellent text I/O reader class in its collections library, See
>OpenTextFieldParser(). It supports CSV reading and covers the two
>important ideas above for escaping and storage.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph | 1/22/2010 10:28:40 PM
Tom Serface wrote:
> One thing most parsers don't handle correctly, that's I've seen, is
> double double quotes for strings if you want to have a quote as part of
> the string like:
>
> "This is my string "Tom" that I am using", "Next token", "Next token"
>
> In the above, from my perspective, the parser should read the entire
> first string since we didn't come to a delimiter yet, but a lot of
> tokenizers choke on this sort of thing.
Another thing is tolerating files that have \n or \r line endings rather than \r\n.
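
A line reader that tolerates all three conventions is short to write. The following is a minimal sketch in standard C++; getlineAnyEol is a hypothetical name, and the stream is assumed to deliver the raw bytes (on Windows that means opening the file in binary mode so the runtime does not translate \r\n first).

```cpp
#include <istream>
#include <sstream>   // std::istringstream, convenient for trying this out
#include <string>

// Reads one line, accepting "\n", "\r\n" or a lone "\r" as the terminator.
// Returns false only when nothing at all could be read (end of input).
bool getlineAnyEol(std::istream& in, std::string& line)
{
    line.clear();
    bool gotAny = false;
    for (;;) {
        const int c = in.get();
        if (c == std::istream::traits_type::eof())
            break;
        gotAny = true;
        if (c == '\n')
            return true;                 // Unix-style ending
        if (c == '\r') {
            if (in.peek() == '\n')
                in.get();                // swallow the LF of a CRLF pair
            return true;                 // CRLF or old-Mac-style lone CR
        }
        line += static_cast<char>(c);
    }
    return gotAny;                       // last line without a terminator
}
```

Calling it in a loop, `while (getlineAnyEol(in, line))`, yields the same lines for a file no matter which platform wrote it.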
--
David Wilkinson
Visual C++ MVP
David | 1/23/2010 12:34:10 AM
Many of these issues depend on what you consider valid syntax.
For example, one possible implementation is to consider a line that does not contain a
matching quote to be syntactically incorrect, and be rejected as bad data, with an error
message indicating a fundamental failure of the data format.
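
As an illustration of that strict interpretation, here is a small state-machine parser in standard C++ that rejects a line when a quote is never closed or when text follows a closing quote. It is a sketch, not a complete CSV implementation. It assumes the doubled-quote convention for a literal quote inside a quoted field (as in RFC 4180), and the names CsvState and parseCsvLine are hypothetical.

```cpp
#include <string>
#include <vector>

enum class CsvState { FieldStart, Unquoted, Quoted, QuoteInQuoted };

// Parses one line into fields. Returns false (bad data) when a quote is
// never closed, when a quote appears inside an unquoted field, or when
// anything other than a comma follows a closing quote.
bool parseCsvLine(const std::string& line, std::vector<std::string>& fields)
{
    fields.clear();
    std::string cur;
    CsvState state = CsvState::FieldStart;

    for (char c : line) {
        switch (state) {
        case CsvState::FieldStart:
            if (c == '"')      state = CsvState::Quoted;
            else if (c == ',') { fields.push_back(cur); cur.clear(); }  // empty field
            else               { cur += c; state = CsvState::Unquoted; }
            break;
        case CsvState::Unquoted:
            if (c == ',')      { fields.push_back(cur); cur.clear(); state = CsvState::FieldStart; }
            else if (c == '"') return false;       // stray quote in unquoted field
            else               cur += c;
            break;
        case CsvState::Quoted:
            if (c == '"')      state = CsvState::QuoteInQuoted;  // closing quote, or first of a pair
            else               cur += c;           // commas are ordinary text here
            break;
        case CsvState::QuoteInQuoted:
            if (c == '"')      { cur += '"'; state = CsvState::Quoted; }  // "" means one literal quote
            else if (c == ',') { fields.push_back(cur); cur.clear(); state = CsvState::FieldStart; }
            else               return false;       // text after the closing quote
            break;
        }
    }
    if (state == CsvState::Quoted)
        return false;                              // no matching quote
    fields.push_back(cur);
    return true;
}
```

Under these rules the line `"This is my string "Tom" that I am using","Next token"` is reported as bad data, while `"whatever,whatever,whatever"` comes back as a single field with its commas intact.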
Escape conventions raise interesting issues. For example, some languages (like XML) accept
either single quote delimiters or double quote delimiters, and you use the opposite of the
one you want inside the string:
   "He said 'This is really bad' quite loudly"
   'He said "This is really bad" quite loudly'
but this doesn't work in the case
   'He shouted "I can't do this!" quite loudly'
It may surprise people to realize that the escape convention of \" is far from universal;
it comes from C and the languages derived from it. A far more popular language, SQL,
requires that you double the quotes [I once
offered expert testimony in a legal case where company A said company B stole their code,
and as evidence showed that the allegedly stolen code had a subroutine to double quote
marks. I showed that the two algorithms were quite different, producing different results
for the same input (an issue of interpretation of the input syntax: were existing double
quotes as the first and last character doubled again, or eliminated? The "stolen" code
dropped them). Also, the person who was the opposing "expert" claimed that there was no
interface to any other code, but the subroutine was mandated by the fact that SQL, which
is the other code both applications interfaced to, DEMANDS that quotes be doubled, and
consequently ANY code that talked to SQL would have to have a double-the-quotes
subroutine.]
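
A double-the-quotes routine of the kind described here takes only a few lines. The sketch below, in standard C++, makes one particular choice about the ambiguity mentioned above: it always adds the delimiters and doubles every embedded quote, including one that happens to be the first or last character. The name quoteDoubled is hypothetical.

```cpp
#include <string>

// Wraps text in the given quote character and doubles every occurrence of
// that character inside it, the convention SQL string literals use.
std::string quoteDoubled(const std::string& text, char quote = '\'')
{
    std::string out(1, quote);       // opening delimiter
    for (char c : text) {
        if (c == quote)
            out += quote;            // emit an embedded quote twice
        out += c;
    }
    out += quote;                    // closing delimiter
    return out;
}
```

So `I can't do this!` becomes `'I can''t do this!'`. A routine that instead treated an existing leading and trailing quote as delimiters and dropped them would produce different output for the same input, which is exactly the kind of difference the testimony turned on.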
Similarly, there is an issue of delimiters. For example, if there is no escape
convention, you can use something like the C/C++ string concatenation:
   'He shouted "I can' "'t do this!" '" quite loudly', 12345
Depending on what font you have, it may be hard to tell where I used double-single and
single-double (in Arial, they are really hard to tell apart) but you can implement a rule
that a comma separator is required to delimit sequences of strings, where it is always
legal to have a quoted string separated by 0 or more non-end-of-line whitespace characters
from another quoted string, and these are "compile-time concatenated".
I recently did a project (my PowerPoint Indexer) where I decided to double commas to make
them significant. So if you wrote
   item1, item2, item3
this was a sequence of three items,
   item1
   item2
   item3
but if you wrote
   item1,, item2, item3
this was treated as a sequence of two items:
   item1, item2
   item3
or you could write
   item1, item2,, item3
which became two items
   item1
   item2, item3
So it is important to decide what you mean when you define the syntax.
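
The doubled-comma rule can be sketched in standard C++ as below. This is a guess at the behavior from the examples given, not the PowerPoint Indexer's actual code: a single comma separates items, a doubled comma stands for one literal comma, and surrounding spaces and tabs are trimmed from each item. The names trim and splitDoubledComma are hypothetical.

```cpp
#include <string>
#include <vector>

// Strip leading and trailing spaces and tabs (hypothetical helper).
static std::string trim(const std::string& s)
{
    const char* ws = " \t";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return std::string();
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Splits on single commas; ",," is read as one literal comma.
std::vector<std::string> splitDoubledComma(const std::string& line)
{
    std::vector<std::string> items;
    std::string cur;
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        if (line[i] != ',') {
            cur += line[i];
            continue;
        }
        if (i + 1 < line.size() && line[i + 1] == ',') {
            cur += ',';              // doubled comma: keep one, skip the other
            ++i;
        } else {
            items.push_back(trim(cur));
            cur.clear();
        }
    }
    items.push_back(trim(cur));
    return items;
}
```

One consequence of this reading is that an empty item between two separators cannot be written, since two adjacent commas always mean a literal comma. Whether that is acceptable is again a matter of defining the syntax up front.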
joe
On Fri, 22 Jan 2010 15:37:30 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>Tom Serface wrote:
>
>> One thing most parsers don't handle correctly, that's I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
>
>Often, it takes two to tango. A writer needs to escape tokens in
>order to reach some level of sanity. i.e, borrowing a C slash for \".
>
> "This is my string \"Tom\" that I am using"
>
>Or use some encoding method, each HTTP Escape! :)
>
>The above is simple if just delimiting by comma. So watching for an
>embedded comma is required. For example:
>
> "This is my string "Tom, Hector" that I am using"
>
>That can be easily handled if the design assumption is each field is
>double quoted. The first token:
>
> "This is my string "Tom,
>
>does not end in double quote, so you continue with a concatenation of
>the next token.
>
> Hector" that I am using"
>
>to complete the first field.
>
>But overall, I found unless its really simple, it helps if you have
>field type definitions known before hand.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph | 1/23/2010 4:36:42 AM
I can generally write an FSM parser in an hour or so, depending on the syntax. I wrote an
XML parser, recursive descent, in eight hours, start to finish. The constraints were
strange, and involved "no public source code, ever", which I thought was foolish, but they
were paying. I did tell them there were a number of cheats, such as it did not handle all
possible encodings of XML files, a constraint they found acceptable.
joe
On Fri, 22 Jan 2010 10:40:06 -0800, "Tom Serface" <tom@camaswood.com> wrote:
>That's one of the things that MFC really has going for it. There is a lot
>of code available and you typically get source with it so, even if there is
>some learning curve, you still get a jump start on getting your job done
>even if you just see how it's done in the sample code.
>
>Tom
>
>"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
>news:#A$vtG0mKHA.5464@TK2MSFTNGP02.phx.gbl...
>> Goran,
>>
>> Many times even with 3rd party libraries, you still have to learn how to
>> use it. Many times, the attempt to generalized does not cover all bases.
>> What if there is a bug? Many times with CSV, it might requires upfront
>> field definition or its all viewed as strings. So the "easiest" does not
>> always mean use a 3rd party solution.
>>
>> Of course the devil is in the details and it helps when the OP provides
>> info, like what language and platform. If he said .NET, as I mention the
>> MS .net collection library has a pretty darn good reader class with the
>> benefits of supporting OOPS as well which allows you to create a data
>> "class" that you pass to the line reader.
>>
>> Guess what? There is still a learning curve here to understand the
>> interface, to use it right as there would be with any library.
>>
>> So the easiest? For me, it all depends - a simple text reader and
>> strtok() parser and work in the escaping issues can be both very easy and
>> super fast! with no dependency on 3rd party QA issues.
>>
>> For me, I have never come across a library or class that could handle
>> everything and if it did, required a data definition interface of some
>> sort - like the .NET collection class offers. If he using .NET, then I
>> recommend using this class as the "easiest."
>>
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph | 1/23/2010 4:43:26 AM
Joseph M. Newcomer wrote:
> This code looks like something from K&R C programming first edition.
HA! You shouldn't be ashamed about it, Joey! You're too easy. :)
--
HLS
Hector | 1/23/2010 4:44:44 AM
One of the rules we developed about forty years ago (1968) is that \r is meaningless noise
treated as whitespace, and \n is a newline. This works until you import a text file
created on a pre-OS X Mac, where \r is the newline character.
joe
On Fri, 22 Jan 2010 19:34:10 -0500, David Wilkinson <no-reply@effisols.com> wrote:
>Tom Serface wrote:
>> One thing most parsers don't handle correctly, that's I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
>Another thing is tolerating files that have \n or \r line endings rather than \r\n.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph | 1/23/2010 5:43:06 AM
On Fri, 22 Jan 2010 23:44:44 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>Joseph M. Newcomer wrote:
>
>> This code looks like something from K&R C programming first edition.
>
>
>HA! you shouldn't be ashame about it, Joey! You're too easy. :)
Huh? I'd be embarrassed to publish an algorithm that was based on K&R C. It represents
the best of mediocre programming of thirty years ago. When I first read it, in 1975, I
said "This language is really badly done", and "strings with fixed size buffers are a
total disaster" and "strtok is one of the worst designs I have ever seen". This is
because for at least 7 years I had been using languages that didn't have these defects.
About 14 years ago, I started programming Win32 "Unicode-aware" and using my own "safe"
libraries for strings; in the intervening years, I moved to strsafe.h, and then to CString
as a way of life. The number of memory clobbers I have has essentially dropped to zero.
K&R C has no safe string operations (use of strcat and strcpy is a firable offense in some
programming shops), is 8-bit-character-only, and still thinks that it makes sense to
create strings by using declarations that declare 8-bit character arrays of static sizes.
There are some VERY rare situations in which this can be done, and I try to avoid them
more and more. The great thing about VS2008 is that it flags all these as warnings, and
in any real build environment, it is appropriate to compile both at /W4 and with "treat
warnings as errors" enabled. Most of the K&R C programming style represents bad style.
Even in the second edition, the first example of character array usage (page 29) does not
handle boundaries properly; the "copy" function does not accept a size_t of maximum size
of the destination. Microsoft identified THOUSANDS of potential problems caused by the
failure to pass in buffer lengths, and required massive rewrites to pass buffer lengths
in. So the very, very first example a programmer sees of how to do character arrays DOES
IT COMPLETELY WRONG according to modern programming standards! It even has the comment
"assume 'to' is big enough", and that's how several rather unpleasant pieces of malware
have successfully attacked systems (starting with the infamous RTM Worm of 1988; oh, by
the way, THAT WAS 22 YEARS AGO! You would have thought that in the intervening decades
the idea of fixed-sized buffers without boundary checking would have disappeared
COMPLETELY).
The example also shows the horror of embedding an assignment statement in an if-test. How
can we expect people to learn good practice when one of the canonical introductory texts
teaches AWAY from best practice, in its very first character string example!
Then on page 33 the same error is repeated in the 'copy' function. The error is repeated
again in the getop function on page 78. This sort of code might have been acceptable in
1975 (although I found it offensive back then) but it is definitely NOT remotely
acceptable as a programming model in 2010.
strcpy is defined to cause buffer overrun on page 105. Nobody observes that this is a
Really, Really, REALLY BAD IDEA!
At this point, any responsible modern programmer throws the book at the wall in disgust.
joe
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph | 1/23/2010 6:16:22 AM
Joseph M. Newcomer wrote:
> One of the rules we developed about forty years ago (1968) is that \r is meaningless noise
> treated as whitespace, and \n is a newline. This works until you import a text file
> creating on a pre-OS X Mac, where \r is the newline character.
> joe
Don't confuse raw vs cooked vs display/print device vs storage systems!

\r\n have their basis as hardware device codes for the hardware devices
of the day: printers, teletypes, dumb terminals, etc.

\r <CR> is what it is - a carriage return (move the printer head to the
first column)! Note the operative word - Carriage!

\n <LF> is what it is - a line feed (move the printer head down one
line)!

When the consoles came, the printer head became your cursor. That is
why they are paired, whether there are translations or not.

Now, your terminal and printer could have an OPTIONAL translation for an
automatic line feed (\n) with each carriage return (\r), which means a
single character APPEARS to be the line delimiter, as in the unix wienie
world. In the classic MAC world, \r is the line delimiter; on unix it is
\n. DOS of course uses \r\n (<CR><LF>) pairs.

But it is your terminal or printer providing the illusion with
translations (which may be the default, depending on the OS it is
connected to). So if you dumped a unix file or mac file to a printer, it
did the proper translation for you. The printer or carriage or laser
point did not change; you still need to tell it to go left, right, up
or down!

Geez, Meaningless?

This again is an example of insane revisionist comments.
--
HLS
Hector | 1/23/2010 7:43:35 AM
Yes, after some time I have a parser that I like, but it has a lot of hand
coding in it. I agree that it is a matter of taste how the strings are
formed, but unfortunately, I don't have a lot of control over the input to
our program sometimes. I'm not a big fan of the \ escape thing in CSV files
since that seems odd to uninitiated users.
Not having the separator should be considered a syntax error though. That
much seems fair. We've mostly gone to XML for input and output these days
and that's solved a lot of issues, but raised a whole lot of other ones of
course.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:ef#g5K6mKHA.5692@TK2MSFTNGP04.phx.gbl...
> Tom Serface wrote:
>
>> One thing most parsers don't handle correctly, that's I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
>
> Often, it takes two to tango. A writer needs to escape tokens in order to
> reach some level of sanity. i.e, borrowing a C slash for \".
>
> "This is my string \"Tom\" that I am using"
>
> Or use some encoding method, each HTTP Escape! :)
>
> The above is simple if just delimiting by comma. So watching for an
> embedded comma is required. For example:
>
> "This is my string "Tom, Hector" that I am using"
>
> That can be easily handled if the design assumption is each field is
> double quoted. The first token:
>
> "This is my string "Tom,
>
> does not end in double quote, so you continue with a concatenation of the
> next token.
>
> Hector" that I am using"
>
> to complete the first field.
>
> But overall, I found unless its really simple, it helps if you have field
> type definitions known before hand.
>
>
> --
> HLS
Tom | 1/23/2010 7:46:47 AM
Yes, that's become particularly important to me in recent years since I've
had to work with files from other platforms (like Mac or other Unix based
systems). I guess that's why we get to keep working. So many things to
consider.
Tom
"David Wilkinson" <no-reply@effisols.com> wrote in message
news:#lNeGP8mKHA.5552@TK2MSFTNGP05.phx.gbl...
> Tom Serface wrote:
>> One thing most parsers don't handle correctly, that's I've seen, is
>> double double quotes for strings if you want to have a quote as part of
>> the string like:
>>
>> "This is my string "Tom" that I am using", "Next token", "Next token"
>>
>> In the above, from my perspective, the parser should read the entire
>> first string since we didn't come to a delimiter yet, but a lot of
>> tokenizers choke on this sort of thing.
>
> Another thing is tolerating files that have \n or \r line endings rather
> than \r\n.
>
> --
> David Wilkinson
> Visual C++ MVP
Tom | 1/23/2010 7:48:49 AM
Well, you could have used Xerces and spent 8 days getting it to work instead
:o)
Tom
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:99vkl5p0cdc2ngvsqpdn0h9rhr5sn8fnal@4ax.com...
> I can generally write an FSM parser in an hour or so, depending on the
> syntax. I wrote an
> XML parser, recursive descent, in eight hours, start to finish. The
> constraints were
> strange, and involved "no public source code, ever", which I thought was
> foolish, but they
> were paying. I did tell them there were a number of cheats, such as it
> did not handle all
> possible encodings of XML files, a constraint they found acceptable.
> joe
Tom | 1/23/2010 7:50:31 AM
I think Joe is saying it is meaningless these days because there is no
carriage to return any longer. I think most of us consider \n synonymous
with Enter and that implies the start of a new line. A lot of this is
carry over from the days of teletype and paper terminals and we're just
stuck with it as part of ASCII.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:uqDAH$$mKHA.1548@TK2MSFTNGP04.phx.gbl...
>
> Joseph M. Newcomer wrote:
>
>> One of the rules we developed about forty years ago (1968) is that \r is
>> meaningless noise
>> treated as whitespace, and \n is a newline. This works until you import
>> a text file
>> creating on a pre-OS X Mac, where \r is the newline character.
>> joe
>
>
> Don't confuse raw vs cooked vs display/print device vs storage systems!
>
> \r\n has their basis as hardware device codes for the harder devices of
> the day; printers, teletypes, dumb terminals, etc
>
> \r <CR> is what it is - a carriage return (move it to the first column) of
> the printer head! Note the operative word - Carriage!
>
> \n <LF> is what it is - a line feed (move carriage head down one line) of
> the printer head!
>
> When the consoles came, the printer head was now your cursor. That is why
> it is paired whether there are from translations or not.
>
> Now, your Terminal and Printer could have OPTIONAL translation for an
> automatic line feed (/n) with each carriage return (/r) which means it
> APPEAR as it was a line delimiter as in in the unix wienie world. In the
> MAC word, a /n is the line delimiter. DOS of courses uses /r/n (<CR><LF>)
> pairs.
>
> But it is your terminal or printer providing the illusion with
> translations which may be default depending on the OS it connected to).
> So if you dumped a unix file or mac file to a printer, it did the proper
> translation for you. The printer or carriage or laser point did not
> change, you still need to tell it to go left, right, up or down!
>
> Geez, Meaningless?
>
> This again is a example of insane revisionist comments.
>
> --
> HLS
Tom | 1/23/2010 7:58:13 AM
Not so Tom.
It is all the still the same! Trust me! Its what we do! This is my
business. (http://www.santronics.com) It is what we do as one of the
early pioneers in the telecommunications market. It is all still the
same. It a natural part of our framework and everyone else in the same
market. It is a fundamental understanding in this market. If you
don't follow it, you will not be compatibility with the rest of the world.
Our software covers every aspect of the communications market, from
mail readers, telecommunication programs, mail/file distribution and
hosting, dialup vs internet, name it. Your mail post here is
guaranteed to be read by some users in the world with one of our mail
reading devices. Your mail is guaranteed to be stored and forwarded
(gated) to servers using our product, and honestly, if you recently
saw a doctor and a health claim was filed on your behalf, the chances
are really good our software was somewhere in the network loop in
getting that claim collected, processed and the doctor paid!
When you hit ENTER, depending on the device and the OS, it will do the
translation for you.
If you going to display a text file on the screen or send it to a
printer, the device is doing the translation for you or not.
Storage is different because the OS may use 1 EOL (END OF LINE)
character or two. Sure, one can say that is a "WASTE" but you also
have to think of the consequences in overall global portability and
interfacing with other software and hardware devices.
Ultimately, regardless of how it is stored, a translation needs to
take place if you are going to display or print it correctly. If that
was not the case, then I am sure Tom you have seen times where a
printout was all one black line or jagged across a page.
Now, internet based mail protocols, it uses CRLF for many historical
reasons. When a MAC or UNIX mail software sends email or news it must
implement translations otherwise it is broken.
Same with FTP, a well designed server and client needs to take this
into account.
Same with the HTTP protocol - the CRLF is the standard. So that means
that if you are in the MAC/UNIX world, the interface software MUST do
translations.
For some parts of user software, like a mail reader, most good ones
need to be DOS/UNIX/MAC ready when reading a text file, and such
software generally has sound, solid logic for reading these files.
This is an example where, as Joe indicated, a "\n" may be read as a
NEWLINE (EOL is my preferred terminology), but only if there is no \r
that precedes it.
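To make the DOS/UNIX/MAC point concrete, here is a minimal sketch in C of a line reader that accepts all three end-of-line conventions. The name read_any_eol and its truncation policy are assumptions made for illustration, not code from any product mentioned in this thread, and the file is assumed to be opened in binary ("rb") mode so the runtime does no translation of its own.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line from fp into buf (at most size-1 characters, always
   NUL-terminated), accepting "\n", "\r\n" or a bare "\r" as the end
   of line.  The terminator itself is not stored.  Returns the number
   of characters stored, or -1 when the file was already at its end.
   Overlong lines are truncated; the excess is read and discarded. */
int read_any_eol(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c, got = 0;

    assert(fp != NULL && buf != NULL && size > 0);

    while ((c = getc(fp)) != EOF) {
        got = 1;                        /* something was read on this call */
        if (c == '\n')
            break;                      /* UNIX end of line */
        if (c == '\r') {
            int next = getc(fp);        /* DOS "\r\n" or old Mac "\r"? */
            if (next != '\n' && next != EOF)
                ungetc(next, fp);       /* bare CR: give the character back */
            break;
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return got ? (int)n : -1;
}
```

Called in a loop on a file containing "dos\r\nunix\nmac\rlast", this yields the lines dos, unix, mac and last, and then -1.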
It is not old, it is still here, it is fundamental in telecommunications,
and there is no way we can live without it. But the software and devices
today are so highly engineered to deal with all situations that it is all
transparent to users. :)
--
Tom Serface wrote:
> I think Joe is saying it is meaningless these days because there is no
> carriage to return any longer. I think most of us consider \n
> synonymous with Enter and that implies the start of a new line. A lot
> of this is carry over from the days of teletype and paper terminals and
> we're just stuck with it as part of ASCII.
>
> Tom
>
> "Hector Santos" <sant9442@nospam.gmail.com> wrote in message
> news:uqDAH$$mKHA.1548@TK2MSFTNGP04.phx.gbl...
>>
>> Joseph M. Newcomer wrote:
>>
>>> One of the rules we developed about forty years ago (1968) is that \r
>>> is meaningless noise
>>> treated as whitespace, and \n is a newline. This works until you
>>> import a text file
>>> creating on a pre-OS X Mac, where \r is the newline character.
>>> joe
>>
>>
>> Don't confuse raw vs cooked vs display/print device vs storage systems!
>>
>> \r\n has their basis as hardware device codes for the harder devices
>> of the day; printers, teletypes, dumb terminals, etc
>>
>> \r <CR> is what it is - a carriage return (move it to the first
>> column) of the printer head! Note the operative word - Carriage!
>>
>> \n <LF> is what it is - a line feed (move carriage head down one line)
>> of the printer head!
>>
>> When the consoles came, the printer head was now your cursor. That is
>> why it is paired whether there are from translations or not.
>>
>> Now, your Terminal and Printer could have OPTIONAL translation for an
>> automatic line feed (/n) with each carriage return (/r) which means it
>> APPEAR as it was a line delimiter as in in the unix wienie world. In
>> the MAC word, a /n is the line delimiter. DOS of courses uses /r/n
>> (<CR><LF>) pairs.
>>
>> But it is your terminal or printer providing the illusion with
>> translations which may be default depending on the OS it connected
>> to). So if you dumped a unix file or mac file to a printer, it did the
>> proper translation for you. The printer or carriage or laser point
>> did not change, you still need to tell it to go left, right, up or down!
>>
>> Geez, Meaningless?
>>
>> This again is a example of insane revisionist comments.
>>
>> --
>> HLS
>
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/23/2010 8:44:06 AM
|
|
Tom Serface wrote:
> I think Joe is saying it is meaningless these days because there is no
> carriage to return any longer. I think most of us consider \n
> synonymous with Enter and that implies the start of a new line. A lot
> of this is carry over from the days of teletype and paper terminals and
> we're just stuck with it as part of ASCII.
>
I just wanted to add, yes, \n is viewed as a new line, but that is
only in the DOS/Windows world. Not the case in the "other" worlds!
In DOS/Windows programming, the default is COOKED mode when you
open a text file. COOKED means it will do translations for you - in
both directions. In RAW mode, there is no translation and you must be
specific, <CR><LF> or \r\n.
In MS C/C++, the file I/O runtime library function
_setmode()
can be used to set/change the binary (RAW) or text (COOKED)
translation mode. For example,
_setmode( _fileno( stdin ), _O_BINARY );
_setmode( _fileno( stdout ), _O_BINARY );
will make a standard I/O console program compatible with the
UNIX/MAC/DOS world because you are dealing with RAW bytes, with no
transparent translations being done.
Here is a quick portable "fetch" program you can use to GET a HTTP
resource from a web site:
================= CUT HERE ======================
/* fetch.c -- fetch via HTTP and dump the entire session to stdout
              very stupidly.  Illustrates the need to change the stdout
              default _O_TEXT cooked mode to _O_BINARY raw mode.
*/
#ifdef _WIN32
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <winsock.h>
#include <fcntl.h>
#include <io.h>
#pragma comment(lib,"wsock32.lib")
#define close(a)     closesocket(a)
#define read(a,b,c)  recv(a,b,c,0)
#define write(a,b,c) send(a,b,c,0)
#else
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#endif

int main(int argc, char **argv)
{
    int pfd;                    /* fd from socket */
    int len;
    char *hostP, *fileP;
    char buf[1024];
    struct hostent *hP;         /* for host */
    struct sockaddr_in sin;

#ifdef _WIN32
    WSADATA wd;
    if (WSAStartup(MAKEWORD(1, 1), &wd) != 0) {
        exit(1);
    }
    /* raw mode: no CRLF <-> LF translation on the standard streams */
    _setmode( _fileno( stdin ), _O_BINARY );
    _setmode( _fileno( stdout ), _O_BINARY );
#endif

    if ( argc != 3 ) {
        fprintf( stderr, "Usage: %s host file\n", argv[0] );
        exit( 1 );
    }
    hostP = argv[1];
    fileP = argv[2];

    /* the request is built in buf, so refuse arguments that cannot fit */
    if ( strlen(hostP) + strlen(fileP) + 64 > sizeof(buf) ) {
        fprintf( stderr, "host/file too long\n" );
        exit( 1 );
    }

    hP = gethostbyname( hostP );
    if ( hP == NULL ) {
        fprintf( stderr, "Unknown host \"%s\"\n", hostP );
        exit( 1 );
    }

    pfd = socket( AF_INET, SOCK_STREAM, 0 );
    if ( pfd < 0 ) {
        perror( "socket" );
        exit( 1 );
    }

    memset( &sin, 0, sizeof(sin) );     /* clear unused fields */
    sin.sin_family = hP->h_addrtype;
    memcpy( (char *)&sin.sin_addr, hP->h_addr, hP->h_length );
    sin.sin_port = htons( 80 );
    if ( connect( pfd, (struct sockaddr *)&sin, sizeof(sin) ) < 0 ) {
        perror( "connect" );
        close( pfd );
        exit( 1 );
    }

    /* HTTP lines end in CRLF no matter which OS sends them */
    sprintf( buf, "GET %s HTTP/1.0\r\n"
                  "host: %s\r\n"
                  "accept: */*\r\n\r\n", fileP, hostP);
    write( pfd, buf, strlen(buf));

    /* dump the raw response, headers included, byte for byte */
    while ( ( len = read( pfd, buf, sizeof(buf)) ) > 0)
        fwrite( buf, 1, len, stdout );

    close( pfd );
    fflush( stdout );
    return 0;
}
================= CUT HERE ======================
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/23/2010 9:15:39 AM
|
|
A cynic after my own heart.
I've used XPAT in the past. But the attorneys had the development people on a short leash
about "open source", proving once again that the GPL is one of the worst ideas to have
ever been invented.
joe
On Fri, 22 Jan 2010 23:50:31 -0800, "Tom Serface" <tom@camaswood.com> wrote:
>Well, you could have used Xerces and spent 8 days getting it to work instead
>:o)
>
>Tom
>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>news:99vkl5p0cdc2ngvsqpdn0h9rhr5sn8fnal@4ax.com...
>> I can generally write an FSM parser in an hour or so, depending on the
>> syntax. I wrote an
>> XML parser, recursive descent, in eight hours, start to finish. The
>> constraints were
>> strange, and involved "no public source code, ever", which I thought was
>> foolish, but they
>> were paying. I did tell them there were a number of cheats, such as it
>> did not handle all
>> possible encodings of XML files, a constraint they found acceptable.
>> joe
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
1/23/2010 6:15:59 PM
|
|
See below...
On Sat, 23 Jan 2010 02:43:35 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>
>Joseph M. Newcomer wrote:
>
>> One of the rules we developed about forty years ago (1968) is that \r is meaningless noise
>> treated as whitespace, and \n is a newline. This works until you import a text file
>> creating on a pre-OS X Mac, where \r is the newline character.
>> joe
>
>
>Don't confuse raw vs cooked vs display/print device vs storage systems!
****
Historically, this has been a problem since the 1960s, AT LEAST. So it is unlikely I have
confused them.
****
>
>\r\n has their basis as hardware device codes for the harder devices
>of the day; printers, teletypes, dumb terminals, etc
>
>\r <CR> is what it is - a carriage return (move it to the first
>column) of the printer head! Note the operative word - Carriage!
****
This is news? I knew this in 1965.
****
>
>\n <LF> is what it is - a line feed (move carriage head down one line)
>of the printer head!
****
In 1968 I wrote an optimized plot program that took advantage of this capability, so it is
unlikely I would not understand it.
****
>
>When the consoles came, the printer head was now your cursor. That is
>why it is paired whether there are from translations or not.
>
>Now, your Terminal and Printer could have OPTIONAL translation for an
>automatic line feed (/n) with each carriage return (/r) which means it
>APPEAR as it was a line delimiter as in in the unix wienie world. In
>the MAC word, a /n is the line delimiter. DOS of courses uses /r/n
>(<CR><LF>) pairs.
****
Yes, and generally we considered this a real mistake in the design, done by engineers who
had no concept of reality.
****
>
>But it is your terminal or printer providing the illusion with
>translations which may be default depending on the OS it connected
>to). So if you dumped a unix file or mac file to a printer, it did
>the proper translation for you. The printer or carriage or laser
>point did not change, you still need to tell it to go left, right, up
>or down!
>
>Geez, Meaningless?
****
I was not talking about display. I was talking about reading stored information from a
file. At no point were we talking about displays; we were talking about parsing files. Or
had you missed that little point?
In parsing a file, most systems use one of two conventions: \n to end a line (if you are
Unix) and \r\n to end a line (MS-DOS, Windows). The Mac introduced a serious aberration
into this, using \r to end a line. Across dozens of operating systems, over many decades,
the only conventions used were either the Unix convention or what became the MS-DOS
convention (it adopted a long-standing tradition dating to the mid-1960s). In parsing
files, therefore, we learned early on that \r is meaningless and \n is a line terminator.
Now, if you want to change the discussion to display on dumb terminals, we can have a
completely different discussion. For example, the IBM 2741 vs. the IBM 1050 conventions,
the Model 33 TTY conventions, the conventions used by perhaps a dozen different video
terminal vendors, etc. All of these involve how those characters were used to DISPLAY
information. When parsing information, however, \r and \n are both considered
"whitespace" and the \r is meaningless. So don't confuse display with storage. Oh, wait
a minute, that's what you told me...
joe
>
>This again is a example of insane revisionist comments.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
1/23/2010 6:22:46 PM
|
|
Logically, many rendering programs will treat \r as "reset cursor to beginning margin"
(which in most languages is the left side of the display area, but in some languages like
Arabic and Hebrew is the right side). But that's a display technique, and he somehow made
the leap from my discussing how to parse stored data to thinking I was talking about
display rendering, a topic that was not under discussion.
joe
On Fri, 22 Jan 2010 23:58:13 -0800, "Tom Serface" <tom@camaswood.com> wrote:
>I think Joe is saying it is meaningless these days because there is no
>carriage to return any longer. I think most of us consider \n synonymous
>with Enter and that implies the start of a new line. A lot of this is
>carry over from the days of teletype and paper terminals and we're just
>stuck with it as part of ASCII.
>
>Tom
>
>"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
>news:uqDAH$$mKHA.1548@TK2MSFTNGP04.phx.gbl...
>>
>> Joseph M. Newcomer wrote:
>>
>>> One of the rules we developed about forty years ago (1968) is that \r is
>>> meaningless noise
>>> treated as whitespace, and \n is a newline. This works until you import
>>> a text file
>>> creating on a pre-OS X Mac, where \r is the newline character.
>>> joe
>>
>>
>> Don't confuse raw vs cooked vs display/print device vs storage systems!
>>
>> \r\n has their basis as hardware device codes for the harder devices of
>> the day; printers, teletypes, dumb terminals, etc
>>
>> \r <CR> is what it is - a carriage return (move it to the first column) of
>> the printer head! Note the operative word - Carriage!
>>
>> \n <LF> is what it is - a line feed (move carriage head down one line) of
>> the printer head!
>>
>> When the consoles came, the printer head was now your cursor. That is why
>> it is paired whether there are from translations or not.
>>
>> Now, your Terminal and Printer could have OPTIONAL translation for an
>> automatic line feed (/n) with each carriage return (/r) which means it
>> APPEAR as it was a line delimiter as in in the unix wienie world. In the
>> MAC word, a /n is the line delimiter. DOS of courses uses /r/n (<CR><LF>)
>> pairs.
>>
>> But it is your terminal or printer providing the illusion with
>> translations which may be default depending on the OS it connected to).
>> So if you dumped a unix file or mac file to a printer, it did the proper
>> translation for you. The printer or carriage or laser point did not
>> change, you still need to tell it to go left, right, up or down!
>>
>> Geez, Meaningless?
>>
>> This again is a example of insane revisionist comments.
>>
>> --
>> HLS
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
1/23/2010 6:24:23 PM
|
|
Thanks for everyone's contributions. There's quite a lot here to digest. Re
strtok - I remember using this many years ago, and as far as I recall it
would jump over empty csv entries, so the string "one,,three" would return
"one" followed by "three".
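The strtok behaviour recalled above can be checked with a small C sketch. strtok treats a run of adjacent delimiters as a single one, which is why the empty field disappears. The second function is one possible alternative that keeps empty fields; the names count_with_strtok and split_keep_empty are made up for this example, and neither function knows anything about quoted fields.

```c
#include <assert.h>
#include <string.h>

/* Count the tokens strtok() reports.  Adjacent commas are treated as
   one delimiter, so empty fields vanish.  The input is modified. */
int count_with_strtok(char *line)
{
    int count = 0;
    char *tok;

    assert(line != NULL);
    for (tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ","))
        count++;
    return count;
}

/* Split line in place on every comma, keeping empty fields.  Stores up
   to max field pointers in fields[] and returns the number of fields
   found, which may be larger than max.  No quoting rules are applied. */
int split_keep_empty(char *line, char *fields[], int max)
{
    int count = 0;
    char *start = line;

    assert(line != NULL && fields != NULL);
    for (;;) {
        char *comma = strchr(start, ',');
        if (count < max)
            fields[count] = start;
        count++;
        if (comma == NULL)
            break;                  /* last field on the line */
        *comma = '\0';              /* terminate this field */
        start = comma + 1;
    }
    return count;
}
```

For the string "one,,three", count_with_strtok returns 2, while split_keep_empty returns 3 with an empty string as the middle field.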
|
|
0
|
|
|
|
Reply
|
Stanza
|
1/23/2010 6:52:05 PM
|
|
On Jan 22, 9:37 pm, Hector Santos <sant9...@nospam.gmail.com> wrote:
> Tom Serface wrote:
> > One thing most parsers don't handle correctly, that's I've seen, is
> > double double quotes for strings if you want to have a quote as part of
> > the string like:
>
> > "This is my string "Tom" that I am using", "Next token", "Next token"
>
> > In the above, from my perspective, the parser should read the entire
> > first string since we didn't come to a delimiter yet, but a lot of
> > tokenizers choke on this sort of thing.
>
> Often, it takes two to tango. A writer needs to escape tokens in
> order to reach some level of sanity. i.e, borrowing a C slash for \".
>
>      "This is my string \"Tom\" that I am using"
>
> Or use some encoding method, each HTTP Escape! :)
You really should stop with NIH.
I have never seen HTTP escaping in CSV files. I know of two relevant
conventions: the "Unix" one for D(elimiter)SV files, where the escape
character is a backslash, and the "Windows" (RFC 4180) one, with quote
character escaping. What the hell do you think you are doing,
inventing things like that?
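The quote-doubling convention of RFC 4180 can be sketched in C as follows. This is an illustration under stated assumptions, not a complete CSV reader: csv_field is a hypothetical helper, it works on a single line (no line breaks inside quoted fields), it is deliberately lenient about stray text after a closing quote, and the separator is a parameter rather than a hard-coded comma.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the field starting at src into out (at most outsize-1
   characters, NUL-terminated) and return a pointer to the character
   that ended it: the delimiter or the terminating NUL.  A field that
   begins with '"' is quoted; inside it, "" stands for one literal
   quote and the delimiter is an ordinary character. */
const char *csv_field(const char *src, char delim, char *out, size_t outsize)
{
    size_t n = 0;

    assert(src != NULL && out != NULL && outsize > 0);

    if (*src == '"') {
        src++;                          /* skip the opening quote */
        while (*src != '\0') {
            if (*src == '"') {
                if (src[1] == '"') {
                    src++;              /* "" -> one literal quote */
                } else {
                    src++;              /* closing quote */
                    break;
                }
            }
            if (n + 1 < outsize)
                out[n++] = *src;
            src++;
        }
        /* lenient: skip anything between closing quote and delimiter */
        while (*src != '\0' && *src != delim)
            src++;
    } else {
        while (*src != '\0' && *src != delim) {
            if (n + 1 < outsize)
                out[n++] = *src;
            src++;
        }
    }
    out[n] = '\0';
    return src;
}
```

Given the line "say ""hi"", ok",next and ',' as the delimiter, the first call produces the field say "hi", ok and returns a pointer to the comma; calling again one character past that pointer produces next.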
Did it occur to you that CSV files are useless on their own? People use tools
(e.g. Excel) to view them. I don't think that works with HTTP
escaping, and I would be surprised if it did.
So let me tell you something here: you are proposing that people write
their own CSV parser, and you yourself claim to have written one or
more. But frankly, you don't seem to know what CSV files are, neither
by spec nor in practice.
That's EXACTLY the kind of attitude I am denouncing here. That's
EXACTLY why first thing to do is to look for existing code, NOT roll
your own based on poor understanding of the problem, or worse yet,
defining an old problem anew, on a whim.
Goran.
|
|
0
|
|
|
|
Reply
|
Goran
|
1/23/2010 6:56:58 PM
|
|
Goran wrote:
>
> I have never seen HTTP escaping in CSV files.
I'm sorry about that. But it's out there, maybe because layman web
programmers were trying to put CSV lines over HTTP and it was naturally
escaped, but for whatever reason, it's out there.
> I know of two relevant
> conventions: "Unix" one for D(elimiter)SV files, where escape
> character is backlash, and "Windows" (RFC 4180) one, with quote
> character escaping. What the hell do you think are you doing,
> inventing things like that?
I knew Yakov, and RFC 4180 was written in 2005, and ABSOLUTELY, check
it out, http-like %XX escaping is a recommendation. But practical
applications predated Yakov's RFC recommendation by several decades!
And who's reinventing things? I'm not the one trying to change CSV to DSV.
It's COMMA, ok?
> Did it occur that CSV files are useless on their own.
No. It didn't. Is that another opinion of yours?
> People use tools (e.g. Excel) to view them.
Describe PEOPLE!
> I don't think that works with HTTP escaping, and I would be
> surprised if it did.
Do you know what HTTP escaping is for god sake?
> So let me tell you something here:
Please do.
> you are proposing that people write their own CSV parser,
Hello? Did ANYONE with a sane mind here read that I said people
should write their own CSV, no, DSV parser over anything else?
> and you yourself claim to have written one or
> more. But frankly, you don't seem to know what CSV files are, neither
> by spec, neither in practice.
Ha! And you are surely showing you know a lot!
> That's EXACTLY the kind of attitude I am denouncing here. That's
> EXACTLY why first thing to do is to look for existing code, NOT roll
> your own based on poor understanding of the problem, or worse yet,
> defining an old problem anew, on a whim.
Well, good luck in trying to stop it because generally most
programmers are interested in knowing how to write code, not always
depending on others!! Sounds like you are not very good at either!
Really, you are not. Honestly. I would never hire a person like you
and even if you didn't want to work for me, you are certainly not showing
you have good qualifications for programming on your own. How do you
like those adam apples!?
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/23/2010 7:27:44 PM
|
|
Stanza wrote:
> Thanks for everyone's contributions. There's quite a lot here to digest.
> Re strtok - I remember using this many years ago, and as far as I recall
> it would jump over empty csv entries, so the string "one,,three" would
> return "one" followed by "three".
True.
Stanza, you really could short-circuit the craziness here by describing
what language and platform you are using or want this solution for. It
would certainly help GORAN! :)
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/23/2010 7:47:26 PM
|
|
There is one explanation for this spin. You are visually handicapped. If
that is the case, I wish to extend my apology for my ignorance.
--
HLS
Joseph M. Newcomer wrote:
> Logically, many rendering programs will try \r as "reset cursor to beginning margin"
> (which in most languages is the left side of the display area, but in some languages like
> Arabic and Hebrew is the right side). But that's a display technique, and he somehow made
> the leap from my discussing how to parse stored data to thinking I was talking about
> display rendering, a topic that was not under discussion.
> joe
>
> On Fri, 22 Jan 2010 23:58:13 -0800, "Tom Serface" <tom@camaswood.com> wrote:
>
>> I think Joe is saying it is meaningless these days because there is no
>> carriage to return any longer. I think most of us consider \n synonymous
>> with Enter and that implies the start of a new line. A lot of this is
>> carry over from the days of teletype and paper terminals and we're just
>> stuck with it as part of ASCII.
>>
>> Tom
>>
>> "Hector Santos" <sant9442@nospam.gmail.com> wrote in message
>> news:uqDAH$$mKHA.1548@TK2MSFTNGP04.phx.gbl...
>>> Joseph M. Newcomer wrote:
>>>
>>>> One of the rules we developed about forty years ago (1968) is that \r is
>>>> meaningless noise
>>>> treated as whitespace, and \n is a newline. This works until you import
>>>> a text file
>>>> creating on a pre-OS X Mac, where \r is the newline character.
>>>> joe
>>>
>>> Don't confuse raw vs cooked vs display/print device vs storage systems!
>>>
>>> \r\n has their basis as hardware device codes for the harder devices of
>>> the day; printers, teletypes, dumb terminals, etc
>>>
>>> \r <CR> is what it is - a carriage return (move it to the first column) of
>>> the printer head! Note the operative word - Carriage!
>>>
>>> \n <LF> is what it is - a line feed (move carriage head down one line) of
>>> the printer head!
>>>
>>> When the consoles came, the printer head was now your cursor. That is why
>>> it is paired whether there are from translations or not.
>>>
>>> Now, your Terminal and Printer could have OPTIONAL translation for an
>>> automatic line feed (/n) with each carriage return (/r) which means it
>>> APPEAR as it was a line delimiter as in in the unix wienie world. In the
>>> MAC word, a /n is the line delimiter. DOS of courses uses /r/n (<CR><LF>)
>>> pairs.
>>>
>>> But it is your terminal or printer providing the illusion with
>>> translations which may be default depending on the OS it connected to).
>>> So if you dumped a unix file or mac file to a printer, it did the proper
>>> translation for you. The printer or carriage or laser point did not
>>> change, you still need to tell it to go left, right, up or down!
>>>
>>> Geez, Meaningless?
>>>
>>> This again is a example of insane revisionist comments.
>>>
>>> --
>>> HLS
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/23/2010 8:02:00 PM
|
|
Goran wrote:
> I have never seen HTTP escaping in CSV files. I know of two relevant
> conventions: "Unix" one for D(elimiter)SV files, where escape
> character is backlash, and "Windows" (RFC 4180) one....
Curious, what does RFC 4180 have to do with Windows? You don't seem
the type that would be a follower or implementer of IETF documents
generally, because RFCs are written with the basic idea of RAW
information. In other words, you would hardly ever, if at all,
see an RFC with any specific recommendation for a LIBRARY,
VENDOR or SOLUTION.
So when you reference RFC documents, they are targeted at people (like
myself) who implement these recommendations for people (like you) to use.
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/23/2010 8:18:51 PM
|
|
Hi Stanza,
> What is the easiest way of reading a line at a time through a textual CSV
> file, and then extracting the comma-separated elements from each line?
You're probably already set up for this but I just blogged about this if
you're interested.
http://www.softcircuits.com/Blog/post/2010/01/21/Reading-and-Writing-CSV-Files-in-MFC.aspx
--
Jonathan Wood
SoftCircuits Programming
http://www.softcircuits.com
|
|
0
|
|
|
|
Reply
|
Jonathan
|
1/24/2010 4:21:45 AM
|
|
> Stanza you really could short circuit the craziest here by describing
> what language and platform you are using or want this solution. It
> would certainly help GORAN! :)
I'm using VC++ and MFC, that's why I posted on the "vc.mfc" newsgroup.
|
|
0
|
|
|
|
Reply
|
Stanza
|
1/24/2010 11:34:05 AM
|
|
On Jan 23, 8:27 pm, Hector Santos <sant9...@nospam.gmail.com> wrote:
> Goran wrote:
>
> > I have never seen HTTP escaping in CSV files.
>
> I'm sorry about that. But its out there, maybe because layman web
> programmers were trying put CSV lines over HTTP and it was naturally
> escaped, but whatever reasons, its out there.
>
> > I know of two relevant
> > conventions: "Unix" one for D(elimiter)SV files, where escape
> > character is backlash, and "Windows" (RFC 4180) one, with quote
> > character escaping. What the hell do you think are you doing,
> > inventing things like that?
>
> I knew Yakov, and RFC 4180 was written in 2005, and ABSOLUTELY, check
> it out, http-like %XX escaping is a recommendation. But practically
> applications predated Yakoc RFC recommendation by several decades!
OK, I have re-checked it. There's no recommendation to use
HTTP-escaping in the RFC, nor in informal explanations prior to it
(e.g. http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm). So it
looks like you can't read. Or.. Wait! Do you think that use of %XX in
the ABNF grammar proposed by the RFC means that HTTP escaping should
be used in CSV files? Bwahahahaaaaa...
Also, can you tell me what widely available program used to view CSV
recognizes HTTP escaping? E.g. my Excel does not seem to do it and I'd
be surprised if OO did, but I won't be arsed to check.
Here's how things stand: perhaps some poor deluded souls did use HTTP
escaping in CSV files. That doesn't mean it has any bearing to the
rest of the sane world.
> And who's reinventing things, I'm not the trying to change CSV to DSV.
> its COMMA, ok?
Not OK.
About DSV: it's a long standing tradition in Unix world. Google it,
here's help: http://www.google.com/search?q=Unix+DSV.
In 4180, it's a comma. In the real world, on Windows, it's the locale list
separator, often resulting in e.g. a semicolon. If you offer a
comma-separated file to someone in e.g. France, he will quite likely tell
you that your file is broken. AFAIK, Excel obeys the locale list
separator, and you can certainly change that. Given that the RFC is quite
recent, I'd be wary of prior use of other separator characters, if
nothing else because some DSV files are known to have been written
prior to it.
> Well, good luck in trying to stop it because generally most
> programmers are interesting in knowing how to write code, not always
> depend on others!!
Well, I don't know about most programmers, but good programmers know
when to avoid writing code. That includes when the problem has been solved
dozens of times over, like this one.
> Sound like you would are not very good at neither!
> Really, you are not. Honestly. I would never hire a person like you
Irrelevant, 'cause I've seen code you posted here. Based on that,
there's no way I'd come to work for you - it was laughable.
Goran.
|
|
0
|
|
|
|
Reply
|
Goran
|
1/24/2010 5:23:19 PM
|
|
Goran wrote:
> OK, I have re-checked it out. There's no recommendation to use HTTP-
> escaping in the RFC, nor in prior informal explanations prior to it
> (e.g. http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm). So it
> looks that you can't read. Or.. Wait! Do you think that use of %XX in
> the ABNF grammar proposed by the RFC means that HTTP escaping should
> be used in CSV files? Bwahahahaaaaa...
I mis-spoke on this one. It's a recent recommendation. As I
mentioned a few messages ago:
- It takes two to tango, writer/reader normally work in
concert,
- There are "http" escaping use cases; to suggest there are
none is not correct.
> Well, I don't know about most programmers, but good programmers know
> when to avoid writing code.
Well most good programmers know that is not correct, especially, in
cases like CSV as it was shown over and over again.
> That includes when problem is solved dozens times over, like this one.
So. It doesn't mean anything. As was shown over and over, not even
the best library can cover all bases, and even if it did a good job, you
still need to know what you are doing.
>> Sound like you would are not very good at neither!
>> Really, you are not. Honestly. I would never hire a person like you
>
> Irrelevant, 'cause I've seen code you posted here. Based on that,
> there's no way I'd come to work for you - it was laughable.
Yes it is laughable. At least I put in the effort and have had quality
products in the market place for 30 years. Do you? You would rather depend
on others to do your work. Hey, if that is your preference, cool,
maybe that is your way of keeping up a basic level of understanding of
problem solving. Some people are actually pretty good using this
method. You don't seem to be one of them.
---
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/24/2010 8:51:24 PM
|
|
On Jan 24, 9:51 pm, Hector Santos <sant9...@nospam.gmail.com> wrote:
> Goran wrote:
> > OK, I have re-checked it out. There's no recommendation to use HTTP-
> > escaping in the RFC, nor in prior informal explanations prior to it
> > (e.g.http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm). So it
> > looks that you can't read. Or.. Wait! Do you think that use of %XX in
> > the ABNF grammar proposed by the RFC means that HTTP escaping should
> > be used in CSV files? Bwahahahaaaaa...
>
> I mis-spoke on this one.
No, you didn't misspeak. You did not __know__ what e.g. COMMA = %x2C
means in the RFC, you connected it with HTTP escaping, and then you
blathered.
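The difference between the two notations can be shown in a few lines of C. In ABNF, %x2C is grammar notation for "the single character with hex value 2C"; it never appears in the data. URL-style percent escaping is an encoding that does appear in the data and has to be undone by a reader that knows about it. The helper percent_decode below is a hypothetical illustration written for this comparison, and the sketch assumes an ASCII-compatible character set.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* ABNF "COMMA = %x2C": the character whose value is hex 2C. */
enum { ABNF_COMMA = 0x2C };

/* Decode URL-style %XX escapes in place and return s.  A '%' that is
   not followed by two hex digits is copied through unchanged. */
char *percent_decode(char *s)
{
    char *in = s, *out = s;

    assert(s != NULL);
    while (*in != '\0') {
        if (in[0] == '%' && isxdigit((unsigned char)in[1])
                         && isxdigit((unsigned char)in[2])) {
            char hex[3];
            hex[0] = in[1];
            hex[1] = in[2];
            hex[2] = '\0';
            *out++ = (char)strtol(hex, NULL, 16);
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return s;
}
```

A reader that follows only RFC 4180 sees the field a%2Cb as five ordinary characters. It becomes a,b only if the reader also applies a step like percent_decode, which the RFC does not ask for.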
> Its a recently new recommendation.
Ah. Ok, so where is it? Made by who? And most importantly, what would
the relevance of that be given that existing CSV reading code won't be
able to meaningfully use new files with that?
So let me tell you about HTTP escaping in CSV: even if some poor
deluded soul actually did that, they are __wrong__.
> As I
> mentioned a few messages:
>
>   - It takes two to tango, writer/reader normally work in
>     concern,
OK, my reader is MS office or OpenOffice (that's not in the least
unreasonable, you see). I want your CSV files to work with that. How's
that working out with HTTP escaping, then?
>
>   - There are "http" escaping use cases, to suggest there are
>     not is not correct.
As I said, made by poor deluded souls (or perhaps by arrogant pricks
who think __they__ have the right to define how CSV content should be
escaped). There is no reason to use HTTP escaping in CSV, and it is a
bad idea, because CSV content existed and was escaped before HTTP was
any relevant.
You see, you claim to know how to write code to e.g. parse CSV, but
you don't know what CSV __is__. You don't know (or you didn't know
before people spoke about that here) how it's escaped (DSV is commonly
escaped with '\' on Unix), you don't know that there is software that
uses first-line-are-field-names convention, or
"*Field1,Field2,Field2..." (note the '*') convention to give the field
names (e.g. Excel, but, other code notwithstanding), and you refuse to
take into account localization issues (list separator changes between
cultures, the reason why 4180 isn't good enough). So if you worked
where I work, CSV parsing, the way you see it, wouldn't go past
testing.
>
> > Well, I don't know about most programmers, but good programmers know
> > when to avoid writing code.
>
> Well most good programmers know that is not correct, especially, in
> cases like CSV as it was shown over and over again.
>
> > That includes when problem is solved dozens times over, like this one.
>
> So. It doesn't mean anything as it was shown over and over, not even
> the best library can cover all bases
So what? The C code you've shown covers almost __no__ bases. And it's
still a waste of time to write code that will cover enough of them if
one can just get it.
> > Irrelevant, 'cause I've seen code you posted here. Based on that,
> > there's no way I'd come to work for you - it was laughable.
>
> Yes it is laughable. At least I put the effort and have quality
> products in the market place for 30 years.
If the code you put in production is not incomparably better, no you
haven't. And if the code someone puts in production really is that
better, then that person would not write code like you've shown, let
alone show it to someone.
Goran.
|
|
0
|
|
|
|
Reply
|
Goran
|
1/25/2010 8:37:36 AM
|
|
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:s33ll5l35qnknus8i1n06qitu1r5f750jv@4ax.com...
> On Fri, 22 Jan 2010 23:44:44 -0500, Hector Santos
> <sant9442@nospam.gmail.com> wrote:
>
>>Joseph M. Newcomer wrote:
>>
>>> This code looks like something from K&R C programming first edition.
>>
>>
>>HA! you shouldn't be ashame about it, Joey! You're too easy. :)
>
> Huh? I'd be embarassed to publish an algorithm that was based on K&R C.
> It represents
> the best of mediocre programming of thirty years ago. When I first read
> it, in 1975, I
> said "This language is really badly done",
From my understanding, the book wasn't published until 1978! ;-) You may
have been thinking of the version of the C reference manual that was
provided with the Unix operating system.
Other than that little nitpick, I think you are confusing the defects in the
implementation of the standard library with the true language defects.
Additionally, I think to compare a 30+ year old language that was originally
designed as a system implementation language with the modern general purpose
languages of today is misleading. C met its goals and became wildly
successful despite its flaws and is still very widely used even today.
-Pete
|
|
0
|
|
|
|
Reply
|
Pete
|
1/25/2010 7:08:28 PM
|
|
Pete Delgado wrote:
>> Joesph M. Newcomer wrote:
>>
>> Huh? I'd be embarassed to publish an algorithm that was based on K&R C.
>> It represents the best of mediocre programming of thirty years ago.
>> When I first read it, in 1975, I said "This language is really
>> badly done",
>
> ..... I think you are confusing the defects in the
> implementation of the standard library with the true language defects.
> Additionally, I think to compare a 30+ year old language that was originally
> designed as a system implementation language with the modern general purpose
> languages of today is misleading. C met its goals and became wildly
> successful despite its flaws and is still very widely used even today.
Excellent point Pete.
My opinion on a related note:
I find it increasingly hard to swallow how, even today, people get
locked into a "mindset" that whatever you did in the past was all
wrong and today's method is the better way. While one might be able
to explain why the "militants" of this mantra are the way they are, I
just find the "stubbornness" very vexing, especially from those who
are obviously veterans of the industry. Generally, it is the
opposite.
Again, a rhetorical opinion.
I personally find it very odd that a dialect of the C language, or any
language for that matter, has been "kludged" to support a syntax that
is cluttered with type casting.
So as Joe might have stated 25 years ago in regards to C:
"This language is really badly done"
I can easily say the same thing today with today's C/C++ syntaxing to
support unicode. I say that because 25 years from now, programmers
will most likely be saying the same sort of thing:
"This language is really badly done! It isn't
necessary today. Why all the extra coding?"
I only say that because I am a firm believer in functional programming,
where such specifics are excluded from the overall "thinking" and
interfacing, I/O transformation or function generators.
This is closer, and it is also the direction MS and others are
returning to with their languages. It is more "natural" per se,
without the wasteful "think time" one has to put into addressing minor
details in current C/C++ programming.
These minor details are quite often SO "touchy" in their required
construction that one subtle mistake will create the same or a worse
problem than the one it attempted to resolve, or become one half of a
two-part solution.
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/25/2010 8:36:37 PM
|
|
OK, you make some good points, but I think typical PC users don't see
the action of a carriage any longer.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:eIFE7gAnKHA.1548@TK2MSFTNGP02.phx.gbl...
> Not so Tom.
>
> It is all the still the same! Trust me! Its what we do! This is my
> business. (http://www.santronics.com) It is what we do as one of the
> early pioneers in the telecommunications market. It is all still the same.
> It a natural part of our framework and everyone else in the same market.
> It is a fundamental understanding in this market. If you don't follow it,
> you will not be compatibility with the rest of the world.
>
> Our software covers every aspect of the communications market, from mail
> readers, telecommunication programs, mail/file distribution and hosting,
> dialup vs internet, name it. Your mail post here is guaranteed to be read
> by some users in the world with one of our mail reading devices. Your mail
> is guaranteed to be stored and forwarded (gated) to servers using our
> product, and honestly, if you recently saw a doctor and a health claim was
> filed on your behalf, the chances are really good our software was
> somewhere in the network loop in getting that claim collected, processed
> and the doctor paid!
>
> When you hit ENTER, depending on the device and the OS, it will do the
> translation for you.
>
> If you going to display a text file on the screen or send it to a printer,
> the device is doing the translation for you or not.
>
> Storage is different because the OS may use 1 EOL (END OF LINE)
|
|
0
|
|
|
|
Reply
|
Tom
|
1/26/2010 11:10:32 PM
|
|
Yeah, there are too many worlds these days :o) I wish they could all get
together and make all of our jobs easier. All I ever need in CSV is a line
ending character. More than one is just more than one.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:OXMXjyAnKHA.1552@TK2MSFTNGP04.phx.gbl...
> Tom Serface wrote:
>
>> I think Joe is saying it is meaningless these days because there is no
>> carriage to return any longer. I think most of us consider \n synonymous
>> with Enter and that implies the start of a new line. A lot of this is
>> carry over from the days of teletype and paper terminals and we're just
>> stuck with it as part of ASCII.
>>
|
|
0
|
|
|
|
Reply
|
Tom
|
1/26/2010 11:12:37 PM
|
|
+1 and that was the goal, to make it transparent. Reminds me of an
old user confuser support question, when told to press any key:
"Where is the ANY key?"
:)
Tom Serface wrote:
> OK, you make some good points, but I think to the typical PC users, they
> don't see the action of a carriage any longer.
>
> Tom
>
>
> "Hector Santos" <sant9442@nospam.gmail.com> wrote in message
> news:eIFE7gAnKHA.1548@TK2MSFTNGP02.phx.gbl...
>> Not so Tom.
>>
>> It is all the still the same! Trust me! Its what we do! This is my
>> business. (http://www.santronics.com) It is what we do as one of the
>> early pioneers in the telecommunications market. It is all still the
>> same. It a natural part of our framework and everyone else in the same
>> market. It is a fundamental understanding in this market. If you
>> don't follow it, you will not be compatibility with the rest of the
>> world.
>>
>> Our software covers every aspect of the communications market, from
>> mail readers, telecommunication programs, mail/file distribution and
>> hosting, dialup vs internet, name it. Your mail post here is
>> guaranteed to be read by some users in the world with one of our mail
>> reading devices. Your mail is guaranteed to be stored and forwarded
>> (gated) to servers using our product, and honestly, if you recently
>> saw a doctor and a health claim was filed on your behalf, the chances
>> are really good our software was somewhere in the network loop in
>> getting that claim collected, processed and the doctor paid!
>>
>> When you hit ENTER, depending on the device and the OS, it will do the
>> translation for you.
>>
>> If you going to display a text file on the screen or send it to a
>> printer, the device is doing the translation for you or not.
>>
>> Storage is different because the OS may use 1 EOL (END OF LINE)
>
>
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/26/2010 11:41:35 PM
|
|
Tom Serface wrote:
> Yeah, there are too many worlds these days :o) I wish they could all
> get together and make all of our jobs easier. All I ever need in CSV is
> a line ending character. More than one is just more than one.
+1. I think overall, as experienced in my own career, system
interoperability has been somewhat successful, i.e. it worked to the
extent that it allowed you to continue, to enjoy it, and not go crazy
or give up on this line of work. No doubt there were specific things
you wished were different, and at times you even felt the world would
be better off if it did things that people knew about but were afraid
to do or change. The "Who Moved My Cheese?" syndrome.
This CSV topic was interesting because it represents a primitive
protocol with many open-ended use cases. Yakov made a 2005 attempt
(RFC 4180) to codify the basic, simple thinking, but it could not cover
all bases and, in typical RFC fashion, because it cannot cover all
bases, it intentionally leaves much of this open-ended. The document is
peppered with this; for example, it states in regards to a line ending:
Encoding considerations:
As per section 4.1.1. of RFC 2046 [3], this media type uses CRLF
to denote line breaks. However, implementors should be aware that
some implementations may use other values.
This is a common approach in RFC documents: intentionally ambiguous,
informal notes. It tells you, as a READER, to be aware, but not what
the possible cases are for a READER or a WRITER.
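To make the READER side concrete, here is a minimal sketch of a line splitter that accepts CRLF as the RFC text describes, but also tolerates a bare LF or a bare CR. The function name split_lines is made up for illustration, and the sketch assumes the whole file is already in memory and that no quoted field contains an embedded line break (which the RFC does allow).

```cpp
#include <string>
#include <vector>

// Split a text buffer into lines, accepting CRLF, bare LF, or bare CR
// as the line terminator. The terminator is not included in the result.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r') {
            // Treat CR LF as one terminator, not two.
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else if (c == '\n') {
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // A final line without a terminator still counts as a line.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

A WRITER that wants to follow the RFC text quoted above would simply emit "\r\n" after every record; the tolerance only belongs on the reading side.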
There are also a few other subtle points to understand about this document.
- RFCs like this one are not standards; they are recommendations
and guidelines.
- This RFC is in the Informational category, not standards track.
The latter is important because it means it was fast-tracked as an
informational RFC and thus didn't require the IETF working group (WG)
peer review process, which can take 2-4 years or longer. That is
probably why I didn't see this in 2005. A Working Group would have
scrubbed many issues, and for that reason it would have prolonged the
RFC publication. I personally find the security section lacking and
considerations for SQL CSV input formats missing.
One thing you can do to help the world is to create IETF draft
proposals or get involved in some of the standardization working
groups.
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
1/27/2010 12:35:12 AM
|
|
Yes, that is correct. It malfunctions, because it is working precisely as specified, not
as anyone would actually want it to. But strtok is such a fundamentally BAD design that
it should never be used in real code.
The number of times I've seen an embedded strtok added long after program creation
completely nuke an existing strtok is amazingly high. But by the early 1980s, we had
learned to never, ever use strtok in any code we cared about, which was typically 100% of
our code.
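For anyone who wants to see the behavior being discussed, here is a small sketch (the function names are made up for illustration). The first function uses strtok, which treats a run of delimiters as a single separator and so drops the empty field in "one,,three"; the second splits on every comma and keeps empty fields. Neither handles quoted CSV fields.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenize with strtok. Adjacent delimiters are collapsed, so empty
// fields disappear. strtok also writes into its input and keeps hidden
// static state, which is why this works on a private copy of the line.
std::vector<std::string> split_with_strtok(std::string line) {
    std::vector<std::string> fields;
    for (char* tok = std::strtok(&line[0], ","); tok != nullptr;
         tok = std::strtok(nullptr, ",")) {
        fields.push_back(tok);
    }
    return fields;
}

// Split on every comma and keep empty fields (no quoting support).
std::vector<std::string> split_keep_empty(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}
```

The hidden static state is also what makes a nested strtok call inside the loop body wreck the outer tokenization.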
joe
On Sat, 23 Jan 2010 18:52:05 -0000, "Stanza" <stanza@devnull.com> wrote:
>Thanks for everyone's contributions. There's quite a lot here to digest. Re
>strtok - I remember using this many years ago, and as far as I recall it
>would jump over empty csv entries, so the string "one,,three" would return
>"one" followed by "three".
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
2/1/2010 1:45:41 PM
|
|
See below...
On Mon, 25 Jan 2010 14:08:28 -0500, "Pete Delgado" <Peter.Delgado@NoSpam.com> wrote:
>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>news:s33ll5l35qnknus8i1n06qitu1r5f750jv@4ax.com...
>> On Fri, 22 Jan 2010 23:44:44 -0500, Hector Santos
>> <sant9442@nospam.gmail.com> wrote:
>>
>>>Joseph M. Newcomer wrote:
>>>
>>>> This code looks like something from K&R C programming first edition.
>>>
>>>
>>>HA! you shouldn't be ashame about it, Joey! You're too easy. :)
>>
>> Huh? I'd be embarassed to publish an algorithm that was based on K&R C.
>> It represents
>> the best of mediocre programming of thirty years ago. When I first read
>> it, in 1975, I
>> said "This language is really badly done",
>
>From my understanding, the book wasn't published until 1978! ;-) You may
>have been thinking of the version of the C reference manual that was
>provided with the Unix operating system.
****
The book was a repackaging of the C reference manual.
****
>
>Other than that little nitpick, I think you are confusing the defects in the
>implementation of the standard library with the true language defects.
>Additionally, I think to compare a 30+ year old language that was originally
>designed as a system implementation language with the modern general purpose
>languages of today is misleading. C met its goals and became wildly
>successful despite its flaws and is still very widely used even today.
****
Both the standard library and the language had serious problems. Some of them persist to
this day, such as the strcpy/strcat/sprintf buffer overrun problems.
Sadly, there are people today who use C as if it is K&R C, in spite of the fact that both
the language and the runtimes have matured. You can spot these people right away, because
they toss strcpy, strcat, and sprintf around like they ever made sense, they use silly
prefix names inside structures for fields (putting some prefix like "tmHour" does inside
the "tm" struct, because struct field names were global in K&R C and had to be globally
unique). And today, especially in Windows, we can add people who use the 'char' data type
as the natural representation of text.
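As a rough sketch of the bounded alternative to sprintf (format_pair is a hypothetical helper, not anything from the C runtime), snprintf takes the buffer size and reports how much room the full result would have needed, so truncation can be detected instead of overrunning the buffer:

```cpp
#include <cstddef>
#include <cstdio>

// Format "name=value" into a caller-supplied buffer without overrunning it.
// Returns true only if the complete string fit.
bool format_pair(char* out, std::size_t out_size, const char* name, int value) {
    if (out == nullptr || out_size == 0) return false;
    // snprintf never writes more than out_size bytes and always terminates
    // the buffer when out_size > 0. Its return value is the length the full
    // result would have had, excluding the terminator.
    const int needed = std::snprintf(out, out_size, "%s=%d", name, value);
    return needed >= 0 && static_cast<std::size_t>(needed) < out_size;
}
```

With a 4-byte buffer and the inputs "count" and 42, the call returns false and leaves "cou" in the buffer rather than writing past its end.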
C became successful because it was given away free on a bunch of machines. Quality of the
language or the runtime had nothing to do with its popularity.
And for those who thought C was a "successful" system implementation language: I was
working with a compiler company in the 1980s, and our compiler could not compile the Unix
kernel. Why? Because AFTER the compiler ran, for some modules it ran a sed script over
the generated code to change it, and our compiler assigned registers differently than the
AT&T C compiler, and broke all the sed scripts. Now it is really hard for me to take
seriously a "system implementation language" that requires hand-editing of the generated
code (no matter how automated this is made) in order to work in its specified role. So it
only barely made the grade.
Remember, VHS tapes succeeded over Beta tapes, also, but it didn't make VHS a better
quality format!
joe
****
>
>-Pete
>
>
>
>
>-Pete
>
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
2/1/2010 2:01:46 PM
|
|
If you look at functional programming closely, especially along the dimensions of concepts
such as "total" functions, "partial" functions, "pure" functions, and such, you see a
powerful evolution of the thinking of ways to do software. Sadly, none of these languages
has a significant following outside a few academics.
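To illustrate the terms in a language this group actually uses, here is a small sketch assuming a C++17 compiler for std::optional (checked_div is a made-up name). Built-in integer division is a partial function: it has no defined result for a zero divisor. Widening the return type turns it into a total function, and since it touches no globals and does no I/O it is also pure.

```cpp
#include <limits>
#include <optional>

// Total and pure integer division: every pair of inputs maps to a value,
// and the same inputs always produce the same result.
std::optional<int> checked_div(int dividend, int divisor) {
    // Division by zero has no result.
    if (divisor == 0) return std::nullopt;
    // INT_MIN / -1 overflows, so it has no representable result either.
    if (dividend == std::numeric_limits<int>::min() && divisor == -1) {
        return std::nullopt;
    }
    return dividend / divisor;
}
```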
As a former compiler writer, functional languages contain massively interesting amounts of
information that can be used to create super-quality code; languages like C, C++ and even
C# have a lot of issues that prevent good code generation.
Personally, I'd love to be able to look back on languages like C, C++ and C# as the
antiquated artifacts they are. But if you can't use a motorcycle, you might at least use
a competition-quality racing bike, or even a really good 10-speed, instead of a
"bone-shaker" (look up the history of bicycles if you don't know what this means...)
joe
On Mon, 25 Jan 2010 15:36:37 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>Pete Delgado wrote:
>
> >> Joesph M. Newcomer wrote:
> >>
>
>>> Huh? I'd be embarassed to publish an algorithm that was based on K&R C.
>>> It represents the best of mediocre programming of thirty years ago.
>
> >> When I first read it, in 1975, I said "This language is really
> >> badly done",
>
>>
>> ..... I think you are confusing the defects in the
>> implementation of the standard library with the true language defects.
>> Additionally, I think to compare a 30+ year old language that was originally
>> designed as a system implementation language with the modern general purpose
>> languages of today is misleading. C met its goals and became wildly
>> successful despite its flaws and is still very widely used even today.
>
>
>Excellent point Pete.
>
>My opinion on a related note:
>
>I find it increasing harder to shallow how even today, people are
>getting locked into thinking or "mindset" molded that what you did in
>the past was all wrong and today's method is the better way. While
>one might be able explain why the "militants" of this mantra are they
>way they are, I just find the "stubbornness" very vexing, especially
>from those who are obviously veterans of the industry. Generally, it
>is the opposite.
>
>Again, a rhetorical opinion.
>
>I personally find it very odd that a dialect of the C language or any
>language for that matter, has been "kludge" to support a syntax that
>is cluttered with type casting.
>
>So as Joe might has stated 25 years ago in regards to C:
>
> "This language is really badly done"
>
>I can easily say the same thing today with today's C/C++ syntaxing to
>support unicode. I say that because 25 years from now, programmers
>will most likely be saying the same sort of thing:
>
> "This language is really badly done! It isn't
> necessary today. Why all the extra coding?"
>
>I only say that because I am firm believer in functional programming
>where such specifics are excluded from the overall "thinking" and
>interfacing, I/O transformation or function generators.
>
>This is closer and it also it is a rebirth direction of where MS and
>others are going with its languages. It is more "natural" per se,
>without the wasteful "think time" to address minor details one has to
>put into current C/C++ programming.
>
>These minor details are quite often SO "touchy" in its required
>construct that if one subtle mistake will create the same or worst
>problem that it attempted to resolved or be part of the two part solution.
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
2/1/2010 2:06:33 PM
|
|
Well, APL was among my first and favorite among the 20+ or so
languages under my belt. It molded my problem solving and
programming thinking for everything that followed. As a current
compiler writer, I appreciate all languages for what they do or are. I
doubt anyone can design a perfect language for all - unless of course,
you wish to force it down their throats.
--
HLS
Joseph M. Newcomer wrote:
> If you look at functional programming closely, especially along the dimensions of concepts
> such as "total" functions, "partial" functions, "pure" functions, and such, you see a
> powerful evolution of the thinking of ways to do software. Sadly, none of these languages
> has a significant following outside a few academics.
>
> As a former compiler writer, functional languages contain massively interesting amounts of
> information that can be used to create super-quality code; languages like C, C++ and even
> C# have a lot of issues that prevent good code generation.
>
> Personally, I'd love to be able to look back on languages like C, C++ and C# as the
> antiquated artifacts they are. But if you can't use a motorcycle, you might at least use
> a competition-quality racing bike, or even a really good 10-speed, instead of a
> "bone-shaker" (look up the history of bicycles if you don't know what this means...)
> joe
>
> On Mon, 25 Jan 2010 15:36:37 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>
>> Pete Delgado wrote:
>>
>>>> Joesph M. Newcomer wrote:
>>>>
>>>> Huh? I'd be embarassed to publish an algorithm that was based on K&R C.
>>>> It represents the best of mediocre programming of thirty years ago.
>>>> When I first read it, in 1975, I said "This language is really
>>>> badly done",
>>> ..... I think you are confusing the defects in the
>>> implementation of the standard library with the true language defects.
>>> Additionally, I think to compare a 30+ year old language that was originally
>>> designed as a system implementation language with the modern general purpose
>>> languages of today is misleading. C met its goals and became wildly
>>> successful despite its flaws and is still very widely used even today.
>>
>> Excellent point Pete.
>>
>> My opinion on a related note:
>>
>> I find it increasing harder to shallow how even today, people are
>> getting locked into thinking or "mindset" molded that what you did in
>> the past was all wrong and today's method is the better way. While
>> one might be able explain why the "militants" of this mantra are they
>> way they are, I just find the "stubbornness" very vexing, especially
>>from those who are obviously veterans of the industry. Generally, it
>> is the opposite.
>>
>> Again, a rhetorical opinion.
>>
>> I personally find it very odd that a dialect of the C language or any
>> language for that matter, has been "kludge" to support a syntax that
>> is cluttered with type casting.
>>
>> So as Joe might has stated 25 years ago in regards to C:
>>
>> "This language is really badly done"
>>
>> I can easily say the same thing today with today's C/C++ syntaxing to
>> support unicode. I say that because 25 years from now, programmers
>> will most likely be saying the same sort of thing:
>>
>> "This language is really badly done! It isn't
>> necessary today. Why all the extra coding?"
>>
>> I only say that because I am firm believer in functional programming
>> where such specifics are excluded from the overall "thinking" and
>> interfacing, I/O transformation or function generators.
>>
>> This is closer and it also it is a rebirth direction of where MS and
>> others are going with its languages. It is more "natural" per se,
>> without the wasteful "think time" to address minor details one has to
>> put into current C/C++ programming.
>>
>> These minor details are quite often SO "touchy" in its required
>> construct that if one subtle mistake will create the same or worst
>> problem that it attempted to resolved or be part of the two part solution.
> Joseph M. Newcomer [MVP]
> email: newcomer@flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
2/2/2010 7:02:32 AM
|
|
I think the world will move more and more away from languages that compile
to native code and more and more towards intermediate code languages like
.NET. When you get to that point it's just a matter of choosing a syntax
you like and the result is the same.
C++ has a lot of momentum so I suspect it will be around for times when
native coding is required for a long while, but it will be tough for it to
compete with the simplicity of paradigms like C# and Java that are just
starting to live up to their promise.
I never did APL and from what I can see I didn't miss out on much. I never
wanted to learn a different keyboard.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:#FAttW9oKHA.1548@TK2MSFTNGP02.phx.gbl...
> Well, APL was my among my first and favorite among the 20+ or so languages
> under my belt. It molded my problem solving and programming thinking for
> everything that followed. As a current compiler writer, I appreciate all
> languages for what they do or are. I doubt anyone can design a perfect
> language for all - unless of course, you wish to force it down their
> throats.
>
|
|
0
|
|
|
|
Reply
|
Tom
|
2/2/2010 7:25:51 AM
|
|
Tom Serface wrote:
> I think the world will move more and more away from languages that
> compile to native code and more and more towards intermediate code
> languages like .NET. When you get to that point it's just a matter of
> choosing a syntax you like and the result is the same.
Right. .NET is not the first. It's the basis of our product's namesake,
"Wildcat! Interactive Net Server" (winserver.com) - a centralized and
virtual p-code "network" environment with a common language interface
framework. Same principles: a single source managing the I/O, the
interfacing and, of course, the security. I'm not comparing, .NET is
extremely rich, but these are the same ideas that others are pursuing
as well, so I agree it's definitely a direction. Embedding languages
into server environments is big.
> C++ has a lot of momentum so I suspect it will be around for times when
> native coding is required for a long while, but it will be tough for it
> to compete with the simplicity of paradigms like C# and Java that are
> just starting to live up to their promise.
I agree. But I decided I am going to invest in C# for development
products but only provide the .NET interface and examples for our server.
> I never did APL and from what I can see I didn't miss out on much. I
> never wanted to learn a different keyboard.
Well, APL definitely faded, due to its complexity perhaps, but I do
believe that if you had been privy to it, it definitely would have made
you a different person. :) APL was totally symbolic, with ideas that
are being duplicated today, such as a greater focus on FP, refactoring
and prototype stacking.
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
2/2/2010 8:52:10 AM
|
|
Hector Santos wrote:
> I agree. But I decided I am going to invest in C# for development
> products but only provide the .NET interface and examples for our server.
Correction, "I am NOT...."
--
HLS
|
|
0
|
|
|
|
Reply
|
Hector
|
2/2/2010 9:00:27 AM
|
|
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
news:j6ndm5d1ikaet1hahrb69tnuai1q0tcl52@4ax.com...
> See below...
> On Mon, 25 Jan 2010 14:08:28 -0500, "Pete Delgado"
> <Peter.Delgado@NoSpam.com> wrote:
>
>>
>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in message
>>news:s33ll5l35qnknus8i1n06qitu1r5f750jv@4ax.com...
>>> On Fri, 22 Jan 2010 23:44:44 -0500, Hector Santos
>>> <sant9442@nospam.gmail.com> wrote:
>>>
>>>>Joseph M. Newcomer wrote:
>>>>
>>>>> This code looks like something from K&R C programming first edition.
>>>>
>>>>
>>>>HA! you shouldn't be ashame about it, Joey! You're too easy. :)
>>>
>>> Huh? I'd be embarassed to publish an algorithm that was based on K&R C.
>>> It represents
>>> the best of mediocre programming of thirty years ago. When I first read
>>> it, in 1975, I
>>> said "This language is really badly done",
>>
>>From my understanding, the book wasn't published until 1978! ;-) You may
>>have been thinking of the version of the C reference manual that was
>>provided with the Unix operating system.
> ****
> The book was a repackaging of the C reference manual.
> ****
From what I recall, the book greatly expanded on the C reference manual. The
C reference manual became the basis for the language reference toward the end
of the book. I believe that Dennis Ritchie may still have links to the
original manuals on his site.
So at any rate, apparently it was the C reference manual that you are
referring to and *not* "The C Programming Language" as you had initially
suggested.
>>
>>Other than that little nitpick, I think you are confusing the defects in
>>the
>>implementation of the standard library with the true language defects.
>>Additionally, I think to compare a 30+ year old language that was
>>originally
>>designed as a system implementation language with the modern general
>>purpose
>>languages of today is misleading. C met its goals and became wildly
>>successful despite its flaws and is still very widely used even today.
> ****
> Both the standard library and the language had serious problems. Some of
> them persist to
> this day, such as the strcpy/strcat/sprintf buffer overrun problems.
Again, I agree that the standard library has problems, especially given what
we now know about security and the pathology of many security bugs, however
your criticisms again target only the standard library. What specific
criticisms do you have of the core language and its constructs? Personally,
I find the fact that certain keywords like "static" are overloaded without
regard to how the uses are related to be disturbing. I think that in
hindsight not having fixed-size data types has caused more portability
problems than the supposed efficiencies were worth, and that the terseness
of the language in the hands of some can lead to nearly incomprehensible
code. Even given these criticisms, I still find the language extremely useful.
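On the fixed-size point, the later standards did add exact-width types in <cstdint> (C's <stdint.h>). A minimal sketch, with RecordHeader being a hypothetical layout invented for illustration and the size checks assuming a mainstream platform with 8-bit bytes:

```cpp
#include <climits>
#include <cstdint>

// A hypothetical record header. Unlike int or long, each of these types
// has the same width on every platform that provides it.
struct RecordHeader {
    std::uint32_t length;   // always 32 bits
    std::uint16_t version;  // always 16 bits
    std::uint16_t flags;    // always 16 bits
};

// Turn the layout assumptions into compile-time checks, so a port to an
// unusual platform fails to build instead of misreading data.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t should be 4 bytes");
static_assert(sizeof(RecordHeader) == 8, "unexpected padding in RecordHeader");
```

Byte order is a separate portability problem that these types do not solve.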
> Sadly, there are people today who use C as if it is K&R C, in spite of the
> fact that both
> the language and the runtimes have matured. You can spot these people
> right away, because
> they toss strcpy, strcat, and sprintf around like they ever made sense,
> they use silly
> prefix names inside structures for fields (putting some prefix like
> "tmHour" does inside
> the "tm" struct, because struct field names were global in K&R C and had
> to be globally
> unique). And today, especially in Windows, we can add people who use the
> 'char' data type
> as the natural representation of text.
I'm not sure that I would condemn the engineering decisions made by other
professionals unless I first knew all the details and the reasons behind the
decisions made. I will say that I have seen a lot of code in my time with a
high WTF factor!
>
> C became successful because it was given away free on a bunch of machines.
> Quality of the
> language or the runtime had nothing to do with its popularity.
I respectfully disagree, Joe. The fact that compilers were given away may have
helped early adoption of the language, but if the language didn't have the
qualities that were desired, such as being expressive, it would have died
long ago and never been as popular as it became. I think Dennis Ritchie
summed it up best when he stated "C is quirky, flawed, and an enormous
success."
>
> And for those who thought C was a "successful" system implementation
> language: I was
> working with a compiler company in the 1980s, and our compiler could not
> compile the Unix
> kernel. Why? Because AFTER the compiler ran, for some modules it ran a
> sed script over
> the generated code to change it, and our compiler assigned registers
> differently than the
> AT&T C compiler, and broke all the sed scripts. Now it is really hard for
> me to take
> seriously a "system implementation language" that requires hand-editing of
> the generated
> code (no matter how automated this is made) in order to work in its
> specified role. So it
> only barely made the grade.
While I am not privy to all the details of your project, it seems to me
that a single or even a few pieces of contrary anecdotal evidence with
respect to a particular implementation do not diminish the overall successes
of the language.
>
> Remember, VHS tapes succeeded over Beta tapes, also, but it didn't make
> VHS a better
> quality format!
There are a number of reasons why VHS prevailed. To claim that Beta was a
clearly superior choice for consumers ignores the facts.
-Pete
|
|
0
|
|
|
|
Reply
|
Pete
|
2/2/2010 5:40:11 PM
|
|
I once passed a PhD qualifying exam based, I am absolutely convinced, on the fact that the
question which gave a piece of APL code was erroneous, and I pointed out that the code could
not possibly work (an incompatibility in the rho operator operands), then went on to say
"But if you meant to write <new APL expression>, then it would make sense, and in that
case the answer would be..." I know I blew two other questions, which should have
resulted in failure, but I passed. Only four of us did (the exam was commemorated by a
song, which everyone of that era can still sing to this day, and we remember all the
words, nearly 40 years later; www.flounder.com/battle_hum.htm).
APL was a fascinating language, and the last time I wrote a serious APL program was about
1969.
There is no perfect language. C++ has been forced down our throats already, so I'm not
sure that forcible usage is a criterion. C# and Java are being forced down right now.
joe
On Tue, 02 Feb 2010 02:02:32 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>Well, APL was my among my first and favorite among the 20+ or so
>languages under my belt. It molded my problem solving and
>programming thinking for everything that followed. As a current
>compiler writer, I appreciate all languages for what they do or are. I
>doubt anyone can design a perfect language for all - unless of course,
>you wish to force it down their throats.
>
>--
>HLS
>
>Joseph M. Newcomer wrote:
>
>> If you look at functional programming closely, especially along the dimensions of concepts
>> such as "total" functions, "partial" functions, "pure" functions, and such, you see a
>> powerful evolution of the thinking of ways to do software. Sadly, none of these languages
>> has a significant following outside a few academics.
>>
>> As a former compiler writer, functional languages contain massively interesting amounts of
>> information that can be used to create super-quality code; languages like C, C++ and even
>> C# have a lot of issues that prevent good code generation.
>>
>> Personally, I'd love to be able to look back on languages like C, C++ and C# as the
>> antiquated artifacts they are. But if you can't use a motorcycle, you might at least use
>> a competition-quality racing bike, or even a really good 10-speed, instead of a
>> "bone-shaker" (look up the history of bicycles if you don't know what this means...)
>> joe
>>
>> On Mon, 25 Jan 2010 15:36:37 -0500, Hector Santos <sant9442@nospam.gmail.com> wrote:
>>
>>> Pete Delgado wrote:
>>>
>>>>> Joseph M. Newcomer wrote:
>>>>>
>>>>> Huh? I'd be embarrassed to publish an algorithm that was based on K&R C.
>>>>> It represents the best of mediocre programming of thirty years ago.
>>>>> When I first read it, in 1975, I said "This language is really
>>>>> badly done",
>>>> ..... I think you are confusing the defects in the
>>>> implementation of the standard library with the true language defects.
>>>> Additionally, I think to compare a 30+ year old language that was originally
>>>> designed as a system implementation language with the modern general purpose
>>>> languages of today is misleading. C met its goals and became wildly
>>>> successful despite its flaws and is still very widely used even today.
>>>
>>> Excellent point Pete.
>>>
>>> My opinion on a related note:
>>>
>>> I find it increasingly hard to swallow how, even today, people are
>>> getting locked into a "mindset" that what you did in
>>> the past was all wrong and today's method is the better way. While
>>> one might be able to explain why the "militants" of this mantra are the
>>> way they are, I just find the "stubbornness" very vexing, especially
>>> from those who are obviously veterans of the industry. Generally, it
>>> is the opposite.
>>>
>>> Again, a rhetorical opinion.
>>>
>>> I personally find it very odd that a dialect of the C language or any
>>> language for that matter, has been "kludged" to support a syntax that
>>> is cluttered with type casting.
>>>
>>> So as Joe might have stated 25 years ago in regards to C:
>>>
>>> "This language is really badly done"
>>>
>>> I can easily say the same thing today with today's C/C++ syntaxing to
>>> support unicode. I say that because 25 years from now, programmers
>>> will most likely be saying the same sort of thing:
>>>
>>> "This language is really badly done! It isn't
>>> necessary today. Why all the extra coding?"
>>>
>>> I only say that because I am a firm believer in functional programming
>>> where such specifics are excluded from the overall "thinking" and
>>> interfacing, I/O transformation or function generators.
>>>
>>> This is closer, and it is also a rebirth direction of where MS and
>>> others are going with its languages. It is more "natural" per se,
>>> without the wasteful "think time" to address minor details one has to
>>> put into current C/C++ programming.
>>>
>>> These minor details are quite often SO "touchy" in their required
>>> construct that one subtle mistake will create the same or a worse
>>> problem than the one it attempted to resolve, or be part of the two part solution.
>> Joseph M. Newcomer [MVP]
>> email: newcomer@flounder.com
>> Web: http://www.flounder.com
>> MVP Tips: http://www.flounder.com/mvp_tips.htm
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
|
|
0
|
|
|
|
Reply
|
Joseph
|
2/2/2010 8:32:03 PM
|
|
I've been doing more and more C# programming. I find it to be a very rapid
development environment, but not so much because of the syntax. There is
simply more "stuff" readily available in libraries and the VS IDE works so
much better with it. All I've done are dialog type applications with
WinForms and WebForms, so I suspect it may fall short (or not be as easy) for
SDI or MDI type applications.
Tom
"Hector Santos" <sant9442@nospam.gmail.com> wrote in message
news:u42dmY#oKHA.4648@TK2MSFTNGP06.phx.gbl...
> Hector Santos wrote:
>
>> I agree. But I decided I am going to invest in C# for development
>> products but only provide the .NET interface and examples for our server.
>
>
> Correction, "I am NOT...."
>
> --
> HLS
|
|
0
|
|
|
|
Reply
|
Tom
|
2/2/2010 10:03:53 PM
|
|
|