Processing large files with TextFieldParser

ASP.net 3.5/SQL Server 2008

I've got a large (1.1m rows) csv file which needs parsing and sticking into 
sql server - the job needs doing every day and if anything the csv will get 
larger over time. At present I'm using a TextFieldParser to parse the csv 
line by line and add to the database. This fails probably 2 times in 3, if 
it's going to fall over it's usually at around 200,000 lines. Looking for 
suggestions as to how to do this robustly, on a shared server which doesn't 
allow bulk insert. Fair to assume the server is a factor in failure but I 
can't upgrade just yet.

Would I be better breaking the csv into say 5 seperate files then processing 
each individually or processing in chunks, eg

if TextFieldParser.LineNumber < 200,000 then
process first chunk
end if

if TextFieldParser.LineNumber > 200000 and TextFieldParser.LineNumber 
<400000 then
process next chunk
end if

etc.

Or something else entirely?

Cheers,
Jon




0
Jon
11/30/2009 5:09:23 PM
dotnet.framework.aspnet 1425 articles. 0 followers. Follow

3 Replies
1458 Views

Similar Articles

[PageSpeed] 45

"Jon Spivey" <js@nisusnewmedia.com> wrote in
news:u#Orc$dcKHA.1028@TK2MSFTNGP06.phx.gbl: 

> I've got a large (1.1m rows) csv file which needs parsing and sticking
> into sql server - the job needs doing every day and if anything the
> csv will get larger over time. At present I'm using a TextFieldParser
> to parse the csv line by line and add to the database. This fails
> probably 2 times in 3, if it's going to fall over it's usually at
> around 200,000 lines. Looking for suggestions as to how to do this
> robustly, on a shared server which doesn't allow bulk insert. Fair to
> assume the server is a factor in failure but I can't upgrade just yet.

Having done this numerous times, I find the best way is to use a 
StreamReader and read in line by line, esp. with large files, as trying 
to store everything in memory (whether DataSet or objects) ends up 
unwieldy.

With a good regex, you can divide out the elements, even if there is a 
text delimiter (usually some form of quote). I have written my own, but 
I would not be surprised if there are others.

Another direction to conquer this, as you are storing in SQL Server, is 
to use SSIS (or DTS in older versions). SSIS has the ability to read a 
CSV file. 

If this is a file format you can set up a BCP file for, you can bulk 
load the items into SQL Server, as well. NOTE that this will not work if 
you have to manipulate the CSV flat file into multiple tables, however.

In the past, I architected a system that had multiple GB files that had 
to be manipulated. The solution was to leave the data in flat files and 
manipulate out into files that mimicked SQL Server tables. I then 
incremented the IDENTITY values and seeded the flat files. This required 
many passes and some file sorts to get things into SQL Server, so it is 
overkill if the file is very predictable and/or does not require 
extensive manipulation.

Peace and Grace,


-- 
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
|      Think outside the box!             |
*******************************************
0
Gregory
11/30/2009 6:23:35 PM
Hi Greg,
Thanks for your reply. I've gone with a streamreader as you suggested, 
tested with the 1st csv and it worked perfectly. Going to test it a few more 
times to be sure but it certainly seems to be the solution.

Cheers,
Jon

"Gregory A. Beamer" <NoSpamMgbworld@comcast.netNoSpamM> wrote in message 
news:Xns9CD37DC4031Bgbworld@207.46.248.16...
> "Jon Spivey" <js@nisusnewmedia.com> wrote in
> news:u#Orc$dcKHA.1028@TK2MSFTNGP06.phx.gbl:
>
>> I've got a large (1.1m rows) csv file which needs parsing and sticking
>> into sql server - the job needs doing every day and if anything the
>> csv will get larger over time. At present I'm using a TextFieldParser
>> to parse the csv line by line and add to the database. This fails
>> probably 2 times in 3, if it's going to fall over it's usually at
>> around 200,000 lines. Looking for suggestions as to how to do this
>> robustly, on a shared server which doesn't allow bulk insert. Fair to
>> assume the server is a factor in failure but I can't upgrade just yet.
>
> Having done this numerous times, I find the best way is to use a
> StreamReader and read in line by line, esp. with large files, as trying
> to store everything in memory (whether DataSet or objects) ends up
> unwieldy.
>
> With a good regex, you can divide out the elements, even if there is a
> text delimiter (usually some form of quote). I have written my own, but
> I would not be surprised if there are others.
>
> Another direction to conquer this, as you are storing in SQL Server, is
> to use SSIS (or DTS in older versions). SSIS has the ability to read a
> CSV file.
>
> If this is a file format you can set up a BCP file for, you can bulk
> load the items into SQL Server, as well. NOTE that this will not work if
> you have to manipulate the CSV flat file into multiple tables, however.
>
> In the past, I architected a system that had multiple GB files that had
> to be manipulated. The solution was to leave the data in flat files and
> manipulate out into files that mimicked SQL Server tables. I then
> incremented the IDENTITY values and seeded the flat files. This required
> many passes and some file sorts to get things into SQL Server, so it is
> overkill if the file is very predictable and/or does not require
> extensive manipulation.
>
> Peace and Grace,
>
>
> -- 
> Gregory A. Beamer (MVP)
>
> Twitter: @gbworld
> Blog: http://gregorybeamer.spaces.live.com
>
> *******************************************
> |      Think outside the box!             |
> ******************************************* 


0
Jon
12/1/2009 6:39:11 PM
"Jon Spivey" <js@nisusnewmedia.com> wrote in
news:u3U3RWrcKHA.5472@TK2MSFTNGP02.phx.gbl: 

> Thanks for your reply. I've gone with a streamreader as you suggested,
> tested with the 1st csv and it worked perfectly. Going to test it a
> few more times to be sure but it certainly seems to be the solution.

The stream only has the overhead of the buffer, so it works very well when 
working with data that can be streamed. When you are working with files, 
you generally work one row at a time, so it is a perfect solution in the 
cases where you are simply grabbing records and putting them elsewhere.

Peace and Grace,

-- 
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
|      Think outside the box!             |
*******************************************
0
Gregory
12/1/2009 10:02:36 PM
Reply:

Similar Artilces:

Opening a file in a macro (network)
I have a macro that requires opening a file in My Documents. This may be a networking question that may require onsite help but what I was wondering is if there is a way in the code to specify to open a file in My Documents from "the computer you're on" for example (not sure how to word that). I'm trying to avoid having to create a macro for each computer because I will have the same file stored in My Documents on each computer. Does that make sense? For example, the code right now reads (for the computer I am currently on - "Blinds etc" is how the computer i...

Inserting files from outlook
Does anyone know how to insert files or attachments from an outlook mail box to access? I would appreciate any help.. Thanks, Terry Carroll Terrence Carroll wrote: >Does anyone know how to insert files or attachments from an outlook mail box >to access? I would appreciate any help.. > >Thanks, > >Terry Carroll Why not just leave them in Outlook and link to them? You can link to your Inbox in Outlook by using File, Get External Data, Link... and choose Outlook as the file type and then navigate to whatever mail folder you want. -- Message posted...

Backup fails at STM file
We are having problems with our backups on exchange 2003. we have 8 seperate stores and just recently we noticed that our tlogs were not being purged. so we after some digging and testing (removing and or disabling all anitvirus and removing our Netvault backup system). We used NTbackup and we have about 4 of our stores not backing up, it fails when it starts to backup the particular sotres STM file. Any ideas on how to check the integrity of teh STM file, or is the problem somewhere else? Thanks Check your event logs. Do you have any errors listed indicating database corruption? ...

Resources in a .LIB File
I have some code logic, included a number of resources, that I'd like to compile to a .LIB file to make it easy to use in several different projects. I haven't compiled a .LIB file since the DOS days. Does anyone know if .LIB files can contain resources in addition to code, and if there are any potential issues with storing code and resources this way? Thanks. -- Jonathan Wood SoftCircuits Programming http://www.softcircuits.com > I haven't compiled a .LIB file since the DOS days. Does anyone know if > .LIB files can contain resources in addition to code, and if there ...

embedding flash file
Hi every one, I would like to embed a flash file in my excel document so that when I open the file I can view the player playing the clip automatically. Can some one please advice me how to do this? Thanks yakis Look at Insert | Object | From File I believe that with current restrictions on ActiveX objects, you may have to double-click on the displayed object to actually begin playing it. At least I did in the test I performed, inserting a .avi file into a sheet. "yakis" wrote: > Hi every one, > I would like to embed a flash file in my excel document so that when I open...

Copy from another excel file
I copy cell range A1:H1 from excel file X to cells range A1:H1 in excel file Y. But then, when I simply try to copy cells range A1:H1 in file Y to A2:A199,H2:H100 to create the same connection of cells in Y with cell in X it does not copy appropriately. in other words, it still copy range A1:H1 ..and shows copied cells as $A$1:$H$1.... when i tried to eliminate $ sign, it showed in the cell the formula-link itself instead of actual number how to make parallel copying from another excel file work ? Hi not really sure what your issue is? normally it sould be sufficient to enter = in a s...

Transferring Mail File
I want to transfer my email folders from my Windows XP computer to my Vista computer. The mail folders have a .dbx extension. I found their path in XP which uses Outlook Express and copied them to a thumb drive but i can't seem to get the folders onto the Vista computer which uses Windows Mail.. I found the path of Windows mail and I tried to copy the files to this folder but that didn't work. I have many emails in a "saved mail" folder and I really would like to get these emails moved to my Vista Windows Mail computer. Can you help? Thanks ...

Your files are attached and ready to send with this message.
WHILE SENDING FROM A WEB PAGE "FILE - SEND - LINK BY EMAIL. THE FOLLOWING MESSAGE IS ENTERED IN MY MESSAGE. HOW DO YOU STOP THIS. "Your files are attached and ready to send with this message." Delete it --� Milly Staples [MVP - Outlook] Post all replies to the group to keep the discussion intact. Due to the Swen virus, all e-mails sent to my actual account will be deleted w/out reading. After searching google.groups.com and finding no answer anonymous@discussions.microsoft.com <anonymous@discussions.microsoft.com> asked: | WHILE SENDING FROM A WEB PAGE "FILE...

shortcut files
Hi...I'm using Excel 2000 and I have a new hard drive (in addition to my main drive). Since we installed the new drive and my data was moved over, none of my shortcuts to my excel files work (even though I redirected them). Excel opens, but the file does not. In fact, if I double click on the file it will not open. There are only two ways to open the file. One is through the File Menu when Excel is open, and the other is to drag the file icon onto Excel. This is very irritating! Any ideas? Thanks. Jay The standard fixes for this problem....... Tools>Options>General u...

Personal Files #2
I moved a personal folder from a machine to a network so the machine could be reloaded. After reloading the software I copied the file back but when I open it in Outlook it looks like a new folder with just deleted items. The file still shows about 750mb in size but it's content will not display. Anyone have any ideas? Thanks ...

outlook .pst files appear empty
I have been trying to open, import, and even restore backup for a .pst file I made before reinstalling Windows on my computer. The import seems to work correctly but nothing changes. What is happening. I tried to install them on another computer running the same version of Outlook and the same thing. The file size is large enought to tell me something is there but........I don't know what to do. So if you open the files in Outlook using File | Open | Outlook Data File (command may vary if you're using an older version), you still get empty files in your Folder List? -- Jo...

OL2000 .nick file and fresh install of OL2003
Have a user who's been running OL2000 under W98SE for the past few years, who extensively uses the autocomplete function. I have brand new pc's coming in that will all be loaded with XP and OL2003. Will do the ..pst file copy off the old system and then do the .pst copy onto the new system, etc, etc. Is it possible to utilize any of the cached infro from the .nick of the OL2000 system on the OL2003 install? I know you move from a .nick format to a .nk2 on the change, but not clear if you can save any of the old info, or if the user will have to start on a fresh build of cached nicknam...

Re: Remove file from cache (VC++ 6)
I am writing a program that writes a backup file to a location selected by the user (potentially a diskette). Once the backup file is written, I would like to re-read the file to ensure that the backup media is OK. The problem is that the file just written is still located in the file cache and any attempt to read it just reads it from the cache (not the media itself). I would like to know if there is a way to remove this file from the cache prior to reading it to ensure the file on the media is good. Thanks in advance for any info you can give me on this. - Robert "Robert Ed...

Sheet.xlt and Book.xlt File Locations
I'm running XP with Excel 2002 SP2. I find one set of sheet.xlt and book.xlt files in a folder named c:\mark when I search all of my c: drive (inlcuding hidden and system folders). The default sheet template contains a footer I'd like to modify. If I modify the only sheet.xlt file on my hard drive and save it to the same place (or any of the many "\template" or "\xlstart" folders scattered across my PC) there is no change in the default sheet when I open a new instance of Excel. Any suggestions to get control of the system? Thanks, Mark If you open a new wor...

Error Message when I try to open Excel files
Just got a new computer with Windows XP Home Edition. The old compute was Windows 98. Whenever I try and open any of my Excel 9 spreadsheets, I get error message: "A document with the same '******.xls' is already open. You canno open two documents with the same name, even if the documents are i different folders. To open the second document, either close th document that's currently open, or rename one of the documents." If I click OK the spreadsheet is ready. It apparently wants to open the same file twice. Any suggestions a this is becoming aggrevating and somet...

Why can't I convert quatro pro .wb3 file in Excel #2
I have trid everything suggested in Excel and on-line but can't convert my *.wb3 file in Excel 2003. Downloaded the Office converter file and ran it, still nothing appropriate in add-ins. What am I doing wrong? Thanks, Tom ...

No Log Files can be Truncated-Exchange 2003 Backup
Hi all, I recently did a full backup of my A and B information Stores in which A o went fine and truncated all the logfiles but on the B information store it said in the event viewer that "No log files can be truncated and all the log files are there. Any ideas what is going on and what can be done. -Aman On Feb 5, 4:04 pm, Andy David {MVP} <ada...@pleasekeepinngcheesebucket.com> wrote: > On 5 Feb 2007 15:51:25 -0800, "Barundi" <pan...@gmail.com> wrote: > > >Hi all, > > >I recently did a full backup of my A and B information Stores in which ...

How can I locate links in an excel file and fix if incorrect?
I have a large excel file with some embedded links to other files. I am trying to find a way to locate the links and make corrections as the files they were linked to need to be changed. Any thoughts on how to locate them quickly, correct them and test to make sure links are gone? Download Bill Manville's FINDLINK.XLA from Stephen Bullen's website. http://www.bmsltd.ie/MVP/Default.htm Dave; Thanks for the suggestion. Went to the site, downloaded the file looked for the links with "find links" and got a answer that no links were found, even though when you select ed...

Excel file association slow on opening worksheets
Hi, One of my client's PCs has this problem. If Excel is opened first and a saved worksheet is opened, it opens fast, But, if the file (.xls) is opened from explorer (or a shortcut), then Excel takes about two minutes to open and then the document is there. I've reset the file association, with no effect. Anyone any ideas ? Regards, Brian Google finds: http://www.mcse.ms/message2515857.html Slow opening Excel files through explorer http://www.mcse.ms/message1229458.html Slow Opening Files, Explorer Slow, Context click slow FIXED!!! ...

Text spreadsheet becomes too large
I was sent an excel file by a member of our accounting dept. and while it is not very large at all it has grown to 134 mb even though it contains no graphs, imbedded objects, special formatting or anything of that nature that would cause it to grow this large. Does anyone know how this can happen. The file has 3 tabs and only uses about 20-30 rows on each tab with nothing other than text but it has grown to 134 mb. Is there something behind the scenes that could be causing this or anything I can look at to find out why this file has become so large. Thanks! Hi see: http://www.con...

can i insert a pdf file as a comment?to appear only when clicked?
the same way one can enter a comment in excel and a viewer can only see that if they click on that particular cell-can i enter a pdf file the same way? if someone should click on a certain cell, a pdf file will pop up? You mean like using a hyperlink to the PDF file? dl wrote: > the same way one can enter a comment in excel and a viewer can only see that > if they click on that particular cell-can i enter a pdf file the same way? if > someone should click on a certain cell, a pdf file will pop up? ...

Outlook Data Files being automatically created
Hello, I have a user who has about 7 PST's and only created 1. We have been able to delete the other 6 in the past, but they eventually come back without any user creation. This is Outlook 2002, and on an Exchange 5.5 server. Any ideas on how to stop this? He is the only one with this issue in an environment with over 100 users. Thanks, Jeff Jeff <anonymous@discussions.microsoft.com> wrote: > I have a user who has about 7 PST's and only created 1. We > have been able to delete the other 6 in the past, but they > eventually come back without any user creation. ...

OFX FILES and SMALL BUSINESS ACCOUNTING
I have 2 questions: Does anyone here know if there is a group for Small Business Accounting? I can't find it. I download transactions into Money 2006 by selecting manual downloads. My bank is sends .OFX formatted data which updates the transactions. I also Use Small Business Accounting. My bank does not support Small Business Act. Nor can they provide me an .ofx file. They suggest capturing the .ofx download into Money, saving it, so it can them be uploaded into Small Business Accounting. Does anyone have anyone an idea how to do this? I have not been able to sort anything out or c...

How can I open a text file to the screen
Can anybody tell me how to open (print)a text file to the screen. For example, to click a botton and the file will be shown in the screen. Using something like the code below.(the code below is not good, but I want to give you an idea) private: System::Void ShwCrdBtn_Click(System::Object^ sender, System::EventArgs^ e) { FileInfo^ fi = gcnew FileInfo("C:\\MyFile.txt"); fi->OpenRead(); } -- Thanks Allen Basically, there are several things you can do depending on exactly what you want the result to look like. If the only thing you want to be vis...

Large Quantity of Data, Graphed in Time Intervals
I have a very large set of data (over 20,000 points) listed in minute intervals. I'd like to be able to create charts using different time intervals such as 5 minutes, 15 minutes etc. by using a pull down menu to select the interval and have the graph make itself. Is this possible? You can use a pivot table to group times by such intervals. - Jon ------- Jon Peltier, Microsoft Excel MVP Tutorials and Custom Solutions http://PeltierTech.com _______ "Kirsten" <Kirsten@discussions.microsoft.com> wrote in message news:811D6E3E-C457-4D36-BA47-1C835D6D670D@microsoft.com....