Chunking out data from a huge XML file (Ajax)

Hi,
I am faced with quite a challenge.  I need to open a 70-100 MB file and be 
able to chunk it out to the client using AJAX, but that isn't really my 
problem.  What I need to do is open the file and get pieces of it out without 
loading the entire thing into memory.  The pieces themselves are random, 
although of a fixed size.  If I read the entire file into a string 
and parse pieces out, I use too much memory.  If I use XmlTextReader and its 
Skip method, my memory problems are solved but it creates a huge 
performance issue.  If I don't have to, I don't want to write my own parser 
and try to read and track the tags byte by byte.  Basically, I would love to 
be able to go to a point in my file and read from it, saving only the position 
of my cursor in the file (but not the actual cursor, because of session 
sharing issues).

Any help would be greatly appreciated, and I apologize if this double posted. 
I asked this question this morning but I haven't seen it come up in the past 
few hours, so I am trying again.



thanks
adhag (5)
1/3/2006 6:47:37 PM

Maybe XML is just not the right storage. If you have a way to preprocess 
your data and store the pieces in a data store that can index them 
efficiently (a database, smaller XML files indexed by a directory structure, 
etc.), you will be much better off.
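
As a rough illustration of the preprocessing idea (a sketch only, written in Python rather than .NET, and not taken from this thread): stream the big file once, store each piece as its own row in an indexed table, and serve later requests from the table. The element name `record`, the table name `pieces`, and both helper names are hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def build_store(xml_source, conn, tag="record"):
    """Stream the XML once and store each <tag> element as its own row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pieces (n INTEGER PRIMARY KEY, xml TEXT)")
    n = 0
    # iterparse reads incrementally instead of building the whole tree up front.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == tag:
            elem.tail = None  # do not store whitespace that follows the element
            conn.execute("INSERT INTO pieces VALUES (?, ?)",
                         (n, ET.tostring(elem, encoding="unicode")))
            n += 1
            elem.clear()  # drop the element's content once it has been stored
    conn.commit()
    return n


def get_pieces(conn, start, count):
    """Return `count` stored pieces beginning at piece number `start`."""
    rows = conn.execute(
        "SELECT xml FROM pieces WHERE n >= ? AND n < ? ORDER BY n",
        (start, start + count))
    return [row[0] for row in rows]
```

The one-time pass still has to read the whole file, but every request after that is an indexed lookup, and the only state the client has to send back is the piece number.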

Writing your own parser won't help. The .NET classes benefit from man-years of 
work, so you'll have to work really hard to get better perf (and you can 
only expect a small percentage, not a major factor).

My 2 cents.

Bruno


bjouhier (10)
1/4/2006 8:59:35 PM
The thing is that the files reside in XML format on disk at a given location. 
The size is variable but can be very large.  This I cannot change.  What I 
need to do, though, is pull back parts of the file without having to load the 
whole file into memory. You can do that with XmlTextReader, but I cannot 
take the performance hit of skipping nodes until I pass the ones already sent 
to the client, because the file could get very, very large.  If I were to parse 
something myself it would be a painful process, but I do know the opening 
tag's name, so I would have to look for it by pattern matching on bytes.  
There has to be a better way than this, though.
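
For what it's worth, the byte pattern matching described above could look roughly like the sketch below. It is in Python only for illustration, and the tag name `<record`, the function name, and the chunk size are all hypothetical. It reads the file in fixed-size chunks and carries a short overlap between chunks so that a tag split across a chunk boundary is still found. It assumes a byte-oriented encoding such as UTF-8, and it is a naive search: it would also match `<records` or the same text inside a comment or CDATA section.

```python
import io


def find_tag_offsets(stream, open_tag=b"<record", chunk_size=64 * 1024):
    """Return the byte offset of every occurrence of open_tag in stream."""
    offsets = []
    tail = b""   # trailing bytes of the previous buffer, kept as overlap
    base = 0     # file offset of the first byte of `tail + chunk`
    keep = len(open_tag) - 1
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(open_tag)
        while pos != -1:
            offsets.append(base + pos)
            pos = buf.find(open_tag, pos + 1)
        # The overlap is shorter than the tag, so no match is counted twice.
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
    return offsets
```

Only one chunk plus a few overlap bytes are held in memory at a time, so memory use does not depend on the size of the file.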

Thanks.

adhag (5)
1/4/2006 10:22:02 PM

There is no free lunch. If you want fast indexing, you have to organize your 
storage so that it can be indexed efficiently, and XML is just not designed 
for fast indexing from a file (without loading the data into memory).

As I said before, you have to use some other indexing mechanism (a database 
or directories, or a hashtable where you index the end offsets of the XML 
fragments, or something else) but if you only have the big XML file on disk 
and don't want to do any kind of preprocessing to build an index on it, 
there is not much hope!
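
To make the "index the end offsets" option concrete, here is a minimal sketch (Python for illustration only; the function name and parameters are hypothetical). It assumes a one-time preprocessing pass has already produced `first_start`, the offset where fragment 0 begins, and `end_offsets`, where `end_offsets[i]` is the offset just past fragment `i`. Serving a request is then a seek and a single read.

```python
import io


def read_fragments(stream, first_start, end_offsets, first, count):
    """Read fragments first .. first+count-1 using only seek and read."""
    last = min(first + count, len(end_offsets)) - 1
    if first > last:
        return b""  # request starts past the last fragment
    # A fragment starts where the previous one ends.
    start = first_start if first == 0 else end_offsets[first - 1]
    stream.seek(start)
    return stream.read(end_offsets[last] - start)
```

The client only has to send back the number of the next fragment it wants, so no open reader or cursor needs to live in the session. In .NET the same shape would be a `FileStream.Seek` followed by a read. Any whitespace between fragments ends up at the start of the following fragment, which is harmless for XML.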

Bruno



bjouhier (10)
1/5/2006 10:23:00 PM
There is a new XML processing model/API that may work for the case
you described. It is called vtd-xml (http://vtd-xml.sf.net). It consumes
less memory (5x less than DOM), retains random access, and is 10x
faster than DOM. It is perfect when you want to grab a chunk of XML
out of the file (is that what you mean by chunking?). A demo is at
http://vtd-xml.sf.net/demo.html

1/8/2006 10:22:08 PM
This is interesting info, but according to the link, "Its memory usage is 
typically between 1.3x~1.5x the size of the XML document, with 1 being the 
XML itself". So the 70-100 MB file will use roughly 90 to 150 MB of memory 
(1.3 x 70 MB up to 1.5 x 100 MB). Not sure this is the answer.

Bruno.

<jzhang@ximpleware.com> wrote in message news: 
1136758928.389315.93840@o13g2000cwo.googlegroups.com...
> There is a new XML processing model/API that may work for the case
> you described, it is called vtd-xml (http://vtd-xml.sf.net) it consumes
> less
> memory (5x less than DOM ) and retains random access and is 10x
> faster than DOM. It is perfect when you want to grab a chunk of xml
> out of the file (is that what you mean by chunking) a demo is at
> http://vtd-xml.sf.net/demo.html
> 


bjouhier (10)
1/9/2006 7:00:40 PM
Reply:

Similar Artilces:

How to transform HTTP query string (HTML data) to xml?
Dear experts, I need to send a simple HTML form to an ASP.NET page which has to create a xml object. What technique would you recommend me to use in order to transform a HTML form data to xml? What naming convention for the input fields would be best so that I can easily parse the query string in the asp page and create the xml object? I also dispose of the .xsd schema of the xml that must be created if that could be of some use. At every request the .xsd could change so I need some general algorithm. I would greatly appreciate your help. ...

Quicken file conversion problems
I'm trying to test out the new 2004 by importing my quicken file throught the initial Wizard. Only problem it I get the following error message: Your Quicken file could not be converted. Money could not convert your Quicken file. You might have run out of disk space or system memory. Try closing other programs and making sure the disk you are copying your file to has enough space. Then try converting the file again. Can anyone help? TIA In microsoft.public.money, Scott wrote: >I'm trying to test out the new 2004 by importing my >quicken file throught the initial Wiz...

Distribution Files for MFC in VS .net 2003
I have an app that was developed under Visual Studio VC++ 6.0 & MFC and deployed at our customer site. I have since installed Visual Studio .NET 2003 and rebuilt the application under the new development environment. I wanted to do a quick test of the executable and copied it up to a test machine and tried to run it. I got a series of "unable to find xxx.dll" errors. I realized that none of the new DLLs were installed on the test machine. Does anyone have a pointer to a good document on what distribution files are required? I know about the obvious ones like MFC71.DLL...

Save formatted text from RichEdit control to rtf-file
Hi , How can I save the text from Rich edit control (2.0) to *.rtf , *.txt , *.doc I tried to get the buffer and putting the buffer to file, then saving the file but the text in the file is something different. Please let me know what to do? Here is the Code I ma using: mFile.Seek( 0, CFile::begin ); CString cBuffer2; int iTotalTextLength = m_oChatMessageControl.GetWindowTextLength(); HWND focusWnd = ::GetFocus(); m_oChatMessageControl.HideSelection(TRUE, TRUE); m_oChatMessageControl.SetSel(iTotalTextLength, iTotalTextLength); cBuffer2 = m_oChatMessageControl.GetSelText(); LPTSTR...

file cloning
I was wondering. How come it is possible to clone a file (using right click copy/paste file), but not possible to do this for other documents (apps and clip-art etc)?. Could it be possible to have an add-in in excel to prevent people from copying documents on their desktop?? -- shnim1 You can copy files that way (rightclick|copy, rightclick|paste). But most windows applications are no longer just simple .exe files (like back in the old DOS days). They usually have tons of other stuff that gets installed with them--and that stuff gets scattered all over your harddrive (windows folder, wi...

converting tabular structures in a Word document into an actual table or reading data from the tabular structures using VBA code
I have a macro which can read the last cell/column of all tables in a Word 2003/2007 document and store the data in an MS-Access table. But, some Word documents have the data in structures like a table format but are not actually tables. The structure looks like a table, but the table borders are actually line connectors. These documents were created by a software(VeryPDF PDF to Word converter) which converted the PDF documents(the original format these documents were) into Word documents. 1. Is there a way I can convert/replace the tabular structures with actual tables in Word so t...

Twist Data?...
Hi all... I've got the following in a dataset: 19 30 1200 FI FI 20030906 36 19 30 1324 FI FI 20030906 36 and I want the following result: 19 30 1200, 1324 FI FI 20030906 26 Any guess on how to do this? thanks much!... Hmmm... Not quite clear: The data is arranged in columns, I assume It looks as if both items in the same column are the same, you want just one item, otherwise the one in the top row first, then the one from the bottom row. But what about the last 36s that should yield 26? Typo? Does the dataset have more columns or more rows or both? And how big is it?...

moving payables data from open to history
Hello: A client says that someone imported data about a year or two ago into Great Plains from their AS400. Many payables documents that were imported should have been coded during the import as open, instead of history. The client knows that she can take care of this herself within two hours, by simply turning off the posting to the GL and entering and posting the payables documents to move them to history. But, she is wondering if there is a quick and easy way to do this on the back-end. I'm familiar with the open and history payables tables within GP. And, I know through a T...

merging 2 cells without losing data?
How can I merge 2 cells without losing data from the other cell? Hi Bob Not possible I'm afraid. Try placing the dat from both cells into one and use "Center across selection" under Format>Cells>Alignment Merge cells always end up causing grief. they are best avoided. ***** Posted via: http://www.ozgrid.com Excel Templates, Training & Add-ins. Free Excel Forum http://www.ozgrid.com/forum ***** "bob" <bobree@hotmail.com> wrote in message news:%23JuOM9HGEHA.2308@tk2msftngp13.phx.gbl... > How can I merge 2 cells without losing data from the other...

Problems Converting Data from Quicken 2001 Deluxe to MS Money
Hello, I have a relatively new Compaq Desktop (2.5 GHz Celeron with 512 MB RAM). I have a Viewsonic Pocket PC and I wanted to use it to track my financial data so I purchased Money 2003 Standard. I tried several times to convert my Qucken Data (it's a big file--I've been using Quicken since 1995). My Quicken program is Quicken 2001 Deluxe. Anyway, the MS Money program started to convert and after a few minutes said: "Your Quicken file could not be converted. Money could not convert your Quicken file. You might have run out of disk space or system memory. Try closing othe...

Trying to create an Update query based on HR data to find upline V
Hi All, looking for some advice. I have an HR table that contains employee information but does not contain management chain info. Basically i am trying to determine who the employees upline VP is. The fields i have to work with are [Employee Name], [Manager Name] and [Job Title]. I figure the logic would be to check the employees' manager and if the manager is a VP (based on job title), return the manager's name to a field called [VP]. If the manager is not a VP then check that manager's manager, so on and so forth until a VP is found. Any ideas would be much appr...

Transformation of data into columns
Hi, I have the data from a flattened spreadsheet in a table in the following form: f1 f2 f3 period to: Scheme1 Scheme2 31/01/2005 Net Gross 28/02/2005 Net Gross 31/03/2005 Net Gross 30/04/2005 Net Gross 31/05/2005 Net Gross 30/06/2005 Net Gross 31/0...

How can I sort duplicate text data in excel?
I have a large list of noames that I need to make sure that none of them are duplicated. Is there a way to have excel check it quisker than me reading every name until I find a duplicate? After selecting your data go to filter Advanced filter and check "Unique records only" You can even copy it to another area all uniques entries if you want to ... "TinaScheu" <TinaScheu@discussions.microsoft.com> wrote in message news:0399D580-7E69-4DF0-A969-E7FC5F777C70@microsoft.com... >I have a large list of noames that I need to make sure that none of them >are >...

retrieving folders.old file
probably been posted before, but need some help. i was getting the "MSIMN has caused an error in directdb.dll" i found the solution by renaming the folders.dbx file to folders.old. here's the problem, i opened express back up and my sent folder was empty. my question is, how or can i retrieve that old sent message list?? ...

Insert,Update Data in sage (MS Access Linked tables) using Vb.net form
Hi folks, I am developing application using vb.net which requires integration with SAGE LINE 50 (Accounting software ) V11... The data which SAGE is using is MC ACCESS 2003 database... with linked tables in it... Now I Have developed the Sage connection using ODBC which works fine when reading the record but cannot Add or Update record into the Linked tables.... When i debug the program the error is at the line where it has... <br> MyodbcCommand.ExecutenonQuery() <br> Can anybody Help ????? -- Message posted via AccessMonster.com http://www.accessmonster.com/Uwe/Forums.aspx/acce...

how to build the netsample ipconfig to the exe file?
C:\WINCE500\public\common\oak\drivers\netsamp\ipconfig\ipconfig.cpp i want to make ipconfig.exe. and i could found the sample code. but it source code builded to the lib file. in ipconfig sources files, TARGETNAME=ipconfig TARGETTYPE=LIBRARY SOURCES= \ $(TARGETNAME).cpp the project is .lib file. but the source code has a _tmain() funciton in the source. it's looks possible to compile to the exe file. how can it compile to the exe file? You can add the SYSGEN_NETUTILS and you will have the ipconfig.exe integrated to your OS. Search ipconfig in the cat...

Data Validation List not showing
I'm using Excel 2003. My data validation lists have stopped working on one sheet in my workbook. It is working on all other sheets. I have googled the problem and found the following advice: 1. Make sure freeze panes is off.... check. 2. Select "Show All" under Tools->Options->View->Objects ... check. The problem remains. Any ideas? "Stopped working" doesn't do much to describe your problem. When you select one of the "stopped working" Data Validation cells, do you see the drop-down arrow to the right of the cell? When you select t...

How do I export Lotus Approach files into an Excel spreadsheet?
I need to export data from Lotus Approach to Excel; please help. I am using an old version of Lotus SmartSuite 9.5 and I have Microsoft Office 2003 Basic. Well, I don't know Approach at all but is there a common file format that both use e.g. comma delimited. If so , save in that format from Approach and import into Excel. "LEWOLF" wrote: > I need to export data from Lotus Approach to Excel; please help. I am using > an old version of Lotus SmartSuite 9.5 and I have Microsoft Office 2003 Basic. ...

outlook can't receive exe files
A guy here at work can't get exe files through his outlook. Is there a check to uncheck somewhere to allow it to do this? He can receive normal attachments. ...

Need code snippet to read offline PST file
Hi friends, I have a PST file in my local hard disk and have requirement to read PST file and parse through all folders and then each message item in all folders and then segregate them to different folders based on subject line. Please kindly send the code for the above requirement. Thanks & Regards Ramesh -- ramserp You're going to have to write your own code. Do you know anything about Outlook programming at all? You can start out by looking at information and code samples at www.outlookcode.com. -- Ken Slovak [MVP - Outlook] http://www.slo...

Excel 2000 vs. Excel 2002
I am having troubles with a workbook that I created that is havin problems opening. I created it in 2002, and it opens fine in Excel 2002 for other people However, when I send it to someone who has Excel 2000, it takes over a hour to open. Now I also made a very similar report that works just fine whe trasferred to excel 2000. Here are a couple of stats on the workbook that is having problems: 1.5mb 500+ externel links 500+ subtotals 200+ simple calculations (a1+b1; a1/b1;etc..) 1 Worksheet in the book. 2 columns with conditional formatting Thanks, Joh -- Message posted from http://ww...

Any FREAKIN' way to import DBX files into Outlook 2003
I've tried: Importing via Outlook | Import from another Program or File Importing via Outlook | Import Internet Mail and Addresses Exporting from Outlook Express Tried Many, many times... Can Microsucks make this any more complicated... It's a FREAKIN' DBX file collection NO - No other Application has it Open. YES - The Files ARE there YES - the Internet Account IS there Yes - I've wasted more of my time IMPORTING into Outlook Express in VISTA just to RE-EXPORT back to Outlook. What a bunch of freakin' idiots... Another 2 hours wasted - because one Microsoft applica...

Opening .prn files in XL2000
I am using a software that does not save data/reports in .csv or .xls formats; only in printed versions. Is there a way to save the printed report in a file and open the file in XL2000? If there is, how is the print file produced, where is it saved, etc? A friend suggested setting up a generic printer but didn't know how to go about it. You may want to give that other software just one more chance--look under File and see if there is a SaveAs option. You may find something upon further review. But if you want to add a generic printer, I think it'll depend on your version of win...

Adding extra data options
Is there a way to customize CRM to allow for adding another heading? I would like to add a second field similar to topic and would like to call it type. Can you add extra data fileds and types in CRM 3.0? You can add extra data fields to an entity. Go to entities customization at setting area. -- Marco Amoedo Plain Concepts http://geeks.ms/blogs/marco/ "xxdcmast" escribió: > Is there a way to customize CRM to allow for adding another heading? I would > like to add a second field similar to topic and would like to call it type. > > Can you add extra data ...

Import reg files into Registry, without UAC
Hi, I've got a .reg file to be imported silently in a batch. The file contains only entries in HKEY_CURRENT_USER, therefore can be imported without elevation. This works well with the regedit /s switch on limited accounts, however on admin accounts, elevation UAC prompt is still shown, even though it's not needed. How can I prevent this? Thanks, Jens Jens M�ller wrote: >Hi, > >I've got a .reg file to be imported silently in a batch. The file contains >only entries in HKEY_CURRENT_USER, therefore can be imported without >elevation. >...