Stop and Resume parsing of large XML file

Currently I am using XmlReader (but I am open to other options) to parse an 
XML file, and I would like to be able to stop/break the current parse 
(simple enough) and then resume it later (say after a reboot). Is there any 
way to get the current location in the file that the XmlReader has reached 
so as to be able to restore that and start from that point later?

TIA. 

Brian
5/27/2008 12:22:31 PM
dotnet.xml

Brian Cryer wrote:
> Currently I am using XmlReader (but I am open to other options) to parse 
> an XML file, and I would like to be able to stop/break the current parse 
> (simple enough) and then resume it later (say after a reboot). Is there 
> any way to get the current location in the file that the XmlReader has 
> reached so as to be able to restore that and start from that point later?

I don't think so. If you have an underlying stream you could store the 
stream position but I don't know of any way to store and restore the 
state of the XmlReader.


-- 

	Martin Honnen --- MVP XML
	http://JavaScript.FAQTs.com/
mahotrash (1778)
5/27/2008 12:34:46 PM
"Martin Honnen" <mahotrash@yahoo.de> wrote in message 
news:erugPY$vIHA.4912@TK2MSFTNGP03.phx.gbl...
> I don't think so. If you have an underlying stream you could store the 
> stream position but I don't know of any way to store and restore the state 
> of the XmlReader.

I'm not too worried about the "state" of the XmlReader (I might be when I 
get there but for now I'm assuming if there are any issues that I'll be able 
to work round them).

I've looked at storing the stream position, but it's evident that the 
XmlReader reads in a buffer load, because my stream position is at about the 
8KB mark when I get to the first tag in the XmlReader.
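
The read-ahead described here is easy to reproduce. As a rough sketch (in Python rather than .NET, because its standard-library expat binding exposes a byte offset; the `<items>`/`<item>` sample data, the helper name `first_item_positions` and the 8192-byte chunk size are assumptions made for illustration), the snippet below compares how far the underlying stream has been consumed with where the parser actually is when the first record is reported. It says nothing about XmlReader's real buffer size.

```python
import io
import xml.parsers.expat

# Hypothetical sample data: a long, shallow document of <item> records.
doc = b"<items>" + b"".join(
    b'<item id="%d">payload</item>' % i for i in range(2000)
) + b"</items>"

CHUNK = 8192  # assumed read size, mirroring the ~8 KB seen in the thread


def first_item_positions(data, chunk=CHUNK):
    """Return (stream position, parser byte index) at the first <item>."""
    stream = io.BytesIO(data)
    parser = xml.parsers.expat.ParserCreate()
    found = []

    def on_start(name, attrs):
        if name == "item" and not found:
            # stream.tell() is how much has been pulled into the buffer;
            # CurrentByteIndex is where the parser's current event starts.
            found.append((stream.tell(), parser.CurrentByteIndex))

    parser.StartElementHandler = on_start
    while not found:
        block = stream.read(chunk)
        if not block:
            break
        parser.Parse(block, False)
    return found[0]


stream_pos, parser_pos = first_item_positions(doc)
print(stream_pos, parser_pos)
```

With this sample data the stream has already been consumed to the end of the first chunk while the parser is still at the start of the first `<item>`, which is why the stream position on its own is not a usable checkpoint.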

Ahh ... Martin, you've been a great "sounding board". Knowing that the 
XmlReader doesn't provide any way of doing this is useful. But thinking 
about it, if the XmlReader reads in 8KB chunks (an assumption on my part, 
but one which I ought to be able to test) then as a way of "restoring" I may 
be able to get away with simply putting my read point 8192 bytes before the 
last known position in the underlying stream and then deal with any errors 
that get thrown up when XmlReader hits what it thinks is malformed XML. Bit 
yucky, but this might work for me (if XmlReader will play ball). At least it 
gives me an avenue to explore.

TA.

Brian
5/27/2008 1:19:30 PM

"Brian Cryer" wrote:

> I'm not too worried about the "state" of the XmlReader (I might be when I 
> get there but for now I'm assuming if there are any issues that I'll be able 
> to work round them).
> 
> I've looked at storing the stream position, but it's evident that the 
> XmlReader reads in a buffer load because my stream position is at about the 
> 8KB mark when I get to the first tag in the XmlReader.
> 
> Ahh ... Martin, you've been a great "sounding board". Knowing that the 
> XmlReader doesn't provide any way of doing this is useful. But thinking 
> about it, if the XmlReader reads in 8KB chunks (an assumption on my part, 
> but one which I ought to be able to test) then as a way of "restoring" I may 
> be able to get away with simply putting my read point 8192 bytes before the 
> last known position in the underlying stream and then deal with any errors 
> that get thrown up when XmlReader hits what it thinks is malformed XML. Bit 
> yucky, but this might work for me (if XmlReader will play ball). At least it 
> gives me an avenue to explore.
> 
> TA.
> 
> 

It seems like a lot of work to go through, and likely prone to errors due to 
machine dependencies.  How are you persisting the part that was read before 
the reboot?  Are you no longer interested in that portion of the XML after it 
has been processed?

5/28/2008 11:25:02 AM
"Family Tree Mike" <FamilyTreeMike@discussions.microsoft.com> wrote in 
message news:60AE9B88-1244-4699-90B2-AF1211FE2941@microsoft.com...
> It seems like a lot of work to go through, and likely prone to errors due to 
> machine dependencies.  How are you persisting the part that was read before 
> the reboot?  Are you no longer interested in that portion of the XML after it 
> has been processed?

Fortunately, in this case the XML file, whilst rather long, is quite shallow. So 
I can forget about what went on before, and if I come across a duplicate 
section (which I will) then I can handle that (because each has a unique 
ID). So, in short, I don't need to worry too much about what went on before 
or the context. So, this isn't a generic solution by any means. (If I were 
processing something like an HTML file then it would get too messy to be 
viable.)
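
One way to sketch the shallow-file resume idea: save the byte offset of the next unprocessed record, and on restart seek there, hand the parser a synthetic copy of the root start tag, and let the unique IDs absorb any records that get replayed. This is Python with the standard-library expat binding, not XmlReader. The `<items>`/`<item id="...">` layout, `ROOT`, `parse_some` and the `seen` set are all hypothetical names chosen for the example, and it assumes UTF-8 input with nothing declared at the top of the file (namespaces, entities) that later records depend on.

```python
import io
import xml.parsers.expat

ROOT = b"<items>"  # assumed root start tag, re-supplied when resuming mid-file


class _Pause(Exception):
    """Raised inside the handler to stop parsing at a record boundary."""


def parse_some(stream, offset, max_records, seen, chunk=8192):
    """Handle up to max_records <item> records starting at byte `offset`.

    `offset` must be 0 or the start of an <item> tag.
    Returns (ids handled in this run, offset to resume from or None if done).
    """
    prefix = b"" if offset == 0 else ROOT
    parser = xml.parsers.expat.ParserCreate()
    handled = []
    state = {"next": None}

    def on_start(name, attrs):
        if name != "item":
            return
        if len(handled) == max_records:
            # CurrentByteIndex also counts the synthetic prefix, so subtract it.
            state["next"] = offset + parser.CurrentByteIndex - len(prefix)
            raise _Pause
        if attrs["id"] not in seen:  # unique IDs make replayed records harmless
            seen.add(attrs["id"])
            handled.append(attrs["id"])

    parser.StartElementHandler = on_start
    stream.seek(offset)
    try:
        if prefix:
            parser.Parse(prefix, False)
        while True:
            block = stream.read(chunk)
            if not block:
                parser.Parse(b"", True)  # signal end of input
                break
            parser.Parse(block, False)
    except _Pause:
        pass
    return handled, state["next"]


# Hypothetical sample data and two "sessions" separated by a restart.
data = b"<items>" + b"".join(
    b'<item id="%d">payload</item>' % i for i in range(10)
) + b"</items>"
seen = set()
first, checkpoint = parse_some(io.BytesIO(data), 0, 4, seen)
second, checkpoint2 = parse_some(io.BytesIO(data), checkpoint, 4, seen)
```

A checkpoint then needs only the returned offset and, if replays are possible, the set of IDs already handled.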

However, all this is still theory at the moment, as other work has pulled me 
away from this. I am hoping to be able to prove whether or not this approach 
works for me either today or tomorrow.


Brian
5/29/2008 9:46:41 AM
"Brian Cryer" <www.cryer.co.uk> wrote in message 
news:u5SuoDXwIHA.1236@TK2MSFTNGP02.phx.gbl...
> Fortunately, in this case the XML file, whilst rather long, is quite 
> shallow. So I can forget about what went on before, and if I come across a 
> duplicate section (which I will) then I can handle that (because each has 
> a unique ID). So, in short, I don't need to worry too much about what went 
> on before or the context. So, this isn't a generic solution by any means. 
> (If I were processing something like an HTML file then it would get too 
> messy to be viable.)
>
> However, all this is still theory at the moment, as other work has pulled 
> me away from this. I am hoping to be able to prove whether or not this 
> approach works for me either today or tomorrow.

In case anyone is monitoring this or wants to do something similar one day: 
I've decided to abandon this approach. It just started to get too messy. 
Since the XML is well structured, I'm going to implement a reader from scratch 
which does exactly what I need.
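
For a file that really is this regular, a hand-rolled reader can be quite small. The following is only a sketch of that idea in Python, not the reader that was actually written: it assumes every record starts with `<item ` and ends with `</item>`, that records never nest, and that neither marker appears inside comments or CDATA. `iter_records` and the tag names are made up for the example.

```python
import io

OPEN, CLOSE = b"<item ", b"</item>"  # hypothetical record markers


def iter_records(stream, offset=0, chunk=8192):
    """Yield (start_offset, raw_record_bytes) for each record from `offset` on."""
    stream.seek(offset)
    buf, base = b"", offset  # base is the file offset of buf[0]
    while True:
        start = buf.find(OPEN)
        end = buf.find(CLOSE, start) if start != -1 else -1
        if end == -1:
            block = stream.read(chunk)
            if not block:
                return  # no further complete record
            # Discard bytes that cannot belong to a future record, keeping
            # a short tail in case an opening marker is split across reads.
            drop = start if start != -1 else max(0, len(buf) - (len(OPEN) - 1))
            buf, base = buf[drop:] + block, base + drop
            continue
        end += len(CLOSE)
        yield base + start, buf[start:end]
        buf, base = buf[end:], base + end


# Hypothetical sample data; a tiny chunk size exercises the buffering logic.
data = b"<items>" + b"".join(
    b'<item id="%d">payload</item>' % i for i in range(5)
) + b"</items>"
records = list(iter_records(io.BytesIO(data), chunk=16))
# Resuming is just a seek to a previously yielded offset.
resumed = list(iter_records(io.BytesIO(data), offset=records[2][0], chunk=16))
```

Because the reader owns the buffering, the offset it yields is exact, so it can be written to disk as the checkpoint and passed back as `offset` after a restart.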

Brian
6/3/2008 1:30:08 PM