XmlReaders and fragments

When reading fragments, it seems like XmlReaders try to read too much.  I'm 
working on a file parser for a new file format, and I've run into a problem. 
The format has an XML fragment for a header, then a (frequently) large amount
of binary data beneath.  In certain situations, there may be XML fragments 
further down in the file.  An example of the file might look like this:

<header>
<name>some name</name>
<size>1000</size>
<stuff>more data</stuff>
<otherstuff>19.2</otherstuff>
</header>...some huge block of binary data...

The header isn't a fixed size.  In writing a little test to parse out the 
header, I ran into what seemed like a really weird decision on the part of 
the XmlReader.  When I try to stream the data through using ReadOuterXml (to 
get the header for processing later), it throws an exception about invalid 
characters MUCH further down in the file.  In other words, the reader gets 
past the end tag of the fragment, then keeps going.  ReadOuterXml is only 
supposed to get the tags and children of the current node, which in this 
case would have been the node labelled "header".

Here's a short program to demonstrate what I mean.

using System;
using System.IO;
using System.Xml;

namespace XmlParseTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string xml;

            using (FileStream file = new FileStream(@"c:\test.xml", FileMode.Open, FileAccess.Read))
            {
                // Treat the stream as an element fragment rather than a full document.
                XmlTextReader reader = new XmlTextReader(file, XmlNodeType.Element, null);
                reader.Normalization = false;

                XmlReaderSettings readerSettings = new XmlReaderSettings();
                readerSettings.ConformanceLevel = ConformanceLevel.Fragment;
                readerSettings.IgnoreWhitespace = false;
                readerSettings.IgnoreComments = true;
                readerSettings.CheckCharacters = false;

                using (XmlReader baseReader = XmlReader.Create(reader, readerSettings))
                {
                    baseReader.MoveToContent();       // move to the <header> element
                    xml = baseReader.ReadOuterXml();  // the exception surfaces during this call
                }
            }
        }
    }
}
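The same behavior can be reproduced without a file on disk. The sketch below is an illustration, not part of the original post: it assumes the header is followed by a few bytes in the range 0x01 to 0x04 (not legal XML 1.0 characters) as a stand-in for the binary payload, and it uses XmlReader.Create with ConformanceLevel.Fragment instead of the XmlTextReader wrapper. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class OuterXmlRepro
{
    // Builds an in-memory stand-in for the file: an XML header followed by
    // bytes 0x01..0x04, which are assumed here to represent the binary data.
    public static MemoryStream BuildSample()
    {
        byte[] header = Encoding.UTF8.GetBytes(
            "<header><name>some name</name><size>1000</size></header>");
        byte[] binary = { 0x01, 0x02, 0x03, 0x04 };
        var ms = new MemoryStream();
        ms.Write(header, 0, header.Length);
        ms.Write(binary, 0, binary.Length);
        ms.Position = 0;
        return ms;
    }

    // Returns true if ReadOuterXml on the header element throws XmlException.
    public static bool ReadOuterXmlThrows(Stream input)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();   // positioned on <header>
            try
            {
                // Consumes </header>, then reads the next node: the "binary" text.
                reader.ReadOuterXml();
                return false;
            }
            catch (XmlException)
            {
                return true;
            }
        }
    }

    static void Main()
    {
        using (MemoryStream sample = BuildSample())
        {
            Console.WriteLine(ReadOuterXmlThrows(sample)
                ? "ReadOuterXml threw XmlException"
                : "ReadOuterXml returned normally");
        }
    }
}
```

On this sample, ReadOuterXmlThrows is expected to return true: the header itself parses fine, and the exception comes from the read that follows </header>.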

Am I missing something?

Lee Crabtree


Lee
1/8/2008 9:27:43 PM
dotnet.xml

Lee Crabtree wrote:
> When reading fragments, it seems like XmlReaders try to read too much.  
> I'm working on a file parser for a new file format, and I've run into a 
> problem. The format has an XML fragment for a header, then a 
> (frequently) large amount of binary data beneath.  In certain 
> situations, there may be XML fragments further down in the file.  An 
> example of the file might look like this:
> 
> <header>
> <name>some name</name>
> <size>1000</size>
> <stuff>more data</stuff>
> <otherstuff>19.2</otherstuff>
> </header>...some huge block of binary data...
> 
> The header isn't a fixed size.  In writing a little test to try and 
> parse out the header, I ran into what seemed like a really weird 
> decision on the part of the XmlReader.  When I try to stream the data 
> through using ReadOuterXml (to get the header for processing later), it 
> would throw an exception regarding invalid characters MUCH further down 
> in the file.  In other words, the reader had gotten past the ending 
> element of the fragment, then kept going.  ReadOuterXml is just supposed 
> to get the tags and children of the current node, which in this case 
> would have been the node labelled "header".

ReadOuterXml(), when positioned on an element, reads everything up to and 
including the end tag and then positions the reader on the next node. If 
there is binary data after the end tag, reading that next node gives you an 
error. I am not sure what you expect: ConformanceLevel.Fragment only means 
there is no requirement to have exactly one root element; it does not mean 
binary data is allowed.
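To illustrate the point about ConformanceLevel.Fragment, here is a minimal sketch (the class and method names are hypothetical): a fragment reader accepts several top-level elements, but a character such as U+0001, which XML 1.0 does not allow anywhere, still makes it throw an XmlException.

```csharp
using System;
using System.IO;
using System.Xml;

static class FragmentDemo
{
    // Counts top-level elements in an XML fragment. ConformanceLevel.Fragment
    // lifts the single-root rule, but illegal XML characters still throw.
    public static int CountTopLevelElements(string fragment)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(fragment), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 0)
                    count++;
            }
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(CountTopLevelElements("<a/><b>text</b>"));
        try
        {
            CountTopLevelElements("<a/>\u0001\u0002");
        }
        catch (XmlException ex)
        {
            Console.WriteLine("XmlException: " + ex.Message);
        }
    }
}
```

The first call returns 2; the second fails on the character right after <a/>.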
If you want to consume an element without the reader ending up positioned 
after the end tag, try whether ReadSubtree() does what you want. It gives 
you a second XmlReader that you can use to consume only that element; once 
you close it, the main reader is positioned on the end tag, not after it.
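A minimal sketch of the ReadSubtree() approach described above, assuming the header is the first element in the stream. The class and method names are hypothetical, and the bytes 0x01 to 0x04 stand in for the binary payload.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class SubtreeHeaderReader
{
    // Reads the first top-level element via ReadSubtree(). Disposing the
    // subtree reader leaves the outer reader on the element's end tag, so
    // the data after </header> is never parsed as XML.
    public static string ReadHeaderXml(Stream input)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            IgnoreComments = true,
            CloseInput = false   // leave the caller's stream open
        };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();                      // on <header>
            using (XmlReader header = reader.ReadSubtree())
            {
                header.MoveToContent();                  // subtree starts on <header> too
                return header.ReadOuterXml();            // stops at the subtree's end
            }
        }
    }

    static void Main()
    {
        byte[] xml = Encoding.UTF8.GetBytes(
            "<header><name>some name</name><size>1000</size></header>");
        byte[] binary = { 0x01, 0x02, 0x03, 0x04 };   // stand-in for the binary data
        using (var ms = new MemoryStream())
        {
            ms.Write(xml, 0, xml.Length);
            ms.Write(binary, 0, binary.Length);
            ms.Position = 0;
            Console.WriteLine(ReadHeaderXml(ms));
        }
    }
}
```

One caveat this does not address: XmlReader reads its input stream in blocks, so after ReadHeaderXml returns, the stream's Position is generally past the end of </header> rather than at the first byte of binary data. Locating the binary data still needs some separate mechanism.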


-- 

	Martin Honnen --- MVP XML
	http://JavaScript.FAQTs.com/
mahotrash
1/9/2008 3:10:46 PM

Similar Articles:

XMLREADER
If I run the following code :- XmlReader rdr = dal.Getxxxx rdr.MoveToContent(); string xmlstring = rdr.ReadOuterXml(); how can I load the string back to a XPathDocument or a XPathNavigator. hmm does anyone know ? David Price david wrote: > If I run the following code :- > > XmlReader rdr = dal.Getxxxx > rdr.MoveToContent(); > string xmlstring = rdr.ReadOuterXml(); > > how can I load the string back to a XPathDocument or a XPathNavigator. hmm > does anyone know ? XPathDocument doc = new XPathDocument(new StringReader(xmlstring)); -- Oleg Tkachenko [XML MVP] h...

Looping through XMLReader skips records
Hi: When I try and loop through the reader using any of the Read methods, I never get the five records returned that I expect. On the SQL Query Analyzer end, my query returns 5 results. Using the code below and the various Read methods, I get 3 and sometimes four values when Debugging. What does this mean? ---------------------------------------------------------------------------- - Dim reader As XmlTextReader reader = objCmd.ExecuteXmlReader Response.Write(reader.ReadState) While (reader.Read) Response.Write(reader.ReadString) End While reader.Close() -----------------------...

i want to kill xmlreader and all her children #2
please reply via email to this post to x0td0x@hotmail.com i need some code that will generate code to read the xml file it is reading for example input: <book> <bk:title>the wind</bk:title> <bk:content type="text">howl</bk:content> </book> output: (xtw=xmltextwriter) xtw.WriteStartElement("book"); xtw.WriteStartElement("bk:title"); xtw.WriteString("the wind"); xtw.WriteEndElement(); xtw.WriteStartElement("bk:content"); xtw.WriteAttributeString("type","text"); xtw.WriteSt...

XmlReader-to-XmlReader skips nodes?!
Hi, I have the following code to copy nodes from an XML document (XmlReader reader) to some output (XmlWriter writer). while (reader.Read ()) if (reader.MoveToContent () == XmlNodeType.Element) break; // forward the reader to the document element writer.WriteStartDocument (false); writer.WriteStartElement ("XmlStore"); // write the document element itself while (reader.Read ()) if (reader.NodeType == XmlNodeType.Element) writer.WriteNode (reader, true); // copy nodes // might want to insert more nodes here writer.WriteEndElement (); /...

XMLWriter/XMLReader vs MSXML 4.0
Hi to all! someone of you know the difference between XMLWriter/XMLReader and MSXML4.0? The first one has the same capacity of the second one? Which is better to use? Which give the clearest messages or the greatest amount of information about the kind and the origin of error (considering that I have to show messages in Italian and the Framework is in English)? And ...is it difficult to include msxml4.dll in my project? Thank you in advance to all. Fede. Fede wrote: > someone of you know the difference between XMLWriter/XMLReader and MSXML4.0? > The first one has the same capacity ...

XmlReader and LineNumber
According to the MSDN documentation within the XmlTextReader class for .NET 2.0, the recommended practice to create XmlReader instances is using the XmlReaderSettings class and the XmlReader.Create() method. However, the problem is, the XmlReader class does not expose certain properties that I need, e.g., LineNumber, LinePosition, etc. I would like to follow Microsoft's recommended practices, but I'm not sure how I can get XmlTextReader functionality out of XmlReader. Should I instantiate an XmlTextReader object and pass this to the XmlReader.Create() method and then access this under...

How to obtain a utf-8 string from an XmlReader?
Hi all, I'm trying to convert the xml obtained from a XmlReader object into a UTF-8 array. My general idea is to read the XmlReader and write into a MemoryStream. Then convert the MemoryStream bytes into utf-8. MemoryStream ms = new MemoryStream(); XmlTextWriter xmlWriter = new XmlTextWriter(ms, new UTF8Encoding(false)); writer.Formatting = Formatting.Indented; writer.Namespaces = false; writer.Indentation = 4; while(xmlReader.Read()) { xmlWriter.Write(?); } xmlWriter.Flush(); xmlWriter.Close(); string xml_as_utf8 = Encoding.UTF8.GetString(ms.ToArray()); B...

XMLREADER or XMLDocument???
How do I read the attributes of this XML? I have a page with text boxes that i want to read these values in. notice there are 2 Parameter tags with the same attributes. Code would help <TranslationRecords> <TranslationRecord TrxID="1"> <ParameterCollection> <Parameter KeyName="FielDelimiterChar" KeyValue="29" /> <Parameter KeyName="SegmentDelimiterChar" KeyValue="30" /> </ParameterCollection> </TranslationRecord> </TranslationRecords> Thanks Try this as a template. It should be gene...

is it possible to read a previous node by XMLReader?
I know XMLReader can only move forward. In my program I may need to go back to read a previous node from XML file again and I don't like to use DOM, do we have any tricky way to do it? Thanks. Linda Chen "Linda Chen" <linda.chen@faa.gov> writes: > I know XMLReader can only move forward. In my program I > may need to go back to read a previous node from XML file > again and I don't like to use DOM, do we have any tricky > way to do it? My trick was to use a Stack variable where to put the stuff, I used it to put the name of the previous element.....

XMLreader to text
How do I read the entire XML text from an XMLReader? I just want to retrieve the XML string from a SQL SP formatted as XML. I assumed the ExecuteXMLReader was the best option. So what's the most streamlined process of getting the XML into a string? Thanks Bill...

End-of-line Handling in the XmlReader
According to the XML 1.0 (Third Edition) W3C Recommendation (http://www.w3.org/TR/2004/REC-xml-20040204/#sec-line-ends) all #xD, #xA, and #xD#xA character combinations should be converted to a single #xA character. According to the "Reading XML with the XmlReader" section of the ".NET Framework Developer's Guide" on-line help, the XmlReader will not perform this normalization by default. You can cause the XmlReader to perform this normalization by setting the Normalization property to true. This does not appear to be the case in every situation. Sample XML File: ...

XmlReader Question
All, I have a large unformatted document which needs to processed and populated into DB. For this, I was using an XMLTextReader to get valid IDs, and then doing something like this: newRdr = xRdr.ReadSubtree(); while (newRdr.Read()) { if (newRdr.NodeType == XmlNodeType.Element) { switch (newRdr.LocalName) { case "A": case "Y": case "Z": ...

XMLReader Problem
Hi, I'm just moving to VB.NET and I'm trying to load a recordset into an XMLReader and loop through the records. When I use the .ReadElementString() method to get the next 'record', if it has reached the last record (eof) it fails with "Invalid attempt to read when reader is closed". It doesn't seem capable of finding this out until the error has occurred, though (ReadState is Interactive prior to the failure and EOF is false.) Is this limited information enough for anyone to tell me where I'm going wrong? (currently as I'm in the early st...

HTML Code Fragments
I am using simple HTML code fragments in my e-newsletter. They are just a table of contents that links to correlated parts of the page. I am distributing this newsletter in the body of an email. When I do a webpage preview it works fine. However, when I try to send it, neither Yahoo Mail nor my Outlook account will recognize the code fragments. They recognize the first part (the table of contents code) but will not even display the anchors. When I click the table of contents links nothing happens. Any idea why??? Thanks cmichaud@ufl.edu wrote: > I am using simple html code fragments in...

XmlReader and Validation
I have an application that reads through an XML document and validates it against a DTD. During the read of the document I look at different aspects of the document by looking at XmlNodeType. I have run into a snag though when an element is empty and written in what I call shorthand of <empty/>. My code of course captures that this is an element but since there is no underlying end element it causes me a problem. I currently wrote a workaround using the IsEmptyElement but I was wondering if anyone knew of a way to have the reader tell me the element is written in shorthand. Thanks, ...

Fragmented MEM
I have been getting the following errors 9582 and 7631 on my W2K SP4 Std Edt running E2K3 SP3 Ent. Edt. Once I get those error, no one can send or receive internal email while external email works fine. There are no message indicating that the internal email did not get sent or receive, it just disappear. After I reboot the exchange server, everything seems to work fine again. The server has 4GB of RAM, SCSI RAID controller with 160GB of disk space. It is a member server and only has exchange on it and do not serve any other role. Since it is a Windows 2000 standard edition, is it...

Getting attributes(I think) from XMLReader
Hi All, I need a few suggestions. I have the following XML segment: <LookUp> <ControlType>CheckBoxGroup</ControlType> <DBField>LastMedDate</DBField> <ControlName>cmbGoal1</ControlName> <Values VALUE="0" BookMark="Goal1Progress"/> <Values VALUE="1" BookMark="Goal1NoProgress"/> <Values VALUE="2" BookMark="Goal1NA"/> </LookUp> I have code that will read everything except the "Values" elements. Here is the code: While reader.Read() Select Case (reader.No...

Transform to a Stream or XmlReader?
Hi, I have this code that will write the transformed XML immediately to the browser with the Response object.. XslTransform trans = new XslTransform() trans.Load(MapPath("MyXsl.xsl")) trans.Transform(oXmlDataDocument, null, Response.OutputStream, null); //Writes immediately to browser her But if I wanted to return the transformation as a variable and not immediately to the browser, the only way I found to do this was using the XmlReader below.. XmlUrlResolver resolver = new XmlUrlResolver() XmlReader reader = trans.Transform(oXmlDataDocument, null, resolver) while (reader.Read())...

new to XMLReader
Hi, I'm doing an XML project, and I've gotten to the point where I retrieve my data from my SQL Server, and execute my XML Reader. How can I display the XML in a browser window now? Here is my code: string tXML; SqlCommand objCmd = new SqlCommand ("up_get_XMLStandardExterior", dbConnect()); objCmd.CommandType=CommandType.StoredProcedure; SqlParameter objParam = new SqlParameter ("@nControlIDNumber", DbType.Int16); objParam.Direction=ParameterDirection.Input; objParam.Value=1; objCmd.Parameters.Add(objParam); System.Xml.XmlReader myXmlReader = objCmd.Exec...

xmlreader question
Hello, I have: writer = XmlWriter.Create("mike.xml",settings); // ......Do Some stuff. This works writes an xml file. // Then I try to close it writer.Flush(); writer.Close(); //But I get an error saying I cant open it, its being used FileStream fs = new FileStream("mike.xml", FileMode.Open); ....More stuff What am I missing? Thanks "AMP" <ampeloso@gmail.com> wrote in message news:4ffc568b-828d-4c40-b5a2-1239b0928c91@k19g2000yqc.googlegroups.com... ...

use XmlReader/XmlWriter to reformat XML?
Since XmlWriter offers so many nice options for formatting, I thought it would be nice to read in via XmlReader, and write back out via XmlWriter. It might be overkill, but I'd also like to be able to check some values during that time also so I was going to be using XmlReader anyway. Unfortunately I don't see an easy way to stream it back out through XmlWriter without going node by node. Any suggestions? Is there an easier/faster way to do this already? Michael Michael Malinak wrote: > Since XmlWriter offers so many nice options for formatting, I thought it > would ...

End-of-line Handling in the XmlReader (.NET Framework Version 1.1)
According to the XML 1.0 (Third Edition) W3C Recommendation (http://www.w3.org/TR/2004/REC-xml-20040204/#sec-line-ends) all #xD, #xA, and #xD#xA character combinations should be converted to a single #xA character. According to the "Reading XML with the XmlReader" section of the ".NET Framework Developer's Guide" on-line help, the XmlReader will not perform this normali...

subclassing xmlReader issues
I have created a subclass of xmlReader and was passing that in to XPathDocument and using that XPathDocument instance in my xslCompiledTransform transform() method. Something about my reader causes the xslCompiledTransform to behave differently in reading xml attributes depending on whether I am debugging the xsl or not. I declared all of abstract methods and I don't get any errors building my new xmlReader class. I get no exceptions when using it to parse my xml document and while debugging I can see my xml attributes. But when I try to use it with or without visual ...

XmlReader Question #2
Hi, I am using an XmlTextReader to read an xml file. It may happen that the file is present in the disk, but it may be empty (0 bytes). I would like to find out whether the xml file contains a valid root node or not. How do I do this? This is what I need if(File.Exists(fileName)) { XmlTextReader xmReader = new XmlTextReader(fileName); //How do I find out whether this xml document contains a Root node named "ROOT_NODE" or not? } CGuy You can loop with the reader through the document until you find the first element node and then check if node's name is not R...