Handling multiple schemas and large files in XML

Hi

I hope that this is the correct place to post this question.

I'm looking at developing an application which will enable me to import
and process some data that is made available to me as XML.

One complication is that the providers of the data have published two
different schema versions. Whilst effectively describing the same data,
the 2nd schema is a significant refactoring of the first and so is
almost totally different in structure. I also can't rule out the
possibility that they will issue further versions too. I'd ideally like
to be able to handle both of these schemas, and I'd also like to be able
to support new versions with the minimum of fuss.
From knowledge of the application domain, I am also fairly sure that the
essential data will be stable across schema versions.

I originally considered defining a class for each schema version and
using the XmlSerializer class to construct the appropriate one from the
xml document. However, this is where another potential issue rears its
head: the XML files are rather large: 50+ MB and over 1 million lines.

I suspect that using the XmlSerializer with documents of this size is
probably not appropriate. Am I correct?

Thankfully, it's not necessary to load the entire document in one go as
the user won't need to visualise *all* the data at once. Instead, they
will home in on a section of the data and drill down for detail in a
tree-like fashion. Because of this, the application's internal object
model can represent just the data that the user is interested in.

Bearing this in mind, I could construct the object model by using an
XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
is that AIUI, I will then have to manually handle the schema differences.

I'd appreciate it if anyone could suggest better approaches. I'm fairly
new to both .NET and XML so please point out if I'm completely off the
mark here. Any suggestions at all are greatly appreciated.

TIA
MikeB
9/21/2008 7:40:10 AM

MikeB wrote:

> I originally considered defining a class for each schema version and
> using the XmlSerializer class to construct the appropriate one from the
> xml document. However, this is where another potential issue rears its
> head: the XML files are rather large: 50+ MB and over 1 million lines.
> 
> I suspect that using the XmlSerializer with documents of this size is
> probably not appropriate. Am I correct?

If you deserialize an XML document with XmlSerializer, you get .NET
objects held in memory. It is hard to tell in advance how much memory a
50 MB document will consume, so you will have to run some tests, and of
course you will also have to take into account what kind of systems the
users of your application have. PCs are being sold with 3 GB of RAM
nowadays, so I wouldn't completely rule out using XmlSerializer to
deserialize your large XML.
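
One rough way to run such a test is sketched below. It assumes a made-up document shape: the Records and Record classes and the MemoryProbe helper are hypothetical stand-ins, not anything from the actual schemas. GC.GetTotalMemory only gives an approximate figure for managed memory, so treat the result as an estimate.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class standing in for the root element of one schema version.
[XmlRoot("Records")]
public class Records
{
    [XmlElement("Record")]
    public Record[] Items;
}

// Hypothetical record type: one attribute and one child element.
public class Record
{
    [XmlAttribute("id")]
    public int Id;

    public string Name;
}

public static class MemoryProbe
{
    // Deserializes the stream and returns the approximate growth of the
    // managed heap, in bytes, caused by holding the resulting objects.
    public static long MeasureDeserialize(Stream xml, out Records result)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Records));

        // Passing true forces a collection first, for a steadier baseline.
        long before = GC.GetTotalMemory(true);
        result = (Records)serializer.Deserialize(xml);
        long after = GC.GetTotalMemory(true);

        return after - before;
    }
}
```

Running this against a representative 50 MB file on a machine similar to the users' machines should give a first idea of whether full deserialization is affordable.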


> Bearing this in mind, I could construct the object model by using an
> XmlTextReader and analysing XmlTextReader.NodeType. The downside to this
> is that AIUI, I will then have to manually handle the schema differences.

Note that since .NET 2.0 the recommended practice is to create an
XmlReader with XmlReader.Create and suitable XmlReaderSettings rather
than to instantiate XmlTextReader directly.
Other than that you are right: XmlReader is fast and forward-only, which
keeps its memory footprint low, so it is the .NET XML API for parsing
large XML documents.
You can, however, combine XmlReader with other APIs such as
XPathDocument/XPathNavigator, XmlSerializer or LINQ to XML (in .NET 3.5):
process the whole document with XmlReader, but pass subtrees on to the
other APIs to have more comfort or power when extracting the data you are
looking for.
For instance, with LINQ to XML you have XNode.ReadFrom
http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.readfrom.aspx
to consume a subtree.
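
A minimal sketch of that combination, assuming .NET 3.5: the SubtreeStreamer class and its StreamElements method are hypothetical names for illustration, and the element name passed in is whatever repeats in your document. XmlReader walks the document forward-only, and each matching element is handed to LINQ to XML as a small in-memory XElement.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SubtreeStreamer
{
    // Lazily yields every element with the given local name, one subtree
    // at a time, so only the current subtree is held in memory.
    public static IEnumerable<XElement> StreamElements(TextReader input, string localName)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    // ReadFrom consumes the whole subtree and leaves the reader
                    // on the node that follows it, so Read() is not called here;
                    // calling it would skip an adjacent matching element.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

You would use it with something like foreach (XElement record in SubtreeStreamer.StreamElements(File.OpenText(path), "Record")) and then query each record with LINQ to XML. Matching on LocalName ignores the namespace, which may or may not be what you want if the schema versions use different namespaces.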

-- 

	Martin Honnen --- MVP XML
	http://JavaScript.FAQTs.com/
mahotrash (1778)
9/21/2008 10:49:08 AM

Martin. Thanks for that.
/MikeB
9/23/2008 5:53:40 AM