Reading XHTML documents offline

Hi!
I'm trying to parse an XHTML document like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>

</title></head>
<body>
	<dl>
		<dt>NAME</dt><dd>VARIANTATTRIBUTE1</dd><dt>TYPE</dt><dd>STRING</dd><dt>VALUE</dt><dd>HEY</dd><dt>LOCKVALUE</dt><dd>31314D32444F2F485A4E6C414A47474B62734B5352456F676A2F673D</dd><dt>HOME</dt><dd><a href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1/attributes/VARIANTATTRIBUTE1">VARIANTATTRIBUTE1</a></dd><dt>VARIANT</dt><dd><a href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1">B1</a></dd>
	</dl>
</body>
</html>

The problem is that XmlDocument.Load("<!..."); tries to access the DTD, and this takes ages. The browser, on the other hand, displays this page quickly.
Is it possible at all to read this XHTML document offline?
Can I somehow give the document object the contents of the DTD? Then I could store it as a string in the assembly.

Lots of Greetings!
Volker
-- 
For email replies, please substitute the obvious.
10/7/2008 3:09:27 PM

Volker Hetzer wrote:
> Hi!
> I'm trying to parse an xhtml document like this:
[...]
> The problem is that XmlDocument.Load("<!..."); tries to access the DTD 
> and this takes ages. On the other hand, the browser can display this 
> page quickly.
> Is it possible at all to read this xhtml document offline?
> Can I somehow tell the Document object the contents of the DTD? then I 
> could store it as a string in the assembly.
Found it.
There is a nice explanation with a downloadable project here:
http://blogs.pingpoet.com/overflow/archive/2005/07/20/6607.aspx
I just had to modify it so that it does not create its own DTD.

Lots of Greetings!
Volker
-- 
For email replies, please substitute the obvious.
10/7/2008 3:41:10 PM
Volker Hetzer wrote:

> I'm trying to parse an xhtml document like this:
> 
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head><title>
> 
> </title></head>
> <body>
>     <dl>
>         
> <dt>NAME</dt><dd>VARIANTATTRIBUTE1</dd><dt>TYPE</dt><dd>STRING</dd><dt>VALUE</dt><dd>HEY</dd><dt>LOCKVALUE</dt><dd>31314D32444F2F485A4E6C414A47474B62734B5352456F676A2F673D</dd><dt>HOME</dt><dd><a 
> href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1/attributes/VARIANTATTRIBUTE1">VARIANTATTRIBUTE1</a></dd><dt>VARIANT</dt><dd><a 
> href="/v1/boards/DNUMBOARD05/wgs/01/variants/B1">B1</a></dd>
>     </dl>
> </body>
> </html>
> 
> The problem is that XmlDocument.Load("<!..."); tries to access the DTD 
> and this takes ages. On the other hand, the browser can display this 
> page quickly.
> Is it possible at all to read this xhtml document offline?

It depends on whether the document references entities defined in the 
DTD. If it does not, then
   XmlDocument doc = new XmlDocument();
   doc.XmlResolver = null;
   doc.Load("doc.xhtml");
should be enough to keep the XML parser from fetching the DTD.

If, however, the document references entities, e.g. with &auml;, then you 
will get an error about a reference to an undefined entity, because the 
parser has not fetched the DTD.
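
A minimal sketch of how one might probe for this, assuming a hypothetical helper called TryLoadWithoutDtd (the class and method names are made up for illustration). It parses the text with the resolver switched off and treats an XmlException as "this document needs the DTD":

```csharp
using System;
using System.IO;
using System.Xml;

static class OfflineXhtml
{
    // Hypothetical helper, not from the thread: parses XHTML text without
    // fetching the external DTD and reports whether parsing succeeded.
    public static bool TryLoadWithoutDtd(string xhtml, out XmlDocument doc)
    {
        doc = new XmlDocument();
        doc.XmlResolver = null; // never resolve the DOCTYPE's external DTD
        try
        {
            doc.Load(new StringReader(xhtml));
            return true;
        }
        catch (XmlException)
        {
            // Typically a reference to an entity that only the DTD declares,
            // such as &auml; (other well-formedness errors land here too).
            doc = null;
            return false;
        }
    }
}
```

If the helper returns false for your documents, the null resolver alone is not enough and the DTD has to come from somewhere local.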

In that case, if you want to speed up parsing your XHTML documents, you 
should store local copies of the DTD and all files it references, and 
then write your own XmlResolver (for instance by subclassing 
XmlUrlResolver) that makes sure the local copies are fetched when 
identifiers like "-//W3C//DTD XHTML 1.0 Transitional//EN" are resolved.
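
A rough sketch of such a resolver, under some assumptions: the class name LocalXhtmlResolver and the directory argument are hypothetical, and the DTD plus the three entity files it pulls in (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent) are assumed to have been saved into that directory beforehand. It only overrides ResolveUri, so that the known public and system identifiers point at the local files and everything else falls through to XmlUrlResolver:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical resolver: redirects the XHTML 1.0 Transitional identifiers
// to local copies stored in one directory.
class LocalXhtmlResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> localFiles =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public LocalXhtmlResolver(string dtdDirectory)
    {
        const string w3c = "http://www.w3.org/TR/xhtml1/DTD/";
        Map(dtdDirectory, "-//W3C//DTD XHTML 1.0 Transitional//EN", w3c, "xhtml1-transitional.dtd");
        Map(dtdDirectory, "-//W3C//ENTITIES Latin 1 for XHTML//EN", w3c, "xhtml-lat1.ent");
        Map(dtdDirectory, "-//W3C//ENTITIES Symbols for XHTML//EN", w3c, "xhtml-symbol.ent");
        Map(dtdDirectory, "-//W3C//ENTITIES Special for XHTML//EN", w3c, "xhtml-special.ent");
    }

    // Registers both the public identifier and the absolute system identifier.
    private void Map(string dir, string publicId, string baseUrl, string fileName)
    {
        string path = Path.GetFullPath(Path.Combine(dir, fileName));
        localFiles[publicId] = path;
        localFiles[baseUrl + fileName] = path;
    }

    public override Uri ResolveUri(Uri baseUri, string relativeUri)
    {
        string path;
        if (relativeUri != null && localFiles.TryGetValue(relativeUri, out path))
        {
            return new Uri(path); // file URI of the local copy
        }
        return base.ResolveUri(baseUri, relativeUri);
    }
}
```

It would be plugged in the same way as the null resolver, e.g. doc.XmlResolver = new LocalXhtmlResolver(@"C:\dtds"); before doc.Load("doc.xhtml"); (the path is only a placeholder).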

> Can I somehow tell the Document object the contents of the DTD? then I 
> could store it as a string in the assembly.

I think that has been done; search for ".NET XmlResolver assembly".
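
One ready-made option, assuming you can target .NET Framework 4.0 or later: System.Xml.Resolvers.XmlPreloadedResolver ships with the XHTML 1.0 DTDs embedded in the framework assembly, so nothing is fetched from the network. A minimal sketch (the PreloadedXhtml wrapper class is hypothetical):

```csharp
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

static class PreloadedXhtml
{
    // Hypothetical wrapper: parses XHTML text using the DTDs that
    // XmlPreloadedResolver carries inside the framework assembly.
    public static XmlDocument Parse(string xhtml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // DOCTYPEs are prohibited by default
        settings.XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);

        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml), settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

On older framework versions the same idea needs a hand-written resolver that returns the DTD from an embedded resource stream.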



-- 

	Martin Honnen --- MVP XML
	http://JavaScript.FAQTs.com/
10/7/2008 3:47:21 PM