Does a DTD change the default namespace or something?

I'm feeling very stupid about this ...

pdf2html (http://pdf2html.sourceforge.net) is an app that reads a PDF
and can generate HTML or XML; in my case I'm using the XML. The PDF I'm
working with is a concatenation of many reports; my objective is to
find the first page of each report. In this particular instance I've
discovered that a first page can be identified by a "text" element
whose "left" attribute equals 277.

So I want to consume this XML using XPath, to find all "page" elements
that contain a "text" element whose "left" attribute is 277.  The XPath
expression I'm using selects those "text" elements:

"/pdf2xml/page/text[@left=277]"

Works great ... IF I change the XML output by the tool to remove the
DTD reference.  If I leave the DTD reference in there, it stops finding
any nodes. Why? Does the presence of the DTD reference automatically
assign a namespace? Do I need an XmlNamespaceManager? What do I use it
with?

Altering the input XML is not the preferred option here. I also have a
version that just uses the Reader to walk the tree ... I want to get
away from that because I eventually want to be able to specify an XPath
query as input.

My code:
Imports System.Diagnostics
Imports System.Xml

Sub test()
        Dim inputfile As String = "test.xml"
        Dim r As New XmlTextReader(inputfile)
        ' XPathDocument lives in System.Xml.XPath, not System.Xml
        Dim xd As New XPath.XPathDocument(r)
        Dim nav As XPath.XPathNavigator = xd.CreateNavigator()
        Dim expr As XPath.XPathExpression = _
          nav.Compile("/pdf2xml/page/text[@left=277]")
        Dim ni As XPath.XPathNodeIterator = nav.Select(expr)
        Do While ni.MoveNext()
            Dim node As XPath.XPathNavigator = ni.Current
            ' Walk up from the matching <text> to its enclosing <page>
            Dim ani As XPath.XPathNodeIterator = _
              node.SelectAncestors(XPath.XPathNodeType.Element, False)
            ani.MoveNext()
            ' GetAttribute returns a String, so convert it explicitly
            Dim pagenum As Integer = _
              CInt(ani.Current.GetAttribute("number", ""))
            Debug.WriteLine(pagenum)
        Loop
End Sub
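
The ancestor walk in the code above could probably be avoided by moving the test into a predicate on "page", so that the query returns the page elements themselves. This is only a minimal sketch: the module and function names are made up for illustration, and the XML is passed in as a string (with no DOCTYPE) instead of being read from test.xml.

```vb
Imports System
Imports System.Collections
Imports System.IO
Imports System.Xml.XPath

Module FirstPagesSketch
    ' Hypothetical helper: returns the "number" attribute of every <page>
    ' that has at least one <text> child whose "left" attribute is 277.
    Function FindFirstPages(ByVal xml As String) As ArrayList
        Dim doc As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = doc.CreateNavigator()
        ' The predicate keeps the context on <page>, so no ancestor walk is needed
        Dim pages As XPathNodeIterator = _
            nav.Select("/pdf2xml/page[text/@left=277]")
        Dim result As New ArrayList()
        Do While pages.MoveNext()
            result.Add(CInt(pages.Current.GetAttribute("number", "")))
        Loop
        Return result
    End Function

    Sub Main()
        ' Cut-down sample modelled on the pdf2xml output shown in this post
        Dim xml As String = "<pdf2xml>" & _
            "<page number=""1""><text left=""277"">a</text></page>" & _
            "<page number=""2""><text left=""144"">b</text></page>" & _
            "</pdf2xml>"
        For Each n As Integer In FindFirstPages(xml)
            Console.WriteLine(n)
        Next
    End Sub
End Module
```

This does not address the DTD question; it only shortens the query side once nodes are being found.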

My XML is below, showing two pages; the desired result is to get the
first page. It's actual output from pdf2html, slightly stripped and
censored.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188"
width="918">
<text top="805" left="277" width="0" height="18" font="0"><i><b>Person
Name</b></i></text>
<text top="805" left="298" width="0" height="18" font="0"><i><b>123
Main St</b></i></text>
<text top="805" left="319" width="0" height="18"
font="0"><i><b>Hometown, IL 60000</b></i></text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188"
width="918">
<text top="245" left="144" width="136" height="18"
font="0"><i><b>Person Name</b></i></text>
<text top="266" left="144" width="124" height="18" font="0"><i><b>123
Main St</b></i></text>
<text top="287" left="144" width="168" height="18"
font="0"><i><b>Hometown, IL 60000</b></i></text>
<text top="470" left="143" width="319" height="19"
font="1"><b>STATEMENT OF MANAGEMENT FEES</b></text>
</page>
</pdf2xml>

rpresser (10)
1/10/2007 7:00:56 PM
* Ross Presser wrote in microsoft.public.dotnet.xml:
>Works great ... IF I change the XML output by the tool to remove the
>DTD reference.  If I leave the DTD reference in there, it stops finding
>any nodes. Why? Does the presence of the DTD reference automatically
>assign a namespace? Do I need an XmlNamespaceManager? What do I use it
>with?

Yes, unfortunately some DTDs declare a default namespace and cause such
confusion. If the generating tool does not itself declare the namespace
in the document, I would consider that a bug in the tool. For using the
XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx
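
As a rough sketch of how an XmlNamespaceManager is typically wired into an XPath query, assuming the document really were in a namespace: the URI urn:example:pdf2xml, the prefix "p", and the module and function names below are all invented for illustration and do not come from pdf2xml.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath

Module NamespaceQuerySketch
    ' Hypothetical namespace URI, used only for this illustration
    Const Pdf2XmlNs As String = "urn:example:pdf2xml"

    ' Counts <text left="277"> elements in a document whose elements
    ' are all in the Pdf2XmlNs namespace.
    Function CountMatches(ByVal xml As String) As Integer
        Dim doc As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = doc.CreateNavigator()

        ' Bind a prefix of our choosing to the namespace URI
        Dim nsmgr As New XmlNamespaceManager(nav.NameTable)
        nsmgr.AddNamespace("p", Pdf2XmlNs)

        ' XPath 1.0 has no default namespace, so every element step
        ' needs the prefix; attributes stay unprefixed
        Dim expr As XPathExpression = _
            nav.Compile("/p:pdf2xml/p:page/p:text[@left=277]")
        expr.SetContext(nsmgr)
        Return nav.Select(expr).Count
    End Function

    Sub Main()
        Dim xml As String = "<pdf2xml xmlns=""" & Pdf2XmlNs & """>" & _
            "<page number=""1""><text left=""277"">a</text></page>" & _
            "</pdf2xml>"
        Console.WriteLine(CountMatches(xml))
    End Sub
End Module
```

The prefix in the query does not have to match anything in the document; only the namespace URI has to agree.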
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
bjoern1 (135)
1/10/2007 7:07:53 PM
Bjoern Hoehrmann wrote:
> * Ross Presser wrote in microsoft.public.dotnet.xml:
> >Works great ... IF I change the XML output by the tool to remove the
> >DTD reference.  If I leave the DTD reference in there, it stops finding
> >any nodes. Why? Does the presence of the DTD reference automatically
> >assign a namespace? Do I need an XmlNamespaceManager? What do I use it
> >with?
>
> Yes, unfortunately some DTDs declare a default namespace and cause such
> confusion. If the generating tool does not itself declare the namespace
> in the document, I would consider that a bug in the tool. For using the
> XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx

The thing is, I can't figure out what namespace is being applied.
This was the DTD line in the XML file:

<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

and this is the contents of pdf2xml.dtd:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT pdf2xml (page+)>
<!ELEMENT page (fontspec*, text*)>
<!ATTLIST page
	number CDATA #REQUIRED
	position CDATA #REQUIRED
	top CDATA #REQUIRED
	left CDATA #REQUIRED
	height CDATA #REQUIRED
	width CDATA #REQUIRED
>
<!ELEMENT fontspec EMPTY>
<!ATTLIST fontspec
	id CDATA #REQUIRED
	size CDATA #REQUIRED
	family CDATA #REQUIRED
	color CDATA #REQUIRED
>
<!ELEMENT text (#PCDATA | b | i)*>
<!ATTLIST text
	top CDATA #REQUIRED
	left CDATA #REQUIRED
	width CDATA #REQUIRED
	height CDATA #REQUIRED
	font CDATA #REQUIRED
>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

Some experimentation with msxslt, by the way, did not seem to show a
need to use a namespace.

rpresser (10)
1/10/2007 7:17:18 PM
Ross Presser wrote:

> and this is the contents of pdf2xml.dtd:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!ELEMENT pdf2xml (page+)>
> <!ELEMENT page (fontspec*, text*)>
> <!ATTLIST page
> 	number CDATA #REQUIRED
> 	position CDATA #REQUIRED
> 	top CDATA #REQUIRED
> 	left CDATA #REQUIRED
> 	height CDATA #REQUIRED
> 	width CDATA #REQUIRED
> >
> <!ELEMENT fontspec EMPTY>
> <!ATTLIST fontspec
> 	id CDATA #REQUIRED
> 	size CDATA #REQUIRED
> 	family CDATA #REQUIRED
> 	color CDATA #REQUIRED
> >
> <!ELEMENT text (#PCDATA | b | i)*>
> <!ATTLIST text
> 	top CDATA #REQUIRED
> 	left CDATA #REQUIRED
> 	width CDATA #REQUIRED
> 	height CDATA #REQUIRED
> 	font CDATA #REQUIRED
> >
> <!ELEMENT b (#PCDATA)>
> <!ELEMENT i (#PCDATA)>
> 
> Some experimentation with msxslt, by the way, did not seem to show a
> need to use a namespace.

There is no xmlns attribute defined in that DTD.

As for your original problem with .NET code, which version of the .NET 
framework are you using?
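
One way to check which namespace, if any, the loaded document actually ends up with is to ask the navigator for the namespace URI of the document element. A minimal sketch follows; the module and function names are hypothetical, and the sample document is a cut-down string without the DOCTYPE line.

```vb
Imports System
Imports System.IO
Imports System.Xml.XPath

Module NamespaceCheckSketch
    ' Hypothetical helper: returns the namespace URI of the document
    ' element, or an empty string when it is in no namespace.
    Function RootNamespace(ByVal nav As XPathNavigator) As String
        ' Work on a clone so the caller's navigator is not moved
        Dim n As XPathNavigator = nav.Clone()
        n.MoveToRoot()
        n.MoveToFirstChild()
        ' Skip comments and processing instructions before the element
        Do While n.NodeType <> XPathNodeType.Element
            If Not n.MoveToNext() Then Return ""
        Loop
        Return n.NamespaceURI
    End Function

    Sub Main()
        Dim xml As String = "<pdf2xml><page number=""1""/></pdf2xml>"
        Dim doc As New XPathDocument(New StringReader(xml))
        ' Brackets make an empty result visible in the output
        Console.WriteLine("[" & RootNamespace(doc.CreateNavigator()) & "]")
    End Sub
End Module
```

Passing the navigator built from test.xml in the original code to the same function, with and without the DOCTYPE line, would show whether the element names really differ between the two runs.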


-- 

	Martin Honnen --- MVP XML
	http://JavaScript.FAQTs.com/
mahotrash (1777)
1/11/2007 11:59:43 AM
Martin Honnen wrote:
> There is no xmlns attribute defined in that DTD.
>
> As for your original problem with .NET code, which version of the .NET
> framework are you using?

Version 1.1

rpresser (10)
1/11/2007 4:06:11 PM