Does a DTD change the default namespace or something?

I'm feeling very stupid about this ...

pdf2html (http://pdf2html.sourceforge.net) is an app that reads a PDF
and can generate HTML or XML; in my case I'm using the XML. The PDF I'm
working with is a concatenation of many reports; my objective is to
find the first page of each report, which I've discovered can be found
in this particular instance by looking for an xml element with a
particular attribute "left" equal to 277.

So I want to consume this XML using XPath, to find all "page" elements
that contain "text" elements whose "left" attribute equals 277.  The XPath
expression is therefore:

"/pdf2xml/page/text[@left=277]"

Works great ... IF I change the XML output by the tool to remove the
DTD reference.  If I leave the DTD reference in there, it stops finding
any nodes. Why? Does the presence of the DTD reference automatically
assign a namespace? Do I need a XmlNamespaceManager? What do I use it
with?

Altering the input XML is not the preferred option here. I also have a
version that just uses the Reader to walk the tree ... I want to get
away from that because I eventually want to be able to specify an XPath
query as input.

My code:
' Assumes: Imports System.Xml (and System.Diagnostics for Debug)
Sub test()
        Dim inputfile As String = "test.xml"
        Dim r As New XmlTextReader(inputfile)
        ' XPathDocument lives in System.Xml.XPath, not System.Xml
        Dim xd As New XPath.XPathDocument(r)
        r.Close()
        Dim nav As XPath.XPathNavigator = xd.CreateNavigator()
        ' Select every "text" element whose "left" attribute equals 277
        Dim expr As XPath.XPathExpression = _
          nav.Compile("/pdf2xml/page/text[@left=277]")
        Dim ni As XPath.XPathNodeIterator = nav.Select(expr)
        Do While ni.MoveNext()
            Dim node As XPath.XPathNavigator = ni.Current
            ' Step up to the enclosing "page" element and read its number
            Dim ani As XPath.XPathNodeIterator = _
              node.SelectAncestors(XPath.XPathNodeType.Element, False)
            ani.MoveNext()
            Dim pagenum As Integer = _
              Integer.Parse(ani.Current.GetAttribute("number", ""))
            Debug.WriteLine(pagenum)
        Loop
End Sub

My XML is below, showing two pages; the desired result is to get the
first page. It's actual output from pdf2html, slightly stripped and
censored.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188"
width="918">
<text top="805" left="277" width="0" height="18" font="0"><i><b>Person
Name</b></i></text>
<text top="805" left="298" width="0" height="18" font="0"><i><b>123
Main St</b></i></text>
<text top="805" left="319" width="0" height="18"
font="0"><i><b>Hometown, IL 60000</b></i></text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188"
width="918">
<text top="245" left="144" width="136" height="18"
font="0"><i><b>Person Name</b></i></text>
<text top="266" left="144" width="124" height="18" font="0"><i><b>123
Main St</b></i></text>
<text top="287" left="144" width="168" height="18"
font="0"><i><b>Hometown, IL 60000</b></i></text>
<text top="470" left="143" width="319" height="19"
font="1"><b>STATEMENT OF MANAGEMENT FEES</b></text>
</page>
</pdf2xml>

rpresser (10)
1/10/2007 7:00:56 PM
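
The stated goal above is to find the "page" elements themselves, whereas the expression in the post selects the "text" elements and then walks back up to the page. As a minimal sketch, a predicate on "page" can select the pages directly. The module and function names here are hypothetical, and the inline sample is a trimmed copy of the posted XML without the DOCTYPE line, so this does not reproduce or explain the DTD behaviour being asked about.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module PageQuerySketch
    ' Returns the "number" attribute of every page that has a
    ' "text" child whose "left" attribute equals 277, comma-separated.
    Function FirstPages(ByVal xml As String) As String
        Dim doc As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = doc.CreateNavigator()
        ' The predicate keeps only pages with a matching text child.
        Dim pages As XPathNodeIterator = _
            nav.Select("/pdf2xml/page[text/@left=277]")
        Dim result As String = ""
        Do While pages.MoveNext()
            If result.Length > 0 Then result &= ","
            result &= pages.Current.GetAttribute("number", "")
        Loop
        Return result
    End Function

    Sub Main()
        ' Trimmed sample based on the posted XML (no DOCTYPE).
        Dim xml As String = _
            "<pdf2xml>" & _
            "<page number='1'><text left='277'>Person Name</text></page>" & _
            "<page number='2'><text left='144'>Person Name</text></page>" & _
            "</pdf2xml>"
        Console.WriteLine(FirstPages(xml))
    End Sub
End Module
```

With this form there is no need for the SelectAncestors step, since each node returned is already a page.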

* Ross Presser wrote in microsoft.public.dotnet.xml:
>Works great ... IF I change the XML output by the tool to remove the
>DTD reference.  If I leave the DTD reference in there, it stops finding
>any nodes. Why? Does the presence of the DTD reference automatically
>assign a namespace? Do I need a XmlNamespaceManager? What do I use it
>with?

Yes, unfortunately some DTDs declare a default namespace and cause such
confusion. If the generating tool does not itself declare the namespace
in the document, I would consider that a bug in the tool. For using the
XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
bjoern1 (135)
1/10/2007 7:07:53 PM
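
For reference, if the elements did turn out to be in a namespace, the usual pattern is to bind a prefix with an XmlNamespaceManager and attach it to the compiled expression via SetContext. The sketch below assumes a made-up namespace URI ("urn:example:pdf2xml") and prefix ("p"), neither of which comes from pdf2xml.dtd or the tool's output; the module and function names are also hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.XPath

Module NamespaceSketch
    ' Hypothetical namespace URI, used only for illustration.
    Const ExampleNs As String = "urn:example:pdf2xml"

    ' Counts "text" elements with left=277 in a namespaced document.
    Function CountMatches(ByVal xml As String) As Integer
        Dim doc As New XPathDocument(New StringReader(xml))
        Dim nav As XPathNavigator = doc.CreateNavigator()

        ' Bind the prefix "p" to the namespace the elements live in.
        Dim nsmgr As New XmlNamespaceManager(nav.NameTable)
        nsmgr.AddNamespace("p", ExampleNs)

        ' Element names need the prefix; the unqualified attribute does not.
        Dim expr As XPathExpression = _
            nav.Compile("/p:pdf2xml/p:page/p:text[@left=277]")
        expr.SetContext(nsmgr)
        Return nav.Select(expr).Count
    End Function

    Sub Main()
        Dim xml As String = _
            "<pdf2xml xmlns='" & ExampleNs & "'>" & _
            "<page number='1'><text left='277'>Person Name</text></page>" & _
            "</pdf2xml>"
        Console.WriteLine(CountMatches(xml))
    End Sub
End Module
```

XPath 1.0 has no notion of a default namespace for element names, which is why the prefix is required in the expression even when the document itself uses a bare xmlns declaration.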
Bjoern Hoehrmann wrote:
> * Ross Presser wrote in microsoft.public.dotnet.xml:
> >Works great ... IF I change the XML output by the tool to remove the
> >DTD reference.  If I leave the DTD reference in there, it stops finding
> >any nodes. Why? Does the presence of the DTD reference automatically
> >assign a namespace? Do I need a XmlNamespaceManager? What do I use it
> >with?
>
> Yes, unfortunately some DTDs declare a default namespace and cause such
> confusion. If the generating tool does not itself declare the namespace
> in the document, I would consider that a bug in the tool. For using the
> XmlNamespaceManager, see http://msdn2.microsoft.com/en-us/d271ytdx.aspx

The thing is, I can't figure out what namespace is being applied.
This was the DTD line in the XML file:

<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

and this is the contents of pdf2xml.dtd:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT pdf2xml (page+)>
<!ELEMENT page (fontspec*, text*)>
<!ATTLIST page
	number CDATA #REQUIRED
	position CDATA #REQUIRED
	top CDATA #REQUIRED
	left CDATA #REQUIRED
	height CDATA #REQUIRED
	width CDATA #REQUIRED
>
<!ELEMENT fontspec EMPTY>
<!ATTLIST fontspec
	id CDATA #REQUIRED
	size CDATA #REQUIRED
	family CDATA #REQUIRED
	color CDATA #REQUIRED
>
<!ELEMENT text (#PCDATA | b | i)*>
<!ATTLIST text
	top CDATA #REQUIRED
	left CDATA #REQUIRED
	width CDATA #REQUIRED
	height CDATA #REQUIRED
	font CDATA #REQUIRED
>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

Some experimentation with msxslt, by the way, did not seem to show a
need to use a namespace.

rpresser (10)
1/10/2007 7:17:18 PM
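
One way to check whether any namespace is actually being applied is to ask the navigator for the root element's NamespaceURI. This is only a diagnostic sketch: the module and function names are hypothetical, and the inline sample omits the DOCTYPE so that it runs without pdf2xml.dtd on disk.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.XPath

Module RootNamespaceSketch
    ' Describes the document element: its local name and namespace URI.
    Function DescribeRoot(ByVal doc As XPathDocument) As String
        Dim nav As XPathNavigator = doc.CreateNavigator()
        ' "/*" selects the document element whatever its name or namespace.
        Dim roots As XPathNodeIterator = nav.Select("/*")
        If Not roots.MoveNext() Then Return "(no root element)"
        Return roots.Current.LocalName & " in namespace '" & _
            roots.Current.NamespaceURI & "'"
    End Function

    Sub Main()
        Dim xml As String = "<pdf2xml><page number='1'/></pdf2xml>"
        Dim doc As New XPathDocument(New StringReader(xml))
        Console.WriteLine(DescribeRoot(doc))
    End Sub
End Module
```

Passing an XPathDocument built from the same XmlTextReader as in the original code, with the DOCTYPE left in place, should show whether the namespace URI is empty in that case too.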
Ross Presser wrote:

> and this is the contents of pdf2xml.dtd:
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!ELEMENT pdf2xml (page+)>
> <!ELEMENT page (fontspec*, text*)>
> <!ATTLIST page
> 	number CDATA #REQUIRED
> 	position CDATA #REQUIRED
> 	top CDATA #REQUIRED
> 	left CDATA #REQUIRED
> 	height CDATA #REQUIRED
> 	width CDATA #REQUIRED
> >
> <!ELEMENT fontspec EMPTY>
> <!ATTLIST fontspec
> 	id CDATA #REQUIRED
> 	size CDATA #REQUIRED
> 	family CDATA #REQUIRED
> 	color CDATA #REQUIRED
> >
> <!ELEMENT text (#PCDATA | b | i)*>
> <!ATTLIST text
> 	top CDATA #REQUIRED
> 	left CDATA #REQUIRED
> 	width CDATA #REQUIRED
> 	height CDATA #REQUIRED
> 	font CDATA #REQUIRED
> >
> <!ELEMENT b (#PCDATA)>
> <!ELEMENT i (#PCDATA)>
> 
> Some experimentation with msxslt, by the way, did not seem to show a
> need to use a namespace.

There is no xmlns attribute defined in that DTD.

As for your original problem with .NET code, which version of the .NET 
framework are you using?


-- 

	Martin Honnen --- MVP XML
	http://JavaScript.FAQTs.com/
mahotrash (1777)
1/11/2007 11:59:43 AM
Martin Honnen wrote:
> There is no xmlns attribute defined in that DTD.
>
> As for your original problem with .NET code, which version of the .NET
> framework are you using?

Version 1.1

rpresser (10)
1/11/2007 4:06:11 PM