Nov 12 2010

If you do a web search for how to use the LINQ XPathSelectElement function, the example code you will find falls into two categories: code that doesn’t use namespaces at all, and code that uses the same namespace prefix in the query as in the original document.

Using no XML namespace at all when querying XML just doesn’t fly in the real world.  If you’re using any XML spec that isn’t your own, you will need to use namespaces to make sure the XYZ element you get back from the query is the brand or flavor of XYZ that you intended.  If you’re writing your own XML document schema, it’s critical to declare your namespaces!
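To make that concrete, here’s a minimal sketch built on a hypothetical catalog document (the urn:example URIs are made up for illustration) in which two vocabularies both define a title element. Binding a prefix to the URI of the vocabulary you want is what picks the right one. It assumes C# 9 or later so that top-level statements compile.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical document: two vocabularies both define a "title" element.
string markup = @"
<catalog xmlns:b='urn:example:books' xmlns:m='urn:example:music'>
    <b:title>Moby-Dick</b:title>
    <m:title>Kind of Blue</m:title>
</catalog>";

XElement root = XElement.Parse(markup);

// Bind a prefix of our own to the URI of the vocabulary we actually want.
var ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("music", "urn:example:music");

// Only the title in the music namespace matches.
XElement title = root.XPathSelectElement("music:title", ns);
Console.WriteLine(title.Value); // Kind of Blue
```

Without the namespace binding there would be no way to say which title you meant.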

The most counterintuitive aspect of working with XML is realizing that default namespaces are namespaces too, and apply to every element in the document or subtree that doesn’t have an explicit namespace prefix. Even though your XML document uses nice clean simple XYZ elements everywhere, if you try to query for XYZ using XPath you’ll get nothing but nulls back. Your XYZ elements are in the document’s default namespace, but in XPath 1.0 an unprefixed name in a query only matches elements that are in no namespace at all. To get matches, you have to bind a prefix to the default namespace’s URI and use that prefix in your query.
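Here’s a small sketch of that trap, using a hypothetical document whose elements are all in a default namespace (the http://www.adventure-works.com URI is borrowed from the MSDN sample). It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical document: no prefixes, but everything is in a default namespace.
string markup = @"
<Root xmlns='http://www.adventure-works.com'>
    <Child1>child one data</Child1>
</Root>";

XElement root = XElement.Parse(markup);

// An unprefixed name in XPath 1.0 means "no namespace", so this finds nothing.
XElement missing = root.XPathSelectElement("./Child1");
Console.WriteLine(missing == null); // True

// Bind any prefix to the default namespace's URI and use it in the query.
var ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("aw", "http://www.adventure-works.com");
XElement found = root.XPathSelectElement("./aw:Child1", ns);
Console.WriteLine(found.Value); // child one data
```

The document never mentions “aw”; the prefix exists only so the query has a way to name the URI.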

What really kills me is when the sample code for making an XPath query with namespaces uses the same namespace prefix in the query as in the original XML document. This gives the reader the impression that the namespace prefix strings in their queries have to match the original document. That’s completely false. The text of the prefix string is irrelevant: all that matters is which URI the prefix is bound to, and that this URI matches the URI of the namespace used in the XML document.
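A quick way to convince yourself is to flip it around. In this sketch (C# 9 top-level statements; urn:example:not-adventure-works is a made-up URI), a query that reuses the document’s own “aw” prefix but binds it to a different URI finds nothing, while a query using an unrelated “zz” prefix bound to the document’s URI succeeds.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

XElement root = XElement.Parse(
    "<aw:Root xmlns:aw='http://www.adventure-works.com'><aw:Child1>data</aw:Child1></aw:Root>");

// Same prefix text as the document, but bound to a different (hypothetical) URI.
var wrongUri = new XmlNamespaceManager(new NameTable());
wrongUri.AddNamespace("aw", "urn:example:not-adventure-works");
XElement viaWrongUri = root.XPathSelectElement("./aw:Child1", wrongUri);
Console.WriteLine(viaWrongUri == null); // True

// Different prefix text, but bound to the document's URI.
var rightUri = new XmlNamespaceManager(new NameTable());
rightUri.AddNamespace("zz", "http://www.adventure-works.com");
XElement viaRightUri = root.XPathSelectElement("./zz:Child1", rightUri);
Console.WriteLine(viaRightUri.Value); // data
```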

Take this C# example code offered up on none other than the MSDN documentation for XPathSelectElement:

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

string markup = @"
<aw:Root xmlns:aw='http://www.adventure-works.com'>
    <aw:Child1>child one data</aw:Child1>
    <aw:Child2>child two data</aw:Child2>
</aw:Root>";
XmlReader reader = XmlReader.Create(new StringReader(markup));
XElement root = XElement.Load(reader);
XmlNameTable nameTable = reader.NameTable;
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(nameTable);
namespaceManager.AddNamespace("aw", "http://www.adventure-works.com");
XElement child1 = root.XPathSelectElement("./aw:Child1", namespaceManager);
Console.WriteLine(child1);

The “aw” namespace prefix used in the calls to AddNamespace and XPathSelectElement doesn’t need to match the “aw” prefix used in the original document. Only the URIs that the prefixes are bound to need to match.

Here’s the sample again using different prefix strings:

using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

string markup = @"
<aw:Root xmlns:aw='http://www.adventure-works.com'>
    <aw:Child1>child one data</aw:Child1>
    <aw:Child2>child two data</aw:Child2>
</aw:Root>";
XmlReader reader = XmlReader.Create(new StringReader(markup));
XElement root = XElement.Load(reader);
XmlNameTable nameTable = reader.NameTable;
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(nameTable);
// "ns" is our own prefix; only the URI has to match the document's "aw" declaration.
namespaceManager.AddNamespace("ns", "http://www.adventure-works.com");
XElement child1 = root.XPathSelectElement("./ns:Child1", namespaceManager);
Console.WriteLine(child1);

This second code snippet executes exactly the same as the first, but in my opinion it is much clearer about what’s going on, because it doesn’t imply that the prefix strings must match for the query to work.

Why is this important? Because when you’re processing an XML document, you usually know at compile time which namespace(s) contain the elements you want to work with. Just bind the URI of the namespace you care about to whatever prefix string you want, and use that prefix string in your queries. XPathSelectElement resolves your prefix to its URI through the namespace manager and matches elements by namespace URI and local name, so it doesn’t matter which prefix (if any) the document itself used for that URI.

You don’t have to care what namespace prefix the document creator used to write the XML document, and you really don’t want to have to parse the XML document just to figure out what prefix string is bound to the namespace URI you care about.
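Putting that together, here’s a minimal sketch assuming three hypothetical documents that use the same namespace URI with a prefix of “aw”, a prefix of “x”, and a default namespace respectively. One namespace manager and one query string handle all three (C# 9 top-level statements assumed).

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Three hypothetical documents: same namespace URI, different prefix choices.
string[] documents =
{
    "<aw:Root xmlns:aw='http://www.adventure-works.com'><aw:Child1>one</aw:Child1></aw:Root>",
    "<x:Root xmlns:x='http://www.adventure-works.com'><x:Child1>two</x:Child1></x:Root>",
    "<Root xmlns='http://www.adventure-works.com'><Child1>three</Child1></Root>",
};

// Bind the URI we care about once, to a prefix of our own choosing.
var ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("ns", "http://www.adventure-works.com");

var results = new List<string>();
foreach (string markup in documents)
{
    XElement root = XElement.Parse(markup);
    // The same query works no matter which prefix the document author used.
    XElement child1 = root.XPathSelectElement("./ns:Child1", ns);
    results.Add(child1.Value);
    Console.WriteLine(child1.Value);
}
// prints one, two, three (one per line)
```

Nothing in the loop inspects how each document declared its namespace; the “ns” prefix lives only in the query’s scope.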
