Using XML in the .NET Framework
The Extensible Markup Language (XML) is the latest offering in the world of data access. Microsoft has been actively supporting this language since its inception. XML provides a universal way of exchanging information between organizations. Its structure makes it well suited to online applications and to working with data residing on local or remote data sources.
Please note that ADO.NET is not coded in XML; rather, it revolves around XML, and some readers may confuse the two ideas. Microsoft has integrated XML technology rather tightly into its .NET Framework, and the core foundation of the entire ADO.NET architecture is built upon XML. ADO.NET provides the facilities to apply various existing and emerging XML technologies to manipulate data and information. The System.Xml namespace offers perhaps the richest collection of classes for generating, transmitting, processing, and storing information via XML. In this chapter, we will first have a brief introduction to the structural components of an XML document. Then we will look into the architecture of the XML objects in the .NET Framework. Finally, we will study several major XML.NET objects with many examples.
XML is fast becoming a standard for data exchange in the next generation’s Internet applications. XML allows user-defined tags that make XML document handling more flexible than HTML, the conventional language of the Internet. Since XML is the heart and soul of ADO.NET, sound knowledge of XML is imperative for developing applications in ASP.NET. The following section touches on some of the basic concepts of XML.
The idea behind XML is surprisingly simple. The major objective is to organize information in such a way so that human beings can read and comprehend the data and its context; also, the document itself is technology and platform independent. Consider the following text file:
Obviously, it is difficult to understand exactly what information the above text file contains. Now consider the XML document shown in Figure 8.1. The code is available in the Catalog1.xml file on the accompanying CD.
The above document is XML’s way of representing data contained in a product catalog. It has many advantages. It is easily readable and comprehensible, it is self-documented, and it is technology independent. Most importantly, it is quickly becoming the universally acceptable data container and transmission format in the current information technology era. Well, welcome to the exciting world of XML!
We can use Notepad to create an XML document. VS.NET offers an array of tools packaged in the XML Designer to work with XML documents. We will demonstrate the usages of the XML Designer later. Right now, go ahead and open the Catalog1.xml file from the CD that accompanies this book in IE 5.0 or higher. You will see that IE displays the document in a very interesting fashion with drill-down features, as shown in Figure 8.2.
The system will display two tabs for two views: the XML view and the Data view of your XML document. These views are shown in Figures 8.3 and 8.4. The XML Designer has many other tools to work with. We will introduce these later in this chapter.
Declaration Each XML document may have the optional entry <?xml version="1.0"?>. This standard entry is used to identify the document as an XML document conforming to the W3C (World Wide Web Consortium) recommendation for version 1.0.
Schema or Document Type Definition (DTD) In certain situations, a schema or DTD may precede the XML document. A schema or DTD contains the rules about the elements of the document. For example, we may specify a rule like “A product element must have a ProductName, but a ListPrice element is optional.” We will discuss schemas later in the chapter.
Elements An XML document is mostly composed of elements. An element has a start-tag and end-tag. In between the start-tag and end-tag, we include the content of the element. An element may contain a piece of character data, or it may contain other elements. For example, in the Catalog1.xml, the Product element contains three other elements: ProductId, ProductName, and ListPrice. On the other hand, the first ProductName element contains a piece of character data like Shimano Calcutta.
Root Element In an XML document, one single main element must contain all other elements inside it. This specific element is often called the root element. In our example, the root element is the Catalog element. The XML document may contain many Product elements, but there must be only one instance of the Catalog element.
Attributes Okay, we agree that we didn’t tell you the whole story in our first example. So far, we have said that an element may contain other elements, or it may contain data, or both. Besides these, an element may also contain zero or more so-called attributes. An attribute is just an additional way to attach a piece of data to an element. An attribute is always placed inside the start-tag of an element, and we specify its value as a name="value" pair, with the value enclosed in quotes.
Let us revise our Catalog1.xml and include some attributes in the Product element. Here, we will assume that a Product element will have two attributes named Type and SupplierId. As shown in Figure 8.5, we will simply add the Type="Spinning Reel" and SupplierId="5" attributes to the first Product element. Similarly, we will also add the attributes to the second Product element. The code shown in Figure 8.5 is also available on the accompanying CD. Let us not get confused by the “attribute” label! An attribute is just an additional way to attach data to an element. Rather than using the attributes, we could have easily modeled them as elements as follows:
At the initial stage, the necessity of attributes may appear questionable. Nevertheless, attributes exist in the W3C recommendation, and in most situations they come in handy in designing otherwise-complex XML-based systems.
Empty Element We have already mentioned a couple of times that an element may contain other elements, or data, or both. However, an element does not necessarily have to have any of them. If needed, it can be kept totally empty. For example, observe the following element:
The empty element is a correct XML element. The name of the element is Input. It has three attributes: type, id, and runat. However, neither does it contain any sub-elements, nor does it contain any explicit data. Hence, it is an empty element. We may specify an empty element in one of two ways:
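The two spellings are a single self-closing tag and a start-tag followed immediately by its end-tag. The following is a minimal VB.NET sketch rather than a listing from the book. Only the element name Input and the attribute names type, id, and runat come from the text; the attribute values are hypothetical, and single quotes are used around them only to keep the VB string literals readable. It checks that the parser finds no content in either form.

```vbnet
Imports System
Imports System.Xml

Module EmptyElementDemo

    ' Returns True when the root element of the given XML text has no child nodes.
    Function RootIsEmpty(ByVal xmlText As String) As Boolean
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        Return Not doc.DocumentElement.HasChildNodes
    End Function

    Sub Main()
        ' Form 1: a single self-closing tag (attribute values are made up).
        Dim shortForm As String = "<Input type='text' id='txtName' runat='server' />"
        ' Form 2: a start-tag immediately followed by its end-tag.
        Dim longForm As String = "<Input type='text' id='txtName' runat='server'></Input>"

        Console.WriteLine("Self-closing form empty: " & RootIsEmpty(shortForm))
        Console.WriteLine("Start/end form empty:    " & RootIsEmpty(longForm))
    End Sub

End Module
```

Both calls should report an empty element, because neither form carries sub-elements or character data; the attributes do not count as content.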
At first sight, an XML document may appear to be like a standard HTML document with additional user-given tag names. However, the syntax of an XML document is much more rigorous than that of an HTML document. An HTML document lets us spell many tags incorrectly (the browser would just ignore them), and it is a free world out there for people who are not case-sensitive. For example, we may use <BODY> and </Body> in the same HTML document without getting into trouble. On the contrary, there are certain rules that must be followed when we develop an XML document. Please refer to the http://W3C.org Web site for the details. Some basic rules, among many others, are as follows:
An XML document that is syntactically correct is often called a “well-formed” document. If the document is not well formed, Internet Explorer will provide an error message. For example, the following XML document will receive an error message, when opened in Internet Explorer, just because of the case sensitivity of the tag <product> and </Product>.
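The same well-formedness check can be made from code. The sketch below is an illustration under stated assumptions, not the book's listing: it uses two tiny hypothetical documents and relies on XmlDocument.LoadXml raising an XmlException when the text is not well formed.

```vbnet
Imports System
Imports System.Xml

Module WellFormedCheck

    ' Returns True when the text parses as well-formed XML, False otherwise.
    Function IsWellFormed(ByVal xmlText As String) As Boolean
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(xmlText)
            Return True
        Catch ex As XmlException
            ' The parser reports the position and nature of the first violation.
            Console.WriteLine("Not well formed: " & ex.Message)
            Return False
        End Try
    End Function

    Sub Main()
        ' Matching start-tag and end-tag.
        Dim good As String = "<Catalog><Product>Reel</Product></Catalog>"
        ' Start-tag <product> and end-tag </Product> differ only in case.
        Dim bad As String = "<Catalog><product>Reel</Product></Catalog>"

        Console.WriteLine(IsWellFormed(good))
        Console.WriteLine(IsWellFormed(bad))
    End Sub

End Module
```

Because XML names are case-sensitive, the second document fails for the same reason Internet Explorer rejects it: the end-tag does not match the start-tag.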
An XML document may be well formed, but it may not necessarily be a valid XML document. A valid XML document is a document that conforms to the rules specified in its Document Type Definition (DTD) or Schema. DTD and Schema are actually two different ways to specify the rules about the contents of an XML document. The DTD has several shortcomings. First, a DTD document does not have to be coded in XML. That means a DTD is itself not an XML document. Second, the data-types available to define the contents of an attribute or element are very limited in DTD. This is why, although VS.NET allows both DTD and schema, we will present only the schema specification in this chapter. The W3C has put forward the candidate proposal for the standard schema specification (www.w3.org/XML/Schema#dev). The XML Schema Definition (XSD) specification by W3C has been implemented in ADO.NET. VS.NET also supports the XSD specification.
A schema is simply a set of predefined rules that describe the data contents of an XML document. Conceptually, it is very similar to the definition of a relational database table. In an XML schema, we define the structure of an XML document, its elements, the data types of the elements and associated attributes, and most importantly, the parent-child relationships among the elements. We may develop a schema in many different ways. One way is to enter the definition manually using Notepad. We may also develop schema using visual tools, such as VS.NET or XML Authority. Many automated tools may also generate a rough-cut schema from a sample XML document (similar to reverse-engineering).
If we do not want to code a schema manually, we may generate a rough-cut schema of a sample XML document using VS.NET XML Designer. We may then polish the rough-cut schema to conform to our exact business rules. In VS.NET, it is just a matter of one click to generate a schema from a sample XML document. Use the following steps to generate a rough-cut schema for our Catalog1.xml document:
That’s all! The system will create the schema named Catalog1.xsd. If we double-click on the Catalog1.xsd file in the Solution Explorer, we will see the screen as shown in Figure 8.6. We will see the DataSet view tab and the XML view tab at the bottom of the screen. We will elaborate on the DataSet view later in the chapter.
For discussion purposes, we have also listed the contents of the schema in Figure 8.7. The XSD starts with certain standard entries at the top. Although the code for an XSD may appear complex, there is no need to get overwhelmed by its syntax. Actually, the structural part of an XSD is very simple. An element is defined to contain either one or more complexType or simpleType data structures. A complexType data structure nests other complexType or simpleType data structures. A simpleType data structure contains only data.
In our XSD example (Figure 8.7), the Catalog element may contain one or more (unbounded) instances of the Product element. Thus, it is defined to contain a complexType structure. Besides containing the Product element, it may also contain other elements (for example, it could contain an element Supplier). In the XSD construct, we specify this rule using a choice structure as follows:
Because the Product element contains further elements, it also contains a complexType structure. This complexType structure, in turn, contains a sequence of ProductId, ProductName, and ListPrice. These elements do not contain further elements. Thus, we simply provide their data types in their definitions. The automated generator failed to identify the ListPrice element’s text as decimal data. We converted its data type to decimal manually. The complete listing of the Catalog1.xsd is shown in Figure 8.7. The code is also available on the accompanying CD.
In an XML document, the data are stored in a hierarchical fashion. A hierarchy is also referred to as a tree in data structures. Conceptually, the data stored in the Catalog1.xml can be represented as a tree diagram, as shown in Figure 8.8. Please note that certain element names and values have been abbreviated in the tree diagram, mostly to conserve real estate on the page.
In this figure, each rectangle is a node in the tree. Depending on the context, a node can be of different types. For example, each product node in the figure is an element-type node. Each product node happens to be a child node of the catalog node. The catalog node can also be termed as the parent of all product nodes. Each product node, in turn, is the parent of its PId, PName, and Price nodes.
In this particular tree diagram, the bottom-most nodes are not of element-type; rather, these are of text-type. There could have been nodes for each attribute and its value, too, although we have not shown those in this diagram.
The Product nodes are the immediate descendants of the Catalog node. Both Product nodes are siblings of each other. Similarly, the PId, PName, and Price nodes under a specific product node are also siblings of each other. In short, all children of a parent are called siblings.
At this stage, you may have been wondering why we are studying the family history rather than ASP. Well, you will find out pretty soon that all of these terminologies will play major roles in taming the beauties and the beasts of something called XML technology.
The entire ADO.NET Framework has been designed based on XML technology. Many of the ADO.NET data-handling methodologies, including DataTables and DataSets, use XML in the background, thus keeping it transparent to us. The .NET Framework’s System.Xml namespace provides a very rich collection of classes that can be used to store and process XML documents. These classes are also often referred to as the XML.NET.
Before we get into the details of the XML.NET objects, let us ask ourselves several questions. As ASP.NET developers, what kind of support would we need from .NET for processing XML documents? Well, at the very least, we would like .NET to assist us in creating, reading, and parsing XML documents. Anything else? Okay, if we have adequate cache, we would like to load the entire document in the memory and process it directly from there. If we do not have enough cache, then we would like to read various fragments of an XML document one piece at a time. Do we want more? How about the ability to search and query the information contained in an XML document? How about instantly creating an XML document from a database query and sending it to our B2B partners? How about converting an XML document from one format to another format and transmitting it to other servers? Actually, XML.NET provides all of these, and much more! All of the above questions fall into two major categories:
As mentioned earlier, XML is associated with a growing family of technologies and frameworks. The major trends in this area are W3C DOM, XSLT, XPath, XPath Query, and SAX. In XML.NET, Microsoft has incorporated almost all of these frameworks and technologies. It has also added some of its own unique ideas. There is a plethora of alternative XML.NET objects to satisfy our needs and likings. However, it’s a jungle out there! In the remainder of this section, we will have a brief glance over this jungle.
Two primary classes in this group are XmlReader and XmlWriter. Both of these classes are abstract classes, and therefore we cannot create objects of these classes. Microsoft has provided a number of concrete implementations of both of these classes:
We may create objects of these classes and use their methods and properties. If warranted, we may also extend these classes to provide further specific functionalities. Fortunately, the XmlWriter class has only one concrete implementation: XmlTextWriter. It can be used to write an XML document on a forward-only basis. These classes and their relationships are shown in Figure 8.9.
Once XML data are read, we need to structure these data in the computer’s memory. For this purpose, the major offerings include the XmlNode class and the XPathDocument class. The XmlNode class is an abstract class. There are a number of concrete implementations of this class, too, such as the XmlDocument, XmlAttribute, XmlDocumentFragment, and so on. We will limit our attention to the XmlDocument class, and to one of its subsequent extensions named the XmlDataDocument. The characteristics of some of these classes are as follows:
The above classes are essentially used for storing the XML data in the cache. Just storing data in the memory serves us no purpose unless we can process and query these data. The .NET Framework has included a number of classes to operate on the cached XML data. These classes include XPathNavigator, XPathNodeIterator, XslTransform, XmlNodeList, etc. These classes are shown in Figure 8.10.
Once an instance is created, the imaginary cursor is set at the top of the document. We may use its Read() method to extract fragments of data sequentially. Each fragment of data is distantly similar to a node of the underlying XML tree. The NodeType property captures the type of the data fragment read, the Name property contains the name of the node, and the Value property contains the value of the node, if any. Thus, once a data fragment has been read, we may use the following type of statement to display the node-type, name, and value of the node.
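A minimal sketch of such a read loop is given below. It is an illustration, not the book's listing: the reader is fed from an in-memory string instead of a file, the variable name myRdr follows the text, and the one-element document is a hypothetical stand-in for the catalog.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module ReaderNodes

    ' Reads every node in document order and records "type : name : value" for each.
    Function DescribeNodes(ByVal xmlText As String) As String
        Dim myRdr As New XmlTextReader(New StringReader(xmlText))
        Dim sb As New StringBuilder()
        While myRdr.Read()
            ' Name is empty for text nodes; Value is empty for element nodes.
            sb.AppendLine(myRdr.NodeType.ToString() & " : " & myRdr.Name & " : " & myRdr.Value)
        End While
        myRdr.Close()
        Return sb.ToString()
    End Function

    Sub Main()
        Console.Write(DescribeNodes("<ProductName>Shimano Calcutta</ProductName>"))
    End Sub

End Module
```

For this input the loop visits three nodes: the Element node, its Text node (which has a value but no name), and the EndElement node.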
The attributes are treated slightly differently in the XmlTextReader object. When a node is read, we may use the HasAttributes property of the reader object to see if there are any attributes attached to it. If there are attributes in an element, the MoveToAttribute(i) method can be applied to iterate through the attribute collection. The AttributeCount property contains the number of attributes of the current element. Once we process all of the attributes, we need to apply the MoveToElement method to move the cursor back to the current element node. Therefore, the following code will display the attributes of an element:
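The attribute loop described above can be sketched as follows. This is a hedged illustration: the Product element and its Type and SupplierId attributes come from the chapter's catalog example, while the in-memory string source and the helper name ListAttributes are assumptions made to keep the example self-contained.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module ReaderAttributes

    ' Collects "name = value" for every attribute of every element in the text.
    Function ListAttributes(ByVal xmlText As String) As String
        Dim myRdr As New XmlTextReader(New StringReader(xmlText))
        Dim sb As New StringBuilder()
        While myRdr.Read()
            If myRdr.NodeType = XmlNodeType.Element AndAlso myRdr.HasAttributes Then
                For i As Integer = 0 To myRdr.AttributeCount - 1
                    ' Position the cursor on the i-th attribute of the current element.
                    myRdr.MoveToAttribute(i)
                    sb.AppendLine(myRdr.Name & " = " & myRdr.Value)
                Next
                ' Return the cursor to the element that owns the attributes.
                myRdr.MoveToElement()
            End If
        End While
        myRdr.Close()
        Return sb.ToString()
    End Function

    Sub Main()
        Console.Write(ListAttributes("<Product Type='Spinning Reel' SupplierId='5' />"))
    End Sub

End Module
```

The check on XmlNodeType.Element is a design choice in this sketch: the XML declaration also reports attributes (for example, version), and filtering on element nodes keeps those out of the listing.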
Microsoft has loaded the XmlTextReader class with a variety of convenient class members. Some of the frequently used methods and properties are AttributeCount, Depth, EOF, HasAttributes, HasValue, IsDefault, IsEmptyElement, Item, ReadState, and Value.
In this section, we will apply the XmlTextReader object to parse and display all data contained in our Catalog2.xml document (as shown in Figure 8.5). The output of this example and its code are shown in Figures 8.11 and 8.12, respectively. The code shown in Figure 8.12 is available on the accompanying CD. Our objective is to start at the top of the document and then sequentially travel through its nodes using the XmlTextReader’s Read() method. When there is no more data to read, the Read() method returns “false.” Thus, we are able to build the While myRdr.Read() loop to process all data. Please review the code (Figure 8.12) and its output carefully. While displaying the data, we have separated the node-type, node-name, and values using colons. Not all nodes have names or values. Hence, you will see many empty names and values after the respective colons.
In the previous section, we extracted and displayed all data, including the “whitespaces” contained in an XML document. Now, we will illustrate an example where we will navigate through the document and pick up only those data that are necessary for an application. The output of this application is shown in Figure 8.13. In this example, we will display the names of our products in a list box. We will load the list box using the Product Name data from the XML file. The user will select a particular product. Subsequently, we will search the XML document to find and display the price of the product. We will travel through the XML file twice, once to load the list box, and once to find the price of a selected product. Please be aware that we could have easily developed the application by building an array or arraylist of the products during the first pass through the XML data, thus avoiding a second pass. Nevertheless, we are reading the file twice just to illustrate various methods and properties of the XmlTextReader object.
To load the List Box, we will go through the following process: We will load the list box in the Page_Load event. Here, we will read the nodes one at a time. If the node type is of element-type, we will check if its name is ProductName. If it is a ProductName node, we will perform a Read() to get to its text node and then apply the myRdr.ReadString() method to extract the value and load it in the list box. Finally, we will close the reader object. Caution: We are assuming that there is no “whitespace” between the ProductName and its Text node. If there is a “whitespace,” we will need to put the second Read() in a loop until the node-type is Text.
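The loading pass can be sketched as below. This is an assumption-laden illustration rather than the book's Page_Load code: a List(Of String) stands in for the list box, the XML comes from an in-memory string, and the second product name is hypothetical. The sketch also calls ReadString() while the cursor is still on the ProductName element, which the reader permits and which sidesteps the whitespace caution above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module LoadProductNames

    ' Returns the text of every ProductName element, in document order.
    Function ReadProductNames(ByVal xmlText As String) As List(Of String)
        Dim names As New List(Of String)()
        Dim myRdr As New XmlTextReader(New StringReader(xmlText))
        While myRdr.Read()
            If myRdr.NodeType = XmlNodeType.Element AndAlso myRdr.Name = "ProductName" Then
                ' ReadString gathers the element's text and leaves the cursor on </ProductName>.
                names.Add(myRdr.ReadString())
            End If
        End While
        myRdr.Close()
        Return names
    End Function

    Sub Main()
        ' Hypothetical two-product catalog; only "Shimano Calcutta" appears in the text.
        Dim xmlText As String = _
            "<Catalog>" & _
            "<Product><ProductName>Shimano Calcutta</ProductName></Product>" & _
            "<Product><ProductName>Sample Reel</ProductName></Product>" & _
            "</Catalog>"
        For Each name As String In ReadProductNames(xmlText)
            Console.WriteLine(name)   ' in the page, this would be lstProducts.Items.Add(name)
        Next
    End Sub

End Module
```

The control name lstProducts in the comment is hypothetical; in the actual page each name would be added to the list box instead of being written to the console.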
To find the price of the selected product, we will go through the following process: We will include the necessary code in the “onclick” event code of the command button “Show Price.” We will create a second XmlTextReader object based on the Catalog2.xml file. Of course, we may scan all nodes sequentially to find the price. However, the XmlTextReader class enables you to skip undesirable nodes, such as the “whitespace” or the declaration nodes, via the MoveToContent() method. According to Microsoft, non-whitespace text, CDATA, Element, EndElement, EntityReference, and EndEntity nodes are content nodes. The MoveToContent() method checks whether the current node is a content node. If the node is not a content node, then the method skips to the next content node. You need to be careful though. If the current node happens to be a content node, the cursor does not move to the next content node automatically on a further MoveToContent().
Initially, when we instantiate the reader object, its node type is None. It happens to be a noncontent node. Hence our first MoveToContent() statement takes us to a content node. There, we check if it is an Element-type node named “ProductName” and if its ReadString() is equal to the name of the selected product. If all are true, then we apply a Read() to go to the next node. This Read() may take us to a “whitespace” node, and thus we have applied a MoveToContent() to get to the ListPrice node. Figure 8.14 shows an excerpt of the relevant code. The complete code is available in the XmlTextReader2.aspx file on the CD.
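The search pass can be sketched as follows. It is a minimal illustration, assuming that each ListPrice element directly follows its ProductName sibling; the prices, the second product, and the helper name FindPrice are hypothetical, and the XML is read from a string instead of the catalog file.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module FindPriceDemo

    ' Returns the ListPrice text for the named product, or Nothing when not found.
    Function FindPrice(ByVal xmlText As String, ByVal productName As String) As String
        Dim rdr As New XmlTextReader(New StringReader(xmlText))
        Try
            While Not rdr.EOF
                ' Skip whitespace, comments, and the declaration; stop on a content node.
                rdr.MoveToContent()
                If rdr.NodeType = XmlNodeType.Element AndAlso rdr.Name = "ProductName" Then
                    If rdr.ReadString() = productName Then
                        rdr.Read()            ' step off </ProductName>, possibly onto whitespace
                        rdr.MoveToContent()   ' advance to the <ListPrice> element
                        Return rdr.ReadString()
                    End If
                End If
                ' MoveToContent does not advance from a content node, so move on explicitly.
                rdr.Read()
            End While
            Return Nothing
        Finally
            rdr.Close()
        End Try
    End Function

    Sub Main()
        ' Hypothetical data; whitespace between elements is deliberate.
        Dim xmlText As String = _
            "<Catalog>" & vbCrLf & _
            " <Product>" & vbCrLf & _
            "  <ProductName>Shimano Calcutta</ProductName>" & vbCrLf & _
            "  <ListPrice>47.76</ListPrice>" & vbCrLf & _
            " </Product>" & vbCrLf & _
            "</Catalog>"
        Console.WriteLine(FindPrice(xmlText, "Shimano Calcutta"))
    End Sub

End Module
```

The explicit Read() at the bottom of the loop reflects the caution above: without it, repeated MoveToContent() calls would stay on the same content node.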
The XmlTextWriter class is a concrete implementation of the XmlWriter abstract class. An XmlTextWriter object can be used to write data sequentially to an output stream, or to a disk file as an XML document. The data to be written may come from the user’s input and/or from a variety of other sources, such as text files, databases, XmlTextReaders, or XmlDocuments. Its major methods and properties include Close, Flush, Formatting, WriteAttributes, WriteAttributeString, WriteComment, WriteElementString, WriteEndAttribute, WriteEndDocument, WriteState, and WriteStartDocument.
In this section, we will collect user-given data via an .aspx page, and write the information in an XML file. The run-time view of the application is shown in Figure 8.15. On the click event of the “Create XML File” button, the application will create the XML file (on disk) and display it back in the browser as seen in Figure 8.16.
We have included the necessary code in the click event of the command button. Our objective is to write the data in a disk file named Customer.xml. In the code, first we have created an instance of the XmlTextWriter object as follows:
The second parameter “Nothing” is specified to map the file to a UTF-8 format. Then it is just a matter of writing the various elements, attributes, and their values judiciously. Once the file is written, we simply employed Response.Redirect(Server.MapPath("Customer.xml")) to display the XML document’s information in the browser. The complete code for the application is shown in Figure 8.17. Both the Customer.xml and XmlTextWriter1.aspx files are available on the accompanying CD.
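A minimal sketch of the writing step is shown below. It is not the listing from Figure 8.17: the element and attribute names (Customer, Id, Name), the sample values, and the temporary-folder path are all hypothetical, and the file is echoed to the console instead of being sent to the browser.

```vbnet
Imports System
Imports System.Xml

Module WriteCustomerDemo

    ' Writes a one-customer XML document to the given file.
    Sub WriteCustomer(ByVal fileName As String, ByVal customerId As String, ByVal customerName As String)
        ' Passing Nothing as the encoding makes the writer emit UTF-8.
        Dim myWriter As New XmlTextWriter(fileName, Nothing)
        myWriter.Formatting = Formatting.Indented
        myWriter.WriteStartDocument()
        myWriter.WriteStartElement("Customer")            ' hypothetical root element
        myWriter.WriteAttributeString("Id", customerId)   ' hypothetical attribute
        myWriter.WriteElementString("Name", customerName) ' hypothetical child element
        myWriter.WriteEndElement()
        myWriter.WriteEndDocument()
        myWriter.Close()                                  ' flushes and releases the file
    End Sub

    Sub Main()
        Dim fileName As String = System.IO.Path.Combine(System.IO.Path.GetTempPath(), "Customer.xml")
        WriteCustomer(fileName, "C101", "Jane Doe")
        Console.WriteLine(System.IO.File.ReadAllText(fileName))
    End Sub

End Module
```

In the page itself, the file name would come from Server.MapPath and the values from the form's text boxes; the writer calls stay the same.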
The W3C Document Object Model (DOM) is a set of specifications to represent an XML document in the computer’s memory. Microsoft has implemented the W3C Document Object Model via a number of .NET objects. The XmlDocument is one of these objects. When an XmlDocument object is loaded, it organizes the contents of an XML document as a “tree” (as shown in Figure 8.18). Whereas the XmlTextReader object provides a forward-only cursor, the XmlDocument object provides fast and direct access to a node. However, a DOM tree is cache intensive, especially for large XML documents.
An XmlDocument object can be loaded from an XmlTextReader. Once it is loaded, we may navigate via the nodes of its tree using numerous methods and properties. Some of the frequently used members are the following: DocumentElement (root of the tree), ChildNodes (all children of a node), FirstChild, LastChild, HasChildNodes, ChildNodes.Count (# of children), InnerText (the content of the sub-tree in text format), Name (node name), NodeType, and Value (of a text node) among many others.
If needed, we may address a node using the parent-child hierarchy. The first child of a node is ChildNodes(0), the second child is ChildNodes(1), and so on. For example, the first product can be referenced as DocumentElement.ChildNodes(0). Similarly, the price of the second product can be addressed as DocumentElement.ChildNodes(1).ChildNodes(2).InnerText.
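The positional addressing can be tried on a small in-memory tree. The sketch below is an illustration with hypothetical product data (only "Shimano Calcutta" comes from the chapter); it assumes each Product holds ProductId, ProductName, and ListPrice in that order, so that the price is the child at position 2.

```vbnet
Imports System
Imports System.Xml

Module DomAddressing

    ' Returns the InnerText of the grandchild at the given positions under the root.
    Function TextAt(ByVal doc As XmlDocument, ByVal productIndex As Integer, ByVal fieldIndex As Integer) As String
        Return doc.DocumentElement.ChildNodes(productIndex).ChildNodes(fieldIndex).InnerText
    End Function

    ' Builds a small, hypothetical two-product catalog tree.
    Function BuildSampleCatalog() As XmlDocument
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<Catalog>" & _
            "<Product><ProductId>F10</ProductId><ProductName>Shimano Calcutta</ProductName><ListPrice>47.76</ListPrice></Product>" & _
            "<Product><ProductId>F20</ProductId><ProductName>Sample Reel</ProductName><ListPrice>49.99</ListPrice></Product>" & _
            "</Catalog>")
        Return doc
    End Function

    Sub Main()
        Dim doc As XmlDocument = BuildSampleCatalog()
        Console.WriteLine(doc.DocumentElement.Name)                ' root of the tree
        Console.WriteLine(doc.DocumentElement.ChildNodes.Count)    ' number of Product children
        Console.WriteLine(doc.DocumentElement.ChildNodes(0).Name)  ' first product
        Console.WriteLine(TextAt(doc, 1, 2))                       ' price of the second product
    End Sub

End Module
```

Positional addressing is compact but brittle: it silently returns the wrong field if the element order in the document ever changes.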
In this example we will implement our product selection page using the XML document object model. The output of the code is shown in Figure 8.19.
Let’s go through the process of loading the XmlDocument (DOM tree). There are a number of different ways to load an XmlDocument object. We will load it using an XmlTextReader object. We will ask the reader to ignore the “whitespaces” (more or less to conserve cache). As you can see from the following code, we are loading the tree in the Page_Load event. On “PostBack”, we will not have access to this tree. That is why we are storing the “tree” in a Session variable. When the user makes a selection, we will retrieve the tree from the session, and search its nodes for the appropriate price.
Next, let’s investigate how to retrieve the price of a selected product. On click of the Show Price button, we simply retrieve the tree from the session, and get to the Price node directly. The SelectedIndex property of the list box does a favor for us, as its SelectedIndex value will match the corresponding child’s ordinal position in the Catalog (DocumentElement). Figure 8.20 shows an excerpt of the relevant code that is used to retrieve the price of a selected product. The complete code is available in the XmlDom1.aspx file on the accompanying CD.
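The two steps, loading through a whitespace-ignoring reader and looking up by ordinal position, can be sketched as below. This is a simplified illustration, not the XmlDom1.aspx code: the Session storage is left out because it needs a running ASP.NET page, an integer stands in for the list box's SelectedIndex, and the catalog data is hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module DomLoadAndLookup

    ' Loads the DOM tree through an XmlTextReader that drops whitespace nodes.
    Function LoadTree(ByVal xmlText As String) As XmlDocument
        Dim myRdr As New XmlTextReader(New StringReader(xmlText))
        myRdr.WhitespaceHandling = WhitespaceHandling.None
        Dim myDoc As New XmlDocument()
        myDoc.Load(myRdr)
        myRdr.Close()
        Return myDoc
    End Function

    ' The selected index doubles as the Product's position under the root element.
    Function PriceAt(ByVal myDoc As XmlDocument, ByVal selectedIndex As Integer) As String
        Return myDoc.DocumentElement.ChildNodes(selectedIndex).ChildNodes(2).InnerText
    End Function

    Sub Main()
        ' Hypothetical catalog, indented on purpose to show that whitespace is ignored.
        Dim xmlText As String = _
            "<Catalog>" & vbCrLf & _
            "  <Product><ProductId>F10</ProductId><ProductName>Shimano Calcutta</ProductName><ListPrice>47.76</ListPrice></Product>" & vbCrLf & _
            "  <Product><ProductId>F20</ProductId><ProductName>Sample Reel</ProductName><ListPrice>49.99</ListPrice></Product>" & vbCrLf & _
            "</Catalog>"
        Dim myDoc As XmlDocument = LoadTree(xmlText)
        Console.WriteLine(PriceAt(myDoc, 1))
    End Sub

End Module
```

In the page, LoadTree would run in Page_Load with its result stored in a Session variable, and PriceAt would run in the button's click handler with the list box's SelectedIndex.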
A tree is composed of nodes. Essentially, a node is also a tree because it contains all other nodes below it. A node at the bottom does not have any children; hence, most likely it will be a text-type node. We will exploit this property to travel through a tree using a VB recursive procedure. The primary objective of this example is to travel through the DOM tree and display the information contained in each of its nodes. The output of this exercise is shown in Figure 8.21.
1. DisplayNode(node As XmlNode) It will receive a node and check if it is a terminal node. If the node is a terminal node, this subprocedure will print its contents. If the node is not a terminal node, then the subprocedure will check if the node has any attributes. If there are attributes, it will print them.
2. TravelDownATree(tree As XmlNode) It will receive a tree, and at first it will call the DisplayNode procedure. Then it will pass the sub-tree of the received tree to itself. This is a recursive procedure. Thus, it will actually fathom all nodes of a received tree, and we will get all nodes of the entire tree printed.
The complete listing of the code is shown in Figure 8.22. The code is also available in the file named XmlDom2.aspx in the accompanying CD. As usual, we will load the XmlDocument in the Page_Load() event using an XmlTextReader. After the DOM tree is loaded, we will call the TravelDownATree recursive procedure, which will accomplish the remainder of the job.
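The two procedures can be sketched as follows. This is a reconstruction from the descriptions above, not the listing in Figure 8.22: the output goes to a StringBuilder instead of the page, the printed format is an assumption, and the sample document is hypothetical apart from the names used in the chapter.

```vbnet
Imports System
Imports System.Text
Imports System.Xml

Module TreeWalk

    ' Prints a terminal node's contents; for any other node, prints its attributes, if any.
    Sub DisplayNode(ByVal node As XmlNode, ByVal sb As StringBuilder)
        If Not node.HasChildNodes Then
            sb.AppendLine(node.Name & " : " & node.Value)
        ElseIf node.Attributes IsNot Nothing Then
            For Each attr As XmlAttribute In node.Attributes
                sb.AppendLine(node.Name & " @" & attr.Name & " = " & attr.Value)
            Next
        End If
    End Sub

    ' Displays the received node, then calls itself on each of its sub-trees.
    Sub TravelDownATree(ByVal tree As XmlNode, ByVal sb As StringBuilder)
        DisplayNode(tree, sb)
        For Each child As XmlNode In tree.ChildNodes
            TravelDownATree(child, sb)
        Next
    End Sub

    ' Convenience wrapper: parses the text and returns the full traversal output.
    Function Walk(ByVal xmlText As String) As String
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        Dim sb As New StringBuilder()
        TravelDownATree(doc, sb)
        Return sb.ToString()
    End Function

    Sub Main()
        Console.Write(Walk( _
            "<Catalog><Product Type='Spinning Reel' SupplierId='5'>" & _
            "<ProductName>Shimano Calcutta</ProductName></Product></Catalog>"))
    End Sub

End Module
```

The recursion ends naturally at the text nodes, whose ChildNodes collection is empty; text nodes report the name #text and carry the character data in Value.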
The XmlDataDocument class is an extension of the XmlDocument class, and it behaves almost the same way the XmlDocument does. The most fascinating feature of an XmlDataDocument object is that it provides two alternative views of the same data, the “XML view” and the “relational view.” The XmlDataDocument has a property named DataSet. It is through this property that XmlDataDocument exposes its data as one or more related or unrelated DataTables. A DataTable is actually an imaginary table-view of XML data. Once we load an XmlDataDocument object, we can treat it as a DOM tree, or we can treat its data as a DataTable (or a collection of DataTables) via its DataSet property. Figure 8.23 shows the two views of an XmlDataDocument. Because these views are drawn from the same XmlDataDocument object, they are automatically synchronized. That means that any change in one of them will change the other. In this section, we will provide three examples.
In this section we will load an XmlDataDocument using our Catalog2.xml file. After we load it, we will retrieve the product names and load them in a list box. The output of this example is shown in Figure 8.24. The code for this application is listed in Figure 8.25, and it is also available in the file named XmlDataDocument1.aspx on the accompanying CD.
The XmlDataDocument is a pleasant object to work with. In this example, the code is pretty straightforward. After we have loaded the XmlDataDocument, we have declared an XmlNodeList collection named productNames. We have populated the collection by using the GetElementsByTagName("ProductName") method of the XmlDataDocument object. Finally, it is just a matter of iterating through the productNames collection and loading each of its members in the list box.
At this stage, you will probably ask why we are not finding the unit price of the selected product. Actually, therein lies the beauty of the XmlDataDocument. Because it has extended the XmlDocument class, all of the members of the XmlDocument class are also available to us. Thus, we could use the same technique as shown in our previous example to find the price. Nevertheless, the reason for not showing the searching technique here is that we will cover it later when we discuss the XPathNodeIterator object.
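A minimal sketch of the GetElementsByTagName step follows. As a simplifying assumption, it uses the base XmlDocument class with an in-memory string; because XmlDataDocument extends XmlDocument, the same call applies to it. The helper name, the second product, and the use of a List(Of String) in place of the list box are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module TagNameLookup

    ' Returns the text of every ProductName element found anywhere in the document.
    Function GetProductNames(ByVal xmlText As String) As List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)
        ' GetElementsByTagName searches the whole tree, regardless of nesting depth.
        Dim nameNodes As XmlNodeList = doc.GetElementsByTagName("ProductName")
        Dim result As New List(Of String)()
        For Each node As XmlNode In nameNodes
            result.Add(node.InnerText)
        Next
        Return result
    End Function

    Sub Main()
        Dim xmlText As String = _
            "<Catalog>" & _
            "<Product><ProductName>Shimano Calcutta</ProductName></Product>" & _
            "<Product><ProductName>Sample Reel</ProductName></Product>" & _
            "</Catalog>"
        For Each name As String In GetProductNames(xmlText)
            Console.WriteLine(name)
        Next
    End Sub

End Module
```

Compared with the XmlTextReader pass, no explicit node-type checks are needed; the DOM has already sorted the nodes into a tree.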
In this example, we will process and display the Catalog3.xml document’s data as a relational table in a DataGrid. The Catalog3.xml is exactly the same as Catalog2.xml except that it has more data. The Catalog3.xml file is available in the accompanying CD. The output of this example is shown in Figure 8.26.
If we want to process the XML data as relational data, we need to load the schema of the XML document first. We have generated the following schema for the Catalog3.xml using VS.NET. The schema specification is shown in Figure 8.27 (also available in the accompanying CD).
Since the XmlDataDocument provides two views, we have exploited its DataSet.Tables(0) property to load the DataGrid and display our XML file’s information in the grid. The complete listing of the code is shown in Figure 8.28. The code is also available in the XmlDataDocDataSet1.aspx file on the accompanying CD.
In many instances, an XML document may contain nested elements. Suppose that a bank has many customers, and a customer has many accounts. We have modeled this simple scenario in an XML document with nested elements. This document, named Bank1.xml, is shown in Figure 8.29. It is also available in the accompanying CD.
If we load the above XML document and its schema in an XmlDataDocument object, it will provide two relational tables’ views: one for the customer’s information, and the other for the account’s information. Our objective is to display the data of these relational tables in two DataGrids as shown in Figure 8.30.
To develop this application, first we had to generate the schema for our Bank1.xml file. We used the VS.NET XML designer to accomplish this task. It is interesting to observe that while creating the schema, VS.NET automatically generates the 1: Many relationship between the Customer and Accounts elements. To establish the relationship, it also creates an auto-numbered primary key column (Customer_Id) in the Customer DataTable. Simultaneously, it inserts the appropriate values of the foreign keys in the Account DataTable. The DataSet view of the generated schema is shown in Figure 8.31.
In order to provide the relational view of our XML document (Bank1.xml), VS.NET included the Customer_Id attribute in both the Customer and Account elements in its generated schema. It also generated the necessary schema entries to describe the implied relationship between the Customer and Account elements. Figure 8.32 shows an excerpt of the generated schema for our XML file. The complete schema is available in a file named Bank1.xsd on the accompanying CD.
In the above fragment of the generated schema, the xsd:unique element specifies the Customer_Id attribute as the primary key of the Customer element. Subsequently, the xsd:keyref element specifies the Customer_Id attribute as the foreign key of the Account element. XPath expressions have been used to achieve the afore-mentioned objectives.
The complete listing of the application is shown in Figure 8.33. It is also available in the xmlDataDocDataSet2.aspx file in the accompanying CD. The code is pretty straightforward. We have loaded two data grids from two DataTables of the DataSet, associated with the XmlDataDocument object.
In this example, we have illustrated how an XmlDataDocument object maps nested XML elements into multiple DataTables. Typically, an element is mapped to a table if it contains other elements. Otherwise, it is mapped to a column. Attributes are mapped to columns. For nested elements, the system creates the relationship automatically.
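The mapping rules above can be sketched outside of .NET as well. The following is a minimal illustration in Python's standard library, not the XmlDataDocument implementation: the sample XML is a hypothetical stand-in for Bank1.xml (its element names are assumptions), and the function `to_tables` is our own name. It shows leaf elements becoming columns, nested elements becoming rows of a second table, and an auto-numbered Customer_Id tying the two together.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Bank1.xml; the real file's element names may differ.
BANK_XML = """<Bank>
  <Customer>
    <Name>Ann</Name>
    <Account><AccountNo>A1</AccountNo><Balance>100</Balance></Account>
    <Account><AccountNo>A2</AccountNo><Balance>250</Balance></Account>
  </Customer>
  <Customer>
    <Name>Bob</Name>
    <Account><AccountNo>B1</AccountNo><Balance>75</Balance></Account>
  </Customer>
</Bank>"""


def to_tables(xml_text):
    """Split nested Customer/Account elements into two lists of row dicts."""
    root = ET.fromstring(xml_text)
    customers, accounts = [], []
    for customer_id, cust in enumerate(root.findall("Customer")):
        row = {"Customer_Id": customer_id}           # auto-numbered primary key
        for child in cust:
            if len(child) == 0:                      # leaf element -> column
                row[child.tag] = child.text
            else:                                    # nested element -> child table row
                acct = {"Customer_Id": customer_id}  # foreign key to the parent row
                acct.update(child.attrib)            # attributes -> columns
                for field in child:
                    acct[field.tag] = field.text
                accounts.append(acct)
        customers.append(row)
    return customers, accounts


customers, accounts = to_tables(BANK_XML)
for acct in accounts:
    print(acct)
```

Each account row carries the Customer_Id of its parent, which is the same role the generated key column plays in the DataSet view.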
The XmlDocument and the XmlDataDocument have certain limitations. First of all, the entire document needs to be loaded into the cache. Navigating the DOM tree is often clumsy, and navigating the relational views of the data tables may not be very convenient either. To alleviate these problems, XML.NET provides the XPathDocument and XPathNavigator classes. These classes implement the W3C XPath 1.0 Recommendation (www.w3.org/TR/xpath).
The XPathDocument class enables you to process the XML data without loading the entire DOM tree. An XPathNavigator object can be used to operate on the data of an XPathDocument. It can also be used to operate on XmlDocument and XmlDataDocument. It supports navigation techniques for selecting nodes, iterating over the selected nodes, and working with these nodes in diverse ways for copying, moving, and removal purposes. It uses XPath expressions to accomplish these tasks.
The W3C XPath 1.0 specification outlines the query syntax for retrieving data from an XML document. The motivation of the framework is similar to SQL; however, the syntax is significantly different. At first sight, the XPath query syntax may appear very complex. But with a certain amount of practice, you may find it very concise and effective in extracting XML data. The details of the XPath specification are beyond the scope of this chapter. However, we will illustrate several frequently used XPath query expressions. In our exercises, we will illustrate two alternative ways to construct the expressions. The first alternative follows the recent XPath 1.0 syntax. The second alternative follows XSL Patterns, which is a precursor to XPath 1.0. Let us consider the following XML document named Bank2.xml. The Bank2.xml document is shown in Figure 8.34, and it is also available in the accompanying CD. It contains data about various accounts. We will use this XML document to illustrate our XPath queries.
The first expression can be read as “Give me all Name nodes that are descendants of the current node.” The second expression can be read as “Give me the Name nodes of the Account nodes of the Bank node.” Both of these expressions will return the same node set.
Which of the alternative expressions would you use? That depends on your personal taste and on the structure of the XML document. The second alternative appears to be easier than the first one. However, in the case of a highly nested document, the first alternative will offer more compact expressions. Regardless of the syntax used, please be aware that each of the above queries will return a set of nodes. In our ASP.NET code, we will have to extract the desired information from these sets using an XPathNodeIterator.
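To make the idea of "two expressions, one node set" concrete, here is a small sketch in Python's standard library (not the .NET classes discussed in this chapter). The XML is a hypothetical two-account stand-in for Bank2.xml, and ElementTree supports only a subset of XPath, so the descendant search is written as `.//Name` rather than with an explicit axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of Bank2.xml; element names are assumptions.
BANK_XML = """<Bank>
  <Account><Name>Ann</Name><State>OH</State></Account>
  <Account><Name>Bob</Name><State>MI</State></Account>
</Bank>"""

root = ET.fromstring(BANK_XML)                # root is the Bank element

by_descendant = root.findall(".//Name")       # any Name anywhere below Bank
by_path = root.findall("Account/Name")        # Name children of Account children

# Both queries return a collection of nodes; we still have to walk
# the collection to get at the text inside each node.
names_a = [node.text for node in by_descendant]
names_b = [node.text for node in by_path]
print(names_a, names_b)
```

The descendant form does not need to spell out the intermediate levels, which is why it stays short as the nesting gets deeper.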
Okay, now that we have traveled through the XPath waters, we are ready to venture into the usages of the XPathDocument. In this context, we will provide two examples. The first example will extract the names of the customers from Ohio and load a list box. The second example will illustrate how to find a specific piece of data from an XPathDocument.
In this section we will use the XPathDocument and XPathNavigator objects to load a list box from our Bank2.xml file (as shown in Figure 8.34). We will load a list box with the names of customers who are from Ohio. The output of this application is shown in Figure 8.35. The complete code for this application is shown in Figure 8.36. The code is also available in the XPathDoc1.aspx file in the accompanying CD.
At this stage, we need two more objects: an XPathNavigator for retrieving the desired node-set, and an XPathNodeIterator for iterating through the members of the node-set. These are defined as follows:
The Bank/Account[child::State='OH']/Name search expression returns the Name nodes of those Account nodes whose State is “OH.” To get the value inside a particular Name node, we need to use the Current.Value property of the iterator object. Thus, the following code loads our list box:
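The select-then-iterate pattern described above can be sketched as follows. This is a Python standard-library illustration rather than the chapter's .NET listing: the sample data is hypothetical, `ohio_customer_names` is our own helper name, and a plain list stands in for the list box. Because ElementTree evaluates paths relative to the root element (Bank), the path starts at Account.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of Bank2.xml; element names are assumptions.
BANK_XML = """<Bank>
  <Account><Name>Ann</Name><State>OH</State></Account>
  <Account><Name>Bob</Name><State>MI</State></Account>
  <Account><Name>Cy</Name><State>OH</State></Account>
</Bank>"""


def ohio_customer_names(xml_text):
    """Return the Name values of accounts whose State element is 'OH'."""
    root = ET.fromstring(xml_text)
    names = []
    # iterfind yields matching nodes one at a time, much like an iterator
    # over a node set; node.text plays the role of the current node's value.
    for node in root.iterfind("Account[State='OH']/Name"):
        names.append(node.text)
    return names


list_box_items = ohio_customer_names(BANK_XML)
print(list_box_items)
```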
This section will illustrate how to search an XPathDocument using a value of an attribute, and using a value of an element. We will use the Bank3.xml to illustrate these. A partial listing of the Bank3.xml is shown in Figure 8.37. The complete code is available in the accompanying CD.
The Account element of the above XML document contains an attribute named AccountNo, and three other elements. In this example, we will first load two combo boxes, one with the account numbers, and the other with the account holder’s names. The user will select an account number and/or a name. On the click event of the command buttons, we will display the balances in the appropriate text boxes. The output of the application is shown in Figure 8.38. The application has been developed in an .aspx file named XpathDoc2.aspx. Its complete listing is shown in Figure 8.39. The code is also available in the accompanying CD.
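The two kinds of lookup, by attribute value and by element value, differ only in the predicate. A minimal sketch in Python's standard library follows; it is not the XpathDoc2.aspx code. The sample data is a hypothetical stand-in for Bank3.xml (the Balance element name is an assumption), and both function names are ours.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of Bank3.xml: AccountNo is an attribute,
# and Name, Balance, and State are child elements.
BANK_XML = """<Bank>
  <Account AccountNo="A1"><Name>Ann</Name><Balance>100</Balance><State>OH</State></Account>
  <Account AccountNo="B1"><Name>Bob</Name><Balance>75</Balance><State>MI</State></Account>
</Bank>"""

root = ET.fromstring(BANK_XML)


def balance_by_account_no(root, account_no):
    """Search on an attribute value; attribute names take the @ prefix."""
    # Note: the value is pasted into the expression, so this sketch
    # assumes it contains no quote characters.
    acct = root.find(f"Account[@AccountNo='{account_no}']")
    return acct.findtext("Balance") if acct is not None else None


def balance_by_name(root, name):
    """Search on the text of a child element."""
    acct = root.find(f"Account[Name='{name}']")
    return acct.findtext("Balance") if acct is not None else None


print(balance_by_account_no(root, "B1"), balance_by_name(root, "Ann"))
```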
Extensible Stylesheet Language Transformations (XSLT) is the transformation component of the XSL specification by W3C (www.w3.org/Style/XSL). It is essentially a template-based declarative language, which can be used to transform an XML document to another XML document or to documents of other types (e.g., HTML and text). We can develop and apply various XSLT templates to select, filter, and process various parts of an XML document. In .NET, we can use the Transform() method of the XslTransform class to transform an XML document.
Internet Explorer (5.5 and above) has a built-in XSL transformer that automatically transforms an XML document to an HTML document. When we open an XML document in IE, it displays the data using a collapsible list view. However, Internet Explorer cannot be used to transform an XML document to another XML document. Now, why would we need to transform an XML document to another XML document? Well, suppose that we have a very large document that contains our entire catalog’s data. We want to create another XML document from it, which will contain only the product IDs and product names of those products that belong to the “Fishing” category. We would also like to sort the elements in ascending order of unit price. Further, we may want to add a new element to each product, such as “Expensive” or “Cheap,” depending on the price of the product. To solve this particular problem, we may either write the relevant code in a programming language like C#, or we may use XSLT to accomplish the job. XSLT is a much more convenient way to develop the application, because XSLT has been developed exclusively for this kind of scenario.
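For comparison, here is roughly what the programming-language route looks like for the catalog scenario just described. This is a sketch in Python's standard library, not an XSLT style sheet; the element names, the PriceLevel tag, and the price threshold of 50 are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; the element names are assumptions.
CATALOG_XML = """<Catalog>
  <Product><ProductId>F10</ProductId><ProductName>Rod</ProductName>
    <Category>Fishing</Category><UnitPrice>120.00</UnitPrice></Product>
  <Product><ProductId>C30</ProductId><ProductName>Tent</ProductName>
    <Category>Camping</Category><UnitPrice>200.00</UnitPrice></Product>
  <Product><ProductId>F20</ProductId><ProductName>Hook</ProductName>
    <Category>Fishing</Category><UnitPrice>2.50</UnitPrice></Product>
</Catalog>"""


def fishing_summary(xml_text, expensive_from=50.0):
    """Build a new XML tree holding only the Fishing products."""
    source = ET.fromstring(xml_text)
    # Filter: keep only products in the Fishing category.
    fishing = [p for p in source.findall("Product")
               if p.findtext("Category") == "Fishing"]
    # Sort: ascending order of unit price.
    fishing.sort(key=lambda p: float(p.findtext("UnitPrice")))
    target = ET.Element("FishingProducts")
    for p in fishing:
        out = ET.SubElement(target, "Product")
        ET.SubElement(out, "ProductId").text = p.findtext("ProductId")
        ET.SubElement(out, "ProductName").text = p.findtext("ProductName")
        # Annotate: add a new element derived from the price.
        price = float(p.findtext("UnitPrice"))
        level = "Expensive" if price >= expensive_from else "Cheap"
        ET.SubElement(out, "PriceLevel").text = level
    return target


result = fishing_summary(CATALOG_XML)
print(ET.tostring(result, encoding="unicode"))
```

Every step (filter, sort, copy, annotate) has to be spelled out by hand here, whereas a style sheet states the same rules declaratively.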
Before we can transform a document, we need to provide the Transformer with the instructions for the desired transformation of the source XML document. These instructions can be coded in XSL. We have illustrated this process in Figure 8.40.
In this section, we will demonstrate certain selected features of XSLT through some examples. The first example will apply XSLT to transform an XML document to an HTML document. We know that IE can automatically transform an XML document to an HTML document and display it on the screen in a collapsible list view. However, in this particular example, we do not want to display all of our data in that fashion. We want to display the filtered data in tabular fashion. Thus, we will transform the XML document to an HTML document of our choice (and not of IE’s choice). The transformation process will select and filter some XML data to form an HTML table. The second example will transform an XML document to another XML document and subsequently write the resulting document to a disk file, as well as display it in the browser.
In this example, we will apply XSLT to extract the account’s information for Ohio customers from the Bank3.xml (as shown in Figure 8.37) document. The extracted data will be finally displayed in an HTML table. The output of the application is shown in Figure 8.41.
To use XSLT, we must first develop the XSLT style sheet (i.e., the XSLT instructions). We have saved our style sheet in a file named XSLT1.xsl. In this style sheet, we have defined a template as <xsl:template match="/"> … </xsl:template>. The match="/" will result in the selection of nodes at the root of the XML document. Inside the body of this template, we have first included the necessary HTML elements for the desired output.
The <xsl:for-each select="Bank/Account[State='OH']"> tag is used to select all Account nodes for those customers who are from “OH.” The value of a node can be shown using <xsl:value-of select="attribute or element name"/>. In the case of an attribute, its name must be prefixed with an @ symbol. For example, we are displaying the value of the State node as <xsl:value-of select="State"/>. The complete listing of the XSLT1.xsl file is shown in Figure 8.42. The code is also available on the accompanying CD. In the .aspx file, we have included the following asp:xml control.
While defining this control, we have set its DocumentSource attribute to “Bank3.xml”, and its TransformSource attribute to XSLT1.xsl. The complete code for the .aspx file, named XSLT1.aspx, is shown in Figure 8.43. It is also available in the accompanying CD.
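For readers who want to see what the style sheet's logic amounts to, the following sketch reproduces the same select-and-emit steps procedurally. It is written with Python's standard library, which has no XSLT processor, so it is only an analogue of XSLT1.xsl and not a transformation engine. The sample data is a hypothetical stand-in for Bank3.xml, the Balance element name is an assumption, and `ohio_accounts_table` is our own name.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical miniature of Bank3.xml.
BANK_XML = """<Bank>
  <Account AccountNo="A1"><Name>Ann</Name><Balance>100</Balance><State>OH</State></Account>
  <Account AccountNo="B1"><Name>Bob</Name><Balance>75</Balance><State>MI</State></Account>
</Bank>"""


def ohio_accounts_table(xml_text):
    """Render the Ohio accounts as a bare HTML table."""
    root = ET.fromstring(xml_text)
    rows = []
    # The loop plays the role of xsl:for-each over Account[State='OH'].
    for acct in root.findall("Account[State='OH']"):
        cells = [
            acct.get("AccountNo", ""),      # attribute, like select="@AccountNo"
            acct.findtext("Name", ""),      # element, like select="Name"
            acct.findtext("State", ""),
            acct.findtext("Balance", ""),
        ]
        tds = "".join(f"<td>{html.escape(c)}</td>" for c in cells)
        rows.append(f"<tr>{tds}</tr>")
    return "<table>" + "".join(rows) + "</table>"


table_html = ohio_accounts_table(BANK_XML)
print(table_html)
```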
Suppose that our company has received an order from a customer in XML format. The XML file, named OrderA.xml, is shown in Figure 8.44. The file is also available in the accompanying CD.
Now we want to transmit a purchase order to our supplier to fulfill the previous order. Suppose that the XML format of our purchase order is different from that of our client as shown in Figure 8.45. The OrderB.xml file is also available in the accompanying CD.
We have developed an XSLT file to achieve the necessary transformation. In the XSLT code, we have used multiple templates. The complete listing of the XSLT code is shown in Figure 8.48. The code is also available in the order.xsl file on the accompanying CD.
Subsequently, we have developed the XSLT2.aspx file, which uses the XSLT code in the order.xsl file to transform OrderA.xml to OrderB.xml. The complete listing of the .aspx file is shown in Figure 8.49. This code is also available on the accompanying CD. The transformation is performed in the ShowTransformed() subprocedure of our .aspx file. In this code, the Transform method of an XslTransform object is used to transform and generate the target XML file.
Databases are used to store and manage an organization’s data. However, it is not a simple task to transfer data from a database to a remote client or to a business partner, especially when we do not know exactly how the client will use the data. One solution is to send the required data as XML documents. That way, the data container is independent of the client’s platform. Databases and other related data stores are here to stay, and XML will not replace them. However, XML provides a common medium for exchanging data among sources and destinations, and it allows different software packages to exchange data among themselves. In this context, XML forms a bridge between ADO.NET and other applications. Since XML is integrated in the .NET Framework, transferring data using XML is a lot easier than it is in other software development environments. The ADO.NET Framework is essentially based on DataSets, which in turn rely heavily on the XML architecture. The DataSet class has a rich collection of methods for processing XML. Some of the widely used ones are ReadXml, WriteXml, GetXml, GetXmlSchema, InferXmlSchema, ReadXmlSchema, and WriteXmlSchema.
In this context, we will provide two simple examples. In the first example, we will create a DataSet from a SQL query and write its contents as an XML document. In the second example, we will read back the XML document generated in the first example and load a DataSet. What are the prospective uses of these examples? Well, suppose that we need to send data about our fishing products to a client. In earlier days, we would have sent the data as a text file. But in the .NET environment, we can instead produce an XML document very quickly by running a query, and subsequently send the XML document to our client. What is the advantage? It is fast, easy, self-describing, and technology independent. The client may use any technology (like VB, Java, Oracle, etc.) to parse the XML document and subsequently develop applications. On the other hand, if we receive an XML document from our partners, we may as well apply XML.NET to develop our own applications.
In this section, we will populate a DataSet with the results of a query to the Products table of the SQL Server 7.0 Northwind database. On the click event of a command button, we will write the XML file and its schema. The output of the example is shown in Figure 8.50. We have developed the application in an .aspx file named DataSet1.aspx. The complete listing of the .aspx file is shown in Figure 8.51. The file is also available on the accompanying CD.
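The idea of turning query results into an XML document can be sketched independently of ADO.NET. The following Python standard-library example is an illustration only, not the DataSet1.aspx code: the rows are hard-coded in place of a database query, the root and row tag names are hypothetical defaults, and no schema is written.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag="NewDataSet", row_tag="Products"):
    """Serialize a list of row dicts as one XML element per row."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Each column becomes a child element holding the value as text.
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Stand-in for the result of a query against a products table.
rows = [
    {"ProductID": 1, "ProductName": "Chai"},
    {"ProductID": 2, "ProductName": "Chang"},
]
xml_text = rows_to_xml(rows)
print(xml_text)
```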
Here, we will read back the XML file created in the previous example (as shown in Figure 8.50) and populate a DataSet in the Page_Load event of our .aspx file. We will use the ReadXml method of the DataSet object to accomplish this objective. The output of the application is shown in Figure 8.52. The application has been developed in an .aspx file named DataSet2.aspx. The complete code for this application is shown in Figure 8.53. The code is also available in the accompanying CD. The code is self-explanatory.
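Reading such a document back into rows is the mirror image of writing it. The sketch below uses Python's standard library and a hypothetical inline document; it is not the DataSet2.aspx code, and `xml_to_rows` is our own helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in a one-element-per-row layout.
PRODUCTS_XML = """<NewDataSet>
  <Products><ProductID>1</ProductID><ProductName>Chai</ProductName></Products>
  <Products><ProductID>2</ProductID><ProductName>Chang</ProductName></Products>
</NewDataSet>"""


def xml_to_rows(xml_text):
    """Turn each child of the root into a dict of column name -> text."""
    root = ET.fromstring(xml_text)
    return [{column.tag: column.text for column in record} for record in root]


rows = xml_to_rows(PRODUCTS_XML)
for row in rows:
    print(row)
```

In this sketch every value comes back as a string, because the XML text alone carries no type information. Recovering numeric or date types requires a description of the columns, which is the job a schema performs.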
In this chapter, we have introduced the basic concepts of XML, and we have provided a concise overview of the .NET classes available to read, store, and manipulate XML documents. The examples presented in this chapter also serve as good models for developing business applications using XML and ASP.NET.
The .NET System.Xml namespace contains probably the richest collection of XML-related classes available thus far on any software development platform. The System.Xml namespace has been further enriched by the recent addition of the XPathDocument and XPathNavigator classes. We have tried to highlight these new features in our examples. Since XML can be enhanced using a family of technologies, there are innumerable techniques a reader should judiciously learn from other sources to design, develop, and implement complex real-world applications.
XML cannot be singled out as a stand-alone technology. It is actually a framework for exchanging data. It is supported by a family of growing technologies such as XML parsers, XSLT transformers, XPath, XLink, and Schema Generators.
The major methods and properties of the XmlTextWriter class include Close, Flush, Formatting, WriteAttributes, WriteAttributeString, WriteComment, WriteElementString, WriteEndAttribute, WriteEndDocument, WriteState, and WriteStartDocument.
Frequently Asked Questions
The following Frequently Asked Questions, answered by the authors of this book, are designed to both measure your understanding of the concepts presented in this chapter and to assist you with real-life implementation of these concepts. To have your questions about this chapter answered by the author, browse to www.syngress.com/solutions and click on the “Ask the Author” form.
A: DOM Level 2 became an official World Wide Web Consortium (W3C) recommendation in late November 2000. Although there is not much difference between the specifications, one of the major additions was support for XML namespaces, which was unavailable in the prior version. DOM Level 1 did not support namespaces. Thus, it was the responsibility of the application programmer to determine the significance of special prefixed tag names. DOM Level 2 supports namespaces by providing new namespace-aware versions of Level 1 methods.
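The difference between the two styles of lookup can be seen in any DOM implementation that offers both. The sketch below uses Python's xml.dom.minidom rather than the .NET classes; the document and the two namespace URIs are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical document: two address elements that differ only by namespace.
DOC = """<order xmlns:a="urn:example:billing" xmlns:b="urn:example:shipping">
  <a:address>1 Main St</a:address>
  <b:address>9 Dock Rd</b:address>
</order>"""

dom = minidom.parseString(DOC)

# Level 1 style: the prefixed tag name is just a string. The program must
# know that the prefix "a" happens to mean billing in this document.
level1 = dom.getElementsByTagName("a:address")
billing = level1[0].firstChild.data

# Level 2 style: the lookup uses the namespace URI and the local name,
# so it keeps working if the document author picks a different prefix.
level2 = dom.getElementsByTagNameNS("urn:example:shipping", "address")
shipping = level2[0].firstChild.data

print(billing, "|", shipping)
```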
A: The most significant change in the Beta 2 edition was the restructuring of the XmlNavigator class. XmlNavigator was initially designed as an alternative to the general implementation of DOM. Since Microsoft felt that there was a mismatch between the XPath data model and the DOM-based data model, XmlNavigator was redesigned as XPathNavigator, employing a read-only mechanism. It was designed to be used with XPathNodeIterator, which acts as an iterator over a node set and can be created many times per XPathNavigator.
Alternatively, one can use the DOM implementation, XmlNode, whose methods such as SelectNodes() and SelectSingleNode() can be used to retrieve and iterate through a node set. A typical code fragment would look like this:
Although XPathNavigator provides only read-only access to XML documents, other classes such as XmlTextWriter can be used together with XPathNavigator to write a document.
A: XSL Patterns are the predecessor of XPath 1.0, which has been recognized as a universal specification. Although the two are similar in syntax, there are some differences between them. The XSL Pattern language does not support the notion of axis types, whereas XPath does. Axis types are general syntax used in XPath, such as descendant, parent, child, and so on. Assume that we have an XML document with the root node named Bank. Further, assume that the Bank element contains many Account elements, which in turn contain account number, name, balance, and state elements. Now, suppose that our objective is to retrieve the Account data for those customers who are from Ohio. We can accomplish the search by using any one of the following alternatives:
Which of the above alternatives would you use? That depends on your personal taste and on the structure of the XML document. In the case of a very highly nested XML document, the XPath syntax offers a more compact search string.