XML is one of the key technologies that is driving Enterprise Web development today.
XML promises a standard data format that can be shared easily by different organizations. In this installment of a two-part series, Rick reviews XML's key features and problems as a data representation format for relational data and objects. He also introduces some free tools that provide easy translation between XML and traditional data structures, and shows how to use them with quick examples of sharing data over the Web. In the next issue Rick delves into practical examples of how to implement flexible solutions that use these XML tools.
XML is shaping up as a key technology in distributed applications. You couldn't have missed the hype in the trade papers. But unlike other over-hyped technologies, XML has been rapidly accepted because it solves very specific technical problems by using a standard protocol/data representation. The simple act of agreeing on a common data format is making data exchange drastically easier than it ever was before.
In this article I want to discuss XML from the perspective of a messaging and data representation mechanism in distributed applications. For these applications XML is the ideal transport mechanism as it can open up applications to all sorts of clients including Fat Client, Thin Client, Browser based and even Palm Top applications. I'll introduce some concepts for generically persisting data and objects to XML. These make it possible to build flexible solutions that work well in standalone applications as well as in distributed applications that share data over the Web: you build smart components and objects, and use XML to represent their data as it travels over the Web.
In the process I'll also introduce some free Visual FoxPro and COM tools that help you persist your objects and data into XML quickly and efficiently with a few lines of code. I'll also introduce the client side tools you can use to consume the XML data transparently in a browser and in a Visual FoxPro Fat Client application. But before we dig into code, let's review the premise of XML as it relates to a message mechanism used to represent and transfer data.
XML as a Data Representation and Messaging Standard
The full XML spec can look rather complex to the first time user because there are so many related technologies such as XSL, schemas and data definitions. Those offshoots provide powerful functionality to XML, but they don't exactly make learning it easy. However, at the core XML is a straightforward standard that's easy to understand and work with thanks to a standard XML parser object model which is consistent on different development platforms (client/server, Thin Client/Fat Client, Windows/Unix/Mac).
Even in its simplest form XML as a standard has lots of potential as a data representation and messaging mechanism. Data representation typically involves translating the data from a native format into XML, and then back into the same format or even a completely different one on the other end of a connection. A good example may be a Visual FoxPro/VB application publishing some of its data via XML and another application, possibly a Java applet running on Unix, picking up that XML data and using it internally. Consider for a second that this data is dynamically generated on a Web server, with the Java applet asking for specific data and receiving the result back as XML.
XML as a standard data exchange format
As you might expect, this is particularly useful in Web applications where XML can provide a standard, agreed upon format to transfer data over the Internet. In this respect XML is similar to a data format such as a Comma Delimited or SDF file in the past. However, as a data representation format XML is also much more flexible than these old formats, because it can carry just about any kind of data. Most other text formats have been hampered by their limited ability to transport complex data like memo fields or binary data. XML is very flexible in what kind of data it can carry because of its tag based language definition where every XML data element is marked with a start and end tag.
Support for multiple sub-documents
Another huge benefit of XML as a data representation mechanism is that XML can combine multiple pieces of data into a single document. The markup language has support for stacked and hierarchical data representation. XML documents can combine several separate entities (be it tables, objects, messages or metadata) into a single XML document. For example, you can send the actual data of say a table, as well as a message header that describes the data or maybe contains any error conditions that might have occurred in obtaining that data. You could also combine multiple tables (as an example) into a single document. Or a table and an object both parsed into XML.
A stacked document may look like this:
<?xml version="1.0"?>
<docroot>
<errors>
<error errorno="1">No PO data to retrieve</error>
<error errorno="2">Invoice rebalancing failed</error>
</errors>
<jobinfo>
<action>processcustomers</action>
<action>processinbox</action>
</jobinfo>
<customers>
<customer>
<lastname>Strahl</lastname>
<firstname>Rick</firstname>
</customer>
<customer>
<lastname>Shizm</lastname>
<firstname>Frank</firstname>
</customer>
</customers>
<invoices>
... Invoice data here
</invoices>
</docroot>
There are multiple data representations in this single document such as customers, invoices and even the jobinfo and errors XML fragments. Note that you can only have a single root element (<docroot> in this case), but you can nest multiple items on the second level and down. The XML fragments may be totally unrelated to each other or they may all be related - it's entirely up to your implementation.
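As a rough illustration of how a receiver might pull the separate fragments out of such a stacked document, here is a minimal sketch in Python using the standard library's ElementTree parser. Python is used here purely for illustration; it is not one of the Visual FoxPro tools discussed in this article, and the function name split_stacked is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the stacked document shown above.
STACKED = """<?xml version="1.0"?>
<docroot>
  <errors>
    <error errorno="1">No PO data to retrieve</error>
    <error errorno="2">Invoice rebalancing failed</error>
  </errors>
  <jobinfo>
    <action>processcustomers</action>
    <action>processinbox</action>
  </jobinfo>
  <customers>
    <customer><lastname>Strahl</lastname><firstname>Rick</firstname></customer>
    <customer><lastname>Shizm</lastname><firstname>Frank</firstname></customer>
  </customers>
</docroot>"""


def split_stacked(xml_text):
    """Return (errors, actions, customers) from a stacked document."""
    root = ET.fromstring(xml_text)
    # Each second-level fragment is read independently of the others.
    errors = {int(e.get("errorno")): e.text for e in root.findall("errors/error")}
    actions = [a.text for a in root.findall("jobinfo/action")]
    customers = [
        {field.tag: field.text for field in customer}
        for customer in root.findall("customers/customer")
    ]
    return errors, actions, customers


errors, actions, customers = split_stacked(STACKED)
```

Because each fragment is looked up by name, a receiver that only cares about the errors section can ignore everything else in the document.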
XML can also represent hierarchical data. Hierarchical data is extremely useful for packaging related data in a logical fashion that is easy to read and group without relational concepts. Instead of representing say an invoice as a set of related tables, you can actually have an invoice XML fragment, which nests inside of it the invoice header, the customer information, and a set of lineitems:
<?xml version="1.0"?>
<invoices>
<invoice>
<invno>IN_122121</invno>
<date>01/01/99</date>
<terms>Net 30</terms>
<total>123.12</total>
<customer>
<lastname>Strahl</lastname>
<firstname>Rick</firstname>
<address>32 Kaiea</address>
...
</customer>
<lineitems>
<item>
<sku>LABOR_PRG</sku>
<descript>Programming Labor</descript>
<qty>5.5</qty>
<price>150.00</price>
</item>
<item>
<sku>LABOR_PRG</sku>
<descript>Programming Labor</descript>
<qty>3.5</qty>
<price>200.00</price>
</item>
</lineitems>
</invoice>
<invoice>
<invno>IN_122122</invno>
...
</invoice>
</invoices>
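To show how the nesting can be walked on the receiving end, here is a small Python sketch (standard library only, used for illustration and not part of the tools covered in this article). It recomputes each invoice's total from its line items; the function name invoice_totals is hypothetical, and the <total> element of the sample is deliberately not used.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# A shortened copy of the hierarchical invoice document shown above.
INVOICES = """<?xml version="1.0"?>
<invoices>
  <invoice>
    <invno>IN_122121</invno>
    <customer><lastname>Strahl</lastname><firstname>Rick</firstname></customer>
    <lineitems>
      <item><sku>LABOR_PRG</sku><qty>5.5</qty><price>150.00</price></item>
      <item><sku>LABOR_PRG</sku><qty>3.5</qty><price>200.00</price></item>
    </lineitems>
  </invoice>
</invoices>"""


def invoice_totals(xml_text):
    """Map each invoice number to the sum of qty * price over its line items."""
    totals = {}
    for invoice in ET.fromstring(xml_text).findall("invoice"):
        total = sum(
            (Decimal(item.findtext("qty")) * Decimal(item.findtext("price"))
             for item in invoice.findall("lineitems/item")),
            Decimal("0"),
        )
        totals[invoice.findtext("invno")] = total
    return totals


totals = invoice_totals(INVOICES)
```

The header, customer and line items travel together inside one <invoice> element, so no key lookups across separate tables are needed to compute the result.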
XML provides flexibility and the ability to do things in a single pass that may otherwise require multiple passes. With XML I'm able to send multiple files at once where with an encoded file I'd have to make multiple requests to the server. Also, what happens if there's a problem? With file encoding there's no standard form to report errors. With XML, errors are immediately obvious from an error header that is part of the document. You wouldn't have to explain the error format to anybody. One look at the result XML would tell the story.
All of this provides for a lot of flexibility in how the data is packaged when using XML for messaging between applications either locally or remotely over the Internet. The cool thing is that you as the developer can determine the level of complexity you want to implement. You can go the simple route and simply dump data into XML or you can build a whole framework that deals with error handling and processing instructions implemented through special XML fragments that are part of the complete XML document as I hinted at in the first XML example. You can use the parser and build complex hierarchical objects, or you can use plain application development code to generate the XML as a string yourself, or you can use the wwXML generating class I'll describe shortly.
Version Independence
If you use XML you're also not relying on a specific binary mechanism like COM, which requires a binary contract between the client and server. You're not bound to a property interface with XML - the data structures themselves can change by way of the XML structure without affecting the binary data representation. The XML may map to a binary object eventually, but there's an intermediary layer that pulls the data and knows what to do with it.
This makes for easier maintenance as you're not relying on binary binding and recompiling your code to make changes to data structures. This makes it a snap to add new functionality without breaking compatibility with existing clients. Old and new functionality can coexist with the same data structures without breaking either version of the app. Older clients simply don't include the new data and your application can handle this accordingly, while newer clients can use the additional data as needed.
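A minimal Python sketch of this kind of tolerant reading follows (illustrative only, not one of this article's tools). The function read_customer and the <email> and <fax> elements are hypothetical; <email> stands in for a field added in a later version of the data structure.

```python
import xml.etree.ElementTree as ET


def read_customer(xml_text):
    """Read a customer fragment, tolerating missing and unknown elements."""
    node = ET.fromstring(xml_text)
    return {
        "lastname": node.findtext("lastname", default=""),
        "firstname": node.findtext("firstname", default=""),
        # Hypothetical field added in a newer version: older clients
        # simply don't send it, so fall back to None.
        "email": node.findtext("email", default=None),
    }


old_client = "<customer><lastname>Strahl</lastname><firstname>Rick</firstname></customer>"
new_client = (
    "<customer><lastname>Strahl</lastname><firstname>Rick</firstname>"
    "<email>rick@example.com</email><fax>555-0100</fax></customer>"
)

old_record = read_customer(old_client)
new_record = read_customer(new_client)
```

The same reader handles both documents: the missing element gets a default, and the element it doesn't know about (<fax>) is ignored rather than causing a failure.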
Object Persistence
XML can also provide a good way to present persistent data from an abstract element such as an object that exists in memory. XML's hierarchical structure actually makes a very good fit for mapping complex nested objects into XML structures. Typical real world objects tend to be hierarchical and a good programmatic implementation of these objects can closely map those relationships. The Invoice example above is a simple example. You'd have an oInvoice object with sub objects for oCustomer, oLineItems and so on.
These types of objects are great for passing data around locally, but when the time comes to persist this data things get more tricky. XML can provide a great way to take a snapshot of the object and store the contents into any persistable form - a file or a database typically. Or you can send the result over the Web, essentially marshalling an object from a client to the server.
This has a large number of applications: Transferring data over the Web is an obvious choice. For example, a client application can build a purchase order object and send that object persisted as XML to the server. The Server validates the PO and then either saves it into the database (unwinding the hierarchical relationship into a relational DBMS structure) or sends back a note that says the PO could not be accepted.
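The round trip from object to XML and back can be sketched in a few lines of Python (standard library only, shown for illustration; it is not the Visual FoxPro tooling this series covers). The classes and the functions invoice_to_xml and invoice_from_xml are hypothetical and loosely mirror the oInvoice, oCustomer and oLineItems objects mentioned above.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Customer:
    lastname: str
    firstname: str


@dataclass
class LineItem:
    sku: str
    qty: str      # kept as text here; a real mapper would convert types
    price: str


@dataclass
class Invoice:
    invno: str
    customer: Customer
    lineitems: list = field(default_factory=list)


def invoice_to_xml(inv):
    """Persist an Invoice object graph as an XML string."""
    root = ET.Element("invoice")
    ET.SubElement(root, "invno").text = inv.invno
    cust = ET.SubElement(root, "customer")
    ET.SubElement(cust, "lastname").text = inv.customer.lastname
    ET.SubElement(cust, "firstname").text = inv.customer.firstname
    items = ET.SubElement(root, "lineitems")
    for li in inv.lineitems:
        item = ET.SubElement(items, "item")
        ET.SubElement(item, "sku").text = li.sku
        ET.SubElement(item, "qty").text = li.qty
        ET.SubElement(item, "price").text = li.price
    return ET.tostring(root, encoding="unicode")


def invoice_from_xml(xml_text):
    """Rebuild the Invoice object graph from its XML snapshot."""
    root = ET.fromstring(xml_text)
    cust = root.find("customer")
    return Invoice(
        invno=root.findtext("invno"),
        customer=Customer(cust.findtext("lastname"), cust.findtext("firstname")),
        lineitems=[
            LineItem(i.findtext("sku"), i.findtext("qty"), i.findtext("price"))
            for i in root.findall("lineitems/item")
        ],
    )


original = Invoice("IN_122121", Customer("Strahl", "Rick"),
                   [LineItem("LABOR_PRG", "5.5", "150.00")])
restored = invoice_from_xml(invoice_to_xml(original))
```

The XML string in the middle is the snapshot: it can be written to a file, stored in a database field, or posted to a server that rebuilds the object on its side.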
Another extremely useful application of persisted objects in XML is Server Session state. Servers, especially Web servers, should be stateless, but most Web applications need to track state for users using a site. Persisting real binary (COM) objects across multiple requests is problematic in terms of resources, scalability and use across multiple machines. But persisting state as strings in a database table via XML is fairly lightweight and flexible. As long as the data can be written somewhere persistent it can be easily retrieved using XML.
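As a sketch of the idea, the following Python example stores session state as an XML string in a database table, using an in-memory SQLite database for demonstration. The table layout and the functions save_session and load_session are assumptions made for this illustration, not part of any tool described in this article.

```python
import sqlite3
import xml.etree.ElementTree as ET


def save_session(db, session_id, state):
    """Serialize a flat dict of state values to XML and store it as text."""
    root = ET.Element("session")
    for key, value in state.items():   # keys must be valid XML element names
        ET.SubElement(root, key).text = str(value)
    db.execute(
        "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
        (session_id, ET.tostring(root, encoding="unicode")),
    )


def load_session(db, session_id):
    """Load the state for a session; values come back as strings."""
    row = db.execute(
        "SELECT state FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    if row is None:
        return {}
    return {child.tag: child.text for child in ET.fromstring(row[0])}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")

save_session(db, "abc123", {"userid": "42", "lastsku": "LABOR_PRG"})
state = load_session(db, "abc123")
```

Since the state lives in a table rather than in a live object, any server process or machine that can reach the database can pick up the session on the next request.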
For example, you could have a COM object that accepts an XML input parameter and generates XML output. By doing so, you can use that same object on a Web backend to process incoming Web requests directly over the Web using exactly the same code base. This is one of the biggest points for XML in my experience - the ability to apply logic directly without change in multiple layers of applications, just by using a common XML interface to pass messages around. Once such a mechanism is in place it doesn't matter much whether you use COM, HTTP or any other client to access your code, either directly or indirectly over the Internet.
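The XML-in, XML-out pattern can be sketched as a single function, here in Python for illustration rather than as a COM object. The function process_request, the <request>, <action> and <payload> elements and the "echo" action are all hypothetical; the error section follows the shape of the stacked document shown earlier.

```python
import xml.etree.ElementTree as ET


def process_request(request_xml):
    """Accept a request as an XML string and always answer with XML."""
    response = ET.Element("docroot")
    errors = ET.SubElement(response, "errors")
    try:
        request = ET.fromstring(request_xml)
        action = request.findtext("action")
        if action == "echo":
            payload = request.findtext("payload", default="")
            ET.SubElement(response, "result").text = payload
        else:
            err = ET.SubElement(errors, "error", errorno="1")
            err.text = f"Unknown action: {action}"
    except ET.ParseError as exc:
        # Even malformed input gets a well-formed XML error response.
        err = ET.SubElement(errors, "error", errorno="2")
        err.text = f"Invalid XML: {exc}"
    return ET.tostring(response, encoding="unicode")


ok_xml = process_request(
    "<request><action>echo</action><payload>hi</payload></request>"
)
bad_xml = process_request("<request>")
```

Because the function only deals in strings, the same code can be called in-process, wrapped in a component, or put behind a Web request handler without changes.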
Some Problems With XML
But XML also has a few downsides that you need to be aware of:
- XML conversion can be very resource intensive, especially with large data sets.
- The current XML parsers are fairly slow, especially COM based access in high level languages such as Visual FoxPro, VB and VBScript.
- XML documents can bloat data considerably with all of the markup tags. When sending data over the wire this can be a problem if data is not compressed.
- XML has issues with binary data.
The downsides are not as frequently discussed or as well known as the pros of XML. In short, XML conversion is very resource intensive both in terms of the CPU resources needed to create and unpack XML into native data and in terms of the actual data size. In high performance applications - especially those that need to provide lots of data in single transactions - this can be a major hurdle for XML implementation.
Even if you do build Internet applications XML may not always be the right choice. For example, if you have a client/server application that uses the Web as the network, XML may seem obviously useful. But if the app is fully under your control on both client and server and you use say Visual FoxPro on both sides, there's little to gain by passing messages back and forth via XML. Why convert a table to XML on the server, only to convert the XML back to a cursor? It may be much more efficient to package up the table in a more native format like in the Distributed HTTP applications article (http://www.west-wind.com/articles.asp) using EncodeDBF/DecodeDBF. The original code for exporting is about twice as fast as XML conversion and a whopping 5 times faster for importing. It's also much faster than marshalling ADO recordsets using RDS. There's a fine line where this rule applies - efficiency over possible functionality.
Size is also an issue for XML. The start and end tags cause significant bloat of the data being sent over the wire. Additionally, binary values like numerics must be converted to human readable strings to be embedded in an XML document. In many cases this string is larger than the original data format. On the other hand, XML can strip out trailing spaces that are embedded in string fields, thus saving some bytes. XML data size can be significantly reduced by using compression of some sort, since much of the data is highly repetitive (start and end tags).
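The size trade-off is easy to measure yourself. The Python sketch below (illustration only) builds the same made-up customer rows as comma delimited text and as XML, then compresses the XML with zlib. The data is artificial and highly repetitive, so it compresses far better than real data would; treat the numbers it prints as a demonstration of the effect, not as a benchmark.

```python
import zlib
import xml.etree.ElementTree as ET

# Artificial sample data: 1000 rows built from two repeating customers.
rows = [("Strahl", "Rick"), ("Shizm", "Frank")] * 500

# The same rows as comma delimited text.
csv_bytes = "\n".join(f"{last},{first}" for last, first in rows).encode("utf-8")

# The same rows as XML, one <customer> element per row.
root = ET.Element("customers")
for last, first in rows:
    customer = ET.SubElement(root, "customer")
    ET.SubElement(customer, "lastname").text = last
    ET.SubElement(customer, "firstname").text = first
xml_bytes = ET.tostring(root, encoding="utf-8")

# Compression removes much of the overhead of the repeated tags.
xml_compressed = zlib.compress(xml_bytes)

print("comma delimited:", len(csv_bytes), "bytes")
print("XML:            ", len(xml_bytes), "bytes")
print("XML compressed: ", len(xml_compressed), "bytes")
```

Running a test like this against a sample of your own data is a quick way to judge whether tag overhead matters for your application and how much compression would buy back.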
But remember, performance isn't everything! XML is a great tool when data needs to be shared with disparate clients, especially if those clients are Internet based. By exposing data outputs and even inputs via XML you can possibly open up an application to other platforms since XML can be used with every one of them.
The point I'm making here is simple: Don't use XML unless you have a good reason to use it. In many full-on, high performance environments you may find that XML will only slow you down without providing any real benefits. If you're in full control of your application and control data flow on both ends, XML may be overkill. At the same time think about the benefits that XML can provide, especially when you want to build open applications that can be accessed from various platforms and dev tools. XML is the best way to do this in a consistent fashion. I'll show you some examples that demonstrate how a framework can take advantage of this, using standard functionality that you'll want to build into any object architecture you might want to open up in this way.
As with most technologies there are serious pros and cons when using XML in your applications. I think the pros can heavily outweigh the cons, especially if applications are designed to bring back small amounts of data at a time so that the XML conversion doesn't impact the process significantly. Take the time to make sure you understand what impact use of XML has on your application, both in terms of functionality that it provides your app and the performance implications it imposes on it.
Next month I'll look at specific examples of how you can utilize XML between client and server applications and I'll introduce a flexible class that can make the XML conversions between native data structures almost transparent. I'll also discuss some strategies for designing smart objects that inherently support XML persistence to make sharing data even easier. Until then, think about the promise of XML and how it can help your applications be more open.