Appeared in Object Magazine, November 1997
XML and OO Applications
Knowledge sharing. Concurrent engineering. Business process reengineering. Agility. These are a few of the business trends that require, or benefit from, collaboration among people and software systems. Such collaboration may involve a simple use of a Web server for exchanging documents between individuals and workgroups, or a more sophisticated knowledge-based system that can serve as a virtual team member. In a future scenario, a distributed network of software agents will collaborate on behalf of their human owners in order to achieve a shared goal, e.g. design a new product.
My theme throughout this column has been to suggest potential synergies between Web and object technologies, and to analyze how they might be applied to a problem like collaboration. Given the accelerating pace of technological change, these observations are always a snapshot in time. However, I try to look for the relatively stable undercurrents that readers can use to influence their next generation systems.
This month I'd like to share some interesting new undercurrents in Web-based metadata. The Extensible Markup Language (XML) is a data interchange language for heterogeneous systems that is especially tuned for fast, on-line systems. XML is much more than a better approach for formatting Web documents; it's a representation language for describing the content and semantics of Web-based resources. Within the document management world, where XML has its origins, the notion of a document is much broader than the typical definition. While a document still contains the familiar sections, headings, and so on, it may also contain embedded structured data, attributes associated with the document elements, and relationships between those elements. All elements may be contained within one file, or split across many files and databases. A document is an aggregate of text and data components, not simply formatted words.
I see great potential for integrating XML documents with object-oriented application programs. First, the structure of XML documents can be very easily parsed into objects that can be programmatically manipulated. Second, XML document objects can be commingled with other application objects to create hybrid Web-Object systems. As a way to introduce these possibilities, I've divided this article into three topics: extensibility, reflection, and semantic models.
Extensibility is the hallmark of object-oriented systems. Using Java as an example, the java.lang package defines the essential foundation classes of the language, e.g. Object, Thread, String, etc. (For readers who don't speak Java, a package is a module of classes that delimits a namespace and provides useful software engineering features for controlling access privileges.) This foundation is then extended with the core packages java.io and java.net that provide input/output and network connectivity, respectively. Other more specialized classes can extend this foundation and core to provide database connectivity, data structure packages, GUI widgets, and so on. You can then buy additional class libraries or write your own application-specific classes that are essential to your Java applet, but are completely transparent to its users. OK, this is basic stuff that we all take for granted, right?
Now consider the Web's Hypertext Markup Language (HTML). As with Java, you do have to pay some attention to version numbers. The JDK version 1.1 adds features not available in JDK 1.0.2, and HTML version 3.2 adds new markup tags not available in HTML 2.0. However, if you extend these HTML tags with application-specific elements, like a <product> tag, then you've violated the standard. Period. No extensibility. That makes it kind of hard to differentiate your company in the marketplace, or to add document content structure for a particular on-line ordering system. Thus, developers are forced to invent convoluted, non-standard solutions for embedding and parsing data, often contained within the HTML comment tag. And if you want to define the rules for how your new tags should be displayed in Netscape's browser, forget it.
Well, it looks like a very interesting solution is just around the corner. The Extensible Markup Language (XML) is making rapid progress through the WWW Consortium's (W3C) standards process (http://www.w3.org/XML). XML has many benefits for folks who want to improve structure, maintainability, searchability, presentation, and other aspects of their document management. But I'm going to slip by the document management features and get into its benefits for application developers. With XML, you can define company-specific or application-specific Document Type Definitions (DTDs) that specify new markup elements, or extend those from other DTDs. The XML committee did not invent this concept; XML is a specialization and simplification of the Standard Generalized Markup Language (SGML).
Even though the XML specification is not finalized, several other standards are being proposed that extend it. The Channel Definition Format (CDF) may define resource content for Internet "push" technology; the Open Software Description Format (OSD) may define software packages and their interdependencies for use in automated software distribution; and proposals for both XML-Data and the Meta Content Framework (MCF) provide a general approach to define metadata about document structures. I'll elaborate on the XML-Data proposal's benefits for enabling reflection within software systems and an improved capability for semantic modeling (http://www.microsoft.com/standards/xml/xmldata.htm).
Reflection allows a software program to report its own class definitions or schema so that other application programs can use objects that are not statically predefined. Or, more simply, you give me an object, and I'll ask that object to describe itself to me. Java 1.1 introduced the java.lang.reflect package that allows any Java program to enumerate the attributes (fields) and methods of any object, and to invoke a method by dynamically constructing a parameter list for an instance of the Method class. Java's reflection capability is one of the principal enablers for introspection in JavaBeans components. In a similar way, CORBA supports reflection through the interface repository and the dynamic invocation interface (DII).
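As a minimal sketch of the Java side of this idea, the example below asks an object to list its fields and then invokes one of its methods from a dynamically built argument list. The `ReflectionSketch` and `Product` classes and their members are hypothetical, invented only for illustration, and the code uses current Java syntax rather than what was available in JDK 1.1.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ReflectionSketch {
    // Hypothetical application class; it stands in for "an object you give me".
    public static class Product {
        public String name = "widget";
        public int quantity = 3;

        public String describe(String prefix) {
            return prefix + name + " x" + quantity;
        }
    }

    // Ask an arbitrary object for the names of the fields its class declares.
    static List<String> fieldNames(Object target) {
        List<String> names = new ArrayList<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            names.add(field.getName());
        }
        return names;
    }

    // Look up a method by name and parameter types, then invoke it
    // with an argument list assembled at run time.
    static Object call(Object target, String methodName,
                       Class<?>[] paramTypes, Object[] args) {
        try {
            Method method = target.getClass().getDeclaredMethod(methodName, paramTypes);
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective call failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Product product = new Product();
        System.out.println(fieldNames(product));
        System.out.println(call(product, "describe",
                new Class<?>[] {String.class}, new Object[] {"Item: "}));
    }
}
```

Nothing in `fieldNames` or `call` refers to `Product` at compile time, which is the point: the caller learns the object's shape by asking it.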
In the XML-Data proposal, a document can embed its schema definition within the same file, or it can refer to a standard schema imported from another URL. Or, it can do both, supplementing a standard schema with locally defined extensions. To make this more concrete, I've modified a simple example from the XML-Data specification and included both the schema definition and a simple instance of the <book> element. However, please remember that this is still a draft specification. The final syntax may change.
<?XML version="1.0" encoding="UTF-8" ?>
<elementType id="BOOK" extends="#Publication">
<elementType id="PROCEEDINGS" extends="#Publication">
<relationType id="EDITOR" extends="#AUTHOR"/>
A simple instantiation of this schema might look like this:
Because an application program can access the schema for any XML element, you can develop a reflection API that, given an XML object, will return its definition. I haven't seen such an API yet, but its design would be relatively straightforward. There are several XML validating parsers available for free download, including one written in Java by Microsoft that includes source code (http://www.microsoft.com/standards/xml/). A validating parser reads the XML DTD and verifies documents that use it. I've successfully parsed the above schema using Microsoft's implementation, and I plan to continue work on this prototype using a more complex schema model.
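One way such a reflection API might look is sketched below. It is only an illustration under several assumptions: it uses the DOM parser that ships with current Java (javax.xml.parsers), not the 1997 Microsoft parser; the `<schema>` wrapper element and the `SchemaLookupSketch` class are hypothetical; and the schema string is a simplified, well-formed stand-in that borrows only the `elementType`, `id`, and `extends` names from the draft example, not the draft's full syntax.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SchemaLookupSketch {
    // Simplified stand-in schema; the <schema> wrapper is an assumption.
    static final String SCHEMA =
        "<schema>"
        + "<elementType id=\"Publication\"/>"
        + "<elementType id=\"BOOK\" extends=\"#Publication\"/>"
        + "<elementType id=\"PROCEEDINGS\" extends=\"#Publication\"/>"
        + "</schema>";

    // Parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    // Index every elementType declaration by its id attribute.
    static Map<String, Element> index(Document schema) {
        Map<String, Element> byId = new HashMap<>();
        NodeList types = schema.getElementsByTagName("elementType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            byId.put(type.getAttribute("id"), type);
        }
        return byId;
    }

    // The "reflection" call: given an element from an instance document,
    // return its declaration, or null if the schema does not declare it.
    static Element definitionOf(Element instance, Map<String, Element> schema) {
        return schema.get(instance.getTagName());
    }

    public static void main(String[] args) {
        Map<String, Element> schema = index(parse(SCHEMA));
        Element book = parse("<BOOK/>").getDocumentElement();
        Element definition = definitionOf(book, schema);
        System.out.println(definition.getAttribute("extends")); // prints "#Publication"
    }
}
```

The lookup here matches the instance tag name against the declared `id` exactly, so it is case-sensitive; a real API would also need to resolve schemas imported from other URLs.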
OK, let's circle back around to collaboration support systems. An XML document is a heterogeneous collection of objects that can be exchanged between programs. So a CAD system can define an XML document for sharing product design information that is automatically extracted from the CAD database. The marketing department can define an XML document that contains requirements objects for a product design. Any of these documents can contain a combination of displayable text and non-displayed embedded data, plus relationship links that connect objects within and between documents. A Java applet might parse these XML documents and guide the user through evaluating the design against requirements. To expand on this capability for modeling object relationships, I'll take a final foray into XML's semantic models.
Current HTML Web pages are semantically opaque to a computer system. Although HTML documents can be parsed, the tags have no domain-specific meaning. XML seeks to add meaning back into a document's structure and relationships by supporting, in addition to HTML's unidirectional hyperlinks, the more sophisticated multi-ended, typed, self-describing links. The authors of the XML-Data specification suggest that it is "sufficiently advanced for use in artificial intelligence and natural language systems, yet retains the architecture and investment of existing XML and the efficiency of its representation."
Look back at the schema example. Notice that generalization hierarchies are supported between element types, e.g. book and proceedings are subtypes of publication. Attributes and relations are inherited as well, so book inherits the title relation, plus adds its own author relation type. Also, relation types are treated as first-class objects, so they can have their own attributes and range restrictions. Relation types can inherit from other relation types, e.g. editor is a subtype of author. This example does not cover all of the XML-Data proposal; see the specification draft for more details and updates.
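To make the inheritance rule concrete, here is a rough sketch of how a program might resolve the relations an element type ends up with by walking its chain of supertypes. The `InheritanceSketch` and `TypeDef` names are hypothetical, the in-memory model is much simpler than the XML-Data draft, and attaching the editor relation to proceedings is an assumption made only to fill out the sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InheritanceSketch {
    // Hypothetical model of one element type: its optional supertype
    // plus the relations it declares itself.
    static final class TypeDef {
        final String parent;          // null for a root type
        final List<String> relations; // relations declared on this type only

        TypeDef(String parent, String... relations) {
            this.parent = parent;
            this.relations = Arrays.asList(relations);
        }
    }

    // Sample schema mirroring the prose: book and proceedings extend publication.
    static Map<String, TypeDef> sampleSchema() {
        Map<String, TypeDef> types = new LinkedHashMap<>();
        types.put("Publication", new TypeDef(null, "TITLE"));
        types.put("BOOK", new TypeDef("Publication", "AUTHOR"));
        types.put("PROCEEDINGS", new TypeDef("Publication", "EDITOR")); // assumption
        return types;
    }

    // Collect declared and inherited relations, inherited ones first.
    // No cycle detection: the sketch assumes a well-formed hierarchy.
    static List<String> allRelations(String typeId, Map<String, TypeDef> schema) {
        List<String> result = new ArrayList<>();
        String id = typeId;
        while (id != null) {
            TypeDef def = schema.get(id);
            if (def == null) {
                throw new IllegalArgumentException("unknown type: " + id);
            }
            result.addAll(0, def.relations);
            id = def.parent;
        }
        return result;
    }

    // True if ancestorId appears anywhere above typeId in the hierarchy.
    static boolean isSubtypeOf(String typeId, String ancestorId, Map<String, TypeDef> schema) {
        TypeDef def = schema.get(typeId);
        while (def != null && def.parent != null) {
            if (def.parent.equals(ancestorId)) {
                return true;
            }
            def = schema.get(def.parent);
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, TypeDef> schema = sampleSchema();
        System.out.println(allRelations("BOOK", schema)); // prints [TITLE, AUTHOR]
        System.out.println(isSubtypeOf("PROCEEDINGS", "Publication", schema)); // prints true
    }
}
```

The same parent-chain walk would apply to relation types, since the draft treats them as types that can extend one another.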
At the time I'm writing this column, there are several efforts underway to propose a base set of schemas for use with XML. There is a common set of properties, called the Resource Description Framework (RDF) Core Schema, that may be optionally used by all other metadata schemas. There are other proposals floating about for standard schemas that extend from these core definitions. Some useful information can be found at http://www.dstc.edu.au/RDU/RDF/. Stay tuned for late-breaking news.
A common theme among document management vendors is that documents are the basis for an organization's knowledge. Documents are an integral part of a knowledge-sharing system that supports collaboration across an organization. XML provides a simple but potentially powerful knowledge representation language for building the hybrid Web-Object systems that I've been writing about in this column since August 1996. I'm optimistic about the future.