XML Schemas in eCommerce

Arofan Gregory - Commerce One

Dave Hollander

Friday, September 10, 1999

2nd Draft

Introduction

"For systems to interoperate they must manage the exchange of e-commerce documents without requiring manual intervention."[1] The eCo Working Group's Semantic Recommendation establishes guidelines for the development of interoperable e-commerce documents. One of the most significant recommendations is: "The WG recommends that an e-commerce business library expressed in XML take advantage of XML schemas for two reasons: validation and extensibility."
Schemas define and describe a class of XML documents by defining, constraining and documenting the meaning, usage and relationships of the parts of a document. These parts include datatypes, elements and their content, and attributes and their values. XML Schema is a specification being developed by the W3C XML Schema Working Group.[2]
XML Schemas provide better tools for expressing the design of e-commerce documents than the more commonly used Document Type Definitions (DTDs). This paper, an edited extract from the eCo Semantic Recommendation, explores how extensibility governs your ability to apply the library to your business needs and how validation can affect the cost of implementing reliable e-commerce systems. Because of these features, and other features to be described in future reports, XML Schema should have a significant impact on how we create XML business libraries in the near future.

Validation

All XML documents must be well-formed. A well-formed document follows the syntax rules of XML, which ensures that an XML parser can reliably separate content from markup. In addition to well-formedness, XML provides two basic modes of validation:
    1. DTD Processing—structural validation with minimal data validation
    2. Schema Processing—structural validation and data validation
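The difference between well-formedness and validation can be illustrated with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` parser, which checks well-formedness only and performs neither DTD nor schema validation. The `Order` and `Quantity` element names are hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical order fragments, invented for illustration.
GOOD = "<Order><Quantity>12</Quantity></Order>"
BAD_MARKUP = "<Order><Quantity>12</Order>"  # mismatched end tag
BAD_DATA = "<Order><Quantity>a dozen</Quantity></Order>"  # non-numeric quantity

print(is_well_formed(GOOD))        # True
print(is_well_formed(BAD_MARKUP))  # False
print(is_well_formed(BAD_DATA))    # True: the markup is fine, the data is not
```

The third document passes because a well-formedness check never looks at what the content means; catching the non-numeric quantity is the job of data validation.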
Traditionally, EDI-based e-commerce systems have performed thorough structure and data validation, and this experience points to the need for similar levels of validation in XML. Full structural and data validation is only possible with schema-based processing.
The difficulty in an EDI system is that both structural and data validation rely on the creation of application-specific parsers. In XML, implementation is simplified by using generic tools for performing structural and weak data validation through the use of XML DTDs. The level of data validation provided by XML parsers working with XML DTDs is not sufficient, however. If we look at EDI-based applications, we find that much of the data comes from standard code lists, or is strongly typed numeric data. Further, text-based fields may be restricted by the field lengths of the databases used to process the EDI messages.
XML schemas give us full validation for both structure and data types. Because schemas give us a syntax for describing field length, degrees of precision, or an enumeration of token values (such as codes), we gain the benefits of full validation while still using XML as the syntax. Compared with traditional EDI implementations, XML and XML schemas are much simpler to develop, not only because you can use generic tools for parsing and validation, but also because generic tools can be used to apply the validation constraints expressed in the schema.
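The three kinds of data constraint named above (field length, degrees of precision, and enumerated codes) can be sketched as a declarative table checked by one generic routine. This is an illustration of the idea only, not XML Schema syntax and not a schema processor: the `LineItem` structure, the field names, the limits, and the code list are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Hypothetical constraints for a hypothetical <LineItem>. In a schema-based
# system these would be declared in the schema, not in application code.
CONSTRAINTS = {
    "Description": {"max_length": 35},
    "UnitPrice": {"fraction_digits": 2},
    "Currency": {"enumeration": {"USD", "EUR", "JPY"}},
}


def check_value(name, value, rule):
    """Return a list of problems found in one field value."""
    problems = []
    if "max_length" in rule and len(value) > rule["max_length"]:
        problems.append(f"{name}: longer than {rule['max_length']} characters")
    if "fraction_digits" in rule:
        try:
            number = Decimal(value)
        except InvalidOperation:
            problems.append(f"{name}: not a decimal number")
        else:
            if not number.is_finite():
                problems.append(f"{name}: not a finite number")
            elif -number.as_tuple().exponent > rule["fraction_digits"]:
                problems.append(f"{name}: too many fraction digits")
    if "enumeration" in rule and value not in rule["enumeration"]:
        problems.append(f"{name}: {value!r} is not an allowed code")
    return problems


def validate(xml_text):
    """Check structure (field present) and data (field value) generically."""
    root = ET.fromstring(xml_text)
    problems = []
    for name, rule in CONSTRAINTS.items():
        element = root.find(name)
        if element is None:
            problems.append(f"{name}: missing")
        else:
            problems.extend(check_value(name, (element.text or "").strip(), rule))
    return problems


VALID = ("<LineItem><Description>Widget</Description>"
         "<UnitPrice>19.99</UnitPrice><Currency>USD</Currency></LineItem>")
INVALID = ("<LineItem><Description>Widget</Description>"
           "<UnitPrice>19.999</UnitPrice><Currency>US$</Currency></LineItem>")

print(validate(VALID))    # []
print(validate(INVALID))  # two problems: UnitPrice precision, Currency code
```

The point of the design is that `validate` contains no knowledge of line items; changing a limit or adding a code means editing the declarations, not the checking code.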

Extensibility

The requirements of e-commerce are such that many basic document types are generally useful, but for specific tasks or for particular markets, minor structural variations add even more value. If a truly common XML structure is to be established for e-commerce, it will need to be easily modifiable, while minimizing the costs associated with implementation around these variations on standard data structures.
If we look at EDI, one common phenomenon is a gradual increase in the number of different elements, to accommodate market-specific variations. Several efforts within the EDI community are focused on eliminating this problem, which shows that variations are both a real requirement and hard to manage. Another related EDI phenomenon is the overloading of the meaning and use of existing elements, creating a tangible bar to interoperation without low-level coordination between trading partners. The end result is a high cost in implementation.
XML DTDs require that the data structure be fully described before implementation, in terms of the elements, attributes, and their structural relationships and content models. In order for documents of a given document type to be interoperable across different XML applications, they must conform to a single DTD, with only minimal variation in their structures (as captured in the internal declaration subset of the documents being exchanged). In practice, the high degree of cross-application coordination required to handle structural variation reduces the usefulness of this built-in document-specific capability of XML processing with DTDs. The end result is that it is difficult to accommodate the level of variation required by EDI applications using DTD-based validation.

Schema-based XML processing offers us a way to enhance the ability of applications to interoperate, accommodating the required variations in basic data structures, without either overloading the meaning and use of existing data elements or requiring wholesale addition of data elements specific to a particular industry or process. Using data modeling techniques borrowed from object-oriented programming, schemas allow designers to specify new element types that inherit the properties of existing elements. When defining a new element type, you can specify exactly the datatypes and structure of the properties that are added to those inherited from the existing element. Inheritance in schemas, as in object-oriented programming, enables reuse and reduces the cost of development.

This benefit derives from the nature of most variations required in e-commerce documents: many data structures are very similar to "standard" data structures, but have some significant semantic difference in a particular industry or process. Because schemas give us a mechanism for indicating the semantic "predecessors" of a particular variation, generic processing of standard types provides us with a basis for implementing just the refinements needed to handle the specific semantic variation. (An example of this would be the addition of a field to an address block, to describe some industry-specific addressing information. The address structure from a common library could be taken, and only the single additional field would require new processing, even though the entire structure was given a different name, to distinguish it from the "normal" address structure.)
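The address example can be sketched as follows. This is a rough illustration of extension by inheritance, not schema syntax: the `TYPES` table stands in for a schema library, and the names `Address`, `PortAddress`, and `BerthNumber` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical type library: each type lists its own fields and, optionally,
# the type it extends. All names here are invented for illustration.
TYPES = {
    "Address": {"base": None, "fields": ["Street", "City", "PostalCode"]},
    "PortAddress": {"base": "Address", "fields": ["BerthNumber"]},
}


def all_fields(type_name):
    """Fields of a type: inherited fields first, then its own additions."""
    definition = TYPES[type_name]
    inherited = all_fields(definition["base"]) if definition["base"] else []
    return inherited + definition["fields"]


def read_address(element):
    """Generic processing, written once for the standard Address fields."""
    return {name: element.findtext(name) for name in all_fields("Address")}


def read_port_address(element):
    """Reuse the Address processing; only the added field needs new code."""
    record = read_address(element)
    record["BerthNumber"] = element.findtext("BerthNumber")
    return record


doc = ET.fromstring(
    "<PortAddress><Street>1 Dock Rd</Street><City>Oakland</City>"
    "<PostalCode>94607</PostalCode><BerthNumber>22</BerthNumber></PortAddress>"
)

print(all_fields("PortAddress"))
# ['Street', 'City', 'PostalCode', 'BerthNumber']
print(read_port_address(doc)["BerthNumber"])  # 22
```

Although the structure carries a different name, the code for the three standard fields is untouched; the declared link to `Address` is what makes that reuse possible.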
In those cases where a variation in data structure is required only for some particular process, schemas again allow us to minimize implementation effort. It is possible to add a mechanism that allows a system to process a modified data element exactly as it would process its direct, standard parent, except for that specific interaction that requires the modified structure. By having most processes ignore the variation, except where it is specifically needed, schemas again help us reduce the effort required to build e-commerce applications, and enhance the level of interoperability.
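One possible form of such a mechanism is a handler lookup that falls back to the standard parent type. The sketch below is an assumption about how an application might be organized, not something the schema specification prescribes; the type names, process names, and handler functions are all hypothetical.

```python
# Hypothetical derivation table: derived type -> the standard type it extends.
BASE_OF = {"PortAddress": "Address"}


def print_label(fields):
    """Standard processing: knows only the standard Address fields."""
    return f"{fields['Street']}, {fields['City']}"


def schedule_berth(fields):
    """The one process that needs the added field."""
    return f"berth {fields['BerthNumber']} at {fields['Street']}"


# Handlers are registered per (process, type). Only the docking process
# knows about PortAddress; the labeling process knows only Address.
HANDLERS = {
    ("labeling", "Address"): print_label,
    ("docking", "PortAddress"): schedule_berth,
}


def find_handler(process, type_name):
    """Walk up the derivation chain until this process has a handler."""
    current = type_name
    while current is not None:
        handler = HANDLERS.get((process, current))
        if handler is not None:
            return handler
        current = BASE_OF.get(current)
    raise LookupError(f"no handler for {type_name} in process {process}")


port = {"Street": "1 Dock Rd", "City": "Oakland", "BerthNumber": "22"}

print(find_handler("labeling", "PortAddress")(port))  # 1 Dock Rd, Oakland
print(find_handler("docking", "PortAddress")(port))   # berth 22 at 1 Dock Rd
```

The labeling process handles the modified element exactly as it would handle its standard parent and ignores the extra field; only the docking process carries code for the variation.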
It should be noted that it is not merely structural extensions that can be expressed in schema syntax, but also information about new data types, which can also help users accommodate requirements placed on them by legacy processing systems with nonstandard specifications.
While the problems encountered in EDI applications cannot be entirely avoided, the use of XML schemas helps us to identify variations in data structure, and to better manage them. Further, it gives us a solid syntax for modifying only those specific aspects of the data structure that require modification.

Conclusion

Most XML-based e-commerce applications to date have used XML DTDs to define and describe the documents that are interchanged. While this has improved the affordability and interoperability of e-commerce systems, it has proved difficult to use this approach to develop XML-based alternatives for EDI systems.
XML Schemas provide a wide range of features and capabilities not available in XML DTDs. Features such as strong datatyping, as described in the July research report[3], and validation and extensibility should help overcome many of the hurdles that have made it difficult to develop interoperable e-commerce libraries. XML Schemas and the tools developed to work with schemas should provide the environment needed to economically create e-commerce systems as large and complex as today’s EDI systems.

[1] eCo Architecture for electronic commerce interoperability; Draft version 0.8; http://www.eco.commerce.net/eco/spec

[2] XML Schema Part 1: Structures, W3C Working Draft 6-May-1999; http://www.w3.org/TR/xmlschema-1/

[3] If XML is So Good, Why Do We Need XML Schemas; http://www.commerce.net/research/pw/bulletin/core/99_30_n.html