XML at the Crossroads

Dave Hollander

Co-Chair W3C XML Coordination Group

Co-Chair W3C XML Schemas Work Group

May 13, 2001

We are at a crossroads near the beginning of the “information age”. As interchangeable parts and the assembly line were essential to the industrial age, reusable information should be essential to the information age. XML is the leading contender to deliver reusable information. Many dangers face us at this nascent period: companies will happily lock up our information and knowledge with their technology, and standards efforts will readily allow us to subordinate information to behaviors, process, semantics or other traits that should be derived from information, leading it into a niche. If information is to lead the information age, we, the XML leadership, need to reconsider where we are going to take it.

XML is at a crossroads. After a brief but eventful period beginning in February 1998, the original vision has either been delivered upon or work is underway to complete it. What is needed now is to step back, consider the gains that have been made, and then decide upon the future direction. Central to the decisions that remain to be made is re-examining the vision driving XML.

The original vision was relatively straightforward, at least to practitioners of SGML: capture information in a form that maximizes its reusability across various contexts and across various technologies. In short, to free information from the circumstances in which it was created and make it available to expand the human experience (as much as practical). For this vision and the tireless efforts to make it a reality, the XML community (and the world in general) owes a debt of gratitude to Charles Goldfarb, Yuri Rubinsky and the original SGML development team. I believe XML’s success is in large part due to the realization that simplicity and web-friendliness were necessary to achieve these goals.

In reaching this goal, the XML Working Group relentlessly used their 100+ years of combined experience in SGML to answer the question: “Is this necessary for success?” The apparent success of XML in fields far beyond those the original team participated in indicates that the question was answered successfully. XML quickly spread far beyond the contacts, experience and influence of its creators.

Success created an environment in which success became increasingly difficult to repeat. After the first success, the team grew and began to tackle issues and opportunities that were bypassed in the initial development. Namespaces, linkage, modeling, transformations and other capabilities were developed, while queries and transport remain in development. Growth was an inevitable outcome of the success. No longer the “hobby” of dedicated individuals, XML became a business opportunity. With this success came additional interests, some of which broadened the scope of the project while others were simply accommodated to gain acceptance, even if they were counter to the original vision.

Combined, the specifications for new facilities, both those known to be necessary to the original team and those brought by the additional interests, have dwarfed the original XML specification and indeed the SGML standard itself. While these additional specifications add significant capabilities, they also add significant technical burden and make the mastery of XML and the information it captures difficult for the majority: the people who are making the breakthroughs in biology, humanities, business and other human endeavors that characterize the information age.

The leadership of XML needs to address the issue of whom the specifications are being developed for and to base the vision on that conclusion. Is XML for the technical elite and their member companies that sponsor the W3C? Surely, these are the companies that sponsor the development, but should they be the primary beneficiaries of the information revolution of which XML is at the center? Or should the true beneficiaries of XML be those who will use new information technologies to create new solutions to human problems: the mapping of the human genome, the development of more efficient businesses, the ability to communicate and collaborate with others across time and space on new ideas?

The W3C member companies have made and continue to make the investment necessary to advance XML, and for this they deserve to be rewarded by insight and early understanding that can lead to commercial advantage. However, their principal benefit is the pervasiveness of XML and its ability to simplify the mechanics of information development. The rest of this paper proposes changes in how the W3C XML activity should lead the continuing development of XML to help assure that the true beneficiaries of XML will be those who will use it to solve new human problems.

Organization

Organizations are properly structured to respond to the environment in which they operate. The current organization is based on Working Groups divided according to functional architecture. This organization was successful at completing, or nearly completing, a set of XML specifications. I expect that each of our specs requires some follow-up design work to meet expectations and needs.

But now, I believe, we need to organize differently to achieve different results. Unlike two years ago, when query and schema were obvious gaps in the XML architecture, today I do not see large gaps. More detailed functional architecture work could identify gaps and overlaps, but as long as the scope remains interchangeable information, I expect these gaps will be small compared with the gaps and needs of the XML user community.

In spite of efforts to coordinate during the development of the disparate specs, there are significant overlaps and gaps. Given my understanding of the user community and personal experience, the following are the most severe gaps and needs for XML:

Complexity – tricks and tips are absolutely necessary to successfully create an XML-based system. Much of the complexity is due to the multiple ways WGs solved problems while staying within the scope of their charters.

Consolidation – there are multiple ways to specify a location in a document, data types, physical structure and many other aspects of information. Many of these are artifacts of the process in which they were created.

Clarity – different authors created each specification, with little guidance on terminology or exposition. The result is that many of the apparent differences are just differences in writing style and jargon. In addition, each team has had differing ideas about how systems should be architected and developed, resulting in additional complexity that must be understood to fully implement XML systems.
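As a small illustration of the consolidation point (not part of the original proposal), the sketch below reaches one and the same element in four different ways. It uses Python's standard xml.etree.ElementTree module; the sample document, its element names and its id values are invented for the example, and the four approaches only loosely stand in for the path-based, ID-based and tree-based addressing styles found across the specifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
DOC = """<report>
  <section id="intro"><title>Introduction</title></section>
  <section id="scope"><title>Scope</title></section>
</report>"""

root = ET.fromstring(DOC)

# 1. Path with a positional predicate (XPath-like; positions are 1-based).
by_position = root.find("./section[2]")

# 2. Path with an attribute predicate.
by_attribute = root.find("./section[@id='scope']")

# 3. Direct child indexing on the parsed tree (0-based).
by_index = root[1]

# 4. Manual scan for the id value, roughly what an ID-based pointer resolves to.
by_scan = next(el for el in root.iter() if el.get("id") == "scope")

# All four expressions identify the very same element object.
same = by_position is by_attribute is by_index is by_scan
print(same)  # prints True
```

Each style is reasonable on its own; the burden on a system builder comes from having to learn and reconcile all of them.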

During the original development of XML, the seminal question was "is this necessary for success?" I believe this question was successful because we had a complete implementation as the starting point. All subsequent specs both borrowed from existing implementations and were required to develop new features, functions and capabilities.

The cry for simplicity needs to be heard again! We need to reorganize to effectively represent the users of XML and developers of XML-based systems to make the question effective. The purpose of this organization of work groups is to focus efforts and resolve conflicts from the perspective of our user community. I propose the following Work Groups be created:

Information Exchange – this group is responsible for assuring XML can be effectively used to capture, transfer and manage information.

Information Architecture – this group is responsible for assuring that information systems can be architected to meet the needs of information modelers.

Information Processing – this group is responsible for assuring that information systems can be cost-effectively developed and maintained.

This proposal places charter authority into these new work groups. The majority of the work done in these WGs will be developing requirements, samples, examples, test suites, training materials, etc.; empowering editorial teams to address the requirements; and reviewing and approving the results of the editorial teams.

Existing WGs would continue to carry their existing charters until they have reached an agreed-upon deliverable, typically the publication of their current draft as a Recommendation. Existing WGs would then be re-chartered to make them responsible for:

continuing development of their specifications based on requirements and approval from the new WGs,

developing and maintaining subject expertise,

authoring research working drafts not intended for the Recommendation track, and

assuring that the needs of the three new WGs do not conflict.

I believe this new organizational structure will result in specifications that address most of the issues facing the XML community today. These specifications will be able to strike the balance needed for XML to be relevant to fully structured information as well as suitable for adding structure to the bulk of unstructured and semi-structured information that exists today. Information reuse requires easy-to-understand and easy-to-master standards to assure that the balance between technologists and information owners and managers remains equitable. With care, XML can lead the way to maximize actual information reuse and therefore be the standard that preserves the balance.

References:

The State of XML: Why Individuals Matter; by Edd Dumbill; XML.COM; May 30, 2001