Ontologies

Ontologies have become popular again. What was once the domain of philosophers pursuing a more rigorous study of the nature of our world has now become a pursuit of the technical elite, shrouded in an almost mystical faith that ontologies will unlock latent capabilities of our computers: the ability to understand and reason about the vast quantity of information stored in their memories.

Ontologies cross over philosophy, natural language and computation. Spurred on by the hype surrounding the "Semantic Web", our recent examinations of ontologies have been dominated by the computational domain. Most references to "ontologies" found on the web today describe them from the narrow perspective of machine computation and first-order logic. It is widely, almost intuitively, believed that these computational ontologies will take computing further toward mimicking human behavior.

While I believe that the computational perspective on ontologies is important, a much broader perspective will be needed to make ontologies live up to their potential. Insight into these perspectives can come from looking afresh at the definition of ontologies and the structures used to describe them. This paper will use various web resources to describe the computational perspective of ontologies, and then draw on some of the critical papers to explore where the state of the art may require more attention. Finally, the perspective of designing sustainable, technology-assisted human processes is used to explore pragmatic considerations.

An explicit goal of this paper is to explore ontologies from the perspective of business-oriented technologists: what ontologies are, and how they are defined, maintained and leveraged to make our technologies serve today's real-world needs. To begin, let's look at some definitions and settle on one that provides a framework for understanding and harnessing ontologies.

Definitions

A classic definition of ontology is: an ontology is a specification of a conceptualization. This definition appears to be correct, but because it offers little insight to those not already familiar with the concept, we will come back to it.

From dictionary.com come the following three definitions:

<philosophy> A systematic account of Existence.
<artificial intelligence> (From philosophy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them.
<information science> The hierarchical structuring of knowledge about things by subcategorizing them according to their essential (or at least relevant and/or cognitive) qualities. See subject index. This is an extension of the previous senses of "ontology" (above) which has become common in discussions about the difficulty of maintaining subject indices.

From these definitions, several key concepts emerge:

systematic / explicit formal specification: ontologies are the result of an organized and methodical effort to describe something. The something can be metaphysical, e.g. "what is life?" or real world, e.g. "what is a customer?".
objects, concepts and other entities, relationships: these are how we describe "something" computationally.
hierarchical structuring, subcategorizing: computationally, describing things has become synonymous with hierarchies and categorizing "things".

From the W3C Semantic Web [1] [2] come the following:

OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. This representation of terms and their interrelationships is called an ontology.
An ontology formally defines a common set of terms that are used to describe and represent a domain. Ontologies can be used by automated tools to power advanced services such as more accurate web search, intelligent software agents and knowledge management.

It is interesting to note that both of these definitions focus on the language used to describe something: its "terms" and "vocabulary".

Finally, from a book by Berthold Daum comes a description of ontologies as they have been used by data architects and database designers. Ontologies typically contain the following structures:

Lexicon - list of words
Taxonomy - words structured according to the ontology's classification system
Thesaurus - taxonomy plus related terms
Integrity rules and axioms
Technical organizational model - relational model, arbitrary relations, etc.
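As a rough illustration of how these layers build on one another, the following Python sketch models a lexicon, a taxonomy and a thesaurus-style "related term" layer, plus one integrity rule. The `Ontology` class, its method names and the specific rule are hypothetical, chosen for illustration; they do not come from Daum's book or any library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for Daum's layered structures (illustrative only)."""

    lexicon: set[str] = field(default_factory=set)              # list of words
    taxonomy: dict[str, str] = field(default_factory=dict)      # term -> broader term
    related: dict[str, set[str]] = field(default_factory=dict)  # thesaurus links

    def add_term(self, term: str, broader: str | None = None) -> None:
        # Every word enters the lexicon; a classified word also enters the taxonomy.
        self.lexicon.add(term)
        if broader is not None:
            self.lexicon.add(broader)
            self.taxonomy[term] = broader

    def relate(self, a: str, b: str) -> None:
        # Thesaurus layer: a symmetric "related term" link on top of the taxonomy.
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def integrity_violations(self) -> list[str]:
        # One example integrity rule: every related term must be in the lexicon.
        return sorted(t for t in self.related if t not in self.lexicon)


vocab = Ontology()
vocab.add_term("car", broader="vehicle")
vocab.add_term("truck", broader="vehicle")
vocab.relate("car", "truck")
vocab.relate("car", "automobile")  # "automobile" was never added to the lexicon
```

Here `vocab.integrity_violations()` flags "automobile" because the thesaurus refers to a word the lexicon never defined; real integrity rules and axioms would be considerably richer.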

Daum's description gives us insight into the kinds of structures used to describe ontological objects. He provides more insight into how a vocabulary is structured and what types of relationships are used.

If we return to "an ontology is a specification of a conceptualization" and make some substitutions, we arrive at a more detailed, compatible definition:

An ontology is a structure that describes the vocabulary (words, context, and relationships) used to describe a domain (group) of objects and entities.

With this as our working definition, the next section will look at the characteristics of some common ontologies so that we can decide which characteristics are important in putting ontologies to practical use.

Characteristics of Ontologies

Now let's pursue the goal of exploring ontologies from a broader perspective than just the technology: from the perspective of how to put them into production in today's real-world environment. Toward this goal, we have looked at some definitions and derived a working definition:

An ontology is a structure that describes the vocabulary (words, context, and relationships) used to describe a domain (group) of objects and entities.

To put ontologies to work, we will have to make some decisions about what structures to use and how to develop processes to create, maintain and deliver value from these structures. To make these decisions, we need to know more about the characteristics of ontologies.

Ontology Concepts

To see how to make ontologies pragmatic, let's look at their underlying structures and design principles, and at today's real-world needs that we can address directly using a structured approach to describing a conceptualization.

The characteristics of an ontology can be described in terms of how it describes the "group of objects and entities", the structures used to capture ontological knowledge, or the relationships and operators permitted.

Characteristics of the Group

Classes

The taxonomic class relation is fundamental to most contemporary, technology-focused ontology languages, in which the things (objects, entities) in an ontology are called classes. Class and set theory introduce the ability to use first-order logic, which is the basis for automated reasoning. Formal set theory also introduces complexity in defining class membership, scope and span control, as well as conundrums that detractors of formal ontologies are quick to point out.

The groupings used to describe objects need not conform to the formalisms of set and class theory. They can be arbitrarily defined, fluid or even lacking completely. Such "organic" groups are often described by their practitioners as "sets" or "classes" but lack the formal rigor necessary for meaningful computational reasoning. These informal sets do, however, provide better support for many human-based activities such as combinatorial learning and ontology development.

Individuals and Classes

The group of objects/entities can consist of concepts, individuals or both. This characteristic seems intuitive, as in: concept=vehicles, individual=1996 Chevy Caprice. However, consider that the individual in this ontology may be a class in another ontology: class=1996 Chevy Caprice, individual="my son's car". Treating "1996 Chevy Caprice" as a class or an individual is neither right nor wrong; it is one of many examples of the decisions that must be made by the ontology designer.

Today, it is common to describe a group as either "in the TBox" or "in the ABox", where the TBox contains conceptual classes and the ABox contains individuals. Many ontologies allow different types of assertions in the TBox (assertions on concepts) than in the ABox (assertions on individuals).
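A minimal Python sketch of the TBox/ABox split, assuming a toy representation in which the TBox holds (subclass, superclass) pairs and the ABox holds (individual, class) pairs. The `KnowledgeBase` name and both example ontologies are hypothetical; the point is that "1996 Chevy Caprice" is an individual in one and a class in the other.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy split between terminological (TBox) and assertional (ABox) knowledge."""

    tbox: set[tuple[str, str]] = field(default_factory=set)  # (subclass, superclass)
    abox: set[tuple[str, str]] = field(default_factory=set)  # (individual, class)

    def is_class(self, name: str) -> bool:
        # A name is a class if it appears in the TBox or is asserted as a type in the ABox.
        return any(name in pair for pair in self.tbox) or any(c == name for _, c in self.abox)

    def is_individual(self, name: str) -> bool:
        return any(i == name for i, _ in self.abox)


# Ontology 1: the car model is an individual of the concept "Vehicle".
kb1 = KnowledgeBase(
    tbox={("Car", "Vehicle")},
    abox={("1996 Chevy Caprice", "Vehicle")},
)

# Ontology 2: the same name is a class, and "my son's car" is its individual.
kb2 = KnowledgeBase(
    tbox={("1996 Chevy Caprice", "Car")},
    abox={("my son's car", "1996 Chevy Caprice")},
)
```

Neither knowledge base is wrong; they simply reflect different design decisions about where the name belongs.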

Technically, ontology languages typically distinguish between class knowledge and instance knowledge. While ontologies describe class knowledge, they rely on applications to instantiate those classes by fulfilling the class relations defined in an ontology.

Ontological Level

If the group is a set of concepts or individuals specific to a particular subject or "domain", then the ontology is said to be a lower-level, domain ontology. Upper-level ontologies limit the objects to those that are meta, generic, abstract and philosophical, and are therefore general enough to address (at a high level) a broad range of domain areas.

The graphic from Travis Breaux's lecture on Information Analysis Using Upper Ontologies provides a good intuitive understanding of ontological level.

An upper ontology is limited to concepts. Formal upper ontologies provide a rich meta vocabulary to help develop ontologies and suggest abstract data types which can be supported by technology to make it easier to build rich ontologies. Upper ontologies are also very useful to provide a more universal framework for use in defining domain ontologies. Domain ontologies that specialize upper ontologies have the benefit of being able to leverage the knowledge captured in the upper level ontology.

Grounding

All ontologies will implicitly have primitive concepts that are not introduced by definitions. However, the number of primitives and whether they have special status within the formal framework will vary from ontology to ontology. No matter how complete an ontology is, at some point it will have to rely on assertions about objects or entities that are not in the ontology. These "grounding" assertions are typically made against individuals. In the ontology above, there must be either an implicit or an explicit set of assertions that separates "Predator" from "Boeing747". If the ontology's structure included an attribute such as "number of pilots", the grounding would be considered explicit.

From a philosophical view, even this explicit grounding would rely upon implicit assertions, such as that a "computer" or "remote operator" is not a "pilot". The reliability of the grounding assertions is critical to the reliability and accuracy of the results of any reasoning applied to an ontology.

Reasoning along the lines of "drones do not have pilots; the Predator is a drone; therefore Predator operations do not need to schedule a human operator" will only be correct if the grounding assertion about pilots correctly distinguishes between computers and remote operators.
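The dependency on grounding can be sketched in Python. The controller facts, the two candidate groundings and the function names below are hypothetical; the rule itself is the naive one from the text ("no pilot, so no human to schedule").

```python
from __future__ import annotations

# Hypothetical ABox facts: the kinds of controller each aircraft actually has.
CONTROLLERS = {
    "Boeing747": {"onboard human"},
    "Predator": {"remote operator", "computer"},
}


def has_pilot(aircraft: str, pilot_kinds: set[str]) -> bool:
    """Grounding assertion: an aircraft 'has a pilot' if any controller is a pilot kind."""
    return bool(CONTROLLERS[aircraft] & pilot_kinds)


def needs_human_scheduled(aircraft: str, pilot_kinds: set[str]) -> bool:
    """The naive rule from the text: no pilot, therefore no human operator to schedule."""
    return has_pilot(aircraft, pilot_kinds)


# Grounding A: only someone on board counts as a pilot.
ONBOARD_ONLY = {"onboard human"}

# Grounding B: any human controller counts, including remote operators;
# computers never do.
ANY_HUMAN = {"onboard human", "remote operator"}
```

Under the onboard-only grounding the rule concludes that the Predator needs no human scheduled, which is wrong because a remote operator is human; counting remote operators as pilots fixes the conclusion without changing the rule.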

Yet another approach is to use superordinate learning to achieve linkage beyond the ontology: the upper-level classes in one ontology are linked to classes in an upper-level ontology. Saying that an "Aircraft" is a type of "Vehicle" implies that Aircraft inherit all the properties assigned to Vehicle.

Superordinate linkages provide considerable leverage but bring with them developmental and technical challenges. Developmentally, they make the two ontologies co-dependent, so maintenance procedures must be in place to ensure they evolve consistently. Technically, you must be sure that the same relational constraints and operators are used, or the logic may well deliver unexpected results: "vehicles have operators" is incompatible with the ABox grounding "Predators do not have pilots".
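The following Python sketch shows both sides of a superordinate link: inherited properties and a clash between them. It assumes, purely for illustration, that both ontologies map "pilot" and "operator" to a single `has_operator` property, and treats "Predator" as an ABox entry with a parent class; the dictionaries and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical upper-level ontology: a property asserted on "Vehicle".
UPPER = {"Vehicle": {"parent": None, "props": {"has_operator": True}}}

# Hypothetical domain ontology linked to it. We assume both ontologies map
# "pilot" and "operator" to the same property, "has_operator".
DOMAIN = {
    "Aircraft": {"parent": "Vehicle", "props": {"can_fly": True}},
    "Predator": {"parent": "Aircraft", "props": {"has_operator": False}},
}

LINKED = {**UPPER, **DOMAIN}  # the superordinate link joins the two ontologies


def inherited_props(name: str | None) -> dict[str, bool]:
    """Collect properties down the chain; ancestors first, so descendants override."""
    chain = []
    while name is not None:
        chain.append(name)
        name = LINKED[name]["parent"]
    props: dict[str, bool] = {}
    for node in reversed(chain):
        props.update(LINKED[node]["props"])
    return props


def conflicts(name: str) -> list[str]:
    """Properties a class asserts that contradict what its ancestors assert."""
    above = inherited_props(LINKED[name]["parent"])
    own = LINKED[name]["props"]
    return sorted(p for p, v in own.items() if p in above and above[p] != v)
```

`inherited_props("Aircraft")` picks up `has_operator` from Vehicle, while `conflicts("Predator")` reports that the Predator grounding contradicts it. Whether that is an error or an intended override is exactly the kind of question the maintenance procedures have to settle.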

Characteristics of the Structures and Relations

The structures and relationships used to describe or classify the objects and entities have a significant impact on the applications to which the ontology can be applied, as well as on the processes needed to create and maintain the ontology.

Expressiveness

The expressiveness of an ontology language refers to the number and type of different relations available for defining classes of knowledge. Ontology expressiveness considers Part-of-Speech (noun and verb), the type of axioms and assertions made, how the objects are classified, what types of relationships are recorded, etc.

The ontology illustrated above is not very expressive. It has only a few classes and one implied relation (is-a), and covers only objects, not verbs, qualifiers or context. Within the expressiveness of this ontology, there is no difference between a "car", a "truck" and a "motorcycle".
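One way to make the point about car, truck and motorcycle concrete: if classes are compared only by the relations the ontology can express, then with is-a as the only relation they collapse into a single indistinguishable group. The Python sketch below assumes a hypothetical "Landcraft" parent class and illustrative wheel counts; neither is taken from the illustration.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical low-expressiveness ontology: the only relation is is-a.
IS_A_ONLY = {
    "Car": {("is-a", "Landcraft")},
    "Truck": {("is-a", "Landcraft")},
    "Motorcycle": {("is-a", "Landcraft")},
    "Boeing747": {("is-a", "Aircraft")},
}

# One extra relation type (illustrative wheel counts) raises expressiveness.
WHEELS = {"Car": "4", "Truck": "6", "Motorcycle": "2"}
RICHER = {
    cls: rels | ({("wheels", WHEELS[cls])} if cls in WHEELS else set())
    for cls, rels in IS_A_ONLY.items()
}


def indistinguishable(ontology: dict[str, set[tuple[str, str]]]) -> list[set[str]]:
    """Group classes whose full set of expressible relations is identical."""
    groups: dict[frozenset, set[str]] = defaultdict(set)
    for cls, relations in ontology.items():
        groups[frozenset(relations)].add(cls)
    return [members for members in groups.values() if len(members) > 1]
```

`indistinguishable(IS_A_ONLY)` returns car, truck and motorcycle as one group, while `indistinguishable(RICHER)` returns an empty list. Each added relation type buys distinctions, and each is something that must then be populated and maintained.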

Expressiveness is a key design decision. The more expressive an ontology is, the more value can theoretically be derived from it, but the more difficult it is to create and maintain.

Axioms and Assertions

Axioms and assertions are how you assign or operate on individuals and classes. Some assertions divide individuals into classes or subclasses; these describe how you decide whether an individual "is-like" another or how one class "subsumes" another. Other axioms describe operations or integrity relationships.

The reliability of assertions and axioms is critical to successfully computing meaningful results. Misunderstanding the scope and span of the assertions and how they apply to groundings within the overall expressiveness of the ontology can turn deductive results into gibberish and bad business results.

Taxonomy

A taxonomy is a list of objects or entities organized according to a classification system. As noted earlier, formally defined taxonomic or class relations are fundamental to most technically focused ontologies. In the languages used to describe these ontologies, the things grouped according to a classification system are often called classes. Based on these class relations, subsumption inference can be used to reason about document content at conceptually higher levels than statistical methods allow.
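Subsumption inference over a taxonomy can be sketched as a transitive walk up is-a links. The Python below uses a hypothetical taxonomy built around the aircraft example; real description-logic reasoners handle far richer class definitions, so treat this only as an illustration of the idea.

```python
from __future__ import annotations

# Hypothetical is-a edges (subclass -> direct superclasses).
IS_A = {
    "Predator": {"Drone"},
    "Drone": {"Aircraft"},
    "Boeing747": {"Airliner"},
    "Airliner": {"Aircraft"},
    "Aircraft": {"Vehicle"},
    "Car": {"Vehicle"},
}


def ancestors(term: str) -> set[str]:
    """All classes that subsume `term` (transitive closure of is-a)."""
    seen: set[str] = set()
    stack = list(IS_A.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, ()))
    return seen


def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` or one of its ancestors."""
    return general == specific or general in ancestors(specific)


def common_concepts(terms: list[str]) -> set[str]:
    """Concepts that subsume every term mentioned, e.g. in one document."""
    closures = [ancestors(t) | {t} for t in terms]
    return set.intersection(*closures) if closures else set()
```

For a document that mentions both "Predator" and "Boeing747", `common_concepts` returns Aircraft and Vehicle: concepts never named in the text, inferred purely from the class relations.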

Informal classification schemes can include empirical assignment or other judgment-based assignments. These schemes can avoid many of the developmental challenges associated with making a rigorous assertion about the "is-ness" of an object (see Morphism below).

Semantic Relations

Semantic features commonly represented in ontologies include:

Hypernym – the taxonomic relation. The class A is a hypernym of class B if we can say class B is a kind/ type/ form of class A.
Meronym – the component relation. The class C is a meronym of class B if we can say class C is a part/ component of class B.
Synonym – the identity relation. The class D is a synonym of class C if in fact class C and class D are the same class.
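A minimal Python sketch of how these three relations might be stored as typed triples and queried. The triples and function names are hypothetical, and synonym resolution here follows a single link in one direction only; a fuller version would compute equivalence classes.

```python
from __future__ import annotations

# Hypothetical typed relation triples: (relation, A, B), read as "A is a <relation> of B".
TRIPLES = {
    ("hypernym", "Vehicle", "Car"),    # Car is a kind of Vehicle
    ("meronym", "Wheel", "Car"),       # Wheel is a part of Car
    ("synonym", "Automobile", "Car"),  # Automobile and Car are the same class
}


def canonical(term: str) -> str:
    """Map a term to its canonical class via one synonym (identity) link."""
    for rel, a, b in TRIPLES:
        if rel == "synonym" and a == term:
            return b
    return term


def related(relation: str, term: str) -> set[str]:
    """Terms A such that (relation, A, term) holds, after resolving synonyms."""
    target = canonical(term)
    return {a for rel, a, b in TRIPLES if rel == relation and b == target}
```

`related("hypernym", "Automobile")` returns Vehicle because Automobile is first resolved to its synonym, Car.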

Note that these semantic relations are bound to the expressiveness of the ontology. The car, truck and motorcycle are synonyms according to the simple illustration regardless of how counterintuitive the synonym relation seems to us.

Morphism

Morphism is the quality or state of having (such) a form, as in homomorphism or isomorphism. In ontologies, a fundamental design decision is the morphism allowed to describe controlling relationships between operators, individuals and their classes, or the number of subsumptive relationships a class can have.

Isomorphism:

A one-to-one correspondence between the elements of two sets such that the result of an operation on elements of one set corresponds to the result of the analogous operation on their images in the other set.

Polymorphism:

the quality or state of existing in or assuming different forms

Going back to the ontology that describes aircraft above, consider adding a class relation describing the fuel from which power is generated, or hybrid vehicles such as amphibious trucks or even the flying car from the James Bond movies. Is such an individual forced to be either "landcraft" or "watercraft", but not both?

Forcing isomorphic relations makes the reasoning computations much simpler to contemplate and implement, but it also creates significant complications for development and maintenance, such as how to classify a cell phone with a camera or a personal organizer with MP3 audio.

Polymorphic ontologies would provide the opportunity to describe these real-world objects without prejudice as to which characteristic is more important and which is to be denied. They reduce errors in creating and maintaining ontologies because they allow multiple users to record their knowledge about the same object or entity, whereas the stored knowledge in an isomorphic ontology depends on the decisions made by each individual contributing to the ontology: "it looks like a phone so I will call it a phone" v. "I am into cameras and this phone-camera takes fine pictures so I call it a camera".
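The difference can be sketched in Python by modelling the isomorphic constraint as "one class per item, last writer wins" and the polymorphic alternative as "keep every classification with its source". Both store functions, the contributors and the assertions are hypothetical.

```python
from collections import defaultdict

# Hypothetical assertions from two contributors about the same device.
ASSERTIONS = [
    ("alice", "phone-camera", "Phone"),   # "it looks like a phone"
    ("bob", "phone-camera", "Camera"),    # "it takes fine pictures"
]


def isomorphic_store(assertions):
    """Single classification per item: each new assertion overwrites the last."""
    store = {}
    for _who, item, cls in assertions:
        store[item] = cls
    return store


def polymorphic_store(assertions):
    """Multiple classification: every assertion is kept, along with who made it."""
    store = defaultdict(lambda: defaultdict(set))
    for who, item, cls in assertions:
        store[item][cls].add(who)
    return store
```

In the isomorphic store, the phone-camera ends up as whichever class was asserted last, so the result depends on the order of contributions. The polymorphic store keeps both classifications and who made them, which is what allows them to be compared later.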

Another useful feature of polymorphic ontologies is that they support a process of polymorphic refinement, in which a definition from one ontology is included and refined. For example, the addition operator defined in a number ontology can be included in ontology A and extended to apply to strings, and included in ontology B and extended to apply to vectors.
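Polymorphic refinement resembles single dispatch in programming languages, so a rough Python analogy uses `functools.singledispatch`. The factory `number_add` and the names `add_a` and `add_b` are hypothetical; each call to the factory gives an ontology its own copy of the base definition to refine, and vectors are represented as tuples purely for illustration.

```python
from functools import singledispatch


def number_add():
    """Base 'addition' definition from a hypothetical number ontology."""

    @singledispatch
    def add(a, b):
        raise TypeError(f"add is undefined for {type(a).__name__} in this ontology")

    @add.register(int)
    @add.register(float)
    def _(a, b):
        return a + b

    return add


# Ontology A includes the number definition and refines it for strings.
add_a = number_add()


@add_a.register(str)
def _(a, b):
    return a + b  # concatenation


# Ontology B includes the same definition and refines it for vectors (tuples).
add_b = number_add()


@add_b.register(tuple)
def _(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return tuple(x + y for x, y in zip(a, b))  # element-wise sum
```

`add_a("ab", "cd")` concatenates to "abcd" and `add_b((1, 2), (3, 4))` gives (4, 6), while ontology A cannot add vectors and ontology B cannot add strings, because each refinement stays local to the ontology that made it.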

Polymorphic refinement allows an ontology to develop by consensus and evolution. The phone-camera can be classified both ways and eventually used to create yet another new class. By relaxing the relation requirements, it becomes simpler to make an assertion and to let that assertion be compared with other assertions about the same object.

Quality and Veracity

Quality and veracity (how well a representation of an object reflects the salient properties of that object) are often overlooked in contemporary literature on ontologies. We cannot overlook quality if we are to deploy ontologies in real-world applications.

Quality needs to be considered at all stages of the lifecycle of an ontology. How are classification decisions made? Is there any way to apply a system of checks and balances to ensure they are made consistently and within the scope of expressiveness? How does the ontology evolve? Can the evolution be compatible with previous assertions? How are the results used? Can the results be interpreted in a way that does not take into account the constraints and limitations of the groundings and expressiveness?

It is tempting to incorporate ontologies created by others when developing your ontology-based application. Yet to do so is to incorporate all of the risks associated with those ontologies' lifecycles. From the technologist's perspective, the potential of linking dozens or hundreds of ontologies in a "semantic web" is compelling; from a process engineer's perspective, the implications for the quality of results and their associated risk are daunting.

Ontology Languages

These languages are meta-ontology languages in that they provide the ability to describe ontologies, not a specific ontology.

Resource Description Framework (RDF) and RDF-Schema (RDFS)
DARPA Agent Mark-up Language (DAML) and the Ontology Inference Layer (OIL)
Web Ontology Language (OWL)
Unified Modeling Language (UML)
Description Logic (DL), a subset of First Order Logic
Structured Query Language (SQL)

Types of Ontologies

Upper

Formal upper ontologies provide a rich meta vocabulary to help develop ontologies and suggest abstract data types which can be supported by ontology servers to make it easier to build rich ontologies.

An upper ontology is limited to concepts that are meta, generic, abstract and philosophical, and therefore are general enough to address (at a high level) a broad range of domain areas.

Suggested Upper Merged Ontology (SUMO) - The SUMO provides a foundation for middle-level and domain ontologies, and its purpose is to promote data interoperability, information retrieval, automated inference, and natural language processing. The SUMO consists of approximately 4,000 assertions (including over 800 rules) and 1,000 concepts. The SUMO is designed to be relatively small so that these assertions and concepts will be easy to understand and apply. Some of the general topics covered in the SUMO include:

Structural concepts such as instance and subclass
General types of objects and processes
Abstractions including set theory, attributes, and relations
Numbers and measures
Temporal concepts, such as duration
Parts and wholes
Basic semiotic relations
Agency and intentionality.

The primary SUMO paper is from FOIS-2001, and the Teknowledge SUMO ontology browser can be accessed online.

Note that little new material about the SUMO has been posted since 2004, which may indicate that it has become unsuitable for use.

Other Upper Ontologies

BWW
Cyc - CycL - approximately 3,000 terms
Dolce
Generalised Upper Model (GUM)
EDR
F-logic (Frame-Logic)
FrameNet
Knowledge Interchange Format (KIF) is a formal language for the interchange of knowledge among computer programs, written by different programmers, at different times, in different languages.
- KIF is a monotonic first order logic with a simple syntax and some minor extensions to support reasoning about relations.
Mikrokosmos - Mikrokosmos knowledge-based machine translation system currently under development at the Computer Research Laboratory, New Mexico State University.
Ontolingua
OpenCyc
REA Enterprise Ontology - initially created by William E. McCarthy, mainly for modeling of accounting systems.
SENSUS - the Sensus ontology (formerly known as the Pangloss ontology) is a freely available "merged" ontology produced by the Information Sciences Institute (ISI), California
SUMO
WordNet

Application Domains

Logical foundations and formal verification

References

Cyc Merged Ontology
IEEE P1600.1 Standard Upper Ontology Working Group (SUO WG)
Expert Advisory Group on Language Engineering Standards
- Preliminary Recommendations on Semantic Encoding
Relative Definability in Formal Ontologies
Methodologies for Ontology Development
OWL Web Ontology Language
The Wikipedia entry on ontologies provides a lot of useful background.

Dave McComb. Semantics in Business Systems: The Savvy Manager’s Guide. Morgan Kaufmann. September 2003.
Semantic Pipelines describes the mechanics of semantic processing starting with lexical analysis through semantic reconciliation.
Vocabularies and Security - describes using Vocabularies and XML Schema based validation to improve security in SOA.
Contivo Builder Vocabularies - Contivo's technical implementation of vocabularies as a framework for developing XML Schemas.
Semantic Integration - White Paper Executive Overview
Semantic Integration White Papers by Zapthink and Contivo - requires registration