Ontologies: Basic Groundings

In computer science and information science, an ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain. An ontology plays a role for the semantic Web similar to the one a schema plays for relational databases.

Ontologies, especially what we term adaptive ontologies, are at the heart of an OSF installation. Because of this central role, we describe OSF and its instantiations as ontology-driven applications. It is also important to understand the basis, construction and logical role of ontologies in order to develop, maintain and extend them for the application. This document provides the links to these basic groundings.

Introduction and Use
Ontologies share many aspects with other means of organizing information, such as categories, folksonomies, taxonomies or the aforementioned relational database schema. However, there are a number of key differences when ontologies are constructed to best practice standards (see below):


 * They are written in languages such as RDF or OWL, or in other logical standards such as Common Logic, that have clearly defined semantics and agreed-upon syntaxes in order to enable and promote interoperability
 * Their resulting schema are not strictly hierarchical, but graph-like in nature, which is better able to capture multiple and diverse relationships between things
 * The data model and means of representing information within ontologies is well-suited to capture information that is unstructured (say, documents), semi-structured (say, Web pages, metadata or markup) or structured (say, spreadsheets or conventional databases)
 * In the context of the semantic Web, all objects and relationships are formally defined and given a Web-address URI, which means any system with Web access (via HTTP) can use and interoperate with the information
 * And, there is a coherent logic underlying ontologies (see next) that helps to promote inter-ontology mapping, reasoning, inferencing and consistency testing within and across ontologies. This means that information in diverse domains and from diverse perspectives can be made to work together.
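
The graph-like, URI-based structure described in the list above can be illustrated with a minimal sketch. It uses plain Python tuples rather than an RDF library, and every URI under example.org is a hypothetical name chosen for illustration, not part of any OSF ontology.

```python
# Hypothetical namespace; none of these names come from a real ontology.
EX = "http://example.org/vocab#"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "Truck", EX + "subClassOf", EX + "Vehicle"),
    (EX + "Truck", EX + "subClassOf", EX + "CommercialAsset"),
    (EX + "Truck", EX + "usedFor", EX + "Freight"),
    (EX + "Freight", EX + "regulatedBy", EX + "TransportAgency"),
}


def outgoing(node, graph):
    """Return the (predicate, object) pairs leaving a node, sorted."""
    return sorted((p, o) for s, p, o in graph if s == node)


def parents(node, graph):
    """Return every direct superclass of a node."""
    return sorted(
        o for s, p, o in graph if s == node and p == EX + "subClassOf"
    )


# Truck has two parents, which a strict tree could not express.
print(parents(EX + "Truck", triples))
```

Because a concept may have several parents and any number of other typed links, the result is a graph rather than a single hierarchy.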

The accompanying Basic Guide to Ontologies describes the use and distinctions of ontologies at an executive level. There is also an Intro to Ontologies that provides a more comprehensive survey across the space of information ontologies.

Logic Basis
Beginning with philosophy and extending to mathematical logic, computer science and information theory, there is a robust logical foundation to the design and construction of ontologies. In the case of OSF installations and their best practices, we firmly ground these practices in description logics based on the open world assumption.

Description Logics
Description logics and their semantics traditionally separate concepts and their relationships from instances and their attributes and roles, which are expressed as fact assertions. The concept side of this split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The instance side is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.

Though it is not strictly necessary to separate these perspectives from one another, it is good practice to do so (especially for domain ontologies, which are attempts to capture particular world views about various topic or domain areas). This is because the TBox and ABox play different roles and involve different work activities.
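
As a rough illustration of this separation, the sketch below keeps the conceptual structure and the instance records in separate dictionaries and combines them only when a question is asked. The class names, the instance `truck_42` and its attributes are all hypothetical, and the single-parent TBox is a simplification made for brevity.

```python
# TBox: concepts and their relationships (hypothetical class names).
# Simplification: each class has one parent and there are no cycles.
tbox_subclass_of = {
    "Truck": "Vehicle",
    "Vehicle": "Thing",
}

# ABox: assertions about individual instances (hypothetical records).
abox_types = {"truck_42": "Truck"}
abox_attributes = {"truck_42": {"licensePlate": "ABC-123", "payloadTons": 12}}


def superclasses(cls, tbox):
    """Walk the TBox upward and collect every ancestor of a class."""
    found = []
    while cls in tbox:
        cls = tbox[cls]
        found.append(cls)
    return found


def inferred_types(instance, abox, tbox):
    """Combine the asserted type (ABox) with inherited types (TBox)."""
    asserted = abox[instance]
    return [asserted] + superclasses(asserted, tbox)


print(inferred_types("truck_42", abox_types, tbox_subclass_of))
# prints ['Truck', 'Vehicle', 'Thing']
```

Note that `superclasses` touches only the TBox, so the conceptual structure can be analyzed without loading any instance records, and new instance data can be ingested without altering the schema.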

Conscious separation of the so-called ABox (assertions or instance records) and TBox (conceptual structure) in ontology design provides some compelling benefits:


 * Easier ingest and incorporation of external instance data, including conversion from multiple formats and serializations
 * Faster and more efficient inferencing and analysis and use of the conceptual structure (TBox)
 * Easier federation and incorporation of distributed data stores (instance records), and
 * Better segregation of specialized work to the ABox, TBox and specialty work modules, as this figure shows:



Maintaining identity relations and disambiguation as separate components also has the advantage of enabling different methodologies or algorithms to be determined or swapped out as better methods become available.

Open World Assumption
The open world assumption (OWA) is a key underpinning to the semantic Web. It states that the absence of a given assertion or fact does not imply that the assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. This is in contrast to the closed world assumption (CWA) common to traditional databases, which states that what is not known to be true is presumed to be false; a fact needs to be explicitly stated to be treated as true. Negation as failure (NAF) is a related assumption in CWA, since it assumes as false every predicate that cannot be proven to be true. Under CWA, any statement not known to be true is false, and everything is prohibited until it is permitted.
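
The contrast can be made concrete with a small sketch. The facts and the two function names below are hypothetical; the point is only that the same missing fact yields "false" under CWA and "unknown" (represented here as `None`) under OWA.

```python
# Hypothetical facts, stored as (subject, predicate, object) tuples.
known_true = {("alice", "worksFor", "acme")}
known_false = {("alice", "worksFor", "globex")}  # explicitly negated


def ask_closed_world(fact):
    """CWA: anything not known to be true is presumed false."""
    return fact in known_true


def ask_open_world(fact):
    """OWA: return True, False, or None when the answer is not known."""
    if fact in known_true:
        return True
    if fact in known_false:
        return False
    return None


missing = ("alice", "worksFor", "initech")
print(ask_closed_world(missing))  # prints False
print(ask_open_world(missing))    # prints None
```

Under the open world reading, later adding `("alice", "worksFor", "initech")` to `known_true` contradicts nothing that was previously answered, which is why such systems can grow incrementally.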

The open world assumption provides a number of benefits to knowledge management tasks including:


 * Domains can be analyzed and inspected incrementally
 * Schema can be incomplete and developed and refined incrementally
 * The data and the structures within these open world frameworks can be used and expressed in a piecemeal or incomplete manner
 * We can readily combine data with partial characterizations with other data having complete characterizations
 * Systems built with open world frameworks are flexible and robust; as new information or structure is gained, it can be incorporated without negating the information already resident, and
 * Open world systems can readily bridge or embrace closed world subsystems.

See further the Overview of the Open World Assumption document.

Creating and Maintaining Ontologies
Given their central role, ontologies are constructed artifacts that, like software or applications, need to be properly designed, engineered, learned and managed. There are a number of contrasting methodologies for developing ontologies. It is useful to study these background concepts.

For OSF installations, the two major tools for editing and managing ontologies are the third-party Protégé ontology development environment, and structOntology, which is one of the embedded OSF-Drupal tools within the OSF framework.

Best Practices
In their particular role as adaptive ontologies, there are a number of best practices recommended for ontologies as used within an OSF installation.

For domain ontologies, one aspect of this is "punning" via metamodeling, whereby concepts can be treated either as classes (aggregations of things) or as instances, which can be characterized in their own right. Here is a basic representation of how the same idea of a thing, in this case trucks, can be treated in both of these ways:



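The dual treatment of trucks can also be sketched in code. This is a simplified illustration, not how OSF or OWL tooling implements punning: the URIs, labels and the `describe` helper are hypothetical, and the only point is that one identifier is looked up in both a class view and an instance view.

```python
# Hypothetical URI used in both roles.
TRUCK = "http://example.org/vocab#Truck"

# Class view: Truck aggregates individual trucks.
class_members = {
    TRUCK: {
        "http://example.org/data#truck_42",
        "http://example.org/data#truck_43",
    },
}

# Instance view: Truck itself is a thing with its own characteristics.
instance_properties = {
    TRUCK: {
        "prefLabel": "truck",
        "definition": "A motor vehicle designed to carry cargo.",
    },
}


def describe(uri):
    """Return both views of the same URI side by side."""
    return {
        "uri": uri,
        "as_class": sorted(class_members.get(uri, ())),
        "as_instance": dict(instance_properties.get(uri, {})),
    }


view = describe(TRUCK)
print(len(view["as_class"]), view["as_instance"]["prefLabel"])
```

The instance view is where labels and definitions for user interfaces would naturally attach, while the class view supports membership and inference.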
Additional best practices include guidance on the use of labels and definitions to support user interfaces, and on ways to organize and manage vocabularies. There is also a series on ontology best practices from Mike Bergman's blog, a further document on ontology vocabulary design, and a general tutorial series.