Ontology-driven Applications

Note: This page needs to be updated to OSF v. 3.0; terminology and perhaps images are out of date.

The Open Semantic Framework is premised around the idea of ontology-driven applications. There are a number of steps and design considerations for what constitutes an ODapp (ontology-driven application).[1]

Overview

The diagram below shows a general workflow for migrating existing instance data into the semantic enterprise. The diagram is broken down into three parts. The first part is to characterize and stage existing data and information into the underlying structured data framework. The specific form of this to support ODapps is called an adaptive ontology.

[Figure: Adaptive ontology workflow]

The right-hand side of the diagram is the access and display part. It is here that developers or users can make selections from dropdown lists and so forth to define the “slices” of diced results sets they wish to display. The results of those interactions are structured data results sets that are pre-staged to “drive” various applications and displays.[2][3] These same capabilities can also be embedded into standard Web end user applications, such as content management systems.

The third and middle part of the diagram is the critical part, the pivot point. It is the interface layer between the structured data on the left and the display and presentation of that data on the right. This abstraction layer is the OSF Web Services framework that “bridges” between the black box of what happens with RDF and semantic Web structured data characterizations on the left in order to feed, or “drive”, useful services and functions on the right.

We call this general design and architecture “ontology-driven applications”.

Part 1: Structured Data Instances and Ontologies

Adaptive ontologies set the structural basis for all subsequent data display, analysis, inferencing, entailments, and the like. We call them “adaptive” because they embrace a set of unique best practices. These practices enable the ontologies to do double duty: first structuring data, and then driving generic applications by properly informing user interfaces, dropdown lists, menus and the like.

This structuring results in faceting the key dimensions and attributes of available content. Structured data gets organized. Unstructured data (text) gets tagged via this structure and integrated with it.
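
To make this concrete, here is a minimal sketch (in Python, using the rdflib library) of how the classes in an adaptive ontology might be read to populate a facet or dropdown list. The ontology file name, and the assumption that class labels drive the UI strings, are illustrative only, not a prescribed OSF mechanism:

    from rdflib import Graph, RDF, RDFS, OWL

    g = Graph()
    g.parse("domain_ontology.owl", format="xml")  # hypothetical adaptive ontology

    # Each owl:Class with a human-readable label becomes a candidate facet
    # entry for dropdown lists and menus.
    facets = {}
    for cls in g.subjects(RDF.type, OWL.Class):
        label = g.value(cls, RDFS.label)
        if label is not None:
            facets[str(label)] = cls

    print(sorted(facets))  # feed these labels to a dropdown widget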

As Structured Dynamics’ general product schema makes clear (see the diagram at [4]), the approach leverages existing assets as much as possible. Often, this means leaving most existing data structures in place. These existing assets are staged and converted in two complementary manners that largely correspond to the conceptual ABox (instance) and TBox (concepts and schema) split central to description logics and pivotal to SD’s methodology.[5]

Whether transitioning small chunks or big chunks, this staging of existing data in Part 1 results in an RDF-accessible characterization of the starting content. Instances and their attributes are represented via a common notation, generally based on irON (instance record and Object Notation), an extensible notation and vocabulary for capturing the data characterizations, attributes and metadata of the candidate instance data (“records” in RDBMS parlance). These instances may either be internal or proprietary records, or instance data on the Web or in the public domain. By properly matching same or similar instances to one another, any source of instance characterization can be meaningfully combined.

This instance notation is extremely lightweight, and really is merely an RDF representation of data characterizations. In the characterizations to this point there is not yet any “world view” involved: we are simply describing instances and their attributes in a manner akin to key-value pairs. The process to this point is entirely descriptive.
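
As an illustration, a lightweight instance record of this kind might look as follows. This is a hedged sketch in the spirit of irON's JSON profile; the field names and identifiers are hypothetical, not the formal irON vocabulary:

    import json

    # Purely descriptive key-value attributes; no schema or "world view" yet.
    instance_record = {
        "id": "records:mayor-office-001",   # hypothetical identifier
        "type": "Organization",
        "prefLabel": "Office of the Mayor",
        "locatedIn": "records:city-hall",
        "phone": "555-0100",
    }

    print(json.dumps(instance_record, indent=2))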

However, these instance characteristics do contain within them the semantics as to how to describe these attributes (your “glad” is my “happy”), as well as potentially a schematic or conceptual view of how these instances relate to one another and to the broader world. Instance characterizations provide the building blocks that are then related and made semantically whole via a second “terminological” level.

These terminological, or conceptual, relationships (the TBox) reside at a different level from simply describing things. Rather, these schema — what in this context are best known as ontologies — provide a precise language and means for describing conceptual relationships. If these structural relationships are done well, they are coherent. Coherence is a matter of a consistent world view that “hangs together” when analyzed via the powerful logical techniques available through description logics and other broader mechanisms of the semantic enterprise.
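
The split can be made concrete with a small rdflib sketch; everything in the EX namespace below is hypothetical:

    from rdflib import Graph, Literal, Namespace, RDF, RDFS, OWL

    EX = Namespace("http://example.org/onto#")
    g = Graph()

    # TBox: conceptual relationships -- the schema, or ontology proper
    g.add((EX.Mayor, RDF.type, OWL.Class))
    g.add((EX.Mayor, RDFS.subClassOf, EX.ElectedOfficial))

    # ABox: instance assertions that populate that schema
    g.add((EX.jane_doe, RDF.type, EX.Mayor))
    g.add((EX.jane_doe, RDFS.label, Literal("Jane Doe")))

    print(g.serialize(format="turtle"))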

Thus, as we transition from the existing, the operational workflow splits the input data stream into two pathways:

  • Instances, and their descriptive characteristics, and
  • Conceptual relationships, or ontologies.

The diagram below provides a sequential flow of these steps and splits, showing: 1) the conceptual structure of the concept ontology, as 2) matched with the instances and their descriptive attributes that populate that schema.

[Figure: Structure processing]

A key point is that ontologies can be grown and scaled incrementally. We leverage as much existing starting structure as possible and can readily bound the scope to meet budget and delivery imperatives.

The concepts and entities that occur within these structures help inform a fairly simple tagging system, OSF Tagger (scones). (There are also benefits from “triangulating” between entity or instance identification and concept identification, which helps inform disambiguation nearly for free; see further [6].)
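
For a sense of the mechanics, here is a toy tagging sketch in Python. It only does naive substring matching against a hypothetical label-to-concept table; the actual OSF Tagger adds the entity/concept triangulation noted above:

    concept_labels = {                       # hypothetical label -> concept map
        "city council": "ex:CityCouncil",
        "mayor": "ex:Mayor",
        "budget": "ex:Budget",
    }

    def tag_text(text):
        """Return (surface form, concept URI) pairs found in the text."""
        lowered = text.lower()
        return [(label, uri) for label, uri in concept_labels.items()
                if label in lowered]

    print(tag_text("The mayor presented the budget to the city council."))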

These approaches are pretty straightforward for any organization wanting to test the idea of becoming a semantic enterprise. Real benefits — such as concept retrievals overcoming the limitations of standard keyword search — can be demonstrated from even small starting ontologies and structures. Given the inherent connectedness of the data, it is possible to expand the scope and usefulness of the information incrementally within fixed and manageable budgets.

Part 2: OSF Web Service: A Web-oriented Services API and Framework

A pivotal part of the ontology-driven application infrastructure is OSF Web Service, which is platform-independent Web services middleware. OSF Web Service is an abstraction layer that provides the APIs, search endpoints, and specific Web services for accessing, querying or getting results sets from the underlying structured data and ontologies.

OSF Web Service has a standard set of access and retrieval services including browse, full-text search, CRUD, direct record retrievals, and the like. It is embedded within an access and permissions service that acts at the level of registered datasets. Then, based on the requested protocol, OSF Web Service returns the filtered results set. These results sets can be delivered as XML, JSON, or any of the other formats already available.[7] They can readily and dynamically populate HTML pages and forms in any deployment framework. For specific purposes, these results sets can also be returned as pre-staged, properly formatted results streams for driving specific applications.
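
As a hedged illustration of such a retrieval, the Python snippet below issues a full-text search over HTTP. The host, endpoint path and parameter names are assumptions for the sketch, not a verbatim transcription of the OSF Web Services API:

    import requests

    OSF_BASE = "http://example.org/ws"       # hypothetical OSF instance

    response = requests.get(
        OSF_BASE + "/search/",
        params={"q": "mayor", "datasets": "all"},  # assumed parameter names
        headers={"Accept": "application/json"},
    )
    response.raise_for_status()
    results = response.json()                # a structured results set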

As an API, the OSF Web Services can be interacted with and driven via standard HTTP requests. Alternatively, these requests can come from simple to complicated Web apps that create the API queries based on user interface choices such as selections from dropdown lists or clicking on various listed options, such as provided by OSF-Drupal modules or the OSF widgets.

Queries and requests to OSF Web Service may also include a parameter for results sets to be returned in particular formats. The irON protocol supports requests or results in CSV, XML or JSON, in addition to other flavors including multiple serializations of RDF.
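
In practice this might look like requesting the same results set under different MIME types; which serializations a given endpoint honors is deployment-specific, so the list below is only illustrative:

    import requests

    for mime in ("text/csv", "text/xml", "application/json"):
        r = requests.get(
            "http://example.org/ws/search/",   # hypothetical endpoint
            params={"q": "mayor"},
            headers={"Accept": mime},
        )
        print(mime, r.status_code)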

In this manner, only a simple converter need be added to the OSF Web Services stack in order to “drive” a new application with a particularly formatted results set stream.
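
Such a converter can be quite small. The sketch below flattens a generic JSON results set into CSV for some hypothetical downstream application; the field names are illustrative:

    import csv
    import io

    def results_to_csv(results):
        """Flatten a list of result records into a CSV string."""
        if not results:
            return ""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=sorted(results[0]))
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()

    print(results_to_csv([{"id": "r1", "label": "Office of the Mayor"}]))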

OSF Web Service thus acts as a single, uniform Web interface to all of the “black box” nuances of the structured data system organized by the adaptive ontologies. Further, virtually any data structure may be ingested and converted from external sources via an import service and made part of the underlying canonical structure, making the framework perfect for data federation.[8] Lastly, the dataset nature of the framework, and its neutrality to underlying data stores or content management systems, also makes OSF Web Service an excellent framework for one or many nodes to share information and collaborate across the Web.[9]

The following diagram shows how a diverse, Web-based network, involving a diversity of Web portals and data gateways and hubs, can work via the OSF Web Service framework to establish a complete collaboration network. Via datasets and differential access rights and permissions, virtually any combination of potential interactions can be supported:

[Figure: OSF Web Services network]

These capabilities are fundamentally new and support the emerging collaboration environment, providing exciting prospects for integrating various regional offices or for enabling direct collaboration with customers, partners or suppliers.

Part 3: Ontology-driven Applications

The basic design of OSF Web Service is to provide a middleware layer that fulfills one or more of these broad user interaction modes (a minimal client sketch follows the list):

  • To create, update, delete or otherwise manage data records
  • To browse or view existing records or record sets, based on simple to possibly complex selection or filtering criteria, or
  • To take one of these results sets and progress it through a workflow of some nature, involving specialized analysis, applications, or visualization.
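
A minimal client sketch covering these three modes is shown below. The endpoint paths and parameters are assumptions for illustration only:

    import requests

    class OSFClient:
        def __init__(self, base):
            self.base = base.rstrip("/")

        def create(self, dataset, record):
            """Mode 1: create or otherwise manage a data record."""
            requests.post(self.base + "/crud/create/",
                          json={"dataset": dataset, "record": record})

        def browse(self, **filters):
            """Mode 2: browse or view records against selection criteria."""
            r = requests.get(self.base + "/browse/", params=filters,
                             headers={"Accept": "application/json"})
            return r.json()

        def export_for(self, results, fmt):
            """Mode 3: hand a results set to a downstream workflow."""
            r = requests.post(self.base + "/convert/", json=results,
                              params={"format": fmt})
            return r.content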

The OSF framework provides generic applications in these areas (with many more possible), the operations of which are guided by the instructions and nature of the underlying data that feeds them. This design shows how data characterization practices may be adopted within ontologies so as to stage, or “drive”, such generic applications.

In the case of a standard structured data display (say, a simple table like a Wikipedia infobox), such a generic design includes templates tailored to various instance types (say, locational information presented on a map versus people information warranting an image and vital statistics). Alternatively, in the generic design for some specialized application (say, Adobe Flash), the information output of the results set may need to contain certain formats and attributes.
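
A sketch of that template selection logic, with hypothetical type and template names:

    TEMPLATES = {
        "ex:Location": "map_view",      # locational data renders on a map
        "ex:Person":   "person_card",   # people get an image and vital statistics
    }

    def pick_template(record):
        return TEMPLATES.get(record.get("type"), "generic_table")

    print(pick_template({"type": "ex:Location", "label": "City Hall"}))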

“Ontology-driven apps”, then, are really informed structured results sets that are output in a form suitable to various intended applications. This output form can include a variety of serializations, formats or metadata. This flexibility of output, tailored and responsive to particular generic applications, is what makes them “adaptive”.

Using this structure, then, it is possible to “drive” queries and results-set selections either via direct HTTP requests or via simple dropdown selections on HTML forms (that is, from right to left as shown on the first diagram). Similarly, it is possible with a single parameter change to drive either a visualization app or a structured table template from the equivalent query request (that is, from left to right on the first diagram).
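
For example, under the (assumed) convention of a single output-selection parameter, the same query can feed two entirely different applications:

    import requests

    def run_query(q, target):
        """target selects the pre-staged output, e.g. 'viz' vs. 'table'."""
        r = requests.get(
            "http://example.org/ws/search/",   # hypothetical endpoint
            params={"q": q, "app": target},    # one parameter changes the app
            headers={"Accept": "application/json"},
        )
        return r.content

    viz_feed = run_query("mayor", "viz")      # drives a visualization app
    table_feed = run_query("mayor", "table")  # drives a structured table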

“Ontology-driven apps” thus provide two profound benefits. First, the entire system can be driven via simple selections or interactions without the need for any programming or technical expertise. And, second, simple additions of new and minor output converters can work to power entirely new applications available to the system. If, say, Adobe graphics applications need to change tomorrow for Microsoft Silverlight, that switch is easy and can be made transparent to the designer.

Endnotes

  1. This article is taken from M.K. Bergman, 2009. “Ontology-driven Applications Using Adaptive Ontologies,” AI3:::Adaptive Information blog, November 23, 2009.
  2. These selections and requests need not occur only via user interfaces or HTML forms, but may also occur programmatically via API or direct Web services calls.
  3. There are two main classes of visualizations possible with this design: 1) navigations or explorers of the concept space, which is a particularly open challenge for large, graph-based knowledge bases; or 2) conventional data visualizations or graphics or mappings of instance data. Both are shown as workflow boxes on the diagram above.
  4. See [1] for a general descriptive illustration of Structured Dynamics’ product stack. There is also a longer slideshow, from which this diagram is drawn as slide #37.
  5. The reference to ABox and TBox is in accordance with this standard usage for description logics:
    “Description logics and their semantics traditionally split concepts and their
     relationships from the different treatment of instances and their attributes and roles,
     expressed as fact assertions. The concept split is known as the TBox (for ''terminological''
     knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at 
     hand. The TBox is the structural and intensional component of conceptual relationships. 
     The second split of instances is known as the ABox (for assertions, the basis for A in ABox) 
     and describes the attributes of instances (and individuals), the roles between instances, and 
     other assertions about instances regarding their class membership with the TBox concepts.”
  6. Via this approach we now can assess concept matches in addition to entity matches. This means we can triangulate between the two assessments to aid disambiguation. Because of these logical segmentations, we also have multiple “clusters” (that is, either the concept, type, superType or dimension) upon which to do our disambiguation evaluations, either between concepts and entities or within the various concept clusters. We can do so via either statistics-based methods or feature-based (machine learning) methods. In other words, because of logical segmentation, we have increased the informational power of our concept graph. See further [2].
  7. See [3].
  8. See, for example, [4].
  9. See, for example, [5].