A Complete Overview of OSF

The Open Semantic Framework is a complete and integrated software stack that combines external open source components with specific enhancements developed by Structured Dynamics. OSF is a complete foundation to bring semantic technology capabilities to the enterprise. OSF has a variety of potential applications from enterprise information integration to collaboration networks and open government.

OSF can integrate and manage all types of content — unstructured documents, semi-structured files, spreadsheets, and structured databases — using a variety of best-of-breed engines. All external content is converted to the canonical RDF data model, enabling common tools and methods for tagging and managing all content. Ontologies provide the schema and common vocabularies for integrating across diverse datasets. These capabilities can be layered over existing information assets for unprecedented levels of integration and connectivity. All information within OSF may be powerfully searched and faceted, with results datasets available for export in a variety of formats and as linked data.

The OSF stack consists of multiple layers. In the standard configuration, there is tight integration with Drupal 7 and its leading modules, enabling use of OSF with standard Drupal interfaces and constructs. All interactions with OSF occur via a robust layer of nearly 30 Web services and their associated APIs, which abstract and simplify how to interact with the stack. The OSF engines layer provides RDF and OWL management capabilities using the proven Virtuoso (RDF), Solr (search), OWL API (ontologies) and GATE (tagging and NLP) standalone applications. Besides Drupal and these engines, all remaining OSF components and Web services have been developed specifically to achieve the complete architecture of the Open Semantic Framework. OSF has been developed over five years and is now in version 3.x.

OSF features an automatic installer that retrieves and then installs all components in the stack. It is supported by a variety of command-line tools useful for managing the ontologies and datasets used within OSF, as well as permissions management and unit and systems integration testing. The Open Semantic Framework is supported by a comprehensive library of open-source documentation comprising more than 500 articles and associated figures and diagrams.

Simple Architecture

The basic architecture of the Open Semantic Framework pivots around the OSF Web Services; there are now nearly 30, providing a wealth of functionality. Full CRUD under user permissions and security is provided to all digital objects in the stack. This OSF access layer provides a means to access best-of-breed data management and indexing engines through uniform RESTful Web services. These access services both 1) abstract away the complexity of the individual engines and 2) enable combined capabilities, orchestrated by OSF, that are not available from the engines alone.

This intermediate OSF Web Services layer may also be accessed directly via API, from the command line, or with utilities like cURL, which makes it suitable for interfacing with standard content management systems (CMSs). Alternatively, it may be accessed via a dedicated suite of connectors and modules that leverage the open source Drupal CMS. These connectors and modules, also part of the standard OSF stack and called OSF for Drupal, natively enable Drupal's existing 10,000 modules and ecosystem of developers and capabilities to access OSF using familiar Drupal methods.
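As a rough illustration of direct API access, the sketch below builds (but does not send) an HTTP request using only Python's standard library. The base URL, the `search/` path and the `query` and `items` parameter names are hypothetical placeholders rather than documented OSF endpoints or parameters; the documentation for each individual Web service gives the actual values.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: replace them with the endpoint and parameters
# documented for the OSF Web Service being called.
BASE_URL = "http://localhost/ws"   # assumed address of an OSF instance
ENDPOINT = "search/"               # assumed endpoint path


def build_request(base_url, endpoint, params, accept="application/rdf+xml"):
    """Build a GET request for a RESTful endpoint without sending it."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}?{urlencode(params)}"
    # The Accept header asks the service for a particular serialization.
    return Request(url, headers={"Accept": accept}, method="GET")


request = build_request(BASE_URL, ENDPOINT, {"query": "ontology", "items": 10})
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response object then exposes the HTTP status and the returned document.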

This basic architecture from user interface to engines is quite simple in design:

Simple OSF Stack

In this design, OSF is the meat in the sandwich that links a proven content management system, Drupal, with proven semantic technology engines such as Virtuoso, GATE, OWL API 2, and Solr. What had heretofore been unconnected capabilities are now integrated via the OSF glue.

The premise of the entire stack is based on the RDF data model. RDF provides the ready means for integrating existing structured data assets in any format, with semi-structured data like XML and HTML, and unstructured documents or text. The OSF framework is made operational via ontologies that capture the domain or knowledge space, matched with internal ontologies that guide OSF operations and data display. This design approach is known as ODapps, for ontology-driven applications.

The OSF stack is supported by complete documentation, automated installation routines, comprehensive unit and end-to-end tests, and workflows and use case studies to ease adoption. SD and its partners provide experienced support and extension services.

Reference Architecture

The Open Semantic Framework — currently at version 3.x with five years of continuous development — is an integrated software stack that combines external open source components with specific enhancements developed by Structured Dynamics. OSF is a complete foundation to bring semantic technology capabilities to the enterprise.

The basic architecture of the Open Semantic Framework pivots around the OSF Web Services; there are nearly 30, providing a wealth of functionality. Full CRUD under user permissions and security is provided to all digital objects in the stack. This OSF access layer provides a means to access best-of-breed data management and indexing engines through uniform RESTful Web services. These access services both 1) abstract away the complexity of the individual engines and 2) enable combined capabilities, orchestrated by OSF, that are not available from the engines alone.

This intermediate OSF Web Services layer may be accessed directly via API by interfacing with standard content management systems (CMSs), or by using dedicated connectors and modules to the open source Drupal CMS. These connectors and modules, also part of the standard OSF stack and called OSF for Drupal, natively enable Drupal's existing 10,000 modules and ecosystem of developers and capabilities to work with OSF using familiar Drupal methods.

A general overview and specific layers in the OSF stack are described below.

A Web-oriented Architecture

This diagram shows the detailed architecture for the Open Semantic Framework stack:

Detailed OSF Stack

In this design, OSF is the meat in the sandwich that links a proven content management system, Drupal, with proven semantic technology engines such as Virtuoso, GATE, OWL API 2, Solr, and Memcached. All OSF-specific components are shown in yellow in the diagram. In some cases these are newly developed components; in others, they are wrappers around existing third-party open source capabilities. All external third-party capabilities are shown in colors other than yellow.

The overall philosophy in architecting the OSF stack is to provide a Web-based, scalable framework for integrating data and content from a variety of sources. OSF corresponds to what is known as a Web-oriented architecture. WOA has a number of features:

  • Data is generally exposed (and universally available) as linked data
  • SPARQL endpoints and APIs are generally RESTful in design
  • The overall architecture is modular, with inherent decentralized and distributed aspects
  • All display and visualization aspects are cross-browser ready and capable.

WOA builds on aspects of many of the largest properties on the Web, with proven scalability and extensibility. As used in OSF, these proven Web aspects are enhanced by adhering to open standards from the W3C (World Wide Web Consortium) in the areas of semantic technologies and vocabularies. This standards adherence helps ensure that instances built with the Open Semantic Framework have a high degree of interoperability with other sites and capabilities on the Web.

OSF provides a standardized content storage and management environment that is Web-accessible, scalable and distributed. The content that can be hosted within OSF includes documents (unstructured data), metadata (semi-structured data), conventional database information (structured data) and multimedia metadata. While this content can exist in multiple native formats in the wild, it is converted to a common RDF format that enables the development of common ("canonical") tools and operations to act upon this content.

The OSF design and architecture is explicitly generic. The same set of tools and capabilities used in OSF can be applied to manage and understand information in any domain. What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists) used by OSF. Differences between domains may also determine which components are included or not for a given instantiation.

The OSF for Drupal Layer

Though any mature content management system (CMS) could act as the presentation front-end to the Open Semantic Framework, Drupal is the standard option packaged with OSF. Drupal has a rich ecosystem of developers and support, plus thousands of modules that extend its functionality and an architecture well-suited to the requirements of OSF. The OSF for Drupal layer leverages existing, well-known Drupal modules and Drupal itself in ways familiar to the broader Drupal community. Integration at this layer assigns CMS and user interface responsibilities to Drupal in ways an accomplished Drupal developer can implement.

OSF's integration with Drupal occurs via the standard plug-in modules of Drupal and "Drupal connectors". OSF Drupal modules are conventional Drupal modules written specifically to act as a management interface to the OSF. There is a corresponding OSF for Drupal module for every OSF Web Service noted in the main architectural diagram.

Drupal connectors are specific to OSF; they are Drupal libraries written specifically for OSF that extend current, popular Drupal modules. These code libraries enable these Drupal modules to interact directly with OSF, sometimes with extended visible functionality, but always in concert with the current module design. The idea is to leverage pivotal and common Drupal modules as familiar interfaces to OSF. The three Drupal modules connected so far are Views, Entities, and the Search API.

OSF for Drupal also comes with a series of "semantic components". These widgets present results from specific queries to one or more OSF Web Service instances. The queries are generated by the widget and issued to the underlying OSF based on user interactions. Current display widgets include: filter; structured record displays; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; and a series of mapping or geo-locational widgets. These widgets enable semantic information from OSF to be presented via a variety of data visualization or data presentation methods such as charts, tables, records, image galleries, or maps.

Besides market share and familiarity, the Drupal CMS layer gives OSF the ready ability to change themes and layouts ("skins"), and to extend site functionality.

The Middleware Layer

This Open Semantic Framework stack may be controlled or interacted with via external services at the middleware layer. Some of this interaction may occur via a dedicated API, some programmatically.

The OSF Web Services PHP API is a framework available to PHP developers to help them generate queries to any OSF Web Service endpoint. Each endpoint has its own WebServiceQuery class in the API that is used to generate the query, send it to the appropriate endpoint, and get back a resultset. The resultset can then be manipulated by using the Resultset API. This same API can be used to transform the resultset into different formats.

The OSF Web Services PHP API enables developers to write function calls directly in PHP that then issue the HTTP queries to the respective OSF Web Service endpoints. It is also the API interface through which other Drupal modules and connectors (see the OSF for Drupal overview) programmatically interact with the Web service endpoints. This same design can be replicated in other languages, such as Java.
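To suggest how that design might carry over to another language, here is a minimal Python sketch of a per-endpoint query class. It is not the PHP API: the class name `SearchQuery`, its methods, the `search/` path and the parameter names are all hypothetical. A stand-in transport function replaces the real HTTP call so that the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class SearchQuery:
    """Hypothetical query builder for a single endpoint (illustrative names)."""
    network: str                                   # base URL of the instance
    params: Dict[str, str] = field(default_factory=dict)

    def query(self, text: str) -> "SearchQuery":
        self.params["query"] = text
        return self                                # enables method chaining

    def datasets(self, *uris: str) -> "SearchQuery":
        self.params["datasets"] = ";".join(uris)
        return self

    def send(self, transport: Callable[[str, Dict[str, str]], Tuple[int, str]]):
        """Hand the request to a transport and return (HTTP status, body)."""
        url = self.network.rstrip("/") + "/search/"
        return transport(url, dict(self.params))


def fake_transport(url, params):
    # Stand-in for a real HTTP call; echoes what would have been sent.
    return 200, f"{url} {sorted(params.items())}"


status, body = (
    SearchQuery("http://localhost/ws")
    .query("ontology")
    .datasets("http://localhost/datasets/docs/")
    .send(fake_transport)
)
print(status, body)
```

Keeping the transport separate from the query builder means the same class can be exercised in tests without a running instance.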

It is via the middleware layer that security and external services may also interact with the system. No specific APIs are available in these areas, but external systems have been successfully interfaced programmatically in the past.

For security, it is possible to either use the native OSF service or invoke an external system. Additional experience with incorporating other external applications for authoring, version control or harvesting is also documented at the middleware layer.

The OSF Web Services Layer

The OSF Web Services are the pivotal layer to the stack. The OSF Web Services provide the standard, common interface to access and manage the OSF engines, via standard API calls and endpoints, either from the Drupal layer or from external systems. The OSF Web Services are generally RESTful in design and are based on HTTP and Web protocols and open standards.

OSF Web Services at present comprise 27 individual Web services, all operating within the same framework. Individual API and Web services documentation is available for each of them.

All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and optionally a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML. An internal representation, structXML, is used for internal communications across all OSF Web Services and with other layers.

The OSF Web Services have a native security service that governs access rights and permissions (which may be swapped out with an external security service; see description of the middleware layer). These rights occur at the level of the dataset, which gives immense flexibility to how data may be accessed, read, modified, created or deleted (or not). Depending on rights and query, results sets may be returned from a given OSF Web Service instance in a rich variety of ways.

Each OSF Web Service has a unique Web address that enables one or a multitude of instances to communicate and share with one another. This simple, but elegant, method enables OSF Web Service instances to participate or not in potentially global or restricted local networks and collaboration environments. Since any OSF instance on the Web has its own respective Web services and access URIs, a broader distributed network with differential user access and permissions can be readily established.

Each OSF Web Service is accessible or not via a three-dimensional design based on:

  1. Users (or groups or roles)
  2. The individual Web service, and
  3. Datasets.

What this means is that a given user may be granted access or not — and various rights or not from reading to the creation or deletion of information — in relation to specific datasets. Stated another way, it is in the nexus of user type and dataset that access control is established for the OSF semantic system.
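The three-dimensional design can be sketched as a lookup keyed on group, Web service and dataset. This is only an illustration of the idea: the group names, service names, dataset URIs and the in-memory table below are hypothetical, and the native OSF security service may store and evaluate permissions differently.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# (group, Web service, dataset) -> permitted CRUD operations.
# Every name in this table is hypothetical.
Permissions = Dict[Tuple[str, str, str], FrozenSet[str]]

DEPT = "http://example.org/datasets/department/"

PERMISSIONS: Permissions = {
    ("editors", "crud", DEPT): frozenset({"create", "read", "update"}),
    ("visitors", "search", DEPT): frozenset({"read"}),
}


def is_allowed(groups: Iterable[str], service: str, dataset: str,
               operation: str, table: Permissions = PERMISSIONS) -> bool:
    """Return True if any of the user's groups grants the operation."""
    return any(
        operation in table.get((group, service, dataset), frozenset())
        for group in groups
    )


print(is_allowed(["editors"], "crud", DEPT, "update"))
print(is_allowed(["visitors"], "crud", DEPT, "read"))
```

A missing key simply means no access, so a right exists only where all three dimensions line up.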

In an enterprise context, a given individual (“user”) may have different access rights depending on circumstance. A worker in a department may be able to see and do different things for departmental information than for enterprise information. A manager may be able to view partner information that is not readable by support personnel. A visitor to a different Web site or portal may see different information than visitors to other Web sites. Supervisors or content editors might be able to enter or modify content that is only viewable by others.

Besides management, a key function of the layer is to get external information assets into the system. These external assets may exist in many formats and may be described by many schemas. They may come from internal transaction systems or warehouses, or may exist externally on the Web or at supplier or partner sites. These information assets may span from conventional databases and relational data systems to XML interchange standards, Web pages and standard internal text or documents. In short, there is no information asset that is not amenable to inclusion in this framework.

External assets and their structure may be ingested according to defined protocols and may be structurally tagged using the OSF tagging service. The ontologies and entities of a given OSF instance are the basis for tagging using this service. Depending on the source, the net result of the ingest is to produce interoperable data and information for use at the engines layer.

The OSF Engines Layer

The premise of the Open Semantic Framework stack is based on the RDF data model. Using a common data model means that all Web services and actions against the data only need to be programmed via a single, "canonical" form. Simple converters convert external, native data formats to the RDF form at time of ingest; similar converters can translate the internal RDF form back into native forms for export (or use by external applications). This use of a "canonical" form leads to a simpler design at the core of the stack and a uniform basis against which tools or other work activities can be written. The result is lower development and maintenance costs and faster implementation. This framework is then made operational via ontologies that capture the domain or knowledge space, matched with internal ontologies that guide OSF (see separate Role of Ontologies). This design approach is known as ODapps, for ontology-driven applications.
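The converter idea described above can be illustrated with a small Python sketch that turns a flat record (for example, one spreadsheet row) into N-Triples lines. It is not one of the OSF converters: the `http://example.org/` base URI and the URI layout are assumptions, field names are assumed to be safe to use in URIs, and every value is emitted as a plain string literal.

```python
def to_ntriples(record_id, record, base="http://example.org/"):
    """Convert a flat dict into N-Triples lines with plain string literals."""
    subject = f"<{base}record/{record_id}>"
    lines = []
    for field_name, value in record.items():
        predicate = f"<{base}property/{field_name}>"
        # Escape backslashes, double quotes and newlines inside the literal.
        literal = (
            str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


row = {"name": "Ada", "title": 'The "OSF" stack'}
triples = to_ntriples("42", row)
for line in triples:
    print(line)
```

Once every source is reduced to subject, predicate and object statements like these, the same tools can operate on the data regardless of where it came from.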

The OSF engines are all open source and work to support this premise. The OSF engines layer governs the indexing and management of all OSF content. Documents are indexed by the Solr engine for full-text search, while information about their structural characteristics and metadata is stored in an RDF database, called a "triple store." The schema aspects of the information (the "ontologies") are separately managed and manipulated with their own W3C standard application, the OWL API. At ingest time, the system automatically routes and indexes the content into its appropriate stores. Another engine, GATE, is available for semi-automatic assistance in tagging input information and other natural language processing (NLP) tasks.

The RDF triple store is provided by OpenLink's Virtuoso software. Virtuoso is a cross-platform ‘universal server’ for SQL, XML, and RDF data, including data management, that also includes a powerful virtual database engine, native hosting of existing applications, Web services deployment platform, Web application server, and bridges to numerous existing programming languages. We mostly use the RDF storage and management, SPARQL and inferencing capabilities of Virtuoso.

Many structured data systems lack well-performing full-text search. Also, structured data based on linked data RDF often substitutes Web identifiers for literal text values. This practice is good for linking and tracking purposes, but can excise much text, leading to incomplete result sets during standard text search. To address these issues, we: 1) changed standard RDF practice to also record literals in addition to URI identifiers; and 2) integrated our structured data store with the Solr text-search engine. Solr is an open source enterprise search server based on the Lucene Java search library, with faceted search, caching, and many more features.
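The following Python sketch shows why recording literals alongside URIs matters for text search. It is an illustration only, using in-memory data and hypothetical identifiers (`ex:doc1`, `ex:Semantics`); it does not reflect how OSF or Solr implement indexing.

```python
def build_text_index(triples, labels):
    """Collect searchable text per subject, resolving URI objects to labels."""
    parts_by_subject = {}
    for subject, _predicate, obj in triples:
        # If the object is a URI with a known label, index the label text;
        # otherwise index the object value itself.
        text = labels.get(obj, obj)
        parts_by_subject.setdefault(subject, []).append(text)
    return {subject: " ".join(parts) for subject, parts in parts_by_subject.items()}


triples = [
    ("ex:doc1", "dc:title", "Overview"),
    ("ex:doc1", "dc:subject", "ex:Semantics"),   # URI stands in for text
]
labels = {"ex:Semantics": "semantic technology"}

index = build_text_index(triples, labels)
print(index)
```

Without the label lookup, a search for "semantic" would miss `ex:doc1`, because only the identifier `ex:Semantics` would have been indexed.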

The OWL API is a Java implementation for creating, manipulating and serializing OWL ontologies. This engine gives us a very flexible and powerful way for managing the ontology schema at the core of OSF and to conduct special retrieval manipulation tasks based on the structure in those schema.

The General Architecture for Text Engineering (GATE) engine is a Java suite of tools used by a worldwide community of scientists, companies, teachers and students for all sorts of natural language processing tasks, including information extraction in many languages. GATE is one of the acknowledged best tools for conducting all types of computational tasks involving human language or text analysis. The primary use of GATE in OSF is to drive the semi-automatic tagging of subject tags within documents.

The OSF engines layer also includes the PHP/Java Bridge, an XML-based network protocol to connect a native script engine (in our case, PHP) to a Java virtual machine. It is fast and efficient. The bridge gives us the capability to run Java-based engines efficiently within the stack. It connects to GATE and the OWL API within OSF, and provides a ready means for integrating still other Java-based capabilities and engines as customers may need.

For efficiency, responses to Web service requests are cached with Memcached, an open source, high-performance, distributed memory object caching system. Memcached is a generic in-memory key-value store for small chunks of arbitrary data (strings, objects), which makes it well suited to these API calls.
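The caching pattern can be sketched in a few lines of Python. A plain dictionary stands in for Memcached here, and the key scheme (a hash of the endpoint plus its sorted parameters) is an assumption for illustration, not the scheme OSF uses.

```python
import hashlib
import json

_cache = {}  # stand-in for Memcached; a real deployment would use a client


def cache_key(endpoint, params):
    """Derive a stable, short key from the endpoint and sorted parameters."""
    payload = json.dumps([endpoint, sorted(params.items())])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def cached_call(endpoint, params, compute):
    """Return the cached response if present; otherwise compute and store it."""
    key = cache_key(endpoint, params)
    if key not in _cache:
        _cache[key] = compute(endpoint, params)
    return _cache[key]


calls = []


def slow_service(endpoint, params):
    calls.append(endpoint)               # record each real invocation
    return f"results for {params['query']}"


first = cached_call("search", {"query": "rdf"}, slow_service)
second = cached_call("search", {"query": "rdf"}, slow_service)
print(first, len(calls))
```

Sorting the parameters before hashing means two requests that differ only in parameter order share one cache entry, and the hash keeps keys short and free of whitespace.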

The fundamental unit of record aggregation upon which these engines act is the "dataset". A dataset refers to a named grouping of records, best designed as similar in record types and intended access rights (though technically a dataset is any named grouping of records). Datasets are one of the three major access dimensions to the OSF (the other two being users/groups and tools/endpoints, as described for the OSF Web Services layer).

All data objects (variously called entities, kinds, types or classes), their relations (properties, fields, attributes) and their annotations (metadata) are given Web identifiers in the form of URIs. These are similar to Web site URLs, but designate objects and properties as opposed to Web sites. This means any and all data within the OSF has a unique identifier, accessible using the HTTP protocol.

OSF Management Tools

There are a number of tools that accompany the Open Semantic Framework that aid in managing and configuring the stack and the data and ontologies it uses. These tools are included as part of the standard OSF installs:

OSF Tests Suites

The OSF Tests Suites are a series of about 800 unit tests applied against various OSF Web Services. These tests can be applied automatically via script to check for inadvertent problems during development.

OSF Datasets Management Tool

The OSF Datasets Management Tool (DMT) is a command-line tool used to manage datasets on an OSF Web Services network instance. It supports a range of dataset management operations and can handle datasets of any size. If a dataset file is too big, the framework splits it into multiple slices and sends each slice to the OSF Web Services instance.
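The slicing behaviour can be pictured with the short Python sketch below. How the DMT actually decides slice boundaries is not described here; this example assumes a simple fixed number of records per slice, and the function name is hypothetical.

```python
def slice_records(records, max_per_slice):
    """Yield consecutive slices holding at most max_per_slice records each."""
    if max_per_slice < 1:
        raise ValueError("max_per_slice must be at least 1")
    for start in range(0, len(records), max_per_slice):
        yield records[start:start + max_per_slice]


records = [f"record-{n}" for n in range(7)]
slices = list(slice_records(records, 3))

for number, chunk in enumerate(slices, start=1):
    # In a real import, each chunk would be sent as its own request.
    print(number, chunk)
```

Sending slices one at a time keeps each request small, so a single oversized file does not have to be processed in one call.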

OSF Permissions Management Tool

The OSF Permissions Management Tool (PMT) is a command-line tool used to manage access permissions on an OSF Web Services network instance. This tool is used to list, create and delete access permissions, groups and users.

OSF Ontologies Management Tool

The OSF Ontologies Management Tool (OMT) is a command-line tool used to manage the ontologies of an OSF Web Services network instance. It can be used to list the ontologies of an OSF Web Services instance, to create or import new ones, to delete existing ones, and to generate underlying ontological structures, among other tasks.

OSF Installer

In addition to these management tools, there is an automatic OSF Installer script that is used to install and deploy an OSF stack on an Ubuntu server. It can also be used to install, upgrade and configure parts of the stack, or related external tools such as the Datasets Management Tool, the Ontologies Management Tool, the OSF Web Service-PHP-API, etc.

OSF Tagger

OSF Widgets

An OSF Widget is a Flex or JavaScript component that takes one or more record descriptions and structXML schemas as input from the Open Semantic Framework, and then outputs one or more (possibly interactive) visualizations of those records. Depending on the logic described in the input schemas and record descriptions, the OSF widget will behave differently to optimize its presentation to users. The OSF widgets are provided as a library, though instructions are also provided for how developers may extend the library with their own widgets.

The existing library is available under the Apache 2 license, and contains the following widgets:

Core Components

  1. sControl
  2. sMap
  3. sRelationBrowser
  4. sStory
  5. sBarChart
  6. sLinearChart
  7. sGenericBox

Extended Components

  1. sText
  2. sImage
  3. sHBox

A site that showcases some OSF widgets is UMBEL, 35,000 reference concepts for the Web.

You are encouraged to visit this site and see the various OSF widgets in action. You may also want to see the wiki descriptions of the OSF widgets and their APIs.