StructXML

Introduction

structXML is a straightforward RDF serialization in XML. It is used for internal communication between the OSF Web Services, the Flex Semantic Components and OSF for Drupal, and it is the core format for transmitting information between Open Semantic Framework (OSF) components. Within OSF Web Services, all data is processed internally as structXML and is then converted into other formats (RDF+XML, RDF+N3, structJSON, irJSON, commON, etc.).

A structXML file is composed of a resultset element that aggregates a series of subjects (records), each defined by properties and values. Values can be "data" values, such as literals, or "object" values, which are references to other subjects.

structXML is comparable to structJSON, which uses the same structure but is serialized in JSON instead of XML. Both are based on the OSF Web Service Internal Resultset Structure.

Features

The structXML format supports the following features:

  1. Description of subject records
  2. Each record has a unique identifier
  3. Each record has one or multiple types (it belongs to one or multiple classes)
  4. Each record can be described with an unlimited number of data or object attributes
  5. Reification is supported on object attributes
  6. The value of a data attribute can be defined with a type or a lang tag

Specification

<resultset />

The goal of any Web service is to return results. The root element of any OSF Web Service response is the resultset element, in which all the results of a given results document are nested.

Here is an example of a resultset that has a single subject but that has all the features outlined above:

<?xml version="1.0" encoding="utf-8"?>
<resultset>

  <prefix entity="owl" uri="http://www.w3.org/2002/07/owl#" />
  <prefix entity="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
  <prefix entity="rdfs" uri="http://www.w3.org/2000/01/rdf-schema#" />
  <prefix entity="iron" uri="http://purl.org/ontology/iron#" />
  <prefix entity="xsd" uri="http://www.w3.org/2001/XMLSchema#" />
  <prefix entity="wsf" uri="http://purl.org/ontology/wsf#" />
  <prefix entity="foaf" uri="http://xmlns.com/foaf/0.1/" />
  <prefix entity="ns0" uri="http://test.com#" />

  <subject type="foaf:Person" uri="http://dataset1.com/record-a/">
    <predicate type="rdf:type">
      <object uri="http://umbel.org/umbel/rc/Person" />
    </predicate>
    <predicate type="iron:prefLabel">
      <object type="rdfs:Literal">Bob</object>
    </predicate>
    <predicate type="iron:altLabel">
      <object type="rdfs:Literal">Robert</object>
    </predicate>
    <predicate type="iron:altLabel">
      <object type="rdfs:Literal">Rob</object>
    </predicate>
    <predicate type="iron:altLabel">
      <object type="rdfs:Literal">Ti Rob</object>
    </predicate>
    <predicate type="iron:description">
      <object type="rdfs:Literal">It is a good person</object>
    </predicate>
    <predicate type="foaf:note">
      <object type="rdfs:Literal" lang="en">language test</object>
    </predicate>
    <predicate type="foaf:knows">
      <object uri="http://dataset2.com/record-b/" type="foaf:Person">
        <reify type="iron:prefLabel" value="Ginette" />
      </object>
    </predicate>
    <predicate type="foaf:img">
      <object type="rdfs:Literal">http://dataset1.com/imgs/bob.jpg</object>
    </predicate>
    <predicate type="foaf:img">
      <object type="rdfs:Literal">http://dataset1.com/imgs/bob2.jpg</object>
    </predicate>
    <predicate type="foaf:age">
      <object type="xsd:int">34</object>
    </predicate>
    <predicate type="foaf:age">
      <object type="ns0:int">45</object>
    </predicate>
    <predicate type="iron:prefURL">
      <object type="rdfs:Literal">http://ti-rob.com</object>
    </predicate>
  </subject>
</resultset>
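To make the nesting concrete, here is a minimal sketch of how a consumer might walk such a resultset. It uses Python's standard xml.etree.ElementTree module on a trimmed-down copy of the example above. The function name iter_statements is an assumption made for illustration and is not part of any OSF library.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the resultset example (two predicates only).
DOC = """<resultset>
  <prefix entity="foaf" uri="http://xmlns.com/foaf/0.1/" />
  <prefix entity="iron" uri="http://purl.org/ontology/iron#" />
  <subject type="foaf:Person" uri="http://dataset1.com/record-a/">
    <predicate type="iron:prefLabel">
      <object type="rdfs:Literal">Bob</object>
    </predicate>
    <predicate type="foaf:knows">
      <object uri="http://dataset2.com/record-b/" type="foaf:Person" />
    </predicate>
  </subject>
</resultset>"""


def iter_statements(xml_text):
    """Yield (subject_uri, predicate_type, object_value) tuples."""
    root = ET.fromstring(xml_text)
    for subject in root.findall("subject"):
        for predicate in subject.findall("predicate"):
            for obj in predicate.findall("object"):
                # An object carries either a uri attribute or literal text.
                value = obj.get("uri") or (obj.text or "").strip()
                yield subject.get("uri"), predicate.get("type"), value


statements = list(iter_statements(DOC))
for statement in statements:
    print(statement)
```

The prefixized values (foaf:knows, iron:prefLabel) are returned as they appear in the document; expanding them is a separate step.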

<prefix />

Prefixes are used to shorten URI references. The prefix elements within a resultset element apply to all the URI references within that resultset. There is no obligation to shorten a URI reference (we refer to this action as prefixizing a URI). Prefixized URIs can appear:

  1. In the type or the uri attribute of a subject element
  2. In the type attribute of a predicate element
  3. In the uri or the type attribute of an object element
  4. In the type attribute of a reify element

Each time a structXML parser parses a structXML document, it should try to unprefixize any value that appears in one of these attributes.

A prefix is always separated from the rest of the identifier by a : character. If we have this prefix defined in a given resultset:

  <prefix entity="foaf" uri="http://xmlns.com/foaf/0.1/" />

Then all the URIs that use this namespace can be shortened using that prefix. For example, the prefixized equivalent of the URI http://xmlns.com/foaf/0.1/Person is foaf:Person. Both are equivalent in that resultset, but a prefixized URI is simpler for humans to read and shorter to transmit over the web.
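The prefix mechanism can be sketched in a few lines of Python. The helper names read_prefixes, unprefixize and prefixize are hypothetical; the logic simply assumes that a value is prefixized when the text before its first : character matches a declared entity.

```python
import xml.etree.ElementTree as ET

DOC = """<resultset>
  <prefix entity="foaf" uri="http://xmlns.com/foaf/0.1/" />
  <subject type="foaf:Person" uri="http://dataset1.com/record-a/" />
</resultset>"""


def read_prefixes(root):
    """Map each declared entity (e.g. "foaf") to its base URI."""
    return {p.get("entity"): p.get("uri") for p in root.findall("prefix")}


def unprefixize(value, prefixes):
    """Expand "foaf:Person" to a full URI; leave other values untouched."""
    head, sep, tail = value.partition(":")
    if sep and head in prefixes:
        return prefixes[head] + tail
    return value


def prefixize(uri, prefixes):
    """Shorten a full URI with the first declared prefix that matches."""
    for entity, base in prefixes.items():
        if uri.startswith(base):
            return entity + ":" + uri[len(base):]
    return uri


root = ET.fromstring(DOC)
prefixes = read_prefixes(root)
print(unprefixize("foaf:Person", prefixes))
print(prefixize("http://xmlns.com/foaf/0.1/Person", prefixes))
```

A full URI such as http://dataset1.com/record-a/ passes through unprefixize unchanged because "http" is not a declared entity in this resultset.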

<subject />

A "subject" (consistent with the understanding of subject within the standard subject-predicate-object RDF triple) is a record description returned by a web service endpoint for a given query.

A resultset is composed of one or more subjects, depending on the Web service query. Each subject element thus represents one record returned for a query to a Web service endpoint.

Each subject has a type and a uri attribute. The type of a subject can be seen as its kind. The URI of a subject is its unique identifier.

<subject type="foaf:Person" uri="http://dataset1.com/record-a/">
  ...
</subject>

<predicate />

A predicate is what describes a subject. A predicate can relate a subject to another subject; in this case, we are talking about a "subject predicate" (which is equivalent to an object predicate in RDF). A predicate can also describe a subject using literal values.

Any subject has zero, one or multiple predicate relationships with objects.

Every predicate has a type attribute. The type of a predicate can be seen as the kind of relationship between two things (a subject and an object).

There are two families of predicates:

  1. datatype predicates
  2. subject predicates (which are equivalent to object predicates in RDF)

The datatype predicates are the ones that relate a subject to a literal value, or to any other typed value such as an integer, a date, a time, etc.

The subject predicates are the ones that relate a subject to another subject.

Here is an example of a datatype predicate:

<predicate type="iron:prefLabel">
  <object type="rdfs:Literal">Bob</object>
</predicate>

Here is an example of a subject predicate:

<predicate type="foaf:knows">
  <object uri="http://dataset2.com/record-b/" type="foaf:Person" />
</predicate>
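The two families can be told apart programmatically. The sketch below assumes that a predicate is a subject predicate when at least one of its objects carries a uri attribute, and a datatype predicate otherwise. This rule is inferred from the two examples above rather than stated by the format, and the name predicate_family is hypothetical.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<subject type="foaf:Person" uri="http://dataset1.com/record-a/">
  <predicate type="iron:prefLabel">
    <object type="rdfs:Literal">Bob</object>
  </predicate>
  <predicate type="foaf:knows">
    <object uri="http://dataset2.com/record-b/" type="foaf:Person" />
  </predicate>
</subject>"""


def predicate_family(predicate):
    """Return "subject" if any object has a uri attribute, else "datatype"."""
    for obj in predicate.findall("object"):
        if obj.get("uri") is not None:
            return "subject"
    return "datatype"


subject = ET.fromstring(SNIPPET)
families = {p.get("type"): predicate_family(p) for p in subject.findall("predicate")}
print(families)
```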

<object />

Any predicate refers to one or multiple objects. An object can be a reference to another subject, or a literal value.

An object has a type attribute and, possibly, a uri attribute. The type of an object can be seen as its kind. The URI of an object is its unique identifier. The uri attribute is not needed when the object is a literal, such as a string name or a number.

A special kind of object exists: rdfs:Literal. The characteristics of this kind of object will be discussed in a special section below.

Here is an example of an object value which is a literal:

<object type="rdfs:Literal">Bob</object>

Here is an example of an object value which is a URI (a reference to another subject):

<object uri="http://dataset2.com/record-b/" type="foaf:Person" />

<reify />

Sometimes it is useful to be able to assert facts about a given triple statement <subject, predicate, object>. This is what reification is about.

The reification example below means: we have a subject that is a bibo:Document. This document has a predicate relationship umbel:isAbout with a thing that is itself a umbel:RefConcept and is referred to as http://.../War. Basically, this triple relationship means: "I have a document that is about War".

However, we can also attach a ratio that expresses the confidence level in that statement. By using the umbel:withLikelihood reification property, we can assign a confidence level to the "fact" (assertion) of the initial triple statement, as follows: <<bibo:Document, umbel:isAbout, umbel:RefConcept>, umbel:withLikelihood, "0.87345872835434">.

This reification gets expressed in the XML data structure as:

  <resultset>
    <subject type="bibo:Document" uri="http://...">
      <predicate type="umbel:isAbout">
        <object type="umbel:RefConcept" uri="http://umbel.org/umbel/sc/War">
          <reify type="umbel:withLikelihood" value="0.87345872835434" />
        </object>
      </predicate>
    </subject>
  </resultset>

The above example shows how an object property value is being reified.

Here is how a datatype property value is being reified in structXML:

  <resultset>
    <subject type="bibo:Document" uri="http://...">
      <predicate type="umbel:isAbout">
        <object type="iron:prefLabel">
          Default Document Name
        </object>
        <reify type="umbel:withLikelihood" value="0.87345872835434" />
      </predicate>
    </subject>
  </resultset>

So, basically, the reify element helps us to assert a fact about another fact (triple statement). In this sense, then, reification can be seen as a metadata assertion about the original statement.

Data consumers should thus parse the XML document in the following way:

If there is a <reify /> element within the body of an <object /> element, the data consumer must check the three parent nodes of the <reify /> element (the object, its predicate and its subject) to compose the <subject, predicate, object> triple that the assertion is about.
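The parsing rule above can be sketched as follows. Since xml.etree.ElementTree elements do not keep a pointer to their parent, this sketch walks the document top-down and remembers the enclosing subject and predicate instead of climbing up from the reify element. The subject URI http://example.org/doc-1 is a placeholder standing in for the elided URI of the example, and iter_reifications is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DOC = """<resultset>
  <subject type="bibo:Document" uri="http://example.org/doc-1">
    <predicate type="umbel:isAbout">
      <object type="umbel:RefConcept" uri="http://umbel.org/umbel/sc/War">
        <reify type="umbel:withLikelihood" value="0.87345872835434" />
      </object>
    </predicate>
  </subject>
</resultset>"""


def iter_reifications(xml_text):
    """Yield (triple, reify_type, reify_value) for reify elements nested in objects."""
    root = ET.fromstring(xml_text)
    for subject in root.findall("subject"):
        for predicate in subject.findall("predicate"):
            for obj in predicate.findall("object"):
                for reify in obj.findall("reify"):
                    # The triple that the reification statement is about.
                    triple = (subject.get("uri"), predicate.get("type"), obj.get("uri"))
                    yield triple, reify.get("type"), reify.get("value")


reifications = list(iter_reifications(DOC))
for triple, reify_type, reify_value in reifications:
    print(triple, reify_type, reify_value)
```

Only reify elements nested inside an object element are handled here; the values are returned as strings and left for the consumer to interpret.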

Unique Identifiers: URIs

Nearly all resources and their associated subject, predicate or object have a unique identifier called a URI. (Subjects and predicates must have a URI; objects most frequently do, but sometimes may optionally be assigned a literal.)

These URIs are unique to each resource. Since these IDs are unique, if a Web service A refers to a resource X and another Web service B also refers to a resource X, then both Web services A and B refer to the same thing. This must hold true so that atomic Web services can easily interact together to create compound Web services.

However, the subjects or the objects of a resultset may sometimes lack a defined URI (the uri attribute). In such a case, the consumer of this Web service data must itself define a unique identifier for that thing.
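One possible way for a consumer to handle subjects that arrive without a URI is sketched below. The choice of random urn:uuid: identifiers is an assumption of this example and is not prescribed by structXML; any scheme that yields unique identifiers would do. The name ensure_subject_uris is hypothetical, and only subject elements are covered.

```python
import uuid
import xml.etree.ElementTree as ET

DOC = """<resultset>
  <subject type="foaf:Person" uri="http://dataset1.com/record-a/" />
  <subject type="foaf:Person" />
</resultset>"""


def ensure_subject_uris(root):
    """Give every subject without a uri attribute a freshly minted identifier."""
    for subject in root.findall("subject"):
        if not subject.get("uri"):
            # uuid4().urn looks like "urn:uuid:xxxxxxxx-xxxx-...".
            subject.set("uri", uuid.uuid4().urn)
    return root


root = ensure_subject_uris(ET.fromstring(DOC))
uris = [subject.get("uri") for subject in root.findall("subject")]
print(uris)
```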

Literals and Datatype Values

A literal is a special kind of object. Unlike any other object, a literal object cannot be the subject of a predicate. (Technically, a resource could describe a literal, but the literal itself can't be described; this is out of the scope of this document.)

A literal object does not have a uri attribute.

Optionally, a literal object can have a type and/or a lang attribute.

A literal value (rdfs:Literal) can be further defined using any defined XSD type. This lets us say that a value is not only a literal but, more precisely, an integer. Here is an example of such a typed literal:

<predicate type="foaf:age">
  <object type="xsd:int">34</object>
</predicate>

Here is the list of the most commonly used XSD datatypes:

xsd:anyURI
xsd:int
xsd:boolean
xsd:decimal
xsd:float
xsd:double
xsd:long
xsd:short
xsd:byte
xsd:hexBinary
xsd:base64Binary
xsd:dateTime
xsd:date
xsd:time
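A consumer can use the type attribute to turn the text of a literal into a native value. The sketch below covers only a handful of the datatypes listed above, and the mapping to Python types is a choice made for this example. Unknown types, such as the ns0:int value in the resultset example, fall back to the raw string. The names CASTS and literal_value are hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Partial mapping from prefixized XSD types to Python conversions.
CASTS = {
    "xsd:int": int,
    "xsd:long": int,
    "xsd:short": int,
    "xsd:byte": int,
    "xsd:float": float,
    "xsd:double": float,
    "xsd:decimal": Decimal,
    "xsd:boolean": lambda text: text in ("true", "1"),
}


def literal_value(obj):
    """Convert the text of an object element according to its type attribute."""
    text = (obj.text or "").strip()
    cast = CASTS.get(obj.get("type"))
    return cast(text) if cast else text


age = literal_value(ET.fromstring('<object type="xsd:int">34</object>'))
unknown = literal_value(ET.fromstring('<object type="ns0:int">45</object>'))
label = literal_value(ET.fromstring('<object type="rdfs:Literal">Bob</object>'))
print(age, unknown, label)
```

This lookup assumes the type is written in its prefixized xsd: form; a document that spells out the full XSD URI would need to be normalized first.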

Additionally, a literal string can be defined using a language identifier. Such a language identifier is used to specify what human language has been used to write the string value. Here is an example of such a value:

<predicate type="foaf:note">
  <object type="rdfs:Literal" lang="en">language test</object>
</predicate>

With this example, we specify that the string "language test" has been written in English. The language tags used in the lang attribute are the ones suggested in RFC 4646: the two-character ISO 639-1 language codes:

aa, ab, af, am, ar, as, ay, az, ba, be, bg, bh, bi, bn, bo, br, ca, co, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fj, fo, fr, fy, ga, gd, gl, gn, gu, ha, hi, hr, hu, hy, ia, ie, ik, in, is, it, iw, ja, ji, jw, ka, kk, kl, km, kn, ko, ks, ku, ky, la, ln, lo, lt, lv, mg, mi, mk, ml, mn, mo, mr, ms, mt, my, na, ne, nl, no, oc, om, or, pa, pl, ps, pt, qu, rm, rn, ro, ru, rw, sa, sd, sg, sh, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tr, ts, tt, tw, uk, ur, uz, vi, vo, wo, xh, yo, zh, zu
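As an illustration of how the lang attribute might be used, the sketch below selects the literals of a subject that are tagged with a given language. The French foaf:note value is invented for this example, and literals_in_language is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<subject type="foaf:Person" uri="http://dataset1.com/record-a/">
  <predicate type="foaf:note">
    <object type="rdfs:Literal" lang="en">language test</object>
  </predicate>
  <predicate type="foaf:note">
    <object type="rdfs:Literal" lang="fr">test de langue</object>
  </predicate>
  <predicate type="iron:prefLabel">
    <object type="rdfs:Literal">Bob</object>
  </predicate>
</subject>"""


def literals_in_language(subject, lang):
    """Return (predicate_type, text) pairs whose lang attribute equals lang."""
    matches = []
    for predicate in subject.findall("predicate"):
        for obj in predicate.findall("object"):
            if obj.get("lang") == lang:
                matches.append((predicate.get("type"), (obj.text or "").strip()))
    return matches


english = literals_in_language(ET.fromstring(SNIPPET), "en")
print(english)
```

Literals without a lang attribute, such as the iron:prefLabel value here, are not matched by any language; whether to treat them as a fallback is left to the consumer.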

Flexibility of this XML Data Structure

This XML data structure is thus flexible enough to describe any relation within an RDF graph produced by an OSF Web Service.

The advantage of re-using the triple assertions of the RDF data model with types and URIs is that a data consumer can easily handle the data produced by any Web service, even without knowing the type of the subjects, predicates and objects returned by that Web service. The data consumer can always say: I have this thing that refers to this other thing with this given predicate. The data consumer can manipulate results in some ways even if it doesn't know much or anything about the types of those things.

This consistent abstraction is helpful: even if the Web services evolve and change over time, the data consumers of these Web services will still be able to handle the things they know and simply discard the newly added types they do not know, all without having to change anything in the procedures that manage the resultsets returned by these Web services.
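The behaviour described above, handling what is known and discarding the rest, can be sketched as a consumer that keeps only the predicate types it recognizes. The predicate ns0:shoeSize is invented here to stand for a type added after the consumer was written, and KNOWN_PREDICATES and known_values are hypothetical names.

```python
import xml.etree.ElementTree as ET

DOC = """<resultset>
  <subject type="foaf:Person" uri="http://dataset1.com/record-a/">
    <predicate type="iron:prefLabel">
      <object type="rdfs:Literal">Bob</object>
    </predicate>
    <predicate type="ns0:shoeSize">
      <object type="ns0:int">45</object>
    </predicate>
  </subject>
</resultset>"""

# Predicate types this particular consumer knows how to handle.
KNOWN_PREDICATES = {"iron:prefLabel"}


def known_values(xml_text, known=KNOWN_PREDICATES):
    """Collect values per subject, silently skipping unknown predicate types."""
    result = {}
    for subject in ET.fromstring(xml_text).findall("subject"):
        values = result.setdefault(subject.get("uri"), {})
        for predicate in subject.findall("predicate"):
            predicate_type = predicate.get("type")
            if predicate_type not in known:
                continue  # a new type we do not know: discard it
            for obj in predicate.findall("object"):
                value = obj.get("uri") or (obj.text or "").strip()
                values.setdefault(predicate_type, []).append(value)
    return result


records = known_values(DOC)
print(records)
```

The traversal itself never depends on the specific types, which is why a new predicate in the resultset does not require any change to the consumer.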