StructJSON

From OSF Wiki
Revision as of 03:17, 12 January 2014 by Mike (Talk | contribs)

Introduction

structJSON is a straightforward RDF serialization in JSON format used for internal communications between the OSF Web Services, the JavaScript Semantic Components and OSF for Drupal. It is one of the core formats used to transmit information between Open Semantic Framework (OSF) components.

A structJSON file is composed of a resultset object which aggregates a series of subjects (records) that are defined with properties and values. Values can be "data" values such as literals, or "object" values that are references to other subjects.

structJSON is comparable to structXML, which uses the same structure but is serialized in XML instead of JSON. Both are based on the OSF Web Service Internal Resultset Structure.

Features

The structJSON format supports the following features:

  1. Description of subject records
  2. Each record has a unique identifier
  3. Each record has one or multiple types (it belongs to one or multiple classes)
  4. Each record can be described with an unlimited number of data or object attributes
  5. Reification is supported on object attributes
  6. The value of a data attribute can be defined with a type or a lang tag

Specification

"resultset": { ... }

The goal of any Web service is to return results. The root element of any OSF Web Service is the resultset element, in which all results of a given results document are nested.

Here is an example of a resultset that has a single subject but that has all the features outlined above:

{
  "prefixes": {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "iron": "http://purl.org/ontology/iron#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "wsf": "http://purl.org/ontology/wsf#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ns0": "http://test.com#"
  },
  "resultset": {
    "subject": [
      {
        "uri": "http://dataset1.com/record-a/",
        "type": "foaf:Person",
        "predicate": [  
          {
            "rdf:type": "http://umbel.org/umbel/rc/Person"
          },
          {
            "iron:prefLabel": "Bob"
          },
          {
            "iron:altLabel": "Robert"
          },
          {
            "iron:altLabel": "Rob"
          },
          {
            "iron:altLabel": "Ti Rob"
          },
          {
            "iron:description": "It is a good person"
          },
          {
            "foaf:note": {
              "value": "language test",
              "lang": "en"
            }
          },
          {
            "foaf:knows": {
              "uri": "http://dataset2.com/record-b/",
              "type": "foaf:Person",
              "reify": [
                {
                  "type": "iron:prefLabel",
                  "value": "Ginette"
                }
              ]
            }
          },
          {
            "foaf:img": "http://dataset1.com/imgs/bob.jpg"
          },
          {
            "foaf:img": "http://dataset1.com/imgs/bob2.jpg"
          },
          {
            "foaf:age": {
              "value": "34",
              "type": "xsd:int"
            }
          },
          {
            "foaf:age": {
              "value": "45",
              "type": "ns0:int"
            }
          },
          {
            "iron:prefURL": "http://ti-rob.com"
          }
        ]
      }
    ]
  }
}
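As an illustration of how a consumer might read such a document, here is a minimal Python sketch. It assumes only the layout shown above (a "resultset" object holding a "subject" array, each subject holding a "predicate" array of single-key objects). The helper name group_predicates and the trimmed sample document are hypothetical and are not part of OSF.

```python
import json
from collections import defaultdict

# A trimmed copy of the example document above.
DOCUMENT = """
{
  "prefixes": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "iron": "http://purl.org/ontology/iron#"
  },
  "resultset": {
    "subject": [
      {
        "uri": "http://dataset1.com/record-a/",
        "type": "foaf:Person",
        "predicate": [
          {"iron:prefLabel": "Bob"},
          {"iron:altLabel": "Robert"},
          {"iron:altLabel": "Rob"}
        ]
      }
    ]
  }
}
"""


def group_predicates(subject):
    """Collect the values of each predicate of one subject (hypothetical helper)."""
    grouped = defaultdict(list)
    for entry in subject.get("predicate", []):
        # Each entry of the "predicate" array is an object with a single key,
        # so a repeated predicate shows up as several entries.
        for name, value in entry.items():
            grouped[name].append(value)
    return dict(grouped)


doc = json.loads(DOCUMENT)
for subject in doc["resultset"]["subject"]:
    print(subject["uri"], subject["type"])
    for name, values in group_predicates(subject).items():
        print("  ", name, values)
```

Grouping by predicate name is only a convenience for the consumer; the document itself keeps one array entry per value.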

"prefixes": { ... }

Prefixes are used to shorten URI references. The prefixes object that accompanies a resultset object is used to shorten the URI references within that resultset. There is no obligation to shorten a URI reference (we refer to this action as prefixizing a URI). Prefixized URIs can appear:

  1. In the type or the uri attribute of a subject object
  2. In the type attribute of a predicate object
  3. In the uri or the type attribute of an object object
  4. In the type attribute of a reify object

Each time a structJSON parser parses a structJSON document, it should try to unprefixize any value that appears in one of these attributes.

A prefix is always separated from the rest of the identifier by a : character. If we have this prefix defined in a given resultset:

  "prefixes": {
    "foaf": "http://xmlns.com/foaf/0.1/"
  }

Then all the URIs that use this namespace can be shortened using that prefix. This means that if we have a URI http://xmlns.com/foaf/0.1/Person, the prefixized equivalent of this URI is foaf:Person. Both are equivalent in that resultset, but a prefixized URI is simpler for humans to read and shorter to transmit over the web.
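The two directions of this conversion can be sketched in Python as follows. This is an illustrative sketch, not the OSF parser: the function names unprefixize and prefixize are borrowed from the vocabulary of this page, and the rule of picking the longest matching namespace is an assumption.

```python
def unprefixize(value, prefixes):
    """Expand "prefix:local" to a full URI when the prefix is declared."""
    prefix, sep, local = value.partition(":")
    # Values such as "http://..." are left alone: "http" is normally not a
    # declared prefix, and the "//" guard covers the case where it is.
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return value


def prefixize(uri, prefixes):
    """Shorten a full URI with the longest matching namespace, if any."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return uri  # no declared namespace matches; keep the full URI
    return best + ":" + uri[len(prefixes[best]):]


prefixes = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(unprefixize("foaf:Person", prefixes))
print(prefixize("http://xmlns.com/foaf/0.1/Person", prefixes))
```

Values whose prefix is not declared are returned unchanged, which matches the rule that shortening a URI is optional.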

"subject": [ {...}, {....}, ... ]

A "subject" (consistent with the understanding of subject within the standard subject-predicate-object RDF triple) is a record description returned by a web service endpoint for a given query.

A resultset is composed of one or multiple subject(s) depending on the Web service query. This means that the subject element represents the subject of a query to a Web service endpoint.

Each subject has a type and a uri attribute. The type of a subject can be seen as its kind. The URI of a subject is its unique identifier.

"subject": [
  {
    "uri": "http://dataset1.com/record-a/",
    "type": "foaf:Person"
  }
]

"predicate": [ {...}, {...}, ...]

A predicate is what describes a subject. A predicate can be used to relate a subject to another subject; in this case, we are talking about a "subject predicate", which is equivalent to an object predicate in RDF. A predicate can also be used to describe a subject using literal values.

Any subject has zero, one or multiple predicate relationships with other objects.

Every predicate has a type attribute. The type of a predicate can be seen as the kind of relationship between two things (a subject and an object).

There are two families of predicates:

  1. datatype predicates
  2. subject predicates (which are equivalent to object predicates in RDF)

The datatype predicates are the ones that relate a subject to a literal value, or to any other typed value such as an integer, a date or a time.

The subject predicates are the ones that relate a subject to another subject.

Here is an example of a datatype predicate:

"predicate": [  
  {
    "iron:prefLabel": "Bob"
  }
]

Here is an example of a subject predicate:

"predicate": [  
  {
    "foaf:knows": {
      "uri": "http://dataset2.com/record-b/",
      "type": "foaf:Person"
    }
  }
]
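A consumer may need to tell the two families apart. The following Python sketch assumes, based only on the examples on this page, that a value carrying a "uri" key is a reference to another subject and that anything else is a literal. The function name predicate_family is hypothetical.

```python
def predicate_family(value):
    """Return "subject" for a reference to another subject, else "datatype"."""
    # Assumption: only references to other subjects carry a "uri" key.
    # Plain strings and {"value": ...} objects are treated as literals.
    if isinstance(value, dict) and "uri" in value:
        return "subject"
    return "datatype"


predicates = [
    {"iron:prefLabel": "Bob"},
    {"foaf:age": {"value": "34", "type": "xsd:int"}},
    {"foaf:knows": {"uri": "http://dataset2.com/record-b/", "type": "foaf:Person"}},
]

for entry in predicates:
    for name, value in entry.items():
        print(name, "->", predicate_family(value))
```

Note that under this assumption a plain string that happens to look like a URL, such as the foaf:img values of the full example, is still classified as a literal.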

Object: { ... }

Any predicate refers to one or multiple objects. An object can be a reference to another subject, or a literal value.

An object has a type and, optionally, a uri attribute. The type of an object can be seen as its kind. The URI of an object is its unique identifier. The uri attribute is omitted when the object is a literal, such as a string name or a number.

A special kind of object exists: rdfs:Literal. The characteristics of this kind of object will be discussed in a special section below.

Here is an example of an object value which is a literal:

"predicate": [  
  {
    "iron:prefLabel": "Bob"
  }
]

Here is an example of an object value which is a URI (a reference to another subject):

"predicate": [  
  {
    "foaf:knows": {
      "uri": "http://dataset2.com/record-b/",
      "type": "foaf:Person"
    }
  }
]

In these examples, the objects are introduced by the JSON object markup: { ... }

"reify": [ {...}, {...}, ... ]

Sometimes it is useful to be able to assert facts about a given triple statement <subject, predicate, object>. This is what reification is about.

Here is an example of a reification statement that says that the preferred label that we should use for the subject referenced by the <http://dataset1.com/record-a/, foaf:knows, http://dataset2.com/record-b/> triple is Ginette:

{
  "foaf:knows": {
    "uri": "http://dataset2.com/record-b/",
    "type": "foaf:Person",
    "reify": [
      {
        "type": "iron:prefLabel",
        "value": "Ginette"
      }
    ]
  }
}

So, basically, the reify element helps us to assert a fact about another fact (triple statement). In this sense, then, reification can be seen as a metadata assertion about the original statement.
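To make the relation between a triple and its reification statements concrete, here is a small Python sketch that lists them for one subject. It assumes the layout of the example above ("reify" is an array of objects with "type" and "value" keys). The generator name reified_statements is hypothetical.

```python
def reified_statements(subject):
    """Yield (subject URI, predicate, object URI, reify type, reify value)."""
    for entry in subject.get("predicate", []):
        for predicate, obj in entry.items():
            if not isinstance(obj, dict):
                continue  # plain literals carry no "reify" array
            for statement in obj.get("reify", []):
                yield (
                    subject["uri"],
                    predicate,
                    obj.get("uri"),
                    statement["type"],
                    statement["value"],
                )


subject = {
    "uri": "http://dataset1.com/record-a/",
    "type": "foaf:Person",
    "predicate": [
        {"iron:prefLabel": "Bob"},
        {
            "foaf:knows": {
                "uri": "http://dataset2.com/record-b/",
                "type": "foaf:Person",
                "reify": [{"type": "iron:prefLabel", "value": "Ginette"}],
            }
        },
    ],
}

for statement in reified_statements(subject):
    print(statement)
```

Each tuple pairs the original triple (its first three items) with one assertion made about it (its last two items).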

Unique Identifiers: URIs

Nearly all resources and their associated subject, predicate or object have a unique identifier called a URI. (Subjects and predicates must have a URI; objects most frequently do, but sometimes may optionally be assigned a literal.)

These URIs are unique to each resource. Since these IDs are unique, if a Web service A refers to a resource X and another Web service B also refers to a resource X, then both Web services A and B refer to the same thing. This understanding must hold true so that atomic Web services can easily interact together to create compound Web services.

However, the subjects or the objects of a resultset may sometimes not have a defined URI (the uri attribute). In such a case, the consumer of this Web service data must itself define a unique identifier for that thing.
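This page does not prescribe how a consumer should define such an identifier. One possible approach, sketched here in Python, is to mint a random UUID in the standard urn:uuid: form. The helper name ensure_uri and the choice of UUIDs are assumptions made for illustration.

```python
import uuid


def ensure_uri(record):
    """Return the record's URI, minting a local one when it is absent."""
    if not record.get("uri"):
        # Assumption: a random UUID URN is an acceptable local identifier.
        record["uri"] = "urn:uuid:" + str(uuid.uuid4())
    return record["uri"]


anonymous = {"type": "foaf:Person"}
print(ensure_uri(anonymous))
# A second call returns the identifier that was minted the first time.
print(ensure_uri(anonymous) == anonymous["uri"])
```

An identifier minted this way is only meaningful to the consumer that created it; it does not let two Web services agree that they refer to the same resource.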

Literals and Datatype Values

A literal is a special kind of object. Unlike any other object, a literal object cannot be the subject of a predicate. (Technically, a resource could describe a literal even though the literal itself cannot be described, but this is outside the scope of this document.)

A literal object does not have a uri attribute.

Optionally, a literal object can have a type and/or a lang attribute.

A literal value (rdfs:Literal) can be further defined using any defined XSD type. We can say that a literal value is not only a literal, but more precisely an integer. Here is an example of such a typed literal:

{
  "foaf:age": {
    "value": "34",
    "type": "xsd:int"
  }        
}

Here is the list of the most commonly used XSD datatypes:

xsd:anyURI
xsd:int
xsd:boolean
xsd:decimal
xsd:float
xsd:double
xsd:long
xsd:short
xsd:byte
xsd:hexBinary
xsd:base64Binary
xsd:dateTime
xsd:date
xsd:time
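A consumer will often want to turn typed literals into native values. The Python sketch below covers a few of the datatypes listed above and falls back to the raw string for anything else. The converter table and the helper names are hypothetical, and the sketch assumes that types arrive prefixized with the "xsd" prefix as in the example resultset.

```python
from decimal import Decimal


def parse_boolean(text):
    """Parse the lexical forms of xsd:boolean: true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not an xsd:boolean value: " + text)


# Converters keyed by prefixized datatype. This assumes the "xsd" prefix is
# bound to http://www.w3.org/2001/XMLSchema# as in the example resultset.
CONVERTERS = {
    "xsd:int": int,
    "xsd:long": int,
    "xsd:short": int,
    "xsd:byte": int,
    "xsd:decimal": Decimal,
    "xsd:float": float,
    "xsd:double": float,
    "xsd:boolean": parse_boolean,
}


def literal_to_python(obj):
    """Convert a literal object to a Python value (hypothetical helper)."""
    if not isinstance(obj, dict):
        return obj  # plain JSON string such as "Bob"
    # Unknown or missing types keep the value as a string.
    convert = CONVERTERS.get(obj.get("type"), str)
    return convert(obj["value"])


print(literal_to_python({"value": "34", "type": "xsd:int"}))
print(literal_to_python({"value": "45", "type": "ns0:int"}))
```

Dates, times and binary types are left as strings here; a fuller converter table would need to parse their lexical forms as well.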

Additionally, a literal string can be defined using a language identifier. Such a language identifier is used to specify what human language has been used to write the string value. Here is an example of such a value:

{
  "foaf:note": {
    "value": "language test",
    "lang": "en"
  }
}

With this example, we specify that the string "language test" has been written in English. The language tags used in the lang attribute are the ones suggested in RFC 4646: the two-character ISO 639-1 language codes:

aa, ab, af, am, ar, as, ay, az, ba, be, bg, bh, bi, bn, bo, br, ca, co, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fj, fo, fr, fy, ga, gd, gl, gn, gu, ha, hi, hr, hu, hy, ia, ie, ik, in, is, it, iw, ja, ji, jw, ka, kk, kl, km, kn, ko, ks, ku, ky, la, ln, lo, lt, lv, mg, mi, mk, ml, mn, mo, mr, ms, mt, my, na, ne, nl, no, oc, om, or, pa, pl, ps, pt, qu, rm, rn, ro, ru, rw, sa, sd, sg, sh, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tr, ts, tt, tw, uk, ur, uz, vi, vo, wo, xh, yo, zh, zu
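As an illustration of how the lang attribute might be used, the following Python sketch selects the values of a predicate that are tagged with a given language. The function name values_in_language is hypothetical, and untagged plain strings are deliberately not matched.

```python
def values_in_language(subject, predicate, lang):
    """Return the literal values of `predicate` that are tagged with `lang`."""
    found = []
    for entry in subject.get("predicate", []):
        obj = entry.get(predicate)
        # Only literal objects of the form {"value": ..., "lang": ...} match.
        if isinstance(obj, dict) and "value" in obj and obj.get("lang") == lang:
            found.append(obj["value"])
    return found


subject = {
    "uri": "http://dataset1.com/record-a/",
    "type": "foaf:Person",
    "predicate": [
        {"foaf:note": {"value": "language test", "lang": "en"}},
        {"foaf:note": {"value": "test de langue", "lang": "fr"}},
        {"foaf:note": "untagged note"},
    ],
}

print(values_in_language(subject, "foaf:note", "en"))
```

The French note in this sample is invented for the illustration; only the English one comes from the example on this page.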

Flexibility of this JSON Data Structure

This JSON data structure is thus flexible enough to describe any relation within an RDF graph produced by an OSF Web Service.

The advantage of re-using the triple assertions of the RDF data model with types and URIs is that a data consumer can easily handle the data produced by any Web service, even without knowing the type of the subjects, predicates and objects returned by that Web service. The data consumer can always say: I have this thing that refers to this other thing with this given predicate. The data consumer can manipulate results in some ways even if it doesn't know much or anything about the types of those things.

This consistent abstraction is helpful: even if the Web services evolve and change over time, the data consumers of these Web services will still be able to handle the things they know and simply discard any newly added types they do not know, all without having to change anything in the procedures that manage the resultsets returned by these Web services.
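This behavior can be sketched in Python as a table of handlers keyed by subject type: subjects of a known type are handled, and the others are silently skipped. The handler table, the function names and the ns0:Robot type are hypothetical and only serve the illustration.

```python
def describe_person(subject):
    """Handle a subject of a type this consumer knows about."""
    return "Person " + subject["uri"]


# Types this consumer knows how to handle (illustrative).
HANDLERS = {"foaf:Person": describe_person}


def process(resultset):
    """Handle the subjects whose type is known and skip the others."""
    handled = []
    for subject in resultset.get("subject", []):
        handler = HANDLERS.get(subject.get("type"))
        if handler is not None:
            handled.append(handler(subject))
    return handled


resultset = {
    "subject": [
        {"uri": "http://dataset1.com/record-a/", "type": "foaf:Person"},
        {"uri": "http://dataset1.com/record-z/", "type": "ns0:Robot"},
    ]
}

print(process(resultset))
```

Supporting a new type then amounts to adding one entry to the handler table, without touching the loop that walks the resultset.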