StructEDN

From OSF Wiki

structEDN is a straightforward RDF serialization in EDN format used for internal communications between OSF Web Services and Clojure applications via the clj-osf project. This format is strongly based on the structJSON format, but it is optimized for Clojure.

A structEDN file is composed of a resultset object that aggregates a series of subjects (records), which are defined with properties and values. Values can be "data" values such as literals, or "object" values that are references to other subjects. This is the standard RDF construct.

structEDN is comparable to both structJSON and structXML, which use the same structure but are serialized in JSON and XML, respectively, rather than EDN. All three are based on the OSF Web Service Internal Resultset Structure.

Features

The structEDN format supports the following features:

  1. Description of subject records
  2. Each record has a unique identifier
  3. Each record has one or multiple types (belongs to one or multiple classes)
  4. Each record can be described with an unlimited number of data or object attributes
  5. Reification is supported on object attributes
  6. The value of data attributes can be defined with a type, or a lang tag.

Specification

"resultset": { ... }

The goal of any Web service is to return results. The root element of any document returned by an OSF Web Service is the resultset element, where all results in a given results document are nested.

Here is an example of a resultset that has a single subject but that has all the features outlined above:

{:prefixes {:dcterms "http://purl.org/dc/terms/"
            :ns0 "http://purl.org/ontology/bibo#"
            :xsd "http://www.w3.org/2001/XMLSchema#"
            :owl "http://www.w3.org/2002/07/owl#"
            :rdfs "http://www.w3.org/2000/01/rdf-schema#"
            :iron "http://purl.org/ontology/iron#"
            :cognonto "http://purl.org/ontology/cognonto#"
            :wsf "http://purl.org/ontology/wsf#"
            :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
:resultset {
           :subject
           [{:uri "http://techcrunch.com/?p=1081212"
             :type "ns0:Article"
             :predicate
             {:iron:prefLabel ["Microsoft's ..."]
              :cognonto:url ["http://techcrunch.com/?p=1081212"]
              :cognonto:tag [{:uri "http://purl.org/ontology/bso#interactive-computing"
                              :reify [{:type "cognonto:weight"
                                       :value "0.04369007180887"}]}
                             {:uri "http://purl.org/ontology/bso#data-migration"
                              :reify [{:type "cognonto:weight"
                                       :value "0.034965652904319"}]}
                             {:uri "http://purl.org/ontology/bso#semantic-technologies"
                              :reify [{:type "cognonto:weight"
                                       :value "0.043697216645113"}]}
                             {:uri "http://purl.org/ontology/bso#machine-learning-algorithms"
                              :reify [{:type "cognonto:weight"
                                       :value "0.056243105409715"}]}]
              :cognonto:reviewed [{:value "0"
                                   :type "xsd:integer"}]
              :cognonto:published [{:value "Tue Nov 11 08:05:49 EST 2014"
                                    :type "xsd:dateTime"}]
              :cognonto:inDomain [{:value "0"
                                   :type "xsd:integer"}]
              :cognonto:content ["Microsofts New 199 Bundle Includes ... "]
              :dcterms:isPartOf [{:uri "http://bigstructure.org/datasets/articles/"}]}}]}}

"prefixes": { ... }

Prefixes are used to shorten URI references for properties, types and values. The prefixes map that accompanies a resultset map is used to shorten all the URI references within the resultset. There is no obligation to shorten the URI reference of a value (we refer to this action as prefixizing a URI). However, prefixes must be used to create the keys of the properties of the records being described.
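The two directions of this mapping can be sketched in a few lines. The example below is written in Python purely for illustration and is not part of clj-osf; the function names expand and prefixize are hypothetical, and the prefixes map is mirrored as a plain dictionary with the leading colon of each EDN keyword dropped.

```python
# A subset of the :prefixes map from the example resultset, as a Python dict.
PREFIXES = {
    "iron": "http://purl.org/ontology/iron#",
    "cognonto": "http://purl.org/ontology/cognonto#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed, prefixes):
    """Turn 'iron:prefLabel' into a full URI.

    The input is returned unchanged when its prefix is not in the map,
    which also covers full URIs such as 'http://...'.
    """
    prefix, sep, local = prefixed.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return prefixed


def prefixize(uri, prefixes):
    """Shorten a full URI using the longest matching namespace.

    The URI is returned unchanged when no namespace matches.
    """
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return uri
    return best + ":" + uri[len(prefixes[best]):]


full = expand("iron:prefLabel", PREFIXES)
short = prefixize("http://www.w3.org/2001/XMLSchema#integer", PREFIXES)
```

Returning unknown inputs unchanged keeps both functions safe to call on values that were never prefixized, which the format allows.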

"subject": [ {...}, {....}, ... ]

A subject (consistent with the understanding of subject within the standard subject-predicate-object RDF triple) is a record description returned by a web service endpoint for a given query.

A resultset is composed of one or multiple subject(s) depending on the Web service query. This means that the subject element represents the subject of a query to a Web service endpoint.

Each subject has a type and a uri attribute. The type of a subject can be seen as its kind. The URI of a subject is its unique identifier.

{:subject [{:uri "http://techcrunch.com/?p=1081212"
            :type "ns0:Article"}]}

"predicate": [ {...}, {...}, ...]

A predicate is what describes a subject. A predicate can be used to relate a subject to another subject; in this case, we are talking about a "subject predicate" (which is equivalent to an object predicate in RDF). A predicate can also be used to describe a subject using literal values.

Any subject has zero, one or multiple predicate relationships with other objects.

Every predicate prefixed key refers to a property defined in the ontology referenced by the prefix.

There are two families of predicates:

  1. datatype predicates
  2. subject predicates (which are equivalent to object predicates in RDF)

Datatype predicates relate a subject to a literal value: a plain string, or a typed value such as an integer, a date or a time.

Subject predicates relate a subject to another subject.

The value of a property is always a vector of values.

Here is an example of a datatype predicate:

{:predicate {:iron:prefLabel ["Microsoft's ..."]}}

Here is an example of a subject predicate:

{:predicate {:cognonto:tag [{:uri "http://purl.org/ontology/bso#interactive-computing"}]}}
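A consumer can tell the two families apart by looking at each value in the vector. The sketch below is in Python for illustration only; it assumes the predicate map has already been parsed into dictionaries and lists with the keyword colons dropped, and the function name classify_value is hypothetical.

```python
def classify_value(value):
    """Return 'subject' for a reference to another subject, else 'datatype'.

    A value is treated as a subject reference when it is a map with a
    'uri' key. Plain strings and maps carrying 'value' are literals.
    """
    if isinstance(value, dict) and "uri" in value:
        return "subject"
    return "datatype"


# Predicate map taken from the examples above.
predicates = {
    "iron:prefLabel": ["Microsoft's ..."],
    "cognonto:tag": [
        {"uri": "http://purl.org/ontology/bso#interactive-computing"}
    ],
    "cognonto:published": [
        {"value": "Tue Nov 11 08:05:49 EST 2014", "type": "xsd:dateTime"}
    ],
}

# Every property value is a vector, so classify each element in turn.
kinds = {
    key: [classify_value(v) for v in values]
    for key, values in predicates.items()
}
```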

Object: { ... }

Any predicate refers to one or multiple objects. An object can be a reference to another subject, or a literal value.

An object has a type and a possible uri attribute. The type of an object can be seen as its kind. The URI of an object is its unique identifier. It is optional if the object reference is a literal, such as a string name or a number.

A special kind of object exists: rdfs:Literal. The characteristics of this kind of object will be discussed in a special section below.

Here is an example of an object value which is a literal:

  {:predicate {:iron:prefLabel ["Microsoft's ..."]}}
  {:predicate { :cognonto:published [{:value "Tue Nov 11 08:05:49 EST 2014"
                                      :type "xsd:dateTime"}]}}

Here is an example of an object value which is a uri (a reference to another subject):

  {:predicate {:cognonto:tag [{:uri "http://purl.org/ontology/bso#interactive-computing"}]}}

In these examples, the objects are introduced by the EDN map markup: { ... }. A plain, untyped string literal can also be written directly as a string.

"reify": [ {...}, {...}, ... ]

Sometimes it is useful to be able to assert facts about a given triple statement <subject, predicate, object>. This is what reification is about.

Here is an example of a reification statement that specifies a weight for the triple represented by the :cognonto:tag relationship:

  {:predicate {:cognonto:tag [{:uri "http://purl.org/ontology/bso#interactive-computing"
                               :reify [{:type "cognonto:weight"
                                        :value "0.04369007180887"}]}]}}

The reify element helps us to assert a fact about another fact (triple statement). In this sense, then, reification can be seen as a metadata assertion about the original statement.
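To make the "fact about a fact" idea concrete, the following Python sketch flattens reification statements into rows that pair each original triple with its metadata. It is illustrative only: the function name reified_statements is hypothetical, and the data is the example above mirrored as Python dictionaries with keyword colons dropped.

```python
def reified_statements(subject_uri, predicates):
    """Yield (subject, predicate, object, reify_type, reify_value) tuples.

    Only object values that carry a 'reify' vector produce rows.
    Plain string literals are skipped.
    """
    for predicate, values in predicates.items():
        for value in values:
            if not isinstance(value, dict):
                continue
            for statement in value.get("reify", []):
                yield (
                    subject_uri,
                    predicate,
                    value.get("uri"),
                    statement["type"],
                    statement["value"],
                )


predicates = {
    "iron:prefLabel": ["Microsoft's ..."],
    "cognonto:tag": [
        {
            "uri": "http://purl.org/ontology/bso#interactive-computing",
            "reify": [
                {"type": "cognonto:weight", "value": "0.04369007180887"}
            ],
        }
    ],
}

rows = list(reified_statements("http://techcrunch.com/?p=1081212", predicates))
```

Each row reads as: the triple (subject, predicate, object) has a cognonto:weight of the given value.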

Unique Identifiers: URIs

Nearly all resources and their associated subject, predicate or object have a unique identifier called a URI. (Subjects and predicates must have a URI; objects most frequently do, but sometimes may optionally be assigned a literal.)

These URIs are unique to each resource. Since these IDs are unique, if a Web service A refers to a resource X and another Web service B also refers to a resource X, then both Web services A and B refer to the same thing. This understanding must hold true so that atomic Web services can easily interact together to create compound Web services.

However, sometimes the subjects or the objects of a resultset may not have a defined URI (the uri attribute). If such a case happens, the consumer of this Web service data must itself define a unique identifier for that thing.
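How a consumer defines such an identifier is left open. One possible approach, sketched here in Python under the assumption that a random UUID-based URN is acceptable to the consuming application, is shown below; the function name ensure_uri is hypothetical and the subject map is mirrored as a dictionary with keyword colons dropped.

```python
import uuid


def ensure_uri(subject):
    """Return a copy of the subject map that always carries a 'uri'.

    An existing URI is kept as-is. Otherwise a random 'urn:uuid:...'
    identifier is minted. This is an assumption of this sketch, not
    something structEDN prescribes, and the minted value is not stable
    across runs.
    """
    result = dict(subject)
    if not result.get("uri"):
        result["uri"] = uuid.uuid4().urn
    return result


with_uri = ensure_uri({"uri": "http://techcrunch.com/?p=1081212",
                       "type": "ns0:Article"})
without_uri = ensure_uri({"type": "ns0:Article"})
```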

Literals and Datatype Values

A literal is a special kind of object. Unlike any other object, a literal object cannot be the subject of a predicate. (Technically, a resource could describe a literal, but the literal itself can't be described; this fact is out of the scope of this document.)

A literal object does not have a uri attribute.

Optionally, a literal object can have a type and/or a lang attribute.

A literal value (rdfs:Literal) can be further defined using any defined XSD type. We can say that a literal value is not only a literal but, more precisely, an integer or a date and time. Here is an example of such a typed literal:

  {:predicate {:cognonto:published [{:value "Tue Nov 11 08:05:49 EST 2014"
                                     :type "xsd:dateTime"}]}}

Here is the list of the most commonly used XSD datatypes:

xsd:anyURI
xsd:integer
xsd:int
xsd:boolean
xsd:decimal
xsd:float
xsd:double
xsd:long
xsd:short
xsd:byte
xsd:hexBinary
xsd:base64Binary
xsd:dateTime
xsd:date
xsd:time
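Because typed literals carry their value as a string, a consumer usually converts them on the way in. The Python sketch below shows one way to do that for the numeric and boolean types in the list above. It is an illustration only: the CONVERTERS table and the literal_to_python name are hypothetical, and any type without a converter (including the dates, whose lexical form varies in the examples) is deliberately left as a string.

```python
from decimal import Decimal

# Converters for a subset of the XSD datatypes listed above.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:int": int,
    "xsd:long": int,
    "xsd:short": int,
    "xsd:byte": int,
    "xsd:decimal": Decimal,
    "xsd:float": float,
    "xsd:double": float,
    "xsd:boolean": lambda text: text in ("true", "1"),
}


def literal_to_python(value):
    """Convert a literal object to a Python value.

    Plain strings are returned unchanged. Maps are converted according
    to their 'type'. Unknown or missing types yield the raw string.
    """
    if isinstance(value, str):
        return value
    convert = CONVERTERS.get(value.get("type"))
    if convert is None:
        return value["value"]
    return convert(value["value"])


reviewed = literal_to_python({"value": "0", "type": "xsd:integer"})
published = literal_to_python(
    {"value": "Tue Nov 11 08:05:49 EST 2014", "type": "xsd:dateTime"}
)
```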

Additionally, a literal string can be defined using a language identifier. Such a language identifier is used to specify what human language has been used to write the string value. Here is an example of such a value:

  {:predicate {:iron:prefLabel [{:value "Microsoft's ..."
                                 :lang :en}]}}

With this example, we specify that the string "Microsoft's ..." has been written in English. The language tags used in the lang attribute are the ones suggested in RFC 4646: the two-character ISO 639-1 language codes:

aa, ab, af, am, ar, as, ay, az, ba, be, bg, bh, bi, bn, bo, br, ca, co, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fj, fo, fr, fy, ga, gd, gl, gn, gu, ha, hi, hr, hu, hy, ia, ie, ik, in, is, it, iw, ja, ji, jw, ka, kk, kl, km, kn, ko, ks, ku, ky, la, ln, lo, lt, lv, mg, mi, mk, ml, mn, mo, mr, ms, mt, my, na, ne, nl, no, oc, om, or, pa, pl, ps, pt, qu, rm, rn, ro, ru, rw, sa, sd, sg, sh, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tr, ts, tt, tw, uk, ur, uz, vi, vo, wo, xh, yo, zh, zu
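A typical use of the lang attribute is to pick the label that matches a reader's language. The following Python sketch shows one possible selection rule. It is illustrative only: the function name pick_literal is hypothetical, the language tag is represented as a plain string rather than an EDN keyword, and the French label is made up for the example.

```python
def pick_literal(values, lang, default=None):
    """Return the first literal tagged with `lang`.

    Falls back to the first literal that has no language tag, then to
    `default`. Subject references (maps with 'uri') are ignored.
    """
    for value in values:
        if isinstance(value, dict) and value.get("lang") == lang:
            return value["value"]
    for value in values:
        if isinstance(value, str):
            return value
        if isinstance(value, dict) and "value" in value and "lang" not in value:
            return value["value"]
    return default


labels = [
    {"value": "Microsoft's ...", "lang": "en"},
    {"value": "Le ... de Microsoft", "lang": "fr"},  # made-up example value
]

english = pick_literal(labels, "en")
german = pick_literal(labels, "de", default="(no label)")
```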

Flexibility of this EDN Data Structure

This EDN data structure is thus flexible enough to describe any relation within an RDF graph produced by an OSF Web Service.

The advantage of re-using the triple assertions of the RDF data model with types and URIs is that a data consumer can easily handle the data produced by any Web service, even without knowing the type of the subjects, predicates and objects returned by that Web service. The data consumer can always say: I have this thing that refers to this other thing with this given predicate. The data consumer can manipulate results in some ways even if it doesn't know much or anything about the types of those things.

This consistent abstraction is helpful: even if the Web services evolve and change over time, the data consumers of these Web services will be able to handle the things they know and simply discard the newly added types they do not know, all without having to change anything in the procedures that manage the resultsets returned by these Web services.
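This "handle what you know, discard the rest" behaviour can be sketched as a small dispatch table keyed on the subject type. The Python code below is an illustration under stated assumptions: the resultset is mirrored as dictionaries with keyword colons dropped, the handle_subjects name and the handler function are hypothetical, and the ns0:Video type and its URI are invented to stand in for a type added later.

```python
def handle_subjects(resultset, handlers):
    """Dispatch each subject to the handler registered for its type.

    Returns (handled, skipped): the handler results, and the URIs of
    subjects whose type has no registered handler.
    """
    handled, skipped = [], []
    for subject in resultset.get("subject", []):
        handler = handlers.get(subject.get("type"))
        if handler is None:
            skipped.append(subject.get("uri"))
        else:
            handled.append(handler(subject))
    return handled, skipped


def article_title(subject):
    """Hypothetical handler: return the first preferred label, if any."""
    labels = subject.get("predicate", {}).get("iron:prefLabel", [])
    return labels[0] if labels else None


resultset = {
    "subject": [
        {
            "uri": "http://techcrunch.com/?p=1081212",
            "type": "ns0:Article",
            "predicate": {"iron:prefLabel": ["Microsoft's ..."]},
        },
        # Invented record standing in for a type the consumer does not know.
        {"uri": "http://example.org/video/1", "type": "ns0:Video"},
    ]
}

handled, skipped = handle_subjects(resultset, {"ns0:Article": article_title})
```

Supporting a new type later only means registering another handler; the loop itself does not change.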