Instance Record and Object Notation (irON) Specification

Specification Document - 9 November 2011


 * Latest version:
 * http://openstructs.org/iron/iron-specification


 * Last update:
 * $Date: 2011/11/09 12:32:43 $


 * Revision:
 * Revision: 0.91


 * Editors:
 * Frédérick Giasson - Structured Dynamics
 * Michael Bergman - Structured Dynamics


 * Authors:
 * Michael Bergman - Structured Dynamics
 * Frédérick Giasson - Structured Dynamics

Copyright © 2009-2010 by Structured Dynamics LLC.

This irON: instance record and Object Notation by Structured Dynamics LLC is licensed under a Creative Commons Attribution 3.0 license. irON's parsers or converters are separately available under the Apache License, Version 2.0.

= Abstract =

irON (instance record and Object Notation) is an abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON). The notation specification includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. Profiles and examples are also provided for each of the irXML, irJSON and commON serializations.

= Status of This Document =

''NOTE: This section describes the status of this document at the time of its publication. Other documents may supersede this document.''

This specification is an evolving document. Via its code and vocabulary release site, the authors welcome suggestions on the irON notation or its various serializations, including irXML, irJSON and commON. Users and developers are also welcome to participate in the Google discussion group for the irON notation. The current specification is also available for download as a PDF.

This document may be updated or added to based on implementation experience, but no commitment is made by the authors regarding future updates.

= BACKGROUND AND OVERVIEW =

This section provides background information on the specification and this document.

Purpose
irON (instance record and Object Notation) is an abstract notation and associated vocabulary for specifying RDF triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF. The notation supports writing RDF and schema in JSON (irJSON), XML (irXML) and comma-delimited (CSV) formats (commON). The notation specification includes guidance for creating instance records (including in bulk), linkages to existing ontologies and schema, and schema definitions. Profiles and examples are also provided for each of the irXML, irJSON and commON serializations.

irON is premised on these considerations and observations:


 * RDF (Resource Description Framework) is a powerful canonical data model for data interoperability
 * However, most existing data is not written in RDF and many authors and publishers prefer other formats for various reasons
 * Many formats that are easier to author and read than RDF are variants of the attribute-value pair construct, which can readily be expressed as RDF, and
 * A common abstract notation for converting to RDF would also enable non-RDF formats to become somewhat interchangeable, thus allowing the strengths of each to be combined.

The irON notation and vocabulary are designed to allow the conceptual structure ("schema") of datasets to be described, to facilitate easy description of the instance records that populate those datasets, and to link different structures for different schema to one another. In these ways, more-or-less complete RDF data structures and instances can be described in alternate formats and be made interoperable. irON provides a simple and naïve information exchange notation expressive enough to describe almost any data entity.

The notation also provides a framework for extending existing schema. This means that irON and its various serializations can represent many existing, common data formats and standards, while also providing a vehicle for extending them.

For different reasons and for different audiences, the formats of XML, JSON and CSV (spreadsheets) were chosen as the representative formats across which to formulate the abstract irON notation. Further rationale for these choices is discussed under their respective profiles below.

The abstract irON notation is written in a pseudo-XML syntax. Specific syntax examples for each of the three irON serializations are also provided in the code example sections for each format profile.

Terminology of this Document
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC 2119].

Namespace URIs of the general form "http://www.example.com/." represent some application-dependent or context-dependent URI as defined in RFC 2396 [RFC 2396].

As used herein, the name and concept of attribute is used interchangeably with property. Both of these are equivalent to a predicate in an RDF triple. It is recommended that parsers for the various irON serializations recognize both terms (and their variants) interchangeably.

irON Concepts and Vocabulary
irON is based on a set of concepts that provide the common language of this specification:


 * Attribute - each record (and other instances) is characterized by one or more attributes, which provide descriptive characteristics for that record. Every attribute is matched with a value, which can range from descriptive text strings to lists or numeric values. These values optionally have specified formats
 * Type - "type" is a frequent term in many different language specifications. Here, type denotes a class of "things" that is a frequent container (or set) for classifying and describing things in relation to other things. Multiple entities or individuals may be members of a given type, such as Person or Book
 * Record - a structured format for describing the attribute(s) and associated metadata and identifiers for a single bibliographic entity; a record is often defined with a standard structure or scope for conveying the description of multiple records
 * Dataset - a set of one or more similar records that describe similar "things". The dataset is the normal unit ("container") of data exchange between different publishers and different consumers
 * Metadata - dataset-level or record-level metadata that describes broad characteristics of the data itself, such as creation date or author, version number, etc.
 * Schema (structure) - the conceptual relationships between types and for relating attributes, and
 * Linkage (schema) - the specified mapping to relate the nature of class types and attributes in a given dataset (set of records) to other schema or structures.

This section describes these concepts in some detail.

In addition, irON has a set of reserved keywords that comprise its standard irON vocabulary. This vocabulary is composed of the minimal set of attributes used to describe “any” instance record in the world. There is a core set of attributes shared amongst all instance records. Some of these core attributes have been introduced for some specific purposes such as: user interface display, instance record identification and accessibility, instance record data maintenance, etc.

The Attribute Concept
Recall that irON has an attribute:value pair orientation. Any argument that appears in the first part of this pair is an attribute. In various other systems and languages, an "attribute" may also be known as a property, predicate, field, feature, parameter, dimension, characteristic or independent variable. As used herein, the name and concept of attribute is used interchangeably with property and field. These are equivalent to a predicate in an RDF triple.

An attribute in irON is the human-readable description for what its following value means and how it is to be interpreted. Some attribute keywords are classificatory in nature (in which case they are called types), some attributes are processing in nature (signalling a new module and specification to the parser), and some are designed to describe the nature of the instance, either for records or other objects.

Each attribute may be declared with an allowedType. An allowedType must be an Object or one of its sub-types. An Object or its sub-types may be: 1) the general designation of Object itself (separate from a literal or string; see below); 2) a reference to a given class or set separately provided via the type declaration; or, 3) a keyword of dataset, schema or linkage that signals a particular processing module to the parser.

The use of allowedType tells the irON validator whether an incoming record is valid or not according to its schema specification. (Note the declaration is optional for irON schema extensions; not providing the declaration simply means the parser is unable to check the proper domain of the instance.) It is the way that only properly specified data and records are accepted by the system.
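As a rough illustration of the check described above, the following Python sketch validates the references in one record against declared allowedType values. The dictionary layout of the schema, the function name `validate_record` and the sample records are assumptions made for this example only; they are not part of any irON serialization, and global ("@@") references are not handled.

```python
# Hypothetical schema fragment: attribute name -> allowedType (a type name).
# This layout is an assumption for illustration, not an irON serialization.
ALLOWED_TYPES = {
    "author": "Person",
}


def validate_record(record, records_by_id, allowed_types):
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    for attribute, value in record.items():
        expected = allowed_types.get(attribute)
        if expected is None:
            # No allowedType declared: the domain cannot be checked.
            continue
        if not (isinstance(value, str) and value.startswith("@")):
            problems.append(f"{attribute}: expected a reference, got {value!r}")
            continue
        # Strip the single leading "@" of a local reference.
        target = records_by_id.get(value[1:])
        if target is None:
            problems.append(f"{attribute}: no record with id {value[1:]!r}")
        elif expected != "Object" and target.get("type") != expected:
            problems.append(
                f"{attribute}: expected {expected}, got {target.get('type')}"
            )
    return problems


records = {
    "bob": {"id": "bob", "type": "Person", "name": "Bob"},
    "paper1": {"id": "paper1", "type": "Article", "author": "@bob"},
    "paper2": {"id": "paper2", "type": "Article", "author": "@paper1"},
}

print(validate_record(records["paper1"], records, ALLOWED_TYPES))
print(validate_record(records["paper2"], records, ALLOWED_TYPES))
```

The sketch compares type names exactly; a fuller validator would presumably also accept sub-types of the declared allowedType.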

Thus, attributes can be used in many areas of irON. However, their principal use is to provide the descriptive fields for characterizing records.

The Type or Value Format Concepts
The second part of the attribute:value pair orientation is the value. Depending on the kind of attribute that is declared in an irON schema, only certain kinds of values may be accepted for the value argument.

For most record descriptors, the value format for the value is a string (which can be further qualified as to the acceptable format of that string). In other cases, only specified values may be allowed.

When the attribute is type it denotes a class of "things"; that is, a container (or set) for classifying and describing things in relation to other things. Types can be assigned as descriptors for records. Multiple entities or individuals may be members of a given type, such as Person or Book.

By convention, types in irON are initial capitalized, such as Person or Book.

irON provides the allowedValue attribute to enforce proper type assignments or to ensure correct string formatting. As with allowedType (see above), these declarations are provided for the standard irON vocabulary, but are optional for extensions.

The Record Concept
The aim of irON is to describe records. A record is the main concept in the irON notation. A record is simply a means to represent and convey the information (“attributes”) describing a given instance. An instance is the thing at hand. These instances may be individuals (such as a given person or book) or may represent groupings of things (such as the entire holdings or collection of books in a given library).

A record may convey information about multiple instances, but each block of information ("record") for each instance should only pertain to that instance. Thus, for example, if the instance is a paper citation, the instance is the paper. If that paper instance asserts multiple authors, each with different institutional affiliations, those are attributes of the authors, not of the paper.

When you find you are attempting to describe multiple entities in a given record, one option is to create a separate record for each entity and assert their relationships. But this best practice may not always be possible or desired. As alternatives, irON also provides facilities to assign metadata in separate files (see the special metaFile attribute below) or to annotate primary attributes with their own explanatory metadata (see Augmenting Attributes with Metadata below).

When multiple records are being conveyed, they are treated as an array object and designated with the recordList keyword in any serialization profile.

The Dataset Concept
Records do not exist in a vacuum. The dataset provides a container object for records that allows additional information (the metadata) about records in the aggregate. Dataset information is not limited to the descriptions or attributes within the records themselves. A dataset can also be used to describe information about the creation of instance records, and to link external resources to them (like the schema structures and linkages discussed below).

A dataset can be seen as an aggregation of records used to keep a reference between the records and their source (provenance). A dataset can be split into multiple dataset segments. Each segment is written to a file serialized in some way. Each segment of a dataset shares the same id of the dataset.

A dataset is not a database, though a database can be a dataset. As a conveyance of similar records (that is, describing the same basic "things"), datasets have a consistency of scope. A circumstance where there are heterogeneous records for quite disparate "things" would suggest creating multiple datasets to homogenize those representations.

The Schema Concept
The schema structure is used to describe the structural relationships amongst the class types and attributes used to describe records. This schema aims to create basic taxonomies of types and attributes that can be used as a simple graph or network structure to perform simple reasoning over the record instances. The schema structure is also used to define any structural features of a dataset: the class types and formats of the attributes used to describe the records of the dataset.

The Linkage Concept
The schema linkage object is a specification that links the types and attributes used to describe records to types and attributes of other formats and languages used to describe data. The linkage schema leads to transformation rules to convert records in other formats. A set of special attributes has been created to define a schema linkage. Another keyword used for these linkages is the mapTo attribute.
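To make the idea of a linkage concrete, here is a minimal Python sketch that rewrites the local type and attribute names of a record into terms of an external vocabulary. The `LINKAGE` layout and the `apply_linkage` function are hypothetical and exist only for this illustration; the target URIs are taken from the FOAF vocabulary, and names without a mapping are passed through unchanged.

```python
# Hypothetical linkage: local attribute or type name -> external term.
# Each right-hand value plays the role of a mapTo target.
LINKAGE = {
    "attributes": {"name": "http://xmlns.com/foaf/0.1/name"},
    "types": {"Person": "http://xmlns.com/foaf/0.1/Person"},
}


def apply_linkage(record, linkage):
    """Return a copy of the record with mapped type and attribute names."""
    mapped = {}
    for attribute, value in record.items():
        if attribute == "type":
            # The value of "type" is a class name, so map the value itself.
            mapped["type"] = linkage["types"].get(value, value)
        else:
            # For every other attribute, map the attribute name.
            mapped[linkage["attributes"].get(attribute, attribute)] = value
    return mapped


record = {"id": "bob", "type": "Person", "name": "Bob"}
print(apply_linkage(record, LINKAGE))
```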

Relation to RDF
The pivotal premise of irON is the desirability of using the RDF data model as the canonical basis for interoperable data. RDF provides a data model capable of representing any extant data structure and any extant data format. This flexibility makes RDF a perfect data model for federating across disparate data sources.

RDF is a data model that is expressed as simple subject-predicate-object “triples”. A triple is also known as a “statement” and is the basic “fact” or asserted unit of knowledge in RDF. Multiple statements get combined together by matching the subjects or objects as “nodes” to one another, with the predicates acting as connectors or “edges” between those nodes. As these node-edge-node triple statements get aggregated, a network structure emerges, known as the RDF graph. When these connections are coherent, the graph becomes a conceptual and schematic representation of the domain at hand and its relationships, and can be reasoned over and have other useful analysis done to it.

In irON, basic instance data is represented as simple attribute-value pairs where the subject is the instance itself, the predicate is the attribute, and the object is the value. Such instance records are also known as the ABox. The structural relationships within RDF are defined in ontologies, also known as the TBox, which are basically equivalent to a schema. RDF vocabularies and schema guide how participating data can be represented and built up into more complex structures and conceptual world views.
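The mapping from attribute-value pairs to triples can be sketched in a few lines of Python. The record layout (a dictionary with an `id` key) and the function name `record_to_triples` are assumptions for this example; a real converter would also resolve identifiers to URIs and map attribute names to RDF properties.

```python
def record_to_triples(record):
    """Return a list of (subject, predicate, object) tuples for one record."""
    subject = record["id"]  # the instance itself is the subject
    triples = []
    for attribute, value in record.items():
        if attribute == "id":
            continue
        # A multi-valued attribute produces one triple per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, attribute, item))
    return triples


paper = {"id": "paper1", "type": "Article", "author": ["bob", "alice"]}
for triple in record_to_triples(paper):
    print(triple)
```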

The simple design of irON is in keeping with the limited roles and work associated with an ABox. Only attributes and metadata for an instance are being asserted. Conceptual relationships are dealt with separately via the Schema object (see the Structure Schema Object below). Specialized work, such as checking data validity, can be applied against these instance records, but is external to this specification.

The focus of irON, then, is the conveyance of these instance records (ABox) (though there are some limited provisions for communicating the TBox conceptual relationships and linkages to them; see below). The ability of irON to act as it does as an abstract notation across multiple, non-RDF data forms is based on this clean understanding of the roles of the ABox and TBox.

Role and Choice of the Three Profiles
RDF is not yet a common data model. And, in any case, RDF can be serialized with a number of formats such as XML, N3, N-triples, Turtle, or RDFa. However, despite these serialization options, and no matter the format, these RDF variants still are presented and organized around the "triples" construct of subject - predicate - object.

There are far more common data formats in the wild. In order to derive a properly inclusive abstract notation, then, it is important to select a number of these leading formats and to generalize around them. The derivation of the irON abstract notation and vocabulary is thus based on three leading data formats with a diversity of purposes, applications and user bases.

The first serialization selected is XML, or eXtensible Markup Language. XML has become the leading data exchange format and syntax for modern applications. It is frequently adopted by industry groups for standards and standard exchange formats. There is a rich diversity of tools that support the language, importantly including capable parsers and query languages. There is also a serialization of RDF in XML. As implemented in the irON notation, we call this serialization irXML.

The second serialization selected is JSON, JavaScript Object Notation. JSON has become very popular as a Web 2.0 data exchange format and is often the format of choice to drive JavaScript applications. There is a growing richness of tools that support JSON, including support from leading Web and general scripting languages such as JavaScript, Python, Perl, Ruby and PHP. JSON is relatively easy to read, and is also now growing in popularity with lightweight databases, such as CouchDB. As implemented in the irON notation, we call this serialization irJSON.

The third serialization is CSV, or comma-separated values. In existence for decades, but made famous by Microsoft as a spreadsheet exchange format, CSV is very useful since spreadsheets can be used as authoring front-ends and applications for the creation of datasets. CSV is less expressive and capable as a data format than the other irON serializations, yet still has a key-value pair orientation. And, via spreadsheets, datasets can be easily authored and inspected, while also providing a rich tools environment including sorting, formatting, data validation, calculations, etc. As implemented in the irON notation, we call this CSV serialization commON.

The following diagram shows how these three formats relate to irON and then the canonical RDF target data model:



We have used the unique differences amongst XML, JSON and CSV to guide the embracing abstract notation within irON. This design makes RDF the canonical choice for driving all internal tools and services. Via transforms from external forms (and vice versa) RDF becomes the data lingua franca at the core of data interoperability systems.

Once all external data is converted into RDF, this internal representation can then be used for reverse transforms into the original form, a process known as "round-tripping". Because RDF is the more capable data model, some internal RDF capabilities cannot be transformed into these external formats. It is nonetheless possible to transform the exact external input data back to its original form.

After the definition of the abstract irON notation itself, each of these three serializations — irXML, irJSON, and commON — is discussed under its own profile with examples below.

= THE INSTANCE RECORD AND OBJECT NOTATION =

irON (instance record and object notation) is an abstract notation and vocabulary for specifying datasets and instance records that can be converted to RDF under various serializations. The serializations themselves contain the syntax and necessary conventions for that specific format.

In general, each irON serialization — currently available as XML (irXML), JSON (irJSON) and comma-delimited CSV form (commON) — includes most of the abstract notations and vocabulary within irON. However, because of format-specific differences, there are portions of the notation that are not available to a specific serialization profile. These are noted in the individual profile descriptions.

Introduction
There are a number of "objects" or sections possible within an irON specification. These include: datasets; records; attributes; classes or types (types); a (structure) schema; and a schema linkage. Each of these is described below with abstract examples.

The Three Profiles of irXML, irJSON and commON
The abstract irON notation is expressed in three different serializations — irXML, irJSON and commON. Each has a different audience and often slightly different purposes.

Because of these differences, not all generic capabilities of irON nor all of its vocabulary may be applied to each serialization. This section and the following one on Vocabulary and Reserved Keywords describe these differences.

Further, after completion of the discussion of the abstract irON notation, major sub-parts follow that describe each of the three serializations in detail and present code and syntax examples.

Modules or Sections
irON or its serialization specifications may occur in a number of modules, or sections. All are optional. Depending on the serialization, these modules or sections may also be provided or not in separate files.

The module components in irON are:


 * datasets — these are the controlling structures. They have some core attributes that define their linkages to the governing schema and may optionally include metadata or links to a metadata file (via the metaFile attribute) that more generally describes the dataset. Most importantly, the dataset is the wrapper for instance records. When datasets are provided with an already used identifier (id attribute), the dataset acts as a "slice", which could represent new incremental record additions to a previously defined dataset
 * records — instance records are the main vehicle for transmitting actual data. A recordList contains one or more records, each of which is described by a few or many attributes. Some of these attributes are reserved, but most can be freeform and define any useful data characteristic
 * schema — this is the most structured of the modules, and has the capability to describe both the relationships amongst major concepts (classes, called types herein) in the structure and the attributes that help describe those classes and the instances that populate them. The schema provides separate means to describe the overall schema structure (including outline or hierarchical or taxonomic relationships and equivalences and linkages) and defining the types and formats for attributes
 * linkage — this "bridging" module provides the vocabulary for mapping dataset attributes and class types to one or more (internal or external) structural schema. Such structural schemas could be OWL ontologies, a relational database schema, or any other vocabularies used by other systems to describe and exchange data
 * options — this module (not shown in the diagram) is for converter or parser instructions, and is not directly related to the actual data or values of a dataset or instance records.

The relationship between these modules is as follows:



In its current version, the irXML and irJSON serializations support all of these modules. The commON serialization does not at present support the schema module. In commON, one of the other serializations or RDF is necessary at this time to provide a structural schema definition.

Thus, here is the module coverage for the three irON formats:

Also note the xxxList entries in the table above. In irJSON, these entries are actually an array object. In irXML and commON, they are unordered lists processed in sequence until the listing ends.

Files and MIME Types
The MIME types for these serializations are,  , or  , respectively, for irXML, irJSON or commON.

Both irXML and irJSON can be packaged into single or multiple files, with keywords and conventions (see below) signaling the various modules. In commON, at present, all specifications must occur in a single file. For all three serializations, the file type should match the standard extension for that serialization. Namely, that is:


 * - irXML
 * - irJSON
 * - commON.

The Special metaFile Attribute
In addition to these modules, there is also a special metaFile attribute that enables descriptions such as creation specifics, author, etc., to be provided in a separate file. This facility can be helpful when multiple datasets or records re-use identical metadata descriptions. The value of the metaFile attribute must be a fully specified file reference.

Of course, such metadata may also be embedded directly in a dataset or instance record, avoiding the requirement for a separate file.

Vocabulary and Reserved Keywords
irON has a limited vocabulary. Each of the terms in this vocabulary is reserved from general use.

Standard, Reserved Vocabulary
The standard irON vocabulary consists of the following terms. Each term and its use is explained in later sections.

Avoid using any of these vocabulary or attribute names except for the purposes outlined herein.

Primitives
irON has two basic constructs for its assigned values: primitives and types. This section provides the primitive vocabulary.

Primitives are the most basic data structures of irON. In JSON Schema they are referred to as formats, and in XML as datatypes. These are the strings and integers of this world. At their core, primitives are represented as a literal: a sequence of characters composing a whole. Each primitive is a rule, or a set of rule patterns, that tells how the primitive value should be formatted, presented or described. The only unrestricted literal is called the String primitive.

Primitives are different things depending on the irON serialization profile. For irJSON, with its JSON serialization, primitives are referred to as formats. (In the XML serialization of irON the similar concept is referred to as datatypes.) In JSON, primitives can be validated by using JSON Schemas, and in XML value formats can be validated using XML Schemas. Primitives are thus a general concept applicable to most common data serializations.
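The notion of a primitive as a formatting rule over a literal can be sketched with regular expressions in Python. Only the String primitive is named in the text above; the `Integer` and `Date` entries, their patterns, and the function name `matches_primitive` are illustrative assumptions and should not be read as the irON primitive list.

```python
import re

# Primitive name -> rule. None means "no restriction" (the String primitive).
# "Integer" and "Date" are hypothetical entries used only for this example.
PRIMITIVE_PATTERNS = {
    "String": None,
    "Integer": re.compile(r"[+-]?\d+"),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def matches_primitive(value, primitive):
    """Return True if the literal satisfies the rule of the named primitive."""
    if not isinstance(value, str):
        return False
    pattern = PRIMITIVE_PATTERNS[primitive]
    if pattern is None:
        return True
    # fullmatch requires the whole literal to follow the pattern.
    return pattern.fullmatch(value) is not None


print(matches_primitive("2011-11-09", "Date"))
print(matches_primitive("9 November 2011", "Date"))
```

In a real profile this check would more likely be delegated to a JSON Schema or XML Schema validator, as the text notes.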

Note that the list of primitives can be extended by specialized BibJSON parsers.

Types
irON also has types of records. Each record has at least one type.

General Object Types
The more general type is the Object type. All other record types are sub-types of the Object type. The type of a record is asserted using the type attribute, or by the reserved processing keywords of dataset, schema or linkage.

Except for the three reserved sub-types, the predominant use of type is for classifying entity records via the type declaration. Inference can be performed on the hierarchy of types, which are themselves declared in the schema via the subTypeOf attribute (among other structural properties). This means that if we have a Magazine sub-type of Periodical that is a sub-type of Collection, then we can infer that a record of type Magazine is also a record of types Periodical and Collection.

Even if not defined in any schema, by convention all types are subTypeOf the Object root type.
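The inference over the type hierarchy can be sketched as a walk up the subTypeOf declarations. The following Python example uses the Magazine, Periodical and Collection types from the text; the `SUB_TYPE_OF` dictionary and the function name `inferred_types` are assumptions for illustration, and the sketch assumes each type declares at most one parent.

```python
# Hypothetical flattened view of subTypeOf declarations: child -> parent.
SUB_TYPE_OF = {
    "Magazine": "Periodical",
    "Periodical": "Collection",
}


def inferred_types(type_name, sub_type_of):
    """Return the type, its declared ancestors, and the Object root, in order."""
    chain = [type_name]
    seen = {type_name}
    current = type_name
    while current in sub_type_of:
        current = sub_type_of[current]
        if current in seen:
            # Guard against accidental cycles in the declarations.
            break
        chain.append(current)
        seen.add(current)
    # By convention every type is a sub-type of Object, declared or not.
    if "Object" not in seen:
        chain.append("Object")
    return chain


print(inferred_types("Magazine", SUB_TYPE_OF))
print(inferred_types("Book", SUB_TYPE_OF))
```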

Note the matrix also indicates whether the term is required or suggested (if not recommended, it is optional), what major module the term may belong to (shaded section), and whether the term applies to one of the three serializations:

Almost all of the vocabulary applies to the irXML and irJSON serializations. The commON specification does not include vocabulary related to the (structure) schema and the metaFile attribute.

Another intent of the specification is to be sparse in terms of requirements. For instance, this reserved vocabulary is fairly minimal and optional in almost all cases. The irON specification supports skeletal submissions.

Other Reserved Terms
Though not strictly prevented, it is best practice to avoid vocabulary in standard use for a given serialization. It is best to avoid target RDF or related standard vocabulary such as sameAs, seeAlso or equivalentClass, for example.

While the irON and related processors will accept these terms, they can prove problematic after ingest and conversion.

A Note on User Interface Attributes
As a general aid to user interfaces, we recommend some standard (shared amongst all datasets and instance records) attributes to describe datasets and instance records. These attributes have primarily been introduced to help user interface systems to display, search, etc., these instance records.

Local or Global References
To make irON specifications easier to read, this specification allows either local or global references to the locations of the cross-referenced objects. This section explains these conventions and how, when used, the references are resolved to their full address. The local ID of a record is local to its dataset. The global ID of a record is a URI. Each global ID should be resolvable on the specified network.

Reference Resolution
There are two kinds of IDs:


 * 1) ID of a Dataset
 * 2) ID of an instance record

The ID of an instance record is a partial ID local to the dataset. The ID of the dataset is the base ID used to create a complete reference ID of the instance records of the dataset. A full ID is created by concatenating the (base) ID of a dataset with the ID of an instance record. The full ID created has to be a valid URI.

If the value of an attribute starts with "@", it means that the value is a reference to a local ID. If the value of an attribute starts with "@@", it means that the value of the attribute refers to a global ID (URI). If the value refers to a local ID, it means that the record it references is in the same dataset. If the value refers to a global ID, it means that this instance record can be local, or remote. If it is remote, its representation should be resolvable on the network specified by the URI. Here is the figure describing this resolution mechanism:
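Independently of the figure, the resolution rules of this section can be sketched in Python as follows. The function name `resolve_reference` and the example dataset ID are assumptions for this illustration; the sketch only builds the full ID and does not check that the result is a valid or resolvable URI.

```python
def resolve_reference(value, dataset_id):
    """Return the full ID for a reference value, or None for a plain literal."""
    # Test "@@" first, since a global reference also starts with "@".
    if value.startswith("@@"):
        return value[2:]  # already a global ID (URI)
    if value.startswith("@"):
        # Local ID: concatenate the dataset (base) ID with the record ID.
        return dataset_id + value[1:]
    return None  # not a reference


base = "http://example.org/datasets/people/"
print(resolve_reference("@bob", base))
print(resolve_reference("@@http://example.com/people/alice", base))
print(resolve_reference("Bob", base))
```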



Dataset Object
A Dataset is used to document information about the creation of instance records, and to link external resources to them (like the linkage and structure schemas; more about this below).

A Dataset can be seen as an aggregation of instance records used to keep a reference between the instance records and their source (provenance). A dataset can be split into multiple dataset slices. Each slice can be written in a separate file. Each slice of a dataset shares the same id of the dataset.
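One way a consumer might reassemble slices is sketched below in Python. It assumes that slices are matched on the dataset id attribute, as the description of the datasets module suggests, and that each slice carries its records in a recordList; the function name `merge_slices` and the sample data are hypothetical.

```python
def merge_slices(slices):
    """Group the records of dataset slices by the dataset id they share."""
    datasets = {}
    for dataset_slice in slices:
        dataset_id = dataset_slice["id"]
        # The first slice seen for an id creates the dataset entry.
        merged = datasets.setdefault(
            dataset_id, {"id": dataset_id, "recordList": []}
        )
        # Later slices with the same id act as incremental additions.
        merged["recordList"].extend(dataset_slice.get("recordList", []))
    return datasets


slices = [
    {"id": "http://example.org/ds/people/", "recordList": [{"id": "bob"}]},
    {"id": "http://example.org/ds/books/", "recordList": [{"id": "book1"}]},
    {"id": "http://example.org/ds/people/", "recordList": [{"id": "alice"}]},
]
merged = merge_slices(slices)
print(len(merged["http://example.org/ds/people/"]["recordList"]))
```

The sketch simply appends records; how updates to an existing record should be reconciled is left open here.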

The dataset attribute introduces the Dataset object. This object is composed of multiple "string": "value" references. Each string refers to an attribute. Each value can be a string, an array of strings, or an object. The meaning and usage of each attribute is described below.

Core Dataset Attributes
These are the core attributes recommended to be included with any dataset or dataset slice specification.

Abstract Dataset Specification Example
Here is an example of an abstract dataset specification, with additional attributes beyond the core.

Adding Metadata
The metaFile attribute refers to a separate instance record file. (Alternatively, the same attributes used for metadata may be embedded in the dataset specification itself.) This dataset and slice design allows:


 * 1) Reuse of the metadata (and its file) if desired
 * 2) de minimis specification for a dataset slice as opposed to a fully specified dataset
 * 3) A means for separately tracking dataset slice metadata and provenance from the metadata of the dataset itself
 * 4) A simple dataset "wrapper" for dataset slices for adding or updating records to a dataset, and
 * 5) A flexible and expandable metadata framework that piggybacks on the structure of an instance record.

Suggested Metadata Attributes
Note these attributes follow the general instance record object specification (see below), and may contain any arbitrary attributeName attributes as desired. Alternatively, as noted, these same attributes and values may be embedded within the dataset specification above.

Instance Record Object
The recordList attribute refers to an array (irJSON) or unordered listing (irXML and commON) of instance records. An individual record is denoted by the attribute record.

Note: The names of the attributes to be used in the instance records specification must be equivalent to the keywords shown unless otherwise indicated.

Abstract Instance Record Specification Example
In pseudo-form, here is the set of core instance record attributes:

Multiple record objects can be wrapped in the recordList attribute.

Structure Schema Object
The structure schema is used to describe the structural relationships amongst the class types and attributes used to describe instance records. This schema aims to create basic taxonomies of types and attributes that can be used as a simple TBox to perform simple reasoning over the instance record instances. The structure schema is also used to define any structural features of a dataset: the class types and formats of the attributes used to describe the instance records of the dataset.

Note 1: The names of the attributes to be used in the schema specification must be equivalent to the keywords shown unless otherwise indicated.

Note 2: To enable the use of more complex TBoxes for the instance records that have been described, the schema linkage has to be used to link the types and attributes of the instance records to the types and attributes of the more complex TBox format/language.

Note 3: At this time, the structure schema object is NOT available for the commON serialization. commON records can still be linked to schema (see next section), but the schema specification must occur in a non-commON manner.

Structure Schema Attributes
Note 1: If a type is not specified, the type is assumed to be "any" and no validation can be performed against its values.

Note 2: The structure Schema is also used by the system to list all the class types and attributes used to describe instance records from a particular dataset.
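The fallback described in Note 1 can be sketched as follows. This is a minimal illustration, assuming a small set of hypothetical type names ("string", "integer", "boolean"); the actual irON type vocabulary may differ, and the function name is invented for this example.

```python
from typing import Any, Optional

# Hypothetical type names used for illustration only.
CHECKERS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate_value(value: Any, declared_type: Optional[str] = None) -> Optional[bool]:
    """Return True or False when a check is possible, None when it is not."""
    # An unspecified type is treated as "any".
    effective_type = declared_type or "any"
    if effective_type == "any":
        return None  # nothing to validate against
    if effective_type not in CHECKERS:
        raise ValueError(f"unknown type: {effective_type}")
    return CHECKERS[effective_type](value)


print(validate_value("2006"))             # no declared type, so no validation
print(validate_value(2006, "integer"))
print(validate_value("2006", "integer"))
```

Returning None rather than True for the "any" case keeps "not checked" distinguishable from "checked and valid".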

Schema Linkage Object
The schema linkage object is a new kind of specification that links the types and attributes used to describe instance records to types and attributes of other formats and languages used to describe data. The linkage schema leads to transformation rules to convert instance records into other formats. A set of special attributes has been created to define a schema linkage as described in the table below.

Note: The names of the attributes to be used in the schema specification must be equivalent to the keywords shown unless otherwise indicated.
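To make the idea of linkage-driven transformation rules more concrete, here is a rough sketch in Python. It assumes, purely for illustration, that a linkage can be reduced to two plain lookup tables (one for attributes, one for types); the function name, the table shapes and the target field names are all hypothetical and do not reproduce the actual linkage vocabulary.

```python
from typing import Dict


def apply_linkage(record: Dict[str, str],
                  attribute_map: Dict[str, str],
                  type_map: Dict[str, str]) -> Dict[str, str]:
    """Rename a record's attributes and type according to the lookup tables."""
    converted = {}
    for name, value in record.items():
        if name == "id":
            converted["id"] = value  # the identifier is carried over unchanged
        elif name == "type":
            converted["type"] = type_map.get(value, value)
        elif name in attribute_map:
            converted[attribute_map[name]] = value
        # Attributes without a mapping are dropped in this sketch.
    return converted


record = {"id": "MR2276901", "type": "Article",
          "prefLabel": "Two recursive decompositions of Brownian bridge",
          "internalNote": "not mapped"}
# Hypothetical mapping toward a BibTeX-like target vocabulary.
print(apply_linkage(record, {"prefLabel": "title"}, {"Article": "article"}))
```

Defining a second pair of tables for another target format would convert the same record without touching the record itself, which is the flexibility the linkage is meant to provide.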

Linking a Dataset to a Structure Schema
There are two ways to link a dataset to a Linkage or structure Schema:


 * 1) By using the linkage or schema to link a dataset to the description of these schemas.
 * 2) By embedding the linkage or schema in specification.

Note: the list of "linkage" means that more than one linkage can be defined for a dataset. The goal is to enable data descriptors to be able to define schema linkages for multiple formats and languages if needed.

Processing Options
For the commON serialization only, some may prefer to present and manipulate their information with slightly different options for conventions. These processing choices are invoked via the options keyword. The current ones available in irON are the following:

The  attribute sets what the delimiter is for a list of values for a single attribute. The default value is the pipe characters ('|'), though these characters are also possible: comma (','), semi-colon (';'), and squiggle ('~'). The data publisher has to specify a value for the  attribute to tell the commON processor engine how to escape/unescape the list separator character.
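The splitting and unescaping of a multi-valued cell can be sketched as below. This is not the commON processor's implementation: the function and parameter names are invented here, and the backslash is only an assumed escape value, since the escape is whatever the data publisher specifies.

```python
from typing import List


def split_list_value(cell: str, separator: str = "|", escape: str = "\\") -> List[str]:
    """Split a cell on the separator, honoring an escaped separator."""
    values, current = [], []
    i = 0
    while i < len(cell):
        ch = cell[i]
        if ch == escape and i + 1 < len(cell) and cell[i + 1] == separator:
            # An escaped separator is kept as a literal character.
            current.append(separator)
            i += 2
        elif ch == separator:
            values.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    values.append("".join(current))
    return values


print(split_list_value("Aldous, David|Pitman, Jim"))
print(split_list_value("a\\|b|c"))
print(split_list_value("x;y;z", separator=";"))
```

The default pipe separator has the practical advantage that commas inside values, as in the author names above, need no escaping.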

The  attribute is a Boolean (yes, no) value. If set to yes, a value is added to the instance record listings that enables all values to be sorted in offline applications (especially spreadsheets for the commON serialization).

There are also a couple of instance record styles that are acceptable for commON, as described under its Profile below.

Augmenting Attributes with Metadata
The irON notation describes instance records using attribute/value pairs. As we noted above, these can be mapped to subject-predicate-object triples in RDF since the subject is implied by the instance record itself. This means that all instance records are described using attribute/value statements. However, there are some cases where we may want to state something about these statements. These statements about statements are a form of metadata (which is called "reification" in the RDF realm).

A reification statement is a statement about a statement. Generally, you can view a reification statement as being some kind of "meta" information about statements. However, since the "reification" terminology is a bit unusual for most people, we use "metadata" as a more understandable substitute.

Metadata can be used in multiple use-cases. It can be used to annotate information (information described using statements). It can be used to add specific information about a statement such as the date when it has been stated, the creator of the statement, etc. It can be used to describe information about how a specific statement should be rendered in some user interface. And, so forth.

One main metadata use case for irON is to specify how specific statements should be rendered in some user interfaces, as described in the Note on User Interface Attributes section.

Description of Use Case
Another use case can be shown where the instance, for example, is a paper citation. This paper citation has multiple authors. Though the instance is about the paper and not its authors, we may also want to state the institutional affiliation of each of the paper's multiple authors. We do this via the metadata ("reification") convention in irON.

Metadata Example
For the example above, we have the abstract irON notation of a triple id-affiliated-ref where id is the ID of the subject record, and ref is a reference to the object record. Both records are related by the affiliated attribute (or relation). The  and   attributes are reification statements or metadata about this triple statement (that is, they follow immediately after the ref reference).

If we take as the example that Bob-affiliated-SomeUniversity then the  could be the name of this affiliation to display in some user interface (Web page), and the   could be a reference to a Web page that talks about this affiliation between Bob and http://someUniversity.edu.

This metadata that further describes the primary attributes (affiliated, in this case) with additional information can be an easy shorthand and provide immediately useful information to user interfaces (for example). This irON convention provides a mechanism, in essence, for expanding the depth and richness of the instance characterizations.
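The relationship between a primary statement and its metadata can be sketched with a small data structure. This is an illustration only: the class and method names are invented, the metadata keys prefLabel and prefURL are borrowed from the irXML example elsewhere in this specification as an assumption, and the example.org URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Statement:
    """A primary id-attribute-ref statement plus metadata about it."""
    subject: str
    attribute: str
    ref: str
    metadata: Dict[str, str] = field(default_factory=dict)

    def triple(self) -> Triple:
        # The primary statement, readable as subject-predicate-object.
        return (self.subject, self.attribute, self.ref)

    def metadata_statements(self) -> List[Tuple[Triple, str, str]]:
        # Each metadata entry is a statement whose subject is the triple itself.
        return [(self.triple(), key, value) for key, value in self.metadata.items()]


stmt = Statement(
    "Bob", "affiliated", "SomeUniversity",
    metadata={
        "prefLabel": "Professor at Some University",      # assumed display name
        "prefURL": "http://example.org/bob-affiliation",  # placeholder URL
    },
)
print(stmt.triple())
for meta in stmt.metadata_statements():
    print(meta)
```

A consumer that does not understand the metadata can still use triple() on its own, which mirrors the concern raised in the limitations discussed next.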

Limitations to Reified Metadata
Unfortunately, you are not always assured that systems ingesting such reification statements will actually recognize them or, more often, store them persistently. Further, once one begins describing metadata about primary attributes, the temptation is to nest those characterizations even further. If we can describe the author's institution, why not the city that institution is located in or whether it is public or private?

For these reasons, it may be safer (though more cumbersome) to devote a separate instance record to each author and more fully describe the author there. Still, when there is confidence in the processing application, it can be quite efficient to use the irON metadata shorthand.

The syntax for the irON attribute metadata shorthand differs by serialization.

Guidelines for Dataset Scoping
The irON notation is not applicable to all datasets nor all circumstances. However, there are a couple of guidelines that can extend the applicability and usefulness of this notation.

First, try to limit your datasets to relatively similar "things". If your problem domain at hand involves much data and relationships, try to segregate or cluster multiple contributing datasets according to the similarity of the instances (thus, people v products v organizations v localities v events, etc.).

Second, and related, try to scope each record within a dataset to the instance itself. References to external things or entities are fine and work great, but try to define the attributes of those external things into their own datasets.

In these manners you can keep attributes listings bounded and manageable. You will also keep datasets more understandable and without requiring massively dimensioned tables or structure. These considerations will also help bound the ability to create input templates with validation and controlled vocabularies based on existing applications.

= SUB-PART 1: irXML PROFILE =

This sub-part of the irON specification describes the eXtensible Markup Language (XML) serialization, irXML.

Role and Use
The purpose of irXML is to provide a standard syntax and interchange format for the irON notation. Based on the eXtensible Markup Language (XML), irXML provides a syntax and serialization well understood by most enterprise developers.

Via the shared irON notation, irXML also offers a pathway for moving appropriate XML data structures into RDF, JSON or CSV. As such, irXML is likely less an authoring environment than a notation for cross-format conversions.

The specifications provided herein can be used in separate, modular ways in multiple files, or combined. The linkage and structure schema provide useful flexibility and extensibility to the basic XML notation.

The MIME type for irXML is ; files written in it should have the   extension.

No Current Processor
Unlike irJSON and commON, which have available parsers and processors, irXML has not yet been committed to code. As a result, its specification, while useful from an understanding and educational viewpoint, has not yet been tested with applications. This testing will likely result in changes.

We invite knowledgeable XML developers to tackle a conversion. The editors would be pleased to provide assistance to any group that wishes to incorporate this option.

Differences from Generic irON
The entire vocabulary and set of modules and objects in irON are available and used by irXML. For all xxxList attributes, irXML treats them as unordered lists, rather than arrays as in irJSON.

As a result, attributes for individual items must be used in irXML. These two specific irON attributes are attribute and record.

The options keyword is not used in irXML, nor its three attributes of,  , and   (it is advisable to avoid use of these terms so that conversions to commON are not confused).

Augmenting Attributes with Metadata
The method for adding metadata to primary instance attributes is in line with the standard irON notation:

Altered Keyword Set
irXML has these vocabulary differences from the standard irON vocabulary:

It is recommended not to use the Not Used terms in an irXML specification as they might pose conflicts with other irON serializations. In addition, the meaning of format in irXML differs from irJSON in that the reference structure is the data types in XML Schema (XSD).

Summary of Conventions

 * 1) irXML strictly conforms to the XML syntax
 * 2) irON vocabulary keywords are reserved
 * 3) All standard irON conventions and usages are followed
 * 4) Listed items are provided as unordered lists, not arrays.
 * 5) irXML files can be validated using XML schemas or DTDs
 * 6) Any XML technologies can be used over irXML files such as XSL Transformations (XSLT)

XML File Structure
Each XML document is composed of a root dataset object. This root object wraps the standard dataset definition objects. The recordList element introduces a list of record elements, which are the records belonging to the dataset.
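Since no irXML processor exists yet, the following is only a sketch of how such a document might be read with Python's standard library. The sample document is invented for illustration, and it assumes that the record list is nested directly inside the root dataset element; the id, type and prefLabel element names follow the irON vocabulary used elsewhere in this specification.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Invented sample; the nesting of recordList inside dataset is an assumption.
IRXML = """<dataset>
  <id>http://example.org/datasets/demo/</id>
  <prefLabel>Demo dataset</prefLabel>
  <recordList>
    <record>
      <id>jpitman</id>
      <type>Person</type>
      <prefLabel>Jim Pitman</prefLabel>
    </record>
    <record>
      <id>ustanford</id>
      <type>Organization</type>
      <prefLabel>Stanford University</prefLabel>
    </record>
  </recordList>
</dataset>"""


def read_records(xml_text: str) -> List[Dict[str, str]]:
    """Return one dict of attribute name to text per record element."""
    root = ET.fromstring(xml_text)
    if root.tag != "dataset":
        raise ValueError("root element must be 'dataset'")
    records = []
    for record in root.iterfind("recordList/record"):
        # Single-valued attributes only; repeated elements would overwrite.
        records.append({child.tag: (child.text or "").strip() for child in record})
    return records


for rec in read_records(IRXML):
    print(rec["id"], rec["type"], rec["prefLabel"])
```

Because irXML is plain XML, the same document could equally be checked against an XML schema or transformed with XSLT before being read this way.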

Dataset Object
The dataset keyword introduces the Dataset XML object. This element is composed of multiple sub-elements describing the attributes of a dataset. Each sub-element refers to an attribute. Each value of each attribute element can be a literal (or something else if defined in an XML schema or a DTD), or a local or global reference to another instance record. The meaning and usage of each attribute is as described for the generic irON notation.

Using the metaFile Attribute
Here is an irXML dataset example using the metaFile attribute.

Embedding the Metadata
Alternatively, rather than invoke a separate instance record file with the metadata information, it can also be included with the dataset specification:

... 

Instance Record Object
The record attribute wraps the specific attributes for each instance object. The recordList attribute wraps one or more records.

Instance Record Example
Note: The names of the attributes to be used in the instance records specification must be equivalent to the keywords shown unless otherwise indicated.

Structure Schema Object
XML Schema, DTDs and other XML technologies such as XSL Transformations (XSLT) can be used to define schemas (and processing tools) for describing instance records. They can be used to specify how attributes and types of objects should be described, using what attributes and what values.

The structure schema is used to describe the structural relationships amongst the types and attributes used to describe instance records, and is introduced via the schema keyword element. This schema aims to create basic taxonomies of types and attributes that can be used as a simple TBox [4] to perform simple reasoning over the instance records. The structure schema is also used to define any structural features of a dataset: the types and formats of the attributes used to describe the instance records of the dataset. The typeList element refers to a list of class types. Each class type is an XML element.

Structure Schema Example
Note 1: The names of the attributes to be used in the schema specification must be equivalent to the keywords shown unless otherwise indicated.

Note 2: To enable the use of more complex TBoxes for the instance records that have been described, the schema linkage (see next) has to be used to link the types and attributes of the instance records to the types and attributes of the more complex TBox format/language.

Linkage Object
The schema linkage is a new kind of specification that aims to link the class types and attributes used to describe instance records to class types and attributes of other formats and languages; it is introduced via the linkage keyword element. The schema linkage leads to transformation rules to convert instance records into other formats. A set of special attributes has been created to define this linkage as described in the table below.

Linkage Example
The names of the attributes to be used in the schema specification must be equivalent to the keywords shown unless otherwise indicated.

Linking a Dataset to a Structure or Schema Linkage
There are two ways to link a dataset to a Linkage or structure Schema:


 * 1) By using the linkage or schema to link a dataset to the description of these schemas.
 * 2) By embedding the linkage or schema in an XML object.

Here are two examples demonstrating each possibility:

Case #2
Note: The list of "linkage" means that more than one linkage can be defined for a dataset. The goal is to enable data descriptors to be able to define schema linkages for multiple formats and languages if needed.

Specific irXML Examples
Here is a set of examples that show you the irXML notation and vocabulary in action.

Example #1: First irXML Example
Here is an example of the description of a bibliographic record using irXML. This example demonstrates the publication of an article, in a book which is part of a series.

Note: The irXML schema related to this example could be changed so that the information about the series, the book and the publisher are part of the description of the article. This has to be decided by the data publisher (the one that creates the dataset).

 Jim Pitman http://www.stat.berkeley.edu/~pitman/  http://dataset.com/schema/linkage.js   http://dataset.com/schema/structure.js

 MR2276901 Article Two recursive decompositions of Brownian bridge related to the asymptotics of random mappings Aldous and Pitman (1994) studied asymptotic distributions, as n tends to infinity, of various functionals of a uniform random mapping of a set of n elements, by constructing a mapping-walk and showing these mapping-walks converge weakly to a reflecting Brownian bridge. Two different ways to encode a mapping as a walk lead to two different decompositions of the Brownian bridge, each defined by cutting the path of the bridge at an increasing sequence of recursively defined random times in the zero set of the bridge. The random mapping asymptotics entail some remarkable identities involving the random occupation measures of the bridge fragments defined by these decompositions. We derive various extensions of these identities for Brownian and Bessel bridges, and characterize the distributions of various path fragments involved, using the theory of Poisson processes of excursions for a self-similar Markov process whose zero set is the range of a stable subordinator of index between 0 and 1. 
2006 Aldous, David</prefLabel> </metaData>  Jim Pitman</prefLabel> http://www.stat.berkeley.edu/~pitman/</prefURL> </metaData> <isPartOf>  In memoriam Paul-André Meyer: Séminaire de Probabilités</prefLabel> </metaData> </isPartOf> jpitman</id> Person Jim Pitman</prefLabel> http://www.stat.berkeley.edu/~pitman/</prefURL> Jim Pitman http://www.stat.berkeley.edu/~pitman/ ustanford</id> Organization Stanford University</prefLabel> http://www.stanford.edu/</prefURL> book_id</id> Book In memoriam Paul-André Meyer: Séminaire de Probabilités</prefLabel> In memoriam Paul-André Meyer: Séminaire de Probabilités  Michel Émery</prefLabel>  <metaData> <prefLabel>Marc Yor</prefLabel> <metaData> <isPartOf> <metaData> <prefLabel>Lecture Notes in Math.</prefLabel> <metaData> </isPartOf> <id>series_id</id> Series <prefLabel>Lecture Notes in Math.</prefLabel> 1874         <metaData> <prefLabel>Springer</prefLabel> <metaData> </recordList>

Converting irXML into RDF
This section is not yet drafted. It will explain how an irXML-to-RDF converter can be written, and how it is expected to behave. We will explain how a linkage schema can be used to create transformation rules that will take irXML statements and then create RDF triples by applying the rules defined in the linkage schema.

= SUB-PART 2: irJSON PROFILE =

This sub-part of the irON specification describes the JavaScript Object Notation (JSON) serialization, irJSON. Its use and genesis were spurred by development of the BibJSON specification for the Bibliographic Knowledge Network project. Technically speaking, BibJSON is a specific instantiation of the irJSON specification.

Role and Use
The purpose of irJSON is to enable datasets, instance records, data structures and the linkages between them to be specified using the JavaScript Object Notation (JSON). JSON is the native data input form for JavaScript Web applications and widgets and has become a common data exchange format and framework in its own right. JSON is a little cumbersome to write by hand, but is readily supported by all leading scripting languages with many libraries, validators, converters and editors extant for reading and ingesting JSON data.

Even the simplest key-value pair representation of an instance record needs some syntactic grammar and some interpretation conventions. In irJSON, the full range of the JSON syntax is used to serialize instance records. Some conventions are added along with the vocabulary to properly interpret the JSON syntax in the context of an irON instance record.

The specifications provided herein can be used in separate, modular ways in multiple files, or combined. The linkage and structure schema provide unlimited flexibility and extensibility to the basic JSON notation.

The MIME type for irJSON is ; files written in it should have the   extension.

Differences from Generic JSON
The full range of the JSON syntax is used to serialize instance records. Some conventions are added along with the vocabulary to properly interpret the JSON syntax. This means that all the JSON rules and conventions are applied here: the data structures available, the encoding practices, etc. However, there are some terminology differences between the two notations. Here is a table that clarifies the meaning of each concept:

Though not an official part of the JSON notation, the schema logic within irJSON also builds on and is consistent with the emerging JSON Schema effort. JSON Schema is a specification for a JSON-based format for defining the structure of JSON data. Furthermore, the formats supported by irJSON under the format attribute are the same as those specified for JSON Schema.

Properly formatted irJSON files will validate with the standard JSON validator, JSLint. Though there are JSON Schema validators also available, none of those are yet at a sufficient state of maturity to validate irJSON schema.

Differences from Generic irON
The entire vocabulary and set of modules and objects in irON are available and used by irJSON. For all xxxList attributes, irJSON treats them as objects with arrays, the same as JSON.

The options keyword is not used in irJSON, nor its three attributes of,  , and   (it is advisable to avoid use of these terms so that conversions to commON are not confused).

Summary of Conventions

 * 1) irJSON strictly conforms to the JSON syntax
 * 2) irON vocabulary keywords are reserved
 * 3) All standard irON conventions and usages are followed.

JSON File Structure
Each JSON document is composed of a root JSON object. This root object is composed of a "dataset" string (attribute) and a "recordList" string (attribute). The "dataset" attribute introduces the Dataset object. The "recordList" attribute introduces an array of instance record objects.
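The root structure just described can be checked with a few lines of Python. This is a sketch under stated assumptions rather than the behavior of the existing irJSON parser: the sample document, the function name and the author attribute are invented for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

# Invented sample document; "author" is a hypothetical attribute name.
IRJSON = """
{
  "dataset": {
    "id": "http://example.org/datasets/demo/",
    "prefLabel": "Demo dataset"
  },
  "recordList": [
    {"id": "jpitman", "type": "Person", "prefLabel": "Jim Pitman"},
    {"id": "MR2276901", "type": "Article", "author": "@jpitman"}
  ]
}
"""


def load_irjson(text: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Parse a document and return its dataset object and record array."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("the root must be a JSON object")
    dataset = doc.get("dataset")
    if not isinstance(dataset, dict):
        raise ValueError("'dataset' must be a JSON object")
    records = doc.get("recordList", [])
    if not isinstance(records, list):
        raise ValueError("'recordList' must be an array")
    return dataset, records


dataset, records = load_irjson(IRJSON)
print(dataset["prefLabel"])
print([record["id"] for record in records])
```

In this sample the "@jpitman" value is a local reference, so it points at the first record of the same recordList.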

Dataset Object
The "dataset" attribute introduces the Dataset JSON object. This object is composed of multiple "string": "value" references. Each string refers to an attribute. Each value can be a string, an array, or an object. The meaning and usage of each attribute is as described for the generic irON notation.

Using the metaFile Attribute
Here is an irJSON dataset example using the metaFile attribute:

Embedding the Metadata
Alternatively, rather than invoke a separate instance record file with the metadata information, it can also be included with the dataset specification directly:

Instance Record Object
The recordList attribute refers to an array of instance record(s). Each instance record is a JSON object.

Instance Record Example
Note: The names of the attributes to be used in the instance records specification must be equivalent to the keywords shown unless otherwise indicated.

Structure Schema Object
JSON Schema can be used to define schemas for describing instance records (it also provides a reference to common data type formats). Such schemas can be used to specify how attributes and types of objects should be described, using what attributes and what values.

The structure schema is used to describe the structural relationships amongst the types and attributes used to describe instance records. This schema aims to create basic taxonomies of types and attributes that can be used as a simple TBox [4] to perform simple reasoning over the instance records. The structure schema is also used to define any structural features of a dataset: the types and formats of the attributes used to describe the instance records of the dataset.

The typeList attribute refers to an array of class types. Each class type is a JSON object.

Structure Schema Example
Note 1: The names of the attributes to be used in the schema specification must be equivalent to the keywords shown unless otherwise indicated.

Note 2: To enable the use of more complex TBoxes for the instance records that have been described, the schema linkage (see next) has to be used to link the types and attributes of the instance records to the types and attributes of the more complex TBox format/language.

Linkage Object
The schema linkage is a new kind of specification that aims to link the class types and attributes used to describe instance records to class types and attributes of other formats and languages. The schema linkage leads to transformation rules to convert instance records into other formats. A set of special attributes has been created to define this linkage as described in the table below. Examples #3 and #4 below demonstrate linkages between a dataset and external formats such as BibTeX and RDFS/OWL ontology properties and types.

Linkage Example
Note: The names of the attributes to be used in the schema specification must be equivalent to the keywords shown unless otherwise indicated.

Linking a Dataset to a Structure or Schema Linkage
There are two ways to link a dataset to a Linkage or structure Schema:


 * 1) By using the linkage or schema to link a dataset to the description of these schemas.
 * 2) By embedding the linkage or schema in a JSON object

Here are two examples demonstrating each possibility:

Case #2
Note: the list of "linkage" means that more than one linkage can be defined for a dataset. The goal is to enable data descriptors to be able to define schema linkages for multiple formats and languages if needed.

Specific irJSON Examples
Here is a set of examples that show you the irJSON notation and vocabulary in action.

Example #1: Bibliographic Record
Here is an example of the description of a bibliographic record using irJSON. This example demonstrates the publication of an article, in a book which is part of a series.

Note: the irJSON schema related to this example could be changed so that the information about the series, the book and the publisher are part of the description of the article. This has to be decided by the data publisher (the one that creates the dataset).

Example #3: An irJSON Bibliographic Vocabulary to BibTeX Schema Linkage
Here is an example of a schema linkage. This schema describes the relationships between an irJSON bibliographic vocabulary and BibTeX.

Example #4: An irJSON Bibliographic Vocabulary to RDF Schema Linkage
This other example demonstrates how we can create another linkage file to link the attributes and types of this irJSON bibliographic vocabulary to RDFS/OWL ontology properties and classes. This example shows the flexibility of an irJSON linkage file and how it can be used to link the same simple instance record vocabulary to different formats and languages.

Converting irJSON into RDF
This section is not yet drafted. It will explain how an irJSON-to-RDF converter can be written, and how it is expected to behave. We will explain how a linkage schema can be used to create transformation rules that will take irJSON statements and then create RDF triples by applying the rules defined in the linkage schema.

= SUB-PART 3: commON PROFILE =

This sub-part of the irON specification describes the comma-delimited or comma-separated values (CSV) serialization, commON.

Role and Use
The most common data authoring environment in the world is the spreadsheet. Spreadsheets are a ubiquitous tool for knowledge workers. And, CSV (comma-separated values), a very old file format that predates personal computers but was embraced by Microsoft as a spreadsheet representation, is a nearly ubiquitous data exchange format that is also easily read by humans.

Most simple data can be developed and provided as text datasets based on attribute-value pairs (also known as key-value pairs and many other variants). In spreadsheets, a tabular view of similar "things" (instances) can be readily presented where the records of those instances represent the rows in the table or spreadsheet, and the attributes or properties or "fields" describing those things are listed in the columns. Indeed, this basic framework is also what is used in relational data tables.

When exported, CSV only contains the cell data values from a spreadsheet. While this has the disadvantage of losing formatting, formulas and other niceties of spreadsheets in their native form, it also makes the data exchanged clean and relatively uniform. During data development and preparation the spreadsheet can be used in all of its native capabilities to provide data validation, sorting, formatting, cell referencing, calculations of (say) totals and subtotals, etc. This means that templates with useful prompts and controlled vocabularies and rapid editing and entry functions can be quickly developed for a spreadsheet, then followed by clean data export using CSV for use by other tools and for data federation.

Moreover, with just a little bit of extra specification, the staging of this data and then its export using CSV can also achieve broader operability. These are the rationales for commON.

The purpose of commON is to provide an easy data authoring and dataset creation environment for knowledge workers. By following a few conventions and using common spreadsheet tools, knowledge workers with domain knowledge but little or no programming and scripting language skills can rapidly and effectively create small datasets (databases) or can extract information from existing spreadsheets for integration and interoperability.

In keeping with the irON notation, commON consists of a number of modules that define the various aspects of a full specification. These modules may be specified in a single CSV file or in multiple separate files. The start of a module is signaled by a process-based reserved keyword (see below) that the parser recognizes.
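The way a parser might split one CSV file into modules can be sketched as follows. This is not the actual commON parser: it assumes that every module keyword is written with a leading "&&" (as the options keyword is) and that attribute columns carry a single "&"; the module names and sample rows are invented for illustration.

```python
import csv
import io
from typing import Dict, List

# Invented sample; the "&&" module keywords shown here are assumptions.
SAMPLE = """&&dataset
&id,&prefLabel
demo,Demo dataset

&&recordList
&id,&type,&prefLabel
jpitman,Person,Jim Pitman
ustanford,Organization,Stanford University
"""


def split_modules(csv_text: str) -> Dict[str, List[List[str]]]:
    """Group CSV rows under the most recent "&&" module keyword."""
    modules: Dict[str, List[List[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank rows between modules
        first = row[0].strip()
        if first.startswith("&&"):
            current = first[2:]
            modules[current] = []
        elif current is not None:
            modules[current].append(row)
    return modules


for name, rows in split_modules(SAMPLE).items():
    print(name, len(rows))
```

Because each module begins with its own keyword row, the same function works whether the modules arrive in one file or are concatenated from several.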

CSV as generally used is "schema-less". In order to embrace the irON notation, vocabulary and structure, commON introduces a number of conventions that must be followed in order to achieve the irON objective of staging data for RDF interoperability.

The MIME type for commON is.

Differences from Generic CSV
Though in use for decades, CSV only received a formal MIME-type specification in 2005. CSV has relatively few conventions and limited syntax.

commON is fully compliant with RFC 4180 and the parser and ingesters used by the structWSF framework. commON has been validated for Microsoft Excel CSV and Open Office CSV.

Of course, in addition to the core CSV specification, commON adds many conventions and constructs in keeping with the irON notation.

Differences from Generic irON
As a relatively "schema-less" framework, CSV presents a number of challenges to embrace the full slate of capabilities within the irON notation. While it is possible with many conventions and restrictions to achieve the full slate of irON capabilities, some have been dropped from commON to promote simplicity and ease-of-use. These capabilities may be added back in over time, particularly if better conventions can be discovered.

This section outlines these differences between irON and commON.

No Schema Module, Other Changes
As the most complex specification within irON, the schema module has been dropped from commON (at least for the present). This decision still allows useful datasets to be authored and linked to existing schema, but the actual schema specification must either occur through one of the other irON serializations (irXML or irJSON), or directly via RDF or OWL.

In commON, the ingest of attributes, class types and records has been standardized as a list process, with the operation signaled by keyword and convention. As a result, the irON attributes of attribute, type and record are not used in commON.

The irON attributes of format type and addMapping have also been dropped from commON to streamline the specification.

Augmenting Attributes with Metadata
The main irON notation describes the general approach in its section on Augmenting Attributes with Metadata. For the commON serialization, once a primary attribute is stated for an instance, metadata can be added for that attribute by:


 * 1) Listing the metadata in the next column to the right of the subject attribute, and
 * 2) Using the nested notation of   to designate the new metadata.

The key to the syntax is appending the metadata attribute, with its ampersand ('&') designator, to the primary attribute (which has already been designated with the standard & designator), leading to the linked ampersand convention.

Here is an example of this syntax with paper as the primary attribute, and the source attribute as the metadata about the paper:
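
As a minimal, non-normative sketch of how a consumer might read this convention, the Python below assumes a header row in which the metadata column is named by appending `&source` to `&paper`. The sample values, the helper names (`split_header`, `parse_with_metadata`) and the resulting data structure are illustrative assumptions, not part of the commON specification.

```python
import csv
import io

# Hypothetical sample: the record values and the exact header spelling are
# assumptions made for illustration only.
SAMPLE = (
    "&id,&type,&paper,&paper&source\n"
    "rec1,Article,Some Paper Title,Some Journal\n"
)


def split_header(name):
    """Return (primary, metadata) for a header cell.

    metadata is None when the cell names a plain (primary) attribute.
    """
    parts = name.strip().lstrip("&").split("&")
    return (parts[0], parts[1] if len(parts) > 1 else None)


def parse_with_metadata(text):
    """Group each metadata column under the primary attribute it describes."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], []
    for row in rows[1:]:
        record = {}
        for name, value in zip(header, row):
            primary, meta = split_header(name)
            slot = record.setdefault(primary, {"value": None, "metadata": {}})
            if meta is None:
                slot["value"] = value
            else:
                slot["metadata"][meta] = value
        records.append(record)
    return records


records = parse_with_metadata(SAMPLE)
```

In this sketch the `paper` entry of the first record carries both its own value and a `source` entry in its metadata, which mirrors the rule that the metadata column sits immediately to the right of the attribute it describes.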

Specific Processing Options
For preference and readability reasons, some may prefer to present and manipulate their information with slightly different conventions. These processing choices are invoked via the &&options keyword. The ones currently available in commON are:

The  attribute sets the delimiter for a list of values for a single attribute. The default value is the pipe character ('|'), though these characters are also possible: comma (','), semi-colon (';'), and tilde ('~'). The data publisher has to specify a value for the  attribute to tell the commON processor engine how to escape/unescape the list separator character.

The  attribute is a Boolean (yes, no) value. If set to yes, a value is added to the instance record listings that enables all values to be sorted in offline applications (especially spreadsheets for the commON serialization).

Note: It is recommended to set the  value to 'yes' if you are using the Stacked style (see below) for instance records.
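
The list separator option described above can be sketched as follows. This is an illustration only: the function name `split_values` is hypothetical, and the backslash used as the escape character is an assumption (in commON the escape character is whatever the data publisher declares).

```python
def split_values(cell, delimiter="|", escape="\\"):
    """Split a multi-valued cell on `delimiter`.

    A delimiter preceded by `escape` is kept as a literal character.
    The backslash default for `escape` is an assumption for this sketch.
    """
    values, current, i = [], [], 0
    while i < len(cell):
        ch = cell[i]
        if ch == escape and i + 1 < len(cell):
            # Keep the escaped character literally and skip past it.
            current.append(cell[i + 1])
            i += 2
            continue
        if ch == delimiter:
            values.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    values.append("".join(current).strip())
    return values


colours = split_values("red|green|blue")
pair = split_values("a;b", delimiter=";")
escaped = split_values("a\\|b|c")
```

Choosing the pipe as the default keeps commas free for their normal CSV role, so a multi-valued cell does not need to be quoted merely because it holds a list.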

Instance Record Presentation Styles
As discussed for the Instance Record Object below, there are two entry presentation styles available for instance records: Row and Stacked. See that section for further details.

Reduced Keyword Set
As a result, these existing irON attributes and keywords are not presently available for use within commON:

It is recommended not to use these terms in a commON specification as they might pose conflicts with other irON serializations.

General commON Design
The "schema-less" nature of CSV necessitates introducing some broad design considerations to commON. These are:


 * All keywords and modules are introduced by a standard character, with the ampersand ('&') chosen for this purpose
 * Module or processing sections require a further convention, with the double ampersand ('&&') prefix chosen for this purpose
 * Minor but fixed structural conventions regarding specified rows and columns help instruct the commON ingesters and parsers, and
 * A design objective has been to limit the number of conventions and requirements to as small a set as possible.

Summary of Conventions
Thus, here are the specific commON conventions:


 * 1) irON vocabulary keywords are reserved
 * 2) All section (object) types begin with && (which may also signal slight differences to the parser depending on mode keywords)
 * 3) All attribute names begin with a single ampersand
 * 4) In the instance record layout:
 ** the first row is restricted to the listing of attribute names (using the single ampersand prefix)
 ** &id is required and must be placed in the first column
 ** &type, if provided (and it is highly recommended), must be placed in the second column
 ** attributes with a list of values must have the values separated by the pipe ('|') character (or the alternative character chosen through the &&options keyword)
 * 5) Metadata about primary attributes for the instance are denoted by the linked ampersand notation, so long as this metadata immediately follows the first separate listing of the primary attributeName
 * 6) The ,   and   sections list each entry sequentially by row with key:value in Cols 1 and 2
 * 7) Blank rows may be inserted anywhere for readability
 * 8) Comment rows begin with # (which may also appear as "# in a CSV file) and are ignored during processing.
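
To illustrate how the prefix conventions above let a parser decide what each row is, here is a minimal Python sketch. The function name `classify_row`, the category labels and the sample rows are assumptions made for illustration; an actual commON parser may organize this differently.

```python
import csv
import io

# Hypothetical sample rows, for illustration only.
SAMPLE = (
    "# a comment row\n"
    "&&recordList\n"
    "&id,&type,&name\n"
    "\n"
    "rec1,Person,Jane\n"
)


def classify_row(row):
    """Classify a parsed CSV row using the commON prefix conventions."""
    if not any(cell.strip() for cell in row):
        return "blank"        # blank rows are allowed anywhere
    first = row[0].strip()
    if first.startswith("#"):
        return "comment"      # comment rows are ignored during processing
    if first.startswith("&&"):
        return "section"      # double ampersand opens a section
    if first.startswith("&"):
        return "attribute"    # single ampersand names an attribute
    return "data"


kinds = [classify_row(row) for row in csv.reader(io.StringIO(SAMPLE))]
```

Because the `csv` module removes surrounding quotes, a comment that a spreadsheet saved as `"# ...` is still seen as starting with `#` once the row has been parsed.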

Dataset Object
The dataset object is signaled by the  keyword. The following example includes some metadata about the dataset, as well.

Instance Record Object
The instance record object is signaled by the &&recordList keyword. By convention, the next line must include the attribute names (& prefix) by column, followed by the instance records row-by-row until the file ends or another processing keyword ('&&') is encountered.

Also, as noted, the first two columns in this tabular layout are for &id and &type, respectively.

There is no practical limit on the number of attributes (columns) that can be specified. For readability it is advisable to break such instance tables into digestible chunks. Also, attributes can be repeated when the schema allows multiple entries.
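
The reading rule for the &&recordList section can be sketched in Python as below. This is a simplified illustration under stated assumptions: the sample data, the function name `parse_record_list` and the closing `&&anotherSection` keyword are hypothetical, and a repeated attribute column would simply overwrite the earlier value in this sketch.

```python
import csv
import io

# Hypothetical sample. "&&anotherSection" is a made-up keyword that only
# stands in for "some other processing keyword".
SAMPLE = (
    "&&recordList\n"
    "&id,&type,&name,&homepage\n"
    "rec1,Person,Jane Doe,http://example.org/jane\n"
    "rec2,Person,John Roe,http://example.org/john\n"
    "&&anotherSection\n"
    "ignored,row\n"
)


def parse_record_list(text):
    """Collect Row-style records that follow the &&recordList keyword."""
    records, header, in_section = [], None, False
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue                      # skip blank rows
        first = row[0].strip()
        if first.startswith("#"):
            continue                      # skip comment rows
        if first.startswith("&&"):
            # Any processing keyword starts a new section.
            in_section = first == "&&recordList"
            header = None
            continue
        if not in_section:
            continue
        if header is None:
            # The first row after the keyword lists the attribute names.
            header = [cell.strip().lstrip("&") for cell in row]
            continue
        records.append({k: v for k, v in zip(header, row) if k})
    return records


records = parse_record_list(SAMPLE)
```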

Instance Record Table Examples
As noted, there are two possible instance record table styles using commON. The parser recognizes both styles without further notation or instruction. Which style you use is a matter of your own preference.

Instance Record Row Style
Here is an example of the Row style, where all attributes are listed in columns in a single row. If the same attribute occurs multiple times, and you are not using the list separator convention, each duplicate attribute would be listed in its own column:

Instance Records Stacked Style
Here is an example of the Stacked style, where duplicate values for a single attribute are listed in a stacked manner row-by-row until the listing is completed. This style is useful when you want to see long attribute names, such as URIs, for example:

Note: The ID in the first column can be reiterated on each row until the end of the description of the record is reached, if that is what the data publisher prefers.
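
A minimal Python sketch of reading the Stacked style follows. It assumes that a row whose first column is blank, or repeats the current ID, continues the current record. That reading, the sample data and the function name `parse_stacked` are assumptions for illustration rather than normative behavior.

```python
import csv
import io

# Hypothetical sample in Stacked style: extra homepage values for rec1 are
# stacked on following rows, once with a blank ID and once with a repeated ID.
SAMPLE = (
    "&id,&type,&name,&homepage\n"
    "rec1,Person,Jane Doe,http://example.org/jane\n"
    ",,,http://example.org/jane-alt\n"
    "rec1,,,http://example.org/jane-old\n"
    "rec2,Person,John Roe,http://example.org/john\n"
)


def parse_stacked(text):
    """Merge stacked rows into one record per ID, collecting values in lists."""
    rows = [r for r in csv.reader(io.StringIO(text)) if any(c.strip() for c in r)]
    header = [c.strip().lstrip("&") for c in rows[0]]
    records, current_id = {}, None
    for row in rows[1:]:
        row_id = row[0].strip()
        if row_id:
            current_id = row_id           # a new or reiterated ID
        if current_id is None:
            continue                      # no record has been started yet
        record = records.setdefault(current_id, {})
        for name, value in zip(header[1:], row[1:]):
            value = value.strip()
            if value:
                record.setdefault(name, []).append(value)
    return records


stacked = parse_stacked(SAMPLE)
```

Storing every attribute as a list keeps single and multiple values uniform, at the cost of one-element lists for attributes such as the name.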

Schema Linkage
The linkage to schema is signaled by the  keyword. Within that, there can be a number of sub-sections, such as  and   (both required), and then ,   or. Each of the &xxxList sub-sections conforms to the name:value pair convention, where the first (Col 1) entry is the name of the listed attribute and the next column is its referenced  value.

Linkage Example
Note in this example the prefix list is not used.

To invoke a prefix list, here is an example for the last entry above:

which, when referenced in a, would appear as follows:

Downloadable Examples
Because they are difficult to reproduce in this format, here are two examples of complete commON files that are available for inspection:


 * The BKN project dataset, which is also used to populate the templates on the BibKN.org Web site, and
 * Sweet Tools, Mike Bergman's listing of 800+ semantic Web and related tools.

Additionally, to demonstrate how controlled vocabularies, validation tables and the like may be linked in with a commON CSV output, we also provide the entry spreadsheet for Sweet Tools that shows these options in action. Saving as a CSV creates the very same commON CSV input file noted directly above.

These files and uses are explained in the accompanying document to this specification, Annex: A commON Case Study using Sweet Tools.

= ACKNOWLEDGEMENTS =

Work towards this specification was supported in part by the Bibliographic Knowledge Network Project (NSF Award 0835851). The support was specifically for the BibJSON version of irJSON and the parser and converter designs.

= REFERENCES =