Dataset Specifications and Metadata

Within the Open Semantic Framework (OSF), datasets are objects in their own right and are accessed and managed as such. This article describes how to characterize a dataset and provide metadata for it. Note: much of the information herein is drawn from the irON specification.

A dataset documents information about the creation of its instance records and links external resources to them, such as the linkage and structure schemas (more about these below).

A dataset can be seen as an aggregation of instance records that keeps a reference between those records and their source (provenance). A dataset can be split into multiple dataset slices, and each slice can be written to a separate file. Every slice of a dataset shares the same id as the dataset itself.
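To make the slicing idea concrete, here is a minimal Python sketch, assuming an irJSON-like layout in which each slice carries a `dataset` header and a `recordList` of instance records. The key names, the `slice_dataset` helper, and the example URL are hypothetical and chosen for illustration; the point is only that every slice repeats the parent dataset's identifier so provenance survives the split.

```python
import json
from typing import Any


def slice_dataset(dataset_id: str, records: list[dict[str, Any]],
                  slice_size: int) -> list[dict[str, Any]]:
    """Split a dataset's instance records into slices (hypothetical helper).

    Every slice repeats the parent dataset's id, so each record can be
    traced back to the same source no matter which slice (file) holds it.
    """
    if slice_size < 1:
        raise ValueError("slice_size must be at least 1")
    slices = []
    for start in range(0, len(records), slice_size):
        slices.append({
            "dataset": {"id": dataset_id},  # shared by every slice
            "recordList": records[start:start + slice_size],
        })
    return slices


records = [{"id": f"record-{n}"} for n in range(5)]
for part in slice_dataset("http://example.org/datasets/demo/", records, 2):
    # Each slice could be written to its own file instead of printed.
    print(json.dumps(part))
```

Repeating the dataset id in each slice, rather than only in the first file, lets any single slice be loaded on its own without losing the link to its source.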

Dataset Description
A dataset description, known in irON as the Core Dataset Attributes, is the set of attributes suggested for inclusion in any dataset or dataset slice specification. Some of these attributes are required, some are recommended, and others are optional.

Two points are important regarding this listing:

 * 1) If the attributes or resources already exist in the ontology, only the linkage information is needed to match the source data to the OSF ontologies.
 * 2) Any attribute of your own choosing may be added to this list to accommodate your organization's requirements and workflows.
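The split between required, recommended, and optional attributes, plus the freedom to add your own, can be checked mechanically. The Python sketch below is illustrative only: the three attribute sets are hypothetical placeholders, not the authoritative irON lists, and `check_dataset_description` and the `projectCode` attribute are invented names. Custom attributes are reported rather than rejected, reflecting the second point above.

```python
from typing import Any

# Hypothetical attribute tiers for illustration only; consult the irON
# specification for the actual required/recommended/optional lists.
REQUIRED = {"id"}
RECOMMENDED = {"label", "description", "source", "createDate"}
OPTIONAL = {"creator", "curator", "maintainer", "linkage", "schema", "metaFile"}


def check_dataset_description(description: dict[str, Any]) -> dict[str, list[str]]:
    """Report missing required/recommended attributes and any custom ones.

    Custom (organization-specific) attributes are allowed; they are listed
    so they can be reviewed, not rejected.
    """
    keys = set(description)
    known = REQUIRED | RECOMMENDED | OPTIONAL
    return {
        "missing_required": sorted(REQUIRED - keys),
        "missing_recommended": sorted(RECOMMENDED - keys),
        "custom": sorted(keys - known),
    }


report = check_dataset_description({
    "id": "http://example.org/datasets/demo/",
    "label": "Demo dataset",
    "projectCode": "ACME-42",  # hypothetical organization-specific attribute
})
print(report)
```

Only a missing required attribute should block an import; missing recommended attributes and custom additions are better treated as warnings.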

Abstract Dataset Specification Example
Here is an example of an abstract dataset specification, with additional attributes beyond the core.
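As a rough illustration of such a specification, the Python sketch below builds one as a dictionary and serializes it to JSON. The attribute names loosely follow irON-style naming and are assumptions, apart from `metaFile`, which this article mentions; all URLs are placeholders, and `department` and `reviewCycle` are hypothetical examples of attributes beyond the core.

```python
import json

# An abstract dataset specification sketched as a Python dict.
# Attribute names are assumptions for illustration; URLs are placeholders.
dataset_spec = {
    "id": "http://example.org/datasets/demo/",
    "label": "Demo dataset",
    "description": "Instance records describing demo organizations.",
    # Links to external resources: the linkage and structure schemas.
    "linkage": ["http://example.org/linkage/demo.json"],
    "schema": ["http://example.org/schema/demo.json"],
    # Optional pointer to a separate metadata file.
    "metaFile": "http://example.org/datasets/demo/meta.json",
    # Additional, organization-specific attributes beyond the core.
    "department": "Research",
    "reviewCycle": "annual",
}

print(json.dumps({"dataset": dataset_spec}, indent=2))
```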

Metadata
Metadata may be added to the dataset specification either via the optional metaFile attribute (see above) or by embedding it in the dataset specification itself.
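The two options can be handled by one small loader. In this Python sketch, `resolve_metadata` is a hypothetical helper; it assumes embedded metadata sits under a hypothetical `meta` key, that `metaFile` holds a JSON file path relative to a base directory, and that embedded metadata takes precedence when both are present. None of these choices comes from the text, which does not specify precedence or file format.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def resolve_metadata(spec: dict[str, Any], base_dir: Path) -> dict[str, Any]:
    """Return dataset metadata, either embedded or loaded from metaFile.

    Assumptions: embedded metadata lives under a hypothetical "meta" key and
    wins over metaFile; metaFile is a JSON path relative to base_dir.
    """
    if "meta" in spec:
        return spec["meta"]
    if "metaFile" in spec:
        path = base_dir / spec["metaFile"]
        return json.loads(path.read_text(encoding="utf-8"))
    return {}  # no metadata supplied


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "meta.json").write_text(json.dumps({"license": "CC-BY-4.0"}), encoding="utf-8")
    print(resolve_metadata({"id": "demo", "metaFile": "meta.json"}, base))
    print(resolve_metadata({"id": "demo", "meta": {"license": "ODbL"}}, base))
```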

Suggested Metadata Attributes
Note that these attributes follow the general instance record object specification for irON and may include any arbitrary attributeName attributes as desired. As noted above, these same attributes and values may be either embedded within the dataset specification or placed in the separate metaFile.

With the dataset and its metadata defined, it is now time to prepare and import the datasets.