Dataset Concept

A dataset (or data set) is a collection of data, usually presented in tabular or text form. When in the form of a table, each column represents a particular variable and each row corresponds to a given member of the dataset in question. In text form, the data values are most often presented in some key-value pair format.
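As a minimal sketch of the two presentations described above, the following Python snippet holds the same data first as a table (columns plus rows) and then as key-value records. The column names, values, and the `rows_to_records` helper are hypothetical and only illustrate the idea; they are not part of structWSF or OSF-Drupal.

```python
# Hypothetical sample data: column names and values are illustrative only.
columns = ["id", "name", "type"]
rows = [
    ["rec-1", "Record A", "Example"],
    ["rec-2", "Record B", "Example"],
]


def rows_to_records(columns, rows):
    """Convert tabular rows into a list of key-value records.

    Each column is a variable and each row is one member of the dataset,
    so every row becomes one dictionary keyed by column name.
    """
    return [dict(zip(columns, row)) for row in rows]


records = rows_to_records(columns, rows)
for record in records:
    print(record)
```

Both forms carry the same information; the key-value form simply repeats the variable name alongside each value.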

The central, organizing basis for managing structured data within structWSF and OSF-Drupal is the dataset.

A dataset is used to document information about the creation of instance records, and to link external resources to them (such as the linkage and structure schemas). A dataset can be seen as an aggregation of instance records that keeps a reference between the records and their source (provenance). A dataset can be split into multiple dataset slices, each of which can be written in a separate file. Each slice of a dataset shares the same id as the dataset.
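The slicing idea can be sketched as follows, assuming that what the slices share is a dataset identifier. The `DatasetSlice` class, its field names, and the `merge_slices` helper are hypothetical and are not taken from structWSF or irON; they only show how records spread over several files could be reassembled into one dataset while keeping their provenance.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSlice:
    """One slice of a dataset (hypothetical structure, for illustration)."""
    dataset_id: str   # assumed to be shared by every slice of the dataset
    source: str       # provenance of the records in this slice
    records: list = field(default_factory=list)


def merge_slices(slices):
    """Group slices by dataset_id and concatenate their records."""
    merged = {}
    for s in slices:
        merged.setdefault(s.dataset_id, []).extend(s.records)
    return merged


slices = [
    DatasetSlice("dataset-a", "file-1", [{"id": "rec-1"}]),
    DatasetSlice("dataset-a", "file-2", [{"id": "rec-2"}]),
    DatasetSlice("dataset-b", "file-3", [{"id": "rec-3"}]),
]
merged = merge_slices(slices)
print(merged)
```

Because every slice carries the dataset identifier, the slices can be stored and processed independently and still be recombined later.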

Datasets normally contain one or more data records from a single source representing the same type of instance(s). However, datasets are flexible enough to accommodate other, less common use cases. Datasets may reside on the Web as well as be stored locally. Each dataset is uniquely identified with standard metadata characterizations.

At minimum, datasets have a simple structure of attribute-value pairs for each instance record. However, they may also have a more complex structure defined by schemas (ontologies) that describe the relationships between concepts and attributes and may even relate them to external schemas.

All structWSF web service endpoints and OSF-Drupal tools operate against one or more datasets, which can be selected for these operations. Each user may or may not be granted access rights to each of these datasets and, where access is granted, may or may not hold each of the CRUD (create-read-update-delete) permissions on it.

The combination of access rights and permissions then defines which tools and what operations are available to a given user for each dataset. The permissions can be defined by interacting with the Auth Registrar: Access web service endpoint, or the OSF-Drupal structDataset module.
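The way access rights and CRUD permissions combine can be sketched as a lookup keyed by user and dataset. This is an illustrative model only: the `Permissions` class, the `ACCESS` table, the user and dataset names, and the `allowed_operations` helper are all hypothetical, and the snippet does not call the Auth Registrar: Access endpoint or the structDataset module.

```python
from dataclasses import dataclass

CRUD = ("create", "read", "update", "delete")


@dataclass(frozen=True)
class Permissions:
    """CRUD flags for one user on one dataset (hypothetical structure)."""
    create: bool = False
    read: bool = False
    update: bool = False
    delete: bool = False


# Hypothetical access table: a missing (user, dataset) key means no access.
ACCESS = {
    ("alice", "dataset-a"): Permissions(create=True, read=True,
                                        update=True, delete=True),
    ("bob", "dataset-a"): Permissions(read=True),
}


def allowed_operations(access, user, dataset_id):
    """Return the set of CRUD operations the user may run on the dataset."""
    perms = access.get((user, dataset_id))
    if perms is None:
        return set()  # no access right at all for this dataset
    return {op for op in CRUD if getattr(perms, op)}


print(sorted(allowed_operations(ACCESS, "bob", "dataset-a")))
print(sorted(allowed_operations(ACCESS, "bob", "dataset-b")))
```

A tool would then offer only the operations in the returned set, which is one way to read "which tools and what operations are available to a given user for each dataset".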

As used within this offering, dataset often has a more specific meaning in terms of the irON format bases used for many ingests.