Archive 1.x:Adding a New Dataset

From OSF Wiki
This guide provides the basic steps for adding a new dataset and then integrating it into your local instance using various Open Semantic Framework (OSF) tools.

The Example Case

Our example adds a new dataset of single-family dwelling (SFD) housing starts for the year 2006, by hypothetical neighborhood. The data values are submitted under the "foo" namespace. As the end of the example shows, the attributes introduced by this data are new and also need to be accommodated in the governing ontology of the instance.

Preparing the Dataset

The dataset is prepared in the instance record and Object Notation (irON), using its comma-delimited (spreadsheet) CSV serialization, called commON. For more information on commON and how to define datasets in it, see the separate commON case study.

The example dataset can be downloaded (SFD_housing_starts.csv) for local inspection; it appears as follows:

C csv.png

The basic layout begins with a definition of the dataset and its metadata (&&dataset), and then presents the actual data records (&&recordList). For further details, see the commON case study.
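
As a rough illustration of this layout, the sketch below splits a commON-style CSV into its `&&` sections using only the Python standard library. The sample rows, the `foo:` attribute name, and the function name `split_common_sections` are assumptions made for illustration. They are not taken from SFD_housing_starts.csv, and this is not the OSF commON parser.

```python
import csv
import io

# Hypothetical sample, loosely modeled on the layout described above.
SAMPLE = """\
&&dataset,,
&id,&prefLabel,&description
sfd_housing_starts,SFD Housing Starts,Single-family dwelling housing starts for 2006
,,
&&recordList,,
&id,&type,&foo:housingStarts2006
neighborhood_a,foo:Neighborhood,12
neighborhood_b,foo:Neighborhood,7
"""


def split_common_sections(text):
    """Group rows under the most recent '&&' section marker.

    Returns a dict mapping each section name (e.g. '&&recordList') to a
    list of dicts, one per data row, keyed by that section's header row.
    """
    sections = {}
    current = None  # name of the section being read
    header = None   # column names for the current section
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank spacer rows
        if cells[0].startswith("&&"):
            current, header = cells[0], None
            sections[current] = []
        elif current is None:
            continue  # ignore anything before the first section marker
        elif header is None:
            header = cells  # first row after a marker names the attributes
        else:
            sections[current].append(dict(zip(header, cells)))
    return sections


sections = split_common_sections(SAMPLE)
print(sorted(sections))
print(len(sections["&&recordList"]), "records")
```

Running a check of this kind on your own file before importing can help confirm that both the `&&dataset` and `&&recordList` sections are present and that the record rows line up with their attribute columns.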

Import a Dataset

To submit a new dataset to the system, you use the OSF-Drupal Import tool available to system administrators.

The example screenshots below are based on the Citizen Dan local government community indicator system. Your own installation likely has a much different style for its user interface and a different placement of the Tools options. Further, the Citizen Dan screens shown here require system administrator privileges and are not visible to standard users.

To conduct an import, you first must have system administrator privileges.

Assuming you do, pick the Tools option (highlighted, though the link location may vary by interface style), and then the Import tool.

That will bring up the Import tools screen, wherein you should point to the dataset already prepared on your local machine, and give it a name and description:

C dataset import.png

If there are errors in the import, the system will notify you. The most likely problem is a mis-specification in the commON file; once it is corrected, the file can be re-imported.

Successful import of a dataset then registers it within the system. As the submitter, you are automatically identified as the dataset creator, and the time of submission is used to timestamp the dataset. Here is how a successfully submitted dataset appears:

C dataset registered.png

Append to a Dataset

Sometimes, the data which you desire to submit represents additional records for an existing dataset. This might occur, for example, when you gain more customers, add a product to a product line, get another year's worth of data, or similar.

In such cases, the Append tool is the proper one to use. Prior to using it, you may want to review this documentation.

After creating your new records in the proper format, submit those records as a new dataset (see above). Because your intent is to append to an existing dataset, you may want to name your dataset to signal this. In our example, we have named the new dataset 'SFD Housing Starts - APPEND'.

C dataset append1.png

The Append tool actually progresses through a number of steps. After picking the source dataset, you then need to specify the target:

C dataset append2.png

In this case, our target was the original 'SFD Housing Starts' dataset. Now, having specified both our source and target, we proceed to append:

C dataset append3.png

The append process will then begin. At its conclusion, you will be given the choice of whether to delete the appended (source) dataset (not shown). If you have any doubt that the process has worked properly, do not delete it. Inspect the target dataset to see whether the append has occurred; if so, you can delete the source dataset at that time.
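
One way to think about that inspection is as a set comparison of record identifiers: every record ID from the source dataset should now be present in the target. The sketch below is a minimal illustration of this idea. The record IDs and the function name `missing_after_append` are hypothetical, and how you obtain the ID lists from your own instance is outside the scope of the sketch.

```python
def missing_after_append(source_ids, target_ids):
    """Return the source record IDs that are absent from the target dataset."""
    return sorted(set(source_ids) - set(target_ids))


# Hypothetical record IDs, for illustration only.
source_ids = ["neighborhood_c", "neighborhood_d"]                    # the APPEND dataset
target_ids = ["neighborhood_a", "neighborhood_b", "neighborhood_c"]  # target after the append

missing = missing_after_append(source_ids, target_ids)
if missing:
    print("Do not delete the source dataset yet; missing:", missing)
else:
    print("All appended records were found in the target.")
```

An empty result suggests the append completed; a non-empty one indicates records that have not yet reached the target dataset.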

Conceptual Implications of the Dataset

Now that the dataset has been added, we need to make sure that it is properly modeled and linked into the domain ontology guiding your specific instance. (If you are not already familiar with them, you may want to see the other background material regarding ontologies on this wiki.)

You may find, for example, that to properly include your new data in your system you are missing a "bridging concept" that links to an existing ("parent") concept already in the ontology, as well as some attributes (data) that describe that bridging concept.

Let's say, for example, that our existing ontology has the concept of housing, but not the concept of single-family dwellings or the specific data attributes captured by our 'SFD Housing Starts' data. The basic conceptual gap this represents appears as follows, with housing representing the "parent" concept and single-family dwellings the "child":


Integrating the Dataset with the Ontology

Because of the conceptual implications noted above, some changes to the existing ontology need to be made in order to effect this integration. Please see the following guide on Adding an Ontology Concept using Protégé for the next steps in this process.

Dealing with Missing Attributes and Types

If you are using structImport to import a dataset and the option "Check for missing attributes and types in the imported dataset." is enabled, or if your import script supports that functionality, then each time you import a new dataset the system will tell you which attributes or types used in the dataset are missing from the ontologies currently used by the OSF instance.
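
Conceptually, this check compares the attributes and types used in the dataset against those defined in the ontologies. The sketch below illustrates that comparison only; it is not the structImport implementation. The record list, the `KNOWN_ATTRIBUTES` and `KNOWN_TYPES` sets, and the function name `find_missing` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical record list and ontology vocabulary, for illustration only.
RECORDS = """\
&id,&type,&foo:housingStarts2006
neighborhood_a,foo:Neighborhood,12
neighborhood_b,foo:Neighborhood,7
"""
KNOWN_ATTRIBUTES = {"id", "type", "prefLabel"}
KNOWN_TYPES = {"foo:Neighborhood"}


def find_missing(records_csv, known_attributes, known_types):
    """Return (missing_attributes, missing_types) as sorted lists."""
    reader = csv.DictReader(io.StringIO(records_csv))
    # Column names carry a leading '&'; strip it to get the attribute name.
    used_attributes = {name.lstrip("&") for name in reader.fieldnames}
    used_types = {row["&type"] for row in reader if row.get("&type")}
    return (
        sorted(used_attributes - set(known_attributes)),
        sorted(used_types - set(known_types)),
    )


missing_attributes, missing_types = find_missing(RECORDS, KNOWN_ATTRIBUTES, KNOWN_TYPES)
print("Missing attributes:", missing_attributes)
print("Missing types:", missing_types)
```

In this hypothetical case the new housing-starts attribute is reported as missing, which mirrors the situation in the example: the attribute must be added to the ontology before the data is fully integrated.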

Read more about what should be done once such attributes and/or types are detected at importation time.