Adding a New Dataset

From OSF Wiki

This guide provides basic steps for how to add and then integrate a new dataset into your local instance using various Open Semantic Framework tools.

The Example Case

Our example adds a new dataset of single-family dwelling housing starts for the year 2006, broken down by hypothetical neighborhood. The data values are submitted under the "foo" namespace. At the conclusion of the example we also note that the attributes introduced by this data are new, and therefore need to be accommodated in the governing ontology of the instance.

Preparing the Dataset

The dataset is prepared in the standard instance record object notation (irON), using its comma-delimited (spreadsheet) CSV format, called commON. For more information on commON and how to define datasets in it, see the separate commON case study.

The example dataset can be downloaded (SFD_housing_starts.csv) for local inspection; it looks like this:

C csv.png

The basic layout begins with a definition of the dataset and its metadata (&&dataset), and then presents the actual data records (&&recordList). See the commON case study for further details.
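To make the two-part layout concrete, here is a minimal Python sketch that groups the rows of a commON-style CSV under their section markers. It relies only on the `&&dataset` and `&&recordList` markers mentioned above. The sample text, its column names (`&id`, `&prefLabel`, `&foo:starts2006`), and the helper `split_sections` are made up for illustration and are not taken from SFD_housing_starts.csv or from any OSF tool.

```python
import csv
import io

# Made-up sample for illustration only; NOT the real SFD_housing_starts.csv.
SAMPLE = """\
&&dataset,,
&id,&prefLabel,
sfd_starts,Example housing starts,
&&recordList,,
&id,&prefLabel,&foo:starts2006
n1,Neighborhood One,12
n2,Neighborhood Two,7
"""


def split_sections(text):
    """Group CSV rows under the most recent '&&' section marker."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank rows
        if cells[0].startswith("&&"):
            current = cells[0][2:]  # drop the '&&' prefix
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


sections = split_sections(SAMPLE)
for name, rows in sections.items():
    print(name, len(rows))
```

Each section keeps its header row first, so the first row of `sections["recordList"]` holds the column names and the remaining rows hold the records.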

Import a Dataset

Click the Configuration item in the top menu, then click Configure OSF for Drupal modules.

OSF for Drupal configurations

To import a dataset, click the + Import Dataset link on the DATASETS & NETWORKS tab.

Import a new dataset

The Import Dataset page lets you import a dataset file serialized in one of the formats it supports.

To import a new dataset, specify the following:

  • Dataset file to import
    • Select the RDF file you want to import from your local computer
  • Content type
    • Select the type of RDF file you are trying to import
  • Dataset name
    • Define the name of the Dataset you are importing
  • Dataset description (Optional)
    • Optionally define the description of that dataset
  • Custom Dataset URI (Optional)
    • Define the URI of the dataset. If you don't provide any URI, then OSF for Drupal will create one for you
  • Save dataset on this network
    • Choose on which OSF Web Services endpoint you want to import that dataset
  • Which role should have full permissions on this dataset

Note: there may be a limit on the size of the dataset file you can import through this form. To import bigger datasets into the system, use the Datasets Management Tool instead.
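The size check in the note can be done before you open the form. The Python sketch below compares a file's size against a limit and reports which route to take. The actual limit depends on your server configuration and is not stated here, so `max_form_bytes` is a parameter you must supply; the function name `choose_import_route` and the demo values are hypothetical.

```python
import os
import tempfile


def choose_import_route(path, max_form_bytes):
    """Return which import route fits a file of this size.

    max_form_bytes is an assumption supplied by the caller; the real
    upload limit depends on how your instance is configured.
    """
    size = os.path.getsize(path)
    if size <= max_form_bytes:
        return "web form"
    return "Datasets Management Tool"


# Demo with a throwaway 2048-byte file standing in for a dataset file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"x" * 2048)
    demo_path = tmp.name

over_limit = choose_import_route(demo_path, max_form_bytes=1024)
under_limit = choose_import_route(demo_path, max_form_bytes=1024 * 1024)
os.remove(demo_path)

print(over_limit)
print(under_limit)
```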


Then click the Import button to start the dataset import process.

Form for importing a new dataset

At this point, the dataset has been created in the OSF instance. All the content of the dataset file you imported has been indexed in that newly created dataset.

Once the dataset is imported, you are redirected to a new page. If you checked the Check attributes and types existence option, any resulting warnings are shown on that page. If you did not, the user interface asks you to click the Expose Imported Dataset button. Clicking that button takes you to the form you need to fill in to expose the dataset to Drupal.

Exposing the newly imported dataset in Drupal

The last step is to expose the dataset you just imported into the OSF instance to Drupal. If you skip this step, the dataset will exist on the OSF instance, but it won't be usable by any OSF for Drupal module.

  • Administrative title
    • This is the name you want to give to this dataset. This name is local to this Drupal instance. It will be used to refer to the dataset within the user interface of this Drupal portal
  • Dataset is searchable
    • This specifies whether the dataset is searchable by the OSF SearchAPI module. If this option is unchecked, the content of this dataset is not included in the searches performed by the OSF SearchAPI module

Once you are done, click the Save button to expose the newly imported dataset to Drupal.

Form for exposing the new dataset

Now you can see the newly imported dataset in the list of accessible datasets.

The new dataset appears in the list of available datasets

Conceptual Implications of the Dataset

Now that the dataset has been added, we need to make sure that it is properly modeled and linked into the domain ontology guiding your specific instance. (If you are not already familiar with them, you may want to see the other background material regarding ontologies on this wiki.)

You may find, for example, that to properly include your new data in your system you are missing a "bridging concept" that connects to an existing concept (the "parent") already in the ontology, as well as some attributes (data) that describe that new concept.

Let's say, for example, that our existing ontology has the concept of housing, but not the concept of single-family dwellings or the specific data attributes captured by our 'SFD Housing Starts' data. The basic conceptual gap this represents appears as follows, with housing representing the "parent" concept and single-family dwellings the "child":

AddingAttributeSchematic.png
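As a rough illustration of this parent/child gap, the Python sketch below models the ontology as a plain set of (subject, predicate, object) triples and adds the missing child concept under its parent. It is only a conceptual model, not how OSF stores or edits ontologies. The identifiers `ex:Housing` and `foo:SingleFamilyDwelling` and the helper `add_child_concept` are hypothetical names chosen for this example.

```python
SUBCLASS_OF = "rdfs:subClassOf"
IS_CLASS = ("rdf:type", "owl:Class")

# Hypothetical existing ontology: only the parent concept is present.
ontology = {("ex:Housing", *IS_CLASS)}


def add_child_concept(triples, child, parent):
    """Return a new triple set with `child` declared as a subclass of `parent`."""
    if (parent, *IS_CLASS) not in triples:
        raise ValueError(f"parent concept {parent!r} is not in the ontology")
    return triples | {
        (child, *IS_CLASS),
        (child, SUBCLASS_OF, parent),
    }


# Bridge the gap: single-family dwellings become a child of housing.
updated = add_child_concept(ontology, "foo:SingleFamilyDwelling", "ex:Housing")

for triple in sorted(updated):
    print(triple)
```

The original set is left unchanged, so the before and after states of the ontology can be compared side by side.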

Integrating the Dataset with the Ontology

Because of the conceptual implications noted above, some changes to the existing ontology need to be made in order to effect this integration. Please see the following guide on Adding an Ontology Concept using Protégé for the next steps in this process.

Dealing with Missing Attributes and Types

If you are using structImport to import a dataset and the option "Check for missing attributes and types in the imported dataset." is enabled, or if your import script supports that functionality, then each time you import a new dataset the system tells you which attributes or types used in the dataset are missing from the ontology structure currently used by the OSF instance.
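The idea behind such a check can be sketched in Python as a set difference between what the dataset uses and what the ontology defines. This is not structImport's implementation. The record layout (dictionaries with a `type` key), the attribute names, and the function `find_missing` are assumptions made for this example.

```python
def find_missing(records, known_attributes, known_types):
    """Return (missing_attributes, missing_types) as sorted lists.

    Each record is assumed to be a dict whose 'type' key names its type
    and whose other keys are attribute names.
    """
    used_attributes = set()
    used_types = set()
    for record in records:
        for key, value in record.items():
            if key == "type":
                used_types.add(value)
            else:
                used_attributes.add(key)
    missing_attributes = sorted(used_attributes - set(known_attributes))
    missing_types = sorted(used_types - set(known_types))
    return missing_attributes, missing_types


# Hypothetical records and ontology vocabulary, for illustration only.
records = [
    {"type": "foo:Neighborhood", "prefLabel": "Neighborhood One", "foo:starts2006": 12},
    {"type": "foo:Neighborhood", "prefLabel": "Neighborhood Two", "foo:starts2006": 7},
]
missing_attributes, missing_types = find_missing(
    records,
    known_attributes={"prefLabel"},
    known_types={"ex:Housing"},
)
print("Missing attributes:", missing_attributes)
print("Missing types:", missing_types)
```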

Read more about what should be done once such attributes and/or types are detected at importation time.