Single Authoring Environment: the OSF FieldStorage Connector

Introduction

This document is a technical overview of the OSF FieldStorage module. It is complementary to the Saving Local Content in OSF user manual page.

The goal of a single authoring Drupal environment is to treat all content, local and external, the same, so that all portals have access to the same information via the same set of APIs into a single OSF instance. "Single authoring" is shorthand for indicating that all content, local and external, is authored and managed in the same way within Drupal. Realizing a "single authoring environment" requires "bi-directionality": changes made in Drupal are reflected in the OSF instance, and changes made in the OSF instance are reflected in Drupal. Thus, "single authoring" and "bi-directionality" should be understood as describing largely the same set of requirements.

OSF Entities and OSF FieldStorage

OSF Entities and OSF FieldStorage share the same goal: using OSF content directly in Drupal, through Drupal's core API. This means that everything edited within Drupal is automatically modified in OSF without any synchronization mechanism. However, though their goal is the same, their usage and purpose (raison d'être) are different.

OSF Entities is used to expose the content of OSF datasets to Drupal as a new Entity type: Entity Resource Type. This new entity type expands what Drupal developers and administrators can do with entities. More OSF/Ontologies features are exposed via this new entity type. For example, the forms created from this new entity type are more flexible and are validated according to the description of the fields and bundles in the ontologies. It also automatically enables autocompletion mechanisms for the values of these fields, depending on the ontology metadata.

OSF FieldStorage is used as a new storage system for all the Content Types. By default, the entity storage system for Content Type entities in a standard Drupal distribution is MySQL. However, by using OSF FieldStorage, we are switching the entity storage system of the content types from MySQL to OSF. This means that every time a content type is being read, saved, modified or deleted, it gets read, saved, modified or deleted in OSF and not in MySQL. However, exactly the same experience should be provided to the Drupal users and administrators: the UI experience should be exactly the same. The user interface should not be changed via altering hooks.

The main distinctions between OSF Entities and the OSF FieldStorage modules are:

  • OSF FieldStorage changes the entity storage system of content types. The content types themselves stay exactly the same, and their UI remains unchanged. The only thing that changes with this module is that all the "local" information of these content types gets accessed from OSF instead of the local MySQL database.
  • OSF Entities creates a new kind of entity which exposes more OSF features via the Entity's class instance. The user experience is a bit different from what users would experience with conventional content types because more OSF-related features are exposed. Also, OSF Entities is used to synchronize all of OSF's classes and properties as bundles and fields in Drupal. These synchronized bundles and fields are then used by OSF FieldStorage to map the content type bundles and fields to RDF classes and properties (exactly like the RDF UI sub-module of the RDFx Drupal project).

Additionally:

  • OSF FieldStorage only interfaces local content (Content Types) whose fields use the osf_fieldstorage field storage system. This content is saved to OSF, and is interfaced with Drupal via the OSF FieldStorage module
  • OSF Entities exposes external data contained within OSF datasets natively within Drupal
    • This means that if two Drupal instances A and B use the same OSF instance, all the local content managed by OSF FieldStorage on instance A can be exposed as external data on instance B by using OSF Entities

Another big difference between OSF FieldStorage and OSF Entities is that OSF Entities has to comply with the content that is in OSF. This means that at the Drupal level, we only change the references to the properties and classes that changed in OSF. We synchronize the ontologies as opposed to synchronizing the data. With OSF FieldStorage, the situation is the opposite: it is OSF's content that has to comply with Drupal's content type descriptions. This means that if the Content Type schema (bundles and fields) changes, then the data mapping changes have to be synchronized accordingly within OSF.

This difference exists because we don't want to change any behaviors related to the content types. This is the reason why OSF content has to comply with the content type descriptions when we use OSF FieldStorage.

The cost of synchronizing data instead of ontologies is much greater computationally. As a general rule of thumb, OSF Entities should be the module of choice in most instances; OSF FieldStorage is best reserved for necessary Content Types (like the ones required to use specific modules that don't work with Resource Type Entities).

OSF FieldStorage

Architecture

Here is the general architecture for the OSF FieldStorage module. In Drupal, all Content Type entities use the Field Storage API to read, create, update and delete content in the default storage system. The default storage system used in a vanilla Drupal instance is MySQL. Each content type is a bundle, and each bundle has one or more fields attached to it. In Drupal 7, each field has its own storage system. This means that a single content type item can have fields that use different storage systems.
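As a rough illustration of per-field storage, the sketch below groups the fields of one bundle by storage backend, so that each backend can be asked to handle only its own fields. It is a small Python model rather than Drupal code; the field names and the group_fields_by_storage() helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fields of one bundle, each tagged with its storage system.
ARTICLE_FIELDS = {
    "subtitle": "field_sql_storage",
    "image": "osf_fieldstorage",
    "publisher": "osf_fieldstorage",
}


def group_fields_by_storage(fields):
    """Return {storage_type: [field_name, ...]} for one bundle."""
    grouped = defaultdict(list)
    for name, storage in fields.items():
        grouped[storage].append(name)
    return dict(grouped)


groups = group_fields_by_storage(ARTICLE_FIELDS)
```

With this grouping, a single content type item can be assembled from several backends, one call per storage system.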

The following schema shows how the Drupal Field Storage API works. It illustrates the flexibility of fields: multiple fields that are part of the same bundle can use different storage systems:

[Figure: Osf fieldstorage arc 2.png]

Then, by default, on a vanilla Drupal instance, the same bundle with the same fields would look like this:

[Figure: Osf fieldstorage arc 3.png]

The OSF FieldStorage module changes Drupal's field configuration to produce the following interaction:

[Figure: Osf fieldstorage arc 4.png]

This next schema shows the interaction between Drupal's core API, OSF FieldStorage and the OSF web service endpoints:

[Figure: Osf fieldstorage arc 1.png]

Each time a content type form is generated, Drupal calls the Field Storage API, which invokes the hook implementations in the OSF FieldStorage module. The module calls the CRUD: Read web service endpoint and populates the content type entity instance with the information to display in the form being edited (unless the form is being used to create a new entity).

If the form is used to create or update an entity, then CRUD: Create and CRUD: Update get called. If the "delete" button is used to delete this content type entity, then the CRUD: Delete web service endpoint is called.

Reading Content Type

Note: These steps only occur when the content type entity instance being loaded is not cached. If it is cached, the cache system retrieves the entity instance's description from MySQL instead.


  1. The user loads a content type entity page, or loads an edit form that gets populated with the entity's value(s)
  2. A number of internal Drupal form processing functions get called
  3. node_load() gets called
  4. entity_load() gets called from node_load()
  5. The default entity controller gets called (the one used to load content type entities). If a Node entity is being loaded, then the NodeController is called
  6. This controller calls the attachLoad() function to "attach" data to the entity instance
    1. The Field Storage API is then used to get data about this entity from the different storage system(s) to create the actual entity instance returned by entity_load()
  7. field_attach_load() gets called by the attachLoad() function to load the data from the storage system(s)
  8. osf_fieldstorage_field_storage_load() gets invoked by field_attach_load()

What the osf_fieldstorage_field_storage_load() function does is:

  1. It checks whether the current version of the entity needs to be loaded. If a revision is being requested, it loads the requested revision of that entity instead
  2. It queries the CRUD: Read OSF web service endpoint to get the description of the entity to load
  3. It reads the resultset and populates the entity class instance using the information describing that entity
  4. Since the entity being populated is passed by reference to the hook, nothing gets returned.
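The four steps above can be modeled in a few lines. This is a Python sketch, not the module's PHP code: field_storage_load(), the read_endpoint callable (a stand-in for the CRUD: Read endpoint), the revisions map and the FAKE_OSF data are all assumptions made for illustration.

```python
def field_storage_load(entities, read_endpoint, revisions=None):
    """Populate entities in place, mirroring the pass-by-reference hook.

    entities:      {uri: entity dict to fill}
    read_endpoint: callable(uri) -> dict, stand-in for CRUD: Read
    revisions:     optional {uri: revision_uri} for revision requests
    """
    revisions = revisions or {}
    for uri, entity in entities.items():
        # Step 1: use the requested revision, if any, else the current version.
        target = revisions.get(uri, uri)
        # Step 2: ask the read endpoint for the record's description.
        description = read_endpoint(target)
        # Step 3: copy the description onto the entity instance.
        entity.update(description)
    # Step 4: nothing is returned; callers see the populated entities.


# Fake in-memory "OSF" used only for this example.
FAKE_OSF = {
    "node/1": {"title": "Current title"},
    "node/1/rev/3": {"title": "Older title"},
}

current = {"node/1": {}}
field_storage_load(current, FAKE_OSF.__getitem__)

older = {"node/1": {}}
result = field_storage_load(older, FAKE_OSF.__getitem__,
                            {"node/1": "node/1/rev/3"})
```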

Creating/Updating (saving) Content Type

If OSF FieldStorage is installed on a Drupal instance, the following internal functions are called when the user clicks the "save" button in a Content Type form:

  1. The user clicks "save"
  2. A number of internal Drupal form processing functions get called
  3. node_form_submit() gets called
  4. node_save() gets called from node_form_submit()
  5. field_attach_update() gets called from node_save() (field_attach_insert() when a new node is being created). This function invokes hook_field_storage_write() for each storage type used by the fields that compose the content type form.
  6. osf_fieldstorage_field_storage_write() gets invoked by field_attach_update()

What the osf_fieldstorage_field_storage_write() function does is:

  1. It checks whether the content type entity is being created or updated. If it is created, OSF FieldStorage creates the record using the CRUD: Create OSF web service endpoint. If it is updated, it uses the CRUD: Update OSF web service endpoint to update its description
  2. It calls osf_fieldstorage_get_rdf_entity() to serialize the content type entity into RDF according to the RDF mapping defined between the content type fields and the ontologies' properties and classes
  3. It sends this RDF serialization to the CRUD: Create or CRUD: Update web service endpoint to save the information about this entity
    1. If a save error occurs, the error is reported to the user and written to the log
    2. Otherwise, if the entity was updated, its previously cached entity instance is cleared from the cache.
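The write path described above can be sketched as follows. This is a Python model under stated assumptions, not the module's implementation: to_rdf() stands in for osf_fieldstorage_get_rdf_entity(), the endpoints dictionary stands in for the CRUD: Create and CRUD: Update web services, and the mapping layout is hypothetical.

```python
def to_rdf(entity, rdf_mapping):
    """Map field values to RDF properties; unmapped fields are skipped."""
    record = {"@type": rdf_mapping["bundle"]}
    for field, value in entity["fields"].items():
        prop = rdf_mapping["fields"].get(field)
        if prop is not None:
            record[prop] = value
    return record


def field_storage_write(entity, rdf_mapping, endpoints, cache, log):
    """Create or update one record; return True on success."""
    # Step 1: decide between create and update.
    op = "create" if entity["is_new"] else "update"
    # Step 2: serialize the entity according to the RDF mapping.
    record = to_rdf(entity, rdf_mapping)
    # Step 3: send the serialization to the matching endpoint.
    try:
        endpoints[op](entity["uri"], record)
    except RuntimeError as err:
        log.append(f"{op} failed for {entity['uri']}: {err}")
        return False
    if op == "update":
        # Drop the stale cached instance after a successful update.
        cache.pop(entity["uri"], None)
    return True


# Fake in-memory endpoints used only for this example.
store, cache, log = {}, {"node/1": "stale copy"}, []
endpoints = {"create": store.__setitem__, "update": store.__setitem__}
mapping = {"bundle": "bibo:Article",
           "fields": {"publisher": "bibo:publisher"}}
article = {"uri": "node/1", "is_new": False,
           "fields": {"publisher": "ACME", "unmapped": "x"}}
ok = field_storage_write(article, mapping, endpoints, cache, log)
```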

Deleting Content Type

If OSF FieldStorage is installed on a Drupal instance, the following internal functions are called when the user clicks the "delete" button in a Content Type form:

  1. The user clicks "delete"
  2. A number of internal Drupal form processing functions get called
  3. node_delete_form_submit() gets called
  4. node_delete() gets called from node_delete_form_submit()
  5. field_attach_delete() gets called from node_delete(). This function invokes hook_field_storage_delete() for each storage type used by the fields that compose the content type form.
  6. osf_fieldstorage_field_storage_delete() gets invoked by field_attach_delete()

What the osf_fieldstorage_field_storage_delete() function does is:

  1. It uses the CRUD: Delete OSF web service endpoint to delete the entity's description from the storage system.

Synchronization Use Cases

Similar to the default MySQL field_sql_storage system, a few synchronization use cases have to be taken into account when using the osf_fieldstorage storage system for Drupal content types.

Note: These use cases may happen. However, once the portals are properly configured, these situations should rarely occur.


  • What happens when a field gets deleted from a content type?
    • The field is marked as deleted in the osf_fieldstorage_pending_opts_fields table
    • The field's data remains in OSF
    • The field's data is not populated when entities that were using this field are loaded using entity_load()
      • This means that even if the data is not changed in OSF, the expected behavior is experienced in Drupal
  • What happens when a field's RDF mapping changes to a new property?
    • The field is marked as changed in the osf_fieldstorage_pending_opts_fields table
    • The field's data remains in OSF, but still uses the old RDF mapping
    • The field's data is populated, using the new property, when entities that were using this field are loaded using entity_load()
      • This happens because osf_fieldstorage_field_storage_load() takes care of this situation. It checks whether there is a pending operation related to this field. If there is, it does the mapping automatically until the operation gets applied, in bulk, within OSF
        • This means that even if the data is not changed in OSF, the expected behavior is experienced in Drupal when using the Entity API
        • This means that when querying OSF directly, different behaviors may be observed until the bulk synchronization has happened
  • What happens when a bundle's type RDF mapping changes to a new one?
    • The bundle is marked as changed in the osf_fieldstorage_pending_opts_bundles table
    • The bundle's data remains in OSF, but still uses the old RDF mapping
    • The bundle's type is populated, using the new class, when entities of this bundle are loaded using entity_load()
      • This means that even if the data is not changed in OSF, the expected behavior is experienced in Drupal when using the Entity API
      • This means that when querying OSF directly, different behaviors may be observed until the bulk synchronization has happened
  • What happens when a new field is added?
    • Nothing happens. Since this is a new field, there is necessarily no data for this field in OSF, so we just wait until people start using this new field to commit new data to OSF
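The load-time handling of pending field operations described above could look roughly like the following. This is a Python sketch under assumptions: resolve_field_values() is a hypothetical helper, records are modeled as plain {property: value} dictionaries, and the pending operations reuse the column names of the osf_fieldstorage_pending_opts_fields table.

```python
def resolve_field_values(record, field_mappings, pending_ops):
    """Return {field: value} as Drupal should see it for one OSF record.

    record:         {rdf_property: value} as currently stored in OSF
    field_mappings: {field_name: current_rdf_property}
    pending_ops:    rows shaped like osf_fieldstorage_pending_opts_fields
    """
    pending = {op["field"]: op for op in pending_ops if not op["executed"]}
    values = {}
    for field, prop in field_mappings.items():
        op = pending.get(field)
        if op and op["operation"] == "deleted":
            continue  # data remains in OSF but is not populated
        if op and op["operation"] == "changed" and prop not in record:
            prop = op["prev_rdfmapping"]  # data still uses the old mapping
        if prop in record:
            values[field] = record[prop]
    return values


record = {"foo:image": "logo.png", "bibo:publisher": "ACME"}
field_mappings = {"image": "foaf:image", "publisher": "bibo:publisher"}
pending_ops = [
    {"field": "image", "operation": "changed", "rdfmapping": "foaf:image",
     "prev_rdfmapping": "foo:image", "executed": False},
    {"field": "publisher", "operation": "deleted",
     "rdfmapping": "bibo:publisher", "prev_rdfmapping": None,
     "executed": False},
]
values = resolve_field_values(record, field_mappings, pending_ops)
```

Here the image value is found under the old property even though the field is now mapped to the new one, and the deleted publisher field is left out.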

Tables

Here are the new tables that have to be created in order to support the bulk synchronization heuristic outlined in the section below.

osf_fieldstorage_pending_opts_fields

The possible operations can be:

  • changed: which means that a rdfmapping changed
  • deleted: which means that a field/property/predicate got deleted

| id | bundle | field | rdfmapping | prev_rdfmapping | operation | date | executed |
|----|---------|-----------|----------------|-----------------|-----------|------------|----------|
| 1 | article | image | foaf:image | foo:image | changed | 01/05/2013 | 1 |
| 2 | article | publisher | bibo:publisher | | deleted | 01/05/2013 | 0 |

  1. This row tells the system that the field instance image, of the bundle article, gets its rdfmapping changed from foo:image to foaf:image
  2. This row tells the system that the field instance publisher, on the bundle article, gets deleted

osf_fieldstorage_pending_opts_bundles

The only possible operation is:

  • changed: which means that a rdfmapping changed
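
| id | bundle | rdfmapping | prev_rdfmapping | operation | date | executed |
|----|---------|-------------|-----------------|-----------|------------|----------|
| 1 | article | foo:Article | bibo:Article | changed | 01/05/2013 | 1 |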

Synchronization Heuristic

These changes are applied in bulk within OSF, at a time of our choosing, according to the operations recorded in the tables above.

Every time an RDF mapping gets modified, or a field gets created or deleted, the change is recorded in one of the two tables specified above. Eventually, a bulk synchronization operation is run to synchronize OSF content with the Content Type changes that have been specified in Drupal.

  1. Create the internal structure of changes from the osf_fieldstorage_pending_opts_fields table (un-executed operations)
  2. For each un-executed change:
    1. Get 20 records within the local content dataset from the Search endpoint. Filter the results to get only the ones that would be affected by the current change
    2. Do until the Search query returns 0 results
      1. For each record within that list
        1. Apply the current change using the Subject() class's API
        2. Save that changed record into OSF using the CRUD: Update web service endpoint
    3. When the Search query returns 0 results, it means that this change got fully applied to OSF. We then mark this change as executed.
  3. Create the internal structure of changes from the osf_fieldstorage_pending_opts_bundles table (un-executed operations)
  4. For each un-executed change:
    1. Get 20 records within the local content dataset from the Search endpoint. Filter the results to get only the ones that would be affected by the current change
    2. Do until the Search query returns 0 results
      1. For each record within that list
        1. Apply the current change using the Subject() class's API
        2. Save that changed record into OSF using the CRUD: Update web service endpoint
    3. When the Search query returns 0 results, it means that this change got fully applied to OSF. We then mark this change as executed.

The advantage of this algorithm is that even if the process breaks or is interrupted in the middle, we can re-run it to finish the changes without losing any information, or without forgetting about any changes that have been specified in Drupal. Depending on the scenario, synchronization may take a bit longer because more CRUD: Update queries would be sent, but the process would be safer because the process can be resumed anytime.
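A minimal sketch of this batch loop, for the fields table only, is shown below. It is a Python model with an in-memory dataset: search_affected() stands in for the Search endpoint, the in-place edit in apply_change() stands in for the Subject() API plus CRUD: Update, and removing the property for a "deleted" operation is an assumption about what applying that change means.

```python
BATCH_SIZE = 20  # batch size named in the steps above


def old_property(change):
    """Property whose presence marks a record as still needing the change."""
    if change["operation"] == "changed":
        return change["prev_rdfmapping"]
    return change["rdfmapping"]  # "deleted": the property to remove


def search_affected(dataset, change, limit=BATCH_SIZE):
    """Stand-in for the Search endpoint: up to `limit` affected URIs."""
    prop = old_property(change)
    if change["operation"] == "changed" and prop == change["rdfmapping"]:
        return []  # no-op mapping; avoids looping forever
    return [uri for uri, rec in dataset.items() if prop in rec][:limit]


def apply_change(record, change):
    """Rename (changed) or drop (deleted) the affected property."""
    value = record.pop(old_property(change))
    if change["operation"] == "changed":
        record[change["rdfmapping"]] = value


def synchronize(dataset, changes):
    """Apply every un-executed change; return the number of updates sent."""
    updates = 0
    for change in changes:
        if change["executed"]:
            continue
        while True:
            batch = search_affected(dataset, change)
            if not batch:
                break  # nothing left to change for this operation
            for uri in batch:
                apply_change(dataset[uri], change)  # then "CRUD: Update"
                updates += 1
        change["executed"] = True
    return updates


dataset = {f"node/{i}": {"foo:image": f"img{i}.png"} for i in range(45)}
changes = [{"operation": "changed", "rdfmapping": "foaf:image",
            "prev_rdfmapping": "foo:image", "executed": False}]
updates = synchronize(dataset, changes)
```

Because each batch is found by searching for records that still carry the old property, re-running synchronize() after an interruption simply picks up the records that have not been changed yet.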

Field Types and their Widgets Analysis

This analysis checks whether all the core field widgets work properly with the OSF FieldStorage module.

| Field Type | Field Widget | RDF Property Type Mapping | Notes | Implemented? |
|------------|--------------|---------------------------|-------|--------------|
| Text | Text Field | datatype/annotation properties | | Fully operational |
| | Autocomplete for existing field data | N/A | Cannot be handled since this widget uses the field_data_field_text_1 table, which does not exist for this storage system. If this widget is required, it has to be updated to handle the osf_fieldstorage field storage system. | Disabled |
| | Autocomplete for predefined suggestions | datatype/annotation properties | | Fully operational |
| | Autocomplete for existing field data and some node titles | N/A | Cannot be handled since this widget uses the field_data_field_text_1 table, which does not exist for this storage system. If this widget is required, it has to be updated to handle the osf_fieldstorage field storage system. | Disabled |
| | OSF Entity Reference | object properties | | Fully operational |
| | OSF Concept Reference (Tagging) | object properties | | Fully operational |
| Term Reference | Autocomplete term widget (tagging) | N/A | We may want to keep all the fields that are defined using the Term Reference field widget with the field_sql_storage system. The rationale is that all the taxonomies used by this kind of field are local to the Drupal portal. These taxonomies are not shared between the Drupal portals, so all their information (tid (tag or taxonomy ID)) is local to the portal. For now, it is not working since the Term Reference field type uses the Revisioning module. That module, via its API call revisioning_get_tids(), tries to access the table field_revision_field_text_1, which does not exist with this storage system. | Disabled |
| | Select list | N/A | For now, it is not working since the Term Reference field type uses the Revisioning module. That module, via its API call revisioning_get_tids(), tries to access the table field_revision_field_text_1, which does not exist with this storage system. | Disabled |
| | Check boxes/radio buttons | N/A | For now, it is not working since the Term Reference field type uses the Revisioning module. That module, via its API call revisioning_get_tids(), tries to access the table field_revision_field_text_1, which does not exist with this storage system. | Disabled |
| Long text and summary | Text area with a summary | datatype/annotation properties | | Fully operational |
| Long text | Text area (multiple rows) | datatype/annotation properties | | Fully operational |
| List (text) | Autocomplete for allowed values list | N/A | This uses a controlled list on each local Drupal instance and local array indexes. If this widget is required, it has to be updated to handle the osf_fieldstorage field storage system. | Disabled |
| | Select list | datatype/annotation properties | | Fully operational |
| | Check boxes/radio buttons | datatype/annotation properties | | Fully operational |
| List (integer) | Autocomplete for allowed values list | N/A | This uses a controlled list on each local Drupal instance and local array indexes. If this widget is required, it has to be updated to handle the osf_fieldstorage field storage system. | Disabled |
| | Select list | datatype/annotation properties | Additionally: (1) the values are saved as integer in Virtuoso; (2) the values are saved as integer in Solr if the RDF property linked to this field is defined such that its value is an integer. This enables integer range filters in Search. | Fully operational |
| | Check boxes/radio buttons | datatype/annotation properties | Additionally: (1) the values are saved as integer in Virtuoso; (2) the values are saved as integer in Solr if the RDF property linked to this field is defined such that its value is an integer. This enables integer range filters in Search. | Fully operational |
| List (float) | Autocomplete for allowed values list | N/A | This uses a controlled list on each local Drupal instance and local array indexes. If this widget is required, it has to be updated to handle the osf_fieldstorage field storage system. | Disabled |
| | Select list | datatype/annotation properties | Additionally: (1) the values are saved as float in Virtuoso; (2) the values are saved as float in Solr if the RDF property linked to this field is defined such that its value is a float. This enables float range filters in Search. | Fully operational |
| | Check boxes/radio buttons | datatype/annotation properties | Additionally: (1) the values are saved as float in Virtuoso; (2) the values are saved as float in Solr if the RDF property linked to this field is defined such that its value is a float. This enables float range filters in Search. | Fully operational |
| Link | Link | datatype/annotation properties | | Fully operational |
| Integer | Text field | datatype/annotation properties | Additionally: (1) the values are saved as integer in Virtuoso; (2) the values are saved as integer in Solr if the RDF property linked to this field is defined such that its value is an integer. This enables integer range filters in Search. | Fully operational |
| Float | Text field | datatype/annotation properties | Additionally: (1) the values are saved as float in Virtuoso; (2) the values are saved as float in Solr if the RDF property linked to this field is defined such that its value is a float. This enables float range filters in Search. | Fully operational |
| Image | Image | datatype/annotation properties | | Fully operational |
| File | File | datatype/annotation properties | | Fully operational |
| Entity Reference | Select list | object properties | Note that the value saved in OSF is a URI, so fields that use this field type reference entities by their URIs. | Fully operational |
| | Check boxes/radio buttons | object properties | | Fully operational |
| | Autocomplete | object properties | | Fully operational |
| | Autocomplete (Tags style) | object properties | | Fully operational |
| Decimal | Text field | datatype/annotation properties | | Fully operational |
| Date (Unix timestamp) | Text field | datatype/annotation properties | | Fully operational |
| | Select list | datatype/annotation properties | | Fully operational |
| | Pop-up calendar | datatype/annotation properties | | Fully operational |
| Date (ISO format) | Text field | datatype/annotation properties | Additionally: (1) the values are saved as xsd:dateTime in Virtuoso; (2) the values are saved as DateTime in Solr if the RDF property linked to this field is defined such that its value is a date. This enables date range filters in Search. | Fully operational |
| | Select list | datatype/annotation properties | | Fully operational |
| | Pop-up calendar | datatype/annotation properties | | Fully operational |
| Date | Text field | datatype/annotation properties | Additionally: (1) the values are saved as xsd:dateTime in Virtuoso; (2) the values are saved as DateTime in Solr if the RDF property linked to this field is defined such that its value is a date. This enables date range filters in Search. | Fully operational |
| | Select list | datatype/annotation properties | | Fully operational |
| | Pop-up calendar | datatype/annotation properties | | Fully operational |
| Boolean | Check boxes/radio buttons | datatype/annotation properties | Additionally: (1) the values are saved as xsd:integer in Virtuoso (this is how they are interpreted internally, as a 1 or 0 integer). | Fully operational |
| | Single on/off checkbox | datatype/annotation properties | | Fully operational |
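The datatype notes in the table can be summarized as a small lookup. This Python sketch is illustrative only: the typed_literal() helper and the field type keys are hypothetical, and mapping "float" to xsd:float is an assumption, since the table only says the values are saved "as float".

```python
from datetime import datetime, timezone

# Datatypes per field type, following the notes in the table above.
XSD_BY_FIELD_TYPE = {
    "integer": "xsd:integer",
    "float": "xsd:float",       # assumption: the table only says "float"
    "date": "xsd:dateTime",
    "boolean": "xsd:integer",   # stored as a 1 or 0 integer
}


def typed_literal(field_type, value):
    """Return (lexical form, datatype); datatype is None for plain text."""
    datatype = XSD_BY_FIELD_TYPE.get(field_type)
    if datatype is None:
        return (str(value), None)
    if field_type == "boolean":
        lexical = "1" if value else "0"
    elif field_type == "date":
        lexical = value.isoformat()
    else:
        lexical = str(value)
    return (lexical, datatype)


published = typed_literal("boolean", True)
created = typed_literal("date", datetime(2013, 1, 5, tzinfo=timezone.utc))
```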

Implementation Notes

  • Some of the widgets are related to specific datatypes such as int, float, date, etc. These datatypes are used to properly index these values, with the property types, in OSF
  • Most field type widgets have their own internal array of data that is used by the widget. This information needs to be saved in OSF somehow. The approach is described below, using the link_field widget as an example:
    • The value array of the link_field widget is:
      • array('url' => '...', 'title' => '...', 'attributes' => '...')
    • Internally, within OSF, the real value is the URL part of this widget. This is the value that is indexed for the mapped property. However, to properly reconstruct the value required by the widget within Drupal, that structure has to be saved as well. This is done by reifying the serialization of this array using the drupal::value reification property.
      • Then, when the value of that field is read to reconstruct the entity instance, the serialized array is used.
      • The value of the mapped property is used within OSF for search and other reading purposes
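The reification approach for the link_field widget can be sketched as a round trip. This is a Python model, not the module's code: JSON is used here as the serialization format purely for illustration (the format the module actually uses is not stated above), and the encode_link_field() and decode_link_field() helpers and the stored record layout are hypothetical.

```python
import json

REIFY_PROPERTY = "drupal::value"  # reification property named in the text


def encode_link_field(widget_value):
    """Index the URL as the value; keep the full array as a reification."""
    return {
        "value": widget_value["url"],
        "reify": {REIFY_PROPERTY: json.dumps(widget_value, sort_keys=True)},
    }


def decode_link_field(stored):
    """Rebuild the widget's value array from the reified serialization."""
    return json.loads(stored["reify"][REIFY_PROPERTY])


# Illustrative widget value; the attribute content is made up.
link = {"url": "http://example.org/", "title": "Example", "attributes": {}}
stored = encode_link_field(link)
restored = decode_link_field(stored)
```

Searches only ever see the URL in stored["value"], while Drupal gets the complete widget array back from the reification.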

Contributed Modules Interaction Analysis

Views 3

The first thing to understand is that Views 3 does not appear to play nicely with the Field Storage API. In fact, its default behaviors appear to be hardcoded for the field_sql_storage field storage system. The following section outlines the current state of the core Views default functionalities.

Core Default Functionalities Analysis

Functionality: Adding a new field

Notes: This functionality is about being able to add a new field to be displayed in the Views results. When we click to add a new field to the view, we get a list of fields that can be added to the View. This list is created by the views_fetch_fields() function. The data processed by that function comes from the views_fetch_data() function call, which takes it from the cache. If the cache is empty, _views_fetch_data_build() is called to populate the cache.

The problem with the latter function call is that, by default, it checks for MySQL tables, and more particularly field tables, to create the list of possible fields to select. Since only the field_sql_storage field storage system creates tables in the MySQL database, all the non-field_sql_storage fields are ignored in the process.

However, new fields could be added using the hook_views_data_alter() hook. This hook has been implemented by multiple contributed modules. A closer look at them, though, shows how everything is intimately tied to a relational database management system.

It may be possible to do something with this, but more analysis would be required, and particularly more testing.

Functionality: Adding filtering criteria

Notes: This is the same issue as with the "Adding a new field" functionality.

Basically, none of the default core functionalities in Views 3 related to Content Types work with fields that don't use the field_sql_storage field storage system.

Implementation Options
OSF Views

OSF Views is a Views query plugin for querying an OSF backend. It interfaces with the Views 3 UI and generates OSF Search queries for searching and filtering all the content the backend contains. The section above discussed how the default Views 3 capabilities are tied to the field_sql_storage field storage system. This is normal, but we are not stuck in this position. Views 3 has been designed such that new Views querying engines can be implemented and used with the Views 3 user interface. This is no different from how the Field Storage API works, for example. This is exactly what OSF Views is, and this is exactly how Views can be used on all the fields that use the osf_fieldstorage field storage system.

This is no different from what is required for the mongodb Drupal module. The mongodb Field Storage API implementation does not work with the default Views 3 functionalities either, as shown by this old, and very minimal, mongodb Views 3 integration module.

OSF Views already works because all the information defined in fields that use the osf_fieldstorage storage system is indexed in OSF. OSF Views simply exposes this OSF information via the Views 3 user interface. All the fields that define the local content can be added to an OSF Views view, all the fields can participate in filter criteria, etc.

What that means is that the OSF FieldStorage module doesn't break the Views 3 module. It does not because OSF Views takes care of exposing that entity storage system to Views 3 via the API that this module is re-implementing.

efq_views

efq_views is another contributed module that exposes the EntityFieldQuery API to Views 3. What that means is that all the field storage systems that implement the EntityFieldQuery API should be able to be interfaced with Views 3 via this efq_views Views 3 querying engine.

Right now, the OSF FieldStorage module does not implement the EntityFieldQuery API. However, it could do so by implementing the hook_field_storage_query() hook.

Diff

Fully operational

Revisioning

Fully operational