Single Authoring Environment: the OSF FieldStorage Connector

Introduction
This document is a technical overview of the OSF FieldStorage module. It is complementary to the Saving Local Content in OSF user manual page.

The goal of a single authoring Drupal environment is to treat all content, local and external, the same, such that all portals can access the same information via the same set of APIs into a single OSF instance. "Single authoring" is shorthand for indicating that all content, local and external, is to be authored and managed in the same way within Drupal. Realizing a "single authoring environment" requires "bi-directionality": changes made in Drupal are reflected in the OSF instance, and changes made in the OSF instance are reflected in Drupal. Thus, "single authoring" and "bi-directionality" refer largely to the same set of requirements.

OSF Entities and OSF FieldStorage
The goal of OSF Entities and OSF FieldStorage is the same: using OSF content directly in Drupal, via Drupal's core API. This means that everything that is edited within Drupal gets automatically modified in OSF without any synchronization mechanism. However, though their goal is the same, their usage and purpose (raison d'être) are different.

OSF Entities is used to expose OSF dataset content to Drupal as a new entity type: the Resource Type entity. This new entity type expands what Drupal developers and administrators can do with entities, and exposes more OSF/ontology features. For example, the forms created from this new entity type are more flexible and are validated according to the description of the fields and bundles in the ontologies. It also automatically enables autocompletion mechanisms for the values of these fields, depending on the ontology metadata.

OSF FieldStorage is used as a new storage system for all the Content Types. By default, the entity storage system for Content Type entities in a standard Drupal distribution is MySQL. By using OSF FieldStorage, we switch the entity storage system of the content types from MySQL to OSF. This means that every time a content type is read, saved, modified or deleted, it gets read, saved, modified or deleted in OSF and not in MySQL. However, exactly the same experience should be provided to Drupal users and administrators: the UI experience should be exactly the same, and the user interface should not be changed via altering hooks.

The main distinctions between the OSF FieldStorage and OSF Entities modules are:
 * OSF FieldStorage changes the entity storage system of content types; the content types themselves remain exactly the same, and their UI is unchanged. The only thing that changes with this module is that all the "local" information of these content types gets accessed from OSF instead of the local MySQL database.
 * OSF Entities creates a new kind of entity which exposes more OSF features via the Entity's class instance. The user experience differs slightly from conventional content types because more OSF-related features are exposed. Also, OSF Entities is used to synchronize all of the OSF classes and properties as bundles and fields in Drupal. These synchronized bundles and fields are then used by OSF FieldStorage to map the content type bundles and fields to RDF classes and properties (exactly like the RDF UI sub-module of the RDFx Drupal project).

Additionally:
 * OSF FieldStorage only interfaces local content (Content Types) that uses fields backed by the OSF FieldStorage field storage system. This content is saved to OSF and is interfaced with Drupal via the OSF FieldStorage module.
 * OSF Entities exposes external data contained within OSF datasets natively within Drupal.
 * This means that if we have two Drupal instances A and B that use the same OSF instance, all the local content managed by OSF FieldStorage on instance A can be exposed as external data on instance B by using OSF Entities.

Another big difference between OSF FieldStorage and OSF Entities is that OSF Entities has to comply with the content that is in OSF. This means that at the Drupal level, we only change the references to the properties and classes that changed in OSF: we synchronize the ontologies as opposed to synchronizing the data. With OSF FieldStorage, it is the opposite situation: it is OSF's content that has to comply with Drupal's content type descriptions. This means that if the Content Type schema (bundles and fields) changes, then we have to synchronize the data mapping changes within OSF accordingly.

This difference exists because we don't want to change any behaviors related to the content types. This is why OSF content has to comply with the content type descriptions when we use OSF FieldStorage.

The cost of synchronizing data instead of ontologies is much greater computationally. As a general rule of thumb, OSF Entities should be the module of choice in most instances; OSF FieldStorage is best reserved for the Content Types that require it (like the ones needed by specific modules that don't work with Resource Type entities).

Architecture
Here is the general architecture for the OSF FieldStorage module. In Drupal, all Content Type entities use the Field Storage API to read, create, update and delete content in the default storage system. The default storage system used in a vanilla Drupal instance is MySQL. Each content type is a bundle, and each bundle has one or more fields attached to it. In Drupal 7, each field has its own storage system. This means that a single content type item can have fields that use different storage systems.
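This per-field dispatch can be pictured with a compact sketch (Python is used here only as a language-agnostic illustration of the PHP/Drupal flow; the backend functions and the registry below are hypothetical stand-ins, not the module's actual code):

```python
def mysql_load(entity, field_names):
    # Stand-in for the default MySQL-backed storage system.
    for name in field_names:
        entity[name] = f"mysql:{name}"

def osf_load(entity, field_names):
    # Stand-in for the OSF FieldStorage backend (a real implementation
    # would call the OSF web service endpoints).
    for name in field_names:
        entity[name] = f"osf:{name}"

# Hypothetical registry: which storage module handles which backend.
storage_backends = {"field_sql_storage": mysql_load, "osf_fieldstorage": osf_load}

def attach_fields(entity, fields):
    """Group the bundle's fields by storage module, then make one
    storage call per backend, the way the Field Storage API dispatches."""
    by_storage = {}
    for field_name, storage in fields.items():
        by_storage.setdefault(storage, []).append(field_name)
    for storage, names in by_storage.items():
        storage_backends[storage](entity, names)

# One bundle mixing a MySQL-backed field and an OSF-backed field.
entity = {}
attach_fields(entity, {"title": "field_sql_storage", "body": "osf_fieldstorage"})
```

The point of the sketch is that the bundle itself does not care which backend serves each field; swapping MySQL for OSF is a per-field configuration change.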

The following schema shows how the Drupal Field Storage API works, the flexibility that resides in the fields, and how multiple fields, part of the same bundle, can use different storage systems:

Then, by default, on a vanilla Drupal instance, the same bundle with the same fields would look like this. What the OSF FieldStorage module does is change Drupal's field configuration to result in the following interaction. Another schema shows the interaction between Drupal's core API, OSF FieldStorage and the OSF web service endpoints: each time a content type form is generated, it calls the Field Storage API, which invokes the hook implementations of the OSF FieldStorage module, which call the CRUD: Read web service endpoint, which then populates the content type entity instance with the proper information to display within the form being edited (unless the form is used to create a new entity).

If the form is used to create or update an entity, then field_attach_insert() or field_attach_update() gets called. If the "delete" button is used to delete this content type entity, then the CRUD: Delete web service endpoint is called.

Reading Content Type
If OSF FieldStorage is installed on a Drupal instance, the following internal functions are called when a content type entity gets loaded:
 * 1) The user loads a content type entity page, or loads an edit form that gets populated with the entity's value(s)
 * 2) A lot of internal Drupal form processing functions get called
 * 3) node_load() gets called
 * 4) entity_load() gets called from node_load()
 * 5) The default entity controller gets called (the one used to load content type entities). If a Node entity is being loaded, then the NodeController is called
 * 6) This controller calls the field_attach_load() function to "attach" data to the entity instance
 * 7) The Field Storage API is then used to get data about this entity from the different storage system(s) to create the actual entity instance to return
 * 8) The field storage load hooks get called by the field_attach_load() function to load the data from the storage system(s)
 * 9) osf_fieldstorage_field_storage_load() gets invoked by the Field Storage API

What the osf_fieldstorage_field_storage_load() function does is:
 * 1) Check whether the current entity version needs to be loaded. If a revision is being requested, it loads the proper revision of that entity
 * 2) It queries the CRUD: Read OSF web service endpoint to get the description of the entity to load
 * 3) It reads the resultset and creates the entity class instance using the information describing that entity
 * 4) Since the entity being populated is passed by reference to the hook, nothing gets returned

Creating/Updating (saving) Content Type
If OSF FieldStorage is installed on a Drupal instance, the following internal functions are called when the user clicks the "save" button in a Content Type form:
 * 1) The user clicks "save"
 * 2) A lot of internal Drupal form processing functions get called
 * 3) node_form_submit() gets called
 * 4) node_save() gets called from node_form_submit()
 * 5) field_attach_insert() or field_attach_update() gets called from node_save(). This function invokes the field storage write hook for each storage type that is used for the fields that compose the content type form.
 * 6) osf_fieldstorage_field_storage_write() gets invoked by the Field Storage API

What the osf_fieldstorage_field_storage_write() function does is:
 * 1) Check whether the content type entity is being created or updated. If it is created, it creates the record using the CRUD: Create OSF web service endpoint. If it is updated, then it uses the CRUD: Update OSF web service endpoint to update its description
 * 2) It serializes the content type into RDF according to the RDF mapping that has been performed between the content type fields and the ontologies' properties and classes
 * 3) It sends this RDF serialization to the CRUD: Create or CRUD: Update web service endpoint to save the information about this entity
 * 4) If a save error occurs, the error is reported to the user and logged
 * 5) Otherwise, if the entity gets updated, its previously cached entity instance gets cleared from the cache.
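The save path above can be sketched like this. All callables are hypothetical stand-ins: serialize_rdf plays the role of the module's content-type-to-RDF serializer, and crud_create/crud_update stand in for the CRUD: Create and CRUD: Update web service endpoints:

```python
def field_storage_write(entity, is_new, serialize_rdf,
                        crud_create, crud_update, entity_cache):
    """Sketch of the write hook: choose create vs. update, serialize the
    entity into RDF via its mapping, call the endpoint, and invalidate
    the cached instance on update."""
    rdf = serialize_rdf(entity)
    if is_new:
        crud_create(rdf)
    else:
        crud_update(rdf)
        # The previously cached entity instance is now stale.
        entity_cache.pop(entity["uri"], None)

# Simulated usage: update an existing entity and record endpoint calls.
calls = []
cache = {"http://example.org/1": {"title": "old"}}
entity = {"uri": "http://example.org/1", "title": "new"}
field_storage_write(
    entity,
    is_new=False,
    serialize_rdf=lambda e: f'<{e["uri"]}> dcterms:title "{e["title"]}" .',
    crud_create=lambda rdf: calls.append(("create", rdf)),
    crud_update=lambda rdf: calls.append(("update", rdf)),
    entity_cache=cache,
)
```

The design point is that the only branch is create-versus-update; everything else (serialization, endpoint call, cache invalidation) is shared between the two paths.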

Deleting Content Type
If OSF FieldStorage is installed on a Drupal instance, the following internal functions are called when the user clicks the "delete" button in a Content Type form:
 * 1) The user clicks "delete"
 * 2) A lot of internal Drupal form processing functions get called
 * 3) node_delete() gets called
 * 4) field_attach_delete() gets called from node_delete()
 * 5) field_attach_delete() invokes the field storage delete hook for each storage type that is used for the fields that compose the content type form.
 * 6) osf_fieldstorage_field_storage_delete() gets invoked by the Field Storage API

What the osf_fieldstorage_field_storage_delete() function does is:
 * 1) It uses the CRUD: Delete OSF web service endpoint to delete the entity's description from the storage system.

Synchronization Usecases
Similar to the default MySQL field_sql_storage system, we have to take into account a few synchronization use cases when dealing with the OSF FieldStorage storage system for the Drupal content types.


 * What happens when a field gets deleted in a content type?
 * The field is marked as deleted in the osf_fieldstorage_pending_opts_fields table
 * The field's data remains in OSF
 * The field's data is not populated when entities that were using this field are loaded using OSF FieldStorage
 * This means that even if the data is not changed in OSF, the expected behavior is experienced in Drupal
 * What happens when a field's RDF mapping changes to a new property?
 * The field is marked as changed in the osf_fieldstorage_pending_opts_fields table
 * The field's data remains in OSF, but uses the old RDF mapping
 * The field's data is populated, using the new property, when entities that were using this field are loaded using OSF FieldStorage
 * This happens because osf_fieldstorage_field_storage_load() takes care of this situation: it checks if there is a pending operation related to this field and, if there is, it does the mapping automatically until the operation gets executed, in bulk, within OSF
 * This means that even if the data is not changed in OSF, the expected behavior is experienced in Drupal when using the Entity API
 * This means that when querying OSF directly, different behaviors may occur until the bulk synchronization has happened
 * What happens when a bundle's type RDF mapping changes to a new one?
 * The bundle is marked as changed in the osf_fieldstorage_pending_opts_bundles table
 * The bundle's data remains in OSF, but uses the old RDF mapping
 * The bundle's type is populated, using the new class, when entities of that bundle are loaded using OSF FieldStorage
 * This means that even if the data is not changed in OSF, the expected behavior is experienced in Drupal when using the Entity API
 * This means that when querying OSF directly, different behaviors may occur until the bulk synchronization has happened
 * What happens when a new field is added?
 * Nothing happens. Since this is a new field, there is necessarily no data for this field in OSF, so we just wait until people start using this new field to commit new data to OSF
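The load-time behavior described in these use cases can be illustrated with a small sketch of the pending-operation lookup. The table contents and property URIs below are hypothetical examples, not real mappings:

```python
# Pending operations, shaped like rows of the
# osf_fieldstorage_pending_opts_fields table (hypothetical data):
pending_ops = {
    "publisher": {"op": "deleted"},
    "image": {"op": "changed", "new_property": "http://example.org/ns/depiction"},
}

def effective_property(field_name, default_property):
    """Decide which RDF property to use when loading a field.

    A 'deleted' field is not populated at all; a 'changed' field is read
    through its new mapping even though OSF still holds the old one,
    until the bulk synchronization is run. Untouched fields use their
    normal mapping.
    """
    op = pending_ops.get(field_name)
    if op is None:
        return default_property
    if op["op"] == "deleted":
        return None          # do not populate this field in Drupal
    return op["new_property"]
```

For example, `effective_property("publisher", "dcterms:publisher")` yields no property at all, so the deleted field stays empty in Drupal even though its data still sits in OSF.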

Tables
Here are the new tables that have to be created in order to support the bulk synchronization heuristic outlined in the section below.

osf_fieldstorage_pending_opts_fields
The possible operations can be:
 * changed: which means that an RDF mapping changed
 * deleted: which means that a field/property/predicate got deleted

For example:
 * 1) A row telling the system that the field instance image, of the bundle article, got its RDF mapping changed from its old property to a new one
 * 2) A row telling the system that the field instance publisher, of the bundle article, got deleted

osf_fieldstorage_pending_opts_bundles
The only possible operation is:
 * changed: which means that an RDF mapping changed
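To make the two tables concrete, their rows can be pictured as plain data. This is a hypothetical illustration only: the bundle, field names and mapping URIs are invented for the example:

```python
# osf_fieldstorage_pending_opts_fields:
# one row per un-executed field operation (hypothetical rows).
pending_opts_fields = [
    {"bundle": "article", "field": "image",
     "operation": "changed",                           # RDF mapping changed
     "old_mapping": "http://example.org/old#image",    # invented URIs
     "new_mapping": "http://example.org/new#image"},
    {"bundle": "article", "field": "publisher",
     "operation": "deleted"},                          # field got deleted
]

# osf_fieldstorage_pending_opts_bundles:
# only the 'changed' operation is possible for bundles.
pending_opts_bundles = [
    {"bundle": "article",
     "operation": "changed",
     "old_mapping": "http://example.org/old#Article",
     "new_mapping": "http://example.org/new#Article"},
]
```

Each row is enough to replay the change later: the bulk synchronization only needs the bundle/field it applies to and the old and new mappings.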

Synchronization Heuristic
When we want to apply the pending changes, we apply them all, in bulk, within OSF according to these recorded operations.

Every time an RDF mapping gets modified, or a field gets created or deleted, these changes appear in one of the two tables specified above. Eventually, a bulk synchronization operation is run to synchronize OSF content according to the Content Type changes that have been specified in Drupal. The advantage of this algorithm is that even if the process breaks or is interrupted in the middle, we can re-run it to finish the changes without losing any information, and without forgetting about any changes that have been specified in Drupal. Depending on the scenario, synchronization may take a bit longer because more Search queries would be sent, but the process is safer because it can be resumed anytime.
 * 1) Create the internal structure of changes from the osf_fieldstorage_pending_opts_fields table (un-executed operations)
 * 2) For each un-executed change:
 * 3) Get 20 records within the local content dataset from the Search endpoint. Filter the results to get only the ones that would be affected by the current change
 * 4) Do until the Search query returns 0 results
 * 5) For each record within that list
 * 6) Apply the current change to the record
 * 7) Save that changed record into OSF using the CRUD: Update web service endpoint
 * 8) When the Search query returns 0 results, it means that this change got fully applied to OSF. We then mark this change as executed.
 * 9) Create the internal structure of changes from the osf_fieldstorage_pending_opts_bundles table (un-executed operations)
 * 10) For each un-executed change:
 * 11) Get 20 records within the local content dataset from the Search endpoint. Filter the results to get only the ones that would be affected by the current change
 * 12) Do until the Search query returns 0 results
 * 13) For each record within that list
 * 14) Apply the current change to the record
 * 15) Save that changed record into OSF using the CRUD: Update web service endpoint
 * 16) When the Search query returns 0 results, it means that this change got fully applied to OSF. We then mark this change as executed.
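The resumable loop above can be sketched as follows. The search, apply_change, crud_update and mark_executed callables are hypothetical stand-ins for the Search endpoint, the module's internals, and the CRUD: Update endpoint:

```python
def bulk_synchronize(pending_ops, search, apply_change,
                     crud_update, mark_executed, page_size=20):
    """Sketch of the bulk synchronization heuristic.

    search(change, limit) returns records still affected by the change,
    so the loop is naturally resumable: a change is only marked executed
    once Search reports 0 remaining results, and re-running after an
    interruption simply picks up the records not yet updated.
    """
    for change in pending_ops:
        while True:
            records = search(change, limit=page_size)
            if not records:
                break                        # change fully applied to OSF
            for record in records:
                apply_change(change, record)  # rewrite the record's mapping
                crud_update(record)           # save it back into OSF
        mark_executed(change)

# Simulated usage: 45 records still carrying the old mapping.
store = [{"id": i, "prop": "old"} for i in range(45)]

def search(change, limit):
    hits = [r for r in store if r["prop"] == change["old"]]
    return hits[:limit]

def apply_change(change, record):
    record["prop"] = change["new"]

executed = []
bulk_synchronize(
    [{"old": "old", "new": "new"}],
    search, apply_change,
    crud_update=lambda record: None,
    mark_executed=executed.append,
)
```

With 45 affected records and pages of 20, the loop runs three Search pages (20, 20, 5) plus one empty page before marking the change executed, which is the "more Search queries, but resumable" trade-off the text describes.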

Field Types and their Widgets Analysis
These are field widget analyses to check whether all the core field widgets work properly with the OSF FieldStorage module.

Implementation Notes

 * Some of the widgets are related to specific datatypes. These datatypes are used to properly index these values, with the proper property types, into OSF
 * Most of the field type widgets have their own internal array of data that is used by the widget. This information needs to be saved in OSF somehow. Here is how we do this, basing our example on a widget whose real value is a URL:
 * Internally, within OSF, the real value is the URL part of this widget. This is the value we index for the mapped property. However, to properly reconstruct the value required by the widget within Drupal, we have to save that structure as well. We do this by reifying the serialization of this array using a reification property
 * Then, when we read the value of that field to re-construct the entity instance, we use that serialized array
 * The value of the mapped property is used within OSF for search and other reading purposes
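The reification trick described above can be sketched like this. The key names and the JSON serialization are illustrative assumptions, not the module's actual wire format:

```python
import json

def index_widget_value(value_array, main_key):
    """Sketch of how a widget's internal value array is indexed in OSF.

    The value under main_key (e.g. the URL of a link-style widget) becomes
    the value of the mapped RDF property; the full array is serialized and
    attached via a reification statement so the widget can be reconstructed
    exactly when the entity is read back.
    """
    return {
        "value": value_array[main_key],                       # indexed value
        "reified_widget_data": json.dumps(value_array, sort_keys=True),
    }

def rebuild_widget_value(indexed):
    """Reverse operation, used when re-constructing the entity instance."""
    return json.loads(indexed["reified_widget_data"])

original = {"url": "http://example.org/", "title": "Example", "attributes": {}}
indexed = index_widget_value(original, "url")
```

The indexed "value" is what OSF uses for search and other reading purposes, while the reified serialization only exists so Drupal can rebuild the widget's full internal array losslessly.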

Views 3
The first thing to understand is that Views 3 does not appear to play nicely with the Field Storage API. In fact, its default behaviors appear to be hardcoded against the field_sql_storage field storage system. The following section outlines the current state of the core Views default functionalities.

Core Default Functionalities Analysis
Basically, all the default core functionalities in Views 3 related to Content Types do not work with fields that don't use the field_sql_storage field storage system.

OSF Views
OSF Views is a Views query plugin for querying an OSF backend. It interfaces with the Views 3 UI and generates OSF Search queries for searching and filtering all the content it contains. In the section above, we discussed how the default Views 3 capabilities are tied to the field_sql_storage field storage system. This is normal, but we are not stuck in this position. Views 3 has been designed such that new Views query engines can be implemented and used with the Views 3 user interface. This is no different from how the Field Storage API works, for example. This is exactly what OSF Views is, and this is exactly how we can use Views on all the fields that use the OSF FieldStorage field storage system.

This is no different from what is required for the mongodb Drupal module. The mongodb Field Storage API implementation does not work with the default Views 3 functionalities either, as shown by its old, and very minimal, mongodb Views 3 integration module.

OSF Views already works because all the information defined in fields that use the OSF FieldStorage storage system is indexed into OSF. What OSF Views does is simply expose this OSF information via the Views 3 user interface. All the fields that define the local content can be added to an OSF Views view, all the fields can participate in filter criteria, etc.

What that means is that the OSF FieldStorage module doesn't break the Views 3 module, because OSF Views takes care of exposing that entity storage system to Views 3 via the API that this module re-implements.

efq_views
efq_views is another contributed module that exposes the EntityFieldQuery API to Views 3. What that means is that all the field storage systems that implement the EntityFieldQuery API should be able to be interfaced with Views 3 via this efq_views Views 3 query engine.

Right now, the OSF FieldStorage module does not implement the EntityFieldQuery API. However, it could do so by implementing the hook_field_storage_query hook.

Diff
Fully operational

Revisioning
Fully operational