Archive 1.x:StructWSF Web Services Tutorial

Tutorial - 3 March 2010


 * Author:
 * Frédérick Giasson - Structured Dynamics

Copyright © 2009-2010 by Structured Dynamics LLC.

This tutorial for structWSF by Structured Dynamics LLC is licensed under a Creative Commons Attribution-Share Alike 3.0 license. The structWSF software is separately available under the Apache License, Version 2.0.

= ABOUT structWSF =

structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies).

The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The initial structWSF framework comes packaged with a baseline set of about twenty Web services in CRUD, browse, search and export and import. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and optionally a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML.

In its initial release, structWSF has direct interfaces to the Virtuoso RDF triple store (via ODBC, and later HTTP) and the Solr faceted, full-text search engine (via HTTP). However, structWSF has been designed to be fully platform-independent. Support for additional datastores and engines is planned. The design also allows other specialized systems to be included, such as analysis or advanced inference engines.

= INTRODUCTION =

This tutorial explains how structWSF users can use the Web service endpoints to manage their data in datasets hosted on one or more structWSF instances. The tutorial is composed of a series of general use cases that are likely to be encountered by most users. With a series of about 20 different Web services, users are able to manage their data in an effective and secure way. At the end of this tutorial, users will have a general overview of all Web service endpoints that exist to date. They will know how each Web service can interact with the others and which Web service endpoint should be used for which purpose. Users will be able to import and export partial or complete datasets from any structWSF instance. They will know how to query the Web services and how to interpret the resultsets they return. Finally, they will understand the security considerations related to the usage of the Web service endpoints.

There are a couple of concepts to understand before starting to work on these use cases. After an overview of the 20 Web services, the next four sections introduce the general structWSF architecture, the dataset and access rights concepts, how a user should query the Web service endpoints using HTTP queries, and finally how to interpret the resultsets returned by the Web service endpoints. After reading these sections, the user will be able to understand the use of any structWSF Web service endpoint.

= OVERVIEW OF structWSF WEB SERVICES =

structWSF is composed of 20 Web services. In this section we briefly introduce each of them; the full documentation of each Web service is available on the Individual Web Services Documentation pages.

1. Auth Registrar: Access
The Auth Registrar: Access Web service is used to register (create, update and delete) access for a given IP address to a specific dataset, with given CRUD (Create, Read, Update and Delete) permissions, across all of the Web service endpoints registered to the WSF (Web Services Framework).

2. Auth Registrar: WS
The Auth Registrar: WS Web service is used to register a Web service endpoint to the WSF (Web Services Framework). Once a Web service is registered to a WSF, it can then be used by other Web services, become accessible to users, etc.

3. Auth: Lister
The Auth: Lister Web service is used to list all of the datasets accessible to a given user (with or without their CRUD permissions), to list all of the Web services registered to the WSF (Web Services Framework), and to list all of the CRUD permissions, for all users, for a given dataset created on a WSF.

This Web service is used to list all the things that are registered/authenticated in a Web Service Framework network.

4. Auth: Validator
The Auth: Validator Web service is used by all other Web service endpoints to validate/invalidate all queries that are sent against them.

5. Ontology: Create
The Ontology: Create Web service is used to index an ontology in the ontological space (the place where all ontologies are indexed) within the WSF. This set of ontologies can be used by any Web service registered to the WSF. They can use the ontologies to describe things, to drive their interfaces, to infer facts, etc. Each time a new ontology is indexed, the internal ontologies structures are updated at the same time to take into account all the knowledge represented in the new ontology.

6. Dataset: Create
The Dataset: Create Web service is used to create a new dataset in a WSF (Web Services Framework). When a dataset is created, it gets described and registered to the WSF, and becomes accessible to the other Web services.

7. Dataset: Read
The Dataset: Read Web service is used to get information (title, description, creator, contributor(s), creation date and last modification date) for a dataset belonging to the WSF (Web Services Framework).

8. Dataset: Update
The Dataset: Update Web service is used to update the description of an existing dataset in a WSF (Web Services Framework).

9. Dataset: Delete
The Dataset: Delete Web service is used to delete an existing dataset in a WSF (Web Services Framework). When a dataset gets deleted, all of the information archived in it is deleted as well. There is no way to recover any data once this query is issued.

10. CRUD: Create
The CRUD: Create Web service is used to create a new instance record in a target dataset registered to a WSF (Web Services Framework). When a new instance record is created, it becomes accessible to the users that have access to this dataset.

11. CRUD: Read
The CRUD: Read Web service is used to get the description of a target instance record indexed in a dataset belonging to a WSF (Web Services Framework).

12. CRUD: Update
The CRUD: Update Web service is used to update an existing instance record indexed in a target dataset part of a WSF (Web Services Framework).

13. CRUD: Delete
The CRUD: Delete Web service is used to delete an existing instance record indexed in some target dataset of a WSF. When the instance record gets deleted, all of the information archived about it in the dataset is deleted as well. There is no way to recover any data once this query is issued.

14. Search
The Search Web service is used to perform full text searches on the structured data indexed on a structWSF instance. Each search query can be applied to all, or a subset of, datasets accessible by the requester. Additionally, the requester can specify the types of things to search for (for example, searching for all people matching the search string "Bob"). All of these full text queries comply with the Lucene query syntax.

15. Browse
The Browse Web service is used to get slices of data according to different criteria as indexed on a structWSF instance. Each browse query can be applied to all, or a subset of, datasets accessible by the requester. Additionally, the requester can specify the types of things to browse (for example, all people) and the attributes used to describe these things (for example, all the people described using the birthday attribute; that is, all the people who have a birthday date recorded in the dataset).

16. SPARQL
The SPARQL Web service is used to send custom SPARQL queries against the structWSF data structure. This is a general purpose querying Web service.

17. Converter: irJSON
The Converter: irJSON Web service is used to convert irJSON data into RDF+XML, RDF+N3 or XML (the internal structWSF DTD structure), or to convert XML (internal DTD) into irJSON data. In other words, this Web service provides the import and export functionality for the irJSON format, which is a common text representation for many external datasets.

All converter Web service endpoints are used to convert multiple kinds of data and to provide the bridge between a structWSF instance and other existing systems that understand these formats.

18. Converter: BibTeX
The Converter: BibTeX Web service is used to convert BibTeX data into RDF+XML, RDF+N3, BibTeX or XML (the internal structWSF DTD structure), or to convert XML (internal DTD) into BibTeX data. In other words, this Web service provides the import and export functionality for the BibTeX format, which is a common text representation for citations.

19. Converter: TSV/CSV
The Converter: TSV Web service is used to convert TSV/CSV data into RDF+XML, RDF+N3, TSV, CSV or XML (the internal structWSF DTD structure), or to convert XML (internal DTD) into TSV or CSV data. In other words, this Web service provides the import and export functionality for the TSV (tab separated values) and CSV (comma separated values) formats, which are common text representations of spreadsheets, among other sources.

The TSV and CSV format used by this converter Web service is a three-column file where each row represents a triple. The first column is the subject of the triple, the second column is the property (or predicate) of the triple, and the third column is the object of the triple.
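The three-column convention above can be sketched with a small parser. The following is an illustrative sketch in Python, not part of structWSF: the sample URIs and the helper name are invented for the example, and the real converter also handles escaping and distinguishing URI objects from literals, which are omitted here.

```python
import csv
import io

def parse_triples_tsv(text):
    """Parse three-column TSV content into (subject, predicate, object) triples.

    Illustrative sketch only: escaping and literal vs. URI object handling
    are intentionally left out.
    """
    triples = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 3:
            raise ValueError("each row must have exactly 3 columns: %r" % (row,))
        triples.append(tuple(row))
    return triples

# Two rows, each one triple: column 1 = subject, column 2 = predicate, column 3 = object.
sample = (
    "http://example.org/doc1\thttp://purl.org/dc/terms/title\tMy Document\n"
    "http://example.org/doc1\thttp://xmlns.com/foaf/0.1/maker\thttp://example.org/bob\n"
)
print(parse_triples_tsv(sample))
```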

= GENERAL structWSF ARCHITECTURE =

structWSF is based on what is known as a Web-oriented architecture (WOA), which can be defined as:

WOA = SOA + WWW + REST Concept

WOA is an architectural foundation of the Web, and is a subset of the service-oriented architectural (SOA) style, wherein discrete functions are packaged into modular and shareable elements ("services") that are made available in a distributed and loosely coupled manner. WOA uses the representational state transfer (REST) architectural style defined by Roy Fielding in his 2000 doctoral thesis; Fielding is also one of the principal authors of the Hypertext Transfer Protocol (HTTP) specification.

REST provides principles for how resources are defined and used and addressed with simple interfaces without additional messaging layers such as SOAP or RPC. The principles are couched within the framework of a generalized architectural style and are not limited to the Web, though they are a foundation to it.

REST and WOA stand in contrast to earlier Web service styles that are often known by the WS-* acronym (such as WSDL, etc.). WOA has proven itself to be highly scalable and robust for decentralized users since all messages and interactions are self-contained (convey "state").

structWSF abstracts its WOA services into simple and compound ones (which are combinations of the simple). All Web services (WS) have uniform interfaces and conventions and share the error codes and standard functions of HTTP. We further extend the WOA definition and scope to include linked data, which is also RESTful. Thus, our WOA also sits atop an [[RDF Concept|RDF]] (Resource Description Framework) database ("triple store") and full-text search engine.

These Web services then become the middleware interaction layer for general access and querying ("endpoints") and for tying in external software ("clients"), portals or content management systems (CMS). This design provides maximum flexibility, extensibility and substitutability of components.

As you can see in the schema below, the general architecture of this system is split into four main areas:


 * 1) The structWSF Web Services Framework
 * 2) CMS interacting with the structWSF
 * 3) External Users interacting with the structWSF
 * 4) External datastore systems used by the structWSF.



The Web Services Framework Middleware
The core of the system is the Web services middleware layer, or structWSF. This Web services framework (WSF) is the abstraction layer that provides the Web service endpoints and services for external use. It also provides the direct hooks into the underlying RDF triple stores and full-text search engines that drive these services. At initial release, these pre-configured hooks are to the Virtuoso RDF triple store (via ODBC, and later HTTP) and the Solr faceted text search engine (via HTTP). However, the design also allows other systems to be substituted if desired, or for other specialized systems to be included (such as an analysis or advanced inference engine).

Authentication/Registration WS
The controlling Web service in structWSF is the Auth: Validator Web service (see below). The initial version uses registered IP addresses as the basis to grant access and privileges to datasets and functional Web services. Later versions may be expanded to include other authentication methods such as OpenID, keys (à la Amazon EC2), foaf+ssl or oauth. A secure channel (HTTPS, SSH) could also be included.

Other Core Web Services
The other core Web services provided with structWSF are the CRUD functional services (create - read - update - delete), import and export, browse and search, and a basic templating system. These are viewed as core services for any structured dataset.

In initial release, the import and export formats include RDF/XML, RDF/N3, RDF/Turtle, JSON and XML.

A Structured Data Foundation
Fundamentally, this "data-driven application" works because of its structured data foundation. structWSF employs an innovative design that exposes all RDF and record aspects to full-text search and is able to present contextual ("faceted") results at all times in the interface. In addition, the Virtuoso universal server provides RDF triple store and related structured data services.

The actual "driver" for the structured data system is one or more schema ("ontologies") that set all of these structured data conditions. These ontologies are also managed by the triple store. The definition of these ontologies is specified, with accompanying documentation, in such a way as to enable new scopes or domains to easily drive the system.

Interactions with CMSs and External Clients
As described by the diagram above, all interactions with the system so far have been mediated either by Web service APIs or by external endpoints, such as SPARQL.

For external clients or any HTTP-accessible system, this is sufficient. Programmatically, external clients (software) may readily interact with each structWSF Web service and obtain results via parametric requests.

However, the framework is also designed to be embedded within existing content management systems (CMSs). For this purpose, an additional layer is provided.

The architecture of the system can support interactions with standard open source CMSs or app frameworks such as Django, Drupal, Joomla!, Liferay, Ruby on Rails, or WordPress, as examples. CMS interactions first occur via specific modules or plug-ins written for that system. These are very lightweight wrappers that conform to the registry and hooks of the host CMS system. The actual modules or plug-ins provided are also geared to the management style of the governing CMS and what it offers. Each module or plug-in wrapper is a packaging decision of how to bound the structWSF Web services in a configuration appropriate to the parent CMS.

This design keeps the actual tie-ins to the CMS as a very thin wrapper layer, which can embrace an open source licensing basis consistent with the host CMS. Because all of the underlying functionality has been abstracted in the structWSF framework, licensing integrity across all deployment layers is maintained while allowing broad CMS interoperability. The design also allows networks to be established of multiple portals or nodes with different CMSs, perfect for broad-scale collaboration.

In initial release, structWSF has been linked to the Drupal CMS as the OSF-Drupal distribution. OSF-Drupal uses the contributed Drupal Organic Group module to manage user and dataset permissions.

The Central Auth: Validator Web Service
The core of structWSF is the Auth: Validator Web service. This service validates all queries sent to any Web service endpoint within structWSF.

The CMS is intended to use some of the Web service endpoints to manage user subscriptions and the like. Otherwise, all users are expected to use the user-oriented Web service endpoints.



Benefits of the structWSF Design
This design has some key benefits:


 * Broad suitability to a diversity of deployment architectures, operating systems and scales
 * Substitutability of underlying triple stores and text engines
 * Substitutability of preferred content management systems
 * Access and use of Web service endpoints independent of CMS (external clients)
 * Performant Web-oriented architecture (WOA) design
 * Common, RESTful interface for all Web services and functions in the framework
 * Easy registration of new Web services, inclusion with authorization system
 * Ability to share and interoperate data independent of client CMSs or portals
 * Use of the common lingua franca of RDF and its general advantages.

= DATASETS AND ACCESS RIGHTS =

structWSF is guided by a flexible access and rights system that is linked to individual datasets. Different datasets can have different users and permissions. When combined with an external CMS system, these access rights can also be governed by profiles (patterns of users and access rights) and assignments to one or more user groups.

In this way, data can be made public or private; can be read-only or alterable; or can be reserved for internal work group purposes. Moreover, by using different profiles, a smaller group of owners or curators, whether defined in groups or not, may have creation, update or deletion rights different from those of various classes of general users or the public.

This simple but elegant system works as follows. First, every Web service is characterized as to whether it supports one or more of the CRUD (create - read - update - delete) actions. Second, each user is characterized as to whether they first have access rights to a dataset and, if they do, which of the CRUD permissions they have. We can thus characterize the access and use protocol simply as A + CRUD.

Thereafter, a simple mapping of dataset access and CRUD rights determines whether a user sees a given dataset, what Web services ("tools") are presented to them, and how they might manipulate that data. When expressed in standard user interfaces, this leads to a simple contextual display of datasets and tools. Note that tools or datasets to which a user has not been granted access rights are neither shown in the user interface nor included in search or browse functions.

Here is the basic matrix of these possible assignments:

At the Web service layer, these access values are set parametrically. The system, however, is designed to more often be driven by user and group management at the CMS level via a lightweight plug-in or module layer.
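The A + CRUD protocol described above can be sketched as follows. This is a hypothetical illustration, not structWSF code: the shape of the permission table, the IP addresses and the dataset URIs are invented for the example.

```python
# Hypothetical permission table: (user_ip, dataset_uri) -> granted CRUD actions.
PERMISSIONS = {
    ("192.168.0.10", "http://yoursite.org/wsf/datasets/260/"): {"create", "read", "update"},
    ("127.0.0.1", "http://yoursite.org/wsf/"): {"create", "read", "update", "delete"},
}

def can_perform(user_ip, dataset_uri, action):
    """A + CRUD: the user must first have access (A) to the dataset, and then
    hold the requested CRUD permission on it."""
    granted = PERMISSIONS.get((user_ip, dataset_uri))
    if granted is None:
        # No access at all: the dataset is simply invisible to this user.
        return False
    return action in granted

print(can_perform("192.168.0.10", "http://yoursite.org/wsf/datasets/260/", "read"))    # True
print(can_perform("192.168.0.10", "http://yoursite.org/wsf/datasets/260/", "delete"))  # False
```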

Groups, Datasets & Profiles
In practice, the simple matrix above is a bit more involved. That is because each cell in the matrix is divided into a number of possible groups or assignments: owner, group, registered user, and the general public (anonymous). The "group" assignment can be drawn from as many different groups as desired.

Owners (or curators) generally have the broadest rights including creation, update and deletions. But, these same rights can also be enabled for trusted colleagues as members of defined groups.

Of course, if desired, any and all restrictions by action, dataset or user type can be removed. This might be desired, for example, for demos and sandbox applications.

= WEB SERVICE QUERYING =

The way to communicate with any structWSF Web service is by using the HTTP protocol. Any Web service endpoint can be accessed via its URL, which may combine an overall service name (e.g., 'crud') with its specific implementation or focus (e.g., 'read'). A URL example is:

http://yoursite.org/ws/crud/read/

The standard text reference to this example Web service then becomes:

Crud: Read Web Service

Communication Methods
As we explained in the architecture section above, all Web service endpoints comply with a generally RESTful architecture. This means that all queries sent to Web service endpoints, and all answers from these endpoints, are performed using the HTTP protocol. All modern programming languages have an API that can be used to send HTTP queries to specific servers. In this tutorial, we will use a tool named Curl that can be installed on most operating systems (Linux, Windows, OS X, etc.).

There are two HTTP methods that can be used to communicate with a Web service endpoint: GET or POST. The method to use differs from one service to another. This method is specified in the API documentation of each Web service.

Below is a description of the HTTP headers that a data consumer should use to request a resultset from Web services that return resultsets. For example, the Crud: Read Web service will return a list of properties describing an instance record in a resultset. However, Crud: Create will only return an HTTP status (200) to say that the instance record has been successfully created, or another status if there has been an error. In any case, Crud: Create won't return any resultset in the body of its HTTP response.

GET Method
HTTP header sent by the data consumer:

GET /ws/crud/read/?uri=http://yoursite.org/drupal/conStruct/datasets/260/resource/test2&dataset=http://yoursite.org/wsf/datasets/260/&include_linksback=true HTTP/1.1
User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: yoursite.org
Accept: application/rdf+xml

The HTTP header sent back by the Web service endpoint:

HTTP/1.1 200 OK
Date: Wed, 20 May 2009 15:33:16 GMT
Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.4 with Suhosin-Patch
X-Powered-By: PHP/5.2.4-2ubuntu5.4
Content-Language: en
Content-Encoding: identity
Content-Type: application/rdf+xml; charset=utf-8
Content-Length: 236

Additionally the data consumer can specify different charsets, language and encoding in its request. However only the "utf-8" charset, the "en" language and the "identity" encoding are supported by the standard baseline structWSF Web services endpoints described in this tutorial.

The actual description of the record returned by the Crud: Read Web service endpoint appears in the body of the response sent back to the user.
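For illustration, a client could assemble the GET query shown above as follows. This is a sketch in Python that assumes the parameter names appearing in the example request; the base URL is a placeholder.

```python
from urllib.parse import urlencode

# Assemble the Crud: Read GET query. Parameter names ("uri", "dataset",
# "include_linksback") are taken from the example request above.
base = "http://yoursite.org/ws/crud/read/"
params = {
    "uri": "http://yoursite.org/drupal/conStruct/datasets/260/resource/test2",
    "dataset": "http://yoursite.org/wsf/datasets/260/",
    "include_linksback": "true",
}
# urlencode percent-encodes the URI values so they can travel in a query string.
url = base + "?" + urlencode(params)
print(url)
```

Sending this URL with an `Accept: application/rdf+xml` request header, as in the header example above, asks the endpoint for the RDF+XML serialization of the resultset.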

POST Method
HTTP header sent by the data consumer:

POST /ws/crud/create/ HTTP/1.1
User-Agent: curl/7.16.3 (powerpc-apple-darwin9.0) libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: yoursite.org
Accept: text/xml
Content-Length: 223
Content-Type: application/x-www-form-urlencoded

The HTTP header sent back by the Web service endpoint is:

HTTP/1.1 200 OK
Date: Wed, 20 May 2009 16:00:39 GMT
Server: Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.4 with Suhosin-Patch
X-Powered-By: PHP/5.2.4-2ubuntu5.4
Content-Language: en
Content-Encoding: identity
Content-Length: 0
Content-Type: text/xml; charset=utf-8

Additionally the data consumer can specify different charsets, language and encoding in its request. However only the "utf-8" charset, the "en" language and the "identity" encoding are supported by the standard baseline structWSF Web services endpoints described in this tutorial.

Note that the parameters to pass to the Web service endpoint are added in the body of the query.
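A sketch of how a client might prepare such a POST body in Python. The parameter names below are hypothetical placeholders for illustration, not the actual Crud: Create parameter list.

```python
from urllib.parse import urlencode

# For POST endpoints the parameters are urlencoded into the request body,
# sent with Content-Type: application/x-www-form-urlencoded.
# "dataset" and "document" are placeholder parameter names for this sketch.
body = urlencode({
    "dataset": "http://yoursite.org/wsf/datasets/260/",
    "document": "<rdf:RDF>...</rdf:RDF>",
})
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body)),  # length of the urlencoded body
    "Accept": "text/xml",
}
print(body)
print(headers["Content-Length"])
```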

For a complete description of HTTP/1.1 header fields, please consult the RFC 2616 document.

We will demonstrate how to send these queries to the Web service endpoints using Curl in the use case tutorials below.

= RESULTSET DATA FORMAT =

A resultset format has been developed to create and communicate the resultsets returned by any Web service endpoint (when a resultset is included in the body of an HTTP response). This format could be compared to a "pseudo" RDF format: it takes the basic constructs of an RDF triple, (1) subject, (2) predicate and (3) object, and serializes them in JSON or XML. We have a list of simple key elements that are used to create a resultset.

The goal of any Web service is to return results. The root element of any structWSF results document is the "resultset" element, where all results in a given results document are nested. It is used only for structural purposes.

Subjects, predicates and objects can be typed. A type is represented by a URI (see below), and URIs are normally resolvable on the Web (so, they are full URLs). Prefixes are short strings used to abbreviate the URI of the types of subjects, predicates and objects of a resultset. That way, resultset files are smaller to transmit, and simpler to read.

The "entity" attribute of a prefix element defines the prefix to use in the resultset.

The "uri" attribute of a prefix element defines the full URI referred by the prefix.

When a system consumes resultset data, before doing anything with the resultset, it has to resolve the prefixes to re-create the full URIs of the entities. In the example below, you will see types that will be defined as "foaf:Person". When a data consumer encounters such a type, it has to check if the prefix "foaf:" has been defined for this resultset. If it has, then it replaces the value of the type by "http://xmlns.com/foaf/0.1/Person".
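The prefix-resolution step just described can be sketched as follows. The prefix table below is hard-coded for illustration; in practice it would be built from the resultset's prefix elements.

```python
# Prefix table as it might be read from a resultset's prefix elements:
# "entity" attribute -> "uri" attribute.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

def expand(curie):
    """Expand a prefixed type such as 'foaf:Person' into its full URI.
    Values without a registered prefix (e.g. full URLs) are returned unchanged."""
    prefix, _, local = curie.partition(":")
    if local and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie

print(expand("foaf:Person"))  # http://xmlns.com/foaf/0.1/Person
```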

Any query to a Web service refers to a "subject" (consistent with the understanding of subject within the standard subject-predicate-object RDF triple). A subject can also refer to a record. This subject can be a resource referred to by its ID, one or multiple keywords of a search string, a text of one thousand words to be analyzed, or other options. The subject is a record that refers to something.

A resultset is composed of one or multiple subject(s) depending on the Web service. This means that the subject element represents the subject of a query to a Web service endpoint.

A subject has a type and a URI. The type of a subject can be seen as its kind. The URI of a subject is its unique identifier (identifier of a record).

Any Web service takes the subject of a query and processes it according to the procedures set by the other input parameters to the query. The result of this processing is to relate a subject to other things (objects; see below) using different predicates (a predicate defines the relationship between those two things; you can also think of it as a verb). Any subject has zero, one or multiple predicate relationships with other objects. The predicates are what define the subject, and thus what define the record.

Every predicate has a type. The type of a predicate can be seen as the kind of relationship between two things (a subject and an object).

Any predicate refers to one or multiple objects. An object is a thing, like a subject, with a type and an optional URI. An object could be the subject of another query.

Subject and object are the same in that both refer to a "thing" (or collections of things, in which case they are a class; you can also think of them as nouns). The only difference between the two elements is the way in which they are referenced by a predicate, as either a subject or an object.

An object has a type and a URI. The type of an object can be seen as its kind. The URI of an object is its unique identifier. It is optional if the object reference is a literal, such as a string name or a number.

A special kind of object exists: rdfs:Literal. The characteristics of this kind of object will be discussed in a special section below.

Sometimes it is useful to be able to assert facts about a given triple statement.

The example below means: we have a subject that is a bibo:Document. This document has a predicate relationship umbel:isAbout with a thing that is itself a umbel:SubjectConcept, referred to as http://.../War. Basically, this triple relationship means: "I have a document that is about War".

However, we can also assert a ratio that shows the confidence level in asserting that statement. By using the umbel:withLikelihood reification property, we can assign a confidence level regarding the "fact" (assertion) of the initial triple statement, as follows: (the initial triple statement, umbel:withLikelihood, "0.873").

This reification gets expressed in the XML data structure as:

So, basically, the reify element helps us to assert a fact about another fact (triple statement).

Data consumers should thus parse the XML document in the following way:

If a reify element appears within the body of an element, the data consumer must check the three parent nodes of the reify element to compose the asserted fact about the triple's three nodes.

Unique Identifiers: URIs
Nearly all resources, whether subject, predicate or object, have a unique identifier called a URI. (Subjects and predicates must have a URI; objects most frequently do, but sometimes may optionally be assigned a literal value.)

These URIs are unique to each resource. Since these IDs are unique, if a Web service A refers to a resource X and another Web service B also refers to the resource X, then both Web services A and B refer to the same thing. This understanding must hold true so that atomic Web services can easily interact to create compound Web services.

However, sometimes the subjects or the objects of a resultset may not have a defined URI (the uri attribute). If such a case happens, the consumer of this Web service data must itself define a unique identifier for that thing.
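One possible strategy for doing so (an assumption on our part, not something structWSF prescribes) is for the consumer to mint a deterministic identifier from the thing's description, so that repeated processing of the same resultset yields the same identifier.

```python
import hashlib

# Hypothetical helper: mint a stable URI for a subject or object that arrives
# without a uri attribute. The base namespace belongs to the consumer, not
# to structWSF, and is a placeholder here.
def mint_uri(description, base="http://consumer.example.org/id/"):
    digest = hashlib.sha1(description.encode("utf-8")).hexdigest()[:16]
    return base + digest

uri_a = mint_uri("type=foaf:Person;name=Bob")
uri_b = mint_uri("type=foaf:Person;name=Bob")
print(uri_a == uri_b)  # True: same description, same identifier
```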

Literals
A literal is a special kind of object. Unlike any other object, a literal object cannot be the subject of a predicate. (Technically, a resource could describe a literal, but the literal itself can't be described; this fact is out of the scope of this document.) An example of a resource that describes a literal is the bibo:Document use case described above.

A literal object also does not have a uri attribute.

XML Resultset Example
Resultsets can be serialized in different formats. XML and JSON are the two serialization formats used to serialize the resultset structure described above. Here is a complete resultset example that uses all elements described above. In this example, the resultset is serialized using XML. We will explain how to get a resultset with different serialization formats in another of our use case examples below.

JSON Resultset Example
Here is exactly the same resultset example as the one above, except that it has been serialized in JSON.

The major difference between the two serializations comes from their serialization rules. While the data in the two examples are identical, they are represented differently in the JSON serialization than in the XML resultset. Here is a list of differences:


 * 1) In JSON, you can't define the same key multiple times for the same object. This means that we can't have multiple "subject" elements. Because of this JSON particularity, we have to introduce a list of subjects. In this list, each object (introduced by curly brackets) represents a different subject.
 * 2) As you can notice, there is no "object" element. This is because objects are implicit: in JSON, an object is introduced by curly brackets, so the object of a predicate is introduced by the curly brackets.
 * 3) Also, as you can notice, there is no mention of the special type "rdfs:Literal". This is also implicitly introduced by the JSON syntax: if the value of a predicate key is not an object (curly brackets), it is a literal (of type rdfs:Literal).
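To illustrate the three rules, here is a hedged sketch of a consumer applying them to a hypothetical JSON resultset. The exact key names of a real structWSF JSON resultset may differ; this only demonstrates the rules listed above.

```python
import json

# Hypothetical JSON resultset: a list of subjects (rule 1); object references
# appear as nested curly-bracket objects (rule 2); literals are plain values
# with no explicit rdfs:Literal marker (rule 3). Key names are illustrative.
doc = json.loads("""
{
  "subject": [
    {
      "uri": "http://example.org/doc1",
      "type": "bibo:Document",
      "dcterms:title": "A document about War",
      "umbel:isAbout": { "uri": "http://example.org/War", "type": "umbel:SubjectConcept" }
    }
  ]
}
""")

for subj in doc["subject"]:
    for key, value in subj.items():
        if key in ("uri", "type"):
            continue  # subject metadata, not predicates
        if isinstance(value, dict):
            # Rule 2: curly brackets introduce an object reference.
            print(key, "->", value["uri"])
        else:
            # Rule 3: a plain value is a literal (rdfs:Literal).
            print(key, "->", repr(value))
```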

= GENERAL USE CASES =

The best way to understand the interaction between all existing structWSF Web service endpoints is to learn how to accomplish certain specific data management tasks. This section shows how to accomplish certain data maintenance tasks such as importing complete datasets, exporting complete datasets and searching/browsing datasets to update or delete specific records.

For this tutorial, we assume that the reader has access to a running structWSF instance with the proper credentials and permissions.

Accessing Data
As discussed in previous sections, each query to a Web service endpoint of a structWSF instance is authorized based on the IP address of the requester. Once the requester's IP is authenticated, we check to see if she has the (CRUD) permissions to perform the requested action (Web service query) on the dataset.

In this section, we will see how a structWSF node administrator can create new accounts on an instance, and how a structWSF user can check which datasets he has access to, and with what CRUD permissions.

When we install a brand new structWSF instance (see the installation guide), the only IP address that has full access to the instance is localhost. This means that only queries sent from the server that hosts the structWSF instance can set up new user accounts on the instance. Once the structWSF instance is set up, the system administrator can then start creating new user accounts by using the AuthRegistrar: Access Web service endpoint.

In this section, we will see how to create and manage user accounts by using the AuthRegistrar: Access and AuthLister Web service endpoints.

Managing User Accounts using AuthRegistrar: Access
The Auth Registrar: Access Web service is used to register (create, update and delete) an access for a given IP address to a specific dataset and to a set of Web service endpoints registered to the WSF (Web Services Framework), with given CRUD (Create, Read, Update and Delete) permissions.

To be able to use that Web service, the requester has to have full CRUD access to the http://[...]/wsf/ dataset of the target structWSF instance. This is the dataset where all the Web service endpoints of the network, all the datasets and all the user access permissions are defined. By convention, with a default installation, only localhost has these permissions on that dataset.

Let's assume that you can send queries to a structWSF instance from a registered account that has full CRUD permissions on the http://[...]/wsf/ dataset. The first step is to register a new account (IP address) for a given dataset, so that its owner can use a series of Web services to query and manage that dataset.

In this use case, we want to:


 * 1) Register the IP address 192.168.0.1
 * 2) So that it has access to the dataset http://localhost/wsf/datasets/1/
 * 3) So that it can use the Web service endpoints Search, Browse and CrudRead on it
 * 4) With CRUD permissions: False, True, False, False (Create, Read, Update, Delete)

The Curl query we have to send to the AuthRegistrar: Access Web service endpoint is:

curl -H "Accept: application/json" "http://localhost/ws/auth/registrar/access/" -d "registered_ip=192.168.0.1&crud=False;True;False;False&ws_uris=http://localhost/wsf/ws/search/;http://localhost/wsf/ws/browse/;http://localhost/wsf/ws/crud/read/&dataset=http://localhost/wsf/datasets/1/&action=create" -v

'' Note: In these use case examples, we will always use the IP address 192.168.0.1 and the structWSF instance domain name localhost. However, you should change these values to your IP address and the domain name (or IP address) of the server where you are running your actual structWSF instance.''
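The same query can also be built from a script. Here is a minimal sketch using Python's standard library, assuming the endpoint URL and parameters shown in the curl query above; note that urlencode percent-encodes the ';' separators, which standard form decoding restores on the server side. The urlopen() call that would actually send the request is left out, since it requires a live instance.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Parameters mirror the curl query above; swap in your own IP address,
# domain name, dataset URI and Web service URIs.
params = {
    "registered_ip": "192.168.0.1",
    "crud": "False;True;False;False",  # Create;Read;Update;Delete
    "ws_uris": "http://localhost/wsf/ws/search/;"
               "http://localhost/wsf/ws/browse/;"
               "http://localhost/wsf/ws/crud/read/",
    "dataset": "http://localhost/wsf/datasets/1/",
    "action": "create",
}

body = urlencode(params)  # percent-encodes ';', ':' and '/' in the values
request = Request(
    "http://localhost/ws/auth/registrar/access/",
    data=body.encode("utf-8"),  # providing data makes this a POST request
    headers={"Accept": "application/json"},
)
# urllib.request.urlopen(request) would send it to a live structWSF instance
print(request.get_method(), request.full_url)
```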

The AuthRegistrar: Access Web service will return this HTTP header if the query has been successfully performed (or an error otherwise):

HTTP/1.1 200 OK
Date: Mon, 15 Feb 2010 16:21:56 GMT
Server: Apache/2.2.12 (Ubuntu)
X-Powered-By: PHP/5.2.10-2ubuntu6.3
Content-Language: en
Content-Encoding: identity
Content-Length: 0
Content-Type: application/json; charset=utf-8

It returned a "200 OK" message. This means that the query has been successfully performed by the Web service. However, the HTTP body of this answer is empty. This is normal, since users should never expect anything in the body of an AuthRegistrar: Access response, except if an error occurred. This means that users of this Web service endpoint have to rely on the returned HTTP header to know if the query has been successfully performed by the service (200).

If the query returns a 200 response, any query from the IP address 192.168.0.1 to the dataset http://localhost/wsf/datasets/1/ can be performed using the Search, Browse and CrudRead Web service endpoints.

Now, let's try to find that access in the system. For this purpose, we will use the AuthLister Web service.

curl -H "Accept: application/json" "http://localhost/ws/auth/lister/?mode=access_user&registered_ip=192.168.0.1"

'' Note: The difference between this Curl query and the previous one is that it uses the GET method instead of the POST method. Check the documentation of each Web service endpoint to know how to communicate with each of them.''

Note: In many of the examples that follow there may be a header above the code example that shows either XML, JSON or irJSON. Clicking on one of these live links will cause the code example to change to the selected serialization.

The Web service returned this as the response to the query:

XML Example
The vocabulary used in this resultset is as described in this tutorial. As you can see, this Web service outputs all the accesses defined for this IP address. Each access is defined as a set of CRUD permissions, for a set of Web service endpoints, to a dataset. The granularity of this access system is such that you can define multiple accesses for the same IP address to the same dataset. The only thing that will change is the CRUD permissions for a given set of Web service endpoints.

Now, let's change the CRUD permissions of the access we just created. So, let's assume we had initially made a mistake; now we want full CRUD on that dataset, for this IP address, for the Browse, Search, CrudRead and CrudUpdate Web services. Let's re-use the AuthRegistrar: Access Web service to make this change to this access.

The AuthLister Web service gave us the URI (identifier) of the access we created previously. Now, we have to use this URI for our next update query to the AuthRegistrar: Access Web service.

Now, let's re-query the AuthLister Web service with exactly the same Curl query to make sure the changes have been made to the access:

XML Example
Finally, let's remove the dummy access we just created by using this Curl query to the AuthRegistrar: Access Web service:

curl -H "Accept: application/json" "http://localhost/ws/auth/registrar/access/" -d "registered_ip=192.168.0.1&action=delete_target&dataset=http://localhost/wsf/datasets/1/"

Now that we know how access and permissions work on a structWSF instance, we will check how to import and export data on a node, and how to manage the data we imported, or the data we have access to.

Importing Data in a Dataset
The main purpose of a structWSF instance is to manage structured data. However, before being able to manage any data, we have to import sources of data into the instance. There are many ways to import data in a structWSF instance. Depending on the size of the data source to import, one method may be better than another. In this tutorial we will focus on importing entire datasets using the available Web services endpoints only.

The general workflow to import a data source into a structWSF dataset is as follows:


 * 1) Create a new dataset on the target structWSF instance
 * 2) (If needed) convert the initial data source into one of the supported formats: RDF/XML or RDF/N3 directly, or irJSON, BibTeX or CSV/TSV by using one of the Converter Web services
 * 3) Import slices of records (50 to 100 per query, depending on the size of the records) using the CrudCreate Web service

As you can notice in this workflow, the general idea is to convert, then import slices of the dataset. Each slice is composed of one or multiple record descriptions. A script could easily be created to iterate over a dataset and create these slices of data to import into the node using the CrudCreate Web service. Now, let's take a look at how to perform each of these steps.
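The slicing step of such a script could be sketched like this; the record list is a placeholder, and the actual CrudCreate call is only indicated in a comment, since it requires a live instance:

```python
# Stand-ins for RDF/XML record descriptions; a real script would read these
# from the converted data source.
records = ["record-%d" % i for i in range(120)]

def slices(items, size=50):
    """Yield successive slices of at most `size` record descriptions."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

for number, record_slice in enumerate(slices(records)):
    # A real script would serialize the slice as RDF/XML and POST it to the
    # CrudCreate endpoint with the URI of the target dataset.
    print("slice", number, "holds", len(record_slice), "records")
```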

For all the remaining use cases, we will use this data source which is composed of the description of three records serialized using irJSON (see the irON specification).

Creating a New Dataset
The first step is to create the dataset in the structWSF instance. The process of creating a dataset is like creating a folder on your local desktop. You create a container where you will put record descriptions in the future. You also assign it a title, a description, and some other meta-information. The creation of a new dataset is performed using the DatasetCreate Web service.

The HTTP response will be "200 OK" if the query has been successful.

Now, let's check that the dataset has been properly created by using the DatasetRead Web service to look up the dataset URI we created, "http://localhost/wsf/datasets/1/":

curl -H "Accept: application/json" "http://localhost/ws/dataset/read/?uri=http://localhost/wsf/datasets/1/"

The DatasetRead Web service will return the description of the dataset in the body of the HTTP response:

XML Example
The last step before importing any data into that dataset is to create an access to that dataset exactly like what we did in the section Accessing Data above. The only thing we will need to change is to add some more Web services that we will need for the examples below, and to specify the URI of the dataset we just created above. Also remember that you need to have full CRUD access on the http://[...]/wsf/ dataset in order to be able to create dataset accesses on a structWSF instance.

Converting irJSON Data into RDF/XML
Since the CrudCreate Web service only takes RDF/XML or RDF/N3 data as input, we have to convert the irJSON data into RDF/XML using the Converter: irJSON Web service. This Web service takes irJSON data as input and outputs different data formats; RDF/XML is one of the options.

curl -H "Accept: application/rdf+xml" "http://localhost/ws/converter/irjson/" -d "docmime=application%2Firon%2Bjson&document=%7B%22dataset%22%3A%7B%22linkage%22%3A%5B%7B%22linkedType%22%3A%22application%2Frdf%2Bxml%22%2C%22attributeList%22%3A%7B%22advisor%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23advisor%22%7D%2C%22thesis%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23thesis%22%7D%2C%22mgpWebPage%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23mgpWebPage%22%7D%2C%22degree%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23degree%22%7D%2C%22institution%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23institution%22%7D%2C%22date%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Fdate%22%7D%2C%22title%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fdc%2Fterms%2Ftitle%22%7D%2C%22student%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23student%22%7D%2C%22name%22%3A%7B%22maptTo%22%3A%22http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2Fname%22%7D%7D%2C%22typeList%22%3A%7B%22Person%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2FPerson%22%7D%2C%22Thesis%22%3A%7B%22mapTo%22%3A%22http%3A%2F%2Fpurl.org%2Fontology%2Fbibo%23Thesis%22%7D%7D%7D%2C%22http%3A%2F%2Flocalhost%2Fdrupal%2Fbibjson%2Firon_linkage.json%22%5D%2C%22id%22%3A%22http%3A%2F%2Flocalhost%2Fwsf%2Fdatasets%2F119%2Fresource%2F%22%7D%2C%22recordList%22%3A%5B%7B%22id%22%3A%22101%22%2C%22type%22%3A%22Person%22%2C%22name%22%3A%22JamesDouglasWatson%22%2C%22advisor%22%3A%7B%22ref%22%3A%22%40268%22%2C%22name%22%3A%22ClairGeorgeMaple%22%7D%2C%22thesis%22%3A%7B%22ref%22%3A%22%40thesis%2F101%22%7D%2C%22mgpWebPage%22%3A%22http%3A%2F%2Fwww.genealogy.ams.org%2Fid.php%3Fid%3D101%22%7D%2C%7B%22id%22%3A%22268%22%2C%22type%22%3A%22Person%22%2C%22prefLabel%22%3A%22ClairGeorgeMaple%22%2C%22name%22%3A%22ClairGeorgeMaple%22%2C%22student%22%3A%5B%7B%22ref%22%3A%22%40101%22%2C%22name%22%3A%22JamesWatson%22%7D%5D%2C%22thesis%22%3A%7B%22ref%22%3A%22%40thesis%2F268%22%7D%2C%22mgpWebPage%22%3A%22http%3A%2F%2Fwww.genealogy.ams.org%2Fid.php%3Fid%3D268%22%7D%2C%7B%22id%22%3A%22thesis%2F268%22%2C%22type%22%3A%22Thesis%22%2C%22degree%22%3A%22D.Sc.%22%2C%22institution%22%3A%22CarnegieMellonUniversity%22%2C%22date%22%3A%221948%22%2C%22title%22%3A%22TheDirichletProblem%3ABoundsataPointfortheSolutionandItsDerivative%22%7D%5D%7D"

The irJSON document, converted into RDF/XML, is returned in the body of the HTTP response from the structWSF instance:
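As an aside, the long percent-encoded document parameter in queries like the one above can be generated programmatically rather than by hand. Here is a minimal sketch using Python's standard library; the short irJSON fragment is invented for illustration (the real query carries the full three-record data source).

```python
from urllib.parse import quote, unquote

# A short irJSON fragment, invented for illustration; the real query above
# carries the full data source in the same "document" parameter.
document = '{"recordList":[{"id":"101","type":"Person","name":"James Watson"}]}'

# quote() with no safe characters percent-encodes everything the endpoint
# expects encoded, e.g. '{' -> %7B, '"' -> %22 and '/' -> %2F.
encoded = quote(document, safe="")
body = "docmime=" + quote("application/iron+json", safe="") + "&document=" + encoded
print(body[:60] + "...")
```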

Importing Records
Once we have the RDF/XML representation to import into the node, we can take that converted document and import it into the system by feeding it to the CrudCreate Web service endpoint:

To create this query, we used the URI of the dataset we created above to tell the CrudCreate Web service where to import all the records. We can include one or multiple records in the document we give as input to the CrudCreate Web service. It is for that reason that we can split a big data source into multiple slices that we can then import in a structWSF instance using the CrudCreate Web service.

Finally, you can easily notice that we can create somewhat complex workflows by taking what a given Web service endpoint outputs, and by using it as the input of another Web service. Here is the final workflow that we created to import a data source into a structWSF instance:



Note that we would have to perform steps 2 and 3 multiple times if we have to split the data source into multiple slices.

Exporting a Dataset
The next logical step is to try to export the dataset we just imported. That way, we will make sure that we indeed correctly imported the dataset in the first place. The same steps can be performed to export any dataset hosted on any structWSF instance to which you have access.

There are multiple ways to export data from a structWSF node. In this section, we will use a simple one where you don't need any knowledge beyond querying the Web service endpoints, just like what we have done so far. There are other ways, such as using the SPARQL Web service endpoint, but you would need some SPARQL knowledge to be able to use that service to export data. So, to export data slices from a dataset, you need two Web services: Browse and CrudRead.

The Browse Web service returns slices of records. The full description of the records is returned, but not the reification statements. Also, the Browse Web service is limited in the number of content types it can return. That is why we will also use the CrudRead Web service to demonstrate this use case: we will use the Browse Web service to get slices of a dataset, and then the CrudRead Web service to get the full description of the records, in the format we specify.

What is also interesting with the Browse Web service is that you can use it to export filtered slices of records. You can leverage the attributes, types and datasets parameters to filter a dataset with some attribute, type and dataset filtering criteria. This way, you could export only people data, or only records that have a street address, etc.

Now, let's use the Browse Web service to get the IDs of the records we previously imported into the structWSF instance. It is quite simple since we only have 3 records in the dataset. However, the strategy is exactly the same as the one we used to import big datasets: we have to do the export slice by slice by leveraging the items and page parameters of the Browse Web service.
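The slice-by-slice loop could be sketched as follows. As an illustration only, this assumes that items is the slice size and page the slice index, and it merely builds the query URLs rather than sending them; a real script would keep requesting slices until one comes back empty.

```python
from urllib.parse import urlencode

def browse_queries(dataset, items=10, max_slices=3):
    """Build one Browse query URL per slice of the dataset."""
    for page in range(max_slices):
        yield "http://localhost/ws/browse/?" + urlencode({
            "datasets": dataset,
            "items": items,       # assumed: number of records per slice
            "page": page,         # assumed: index of the slice to return
            "include_aggregates": "true",
        })

for url in browse_queries("http://localhost/wsf/datasets/1/"):
    print(url)
```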

curl -H "Accept: application/json" "http://localhost/ws/browse/" -d "datasets=http://localhost/wsf/datasets/1/&items=10&page=0&include_aggregates=true" -v

XML Example
What we want from this resultset are the IDs of the records. From these, we will create a list of IDs to use to query the CrudRead Web service endpoint:

XML Example
As you can notice with these resultsets, the same information can be formatted in multiple different formats/serializations. The irJSON here is the same as what we initially used, except that it has been automatically generated by the system. Only some serialization shortcuts have been dropped, but all the information is there, ready to be parsed with an irJSON parser.

Now, let's take a look at the dataset export workflow:



The dataset slices come from the Browse Web service. Then, the CrudRead Web service gets all the information (description + reification statements + linkbacks) about each of these records. Finally, we can export data in JSON or XML directly from the CrudRead Web service, or use the Converter: irJSON Web service to convert the CrudRead resultset into irJSON. Such a workflow can easily be implemented in a single script that performs these steps to export full datasets from structWSF nodes, in any supported format.

Searching a Dataset
There are basically three ways to search records in a dataset: by using the Browse, Search or SPARQL Web services. We saw how the Browse Web service works above. Now we will perform full-text searches (searches with keywords) by using the Search Web service. We won't discuss the SPARQL Web service in this tutorial because of the SPARQL knowledge that is needed.

The Search Web service lets you run keyword searches on the datasets you imported. Additionally, you can filter these results with some attribute, type or dataset criteria. This is a really flexible service that can be used much like the Browse Web service.

Here is a simple search query for the keyword "George" that we will run against the dataset we created in this tutorial:

curl -H "Accept: application/json" "http://localhost/ws/search/" -d "query=George&datasets=http://localhost/wsf/datasets/1/" -v

XML Example
As you can notice, two records have been returned by the service: Clair George Maple, the person, and the thesis he wrote. As in the export use case above, we could use the CrudRead Web service endpoint to get more information about these records (linkbacks, records in a different format, etc.).

Updating Records in a Dataset
But what happens if you made a mistake in one of the record descriptions you previously imported into a dataset? The only thing you have to do is to correct the description of that record, and to use the new description as input to the CrudUpdate Web service endpoint. With this service, you can update one or multiple records at the same time. Like with the CrudCreate Web service, you will use a valid input file that describes all the records to import/update in a dataset.

The CrudUpdate Web service is like the CrudCreate Web service, but it does some more processing to update record descriptions according to the input you provide. This service won't simply remove the existing record and replace it with the new one.

Let's take a look at a real-world example. Let's say that James Douglas Watson contacted us to mention that we shouldn't refer to him by his middle name "Douglas". Since we want to make sure we have the latest version of this record from the structWSF node, we will first use the CrudRead Web service to get the complete description of that record.
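The read-modify-write cycle described here can be sketched as follows. The trimmed-down record and the plain string replacement are for illustration only; a real script would fetch the full RDF/XML from CrudRead and modify it with a proper RDF parser.

```python
# A trimmed-down stand-in for the record fetched from CrudRead; the real
# description carries more attributes and namespaces.
record_rdf = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:foaf="http://xmlns.com/foaf/0.1/">'
    '<foaf:Person rdf:about="http://localhost/wsf/datasets/1/resource/101">'
    '<foaf:name>James Douglas Watson</foaf:name>'
    '</foaf:Person></rdf:RDF>'
)

# Drop the middle name; the updated document would then be sent as the
# "document" parameter of a CrudUpdate query.
updated_rdf = record_rdf.replace("James Douglas Watson", "James Watson")
print(updated_rdf)
```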

Now that we removed James' middle name, let's use this new record document and use it as input to the CrudUpdate Web service endpoint:

curl -H "Accept: application/rdf+xml" "http://localhost/ws/crud/update/" -d "dataset=http://localhost/wsf/datasets/1/&mime=application%2Frdf%2Bxml&document=%3C%3Fxml+version%3D%221.0%22%3F%3E%3Crdf%3ARDF++xmlns%3Aowl%3D%22http%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%22xmlns%3Ardf%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22xmlns%3Ardfs%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%22xmlns%3Awsf%3D%22http%3A%2F%2Fpurl.org%2Fontology%2Fwsf%23%22xmlns%3Ans0%3D%22http%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%22xmlns%3Ans1%3D%22http%3A%2F%2Fpurl.org%2Fontology%2Fbkn%23%22xmlns%3Ans2%3D%22http%3A%2F%2Flocalhost%2Fwsf%2Fontology%2Fproperties%2F%22%3E%3Cns0%3APerson+rdf%3Aabout%3D%22http%3A%2F%2Flocalhost%2Fwsf%2Fdatasets%2F119%2Fresource%2F101%22%3E%3Cns1%3Aadvisor+rdf%3Aresource%3D%22http%3A%2F%2Flocalhost%2Fwsf%2Fdatasets%2F119%2Fresource%2F268%22+%2F%3E%3Cns1%3Athesis+rdf%3Aresource%3D%22http%3A%2F%2Flocalhost%2Fwsf%2Fdatasets%2F119%2Fresource%2Fthesis%2F101%22+%2F%3E%3Cns1%3AmgpWebPage%3Ehttp%3A%2F%2Fwww.genealogy.ams.org%2Fid.php%3Fid%3D101%3C%2Fns1%3AmgpWebPage%3E%3Cns2%3Aname%3EJames+Watson%3C%2Fns2%3Aname%3E%3C%2Fns0%3APerson%3E%3C%2Frdf%3ARDF%3E"

If this command is successful, you will receive a "200 OK" answer from the endpoint. If we re-run the CrudRead command above, we can see that the modification has been made in the system:

= FINAL WORD =

This tutorial demonstrated the usage of structWSF Web service endpoints with a few key use cases. These examples show the many possibilities designed into these Web services. We also saw some general workflow patterns of Web service interaction. You should be able to take these workflows and modify them to create new ones that implement your specific needs in your own software and services.

= REFERENCES =