Archive 2.x:Search/2

The Search Web service is used to perform full text searches on the structured data indexed on an OSF Web Service instance. A search query can be as simple as querying the data store for a single keyword, or as elaborate as a query that combines a series of complex filters. Each search query can be applied to all, or a subset of, the datasets accessible to the requester. All full text queries comply with the Lucene querying syntax.

Each Search query can be filtered by these different filtering criteria:


 * 1) Type of the record(s) being requested
 * 2) Dataset where the record(s) got indexed
 * 3) Presence of an attribute describing the record(s)
 * 4) A specific value, for a specific attribute describing the record(s)
 * 5) A distance from a lat/long coordinate (for geo-enabled OSF Web Service instances)
 * 6) A range of lat/long coordinates (for geo-enabled OSF Web Service instances)

Developers communicate with the Search Web service using the HTTP POST method. You may request one of the following mime types: (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json. The content returned by the Web service is serialized using the mime type requested and the data returned depends on the parameters selected.
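As a minimal sketch of the exchange described above, the following Python builds (without sending) a POST request that carries an "Accept:" header. The host name is hypothetical because the real one is elided on this page, and sending the parameters as a form-encoded body is an assumption, not something this page states.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the real host is elided ("[...]") in this documentation.
ENDPOINT = "http://example.org/ws/search/"


def build_search_request(query, accept="application/json", **params):
    """Build (but do not send) a POST request for the Search endpoint."""
    fields = {"query": query, **params}
    # urlencode() URL-encodes every parameter value.
    # Assumption: parameters travel in a form-encoded request body.
    body = urlencode(fields).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Accept": accept,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_search_request("rdf", accept="text/xml", items=10, page=0)
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
print(request.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out so the sketch runs without a live endpoint.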

Version
This documentation page covers version 2 of this endpoint. Check the top of this page for the documentation pages of the other versions of this endpoint.

Usage
This Web service is intended to be used to perform full text searches, and filtered searches, on all the datasets hosted on an OSF Web Service instance.

Web Service Endpoint Information
This section describes all the permissions you need in the WSF (Web Service Framework) to send a query to this Web service endpoint, and it describes how to access it.

To access this Web service endpoint you need the proper CRUD (Create, Read, Update and Delete) permissions on a specific graph (dataset) of the WSF. Without the proper permissions on this graph you won't be able to send any queries to the endpoint.

Needed registered CRUD permission:


 * Create: False
 * Read: True
 * Update: False
 * Delete: False

On the graph URI(s):


 * URIs of the datasets to be queried

Here is the information needed to communicate with this Web service's endpoint. Descriptions of the parameters are included below.

Note: if a parameter has a default value, the requester can omit it and the default value will be used. Also, some baseline Web services may not offer other values than the default.
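The note above can be sketched as a small helper that removes parameters whose value equals the documented default, so that only meaningful parameters are sent. The defaults are the ones listed in the parameter descriptions on this page; the helper name and the string comparison are illustrative assumptions.

```python
# Defaults as listed in the parameter descriptions on this page.
DOCUMENTED_DEFAULTS = {
    "types": "all",
    "datasets": "all",
    "attributes": "all",
    "attributes_boolean_operator": "and",
    "items": "10",
    "page": "0",
    "inference": "on",
    "include_aggregates": "false",
    "aggregate_attributes_object_type": "literal",
    "aggregate_attributes_object_nb": "10",
    "interface": "default",
    "lang": "en",
}


def drop_defaults(params):
    """Return a copy of params without entries equal to a documented default."""
    return {
        name: value
        for name, value in params.items()
        if DOCUMENTED_DEFAULTS.get(name) != str(value)
    }


compact = drop_defaults({"query": "rdf", "items": 10, "page": 20, "lang": "en"})
print(compact)  # {'query': 'rdf', 'page': 20}
```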

HTTP method:


 * POST

Possible "Accept:" HTTP header field value:


 * text/xml (structXML)
 * application/json (structJSON)
 * application/rdf+xml (RDF+XML)
 * application/rdf+n3 (N3/Turtle)
 * application/iron+json (irJSON)
 * application/iron+csv (commON)

URI:


 * http://[...]/ws/search/?query=param1&types=param2&datasets=param3&attributes=param4&attributes_boolean_operator=param5&include_attributes_list=param6&items=param7&page=param8&inference=param9&include_aggregates=param10&aggregate_attributes=param11&aggregate_attributes_object_type=param12&aggregate_attributes_object_nb=param13&distance_filter=param14&range_filter=param15&registered_ip=param16&interface=param17&lang=param18&sort=param19&results_location_aggregator=param20&extended_filters=param21&types_boost=param22&datasets_boost=param23&attributes_boost=param24&spellcheck=param25

URI dynamic parameters description:

Note: All parameters have to be URL-encoded


 * param1. Full text query. This query should comply with the Lucene Querying Syntax.
 * param2. (default: all). List of types of the records to be searched. Each type is separated by the ";" character. An example of such a list is "type-a;type-b;type-c", meaning: search for all the records with these types.
 * param3. (default: all). List of dataset URIs to be searched. Each dataset URI is separated by the ";" character.
 * param4. (default: all). List of filtering attributes (properties), given as encoded URIs separated by ";". Additionally, each URI can end with an un-encoded double colon "::". What follows the double colon is a value restriction applied as a filter to this attribute, to perform attribute/value filtered searches. The Lucene query syntax can be used for that filtering value. The value also has to be encoded. An example of this "attributes" parameter is: "http%3A%2F%2Fsome-attribute-uri::some%2Bfiltering%2Bvalue". There is a special markup used with the prefLabel core attribute when attribute/value filtering is used in this parameter: the double stars "**", which introduce an auto-completion behavior. This tells the search endpoint that the requester is performing an auto-completion task, so the endpoint ensures that auto-completion can be performed on more than one word, including spaces. If the target attribute is defined in the ontology with a date datatype in its range, then date queries can be used in this filter. If a single date is specified, then all the records from that date until now are returned by the query. If a range of dates is specified, then all the records between these two dates are returned. A range of dates has to be enclosed in double brackets, and the separator of the two dates has to be " to " (a space, the word "to" and another space). A date can be written as almost any English textual datetime description. If the target attribute is defined in the ontology with a numeric datatype in its range, then numeric queries can be used in this filter. If a single number is specified, then all the records with that attribute/value are returned. If a range of numbers is specified, then all the records between these two numbers are returned. A range of numbers has to be enclosed in double brackets, and the separator of the two numbers has to be " to " (a space, the word "to" and another space). When a range is defined for an attribute/value filter, the star character "*" can be used to denote "any" (any number, any date, etc.).
 * param5. (default: and). Tells the endpoint which boolean operator to use ("or" or "and") when doing attribute/value filtering. This parameter affects all the attribute/value filters. One of:
 * "or": Use the OR boolean operator between all attribute/value filters. This means that if the user filters with 3 attributes, then the returned records will be described using at least one of these three.
 * "and": Use the AND boolean operator between all attribute/value filters. This means that if the user filters with 3 attributes, then the returned records will be described using all three.
 * param6. (optional). A list of attribute URIs to include in the resultset. Sometimes you may be dealing with datasets where the descriptions of the entities are composed of thousands of attributes/values. Since the Search web service endpoint returns the complete entity descriptions in its resultsets, this parameter enables you to restrict the attributes/values included in the resultset, which considerably reduces the size of the resultset to transmit and manipulate. Multiple attribute URIs can be added to this parameter by separating them with ";". If "none" is specified for this parameter, only the "uri" and the "type" of the results are returned. If one or more property URIs are specified for this parameter, then these properties, the "uri", the dataset provenance and the "type" are returned for this search query.
 * param7. (default: 10). The number of items to return in a single resultset.
 * param8. (default: 0). The offset of the resultset to return. For example, to get items 90 to 100, this parameter should be set to 90.
 * param9. (default: on). One of:
 * "on": Inference is enabled
 * "off": Inference is disabled
 * param10. (default: false). One of:
 * "true": Aggregation data included in the resultset
 * "false": Aggregation data not included in the resultset
 * param11. Specifies the set of attribute URIs for which aggregated values are wanted. The URIs should be URL-encoded. Each attribute for which aggregated values are wanted should be separated by a semicolon ";". This is used to get a list of values, and their counts, for a given attribute.
 * param12. (default: literal). Determines what kind of object value the search endpoint returns as aggregate values for the list of attributes for which you want their possible values. This list of attributes is determined by the aggregate_attributes parameter. One of:
 * "literal": The aggregated value returned by the endpoint is a literal. If the value is a URI (a reference to some record), then the literal value will be the preferred label of that referred record.
 * "uri": If the value of the attribute(s) is a URI (a reference to some record) then that URI will be returned as the aggregated value.
 * "uriliteral": If the value of the attribute(s) is a URI (a reference to some record) then that URI and its preferred label will be returned as the aggregated value.
 * param13. (default: 10). Determines the number of values to aggregate for each attribute listed in aggregate_attributes for this query. If the value is , then it means that all possible values for the target aggregate_attributes have to be returned.
 * param14. The distance filter is a series of parameters used to filter the records of the dataset according to their distance from a given lat/long point. The values are separated by a semicolon ";". The format is as follows:  . The distanceType can have two values   or  :   means that the distance specified is in kilometers and   means that the distance specified is in miles. An example is: , which means getting all the results that are at most 5 kilometers from the lat/long position.
 * param15. The range filter is a series of parameters used to filter the records of the dataset according to the rectangular bounds they are located in, given their lat/long position. The values are separated by a semicolon ";". The format is as follows:  . Returned results will be contained within that region.
 * param16. Target IP address registered in the WSF.
 * param17. Source interface used for this web service query. An interface is a different way to process a query (different algorithms, different data management systems, etc.). The default interface is 'default'.
 * param18. (default: en). Language of the records to be returned by the search endpoint. Only the textual information of the requested language will be returned to the user. If no textual information is available for a record in the requested language, then only non-textual information will be returned about the record.
 * param19. Sorting criteria for this query. Sorting can be applied to "type", "dataset", "uri", "preflabel", "score" or any other URL-encoded attribute URI that is defined with a maximum cardinality of 1. Each sorting field needs to be followed by a space character and a direction, "desc" or "asc". Multiple sorting criteria can be added by separating them with ";". Here is an example of a sort by type: "type desc". Here is an example of a sort by type and dataset: "type desc; dataset asc". Here is an example of a sort on a custom attribute: "http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefURL desc". By default the sorting order is "asc".
 * param20. Specifies a lat/long location around which all the results should be aggregated. For example, if we have a set of results comprised within a region and we don't want the results spread everywhere in that region, we specify a location for this parameter such that all results get aggregated around that specific location within the region. The value should be: "latitude,longitude". For example: "49.92545999127249,-97.14934608459475"
 * param21. Extended filters are used to define more complex filtered searches. This parameter uses a more complex syntax which enables the grouping of filter criteria and the usage of the AND, OR and NOT boolean operators. The grouping is done with parentheses. Each filter is composed of a URL-encoded attribute URI to use as a filter, followed by a colon and the value to filter with. The full Lucene syntax can be used to define the value to filter. If all values are required, the "*" (star) operator should be used as the value. If the value of an attribute needs to be considered a URI, then the "[uri]" syntax should be added at the end of the attribute filter like: "http%3A%2F%2Fpurl.org%2Fontology%2Ffoo%23friend[uri]:http%3A%2F%2Fbar.com%2Fmy-friend-uri". That way, the value of that attribute filter will be handled as a URI. There is a series of core attributes that can be used without specifying their full URI: dataset, type, inferred_type, prefLabel, altLabel, lat, long, description, polygonCoordinates, polylineCoordinates and located in. The extended filters are not a replacement for the attributes, types and datasets filtering parameters; they are an extension of them. Subsequent filtering criteria can be defined in the extended filtering parameter. The resolution logic of the Search endpoint is: attributes AND datasets AND types AND extended-filters. An example of such an extended query is: (http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefLabel:cancer AND NOT (breast OR ovarian)) AND (http%3A%2F%2Fpurl.org%2Fontology%2Fnhccn%23useGroupSignificant[uri]: (http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23liver_cancer OR http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23cancers_by_histologic_type)) AND dataset:"file://localhost/data/ontologies/files/doha.owl". Note: both the URI and the value (all kinds of values: literals and URIs) need to be URL-encoded before being sent to the Search endpoint.
 * param22. Modifies the score of the results returned by the Search endpoint by boosting the results that have a given type, by a modifier weight applied to the overall scoring algorithm. The type URIs to boost are URL-encoded and separated by semicolons. The boosting factor is delimited with a "^" character at the end of the encoded type URI, followed by the boosting factor. Boosting a type only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it won't restrict the results returned by the endpoint.
 * param23. Modifies the score of the results returned by the Search endpoint by boosting the results that belong to a given dataset, by a modifier weight applied to the overall scoring algorithm. The dataset URIs to boost are URL-encoded and separated by semicolons. The boosting factor is delimited with a "^" character at the end of the encoded dataset URI, followed by the boosting factor. Boosting a dataset only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it won't restrict the results returned by the endpoint.
 * param24. Modifies the score of the results returned by the Search endpoint by boosting the results that have given attribute(s) or attribute(s)/value(s), by a modifier weight applied to the overall scoring algorithm. This parameter is used to boost the relevancy of the returned records if they are described with a particular attribute URI, or if they are described with a particular attribute URI and a particular value for that attribute. The attribute URIs to boost are URL-encoded and separated by semicolons. If a value is specified for an attribute, it is separated from the attribute URI by two colons "::" followed by the URL-encoded value. The boosting factor is then delimited with a "^" character at the end of the encoded attribute URI, or of the encoded value, followed by the boosting factor. Boosting an attribute/value only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it won't restrict the results returned by the endpoint.
 * param25. Includes spellchecking suggestions in the resultset in the case that the resultset is empty. The search endpoint will create a resultset with a single result. This result will be of type . The suggested query words will be returned with the property   and the   and the collated search will be returned with the property  . Suggested terms can be ordered based on their frequency.
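Several of the parameters above are small composite strings: encoded URIs joined with ";", an un-encoded "::" before a filtering value, a "^" before a boosting factor. The following Python sketch shows one way to assemble them. All helper names are hypothetical, the example type URI is invented for illustration, and whether a further encoding pass is applied when the request is serialized depends on the client and is outside this sketch.

```python
from urllib.parse import quote


def encode(value):
    """Percent-encode a URI or a value, escaping reserved characters too."""
    return quote(value, safe="")


def attribute_filters(filters):
    """Join (attribute URI, value or None) pairs; '::' precedes a value."""
    parts = []
    for uri, value in filters:
        part = encode(uri)
        if value is not None:
            part += "::" + encode(value)  # the double colon stays un-encoded
        parts.append(part)
    return ";".join(parts)


def boosts(weights):
    """Join (URI, boosting factor) pairs as 'encoded-uri^factor'."""
    return ";".join(f"{encode(uri)}^{factor}" for uri, factor in weights)


def sort_criteria(criteria):
    """Join (field, direction) pairs as 'field direction'."""
    return "; ".join(f"{field} {direction}" for field, direction in criteria)


def offset_for(result_page, items=10):
    """Offset for the 'page' parameter: zero-based result page times page size."""
    return result_page * items


# Reproduces the "attributes" example given in the param4 description.
print(attribute_filters([("http://some-attribute-uri", "some+filtering+value")]))
# Hypothetical type URI, boosted by a factor of 3.
print(boosts([("http://example.org/ontology#Person", 3)]))
print(sort_criteria([("type", "desc"), ("dataset", "asc")]))
print(offset_for(9))  # items 90 to 100 with 10 items per resultset
```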

Available Sources Interfaces
A source interface is a way to process a web service query. Different source interfaces can be implemented for the same OSF Web Service endpoint. Each interface will process the query differently, but all the queries to the web service endpoint will be the same, with the exception of the interface parameter. Each interface shares the same API (the one defined by the web service endpoint), but their processing may differ (using different algorithms, different data management systems, etc.).

This is a list of the core interfaces for this endpoint. Organizations that host an OSF Web Service network can create their own interfaces and make them available to their users. Such private source interfaces won't be part of this list, but should be publicized by the organization.

Example of Returned XML Document
This is an example of the XML document returned by this Web service endpoint for a given query. This example searches for the keyword "rdf" in two datasets accessible by a given user IP.

Query:


 * http://[...]/ws/search/ with parameters: query=rdf&types=all&datasets=http%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F283%2F%3Bhttp%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F160%2F&items=10&page=0&inference=on&include_aggregates=true&registered_ip=self%3A%3A1

"Accept:" HTTP header field value:


 * text/xml
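To make the encoded query above easier to read, this short Python sketch decodes its parameter string with the standard library. The elided host "[...]" is kept exactly as it appears on this page.

```python
from urllib.parse import parse_qs

# Parameter string of the example query, with the elided host left as-is.
raw = (
    "query=rdf&types=all"
    "&datasets=http%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F283%2F"
    "%3Bhttp%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F160%2F"
    "&items=10&page=0&inference=on&include_aggregates=true"
    "&registered_ip=self%3A%3A1"
)

# parse_qs() returns a list per key; each key appears once here.
params = {name: values[0] for name, values in parse_qs(raw).items()}

# The datasets parameter is a ";"-separated list of dataset URIs.
datasets = params["datasets"].split(";")
print(datasets)
print(params["registered_ip"])
```

The decoded datasets value contains two dataset URIs, and registered_ip decodes to "self::1".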

Result:

HTTP Status Codes
Here are the possible HTTP status (error) codes returned by this Web service endpoint.

Depending on the error code and the specific error, a different message description can be issued (meaning a different error has been returned).


 * Code: 200
 * Message: OK


 * Code: 400
 * Message: Bad Request
 * Message description: The Search web service endpoint is not geo-enabled. Please modify your query such that it does not use any geo feature such as the distance_filter and the range_filter parameters.
 * Message description: No query specified for this request
 * Message description: The number of items returned per request has to be greater than 0 and lesser than 128
 * Message description: No dataset accessible by that user
 * Message description: No requester IP available
 * Message description: No Web service URI available
 * Message description: Target Web service XYZ not registered to this Web Services Framework
 * Message description: No access defined for this requester IP XYZ, dataset (XYZ) and Web service (XYZ)
 * Message description: The target Web service (XYZ) needs create access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
 * Message description: The target Web service (XYZ) needs read access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
 * Message description: The target Web service (XYZ) needs update access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
 * Message description: The target Web service (XYZ) needs delete access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)


 * Code: 406
 * Message: Not Acceptable
 * Message description: Unacceptable mime type requested


 * Code: 500
 * Message: Internal Error
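A client might map these status codes to an exception as in the sketch below. The exception class and function names are hypothetical; the codes and messages are the ones listed above, and the description is whatever message description the endpoint returned.

```python
# Status codes and messages as listed on this page.
STATUS_MESSAGES = {
    200: "OK",
    400: "Bad Request",
    406: "Not Acceptable",
    500: "Internal Error",
}


class SearchError(Exception):
    """Raised when the Search endpoint answers with a non-200 status code."""

    def __init__(self, code, message, description=""):
        self.code = code
        self.message = message
        self.description = description
        text = f"{code} {message}"
        if description:
            text += f": {description}"
        super().__init__(text)


def check_status(code, description=""):
    """Return None on 200; raise SearchError for any other status code."""
    if code == 200:
        return None
    message = STATUS_MESSAGES.get(code, "Unknown status")
    raise SearchError(code, message, description)


try:
    check_status(406, "Unacceptable mime type requested")
except SearchError as error:
    print(error)
```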