Archive 1.x:Search/1.1

From OSF Wiki
Search endpoint versions: 1.1, 2, 3

The Search Web service is used to perform full text searches on the structured data indexed on an OSF Web Service instance. A search query can be as simple as looking up a single keyword in the data store, or as complex as combining a series of filters. Each search query can be applied to all, or a subset, of the datasets accessible by the requester. All full text queries must comply with the Lucene query syntax.

Each search query can be filtered using the following criteria:

  1. Type of the record(s) being requested
  2. Dataset where the record(s) got indexed
  3. Presence of an attribute describing the record(s)
  4. A specific value, for a specific attribute describing the record(s)
  5. A distance from a lat/long coordinate (geo-enabled OSF Web Service instances only)
  6. A range of lat/long coordinates (geo-enabled OSF Web Service instances only)

Developers communicate with the Search Web service using the HTTP POST method. The requester selects the serialization of the returned content with the "Accept:" HTTP header, for example text/xml, application/rdf+xml, application/rdf+n3 or application/json; the complete list of supported MIME types is given in the endpoint information below. The returned content is serialized using the requested MIME type, and the data it contains depends on the selected parameters.
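The request described above can be sketched with Python's standard library. This is a minimal sketch that assumes the parameters travel as a form-encoded (application/x-www-form-urlencoded) POST body. The host of the endpoint is elided on this page, so `BASE_URL` is a hypothetical placeholder, and `make_search_request` is an illustrative helper, not part of any OSF client library.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL: the real host is elided ("[...]") on this page.
BASE_URL = "http://example.org/ws/search/"

def make_search_request(params: dict, mime: str = "text/xml") -> Request:
    """Build (but do not send) a POST request to the Search endpoint."""
    # Assumed transport: parameters form-encoded into the request body.
    body = urlencode(params).encode("utf-8")
    return Request(
        BASE_URL,
        data=body,
        method="POST",
        headers={
            "Accept": mime,  # selects the serialization of the resultset
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )

req = make_search_request({"query": "rdf"}, mime="application/json")
print(req.get_method())          # POST
print(req.get_header("Accept"))  # application/json
```

Passing `req` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it runs without a live instance.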

Version

This page documents version 1.1 of this endpoint. The documentation for the other versions is linked at the top of this page.

Usage

This Web service is intended to perform full text searches, and filtered searches, on all the datasets hosted on an OSF Web Service instance.

Web Service Endpoint Information

This section describes the permissions you need in the WSF (Web Service Framework) to send a query to this Web service endpoint, and how to access the endpoint.

To access this Web service endpoint you need the proper CRUD (Create, Read, Update and Delete) permissions on a specific graph (dataset) of the WSF. Without these permissions on the graph, the endpoint refuses your queries.

Required registered CRUD permissions:
  • Create: False
  • Read: True
  • Update: False
  • Delete: False

These permissions must be registered on the following graph URIs:

  • URIs of the datasets to be queried

Here is the information needed to communicate with this Web service's endpoint. Descriptions of the parameters are included below.

Note: if a parameter has a default value, the requester can omit it and the default value will be used. Also, some baseline Web services may only support the default value.

HTTP method:
  • POST

Possible "Accept:" HTTP header field values:

  • text/xml (structXML)
  • application/json (structJSON)
  • application/rdf+xml (RDF+XML)
  • application/rdf+n3 (N3/Turtle)
  • application/iron+json (irJSON)
  • application/iron+csv (commON)

URI:

  • http://[...]/ws/search/?query=param1&types=param2&datasets=param3&attributes=param4&attributes_boolean_operator=param5&include_attributes_list=param6&items=param7&page=param8&inference=param9&include_aggregates=param10&aggregate_attributes=param11&aggregate_attributes_object_type=param12&aggregate_attributes_object_nb=param13&distance_filter=param14&range_filter=param15&registered_ip=param16&results_location_aggregator=param17
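A client could assemble these parameters into a request body as sketched below. `build_search_params` is a hypothetical helper, not an OSF API. It joins list-valued parameters with ";", omits parameters left at their defaults, checks the items range stated in the 400 error message further down this page, and URL-encodes every value with `urllib.parse.urlencode`, since all parameters must be URL-encoded. The dataset URIs use a placeholder host because the real one is elided.

```python
from urllib.parse import urlencode

def build_search_params(query, types=None, datasets=None, items=None,
                        page=None, inference=None, include_aggregates=None,
                        **extra):
    """Return a URL-encoded parameter string for the Search endpoint (sketch).

    Parameters left as None are omitted so the endpoint applies its defaults.
    Other documented parameters (e.g. registered_ip) can be passed in **extra.
    """
    params = {"query": query}
    if types:
        params["types"] = ";".join(types)        # e.g. "type-a;type-b"
    if datasets:
        params["datasets"] = ";".join(datasets)  # dataset URIs joined by ";"
    if items is not None:
        # The endpoint rejects values outside 1..127 with a 400 error.
        if not 0 < items < 128:
            raise ValueError("items must be greater than 0 and less than 128")
        params["items"] = items
    if page is not None:
        params["page"] = page
    if inference is not None:
        params["inference"] = "on" if inference else "off"
    if include_aggregates is not None:
        params["include_aggregates"] = "true" if include_aggregates else "false"
    params.update({k: v for k, v in extra.items() if v is not None})
    # urlencode percent-encodes ":", "/" and ";" because its safe="" default.
    return urlencode(params)

body = build_search_params(
    "rdf",
    datasets=["http://example.org/wsf/datasets/283/",
              "http://example.org/wsf/datasets/160/"],
    items=10, page=0, inference=True, include_aggregates=True,
)
print(body)
```

The encoded dataset list has the same shape as the one in the example query further down this page (":" as %3A, "/" as %2F, ";" as %3B).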

URI dynamic parameters description:

Note: all parameters have to be URL-encoded.

  • param1. Full text query. This query must comply with the Lucene query syntax.
  • param2. (default: all). List of the types of the records to be searched, separated by the ";" character. For example, "type-a;type-b;type-c" restricts the search to records of these three types.
  • param3. (default: all). List of the URIs of the datasets to be searched, separated by the ";" character.
  • param4. (default: all). List of filtering attribute (property) URIs, each URL-encoded and separated by ";". Optionally, an attribute URI can be followed by an un-encoded double colon "::" and a value restriction to apply as a filter on that attribute. The filtering value can use the Lucene query syntax, and it must also be URL-encoded. An example of this "attributes" parameter is: "http%3A%2F%2Fsome-attribute-uri::some%2Bfiltering%2Bvalue". When attribute/value filtering is used with the prefLabel attribute, a special double-star "**" markup enables auto-completion on the prefLabel core attribute. It should be used like: "attributes=prefLabel::te**". This tells the search endpoint that the requester is performing an auto-completion task, so the endpoint can complete values that span more than one word, including spaces.
  • param5. (default: and). The boolean operator ("or" or "and") the endpoint uses when applying attribute/value filters. This parameter affects all the attribute/value filters. One of:
    • "or": Use the OR boolean operator between all attribute/value filters. For example, if the user filters on three attributes, each returned record is described by at least one of these three.
    • "and": Use the AND boolean operator between all attribute/value filters. For example, if the user filters on three attributes, each returned record is described by all three.
  • param6. (optional). A list of attribute URIs to include in the resultset, separated by ";". Since the Search Web service endpoint returns the complete description of each entity in its resultsets, and some datasets describe entities with thousands of attribute/value pairs, this parameter lets you restrict the attribute/values included in the resultset. This can considerably reduce the size of the resultset to transmit and manipulate.
  • param7. (default: 10). The number of items to return in a single resultset. It must be greater than 0 and less than 128.
  • param8. (default: 0). The offset of the first item of the resultset to return. For example, with items set to 10, setting this parameter to 90 returns the ten items starting at offset 90.
  • param9. (default: on). One of:
    • "on": Inference is enabled
    • "off": Inference is disabled
  • param10. (default: false). One of:
    • "true": Aggregation data included in the resultset
    • "false": Aggregation data not included in the resultset
  • param11. A set of URL-encoded attribute URIs, separated by semicolons ";", for which the aggregated values are wanted. This is used to get the list of values, and their counts, for a given attribute.
  • param12. (default: literal). Determines what kind of object value the search endpoint returns as aggregate values for the attributes listed in the aggregate_attributes parameter. One of:
    • "literal": The aggregated value returned by the endpoint is a literal. If the value is a URI (a reference to some record), then the literal value will be the preferred label of that referred record.
    • "uri": If the value of the attribute(s) is a URI (a reference to some record) then that URI will be returned as the aggregated value.
  • param13. (default: 10). The number of values to aggregate for each attribute listed in aggregate_attributes. A value of -1 means that all possible values of these attributes are returned.
  • param14. The distance filter is a series of values used to filter the records of the dataset according to their distance from a given lat/long point. The values are separated by semicolons ";", using the format: lat;long;distance;distanceType. The distanceType can be 0 or 1: 0 means the distance is in kilometers and 1 means it is in miles. For example, 10.4324;-98.45;5;0 returns all the results located at most 5 kilometers from that lat/long position.
  • param15. The range filter is a series of values used to filter the records of the dataset according to whether their lat/long position falls within a rectangle. The values are separated by semicolons ";", using the format: top-left-lat;top-left-long;bottom-right-lat;bottom-right-long. Only the results located within that region are returned.
  • param16. Target IP address registered in the WSF.
  • param17. A lat/long location around which all the results are aggregated. For example, when the results are spread across a region, specifying a location here aggregates all of them around that specific location within the region. The value uses the comma-separated format "latitude,longitude", for example: "49.92545999127249,-97.14934608459475".
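Several of the parameters above pack multiple values into one string. The helpers below sketch how a client might build them; all function names are hypothetical. Percent-encoding the attribute value with `urllib.parse.quote` is an assumption about what the endpoint accepts (the page's own example encodes spaces differently), so check it against your instance. `page_offsets` illustrates that `page` is an item offset rather than a page number.

```python
from urllib.parse import quote

def attribute_filter(attribute_uri, value=None):
    """One entry of the 'attributes' parameter (hypothetical helper).

    The URI is percent-encoded; an optional value restriction follows an
    un-encoded "::" and is percent-encoded as well (assumed encoding).
    """
    entry = quote(attribute_uri, safe="")
    if value is not None:
        entry += "::" + quote(value, safe="")
    return entry

def attributes_param(filters):
    """Join (uri, value) pairs into the ';'-separated 'attributes' value."""
    return ";".join(attribute_filter(uri, value) for uri, value in filters)

def distance_filter(lat, lon, distance, miles=False):
    """Build 'lat;long;distance;distanceType' (0 = km, 1 = miles)."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    return f"{lat};{lon};{distance};{1 if miles else 0}"

def range_filter(top_left, bottom_right):
    """Build 'top-left-lat;top-left-long;bottom-right-lat;bottom-right-long'."""
    (tl_lat, tl_lon), (br_lat, br_lon) = top_left, bottom_right
    return f"{tl_lat};{tl_lon};{br_lat};{br_lon}"

def location_aggregator(lat, lon):
    """Build 'latitude,longitude' for results_location_aggregator."""
    return f"{lat},{lon}"

def page_offsets(total, items=10):
    """Values of 'page' (an item offset) needed to walk `total` results."""
    return list(range(0, total, items))

print(attributes_param([("http://some-attribute-uri", "some filtering value")]))
print(distance_filter(10.4324, -98.45, 5))
print(page_offsets(25))
```

These strings are the parameter values themselves; presumably they are URL-encoded once more, like every other parameter, when placed in the request.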

Example of Returned XML Document

This is an example of the XML document returned by this Web service endpoint for a given query. This example performs a full text search for the keyword "rdf" on two datasets, with inference enabled and aggregates included.

Query:
  • URI: http://[...]/ws/search/
  • Parameters: query=rdf&types=all&datasets=http%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F283%2F%3Bhttp%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F160%2F&items=10&page=0&inference=on&include_aggregates=true&registered_ip=self%3A%3A1

"Accept:" HTTP header field value:

  • text/xml

Result:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE resultset PUBLIC "-//Structured Dynamics LLC//Search DTD 0.1//EN" "http://constructscs.com:8890/ws/dtd/search/search.dtd">
<resultset>
   <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
   <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
      <predicate type="http://purl.org/dc/terms/isPartOf">
         <object type="http://rdfs.org/ns/void#Dataset" uri="http://constructscs.com/wsf/datasets/122/"/>
      </predicate>
      <predicate type="http://usefulinc.com/ns/doap#name">
         <object type="rdfs:Literal">mopy</object>
      </predicate>
      <predicate type="http://usefulinc.com/ns/doap#homepage">
         <object type="rdfs:Literal">http://www.sourceforge.net/projects/motools</object>
      </predicate>
      <predicate type="http://usefulinc.com/ns/doap#programming-language">
         <object type="rdfs:Literal">Python</object>
      </predicate>
      <predicate type="http://purl.org/ontology/swt#status">
         <object type="rdfs:Literal">Existing</object>
      </predicate>
   </subject>
   <subject type="aggr:Aggregate" uri="http://constructscs.com/wsf/ws/search/aggregate/8d4746ea554cfec324b0a740fbbc9be6/6ff6595d838e72f230b1b88974705166/">
      <predicate type="aggr:property">
         <object uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
      </predicate>
      <predicate type="aggr:object">
         <object uri="http://purl.org/ontology/swt#SearchEngine"/>
      </predicate>
      <predicate type="aggr:count">
         <object type="rdfs:Literal">5</object>
      </predicate>
   </subject>
</resultset>
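A client reading the text/xml serialization could walk it with Python's standard `xml.etree.ElementTree`, as sketched below. The element and attribute names (`subject`, `predicate`, `object`, `type`, `uri`) come from the example response above; treating `aggr:Aggregate` subjects as aggregate counts and every other subject as a record is an assumption based on that example, and `parse_resultset` is a hypothetical helper. The sample is an abbreviated copy of that response without the XML declaration and DOCTYPE.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<resultset>
  <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
  <subject type="http://purl.org/ontology/swt#Ontology"
           uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
    <predicate type="http://usefulinc.com/ns/doap#name">
      <object type="rdfs:Literal">mopy</object>
    </predicate>
    <predicate type="http://usefulinc.com/ns/doap#programming-language">
      <object type="rdfs:Literal">Python</object>
    </predicate>
  </subject>
  <subject type="aggr:Aggregate"
           uri="http://constructscs.com/wsf/ws/search/aggregate/8d4746ea554cfec324b0a740fbbc9be6/6ff6595d838e72f230b1b88974705166/">
    <predicate type="aggr:property">
      <object uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
    </predicate>
    <predicate type="aggr:object">
      <object uri="http://purl.org/ontology/swt#SearchEngine"/>
    </predicate>
    <predicate type="aggr:count">
      <object type="rdfs:Literal">5</object>
    </predicate>
  </subject>
</resultset>"""

def parse_resultset(xml_text):
    """Split a structXML resultset into records and aggregates (sketch)."""
    root = ET.fromstring(xml_text)
    records, aggregates = {}, []
    for subject in root.iter("subject"):
        # Map each predicate type to the list of its object values.
        values = {}
        for predicate in subject.findall("predicate"):
            for obj in predicate.findall("object"):
                # An object is either a URI reference or literal text.
                value = obj.get("uri") or (obj.text or "").strip()
                values.setdefault(predicate.get("type"), []).append(value)
        if subject.get("type") == "aggr:Aggregate":
            aggregates.append({
                "property": values["aggr:property"][0],
                "object": values["aggr:object"][0],
                "count": int(values["aggr:count"][0]),
            })
        else:
            records[subject.get("uri")] = values
    return records, aggregates

records, aggregates = parse_resultset(SAMPLE)
print(aggregates)
```

Using `root.iter("subject")` rather than `root.findall("subject")` keeps the sketch working whether aggregate subjects appear as siblings of the records or nested inside them.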

HTTP Status Codes

Here are the possible HTTP status (error) codes returned by this Web service endpoint.

For a given error code, different message descriptions can be returned, each identifying a different specific error.

  • Code: 200
    • Message: OK
  • Code: 400
    • Message: Bad Request
    • Message description: The Search web service endpoint is not geo-enabled. Please modify your query such that it does not use any geo feature such as the distance_filter and the range_filter parameters.
    • Message description: No query specified for this request
    • Message description: The number of items returned per request has to be greater than 0 and lesser than 128
    • Message description: No dataset accessible by that user
    • Message description: No requester IP available
    • Message description: No Web service URI available
    • Message description: Target Web service XYZ not registered to this Web Services Framework
    • Message description: No access defined for this requester IP XYZ, dataset (XYZ) and Web service (XYZ)
    • Message description: The target Web service (XYZ) needs create access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
    • Message description: The target Web service (XYZ) needs read access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
    • Message description: The target Web service (XYZ) needs update access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
    • Message description: The target Web service (XYZ) needs delete access and the requested user (XYZ) doesn't have this access for that dataset (XYZ)
  • Code: 406
    • Message: Not Acceptable
    • Message description: Unacceptable mime type requested
  • Code: 500
    • Message: Internal Error
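One way a client might surface these status codes is sketched below with Python's standard `urllib`. It assumes that the message description of an error is carried in the response body, which this page does not state. `SearchError` and `send_search_request` are hypothetical names, and the `opener` argument exists only so the sketch can be exercised without a live endpoint.

```python
import io
from urllib.error import HTTPError
from urllib.request import Request, urlopen

class SearchError(Exception):
    """A non-200 answer from the Search endpoint (hypothetical)."""
    def __init__(self, status, message, description):
        super().__init__(f"{status} {message}: {description}")
        self.status = status            # 400, 406, 500, ...
        self.message = message          # e.g. "Bad Request"
        self.description = description  # assumed to be the response body

def send_search_request(req, opener=urlopen):
    """Send `req` and return the body, raising SearchError on HTTP errors."""
    try:
        with opener(req) as resp:
            return resp.read()
    except HTTPError as err:
        description = err.read().decode("utf-8", "replace").strip()
        raise SearchError(err.code, err.reason, description) from err

# Simulated 400 response, so no network access is needed.
def fake_bad_request(req):
    raise HTTPError(req.full_url, 400, "Bad Request", {},
                    io.BytesIO(b"No query specified for this request"))

req = Request("http://example.org/ws/search/", data=b"", method="POST")
try:
    send_search_request(req, opener=fake_bad_request)
except SearchError as exc:
    caught = exc
print(caught)  # 400 Bad Request: No query specified for this request
```

Keeping the status and the description as separate attributes lets a caller branch on the code (for example, 406 for an unsupported "Accept:" value) while still logging the specific message.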