Search

Search endpoint version: 1.1, 2, 3

The Search Web service performs full text searches on the structured data indexed on an OSF Web Service instance. A search query can be as simple as a single keyword, or it can combine a series of complex filters. Each search query can be applied to all, or a subset of, the datasets accessible to the requester. All full text queries comply with the Lucene querying syntax.

Each Search query can be filtered by the following criteria:

  1. Type of the record(s) being requested
  2. Dataset where the record(s) are indexed
  3. Presence of an attribute describing the record(s)
  4. A specific value, for a specific attribute describing the record(s)
  5. A distance from a lat/long coordinate (for geo-enabled OSF Web Service instances)
  6. A range of lat/long coordinates (for geo-enabled OSF Web Service instances)

Developers communicate with the Search Web service using the HTTP POST method. You may request one of the following mime types: (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json. The content returned by the Web service is serialized using the mime type requested and the data returned depends on the parameters selected.
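
As a rough illustration of the request described above, the following Python sketch builds (without sending) a POST request carrying an "Accept:" header. The host name, the helper name build_search_request and the choice to carry the parameters as a form-encoded request body are assumptions made for this example; the parameter names query, datasets, items and page are the ones described later on this page.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host: this page elides the real one as "http://[...]".
ENDPOINT = "http://localhost/ws/search/"


def build_search_request(query, datasets=None, items=10, page=0,
                         accept="application/json"):
    """Build, but do not send, a POST request for the Search endpoint."""
    params = {"query": query, "items": str(items), "page": str(page)}
    if datasets:
        # Dataset URIs are joined with ";" and then URL-encoded as a whole.
        params["datasets"] = ";".join(datasets)
    # Assumption: parameters travel as a form-encoded POST body.
    body = urlencode(params).encode("utf-8")
    return Request(ENDPOINT, data=body,
                   headers={"Accept": accept}, method="POST")


request = build_search_request(
    "rdf",
    datasets=["http://example.org/wsf/datasets/283/",
              "http://example.org/wsf/datasets/160/"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out because it needs a reachable OSF Web Service instance.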

Version

This documentation page covers version 3 of this endpoint. See the top of this page for the documentation pages of the other versions of this endpoint.

Usage

This Web service is intended to be used to perform full text searches, and filtered searches, on all the datasets hosted on an OSF Web Service instance.

Web Service Endpoint Information

This section describes all the permissions you need in the WSF (Web Service Framework) to send a query to this Web service endpoint, and it describes how to access it.

To access this Web service endpoint you need the proper CRUD (Create, Read, Update and Delete) permissions on a specific graph (dataset) of the WSF. Without the proper permissions on this graph you won't be able to send any queries to the endpoint.

Needed registered CRUD permission:
  • Create: False
  • Read: True
  • Update: False
  • Delete: False

On the graph URI(s):

  • URIs of the datasets to be queried

Here is the information needed to communicate with this Web service's endpoint. Descriptions of the parameters are included below.

Note: if a parameter has a default value, the requester can omit it and the default value will be used. Also, some baseline Web services may not offer other values than the default.

HTTP method:
  • POST

Possible "Accept:" HTTP header field value:

  • text/xml (structXML)
  • application/json (structJSON)
  • application/rdf+xml (RDF+XML)
  • application/rdf+n3 (N3/Turtle)
  • application/iron+json (irJSON)
  • application/iron+csv (commON)

URI:

  • http://[...]/ws/search/ ?query=&types=&datasets=&attributes=&attributes_boolean_operator=&include_attributes_list=&items=&page=&inference=&include_aggregates=&aggregate_attributes=&aggregate_attributes_object_type=&aggregate_attributes_object_nb=&distance_filter=&range_filter=&interface=&lang=&sort=&results_location_aggregator=&extended_filters=&types_boost=&datasets_boost=&attributes_boost=&spellcheck=&version=&search_restrictions=

URI dynamic parameters description:

Note: All parameters have to be URL-encoded

  • query. Full text query. This query should comply with the Lucene Querying Syntax.
  • types. (default: all). List of types of the records to be searched. Each type is separated by the ";" character. An example of such a list is "type-a;type-b;type-c", meaning: search for all the records with these types.
  • datasets. (default: all). List of dataset URIs to be searched. Each dataset URI is separated by the ";" character.
  • attributes. (default: all). List of filtering attributes (properties), given as encoded URIs separated by ";". Additionally, a URI can end with an un-encoded double colon "::". What follows the double colon is a value restriction applied as a filter on that attribute, which enables attribute/value filtered searches. The query syntax can be used for that filtering value. The value also has to be encoded. An example of this "attributes" parameter is: "http%3A%2F%2Fsome-attribute-uri::some%2Bfiltering%2Bvalue".
    • Auto-completion: a special markup can be used with the prefLabel attribute when attribute/value filtering is used in this parameter. The double star "**" introduces an auto-completion behavior on the prefLabel core attribute. It should be used like "attributes=prefLabel::te**"; this tells the search endpoint that the requester is performing an auto-completion task. That way, the endpoint ensures that the auto-completion task can be performed for more than one word, including spaces.
    • Date queries: if the target attribute is defined in the ontology with the xsd:dateTime datatype in its range, then date queries can be used in this filter. If a single date is specified, such as 2001-05-24, then all the records from that date until now are returned by the query. If a range of dates is specified, such as [1999 to 2010], then all the records between these two dates are returned. A range of dates has to be enclosed in square brackets, and the separator between the two dates has to be " to " (a space, the word "to" and another space). A date can be written as just about any English textual datetime description.
    • Numeric queries: if the target attribute is defined in the ontology with the xsd:int or the xsd:float datatype in its range, then numeric queries can be used in this filter. If a single number is specified, such as 235, then all the records with that attribute/value are returned. If a range of numbers is specified, such as [235 to 900], then all the records between these two numbers are returned. A range of numbers has to be enclosed in square brackets, and the separator between the two numbers has to be " to " (a space, the word "to" and another space).
    • Open ranges: when a range is defined for an attribute/value filter, the star character (*) can be used to denote "any" (any number, any date, etc.), as in [235 to *].
  • attributes_boolean_operator. (default: and). Tells the endpoint which boolean operator to use ("or" or "and") when doing attribute/value filtering. This parameter affects all the attribute/value filters. One of:
    • "or": Use the OR boolean operator between all attribute/value filters. This means that if the user filters with 3 attributes, then the returned records will be described using at least one of these three.
    • "and": Use the AND boolean operator between all attribute/value filters. This means that if the user filters with 3 attributes, then the returned records will be described using all three.
  • include_attributes_list. (optional) A list of attribute URIs to include in the resultset. Sometimes you may be dealing with datasets where the descriptions of the entities are composed of thousands of attributes/values. Since the Search web service endpoint returns the complete entity descriptions in its resultsets, this parameter lets you restrict the attributes/values you want included in the resultset, which considerably reduces the size of the resultset to transmit and manipulate. Multiple attribute URIs can be added to this parameter by separating them with ";". If "none" is specified for this parameter, only the "uri" and the "type" of the results will be returned. If one or more property URIs are specified for this parameter, then these properties, the "uri", the dataset provenance and the "type" will be returned for this search query.
  • items. (default: 10). The number of items to return in a single resultset.
  • page. (default: 0). The offset of the resultset to return. For example, to get items 90 to 100, this parameter should be set to 90.
  • inference. (default: on). One of:
    • "on": Inference is enabled
    • "off": Inference is disabled
  • include_aggregates. (default: false). One of:
    • "true": Aggregation data included in the resultset
    • "false": Aggregation data not included in the resultset
  • aggregate_attributes. Specify a set of attribute URIs for which aggregated values are wanted. The URIs should be url-encoded. The attributes for which aggregated values are wanted should be separated by a semicolon ";". This is used to get a list of values, and their counts, for a given attribute.
  • aggregate_attributes_object_type. (default: literal). Determines what kind of object value you want the search endpoint to return as aggregate values for the list of attributes for which you want the possible values. This list of attributes is determined by the aggregate_attributes parameter. One of:
    • "literal": The aggregated value returned by the endpoint is a literal. If the value is a URI (a reference to some record), then the literal value will be the preferred label of that referred record.
    • "uri": If the value of the attribute(s) is a URI (a reference to some record) then that URI will be returned as the aggregated value.
    • "uriliteral": If the value of the attribute(s) is a URI (a reference to some record) then that URI and its preferred label will be returned as the aggregated value.
  • aggregate_attributes_object_nb. (default: 10). Determines the number of values to aggregate for each attribute listed in aggregate_attributes for this query. If the value is -1, then all possible values for the target aggregated attributes are returned.
  • distance_filter. The distance filter is a series of values used to filter the records of the dataset according to their distance from a given lat;long point. The values are separated by a semicolon ";". The format is as follows: lat;long;distance;distanceType. The distanceType can have two values, 0 or 1: 0 means that the distance specified is in kilometers and 1 means that the distance specified is in miles. An example is: -98.45;10.4324;5;0, which means getting all the results that are at most 5 kilometers from the lat/long position.
  • range_filter. The range filter is a series of values used to filter the records of the dataset according to the rectangular bounds in which their lat;long position is located. The values are separated by a semicolon ";". The format is as follows: top-left-lat;top-left-long;bottom-right-lat;bottom-right-long. Returned results will be contained within that region.
  • interface. Source interface used for this web service query. An interface is a different way to process a query (different algorithms, different data management systems, etc.). The default interface is 'default'.
  • lang. (default: en) Language of the records to be returned by the search endpoint. Only the textual information of the requested language will be returned to the user. If no textual information is available for a record in the requested language, then only non-textual information will be returned about the record.
  • sort. Sorting criteria for this query. Sort can be used for "type", "dataset", "uri", "preflabel", "score" or any other url-encoded attribute URI that is defined with a maximum cardinality of 1. Each sorting field is followed by a space character and a direction, "desc" or "asc". Multiple sorting criteria can be added by separating them with ";". Here is an example of a query that sorts by type: "type desc". Here is an example of a sort by type and dataset: "type desc; dataset asc". Here is an example of a sort on a custom attribute: "http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefURL desc". By default the sorting order is "asc".
  • results_location_aggregator. Specify a lat/long location around which all the results should be aggregated. For example, suppose a set of results is contained within a region. If we don't want the results spread everywhere in that region, we have to specify a location for this parameter such that all results get aggregated around that specific location within the region. The value should be: "latitude,longitude". For example: "49.92545999127249,-97.14934608459475"
  • extended_filters. Extended filters are used to define more complex filtered searches. This parameter uses a richer syntax which enables the grouping of filter criteria and the usage of the AND, OR and NOT boolean operators. Grouping is done with parentheses. Each filter is composed of a url-encoded attribute URI to filter on, followed by a colon and the value to filter with. The full Lucene syntax can be used to define the value to filter. If all values are required, the "*" (star) operator should be used as the value. If the value of an attribute needs to be considered a URI, then the "[uri]" syntax should be added at the end of the attribute filter, like: "http%3A%2F%2Fpurl.org%2Fontology%2Ffoo%23friend[uri]:http%3A%2F%2Fbar.com%2Fmy-friend-uri". That way, the value of that attribute filter will be handled as a URI. A series of core attributes can be used without specifying their full URI: dataset, type, inferred_type, prefLabel, altLabel, lat, long, description, polygonCoordinates, polylineCoordinates and located in. The extended filters are not a replacement for the attributes, types and datasets filtering parameters; they are an extension of them. Additional filtering criteria can be defined in the extended filters parameter. The resolution logic used by the Search endpoint is: attributes AND datasets AND types AND extended-filters. An example of such an extended query is: (http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefLabel:cancer AND NOT (breast OR ovarian)) AND (http%3A%2F%2Fpurl.org%2Fontology%2Fnhccn%23useGroupSignificant[uri]: (http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23liver_cancer OR http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23cancers_by_histologic_type)) AND dataset:"file://localhost/data/ontologies/files/doha.owl". Note: both the URI and the value (all kinds of values: literals and URIs) need to be URL-encoded before being sent to the Search endpoint.
  • types_boost. Modifies the score of the results returned by the Search endpoint by boosting the results that have a given type. Each boost is a weight that feeds into the overall scoring algorithm. The type URIs to boost are url-encoded and separated by semicolons. The boosting factor is delimited by a "^" character placed at the end of the encoded type URI, followed by the boosting factor. Boosting a type only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it does not restrict the results returned by the endpoint. Here is an example of two boosted types: urlencode(type-uri-1)^30;urlencode(type-uri-2)^300
  • datasets_boost. Modifies the score of the results returned by the Search endpoint by boosting the results that belong to a given dataset. Each boost is a weight that feeds into the overall scoring algorithm. The dataset URIs to boost are url-encoded and separated by semicolons. The boosting factor is delimited by a "^" character placed at the end of the encoded dataset URI, followed by the boosting factor. Boosting a dataset only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it does not restrict the results returned by the endpoint. Here is an example of two boosted datasets: urlencode(dataset-uri-1)^30;urlencode(dataset-uri-2)^300
  • attributes_boost. Modifies the score of the results returned by the Search endpoint by boosting the results that have the given attribute(s) or attribute(s)/value(s). Each boost is a weight that feeds into the overall scoring algorithm. This parameter is used to boost the relevancy of the returned records if they are described with a particular attribute URI, or with a particular attribute URI and a particular value for that attribute. The attribute URIs to boost are url-encoded and separated by semicolons. If a value is specified for an attribute, it is separated from the attribute URI by a double colon "::" followed by the url-encoded value. The boosting factor is delimited by a "^" character placed at the end of the encoded attribute URI, or of the encoded value, followed by the boosting factor. Boosting an attribute/value only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it does not restrict the results returned by the endpoint. Here is an example of a boosted attribute URI and another boosted attribute URI with a particular value: urlencode(attribute-uri-1)^30;urlencode(attribute-uri-2)::urlencode(some values)^300
  • spellcheck. Includes spellchecking suggestions in the resultset when the resultset would otherwise be empty. In that case the search endpoint creates a resultset with a single result of type wsf:SpellSuggestion. The suggested query words are returned with the properties wsf:suggestion and wsf:frequency, and the collated search is returned with the property wsf:collation. Suggested terms can be ordered based on their frequency.
  • version. (default: 3.0). Version of the interface to query.
  • search_restrictions. Restricts the search to the list of attributes given in this parameter. The attribute URIs to restrict to are url-encoded and separated by semicolons. Additionally, the score of the results can be modified by specifying a boosting factor that changes the scoring of the returned results. The boosting factor is delimited by a "^" character placed at the end of the encoded attribute URI, followed by the boosting factor. Here is an example of two restricted attributes and their boosting modifiers: urlencode(attribute-uri-1)^30;urlencode(attribute-uri-2)^300
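
Several of the parameters above share the same small grammar: encoded URIs, an un-encoded "::" before an encoded value, "^" before a boosting factor, and ";" between entries. The Python sketch below builds such values under those rules. The helper names (attribute_filter, boosted, distance_filter) and the example.org URIs are invented for illustration and are not part of the endpoint. Whether a further round of encoding is applied when the finished value is placed in the request is not covered here.

```python
from urllib.parse import quote


def attribute_filter(uri, value=None):
    """One "attributes" entry: encoded URI, optionally "::" + encoded value."""
    encoded = quote(uri, safe="")
    if value is None:
        return encoded
    # The "::" separator itself stays un-encoded.
    return encoded + "::" + quote(value, safe="")


def boosted(uri, factor, value=None):
    """One boost entry: an attribute (or attribute::value) + "^" + factor."""
    return attribute_filter(uri, value) + "^" + str(factor)


def distance_filter(lat, lon, distance, miles=False):
    """lat;long;distance;distanceType, where 0 = kilometers and 1 = miles."""
    return ";".join([str(lat), str(lon), str(distance), "1" if miles else "0"])


# Multiple entries of the same parameter are joined with ";".
attributes = ";".join([
    attribute_filter("http://some-attribute-uri", "some+filtering+value"),
    attribute_filter("http://example.org/ontology#price", "[235 to *]"),
])
attributes_boost = ";".join([
    boosted("http://example.org/ontology#a", 30),
    boosted("http://example.org/ontology#b", 300, "some values"),
])
print(attributes)
print(attributes_boost)
print(distance_filter(-98.45, 10.4324, 5))
```

The first attribute filter and the distance filter reproduce the examples given in the parameter descriptions above.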

Available Sources Interfaces

A source interface is a way to process a web service query. Different source interfaces can be implemented for the same OSF Web Service endpoint. Each interface will process the query differently, but all the queries to the web service endpoint will be the same, with the exception of the interface parameter. Each interface shares the same API (the one defined by the web service endpoint), but their processing may differ (using different algorithms, different data management systems, etc.).

This is a list of the core interfaces for this endpoint. Organizations that host an OSF Web Service network can create their own interface and make it available to their users. Such private source interfaces are not part of this list, and should be publicized by the organization.


Source Interface Name | Description
default | Default source interface for this OSF Web Service endpoint. This interface implements the default behavior of this OSF Web Service endpoint.

Example of Returned XML Document

This is an example of the XML document returned by this Web service endpoint for the query shown below: a full text search for "rdf" on two datasets, with aggregates included in the resultset.

Query:
  • http://[...]/ws/search/
  • parameters: query=rdf&types=all&datasets=http%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F283%2F%3Bhttp%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F160%2F&items=10&page=0&inference=on&include_aggregates=true

"Accept:" HTTP header field value:

  • text/xml

Result:

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE resultset PUBLIC "-//Structured Dynamics LLC//Search DTD 0.1//EN" "http://constructscs.com:8890/ws/dtd/search/search.dtd">
  <resultset>
     <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
     <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
        <predicate type="http://purl.org/dc/terms/isPartOf">
           <object type="http://rdfs.org/ns/void#Dataset" uri="http://constructscs.com/wsf/datasets/122/"/>
        </predicate>
        <predicate type="http://usefulinc.com/ns/doap#name">
           <object type="rdfs:Literal">mopy</object>
        </predicate>
        <predicate type="http://usefulinc.com/ns/doap#homepage">
           <object type="rdfs:Literal">http://www.sourceforge.net/projects/motools</object>
        </predicate>
        <predicate type="http://usefulinc.com/ns/doap#programming-language">
           <object type="rdfs:Literal">Python</object>
        </predicate>
        <predicate type="http://purl.org/ontology/swt#status">
           <object type="rdfs:Literal">Existing
           </object>
        </predicate>
     </subject>
     <subject type="aggr:Aggregate" uri="http://constructscs.com/wsf/ws/search/aggregate/8d4746ea554cfec324b0a740fbbc9be6/6ff6595d838e72f230b1b88974705166/">
        <predicate type="aggr:property">
           <object uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
        </predicate>
        <predicate type="aggr:object">
           <object uri="http://purl.org/ontology/swt#SearchEngine"/>
        </predicate>
        <predicate type="aggr:count">
           <object type="rdfs:Literal">5
           </object>
        </predicate>
     </subject>
  </resultset>
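
A resultset of this shape can be read with any XML parser. The Python sketch below uses the standard library's xml.etree.ElementTree on a trimmed sample modeled on the document above. The function name parse_resultset and the dictionary layout it returns are choices made for this illustration, not something defined by the endpoint.

```python
import xml.etree.ElementTree as ET

# Trimmed sample modeled on the example resultset shown above.
SAMPLE = """<resultset>
   <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
   <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
      <predicate type="http://purl.org/dc/terms/isPartOf">
         <object type="http://rdfs.org/ns/void#Dataset" uri="http://constructscs.com/wsf/datasets/122/"/>
      </predicate>
      <predicate type="http://usefulinc.com/ns/doap#name">
         <object type="rdfs:Literal">mopy</object>
      </predicate>
   </subject>
</resultset>"""


def parse_resultset(xml_text):
    """Return one dict per <subject>, with its predicates grouped by type."""
    root = ET.fromstring(xml_text)
    records = []
    for subject in root.findall("subject"):
        properties = {}
        for predicate in subject.findall("predicate"):
            for obj in predicate.findall("object"):
                # Reference objects carry a "uri" attribute; literal
                # objects carry their value as element text.
                value = obj.get("uri") or (obj.text or "").strip()
                properties.setdefault(predicate.get("type"), []).append(value)
        records.append({
            "uri": subject.get("uri"),
            "type": subject.get("type"),
            "properties": properties,
        })
    return records


for record in parse_resultset(SAMPLE):
    print(record["type"], record["uri"])
    for predicate_type, values in record["properties"].items():
        print("  ", predicate_type, values)
```

Literal values are stripped of surrounding whitespace because, as in the example document, a literal may be followed by a line break before its closing tag.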


HTTP Status Codes

Here are the possible HTTP status (error) codes returned by this Web service endpoint.

For a given HTTP status code, different message descriptions can be issued, each one corresponding to a different specific error.


HTTP 200

Message Description
OK


HTTP 400

ID | Level | Name | Description
WS-SEARCH-200 | Warning | Invalid number of items requested | The number of items returned per request has to be greater than 0 and lesser than 300
WS-SEARCH-300 | Warning | No datasets accessible by that user | No datasets are accessible to that user
WS-SEARCH-301 | Warning | Not geo-enabled | The Search web service endpoint is not geo-enabled. Please modify your query such that it does not use any geo feature such as the distance_filter and the range_filter parameters.
WS-SEARCH-302 | Fatal | Requested source interface not existing | The source interface you requested is not existing for this web service endpoint.
WS-SEARCH-303 | Fatal | Requested incompatible Source Interface version | The version of the source interface you requested is not compatible with the version of the source interface currently hosted on the system. Please make sure that your tool get upgraded for using this current version of the endpoint.
WS-SEARCH-304 | Fatal | Source Interface's version not compatible with the web service endpoint's | The version of the source interface you requested is not compatible with the one of the web service endpoint. Please contact the system administrator such that he updates the source interface to make it compatible with the new endpoint version.
WS-SEARCH-305 | Fatal | Invalid query date(s) | The dates range of one of your date range attribute/value filter is invalid. Please make sure you entered to valid date-ranges.
WS-SEARCH-306 | Fatal | Invalid number in the numbers range filter | Numbers are expected in the numbers range filter you defined for this query
WS-SEARCH-307 | Fatal | Language not supported by the endpoint | The language you requested for you query is currently not supported by the endpoint. Please use another one and re-send your query.
WS-SEARCH-308 | Fatal | Sort property is multi-valued | The sort property you provided is multi-valued. Only single-valued properties can be sorted in a search query. You can make sure you have a single valued property by defining it with a sco:maxCardinality of 1.
WS-SEARCH-309 | Fatal | A dataset defined in the extended filters is not accessible | A dataset that you defined in one of your extended filters is not accessible to you. Make sure you only use datasets for which you have access to.
WS-SEARCH-310 | Fatal | Filter not available in your extended filters query | A filtering criteria you defined for this extended filters query is not avaible or defined in the system. Please remove or change that filter.
WS-SEARCH-311 | Fatal | Query failed | The query to the Solr server failed using these parameters.
WS-SEARCH-312 | Fatal | fieldsIndex.srz unexisting | The file fieldsIndex.srz is unexisting. Make sure it can be created by the CRUD: Create and CRUD: Update web service endpoints at the folder location specified by the fields_index_folder setting of the osf.ini file.
WS-SEARCH-313 | Warning | Unused attribute specified in the attributes filters | An unused attribute as been specified as an attributes filter for this query. Make sure the URI of this attribute is the good one, and make sure there is data currently indexed for this attribute then try this query again.
WS-SEARCH-314 | Warning | Unused attribute specified in the attributes boost parameter | An unused attribute as been specified as an attributes boost parameter for this query. Make sure the URI of this attribute is the good one, and make sure there is data currently indexed for this attribute then try this query again.
WS-SEARCH-315 | Warning | Unused attribute specified in the attributes phrases boost parameter | An unused attribute as been specified as an attributes phrases boost parameter for this query. Make sure the URI of this attribute is the good one, and make sure there is data currently indexed for this attribute then try this query again.
WS-SEARCH-316 | Warning | Unused attribute specified in the search restrictions parameter | An unused attribute as been specified as a search restrictions parameter for this query. Make sure the URI of this attribute is the good one, and make sure there is data currently indexed for this attribute then try this query again.
WS-SEARCH-317 | Warning | Unused attribute specified in the extended filters parameter | An unused attribute as been specified in the extended filters parameter for this query. Make sure the URI of this attribute is the good one, and make sure there is data currently indexed for this attribute then try this query again.
WS-SEARCH-318 | Warning | Unused attribute specified in the sort parameter | An unused attribute as been specified in the sort parameter for this query. Make sure the URI of this attribute is the good one, and make sure there is data currently indexed for this attribute then try this query again.
WS-SEARCH-319 | Warning | Unexisting sorting order | The sorting order you specified is unexisting. Possible sorting orders are: 'asc' or 'desc'
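
Two of the conditions listed above can be checked on the client before a query is sent: the items bounds (WS-SEARCH-200) and the sorting order (WS-SEARCH-319). The Python sketch below is one possible pre-flight check. The function name preflight_errors is hypothetical, and the endpoint remains the authority on what it accepts.

```python
def preflight_errors(items, sort=None):
    """Return the error IDs a query would likely trigger, per the table above."""
    errors = []
    # WS-SEARCH-200: items has to be greater than 0 and lesser than 300.
    if not 0 < items < 300:
        errors.append("WS-SEARCH-200")
    # WS-SEARCH-319: each ";"-separated sort criterion is a field optionally
    # followed by a space and a direction, which must be "asc" or "desc".
    for criterion in (sort or "").split(";"):
        parts = criterion.split()
        if len(parts) == 2 and parts[1] not in ("asc", "desc"):
            errors.append("WS-SEARCH-319")
            break
    return errors


print(preflight_errors(10, "type desc; dataset asc"))
print(preflight_errors(300, "type up"))
```

An empty list means neither check found a problem; it does not guarantee that the endpoint will accept the query.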

HTTP 403

ID | Level | Name | Description
WS-AUTH-VALIDATION-100 | Fatal | Unauthorized Request | Your request cannot be authorized for this web service call
WS-AUTH-VALIDATION-101 | Fatal | Unauthorized Request | Your request cannot be authorized for this web service call
WS-AUTH-VALIDATION-102 | Fatal | Couldn't authorize request | An internal error occured when we tried to authorize this request
WS-AUTH-VALIDATION-103 | Fatal | Unauthorized Request | Your request cannot be authorized for this user: "---", on this dataset: "---", using this web service endpoint: "---"


HTTP 406

Message | Description
Not Acceptable | Unacceptable mime type requested


HTTP 500

Message Description
Internal Error