The Search Web service is used to perform full text searches on the structured data indexed on an OSF Web Service instance. A search query can be as simple as a single keyword, or it can combine a series of complex filters. Each search query can be applied to all, or a subset of, the datasets accessible by the requester. All full text queries comply with the Lucene querying syntax.
Each Search query can be filtered by these different filtering criteria:
- Type of the record(s) being requested
- Dataset where the record(s) got indexed
- Presence of an attribute describing the record(s)
- A specific value, for a specific attribute describing the record(s)
- A distance from a lat/long coordinate (for geo-enabled OSF Web Service instances)
- A range of lat/long coordinates (for geo-enabled OSF Web Service instances)
Developers communicate with the Search Web service using the HTTP POST method. You may request one of the following mime types: (1) text/xml, (2) application/rdf+xml, (3) application/rdf+n3 or (4) application/json. The content returned by the Web service is serialized using the mime type requested and the data returned depends on the parameters selected.
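As a rough illustration of the paragraph above, here is a minimal Python sketch that prepares (without sending) an HTTP POST request with an "Accept:" header. The host in `ENDPOINT` is a hypothetical placeholder, since the real host is elided as [...] on this page, and the `Content-Type` header is an assumption about how the body is expected to be encoded.

```python
from urllib.request import Request

# Hypothetical endpoint URL; replace the host with your own OSF instance.
ENDPOINT = "http://localhost/ws/search/"


def make_search_request(body: str, accept: str = "application/json") -> Request:
    """Prepare a POST request whose response is serialized as `accept`."""
    return Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={
            "Accept": accept,
            # Assumption: parameters are sent as a form-encoded body.
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = make_search_request("query=rdf")
print(request.get_method(), request.get_header("Accept"))
```

The request can then be sent with `urllib.request.urlopen(request)` once the endpoint URL points to a real instance.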
This page documents version 3 of this endpoint. See the top of this page for the documentation pages of the other versions of this endpoint.
This Web service is intended to be used to perform full text searches, and filtered searches, on all the datasets hosted on an OSF Web Service instance.
Web Service Endpoint Information
This section describes all the permissions you need in the WSF (Web Service Framework) to send a query to this Web service endpoint, and it describes how to access it.
To access this Web service endpoint you need the proper CRUD (Create, Read, Update and Delete) permissions on a specific graph (dataset) of the WSF. Without the proper permissions on this graph you won't be able to send any queries to the endpoint.
- Create: False
- Read: True
- Update: False
- Delete: False
As shown on the graph URI:
- URIs of the datasets to be queried
Here is the information needed to communicate with this Web service's endpoint. Descriptions of the parameters are included below.
Note: if a parameter has a default value, the requester can omit it and the default value will be used. Also, some baseline Web services may not offer other values than the default.
Possible "Accept:" HTTP header field values:
- text/xml (structXML)
- application/json (structJSON)
- application/rdf+xml (RDF+XML)
- application/rdf+n3 (N3/Turtle)
- application/iron+json (irJSON)
- application/iron+csv (commON)
- http://[...]/ws/search/?query=&types=&datasets=&attributes=&attributes_boolean_operator=&include_attributes_list=&items=&page=&inference=&include_aggregates=&aggregate_attributes=&aggregate_attributes_object_type=&aggregate_attributes_object_nb=&distance_filter=&range_filter=&interface=&lang=&sort=&results_location_aggregator=&extended_filters=&types_boost=&datasets_boost=&attributes_boost=&spellcheck=&version=&search_restrictions=
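The parameter string above can be assembled programmatically. The following Python sketch is one possible way to do it: parameters that are not supplied are simply left out so that the endpoint applies its defaults, and list values are joined with ";" before being URL-encoded. The function name `build_search_body` and the example.org dataset URIs are hypothetical.

```python
from urllib.parse import urlencode


def build_search_body(query, types=None, datasets=None, **extra):
    """Return a URL-encoded parameter string for the Search endpoint.

    Parameters left as None are omitted, so the endpoint uses its defaults.
    """
    params = {"query": query}
    if types is not None:
        params["types"] = ";".join(types)
    if datasets is not None:
        params["datasets"] = ";".join(datasets)
    # Any other documented parameter (items, page, inference, ...) passes through.
    for name, value in extra.items():
        if value is not None:
            params[name] = value
    return urlencode(params)


body = build_search_body(
    "rdf",
    datasets=[
        "http://example.org/wsf/datasets/283/",  # hypothetical dataset URI
        "http://example.org/wsf/datasets/160/",  # hypothetical dataset URI
    ],
    items=10,
    page=0,
    inference="on",
    include_aggregates="true",
)
print(body)
```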
URI dynamic parameters description:
Note: All parameters have to be URL-encoded
- query. Full text query. This query should comply with the Lucene Querying Syntax.
- types. (default: all). List of types of the records to be searched. Each type is separated by the ";" character. An example of such a list is: "type-a;type-b;type-c", meaning: I want to search for all the records with these types.
- datasets. (default: all). List of dataset URIs to be searched. Each dataset URI is separated by the ";" character.
- attributes. (default: all). List of filtering attributes (properties), given as encoded URIs separated by ";". Each URI can optionally end with an un-encoded double colon "::". What follows this double colon is a value restriction that is applied as a filter on this attribute to perform attribute/value filtered searches. The query syntax can be used in that filtering value, and the value also has to be encoded. An example of this "attributes" parameter is: "http%3A%2F%2Fsome-attribute-uri::some%2Bfiltering%2Bvalue". A special markup is available for the prefLabel attribute when attribute/value filtering is used in this parameter: the double star "**" introduces an auto-completion behavior on the prefLabel core attribute. It should be used like "attributes=prefLabel::te**"; this tells the search endpoint that the requester is performing an auto-completion task. That way, the endpoint ensures that the auto-completion task can be performed for more than one word, including spaces. If the target attribute is defined in the ontology with the xsd:dateTime datatype in its range, then date queries can be used in this filter. If a single date is specified, such as 2001-05-24, then all the records from that date until now are returned by the query. If a range of dates is specified, such as [1999 to 2010], then all the records between these two dates are returned. A range of dates has to be enclosed in square brackets, and the separator between the two dates has to be " to " (a space, the word "to" and another space). A date can be written as almost any English textual datetime description. If the target attribute is defined in the ontology with the xsd:float datatype in its range, then numeric queries can be used in this filter. If a single number is specified, such as 235, then all the records with that attribute/value are returned. If a range of numbers is specified, such as [235 to 900], then all the records between these two numbers are returned. A range of numbers has to be enclosed in square brackets, and the separator between the two numbers has to be " to " (a space, the word "to" and another space). When a range is defined for an attribute/value filter, the star character (*) can be used to denote "any" (any number, any date, etc.), as in [235 to *].
- attributes_boolean_operator. (default: and). Tells the endpoint which boolean operator to use ("or" or "and") when doing attribute/value filtering. This parameter affects all the attribute/value filters. One of:
- "or": Use the OR boolean operator between all attribute/value filters. This means that if the user filters with 3 attributes, then the returned records will be described using at least one of these three.
- "and": Use the AND boolean operator between all attribute/value filters. This means that if the user filters with 3 attributes, then the returned records will be described using all three.
- include_attributes_list. (optional) A list of attribute URIs to include in the resultset. Sometimes you may be dealing with datasets where the descriptions of the entities are composed of thousands of attributes/values. Since the Search web service endpoint returns the complete entity descriptions in its resultsets, this parameter enables you to restrict the attributes/values you want included in the resultset, which considerably reduces the size of the resultset to transmit and manipulate. Multiple attribute URIs can be added to this parameter by separating them with ";". If "none" is specified for this parameter, only the "uri" and the "type" of the results will be returned. If one or more property URIs are specified for this parameter, then these properties, the "uri", the dataset provenance and the "type" will be returned for this search query.
- items. (default: 10). The number of items to return in a single resultset.
- page. (default: 0). The offset of the resultset to return. For example, to get items 90 to 100, this parameter should be set to 90.
- inference. (default: on). One of:
- "on": Inference is enabled
- "off": Inference is disabled
- include_aggregates. (default: false). One of:
- "true": Aggregation data included in the resultset
- "false": Aggregation data not included in the resultset
- aggregate_attributes. Specifies a set of attribute URIs for which we want the aggregated values. The URIs should be url-encoded. Each attribute for which we want the aggregated values should be separated by a semicolon ";". This is used to get a list of values, and their counts, for a given attribute.
- aggregate_attributes_object_type. (default: literal). Determines what kind of object value you want the search endpoint to return as aggregate values for the list of attributes for which you want the possible values. This list of attributes is determined by the aggregate_attributes parameter. One of:
- "literal": The aggregated value returned by the endpoint is a literal. If the value is a URI (a reference to some record), then the literal value will be the preferred label of that referred record.
- "uri": If the value of the attribute(s) is a URI (a reference to some record) then that URI will be returned as the aggregated value.
- "uriliteral": If the value of the attribute(s) is a URI (a reference to some record) then that URI and its preferred label will be returned as the aggregated value.
- aggregate_attributes_object_nb. (default: 10). Determines the number of values to aggregate for each of the aggregate_attributes of this query. If the value is -1, then all possible values for the target aggregate_attributes are returned.
- distance_filter. The distance filter is a series of parameters that are used to filter records of the dataset according to their distance from a given lat;long point. The values are separated by a semicolon ";". The format is as follows: lat;long;distance;distanceType. The distanceType can have two values: 0 means that the distance specified is in kilometers and 1 means that the distance specified is in miles. An example is: -98.45;10.4324;5;0, which means getting all the results that are at most 5 kilometers from the lat/long position.
- range_filter. The range filter is a series of parameters that are used to filter records of the dataset according to the rectangular bounds in which their lat;long position is located. The values are separated by a semicolon ";". The format is as follows: top-left-lat;top-left-long;bottom-right-lat;bottom-right-long. Returned results will be comprised within that region.
- interface. Source interface used for this web service query. An interface is a different way to process a query (different algorithms, different data management system, etc.). The default interface is 'default'.
- lang. (default: en) Language of the records to be returned by the search endpoint. Only the textual information of the requested language will be returned to the user. If no textual information is available for a record, for a requested language, then only non-textual information will be returned about the record.
- sort. Sorting criteria for this query. Sort can be used with "type", "dataset", "uri", "preflabel", "score" or any other url-encoded attribute URI that is defined with a maximum cardinality of 1. Each sorting field needs to be followed by a space character and a direction, "desc" or "asc". Multiple sorting criteria can be added by separating them with ";". Here is an example of a sort by type: "type desc". Here is an example of a sort by type and dataset: "type desc; dataset asc". Here is an example of a sort with a custom attribute: "http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefURL desc". By default the sorting order is "asc".
- results_location_aggregator. Specifies a lat/long location around which all the results should be aggregated. For example, if we have a set of results comprised within a region and we don't want the results spread everywhere in that region, we can specify a location for this parameter such that all results get aggregated around that specific location within the region. The value should be: "latitude,longitude". For example: "49.92545999127249,-97.14934608459475".
- extended_filters. Extended filters are used to define more complex filtered searches. This parameter uses a more complex syntax which enables the grouping of filter criteria and the usage of the AND, OR and NOT boolean operators. The grouping is done with parentheses. Each filter is composed of a url-encoded attribute URI to use as a filter, followed by a colon and the value to filter with. The full Lucene syntax can be used to define the value to filter. If all values are required, the "*" (star) operator should be used as the value. If the value of an attribute needs to be considered a URI, then the "[uri]" syntax should be added at the end of the attribute filter like: "http%3A%2F%2Fpurl.org%2Fontology%2Ffoo%23friend[uri]:http%3A%2F%2Fbar.com%2Fmy-friend-uri". That way, the value of that attribute filter will be handled as a URI. There is a series of core attributes that can be used without specifying their full URI: dataset, type, inferred_type, prefLabel, altLabel, lat, long, description, polygonCoordinates, polylineCoordinates and located in. The extended filters are not a replacement for the attributes, types and datasets filtering parameters; they are an extension of them. Subsequent filtering criteria can be defined in the extended filtering parameter. The resolution logic used by the Search endpoint is: attributes AND datasets AND types AND extended-filters. An example of such an extended query is: (http%3A%2F%2Fpurl.org%2Fontology%2Firon%23prefLabel:cancer AND NOT (breast OR ovarian)) AND (http%3A%2F%2Fpurl.org%2Fontology%2Fnhccn%23useGroupSignificant[uri]: (http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23liver_cancer OR http%3A%2F%2Fpurl.org%2Fontology%2Fdoha%23cancers_by_histologic_type)) AND dataset:"file://localhost/data/ontologies/files/doha.owl". Note: both the URI and the value (all kinds of values: literals and URIs) need to be URL-encoded before being sent to the Search endpoint.
- types_boost. Modifies the score of the results returned by the Search endpoint by boosting the results that have a given type, using a modifier weight that is applied to the overall scoring algorithm. The type URIs to boost are url-encoded and separated by semicolons. The boosting factor is delimited with a "^" character at the end of the encoded type URI, followed by the boosting factor. Boosting a type only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it won't restrict the results returned by the endpoint. Here is an example of two boosted types:
- datasets_boost. Modifies the score of the results returned by the Search endpoint by boosting the results that belong to a given dataset, using a modifier weight that is applied to the overall scoring algorithm. The dataset URIs to boost are url-encoded and separated by semicolons. The boosting factor is delimited with a "^" character at the end of the encoded dataset URI, followed by the boosting factor. Boosting a dataset only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it won't restrict the results returned by the endpoint. Here is an example of two boosted datasets:
- attributes_boost. Modifies the score of the results returned by the Search endpoint by boosting the results that have the given attribute(s) or attribute(s)/value(s), using a modifier weight that is applied to the overall scoring algorithm. This parameter is used to boost the relevancy of the returned records if they are described with a particular attribute URI, or if they are described with a particular attribute URI and a particular value for that attribute. The attribute URIs to boost are url-encoded and separated by semicolons. If a value is specified for an attribute, then it is separated from the attribute URI by two colons "::" followed by the url-encoded value. The boosting factor is then delimited with a "^" character at the end of the encoded attribute URI, or of the encoded value, followed by the boosting factor. Boosting an attribute/value only impacts the scoring/relevancy of the returned results. It does not affect what is returned by the endpoint in any way, so it won't restrict the results returned by the endpoint. Here is an example of a boosted attribute URI and another boosted attribute URI with a particular value:
- spellcheck. Includes spellchecking suggestions in the resultset when the resultset is empty. The search endpoint will create a resultset with a single result. This result will be of type wsf:SpellSuggestion. The suggested query words will be returned with the property wsf:frequency and the collated search will be returned with the property wsf:collation. Suggested terms can be ordered based on their frequency.
- version. (default: 3.0). Version of the interface to query.
- search_restrictions. Restricts the search to the list of attributes listed in this parameter. The attribute URIs to restrict to are url-encoded and separated by semicolons. Additionally, the score of the results can be modified by specifying a boosting factor that will change the scoring of the returned results. The boosting factor is delimited with a "^" character at the end of the encoded attribute URI, followed by the boosting factor. Here is an example of two restricted attributes and their boosting modifier:
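Several of the parameters described above use small composite formats ("::" for attribute/value filters, "^" for boosting factors, ";" for lists and geo filters). The following Python sketch shows one way such values could be assembled. All helper names and the example.org URIs are hypothetical, and whether the endpoint expects an additional encoding pass when these values are placed in the request body is not covered here.

```python
from urllib.parse import quote


def enc(text):
    """Percent-encode a URI or value, leaving no reserved character unescaped."""
    return quote(text, safe="")


def attribute_filter(uri, value=None):
    """Encoded attribute URI, optionally followed by '::' and an encoded value."""
    return enc(uri) if value is None else f"{enc(uri)}::{enc(value)}"


def boost(uri, factor):
    """Encoded URI followed by '^' and the boosting factor."""
    return f"{enc(uri)}^{factor}"


def distance_filter(lat, long, distance, miles=False):
    """lat;long;distance;distanceType, where 0 is kilometers and 1 is miles."""
    return f"{lat};{long};{distance};{1 if miles else 0}"


# Numeric range filter on a hypothetical attribute: any value from 235 upward.
attributes = attribute_filter("http://example.org/ontology#price", "[235 to *]")

# Two boosted (hypothetical) types, separated by a semicolon.
types_boost = ";".join([
    boost("http://example.org/ontology#Person", 10),
    boost("http://example.org/ontology#Place", 2),
])

print(attributes)
print(types_boost)
print(distance_filter(49.9, -97.1, 5))
```

The same `boost` helper would apply to datasets_boost and search_restrictions, since the text describes the same "^" convention for them.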
Available Sources Interfaces
A source interface is a way to process a web service query. Different source interfaces can be implemented for the same OSF Web Service endpoint. Each interface will process the query differently, but all the queries to the web service endpoint will be the same, with the exception of the interface parameter. Each interface shares the same API (the one defined by the web service endpoint), but their processing may differ (like using different algorithms, using different data management systems, etc.).
This is a list of the core interfaces for this endpoint. Organizations that host an OSF Web Service network could create their own interface and make it available to their users. Such private source interfaces won't be part of this list, but should be publicized by the organization.
|Source Interface Name||Description|
||Default source interface for this OSF Web Service endpoint. This interface implements the default behavior of this OSF Web Service endpoint.|
Example of Returned XML Document
This is an example of the XML document returned by this Web service endpoint for the query below. The resultset shown contains a matching record followed by an aggregate.
- http://[...]/ws/search/
- parameters: query=rdf&types=all&datasets=http%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F283%2F%3Bhttp%3A%2F%2F[...]%2Fwsf%2Fdatasets%2F160%2F&items=10&page=0&inference=on&include_aggregates=true
"Accept:" HTTP header field value:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE resultset PUBLIC "-//Structured Dynamics LLC//Search DTD 0.1//EN" "http://constructscs.com:8890/ws/dtd/search/search.dtd">
<resultset>
  <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
  <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
    <predicate type="http://purl.org/dc/terms/isPartOf">
      <object type="http://rdfs.org/ns/void#Dataset" uri="http://constructscs.com/wsf/datasets/122/"/>
    </predicate>
    <predicate type="http://usefulinc.com/ns/doap#name">
      <object type="rdfs:Literal">mopy</object>
    </predicate>
    <predicate type="http://usefulinc.com/ns/doap#homepage">
      <object type="rdfs:Literal">http://www.sourceforge.net/projects/motools</object>
    </predicate>
    <predicate type="http://usefulinc.com/ns/doap#programming-language">
      <object type="rdfs:Literal">Python</object>
    </predicate>
    <predicate type="http://purl.org/ontology/swt#status">
      <object type="rdfs:Literal">Existing</object>
    </predicate>
  </subject>
  <subject type="aggr:Aggregate" uri="http://constructscs.com/wsf/ws/search/aggregate/8d4746ea554cfec324b0a740fbbc9be6/6ff6595d838e72f230b1b88974705166/">
    <predicate type="aggr:property">
      <object uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"/>
    </predicate>
    <predicate type="aggr:object">
      <object uri="http://purl.org/ontology/swt#SearchEngine"/>
    </predicate>
    <predicate type="aggr:count">
      <object type="rdfs:Literal">5</object>
    </predicate>
  </subject>
</resultset>
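A resultset in this XML form can be read with a standard XML parser. The Python sketch below is a minimal illustration that separates regular records from aggregates, using a trimmed copy of the example document (the XML declaration, DOCTYPE and some predicates are left out for brevity). The function name `split_resultset` is hypothetical, and the sketch keeps only one value per predicate, which is a simplification.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example resultset shown on this page.
SAMPLE = """<resultset>
  <prefix entity="aggr" uri="http://purl.org/ontology/aggregate#"/>
  <subject type="http://purl.org/ontology/swt#Ontology" uri="http://constructscs.com/conStruct/datasets/122/resource/mopy">
    <predicate type="http://usefulinc.com/ns/doap#name">
      <object type="rdfs:Literal">mopy</object>
    </predicate>
  </subject>
  <subject type="aggr:Aggregate" uri="http://constructscs.com/wsf/ws/search/aggregate/8d4746ea554cfec324b0a740fbbc9be6/6ff6595d838e72f230b1b88974705166/">
    <predicate type="aggr:object">
      <object uri="http://purl.org/ontology/swt#SearchEngine"/>
    </predicate>
    <predicate type="aggr:count">
      <object type="rdfs:Literal">5</object>
    </predicate>
  </subject>
</resultset>"""


def split_resultset(xml_text):
    """Return (records, aggregates) as lists of dicts keyed by predicate type."""
    records, aggregates = [], []
    for subject in ET.fromstring(xml_text).iter("subject"):
        entry = {"uri": subject.get("uri"), "type": subject.get("type")}
        for predicate in subject.findall("predicate"):
            obj = predicate.find("object")
            # Reference objects carry a uri attribute; literals carry text.
            entry[predicate.get("type")] = obj.get("uri") or (obj.text or "").strip()
        (aggregates if entry["type"] == "aggr:Aggregate" else records).append(entry)
    return records, aggregates


records, aggregates = split_resultset(SAMPLE)
print(records[0]["http://usefulinc.com/ns/doap#name"])
print(aggregates[0]["aggr:object"], aggregates[0]["aggr:count"])
```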
HTTP Status Codes
Here are the possible HTTP status (error) codes returned by this Web service endpoint.
Depending on the error code and the specific error, a different message description can be issued (meaning a different error has been returned).
|WS-SEARCH-200||Warning||Invalid number of items requested||The number of items returned per request has to be greater than 0 and less than 300|
|WS-SEARCH-300||Warning||No datasets accessible by that user||No datasets are accessible to that user|
|WS-SEARCH-301||Warning||Not geo-enabled||The Search web service endpoint is not geo-enabled. Please modify your query such that it does not use any geo feature such as the distance_filter and the range_filter parameters.|
|WS-SEARCH-302||Fatal||Requested source interface not existing||The source interface you requested is not existing for this web service endpoint.|
|WS-SEARCH-303||Fatal||Requested incompatible Source Interface version||The version of the source interface you requested is not compatible with the version of the source interface currently hosted on the system. Please make sure that your tool gets upgraded to use the current version of the endpoint.|
|WS-SEARCH-304||Fatal||Source Interface's version not compatible with the web service endpoint's||The version of the source interface you requested is not compatible with the one of the web service endpoint. Please contact the system administrator so that the source interface gets updated to make it compatible with the new endpoint version.|
|WS-SEARCH-305||Fatal||Invalid query date(s)||The date range of one of your date range attribute/value filters is invalid. Please make sure you entered two valid dates.|
|WS-SEARCH-306||Fatal||Invalid number in the numbers range filter||Numbers are expected in the numbers range filter you defined for this query|
|WS-SEARCH-307||Fatal||Language not supported by the endpoint||The language you requested for your query is currently not supported by the endpoint. Please use another one and re-send your query.|
|WS-SEARCH-308||Fatal||Sort property is multi-valued||The sort property you provided is multi-valued. Only single-valued properties can be sorted in a search query. You can make sure you have a single-valued property by defining it with a sco:maxCardinality of 1.|
|WS-SEARCH-309||Fatal||A dataset defined in the extended filters is not accessible||A dataset that you defined in one of your extended filters is not accessible to you. Make sure you only use datasets to which you have access.|
|WS-SEARCH-310||Fatal||Filter not available in your extended filters query||A filtering criterion you defined for this extended filters query is not available or defined in the system. Please remove or change that filter.|
|WS-SEARCH-311||Fatal||Query failed||The query to the Solr server failed using these parameters.|
|WS-SEARCH-312||Fatal||fieldsIndex.srz unexisting||The file fieldsIndex.srz does not exist. Make sure it can be created by the CRUD: Create and CRUD: Update web service endpoints at the folder location specified by the fields_index_folder setting of the osf.ini file.|
|WS-SEARCH-313||Warning||Unused attribute specified in the attributes filters||An unused attribute has been specified as an attributes filter for this query. Make sure the URI of this attribute is the right one, and make sure there is data currently indexed for this attribute, then try this query again.|
|WS-SEARCH-314||Warning||Unused attribute specified in the attributes boost parameter||An unused attribute has been specified as an attributes boost parameter for this query. Make sure the URI of this attribute is the right one, and make sure there is data currently indexed for this attribute, then try this query again.|
|WS-SEARCH-315||Warning||Unused attribute specified in the attributes phrases boost parameter||An unused attribute has been specified as an attributes phrases boost parameter for this query. Make sure the URI of this attribute is the right one, and make sure there is data currently indexed for this attribute, then try this query again.|
|WS-SEARCH-316||Warning||Unused attribute specified in the search restrictions parameter||An unused attribute has been specified as a search restrictions parameter for this query. Make sure the URI of this attribute is the right one, and make sure there is data currently indexed for this attribute, then try this query again.|
|WS-SEARCH-317||Warning||Unused attribute specified in the extended filters parameter||An unused attribute has been specified in the extended filters parameter for this query. Make sure the URI of this attribute is the right one, and make sure there is data currently indexed for this attribute, then try this query again.|
|WS-SEARCH-318||Warning||Unused attribute specified in the sort parameter||An unused attribute has been specified in the sort parameter for this query. Make sure the URI of this attribute is the right one, and make sure there is data currently indexed for this attribute, then try this query again.|
|WS-SEARCH-319||Warning||Unexisting sorting order||The sorting order you specified does not exist. Possible sorting orders are: 'asc' or 'desc'|
|WS-AUTH-VALIDATION-100||Fatal||Unauthorized Request||Your request cannot be authorized for this web service call|
|WS-AUTH-VALIDATION-101||Fatal||Unauthorized Request||Your request cannot be authorized for this web service call|
|WS-AUTH-VALIDATION-102||Fatal||Couldn't authorize request||An internal error occurred when we tried to authorize this request|
|WS-AUTH-VALIDATION-103||Fatal||Unauthorized Request||Your request cannot be authorized for this user: "---", on this dataset: "---", using this web service endpoint: "---"|
|Not Acceptable||Unacceptable mime type requested|
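Some of the conditions in the table above can be checked on the client before a query is sent. The Python sketch below is only an illustration under that assumption: the function name `precheck` is hypothetical, it covers just the items bound (WS-SEARCH-200) and the sorting order (WS-SEARCH-319), and it does not claim to reproduce how the endpoint itself validates a request.

```python
def precheck(items=10, sort=None):
    """Return the error IDs a request would likely trigger, per the table above."""
    problems = []
    # WS-SEARCH-200: items has to be greater than 0 and less than 300.
    if not 0 < items < 300:
        problems.append("WS-SEARCH-200")
    # WS-SEARCH-319: each sort direction has to be 'asc' or 'desc'.
    for criterion in (sort or "").split(";"):
        parts = criterion.split()
        if len(parts) == 2 and parts[1] not in ("asc", "desc"):
            problems.append("WS-SEARCH-319")
    return problems


print(precheck(items=10, sort="type desc; dataset asc"))
print(precheck(items=500, sort="type desc; dataset up"))
```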