Apache Solr

Many structured data systems lack well-performing full-text search. In addition, structured data based on linked data RDF (Resource Description Framework) often substitutes Web identifiers for literal text values. This is good for linking and tracking purposes, but it can excise much of the text, so that standard text search returns apparently incomplete result sets.

To address these issues, we: 1) changed standard RDF practice to record literals in addition to URI identifiers; and 2) integrated our structured data store with the Solr faceted text-search engine. Solr is an open source enterprise search server based on the Lucene Java search library, with faceted search, caching, and many more features.

We changed standard RDF practice by adding a series of SPARQL queries at indexing time that trace identifiers and then extract the full-text information they reference. As a result, each indexed record also contains a complete full-text representation of all of its objects and properties.
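The indexing-time step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual OSF code: the triples live in a small in-memory list instead of a real triple store, the text of a referenced resource is assumed to be its rdfs:label, and the `ex:` identifiers, the `Uri` marker class, and the `_uri` / `_text` field suffixes are all hypothetical names chosen for the example.

```python
# Sketch: resolve URI-valued properties to their literal text at indexing time.
# Assumptions (not from the original page): labels are stored as rdfs:label,
# and the index record uses hypothetical "<property>_uri" / "<property>_text" fields.

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Uri(str):
    """Marks a value as a resource identifier rather than a literal."""


# Toy stand-in for the triple store: (subject, predicate, object).
TRIPLES = [
    (Uri("ex:doc1"), "ex:title", "Faceted search for linked data"),
    (Uri("ex:doc1"), "ex:author", Uri("ex:person42")),
    (Uri("ex:doc1"), "ex:topic", Uri("ex:topicSearch")),
    (Uri("ex:person42"), RDFS_LABEL, "Jane Example"),
    (Uri("ex:topicSearch"), RDFS_LABEL, "Full-text search"),
]


def local_name(predicate):
    """Return the last segment of a predicate name, e.g. 'ex:title' -> 'title'."""
    return predicate.rsplit("#", 1)[-1].rsplit("/", 1)[-1].rsplit(":", 1)[-1]


def label_of(uri, triples):
    """Trace an identifier to its literal label.

    Against a real store this would be a SPARQL query along the lines of:
        SELECT ?label WHERE { <uri> rdfs:label ?label }
    """
    for s, p, o in triples:
        if s == uri and p == RDFS_LABEL and not isinstance(o, Uri):
            return o
    return None


def build_index_record(subject, triples):
    """Build one search-index record holding both identifiers and their text."""
    record = {"id": str(subject)}
    for s, p, o in triples:
        if s != subject:
            continue
        field = local_name(p)
        if isinstance(o, Uri):
            # Keep the identifier (useful for linking and faceting) ...
            record.setdefault(field + "_uri", []).append(str(o))
            # ... and also the literal text it stands for (useful for search).
            text = label_of(o, triples)
            if text is not None:
                record.setdefault(field + "_text", []).append(text)
        else:
            record.setdefault(field + "_text", []).append(o)
    return record


record = build_index_record(Uri("ex:doc1"), TRIPLES)
print(record)
```

In this sketch, a text search for "Jane Example" can match the record for `ex:doc1` even though the original triple only stores the identifier `ex:person42`.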

This design had an unanticipated but extremely beneficial side effect. We found we could drive the entire faceting system in Solr with some straightforward additions to these SPARQL queries. To our knowledge, no one had previously integrated Solr with an RDF datastore (Virtuoso, in our case). The match is natural and extremely performant. In these ways, the conventional full-text limitations of RDF are removed, and faceting is easily achieved with Solr.
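To make the faceting side concrete, here is a minimal sketch of how a faceted Solr request might be assembled once identifier-valued fields are present in the index. The request parameters (`q`, `fq`, `facet`, `facet.field`, `facet.mincount`, `wt`) are standard Solr query parameters; the field names `author_uri` and `topic_uri` and the helper `facet_query_params` are hypothetical and only serve the example.

```python
from urllib.parse import parse_qs, urlencode


def facet_query_params(text, facet_fields, filters=None):
    """Build the query string for a faceted Solr search.

    text         -- the full-text query
    facet_fields -- index fields to compute facet counts for (hypothetical names)
    filters      -- optional {field: value} facet selections, sent as fq filters
    """
    params = [
        ("q", text),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with zero matches
        ("wt", "json"),
    ]
    params += [("facet.field", field) for field in facet_fields]
    for field, value in (filters or {}).items():
        # Naive quoting for illustration; real values would need escaping.
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)


query_string = facet_query_params(
    "full-text search",
    facet_fields=["author_uri", "topic_uri"],
    filters={"topic_uri": "ex:topicSearch"},
)
print(query_string)
print(parse_qs(query_string)["facet.field"])
```

The resulting query string would be appended to the `/select` handler of whichever Solr core holds the records. Faceting on the identifier fields rather than on the text fields keeps facet values unambiguous, while the text fields remain available for the full-text part of the query.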