Ontology-based Information Extraction

Ontology-based information extraction, or OBIE for short, is the use of ontologies and their specifications to "drive" or inform the information extraction process. The terms and concepts in the source ontology or ontologies form the basis for term matching when tagging text documents.

OBIE is a form of knowledge extraction in which the ontology supplies the knowledge basis. Though ontologies have been used for some time to drive information extraction systems, the specific term OBIE appears to have first been used in relation to the SEKT project. A search on Google Scholar can provide further documentation.

OBIE is now available via a variety of plug-ins to the GATE system, and is becoming more common in other general text processing (NLP) systems.

As used in OSF, best practices for OBIE include making sure that:

 * All ontology concepts have a definition, and that definition is included in the extraction basis
 * All ontology concepts have alternative labels, and those labels are included in the extraction basis
 * Where appropriate, ontology concepts have hidden labels to account for common misspellings, and those labels are included in the extraction basis
 * Inferencing is used as appropriate during the extraction (tagging) process.
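
The practices above can be illustrated with a minimal Python sketch. It is not the OSF or GATE implementation: the `Concept` class, the `ex:` URIs, the sample labels and the `tag` function are all hypothetical names invented for this example. The sketch assumes simple case-insensitive label matching, treats "inferencing" as nothing more than walking superclass links, and only carries the definition along on each tag (a real system might instead use it for disambiguation).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical, simplified ontology concept (not an OSF or GATE type)."""
    uri: str
    pref_label: str
    definition: str = ""
    alt_labels: list = field(default_factory=list)     # synonyms
    hidden_labels: list = field(default_factory=list)  # e.g. common misspellings
    parents: list = field(default_factory=list)        # URIs of direct superclasses


def build_label_index(concepts):
    """Map every lower-cased label (preferred, alternative, hidden) to a concept URI."""
    index = {}
    for c in concepts:
        for label in [c.pref_label, *c.alt_labels, *c.hidden_labels]:
            index[label.lower()] = c.uri
    return index


def ancestors(uri, by_uri):
    """Very simple inference: collect all superclasses by walking parent links."""
    seen = set()
    stack = list(by_uri[uri].parents)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(by_uri[parent].parents)
    return seen


def tag(text, concepts):
    """Return one tag per label occurrence found in the text."""
    by_uri = {c.uri: c for c in concepts}
    index = build_label_index(concepts)
    if not index:
        return []
    # Longest labels first, so multi-word labels win over shorter ones.
    labels = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    tags = []
    for m in pattern.finditer(text):
        uri = index[m.group(1).lower()]
        tags.append({
            "text": m.group(1),
            "start": m.start(),
            "concept": uri,
            "definition": by_uri[uri].definition,
            "inferred": sorted(ancestors(uri, by_uri)),
        })
    return tags


# Illustrative toy ontology; every URI and label here is made up.
ONTOLOGY = [
    Concept("ex:Animal", "animal", definition="A living organism that feeds on organic matter."),
    Concept("ex:Mammal", "mammal", definition="A warm-blooded vertebrate animal.",
            parents=["ex:Animal"]),
    Concept("ex:Dog", "dog", definition="A domesticated canine.",
            alt_labels=["canine", "hound"], hidden_labels=["dogg"],
            parents=["ex:Mammal"]),
]

for t in tag("The hound chased another dogg.", ONTOLOGY):
    print(t["text"], "->", t["concept"], t["inferred"])
```

In this toy run, the alternative label "hound" and the hidden (misspelled) label "dogg" both resolve to the same concept, and each tag also lists the superclasses reached through the parent links. A production tagger would typically add tokenization, stemming and disambiguation, which are deliberately left out here.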

