Semantic Technologies Glossary

From OSF Wiki
Jump to: navigation, search

These are many semantic technology terms relevant to the context of an OSF installation. Some of these are general terms related to ontologies or the dataset concept.

An ABox (for assertions, the basis for A in ABox) is an "assertion component"; that is, a fact associated with a terminological vocabulary within a knowledge base. ABox are TBox-compliant statements about instances belonging to the concept of an ontology.
Adaptive ontology
An adaptive ontology is a conventional knowledge representational ontology that has added to it a number of specific best practices, including modeling the ABox and TBox constructs separately; information that relates specific types to different and appropriate display templates or visualization components; use of preferred labels for user interfaces, as well as alternative labels and hidden labels; defined concepts; and a design that adheres to the open world assumption.
Administrative ontology
Administrative ontologies govern internal application use and user interface interactions.
An annotation, specifically as an annotation property, is a way to provide metadata or to describe vocabularies and properties used within an ontology. Annotations do not participate in reasoning or coherency testing for ontologies.
The name Atom applies to a pair of related standards. The Atom Syndication Format is an XML language used for web feeds, while the Atom Publishing Protocol (APP for short) is a simple HTTP-based protocol for creating and updating Web resources.
These are the aspects, properties, features, characteristics, or parameters that objects (and classes) may have. They are the descriptive characteristics of a thing. Key-value pairs match an attribute with a value; the value may be a reference to another object, an actual value or a descriptive label or string. In an RDF statement, an attribute is expressed as a property (or predicate or relation). In intensional logic, all attributes or characteristics of similarly classifiable items define the membership in that set.
An axiom is a premise or starting point of reasoning. In an ontology, each statement (assertion) is an axiom.
Binding is the creation of a simple reference to something that is larger and more complicated and used frequently. The simple reference can be used instead of having to repeat the larger thing.
A class is a collection of sets or instances (or sometimes other mathematical objects) which can be unambiguously defined by a property that all of its members share. In ontologies, classes may also be known as sets, collections, concepts, types of objects, or kinds of things.
Closed World Assumption
CWA is the presumption that what is not currently known to be true, is false. CWA also has a logical formalization. CWA is the most common logic applied to relational database systems, and is particularly useful for transaction-type systems. In knowledge management, the closed world assumption is used in at least two situations: 1) when the knowledge base is known to be complete (e.g., a corporate database containing records for every employee), and 2) when the knowledge base is known to be incomplete but a "best" definite answer must be derived from incomplete information. See contrast to the open world assumption.
Data Space
A data space may be personal, collective or topical, and is a virtual "container" for related information irrespective of storage location, schema or structure.
An aggregation of similar kinds of things or items, mostly comprised of instance records.
A project that extracts structured content from Wikipedia, and then makes that data available as linked data. There are millions of entities characterized by DBpedia in this way. As such, DBpedia is one of the largest -- and most central -- hubs for linked data on the Web.
DOAP (Description Of A Project) is an RDF schema and XML vocabulary to describe open-source projects.
Description logics
Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.
Domain ontology
Domain (or content) ontologies embody more of the traditional ontology functions such as information interoperability, inferencing, reasoning and conceptual and knowledge capture of the applicable domain.
An individual object or member of a class; when affixed with a proper name or label is also known as a named entity (thus, named entities are a subset of all entities).
Entity–attribute–value model
EAV is a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In the EAV data model, each attribute-value pair is a fact describing an entity. EAV systems trade off simplicity in the physical and logical structure of the data for complexity in their metadata, which, among other things, plays the role that database constraints and referential integrity do in standard database designs.
The extension of a class, concept, idea, or sign consists of the things to which it applies, in contrast with its intension. For example, the extension of the word "dog" is the set of all (past, present and future) dogs in the world. The extension is most akin to the attributes or characteristics of the instances in a set defining its class membership.
FOAF (Friend of a Friend) is an RDF schema for machine-readable modelling of homepage-like profiles and social networks.
A folksonomy is a user-generated set of open-ended labels called tags organized in some manner and used to categorize and retrieve Web content such as Web pages, photographs, and Web links.
GeoNames integrates geographical data such as names of places in various languages, elevation, population and others from various sources.
GRDDL is a markup format for Gleaning Resource Descriptions from Dialects of Languages; that is, for getting RDF data out of XML and XHTML documents using explicitly associated transformation algorithms, typically represented in XSLT.
High-level Subject
A high-level subject is both a subject proxy and category label used in a hierarchical subject classification scheme (taxonomy). Higher-level subjects are classes for more atomic subjects, with the height of the level representing broader or more aggregate classes.
See Instance.
Inference is the act or process of deriving logical conclusions from premises known or assumed to be true. The logic within and between statements in an ontology is the basis for inferring new conclusions from it, using software applications known as inference engines or reasoners.
Instances are the basic, "ground level" components of an ontology. An instance is individual member of a class, also used synonomously with entity. The instances in an ontology may include concrete objects such as people, animals, tables, automobiles, molecules, and planets, as well as abstract instances such as numbers and words. An instance is also known as an individual, with member and entity also used somewhat interchangeably.
Instance Record
An instance with one or more attributes also provided.
irON (instance record and Object Notation) is a abstract notation and associated vocabulary for specifying RDF (Resource Description Framework) triples and schema in non-RDF forms. Its purpose is to allow users and tools in non-RDF formats to stage interoperable datasets using RDF.
The intension of a class is what is intended as a definition of what characteristics its members should have; it is akin to a definition of a concept and what is intended for a class to contain. It is therefore like the schema aspects (or TBox) in an ontology.
Key-Value Pair
Also known as a name–value pair or attribute–value pair, a key-value pair is a fundamental, open-ended data representation. All or part of the data model may be expressed as a collection of tuples <attribute name, value> where each element is a key-value pair. The key is the defined attribute and the value may be a reference to another object or a literal string or value. In RDF triple terms, the subject is implied in a key-value pair by nature of the instance record at hand.
Used synonomously herein with class.
Knowledge base
A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management. A knowledge base provides a means for information to be collected, organized, shared, searched and utilized. Formally, the combination of a TBox and ABox is a knowledge base.
A specification that relates an object or attribute name to its full URI (as required in the RDF language).
Linked data
Linked data is a set of best practices for publishing and deploying instance and class data using the RDF data model, and uses uniform resource identifiers (URIs) to name the data objects. The approach exposes the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.
A considered correlation of objects in two different sources to one another, with the relation between the objects defined via a specific property. Linkage is a subset of possible mappings.
Used synonomously herein with instance.
Metadata (metacontent) is supplementary data that provides information about one or more aspects of the content at hand such as means of creation, purpose, when created or modified, author or provenance, where located, topic or subject matter, standards used, or other annotation characteristics. It is "data about data", or the means by which data objects or aggregations can be described. Contrasted to an attribute, which is an individual characteristic intrinsic to a data object or instance, metadata is a description about that data, such as how or when created or by whom.
Metamodeling is the analysis, construction and development of the frames, rules, constraints, models and theories applicable and useful for modeling a predefined class of problems.
Microdata is a proposed specification used to nest semantics within existing content on web pages. Microdata is an attempt to provide a simpler way of annotating HTML elements with machine-readable tags than the similar approaches of using RDFa or microformats.
A microformat (sometimes abbreviated μF or uF) is a piece of mark up that allows expression of semantics in an HTML (or XHTML) web page. Programs can extract meaning from a web page that is marked up with one or more microformats.
Natural language processing
NLP is the process of a computer extracting meaningful information from natural language input and/or producing natural language output. NLP is one method for assigning structured data characterizations to text content for use in semantic technologies. (Hand assignment is another method.) Some of the specific NLP techniques and applications relevant to semantic technologies include automatic summarization, coreference resolution, machine translation, named entity recognition (NER), question answering, relationship extraction, topic segmentation and recognition, word segmentation, and word sense disambiguation, among others.
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. Ontology-based information extraction (OBIE) is the use of an ontology to inform a "tagger" or information extraction program when doing natural language processing. Input ontologies thus become the basis for generating metadata tags when tagging text or documents.
An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts. Loosely defined, ontologies on the Web can have a broad range of formalism, or expressiveness or reasoning power.
Ontology-driven application
Ontology-driven applications (or ODapps) are modular, generic software applications designed to operate in accordance with the specifications contained in one or more ontologies. The relationships and structure of the information driving these applications are based on the standard functions and roles of ontologies (namely as domain ontologies), as supplemented by UI and instruction sets and validations and rules.
Open Semantic Framework
The open semantic framework, or OSF, is a combination of a layered architecture and an open-source, modular software stack. The stack combines many leading third-party software packages with open source semantic technology developments from Structured Dynamics.
Open World Assumption
OWA is a formal logic assumption that the truth-value of a statement is independent of whether or not it is known by any single observer or agent to be true. OWA is used in knowledge representation to codify the informal notion that in general no single agent or observer has complete knowledge, and therefore cannot make the closed world assumption. The OWA limits the kinds of inference and deductions an agent can make to those that follow from statements that are known to the agent to be true. OWA is useful when we represent knowledge within a system as we discover it, and where we cannot guarantee that we have discovered or will discover complete information. In the OWA, statements about knowledge that are not included in or inferred from the knowledge explicitly recorded in the system may be considered unknown, rather than wrong or false. Semantic Web languages such as OWL make the open world assumption. See contrast to the closed world assumption.
OPML (Outline Processor Markup Language) is an XML format for outlines, and is commonly used to exchange lists of web feeds between web feed aggregators.
The Web Ontology Language (OWL) is designed for defining and instantiating formal Web ontologies. An OWL ontology may include descriptions of classes, along with their related properties and instances. There are also a variety of OWL dialects.
See Property.
Properties are the ways in which classes and instances can be related to one another. Properties are thus a relationship, and are also known as predicates. Properties are used to define an attribute relation for an instance.
In computer science, punning refers to a programming technique that subverts or circumvents the type system of a programming language, by allowing a value of a certain type to be manipulated as a value of a different type. When used for ontologies, it means to treat a thing as both a class and an instance, with the use depending on context.
Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats. The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object.
RDFa 1.0 is a set of extensions to XHTML that is a W3C Recommendation. RDFa uses attributes from meta and link elements, and generalizes them so that they are usable on all elements allowing annotation markup with semantics. A W3C Working draft is presently underway that expands RDFa into version 1.1 with HTML5 and SVG support, among other changes.
RDF Schema
RDFS or RDF Schema is an extensible knowledge representation language, providing basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources.
A semantic reasoner, reasoning engine, rules engine, or simply a reasoner, is a piece of software able to infer logical consequences from a set of asserted facts or axioms. The notion of a semantic reasoner generalizes that of an inference engine, by providing a richer set of mechanisms.
Reasoning is one of many logical tests using inference rules as commonly specified by means of an ontology language, and often a description language. Many reasoners use first-order predicate logic to perform reasoning; inference commonly proceeds by forward chaining or backward chaining.
As used herein, a shorthand reference to an instance record.
Used synonomously herein with attribute.
RSS (an acronym for Really Simple Syndication) is a family of web feed formats used to publish frequently updated digital content, such as blogs, news feeds or podcasts. is an initiative launched by the major search engines of Bing, Google and Yahoo!, and later jointed by Yandex, in order to create and support a common set of schema for structured data markup on web pages. provided a starter set of schema and extension mechanisms for adding to them. supports markup in microdata, microformat and RDFa formats.
Semantic enterprise
An organization that uses semantic technologies and the languages and standards of the semantic Web, including RDF, RDFS, OWL, SPARQL and others to integrate existing information assets, using the best practices of linked data and the open world assumption, and targeting knowledge management applications.
Semantic technology
Semantic technologies are a combination of software and semantic specifications that encodes meanings separately from data and content files and separately from application code. This approach enables machines as well as people to understand, share and reason with data and specifications separately. With semantic technologies, adding, changing and implementing new relationships or interconnecting programs in a different way can be as simple as changing the external model that these programs share. New data can also be brought into the system and visualized or worked upon based on the existing schema. Semantic technologies provide an abstraction layer above existing IT technologies that enables bridging and interconnection of data, content, and processes.
Semantic Web
The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web of unstructured documents into a "web of data". It builds on the W3C's Resource Description Framework (RDF).
A semset is the use of a series of alternate labels and terms to describe a concept or entity. These alternatives include true synonyms, but may also be more expansive and include jargon, slang, acronyms or alternative terms that usage suggests refers to the same concept.
Semantically-Interlinked Online Communities Project (SIOC) is based on RDF and is an ontology defined using RDFS for interconnecting discussion methods such as blogs, forums and mailing lists to each other.
SKOS or Simple Knowledge Organisation System is a family of formal languages designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary; it is built upon RDF and RDFS.
Semantic Knowledge Source Integration provides a declarative mapping language and API between external sources of structured knowledge and the Cyc knowledge base.
SPARQL (pronounced "sparkle") is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language.
A statement is a "triple" in an ontology, which consists of a subject - predicate - object (S-P-O) assertion. By definition, each statement is a "fact" or axiom within an ontology.
A subject is always a noun or compound noun and is a reference or definition to a particular object, thing or topic, or groups of such items. Subjects are also often referred to as concepts or topics.
Subject extraction
Subject extraction is an automatic process for retrieving and selecting subject names from existing knowledge bases or data sets. Extraction methods involve parsing and tokenization, and then generally the application of one or more information extraction techniques or algorithms.
Subject proxy
A subject proxy as a canonical name or label for a particular object; other terms or controlled vocabularies may be mapped to this label to assist disambiguation. A subject proxy is always representative of its object but is not the object itself.
A tag is a keyword or term associated with or assigned to a piece of information (e.g., a picture, article, or video clip), thus describing the item and enabling keyword-based classification of information. Tags are usually chosen informally by either the creator or consumer of the item.
A TBox (for terminological knowledge, the basis for T in TBox) is a "terminological component"; that is, a conceptualization associated with a set of facts. TBox statements describe a conceptualization, a set of concepts and properties for these concepts. The TBox is sufficient to describe an ontology (best practice often suggests keeping a split between instance records -- and ABox -- and the TBox schema).
In the context of knowledge systems, taxonomy is the hierarchical classification of entities of interest of an enterprise, organization or administration, used to classify documents, digital assets and other information. Taxonomies can cover virtually any type of physical or conceptual entities (products, processes, knowledge fields, human groups, etc.) at any level of granularity.
The topic (or theme) is the part of the proposition that is being talked about (predicated). In topic maps, the topic may represent any concept, from people, countries, and organizations to software modules, individual files, and events. Topics and subjects are closely related.
Topic Map
Topic maps are an ISO standard for the representation and interchange of knowledge. A topic map represents information using topics, associations (similar to a predicate relationship), and occurrences (which represent relationships between topics and information resources relevant to them), quite similar in concept to the RDF triple.
A basic statement in the RDF language, which is comprised of a subject - property - object construct, with the subject and property (and object optionally) referenced by URIs.
Used synonomously herein with class.
UMBEL, short for Upper Mapping and Binding Exchange Layer, is an upper ontology of about 28,000 reference concepts, designed to provide common mapping points for relating different ontologies or schema to one another, and a vocabulary for aiding that ontology mapping, including expressions of likelihood relationships distinct from exact identity or equivalence. This vocabulary is also designed for interoperable domain ontologies.
Upper ontology
An upper ontology (also known as a top-level ontology or foundation ontology) is an ontology that describes very general concepts that are the same across all knowledge domains. An important function of an upper ontology is to support very broad semantic interoperability between a large number of ontologies that are accessible ranking "under" this upper ontology.
A vocabulary in the sense of knowledge systems or ontologies are controlled vocabularies. They provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies and other form of knowledge organization systems.
WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools can be downloaded and used freely. Multiple language versions exist, and WordNet is a frequent reference structure for semantic applications.
"Yet another great ontology" is a WordNet structure placed on top of Wikipedia.