Semantic interoperability
Semantic interoperability is the ability of computer systems to exchange data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data federation between information systems.[1]
Semantic interoperability is therefore concerned not just with the packaging of data (syntax), but also with the simultaneous transmission of the meaning with the data (semantics). This is accomplished by adding data about the data (metadata) that links each data element to a controlled, shared vocabulary. The meaning of the data is transmitted with the data itself, in one self-describing "information package" that is independent of any information system. It is this shared vocabulary, and its associated links to an ontology, that provide the foundation and capability for machine interpretation, inference, and logic.
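A minimal sketch of this idea, using Python's rdflib library (the namespaces and the concept code are purely illustrative, not authoritative identifiers from any terminology release), shows a data value and its meaning travelling together as a self-describing RDF graph:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Illustrative namespaces; the concept URI below is an example, not an
# authoritative code taken from any particular terminology release.
SCT = Namespace("http://snomed.info/id/")            # controlled, shared vocabulary
EX = Namespace("http://example.org/patient-data/")   # local data namespace

g = Graph()
obs = EX["observation/123"]

# The data value (a temperature reading) ...
g.add((obs, EX.value, Literal("38.5", datatype=XSD.decimal)))
g.add((obs, EX.unit, Literal("Cel")))
# ... plus metadata linking it to a shared vocabulary concept, so the
# meaning travels with the data in one self-describing package.
g.add((obs, RDF.type, SCT["386725007"]))  # illustrative "body temperature" concept

print(g.serialize(format="turtle"))
```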
Syntactic interoperability (see below) is a prerequisite for semantic interoperability. Syntactic interoperability refers to the packaging and transmission mechanisms for data. In healthcare, HL7 has been in use for over thirty years (predating the World Wide Web) and uses the pipe character (|) as a data delimiter. The current internet standard for document markup is XML, which uses "< >" as data delimiters. The delimiters convey no meaning other than to structure the data. Without a data dictionary to translate the contents of the delimiters, the data remains meaningless. While there have been many attempts at creating data dictionaries and information models to associate with these data packaging mechanisms, none has been practical to implement. This has only perpetuated the ongoing "babelization" of data and the inability to exchange data with meaning.
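The point can be made concrete with a small sketch; the field layout and the data dictionary below are hypothetical, not an actual HL7 segment or XML schema. Both packagings structure the same values, but a receiving system learns what the values mean only from the out-of-band dictionary:

```python
import xml.etree.ElementTree as ET

# The same record in two packagings: the delimiters structure the data
# but carry no meaning of their own.
pipe_record = "OBS|123|38.5|Cel"
xml_record = "<obs><id>123</id><value>38.5</value><unit>Cel</unit></obs>"

pipe_fields = pipe_record.split("|")  # ['OBS', '123', '38.5', 'Cel']
xml_fields = {child.tag: child.text for child in ET.fromstring(xml_record)}

# Without a dictionary, '38.5' is just a string in slot 2 / tag 'value'.
# A (hypothetical) data dictionary supplies the missing meaning.
data_dictionary = {
    "id":    "local observation identifier",
    "value": "body temperature reading",
    "unit":  "unit of measure (UCUM code)",
}

print("pipe-delimited fields:", pipe_fields)
print("XML fields:", xml_fields)
for tag, text in xml_fields.items():
    print(f"{tag} = {text!r}  ->  {data_dictionary.get(tag, 'meaning unknown')}")
```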
Since the introduction of the Semantic Web concept by Tim Berners-Lee in 1999,[2] there has been growing interest in, and application of, the W3C (World Wide Web Consortium) standards to provide web-scale semantic data exchange, federation, and inferencing capabilities.
Semantic as a function of syntactic interoperability
Syntactic interoperability, provided for instance by XML or the SQL standards, is a prerequisite to semantic interoperability. It involves a common data format and common protocol to structure any data so that the manner of processing the information can be interpreted from the structure. It also allows the detection of syntactic errors, allowing receiving systems to request resending of any message that appears to be garbled or incomplete. No semantic communication is possible if the syntax is garbled or unable to represent the data. However, information represented in one syntax may in some cases be accurately translated into a different syntax. Where accurate translation of syntaxes is possible, systems using different syntaxes may also interoperate accurately. In some cases, the ability to accurately translate information among systems using different syntaxes may be limited to one direction, when the formalisms used have different levels of expressivity (ability to express information).
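For example, a receiver can detect a syntactic (well-formedness) error, and request a resend, without understanding the content at all; a sketch assuming XML messages and Python's standard library:

```python
import xml.etree.ElementTree as ET

def receive(message: str) -> str:
    """Accept a message if it is well-formed XML; otherwise ask for a resend.

    This checks only syntax: a well-formed message can still be
    semantically meaningless to the receiver.
    """
    try:
        ET.fromstring(message)
        return "ACK"             # syntactically acceptable
    except ET.ParseError:
        return "NAK: resend"     # garbled or incomplete message

print(receive("<invoice><amount>100</amount></invoice>"))  # ACK
print(receive("<invoice><amount>100</amount"))              # NAK: resend
```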
A single ontology containing representations of every term used in every application is generally considered impossible, because of the rapid creation of new terms or assignments of new meanings to old terms. However, though it is impossible to anticipate every concept that a user may wish to represent in a computer, there is the possibility of finding some finite set of "primitive" concept representations that can be combined to create any of the more specific concepts that users may need for any given set of applications or ontologies. Having a foundation ontology (also called upper ontology) that contains all those primitive elements would provide a sound basis for general semantic interoperability, and allow users to define any new terms they need by using the basic inventory of ontology elements, and still have those newly defined terms properly interpreted by any other computer system that can interpret the basic foundation ontology. Whether the number of such primitive concept representations is in fact finite, or will expand indefinitely, is a question under active investigation. If it is finite, then a stable foundation ontology suitable to support accurate and general semantic interoperability can evolve after some initial foundation ontology has been tested and used by a wide variety of users. At the present time, no foundation ontology has been adopted by a wide community, so such a stable foundation ontology is still in the future.
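A deliberately simplified sketch of that idea, with invented primitive names and definitions reduced to attribute sets rather than full logical formulas, shows how a term defined from shared primitives remains interpretable by any system that knows only the primitives:

```python
# Hypothetical foundation ontology: a small inventory of primitive concepts
# that communicating systems have agreed on in advance.
FOUNDATION = {"Person", "Organization", "owes", "Money", "Event", "before"}

# A locally defined term is expressed as a combination of primitives
# (crudely modelled here as a frozenset; a real ontology would use a
# logic-based definition, e.g. in OWL or first-order logic).
LOCAL_DEFINITIONS = {
    "Debtor": frozenset({"Person", "owes", "Money"}),
}

def is_interpretable(term: str) -> bool:
    """A term is interpretable if it is a shared primitive, or if its
    definition is built entirely from shared primitives."""
    if term in FOUNDATION:
        return True
    definition = LOCAL_DEFINITIONS.get(term)
    return definition is not None and definition <= FOUNDATION

print(is_interpretable("Debtor"))        # True: defined from primitives
print(is_interpretable("GoodCustomer"))  # False: no shared definition
```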
Words and meanings
One persistent misunderstanding that recurs in discussions of semantics is "the confusion of words and meanings". The meanings of words change, sometimes rapidly, but a formal language such as that used in an ontology can encode the meanings (semantics) of concepts in a form that does not change. In order to determine the meaning of a particular word (or of a term in a database, for example), it is necessary to label each fixed concept representation in an ontology with the word(s) or term(s) that may refer to that concept. When multiple words refer to the same (fixed) concept, this is called synonymy; when one word is used to refer to more than one concept, that is called ambiguity. Ambiguity and synonymy are among the factors that make computer understanding of language very difficult. The use of words to refer to concepts (the meanings of the words used) is, for many human-readable terms, very sensitive to the context and purpose of use. The role of ontologies in supporting semantic interoperability is to provide a fixed set of concepts whose meanings and relations are stable and can be agreed to by users. The task of determining which terms, in which contexts (each database is a different context), refer to which ontology concepts is then separate from the task of creating the ontology itself, and must be taken up by the designer of a database, the designer of a form for data entry, or the developer of a program for language understanding. When the meaning of a word used in some interoperable context changes, then to preserve interoperability the pointer to the ontology element(s) that specifies the meaning of that word must be changed.
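A toy sketch, with made-up concept identifiers and contexts, illustrates the separation between words and fixed concept representations, and why synonymy and ambiguity make the word-to-concept mapping context dependent:

```python
# Fixed concept representations: stable identifiers in a hypothetical ontology.
CONCEPTS = {
    "C001": "financial institution",
    "C002": "sloping land beside a river",
}

# Word-to-concept labels, keyed by context (each database or form is a context).
LEXICON = {
    ("finance_db", "bank"):            "C001",  # two words for one concept: synonymy
    ("finance_db", "banking company"): "C001",
    ("geography_db", "bank"):          "C002",  # one word, another concept: ambiguity
}

def meaning(context: str, word: str) -> str:
    concept_id = LEXICON[(context, word)]
    return f"{word!r} in {context} -> {concept_id}: {CONCEPTS[concept_id]}"

print(meaning("finance_db", "bank"))
print(meaning("geography_db", "bank"))
```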
Knowledge representation requirements and languages
A knowledge representation language may be sufficiently expressive to describe nuances of meaning in well-understood fields. There are at least five levels of complexity of such languages; examples include:
- For general semi-structured data, a general-purpose markup language such as XML may be used.[3]
- Languages with the full power of first-order predicate logic may be required for many tasks (an illustrative definition follows this list).
- Human languages are highly expressive, but are considered too ambiguous to allow the accurate interpretation desired, given the current level of human language technology.
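For example, a first-order language can state exactly how a defined relation is built from more primitive ones (the predicates below are illustrative):

```latex
% Illustrative first-order definition: "x is an aunt of y", built from the
% more primitive relations Sister and Parent.
\forall x\,\forall y\;\bigl(\mathrm{Aunt}(x,y) \leftrightarrow
    \exists z\,\bigl(\mathrm{Sister}(x,z) \wedge \mathrm{Parent}(z,y)\bigr)\bigr)
```

Any system that already interprets Sister and Parent in the agreed way can then interpret Aunt without further coordination.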
Prior agreement not required
Semantic interoperability may be distinguished from other forms of interoperability by considering whether the information transferred has, in its communicated form, all of the meaning required for the receiving system to interpret it correctly, even when the algorithms used by the receiving system are unknown to the sending system. Consider sending one number:
- If that number is intended to be the sum of money owed by one company to another, it implies some action or lack of action on the part of both those who send it and those who receive it.
- It may be correctly interpreted if sent in response to a specific request, and received at the time and in the form expected. This correct interpretation does not depend only on the number itself, which could represent almost any of millions of types of quantitative measurement; rather, it depends strictly on the circumstances of transmission. That is, the interpretation depends on both systems expecting that the algorithms in the other system use the number in exactly the same sense, and it depends further on the entire envelope of transmissions that preceded the transmission of the bare number.
By contrast, if the transmitting system does not know how the information will be used by other systems, it is necessary to have a shared agreement on how information with some specific meaning (out of many possible meanings) will appear in a communication. For a particular task, one solution is to standardize a form, such as a request for payment; that request would have to encode, in standardized fashion, all of the information needed to evaluate it: the agent owing the money; the agent owed the money; the nature of the action giving rise to the debt; the agents, goods, services, and other participants in that action; the time of the action; the amount owed and the currency in which the debt is reckoned; the time allowed for payment; the form of payment demanded; and other information. When two or more systems have agreed on how to interpret the information in such a request, they can achieve semantic interoperability for that specific type of transaction. For semantic interoperability generally, it is necessary to provide standardized ways to describe the meanings of many more things than just commercial transactions, and the number of concepts whose representation needs to be agreed upon is at a minimum several thousand.
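A sketch of the standardized request-for-payment form described above (the field names and the structure are invented for illustration; real standards such as UBL or ISO 20022 define far richer message types):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PaymentRequest:
    """All of the information needed to interpret the bare number 'amount'."""
    creditor: str          # the agent owed the money
    debtor: str            # the agent owing the money
    cause: str             # the action giving rise to the debt
    action_date: date      # when that action took place
    amount: float          # the number itself...
    currency: str          # ...and the currency in which the debt is reckoned
    due_date: date         # time allowed for payment
    payment_form: str      # form of payment demanded

request = PaymentRequest(
    creditor="Acme Supplies Ltd", debtor="Example Corp",
    cause="delivery of 100 widgets", action_date=date(2024, 3, 1),
    amount=1250.00, currency="EUR",
    due_date=date(2024, 4, 1), payment_form="bank transfer",
)
print(request)
```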
Ontology research
How to achieve semantic interoperability for more than a few restricted scenarios is currently a matter of research and discussion. For the problem of General Semantic Interoperability, some form of foundation ontology ('upper ontology') is required that is sufficiently comprehensive to provide the definition of concepts for more specialized ontologies in multiple domains. Over the past decade, more than ten foundation ontologies have been developed, but none have as yet been adopted by a wide user base.
The need for a single comprehensive all-inclusive ontology to support Semantic Interoperability can be avoided by designing the common foundation ontology as a set of basic ("primitive") concepts that can be combined to create the logical descriptions of the meanings of terms used in local domain ontologies or local databases. This tactic is based on the principle that:
If:
(1) the meanings and usage of the primitive ontology elements in the foundation ontology are agreed on, and (2) the ontology elements in the domain ontologies are constructed as logical combinations of the elements in the foundation ontology,
Then:
The intended meanings of the domain ontology elements can be computed automatically using an FOL (first-order logic) reasoner, by any system that accepts the meanings of the elements in the foundation ontology, and has both the foundation ontology and the logical specifications of the elements in the domain ontology.
Therefore:
Any system wishing to interoperate accurately with another system need transmit only the data to be communicated, plus any logical descriptions of terms used in that data that were created locally and are not already in the common foundation ontology.
This tactic then limits the need for prior agreement on meanings to only those ontology elements in the common Foundation Ontology (FO). Based on several considerations, this is likely to be fewer than 10,000 elements (types and relations).
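A rough sketch of such a transmission (the foundation-ontology identifiers and the message layout are invented for illustration): alongside the data, the sender includes definitions, built only from foundation-ontology elements, for any terms it coined locally:

```python
import json

# Hypothetical message: the data plus logical descriptions of local terms.
# Identifiers prefixed "fo:" stand for elements of the shared foundation
# ontology; everything under "local_definitions" was coined by the sender.
message = {
    "data": [
        {"type": "local:PreferredSupplier", "name": "Acme Supplies Ltd"},
    ],
    "local_definitions": {
        # A locally coined term, described as a combination of FO elements.
        "local:PreferredSupplier": {
            "subclass_of": "fo:Organization",
            "restriction": {"on": "fo:suppliesGoodsTo", "some": "fo:Organization"},
        },
    },
}

# A receiver that accepts only the foundation ontology can still interpret
# the data, because every unfamiliar term arrives with a definition built
# from FO elements it already understands.
print(json.dumps(message, indent=2))
```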
In practice, the FO, which focuses on representations of the primitive concepts, will likely be used together with a set of domain extension ontologies whose elements are specified in terms of the FO elements. Such pre-existing extensions reduce the cost of creating domain ontologies by providing existing elements with the intended meanings, and reduce the chance of error by reusing elements that have already been tested. Domain extension ontologies may be logically inconsistent with each other, and this needs to be checked whenever different domain extensions are used in the same communication.
Whether use of such a single foundation ontology can itself be avoided by sophisticated mapping techniques among independently developed ontologies is also under investigation.
Importance
The practical significance of semantic interoperability has been measured by several studies that estimate the cost (in lost efficiency) due to lack of semantic interoperability. One study,[4] focusing on the lost efficiency in the communication of healthcare information, estimated that US$77.8 billion per year could be saved by implementing an effective interoperability standard in that area. Other studies, of the construction industry[5] and of the automobile manufacturing supply chain,[6] estimate costs of over US$10 billion per year due to lack of semantic interoperability in those industries. In total these numbers can be extrapolated to indicate that well over US$100 billion per year is lost because of the lack of a widely used semantic interoperability standard in the US alone.
There has not yet been a study of every policy field that might offer large cost savings from applying semantic interoperability standards. For an overview of which policy fields are capable of profiting from semantic interoperability, see Interoperability in general; such fields include eGovernment, health, security and many more. The EU also set up the Semantic Interoperability Centre Europe in June 2007.
See also
- Data integration
- Interoperability, a more general concept
- Semantic computing
- UDEF, Universal Data Element Framework
References
- NCOIC, "SCOPE", Network Centric Operations Industry Consortium, 2008
- Berners-Lee, Tim; Fischetti, Mark (1999). Weaving the Web. HarperSanFrancisco. chapter 12. ISBN 978-0-06-251587-2.
- Heflin, Jeff; Hendler, James. "Semantic Interoperability on the Web" (on XML as a tool for semantic interoperability).
- Walker, Jan; Pan, Eric; Johnston, Douglas; Adler-Milstein, Julia; Bates, David W.; Middleton, Blackford. "The Value of Healthcare Information Exchange and Interoperability". Health Affairs, 19 January 2005.
- Gallaher, M. P.; O'Connor, A. C.; Dettbarn, J. L.; Gilday, L. T. Cost Analysis of Inadequate Interoperability in the U.S. Capital Facilities Industry. NIST GCR 04-867, August 2004.
- Brunnermeier, S. B.; Martin, S. A. Interoperability Cost Analysis of the U.S. Automotive Supply Chain. NIST Planning Report 99-1, 1999. https://www.nist.gov/director/prog-ofc/report99-1.pdf