
The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Structural auditing methodologies for controlled terminologies
Author: Min, Hua
View Online: njit-etd2006-082
(xi, 114 pages ~ 6.6 MB pdf)
Department: Department of Computer Science
Degree: Doctor of Philosophy
Program: Computer Science
Document Type: Dissertation
Advisory Committee: Perl, Yehoshua (Committee co-chair)
Halper, Michael (Committee co-chair)
Cohen, Barry (Committee member)
Elhanan, Gai (Committee member)
Geller, James (Committee member)
Gu, Huanying (Committee member)
Date: 2006-05
Keywords: Controlled terminology
Quality assurance
Inheritance
Auditing
Abstraction networks
Divide and conquer
Availability: Unrestricted
Abstract:

Several auditing methodologies for large controlled terminologies are developed. These are applied to the Unified Medical Language System (UMLS) and the National Cancer Institute Thesaurus (NCIT). Structural auditing methodologies are based on structural aspects such as IS-A hierarchical relationships, groups of concepts assigned to semantic types, and groups of relationships defined for concepts. Structurally uniform groups of concepts tend to be semantically uniform. Structural auditing methodologies focus on concepts with unlikely or rare configurations, since these concepts have a high likelihood of errors.

One of the methodologies is based on comparing the hierarchical relationships of the Metathesaurus (META) and the Semantic Network (SN), two major knowledge sources of the UMLS. In general, a correspondence between them is expected, since the SN hierarchical relationships should abstract the META hierarchical relationships. A mismatch may indicate an error.
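
The comparison can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: hierarchies are given as child-to-parents dictionaries, each concept carries a set of semantic types, and a META IS-A link counts as consistent when every semantic type of the child equals or descends from some semantic type of the parent. All function names and the toy data are hypothetical and are not taken from the UMLS.

```python
def sn_ancestors_or_self(sem_type, sn_parents):
    """Return sem_type together with all of its ancestors in the SN hierarchy."""
    seen = set()
    stack = [sem_type]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sn_parents.get(current, ()))
    return seen


def find_mismatches(meta_parents, concept_types, sn_parents):
    """List (child, parent) META links whose semantic types do not correspond."""
    mismatches = []
    for child, parents in sorted(meta_parents.items()):
        for parent in sorted(parents):
            parent_types = concept_types[parent]
            for child_type in concept_types[child]:
                # Assumed rule: the child's type must equal or specialize
                # at least one type assigned to the parent concept.
                if not (sn_ancestors_or_self(child_type, sn_parents) & parent_types):
                    mismatches.append((child, parent))
                    break
    return mismatches


# Hypothetical toy data, for illustration only.
sn_parents = {"Enzyme": ["Substance"], "Disease": ["Phenomenon"]}
concept_types = {
    "Lipase": {"Enzyme"},
    "Chemical": {"Substance"},
    "Lipase deficiency": {"Disease"},
}
meta_parents = {"Lipase": ["Chemical"], "Lipase deficiency": ["Lipase"]}

flagged = find_mismatches(meta_parents, concept_types, sn_parents)
print(flagged)
```

In this toy input the link from "Lipase deficiency" to "Lipase" is flagged, because no semantic type of the parent is an ancestor of (or equal to) the child's type. A flagged link is only a candidate for review, not a confirmed error.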

The UMLS SN has 135 categories called semantic types. However, in spite of its medium size, the SN has limited use for comprehension purposes: it has many (about 7,000) relationships and therefore cannot easily be represented in pictorial form. A higher-level abstraction of the SN, called a metaschema, is therefore constructed. Its nodes are meta-semantic types, each representing a connected group of semantic types of the SN. One of the auditing methodologies is based on a kind of metaschema called a cohesive metaschema. The focus is placed on concepts in intersections of meta-semantic types. As is shown, such concepts have a high likelihood of errors.
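
A minimal sketch of how concepts in intersections of meta-semantic types might be collected follows. It assumes that each semantic type belongs to exactly one meta-semantic type and that a concept lies in an intersection when its semantic types map to two or more meta-semantic types. The function name, the mapping `type_to_meta`, and the sample data are hypothetical.

```python
from collections import defaultdict


def intersection_concepts(concept_types, type_to_meta):
    """Group concepts by the set of meta-semantic types they span (two or more)."""
    groups = defaultdict(list)
    for concept, sem_types in sorted(concept_types.items()):
        metas = frozenset(type_to_meta[t] for t in sem_types)
        if len(metas) > 1:  # the concept straddles several meta-semantic types
            groups[metas].append(concept)
    return dict(groups)


# Hypothetical toy data, for illustration only.
type_to_meta = {"A1": "MetaA", "A2": "MetaA", "B1": "MetaB"}
concept_types = {
    "c1": {"A1", "A2"},  # two semantic types, one meta-semantic type
    "c2": {"A1", "B1"},  # spans MetaA and MetaB
    "c3": {"B1"},
}

candidates = intersection_concepts(concept_types, type_to_meta)
print(candidates)
```

Only "c2" is reported here, since it is the only concept whose semantic types fall under two different meta-semantic types.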

Another auditing methodology is based on dividing the NCIT into areas according to the roles of its concepts. Each multi-rooted area is further divided into p-areas that are singly rooted. Each p-area contains a group of structurally and semantically uniform concepts. These groups, as well as two derived abstraction networks called taxonomies, help in focusing on concepts with potential errors. With genomic research being at the forefront of bioscience, this auditing methodology is applied to the Gene hierarchy as well as the Biological Process hierarchy of the NCIT, since processes are very important for gene information. The results support the hypothesis that the occurrence of errors is related to the size of p-areas: errors are more frequent in small p-areas.
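
The division into areas and p-areas can be sketched as below, under assumptions that go beyond the abstract: an area is taken to be the set of concepts sharing exactly the same set of roles, a root of an area is a concept with no parent inside that area, and a p-area is a root together with its descendants inside the area. The function names and the toy hierarchy are hypothetical.

```python
from collections import defaultdict


def partition_into_areas(concept_roles):
    """Group concepts that have exactly the same set of roles."""
    areas = defaultdict(set)
    for concept, roles in concept_roles.items():
        areas[frozenset(roles)].add(concept)
    return dict(areas)


def partial_areas(area, parents):
    """Map each root of an area to the root plus its descendants inside the area."""
    roots = sorted(
        c for c in area if not any(p in area for p in parents.get(c, ()))
    )
    children = defaultdict(list)
    for concept in area:
        for parent in parents.get(concept, ()):
            if parent in area:
                children[parent].append(concept)
    p_areas = {}
    for root in roots:
        members, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children[node])
        p_areas[root] = members
    return p_areas


# Hypothetical toy data, for illustration only.
concept_roles = {
    "top": set(),
    "g1": {"r1"}, "g2": {"r1"}, "g3": {"r1"}, "g4": {"r1"},
    "g5": {"r1", "r2"},
}
parents = {"g1": ["top"], "g2": ["g1"], "g3": ["g1"], "g4": ["top"], "g5": ["g2"]}

areas = partition_into_areas(concept_roles)
p_areas = partial_areas(areas[frozenset({"r1"})], parents)

# Review small p-areas first, following the reported link between size and errors.
review_order = sorted(p_areas, key=lambda root: (len(p_areas[root]), root))
print(review_order)
```

In this toy hierarchy the area for role "r1" has two roots, "g1" and "g4", so it splits into a p-area of three concepts and a p-area of one concept, and the singleton is listed first for review.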

