NJIT ETD: "Concept graphs: Applications to biomedical text categorization and concept extraction" by Bleik, Said

This record has moved to https://digitalcommons.njit.edu/dissertations/360.

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Concept graphs: Applications to biomedical text categorization and concept extraction

Author: Bleik, Said

View Online: njit-etd2013-067
(xiv, 117 pages ~ 3.1 MB pdf)

Department: Department of Information Systems

Degree: Doctor of Philosophy

Program: Information Systems

Document Type: Dissertation

Advisory Committee: Song, Min (Committee chair)
Deek, Fadi P. (Committee member)
Geller, James (Committee member)
Oria, Vincent (Committee member)
Huan, Jun (Committee member)

Date: 2013-05

Keywords: Text categorization
Concept extraction
Graph mining
Text mining
Graph representation
Concept graphs

Availability: Unrestricted

Abstract:
As science advances, the underlying literature grows rapidly, providing valuable sources of knowledge for researchers and practitioners. The text content that makes up these knowledge collections is often unstructured, so extracting relevant or novel information can be nontrivial and costly. In addition, human knowledge and expertise are being transformed into structured digital information in the form of vocabulary databases and ontologies. These knowledge bases hold substantial hierarchical and semantic relationships among common domain concepts. Consequently, automated learning tasks can be reinforced with those knowledge bases by constructing human-like representations of knowledge. This allows the development of algorithms that simulate the human reasoning tasks of content perception, concept identification, and classification.

This study explores the representation of text documents using concept graphs that are constructed with the help of a domain ontology. In particular, the target data sets are collections of biomedical text documents, and the domain ontology is a collection of predefined biomedical concepts and the relationships among them. The proposed representation preserves those relationships and allows the structural features of graphs to be used in text mining and learning algorithms. Those features emphasize the significance of the relationship information underlying the interrelated topics and concepts of a text document. The experiments presented in this study include text categorization and concept extraction applied to biomedical data sets. The experimental results demonstrate how the relationships extracted from text and captured in graph structures can be used to improve the performance of these applications. The discussed techniques can be used in creating and maintaining digital libraries by enhancing the indexing, retrieval, and management of documents, as well as in a broad range of domain-specific applications such as drug discovery, hypothesis generation, and the analysis of molecular structures in chemoinformatics.
