NJIT ETD: "Text mining with exploitation of user's background knowledge : discovering novel association rules from text" by Chen, Xin

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Text mining with exploitation of user's background knowledge : discovering novel association rules from text

Author: Chen, Xin

View Online: njit-etd2006-020
(xvii, 164 pages ~ 7.8 MB pdf)

Department: Department of Information Systems

Degree: Doctor of Philosophy

Program: Information Systems

Document Type: Dissertation

Advisory Committee: Wu, Yi-Fang Brook (Committee chair)
Turoff, Murray (Committee member)
Im, Il (Committee member)
Oria, Vincent (Committee member)
Zeng, Marcia L. (Committee member)

Date: 2006-01

Keywords: Text mining
Association rule mining
Interestingness
Data mining
Noun phrase extraction

Availability: Unrestricted

Abstract:
The goal of text mining is to find interesting and non-trivial patterns or knowledge in unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because they do not consider the knowledge and interests of users. Subjective measures require explicit input of user expectations, which is difficult or even impossible to obtain in text mining environments.

This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rule miner. The background knowledge developer learns a user's background knowledge by extracting keywords from documents already known to the user (background documents) and developing a concept hierarchy to organize popular keywords. The novel association rule miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict each rule's novelty to the particular user (user-oriented novelty).
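
As a rough illustration of the rule-mining step, the sketch below finds single-antecedent association rules among noun phrases using the standard support and confidence thresholds. It is not the uMining implementation: the function name `mine_pair_rules`, the threshold values, and the sample target documents are all assumptions made for this example, and noun phrase extraction is taken as already done.

```python
from itertools import permutations


def mine_pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples.

    Hypothetical helper: each document is a list of already-extracted
    noun phrases, and only one-phrase => one-phrase rules are considered.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    items = sorted(set().union(*doc_sets))
    # Number of documents containing each phrase.
    count = {i: sum(i in d for d in doc_sets) for i in items}
    rules = []
    for a, b in permutations(items, 2):
        both = sum(a in d and b in d for d in doc_sets)
        support = both / n              # fraction of documents with both phrases
        confidence = both / count[a]    # fraction of a's documents that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Illustrative target documents (made up for this sketch).
target_docs = [
    ["text mining", "association rule", "noun phrase"],
    ["text mining", "association rule"],
    ["text mining", "concept hierarchy"],
    ["association rule", "noun phrase"],
]
rules = mine_pair_rules(target_docs)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} => {consequent} (sup={support:.2f}, conf={confidence:.2f})")
```

Rules found this way would then be compared against the background knowledge to estimate how novel each one is to the user.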

The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more co-occurrences, the shorter the distance. The latter considers the common connections of the two keywords with others in the concept hierarchy. It is defined as the length of the path connecting the two keywords in the concept hierarchy: the longer the path, the longer the distance.
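
The abstract does not give formulas for the two components, so the following is only a minimal sketch of how such a measure could be computed. The inverse-count form of `occurrence_distance`, the child-to-parent dictionary used for the concept hierarchy, the weighted sum in `novelty`, and all sample data are assumptions for illustration, not the dissertation's definitions.

```python
def occurrence_distance(a, b, background_docs):
    """Assumed form: shrinks as the two keywords co-occur in more documents."""
    co = sum(1 for d in background_docs if a in d and b in d)
    return 1.0 / (1.0 + co)


def path_to_root(node, parent):
    """Walk child -> parent links up to the root (assumes an acyclic hierarchy)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def connection_distance(a, b, parent):
    """Number of edges on the path joining a and b; inf if they are unconnected."""
    depth_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for i, n in enumerate(path_to_root(a, parent)):
        if n in depth_b:
            return i + depth_b[n]
    return float("inf")


def novelty(a, b, background_docs, parent, weight=0.5):
    """Assumed combination: weighted sum of the two distances."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    occ = occurrence_distance(a, b, background_docs)
    conn = connection_distance(a, b, parent)
    return weight * occ + (1.0 - weight) * conn


# Illustrative background knowledge (made up for this sketch).
background_docs = [
    {"text mining", "data mining"},
    {"text mining", "association rule"},
    {"noun phrase"},
]
parent = {
    "text mining": "data mining",
    "association rule": "data mining",
    "data mining": "computer science",
    "noun phrase": "linguistics",
}
familiar = novelty("text mining", "association rule", background_docs, parent)
unfamiliar = novelty("text mining", "noun phrase", background_docs, parent)
print(familiar, unfamiliar)
```

Under these assumptions, a rule whose antecedent and consequent co-occur often and sit close together in the hierarchy gets a small distance (low novelty), while a rule linking keywords the user has never seen together gets a large one.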

The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the user-oriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in terms of predicting novel rules and identifying useful rules.
