Articles via Databases
Articles via Journals
Online Catalog
E-books
Research & Information Literacy
Interlibrary loan
Theses & Dissertations
Collections
Policies
Services
About / Contact Us
Administration
Littman Architecture Library
This site will be removed in January 2019, please change your bookmarks.
This page will redirect to https://digitalcommons.njit.edu/dissertations/727 in 5 seconds

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Improving document representation by accumulating relevance feedback : the relevance feedback accumulation (RFA) algorithm
Author: Bot, Razvan Stefan
View Online: njit-etd2005-127
(xiv, 132 pages ~ 6.8 MB pdf)
Department: Department of Information Systems
Degree: Doctor of Philosophy
Program: Information Systems
Document Type: Dissertation
Advisory Committee: Wu, Yi-Fang Brook (Committee chair)
Turoff, Murray (Committee member)
Oria, Vincent (Committee member)
Belkin, Nicholas J. (Committee member)
Van de Walle, Bartel Albrecht (Committee member)
Date: 2005-05
Keywords: Information retrieval
Relevance feedback
Document representation
Availability: Unrestricted
Abstract:

Document representation (indexing) techniques are dominated by variants of the term-frequency analysis approach, based on the assumption that the more occurrences a term has throughout a document the more important the term is in that document. Inherent drawbacks associated with this approach include: poor index quality, high document representation size and the word mismatch problem. To tackle these drawbacks, a document representation improvement method called the Relevance Feedback Accumulation (RFA) algorithm is presented. The algorithm provides a mechanism to continuously accumulate relevance assessments over time and across users. It also provides a document representation modification function, or document representation learning function that gradually improves the quality of the document representations. To improve document representations, the learning function uses a data mining measure called "support" for analyzing the accumulated relevance feedback.

Evaluation is done by comparing the RFA algorithm to other four algorithms. The four measures used for evaluation are (a) average number of index terms per document; (b) the quality of the document representations assessed by human judges; (c) retrieval effectiveness; and (d) the quality of the document representation learning function. The evaluation results show that (1) the algorithm is able to substantially reduce the document representations size while maintaining retrieval effectiveness parameters; (2) the algorithm provides a smooth and steady document representation learning function; and (3) the algorithm improves the quality of the document representations. The RFA algorithm's approach is consistent with efficiency considerations that hold in real information retrieval systems.

The major contribution made by this research is the design and implementation of a novel, simple, efficient, and scalable technique for document representation improvement.


If you have any questions please contact the ETD Team, libetd@njit.edu.

 
ETD Information
Digital Commons @ NJIT
Theses and DIssertations
ETD Policies & Procedures
ETD FAQ's
ETD home

Request a Scan
NDLTD

NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003