
The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Knowledge discovery and modeling in genomic databases
Author: Yin, Michael M.
View Online: njit-etd2002-084
(xi, 114 pages ~ 4.8 MB pdf)
Department: Department of Computer and Information Science
Degree: Doctor of Philosophy
Program: Computer and Information Science
Document Type: Dissertation
Advisory Committee: Wang, Jason T. L. (Committee chair)
McHugh, James A. (Committee member)
Shih, Frank Y. (Committee member)
Oria, Vincent (Committee member)
Ruan, Xiaoan (Committee member)
Date: 2002-08
Keywords: Gene detection
Hidden Markov models
Bioinformatics
Splicing junction
Computational biology
Availability: Unrestricted
Abstract:

This dissertation develops effective and accurate methods for identifying gene structures in the genomes of higher eukaryotes, such as vertebrates. Several hidden Markov models (HMMs) are developed to represent the consensus and degeneracy features of functional sites in vertebrate genes, including protein-translation start sites and mRNA splicing junction donor and acceptor sites. The HMM system based on these models is trained using an expectation maximization (EM) algorithm, and its performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system achieves high sensitivity and specificity in detecting the functional sites.
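The evaluation described above can be illustrated with a minimal Python sketch. It is not taken from the dissertation: the function names `k_fold_indices` and `sensitivity_specificity` are hypothetical, the interleaved fold assignment is an assumption, and specificity is computed as TP / (TP + FP), a convention common in the gene-finding literature that the dissertation may or may not follow.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs.

    Assumption: folds are built by interleaving (index i goes to fold i % k);
    the dissertation does not say how its folds were drawn.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every index that is not in the held-out fold.
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        splits.append((train, test))
    return splits


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for site-level predictions.

    Sensitivity = TP / (TP + FN).
    Specificity = TP / (TP + FP)  (assumed gene-finding convention).
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tp / (tp + fp) if (tp + fp) else 0.0
    return sn, sp
```

In use, a site model would be trained on each `train` index list and scored on the matching `test` list, with the two measures averaged over the ten folds.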

This HMM system is then incorporated into a new gene detection system, called GeneScout. The main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to the analysis of paths in the graph G. A dynamic programming algorithm is employed by GeneScout to find the optimal path in G. Experimental results on the standard test dataset collected by Burset and Guigo indicate that GeneScout is comparable to existing gene discovery tools and complements the widely used GenScan system.
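The reduction to path analysis on a directed acyclic graph can be sketched as follows. This is a generic highest-scoring-path dynamic program, not GeneScout's actual algorithm: the function `best_path`, the node labels, and the edge scores are all hypothetical, and the sketch assumes the nodes are supplied in topological order (for example, candidate sites sorted by sequence position).

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (from_node, to_node, score)


def best_path(
    nodes: List[str], edges: List[Edge], source: str, sink: str
) -> Tuple[float, List[str]]:
    """Return the highest-scoring source-to-sink path in a DAG.

    `nodes` must be listed in topological order, so every edge goes from
    an earlier node to a later one.
    """
    incoming: Dict[str, List[Tuple[str, float]]] = {v: [] for v in nodes}
    for u, v, w in edges:
        incoming[v].append((u, w))

    neg_inf = float("-inf")
    best: Dict[str, float] = {v: neg_inf for v in nodes}
    back: Dict[str, str] = {}
    best[source] = 0.0

    # Visit nodes in topological order; predecessors are already final.
    for v in nodes:
        for u, w in incoming[v]:
            if best[u] != neg_inf and best[u] + w > best[v]:
                best[v] = best[u] + w
                back[v] = u

    if best[sink] == neg_inf:
        raise ValueError("sink is not reachable from source")

    # Follow back-pointers from the sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    path.reverse()
    return best[sink], path


# Toy graph with made-up scores: edges stand for candidate exons and introns.
toy_nodes = ["start", "donor1", "donor2", "acceptor1", "stop"]
toy_edges = [
    ("start", "donor1", 2.0),
    ("start", "donor2", 1.0),
    ("donor1", "acceptor1", 0.5),
    ("donor2", "acceptor1", 2.5),
    ("acceptor1", "stop", 1.0),
    ("start", "stop", 2.5),
]
toy_score, toy_path = best_path(toy_nodes, toy_edges, "start", "stop")
```

Because each node is finalized before any of its successors is visited, the running time is linear in the number of nodes plus edges.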


If you have any questions, please contact the ETD Team at libetd@njit.edu.

 
NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003