New location of this record: https://digitalcommons.njit.edu/theses/47

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: In silico prediction of non-coding RNAs using supervised learning and feature ranking methods
Author: Griesmer, Stephen J.
View Online: njit-etd2010-001
(xii, 61 pages ~ 2.4 MB pdf)
Department: Department of Computer Science
Degree: Master of Science
Program: Bioinformatics
Document Type: Thesis
Advisory Committee: Wang, Jason T. L. (Committee chair)
Liu, Chengjun (Committee member)
Nassimi, David (Committee member)
Date: 2010-01
Keywords: Non-coding RNA classification
RNAMultifold
Availability: Unrestricted
Abstract:

This thesis presents a novel method, RNAMultifold, for developing a non-coding RNA (ncRNA) classification model based on features derived from folding the consensus sequence of multiple sequence alignments with different folding programs: RNAalifold, CentroidFold, and RSpredict. The method ranks these folding features according to a Class Separation Measure (CSM) that quantifies the ability of each feature to differentiate between samples from positive and negative test sets. The set of top-ranked features is then used to construct three classification models: Naive Bayes, Fisher Linear Discriminant, and Support Vector Machine (SVM). The performance of these models is compared with that of the same models built on a baseline feature set and with that of an existing classification tool, RNAz.
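As a rough illustration of the feature-ranking step described above, the sketch below scores each feature by how well it separates positive from negative samples and sorts the features by that score. The abstract does not define the CSM, so `separation_score` is an assumed stand-in (distance between class means divided by the sum of the class standard deviations), and the feature names and values are hypothetical toy data, not taken from the thesis.

```python
from statistics import mean, pstdev

def separation_score(pos, neg):
    """Assumed stand-in for a class separation measure (not the thesis's CSM):
    distance between class means divided by the sum of class spreads."""
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        # Degenerate case: both classes are constant for this feature.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return abs(mean(pos) - mean(neg)) / spread

def rank_features(pos_samples, neg_samples, names):
    """Return feature names sorted from most to least separating.

    pos_samples / neg_samples are lists of feature vectors, one per sample,
    with columns in the same order as `names`.
    """
    scores = {}
    for j, name in enumerate(names):
        pos_col = [row[j] for row in pos_samples]
        neg_col = [row[j] for row in neg_samples]
        scores[name] = separation_score(pos_col, neg_col)
    return sorted(names, key=lambda n: scores[n], reverse=True)

# Hypothetical toy data: two folding-energy-like features and one noise feature.
names = ["mfe_a", "mfe_b", "noise"]
pos = [[-30.0, -12.0, 0.5], [-28.0, -10.0, 0.1], [-32.0, -11.0, 0.9]]
neg = [[-10.0, -9.0, 0.4], [-12.0, -11.0, 0.2], [-8.0, -10.0, 0.8]]

ranked = rank_features(pos, neg, names)
print(ranked)
```

A classifier would then be trained on the first k names in `ranked`. The choice of k and of the scoring function are the parts most likely to differ from the thesis.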

The Support Vector Machine classification model with a radial basis function kernel, using the top 11 ranked features, is shown to be more sensitive than the other models, including the existing ncRNA prediction program RNAz, across all specificity values for the RNA families under study. In addition, the target feature set outperforms the baseline feature set of z-score and structure conservation index across all classification methods except Fisher Linear Discriminant. The RNAMultifold method is then used to search the genome of a trypanosome species (Trypanosoma brucei) for novel ncRNAs. The results of this search are compared with known ncRNAs and with results from RNAz.
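Comparing classifiers "across all specificity values" usually means fixing a specificity and reading off the sensitivity each model reaches there. The sketch below shows one way to do that from raw classifier scores. The function name, the thresholding convention (predict positive when the score exceeds the threshold), and the scores are all assumptions made for illustration; the abstract does not describe the evaluation procedure.

```python
def sensitivity_at_specificity(pos_scores, neg_scores, target_specificity):
    """Find the lowest threshold whose specificity meets the target and
    return (threshold, sensitivity) at that threshold.

    A sample is predicted positive when its score is strictly greater
    than the threshold. Candidate thresholds are the observed scores.
    """
    for t in sorted(set(pos_scores) | set(neg_scores)):
        true_negatives = sum(1 for s in neg_scores if s <= t)
        specificity = true_negatives / len(neg_scores)
        if specificity >= target_specificity:
            true_positives = sum(1 for s in pos_scores if s > t)
            return t, true_positives / len(pos_scores)
    raise ValueError("target specificity must be at most 1.0")

# Hypothetical classifier scores for known ncRNAs (pos) and decoys (neg).
pos_scores = [0.9, 0.8, 0.6, 0.4]
neg_scores = [0.7, 0.3, 0.2, 0.1]

for target in (0.75, 1.0):
    threshold, sensitivity = sensitivity_at_specificity(pos_scores, neg_scores, target)
    print(target, threshold, sensitivity)
```

Running the same function on the scores of two models at a grid of target specificities gives the kind of side-by-side sensitivity comparison the abstract refers to.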


If you have any questions, please contact the ETD Team at libetd@njit.edu.

 
NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003