
The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Prediction of mRNA polyadenylation sites in the human genome and Mathematical modeling of alternative polyadenylation
Author: Cheng, Yiming
View Online: njit-etd2007-044
(xviii, 132 pages ~ 11.5 MB pdf)
Department: Department of Mathematical Sciences
Degree: Doctor of Philosophy
Program: Mathematical Sciences
Document Type: Dissertation
Advisory Committee: Miura, Robert M. (Committee co-chair)
Tian, Bin (Committee co-chair)
Byrne, Bruce C. (Committee member)
Dhar, Sunil Kumar (Committee member)
Golowasch, Jorge P. (Committee member)
Date: 2007-05
Keywords: Mathematical modeling
Alternative polyadenylation
Polyadenylation sites
Polyadenylation
SAGE data analysis
Support vector machine
Availability: Unrestricted
Abstract:

Messenger RNA (mRNA) polyadenylation plays many important roles in eukaryotic cells, including transcription termination, mRNA stability and transport, and mRNA translation. A large number of human and mouse genes have multiple polyadenylation sites (referred to as poly(A) sites) that give rise to variant transcripts, some of which are translated into protein products with different functions. However, the details of when and where polyadenylation occurs, and how pre-mRNA processing switches from one poly(A) site to another, are still unknown. This kind of 3'-end processing can be regulated by the cell environment, cell cycle stage, and tissue type.

It is generally accepted that the cleavage of pre-mRNA depends on the nucleotide sequence around the poly(A) site, so it should be possible to predict poly(A) sites accurately from the pre-mRNA sequence. Several statistical models have been used for the supervised prediction of poly(A) sites, including linear discriminant analysis, quadratic discriminant analysis, and the support vector machine (SVM). Among these, SVM was chosen as the classification algorithm for the prediction of poly(A) sites in this work. A program called polya_svm has been developed in Perl. The true-positive and accuracy results obtained using this method are better than those obtained using other commonly used algorithms.
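
As a rough illustration of the classification idea described above (not the polya_svm program itself, which the abstract says was written in Perl), the following Python sketch trains a small linear SVM by sub-gradient descent on the hinge loss. Everything specific here is an assumption made for the example: the two features (presence of an AATAAA or ATTAAA hexamer upstream of the candidate site, and the T/G fraction downstream), the 40-nt flank, the toy sequences, and the function names `featurize`, `train_linear_svm`, and `predict`.

```python
# Minimal sketch: linear SVM on hand-picked sequence features (all assumptions).
SIGNALS = ("AATAAA", "ATTAAA")  # assumed signal hexamers, DNA alphabet


def featurize(seq, site, flank=40):
    """Turn the window around a candidate site into [signal, T/G fraction, bias]."""
    upstream = seq[max(0, site - flank):site]
    downstream = seq[site:site + flank]
    has_signal = 1.0 if any(s in upstream for s in SIGNALS) else 0.0
    tg_fraction = sum(downstream.count(b) for b in "TG") / max(len(downstream), 1)
    return [has_signal, tg_fraction, 1.0]  # trailing 1.0 acts as the bias term


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on lam/2 * |w|^2 + hinge loss, labels in {+1, -1}."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j in range(len(w)):
                # Hinge term contributes only when the margin is violated.
                grad = lam * w[j] - (yi * xi[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
    return w


def predict(w, x):
    """Return +1 (predicted poly(A) site) or -1 (predicted non-site)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


# Hypothetical toy data: (sequence, candidate site position, label).
examples = [
    ("C" * 20 + "AATAAA" + "C" * 14 + "TGTGTTGTGT" + "C" * 10, 40, 1),
    ("G" * 20 + "ATTAAA" + "G" * 14 + "T" * 15 + "A" * 5, 40, 1),
    ("ACAC" * 15, 40, -1),
    ("CA" * 20 + "TACA" * 5, 40, -1),
]
X = [featurize(seq, site) for seq, site, _ in examples]
y = [label for _, _, label in examples]
w = train_linear_svm(X, y)
predictions = [predict(w, x) for x in X]
```

A real predictor would need far richer features, a kernel or a tuned regularization constant, and evaluation on held-out data; this toy set is separable by construction and only shows the mechanics.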

Alongside the microarray technique, serial analysis of gene expression (SAGE) is another powerful technology for measuring mRNA expression levels. Our study is the first to investigate the regulation of transcripts from the same gene by analyzing SAGE data. By filtering noisy data from the database and calculating the correlation between transcripts from the same UniGene cluster, some notable genes are found to have multiple transcripts with opposite expression levels. These genes may be of interest to biologists and are worth verifying by biological experiments.
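
The filtering-and-correlation step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the dissertation: the cluster IDs, transcript names, tag counts, the `min_total` noise filter, and the `max_r` threshold are all hypothetical, and Pearson correlation is assumed as the correlation measure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def opposite_pairs(clusters, min_total=10, max_r=-0.8):
    """List transcript pairs in the same cluster whose counts are anti-correlated."""
    hits = []
    for cluster_id, transcripts in clusters.items():
        # Crude noise filter (assumption): drop transcripts with very few tags.
        kept = {name: counts for name, counts in transcripts.items()
                if sum(counts) >= min_total}
        for (n1, c1), (n2, c2) in combinations(sorted(kept.items()), 2):
            r = pearson(c1, c2)
            if r <= max_r:
                hits.append((cluster_id, n1, n2, r))
    return hits


# Hypothetical tag counts across four SAGE libraries.
clusters = {
    "Hs.0001": {"proximal": [50, 40, 10, 5],
                "distal": [5, 12, 45, 60],
                "rare": [1, 0, 0, 1]},
    "Hs.0002": {"proximal": [20, 30, 40, 50],
                "distal": [10, 15, 22, 24]},
}
hits = opposite_pairs(clusters)
```

Here only the first cluster is reported: its two abundant transcripts move in opposite directions across libraries, the low-count transcript is filtered out, and the second cluster's transcripts rise together.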

Alternative polyadenylation has recently been found to be very common in human and mouse genes. The selection of different poly(A) sites is believed to be related to biological factors such as developmental stage, cell conditions, and the availability and abundance of certain protein factors. However, it is not clear how these factors affect alternative polyadenylation. Here, mathematical modeling is applied to understand the dynamic selection of poly(A) sites. Cleavage stimulation factor (CstF) is an important protein complex required for efficient cleavage, containing subunits of 77, 64, and 50 kD (CstF-77, CstF-64, and CstF-50). The human cstf-77 gene has been found to have several different transcripts due to alternative polyadenylation, and the expression levels of these transcripts display some auto-regulation. A mathematical model with a time delay is constructed to simulate the dynamic expression levels of the cstf-77 gene, and the model is compared with experimental data. This kind of mathematical model can also be extended to other polyadenylation factors that have similar alternative polyadenylation patterns.
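
The abstract does not give the model equations, so the sketch below is only a generic example of an auto-regulated expression model with a time delay, not the dissertation's cstf-77 model. It assumes a single species x whose production is repressed by its own level a time tau earlier, dx/dt = beta / (1 + (x(t - tau) / K)^n) - gamma * x(t), integrated with the forward Euler method. The equation form, every parameter value, and the function name `simulate_delayed_feedback` are assumptions for illustration.

```python
def simulate_delayed_feedback(beta=1.0, K=0.5, n=2, gamma=0.2,
                              tau=5.0, x0=0.1, dt=0.01, t_end=200.0):
    """Forward-Euler integration of a delayed negative-feedback equation.

    dx/dt = beta / (1 + (x(t - tau) / K)**n) - gamma * x(t)
    The history for t <= 0 is assumed constant and equal to x0.
    """
    delay_steps = int(round(tau / dt))
    steps = int(round(t_end / dt))
    xs = [x0]
    for i in range(steps):
        x_now = xs[-1]
        # Look back tau time units; before that much history exists, use x0.
        x_delayed = xs[i - delay_steps] if i >= delay_steps else x0
        production = beta / (1.0 + (x_delayed / K) ** n)
        xs.append(x_now + dt * (production - gamma * x_now))
    return xs


trajectory = simulate_delayed_feedback()
```

With these illustrative parameters, x = 1 is a steady state, since production 1 / (1 + 4) equals degradation 0.2 * 1. Fitting such a model to measured transcript levels would require choosing the equation form and parameters from the experimental data.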


If you have any questions please contact the ETD Team, libetd@njit.edu.

 