NJIT ETD: "Topics on multiple hypotheses testing and generalized linear model" by Zhu, Yalin

Current location: https://digitalcommons.njit.edu/dissertations/55

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Topics on multiple hypotheses testing and generalized linear model

Author: Zhu, Yalin

View Online: njit-etd2017-131
(xvii, 144 pages ~ 2.2 MB pdf)

Department: Department of Mathematical Sciences

Degree: Doctor of Philosophy

Program: Mathematical Sciences

Document Type: Dissertation

Advisory Committee: Dhar, Sunil Kumar (Committee co-chair)
Guo, Wenge (Committee co-chair)
Loh, Ji Meng (Committee member)
Subramanian, Sundarraman (Committee member)
Roychowdhury, Satrajit (Committee member)

Date: 2017-12

Keywords: Multiple testing
Categorical data
Familywise error rate
False discovery rate
Clinical trials
Generalized linear model

Availability: Unrestricted

Abstract:
In applications such as studying drug adverse events (AEs) in clinical trials and identifying differentially expressed genes in microarray experiments, the data usually consist of frequency counts. In analyzing such data, researchers often face multiple hypotheses testing based on discrete test statistics. To exploit this discreteness, several stepwise procedures that use the CDF of the p-values to determine the testing threshold are proposed for controlling the familywise error rate (FWER). It is shown that the proposed procedures strongly control the FWER and are more powerful than the existing procedures for discrete data. Simulation studies and real data examples show that the proposed procedures outperform the existing ones in terms of FWER control and power. An R package, “MHTdiscrete”, and a web application are developed to implement the proposed procedures for discrete data.
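
To illustrate how the CDF of discrete p-values can enter a testing threshold, here is a minimal Python sketch of a Bonferroni-type adjustment that sums each test's null CDF at the observed p-value. It is not taken from the dissertation or from the “MHTdiscrete” package. The function names and the toy numbers are hypothetical, and the sketch assumes that each test's attainable p-values are known and that P(P <= a) = a under the null at every attainable value a.

```python
from bisect import bisect_right


def null_cdf(attainable, t):
    """Null CDF of a discrete p-value at t.

    Assumes `attainable` is sorted ascending and that P(P <= a) = a
    for every attainable value a, so the CDF at t is the largest
    attainable value not exceeding t (0 if there is none).
    """
    idx = bisect_right(attainable, t)
    return attainable[idx - 1] if idx > 0 else 0.0


def discrete_bonferroni_adjust(pvalues, supports):
    """Adjusted p-value for test i: min(1, sum_j F_j(p_i)).

    The classical Bonferroni adjustment would be min(1, m * p_i);
    summing the null CDFs is never larger, because F_j(t) <= t.
    """
    return [min(1.0, sum(null_cdf(s, p) for s in supports)) for p in pvalues]


# Hypothetical attainable p-values for three discrete tests.
supports = [
    [0.01, 0.2, 1.0],
    [0.04, 0.5, 1.0],
    [0.3, 1.0],
]
pvalues = [0.01, 0.04, 0.3]

adjusted = discrete_bonferroni_adjust(pvalues, supports)
classical = [min(1.0, len(pvalues) * p) for p in pvalues]
```

In this toy example the smallest p-value keeps an adjusted value of about 0.01, whereas the classical Bonferroni adjustment would give about 0.03, because the other two tests cannot attain a p-value that small.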

Many complex biomedical studies, such as clinical safety studies and genome-wide association studies, involve testing multiple families of hypotheses. Most existing multiple testing methods cannot guarantee strong control of Type I error rates appropriate for such increasingly complex research questions. A novel two-stage procedure for clinical safety studies, based on the recently developed idea of selective inference, is introduced. In the first stage, significant families are selected using a family-level global test, which guarantees control of the generalized familywise error rate (k-FWER) among the selected families. In the second stage, individual hypotheses are tested within each selected family using a multiple testing procedure that controls the conditional false discovery rate (cFDR), that is, the FDR conditional on the family being selected. By applying the proposed procedure to clinical safety studies, one can not only efficiently flag the significant clinical adverse events (AEs) but also select body systems of interest (BSoI) as extra information for further research. The simulation studies show that the proposed procedure can be more reliable than alternative methods, such as Mehrotra and Heyse’s double FDR procedure, in the clinical safety setting. The proposed procedure for the multiple-family structure is implemented in the R package “MHTmult”.
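
The two-stage structure described above can be sketched in Python as follows. This is only a sketch of the select-then-test flow, not the dissertation's procedure or the “MHTmult” API. It assumes a Bonferroni global test per family, the generalized Bonferroni threshold k * alpha / m for family selection, and the Benjamini-Hochberg step-up procedure within each selected family. It makes no claim that these particular choices control the cFDR. All function names, family names, and p-values are hypothetical.

```python
def bonferroni_global_pvalue(pvals):
    """Family-level global p-value: min(1, n * smallest p-value)."""
    return min(1.0, len(pvals) * min(pvals))


def benjamini_hochberg(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])


def two_stage(families, alpha=0.05, k=1):
    """Stage 1: select families; stage 2: test within selected families."""
    m = len(families)
    threshold = k * alpha / m  # generalized Bonferroni threshold (assumed)
    selected = {
        name: pvals
        for name, pvals in families.items()
        if bonferroni_global_pvalue(pvals) <= threshold
    }
    return {name: benjamini_hochberg(pvals, alpha) for name, pvals in selected.items()}


# Hypothetical body systems with made-up adverse-event p-values.
families = {
    "cardiac": [0.001, 0.004, 0.3],
    "skin": [0.2, 0.6],
    "gastro": [0.002, 0.9],
}
flagged = two_stage(families, alpha=0.05, k=1)
```

Here `flagged` maps each selected family to the indices of its rejected hypotheses. A family that fails the first-stage test, such as "skin" in this toy input, does not appear in the result at all.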

Categorical data arise naturally in biomedical and healthcare experiments. In many of these cases, the outcome variables of interest are the numbers of special events. When regression models based on the negative multinomial, the extended negative multinomial, or the generalized inverse sampling scheme are used, at least one distinct special event category is observed. A new model, based on the generalized inverse sampling scheme for several special events, is developed in this dissertation. The model is an adaptation of the widely used multinomial logistic regression model. The resulting equations of the proposed model, which correspond to the natural log of the ratio of the expected responses, appear similar to those of multinomial logistic regression. Using the ratio of the expected response of a category to that of the special category, the maximum likelihood estimator of the regression parameters can be computed by deriving the score equations and the Hessian matrix of the likelihood. The covariance matrix of the estimators of the regression parameters can be estimated by inverting the Hessian matrix, which provides the basis for inference. This research also develops model diagnostics, such as normality checks based on deviance and Pearson residuals, and likelihood-based computations. The proposed model is implemented in the R package “mvlogit”.
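
As a rough illustration of the estimation steps described above, the following block writes them in assumed notation that is not taken from the dissertation: \mu_j(x) is the expected response for category j given covariates x, category 0 is the special category, \beta_j is the coefficient vector for category j, and \ell is the log-likelihood. The Newton-Raphson update is an assumption about how the score equations would typically be solved, and the covariance is written as the inverse of the negative Hessian of the log-likelihood.

```latex
\begin{align}
  \log \frac{\mu_j(x)}{\mu_0(x)} &= x^\top \beta_j, \qquad j = 1, \dots, J, \\
  U(\beta) &= \frac{\partial \ell(\beta)}{\partial \beta}, \qquad
  H(\beta) = \frac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^\top}, \\
  \beta^{(t+1)} &= \beta^{(t)} - H\bigl(\beta^{(t)}\bigr)^{-1} U\bigl(\beta^{(t)}\bigr), \\
  \widehat{\operatorname{Cov}}\bigl(\hat{\beta}\bigr) &= \bigl[-H\bigl(\hat{\beta}\bigr)\bigr]^{-1}.
\end{align}
```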
