Articles via Databases
Articles via Journals
Online Catalog
E-books
Research & Information Literacy
Interlibrary loan
Theses & Dissertations
Collections
Policies
Services
About / Contact Us
Administration
Littman Architecture Library
This site will be removed in January 2019, please change your bookmarks.
This page will redirect to https://digitalcommons.njit.edu/dissertations/1432/ in 5 seconds

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Dimension reduction techniques for high dimensional and ultra-high dimensional data
Author: Datta, Subha
View Online: njit-etd2019-060
(xi, 70 pages ~ 0.9 MB pdf)
Department: Department of Mathematical Sciences
Degree: Doctor of Philosophy
Program: Mathematical Sciences
Document Type: Dissertation
Advisory Committee: Loh, Ji Meng (Committee co-chair)
Fang, Yixin (Committee co-chair)
Dhar, Sunil Kumar (Committee member)
Wang, Antai (Committee member)
Feng, Yang (Committee member)
Date: 2019-12
Keywords: Big data
High-dimensional
Spatial statistics
Ultra high-dimensional
Weighted principal support vector machine
Availability: Unrestricted
Abstract:

This dissertation introduces two statistical techniques to tackle high-dimensional data, which is very commonplace nowadays. It consists of two topics which are inter-related by a common link, dimension reduction.

The first topic is a recently introduced classification technique, the weighted principal support vector machine (WPSVM), which is incorporated into a spatial point process framework. The WPSVM possesses an additional parameter, a weight parameter, besides the regularization parameter. Most statistical techniques, including WPSVM, have an inherent assumption of independence, which means the data points are not connected with each other in any manner. But spatial data violates this assumption. Correlation between two spatial data points increases as the distance between them decreases. However, under some conditions on the spatial point process, the WPSVM is still valid. Furthermore, through extensive simulations it has been shown that WPSVM performs better than other dimension reduction techniques. The main advantage of WPSVM comes from the fact that it can handle non-linear relationships. WPSVM is also applied to a rainforest dataset.

The second topic talks about another recently introduced technique, joint-screening. Unlike the previous method, this works for ultra-high dimensional data (p >> n). Most existing variable screening methods fail to identify those marginally unimportant but jointly important genetic variables. The joint screening (JS) procedure screens all the covariates at the same time based on a criterion. In this way a subset of variables that are suspected to be highly associated with the outcome can be identified. One massive advantage of the JS procedure comes from the fact that it is computationally simple and easy to understand. The performance of the proposed JS procedure is evaluated via simulation studies and an application to the Genetics Analysis Workshop 20 data.


If you have any questions please contact the ETD Team, libetd@njit.edu.

 
ETD Information
Digital Commons @ NJIT
Theses and DIssertations
ETD Policies & Procedures
ETD FAQ's
ETD home

Request a Scan
NDLTD

NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003