NJIT ETD: "Machine learning and network embedding methods for gene co-expression networks" by Aghaieabiane, Niloofar

E-books

Research & Information Literacy

Interlibrary loan

Theses & Dissertations

Littman Architecture Library

This site will be removed in January 2019, please change your bookmarks.
This page will redirect to https://digitalcommons.njit.edu/dissertations/1658/ in 5 seconds

The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: Machine learning and network embedding methods for gene co-expression networks

Author: Aghaieabiane, Niloofar

View Online: njit-etd2023-016
(xviii, 111 pages ~ 4.8 MB pdf)

Department: Department of Computer Science

Degree: Doctor of Philosophy

Program: Computer Science

Document Type: Dissertation

Advisory Committee: Koutis, Ioannis (Committee chair)
Schieber, Baruch (Committee member)
Wei, Zhi (Committee member)
Basu Roy, Senjuti (Committee member)
Tzatsos, Alexandros (Committee member)

Date: 2023-05

Keywords: Clustering
Gene co-expression networks
Machine learning
Network embedding
Networks and graphs
Self-training

Availability: Unrestricted

Abstract:
High-throughput technologies such as DNA microarrays and RNA-seq are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). GCNs are analyzed to discover gene modules. GCN construction and analysis is a well-studied topic, for nearly two decades. While new types of sequencing and the corresponding data are now available, the software package WGCNA and its most recent variants are still widely used, contributing to biological discovery.
The discovery of biologically significant modules of genes from raw expression data is a non-typical unsupervised problem; while there are no training data to drive the computational discovery of modules, the biological significance of the discovered modules can be evaluated with the widely used module enrichment metric, measuring the statistical significance of the occurrence of Gene Ontology terms within the computed modules. WGCNA and other related methods are entirely heuristic and they do not leverage the aforementioned non-typical nature of the underlying unsupervised problem.
The main contribution of this thesis is SGCP, a novel Self-Training Gene Clustering Pipeline for discovering modules of genes from raw expression data. SGCP almost entirely replaces the steps followed by existing methods, based on recent progress in mathematically justified unsupervised clustering algorithms. It also introduces a conceptually novel self-training step that leverages Gene Ontology information to modify and improve the set of modules computed by the unsupervised algorithm.
SGCP is tested on a rich set of DNA microarrays and RNA-seq benchmarks, coming from various organisms. These tests show that SGCP greatly outperforms all previous methods, resulting in highly enriched modules. Furthermore, these modules are often quite dissimilar from those computed by previous methods, suggesting the possibility that SGCP can indeed become an auxiliary tool for extracting biological knowledge. To this end, SGCP is implemented as an easy-to-use R package that is made available on Bioconductor.

If you have any questions please contact the ETD Team, libetd@njit.edu.

ETD Information

Digital Commons @ NJIT

Theses and DIssertations

ETD Policies & Procedures

ETD FAQ's

ETD home

Request a Scan

NDLTD

NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003