
The New Jersey Institute of Technology's
Electronic Theses & Dissertations Project

Title: On resource-efficiency and performance optimization in big data computing and networking using machine learning
Author: Liu, Wuji
View Online: njit-etd2021-071
(xii, 64 pages ~ 1.7 MB pdf)
Department: Department of Computer Science
Degree: Doctor of Philosophy
Program: Computer Science
Document Type: Dissertation
Advisory Committee: Wu, Chase Qishi (Committee chair)
Wang, Jason T. L. (Committee member)
Basu Roy, Senjuti (Committee member)
Liu, Qing Gary (Committee member)
Zhao, Hui (Committee member)
Date: 2021-12
Keywords: High performance computing
High performance networking
Latent effect analysis
Machine learning
Availability: Unrestricted
Abstract:

Due to the rapid transition from traditional experiment-based approaches to large-scale, computationally intensive simulations, next-generation scientific applications typically involve complex numerical modeling and extreme-scale simulations. Such model-based simulations often generate colossal amounts of data, which must be transferred over high-performance network (HPN) infrastructures to remote sites and analyzed against experimental or observational data on high-performance computing (HPC) facilities. Optimizing the performance of both data transfer in HPN and simulation-based model development on HPC is critical to enabling and accelerating knowledge discovery and scientific innovation. However, such processes generally involve an enormous set of attributes, including domain-specific model parameters, network transport properties, and computing system configurations. The vast space of model parameters, the sheer volume of generated data, the limited amount of allocatable bandwidth, and the complex settings of computing systems make it practically infeasible for domain experts to manually deploy and optimize big data transfer and computing solutions in next-generation scientific applications.

The research in this dissertation identifies such attributes in networks, systems, and models, conducts in-depth exploratory analysis of their impacts on data transfer throughput, computing efficiency, and modeling accuracy, and designs and customizes various machine learning techniques to optimize the performance of big data transfer in HPN, big data computing on HPC, and model development through large-scale simulations. In particular, unobservable latent factors such as competing loads on end hosts are investigated, and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is employed to eliminate their negative impacts on performance prediction using machine learning models such as Support Vector Regression (SVR). Based on the results of this analysis, a customized, domain-specific loss function is employed within machine learning models such as Stochastic Gradient Descent Regression for throughput prediction to advise bandwidth allocation in HPN. A Bayesian Optimization (BO)-based online computational steering framework is also designed to facilitate the process of scientific simulations and improve the accuracy of model development. The solution proposed in this dissertation provides an additional layer of intelligence in big data transfer and computing, and the resulting machine learning techniques help guide strategic provisioning of high-performance networking and computing resources to maximize the performance of next-generation scientific applications.
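
The two-stage idea in the abstract (filter out measurements distorted by latent factors, then train a throughput predictor with a customized loss) can be illustrated with a minimal, self-contained sketch. This is not the dissertation's implementation. Everything below is an assumption made for illustration: the synthetic data, the `eps` and `min_pts` values, the plain linear model standing in for SVR or a library SGD regressor, and in particular the asymmetric squared loss, since the abstract does not state the form of its domain-specific loss function.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels

def fit_sgd(xs, ys, over_penalty=3.0, lr=0.05, epochs=300, seed=0):
    """Fit y ~ w*x + b by SGD on an asymmetric squared loss.

    Hypothetical loss: over-predictions (pred > y) cost over_penalty times
    more than under-predictions, so the model errs on the low side.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = w * xs[i] + b - ys[i]
            weight = over_penalty if err > 0 else 1.0
            w -= lr * weight * err * xs[i]
            b -= lr * weight * err
    return w, b

# Synthetic, normalized data (hypothetical): x is a transfer setting,
# y is the measured throughput, following y = 0.8x + 0.1 with small jitter.
xs = [i / 49 for i in range(50)]
ys = [0.8 * x + 0.1 + 0.01 * math.sin(i) for i, x in enumerate(xs)]
# Three measurements depressed by an unobserved competing load.
outliers = [(0.5, 0.05), (0.7, 0.15), (0.9, 0.30)]
points = list(zip(xs, ys)) + outliers

labels = dbscan(points, eps=0.08, min_pts=3)
clean = [p for p, lab in zip(points, labels) if lab != -1]
w, b = fit_sgd([x for x, _ in clean], [y for _, y in clean])
print(f"kept {len(clean)} of {len(points)} samples; w={w:.3f}, b={b:.3f}")
```

In this sketch the depressed measurements sit far from the dense band of normal samples, so DBSCAN labels them as noise and they never reach the regressor. The asymmetric penalty reflects one plausible design choice for bandwidth advice, where overestimating achievable throughput is assumed to be costlier than underestimating it.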


If you have any questions please contact the ETD Team, libetd@njit.edu.

NJIT's ETD project was given an ACRL/NJ Technology Innovation Honorable Mention Award in spring 2003