This dissertation is divided in two Parts. The first Part explores probabilistic modeling of propagation of computer `malware' (generally referred to as `virus') across a network of computers, and investigates modeling improvements achieved by introducing a random latency period during which an infected computer in the network is unable to infect others. In the second Part, two approaches for modeling life distributions in univariate and bivariate setups are developed.
In Part I, homogeneous and non-homogeneous stochastic susceptible-exposedinfectious- recovered (SEIR) models are specifically explored for the propagation of computer virus over the Internet by borrowing ideas from mathematical epidemiology. Large computer networks such as the Internet have become essential in today's technological societies and even critical to the financial viability of the national and the global economy. However, the easy access and widespread use of the Internet makes it a prime target for malicious activities, such as introduction of computer viruses, which pose a major threat to large computer networks. Since an understanding of the underlying dynamics of their propagation is essential in efforts to control them, a fair amount of research attention has been devoted to model the propagation of computer viruses, starting from basic deterministic models with ordinary differential equations (ODEs) through stochastic models of increasing realism.
In the spirit of exploring more realistic probability models that seek to explain the time dependent transient behavior of computer virus propagation by exploiting the essential stochastic nature of contacts and communications among computers, the present study introduces a new refinement in such efforts to consider the suitability and use of the stochastic SEIR model of mathematical epidemiology in the context of computer viruses propagation. We adapt the stochastic SEIR model to the study of computer viruses prevalence by incorporating the idea of a latent period during which computer is in an `exposed state' in the sense that the computer is infected but cannot yet infect other computers until the latency is over. The transition parameters of the SEIR model are estimated using real computer viruses data. We develop the maximum likelihood (MLE) and Bayesian estimators for the SEIR model parameters, and apply them to the `Code Red worm' data.
Since network structure can be a possibly important factor in virus propagation, multi-group stochastic SEIR models for the spreading of computer virus in heterogeneous networks are explored next. For the multi-group stochastic SEIR model using Markovian approach, the method of maximum likelihood estimation for model parameters of interest are derived. The method of least squares is used to estimate the model parameters of interest in the multi-group stochastic SEIR-SDE model, based on stochastic differential equations. The models and methodologies are applied to Code Red worm data.
Simulations based on different models proposed in this dissertation and deterministic/ stochastic models available in the literature are conducted and compared. Based on such comparisons, we conclude that (i) stochastic models using SEIR framework appear to be relatively much superior than previous models of computer virus propagation - even up to its saturation level, and (ii) there is no appreciable dierence between homogeneous and heterogeneous (multi-group) models. The `no dierence' nding of course may possibly be in uenced by the criterion used to assign computers in the overall network to different groups. In our study, the grouping of computers in the total network into subgroups or, clusters were based on their geographical location only, since no other grouping criterion were available in the Code Red worm data.
Part II covers two approaches for modeling life distributions in univariate and bivariate setups. In the univariate case, a new partial order based on the idea of `star-shaped functions' is introduced and explored. In the bivariate context; a class of models for joint lifetime distributions that extends the idea of univariate proportional hazards in a suitable way to the bivariate case is proposed. The expectation-maximization (EM) method is used to estimate the model parameters of interest. For the purpose of illustration, the bivariate proportional hazard model and the method of parameter estimation are applied to two real data sets.