Effective sound design of environmental sounds is crucial to creating an immersive experience. Classical Procedural Audio (PA) models have been developed to give the sound designer a fast way to synthesize a specific class of environmental sounds in a physically accurate and computationally efficient manner. These models are controllable because their parameters are chosen by analyzing the sound class. However, the resulting synthesis lacks the fidelity needed for an immersive experience, so the sound designer would rather search an extensive database of real recordings for the target sound class. To address this lack of realism, this thesis proposes the Procedural audio Variational autoEncoder (ProVE), a general framework for developing high-fidelity PA models through data-driven neural audio synthesis methods. The two-step procedure for training ProVE models is explained through two example sound classes: footsteps and pouring water.
Furthermore, the thesis demonstrates a web application in which users generate footstep sounds by setting control variables for a pretrained ProVE model, showing its capacity for interactive use in sound design workflows. The increase in fidelity from ProVE models is explored through objective evaluations of audio and subjective evaluations against classical PA methods. The results show that these learned neural PA models are feasible for sound design projects. The thesis concludes with a discussion of applications and future research directions.