A method is presented for retrieving the parameters used to create a multitrack mix from only the raw tracks and the stereo mixdown. The method can model both linear and nonlinear time-invariant effects such as gain, pan, equalisation, dynamic range compression, distortion, delay, and reverb. Parameters are optimised by stochastic gradient descent with the aid of differentiable digital signal processing modules. Because the audio effects are modelled explicitly rather than with differentiable blackbox modules, the method yields a fully interpretable representation of the mixing signal chain. The modelling capacity of several different mixing chains is investigated. Objective feature measures are taken of the outputs of each mixing chain when tasked with estimating a target mix, and are compared against a stereo gain mix baseline. A listening study measures how closely the mixing chains can perceptually match a reference mix, compared with a stereo gain mix. Results show that the full mixing chain performs best on objective measures, and that there is no statistically significant difference between participants' perception of the full mixing chain and the reference mixes.
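To make the idea of recovering mix parameters by gradient descent concrete, the following is a minimal sketch of the simplest case mentioned above, a stereo gain mix. It is not the authors' implementation. The synthetic sinusoidal tracks, the function names (`make_tracks`, `mix`, `estimate_gains`), the learning rate, and the step count are all assumptions made for illustration, and plain full-batch gradient descent on a mean squared error stands in for the stochastic variant and for whatever loss the paper uses.

```python
import math

def make_tracks(n=256):
    # Two hypothetical "raw tracks": sinusoids at different frequencies.
    t1 = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    t2 = [math.sin(2 * math.pi * 13 * i / n) for i in range(n)]
    return [t1, t2]

def mix(tracks, gains):
    # gains[k] = [left_gain, right_gain] for track k.
    # Returns the stereo mixdown as a (left, right) pair of sample lists.
    n = len(tracks[0])
    left = [sum(g[0] * tr[i] for g, tr in zip(gains, tracks)) for i in range(n)]
    right = [sum(g[1] * tr[i] for g, tr in zip(gains, tracks)) for i in range(n)]
    return left, right

def estimate_gains(tracks, target, steps=200, lr=0.5):
    # Full-batch gradient descent on the per-channel mean squared error
    # between the estimated mix and the target mixdown.
    n = len(tracks[0])
    gains = [[0.0, 0.0] for _ in tracks]
    for _ in range(steps):
        est = mix(tracks, gains)
        for ch in (0, 1):
            err = [est[ch][i] - target[ch][i] for i in range(n)]
            for k, tr in enumerate(tracks):
                # d(MSE)/d(gain) = (2 / n) * sum(error * track sample)
                grad = 2.0 / n * sum(e * x for e, x in zip(err, tr))
                gains[k][ch] -= lr * grad
    return gains

tracks = make_tracks()
true_gains = [[0.8, 0.3], [0.2, 0.9]]  # assumed "unknown" mix parameters
target = mix(tracks, true_gains)       # the stereo mixdown to be matched
estimated = estimate_gains(tracks, target)
print(estimated)
```

Because every operation in `mix` is differentiable with respect to the gains, the same loop structure extends in principle to chains with more effects, with the gradient obtained by automatic differentiation rather than by hand.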
Mixdown | Reference | Full Mixing Chain | DRC - EQ - Reverb | EQ - Reverb (Linear Mix) | Gain Mix |
---|---|---|---|---|---|
03_A | | | | | |
08_A | | | | | |
06_B | | | | | |
14_B | | | | | |
19_C | | | | | |
20_C | | | | | |
All listening samples are 44.1 kHz WAV files.
@inproceedings{-,
title={-},
author={Colonel, Joseph T. and Reiss, Joshua},
booktitle={-},
year={2023},
organization={-}
}