Generative stochastic models for audio. Analyse audio using machine listening methods to decompose it into features, perhaps in a sparse basis, as in learning gamelan, and possibly of low dimension thanks to some sparsification. Fit a model with some stochastic dependence between those features, e.g. a random field or a regression model of some kind. Then simulate features from that stochastic model. Depending on what your cost function was, how good your model fit was, and how you smoothed your data, this might produce something acoustically indistinguishable from the source, or it might amount to concatenative synthesis from a sparse basis dictionary, or to a parametric synthesizer software package.
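To make the analyse, fit, simulate loop concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption chosen for illustration: the features are log-magnitude STFT frames rather than a learned sparse basis, the stochastic model is an independent AR(1) process per frequency bin rather than a random field, and the helper names (`stft_mag`, `fit_ar1`, `simulate_ar1`) are made up for this note.

```python
import numpy as np


def stft_mag(x, n_fft=256, hop=128):
    """Magnitude spectrogram with a Hann window; shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def fit_ar1(features):
    """Least-squares AR(1) fit, independently for each feature column."""
    mean = features.mean(axis=0)
    z = features - mean
    prev, nxt = z[:-1], z[1:]
    phi = (prev * nxt).sum(axis=0) / ((prev ** 2).sum(axis=0) + 1e-12)
    resid_std = (nxt - phi * prev).std(axis=0)
    return mean, phi, resid_std


def simulate_ar1(mean, phi, resid_std, n_frames, rng):
    """Draw a new feature trajectory from the fitted AR(1) model."""
    z = np.zeros((n_frames, len(mean)))  # first frame starts at the mean
    for t in range(1, n_frames):
        z[t] = phi * z[t - 1] + resid_std * rng.standard_normal(len(mean))
    return z + mean


rng = np.random.default_rng(0)
sr = 8000
t = np.arange(2 * sr) / sr
# Toy source (an assumption): amplitude-modulated tone plus a little noise.
source = (1 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
source = source + 0.1 * rng.standard_normal(len(t))

log_mag = np.log(stft_mag(source) + 1e-8)       # analyse
mean, phi, resid_std = fit_ar1(log_mag)         # fit
simulated = simulate_ar1(mean, phi, resid_std,  # simulate
                         n_frames=log_mag.shape[0], rng=rng)
```

The simulated array has the same shape as the analysed features, but it still has to be turned back into audio, for instance by pairing the magnitudes with some phase estimate. That inversion step is where the choice of features starts to matter.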
There is a lot of funny business with machine learning for polyphonic audio. For a start, a naive linear-algebra-style decomposition doesn’t perform well, because human acoustic perception is messy. For example, all white noise sounds the same to us, but a deterministic model needs a large basis to approximate any particular realisation closely in the \(L_2\) norm. Our phase sensitivity is frequency dependent. Adjacent frequencies mask each other. There are many other effects that I don’t know about. One could use cost functions based on psychoacoustic cochlear models, but those are tricky to synthesize from (although it is possible, if perhaps unsatisfying, with a neural network). There are also classic alternative psychoacoustic decompositions such as the Mel Frequency Cepstral Transform, but these are even harder to invert.
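The white noise point can be checked numerically. The sketch below (NumPy; the band count, signal length and the helper name `band_energies` are arbitrary choices of mine) draws two independent white-noise signals, which should sound alike, and compares them twice: once sample by sample in the \(L_2\) norm, and once by average energy in coarse frequency bands, a very crude stand-in for a perceptual feature.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1 << 14
a = rng.standard_normal(n)  # one realisation of white noise
b = rng.standard_normal(n)  # an independent realisation

# Waveform-domain comparison: relative L2 distance between the signals.
rel_l2 = np.linalg.norm(a - b) / np.linalg.norm(a)


def band_energies(x, n_bands=16):
    """Mean spectral power in equal-width frequency bands (DC dropped)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    power = power[1:]  # drop the DC bin so the rest splits evenly
    return power.reshape(n_bands, -1).mean(axis=1)


ea, eb = band_energies(a), band_energies(b)
# Feature-domain comparison: worst relative mismatch across bands.
rel_band = np.max(np.abs(ea - eb) / ea)

print(f"relative L2 distance between waveforms: {rel_l2:.2f}")
print(f"worst relative band-energy mismatch:    {rel_band:.2f}")
```

For independent unit-variance noise the relative waveform distance concentrates around \(\sqrt{2}\), i.e. one realisation is a worse approximation of the other than silence would be, while the band energies agree closely. A cost function on the waveform penalises exactly the detail that we cannot hear.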
A first step might be to find some model which can approximately capture the cyclic and disordered components of the signal. Indeed, Metamorph and smstools, both based on a “sinusoids+noise” model, do this kind of decomposition, but they mostly use it for resynthesis in limited ways, not for re-simulating from a distribution of possible stochastic processes. Or am I missing something? There is also an implementation in csound called ATS, which looks worth playing with.
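For intuition only, here is a toy single-frame version of a sinusoids-plus-noise split in NumPy. It is not how Metamorph, smstools or ATS are implemented; those track partials over time with far more care. The function name `split_sinusoids_noise`, the peak-picking rule and the test signal are all my own assumptions, and the tone frequencies are deliberately placed on exact FFT bins so that spectral leakage does not complicate the picture.

```python
import numpy as np


def split_sinusoids_noise(x, n_partials=3):
    """Keep the strongest spectral peaks as 'sinusoids'; the rest is 'noise'."""
    spectrum = np.fft.rfft(x)
    mag = np.abs(spectrum)
    # Local maxima of the magnitude spectrum (interior bins only).
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    peak_bins = np.flatnonzero(is_peak) + 1
    strongest = peak_bins[np.argsort(mag[peak_bins])[-n_partials:]]
    # Copy each chosen peak and its two neighbours into an empty spectrum.
    keep = np.zeros_like(spectrum)
    for k in strongest:
        keep[k - 1:k + 2] = spectrum[k - 1:k + 2]
    sinusoidal = np.fft.irfft(keep, n=len(x))
    residual = x - sinusoidal
    return sinusoidal, residual, strongest


rng = np.random.default_rng(2)
sr, n = 8192, 4096  # bin spacing is sr / n = 2 Hz
t = np.arange(n) / sr
tones = (1.00 * np.sin(2 * np.pi * 440 * t)
         + 0.50 * np.sin(2 * np.pi * 880 * t)
         + 0.25 * np.sin(2 * np.pi * 1320 * t))
noise = 0.05 * rng.standard_normal(n)
x = tones + noise

sinusoidal, residual, strongest = split_sinusoids_noise(x)
print("peak frequencies (Hz):", sorted((strongest * sr / n).tolist()))
print("residual std:", residual.std())
```

The residual is where a stochastic model would go: rather than storing it, one would fit something like a smooth spectral envelope to it and draw fresh noise from that, which is the step the existing tools mostly stop short of generalising.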
Some non-parametric conditional wavelet density sounds more fun to me, maybe as a Markov random field, although exactly what generative model I would fit here is still opaque. The sequence probably possesses structure at multiple scales, and there is evidence that music might have a recursive grammatical structure, which would be hard to learn even if we had a perfect decomposition.
What is Loris?
- Boyes, G. (2011) Dictionary-Based Analysis/Synthesis and Structured Representations of Musical Audio. McGill University.
- Coleman, G. (2015, December) Descriptor Control of Sound Transformations and Mosaicing Synthesis.
- Cont, A., Dubnov, S., & Assayag, G. (2007) GUIDAGE: A Fast Audio Query Guided Assemblage. Presented at the Proceedings of International Computer Music Conference (ICMC), ICMA.
- Di Liscia, O. P. (n.d.) A Pure Data toolkit for real-time synthesis of ATS spectral data.
- Goodwin, M., & Vetterli, M. (1997) Atomic decompositions of audio signals. In 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1997 (4 pp.).
- Hohmann, V. (2002) Frequency analysis and synthesis using a Gammatone filterbank. Acta Acustica United with Acustica, 88(3), 433–442.
- Masri, P., Bateman, A., & Canagarajah, N. (1997a) A review of time–frequency representations, with application to sound/music analysis–resynthesis. Organised Sound, 2(03), 193–205.
- Masri, P., Bateman, A., & Canagarajah, N. (1997b) The importance of the time–frequency representation for sound/music analysis–resynthesis. Organised Sound, 2(03), 207–214.
- Sarroff, A. M., & Casey, M. (2014) Musical audio synthesis using autoencoding neural nets. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.
- Schwarz, D. (2011) State of the art in sound texture synthesis. In Proceedings of Digital Audio Effects (DAFx) (pp. 221–231).
- Serra, X., & Smith, J. (1990) Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition. Computer Music Journal, 14(4), 12–24.
- Simon, I., Basu, S., Salesin, D., & Agrawala, M. (2005) Audio analogies: Creating new music from an existing performance by concatenative synthesis. In Proceedings of the 2005 International Computer Music Conference (pp. 65–72).