
Learning Gamelan

I feel a certain class of audio signal should be easy to decompose and thence learn in a musically useful way: signals well approximated by LTI, nearly-linear, nearly-additive filterbanks with sparse activations. This is a very specialised class, but musically very useful, and close enough to soluble that it might be worth the effort.

This is about online learning of sparse basis dictionaries for music: a specialised type of system identification, or some generalisation of “shift invariant sparse coding”.

It seems like this would boil down to something like sparse dictionary learning, with sparse activations and a dictionary that is itself sparse in LPC components.

There are two ways to do this – time domain, and frequency domain.

For the latter (frequency domain), sparse time-domain activations are non-local in Fourier components, but possibly simple to recover.

For the former (time domain), one could solve the Yule-Walker equations (e.g. by Levinson-Durbin recursion), although we expect that to be unstable. We could go for sparse simultaneous kernel inference in the time domain, which might be better, or directly infer the Horner form. Either way we end up with a lot of simultaneous filter components and tedious inference for them. Otherwise, we could do it directly in the FFT domain, although this makes MIMO harder and excludes the potential for non-linearities. The fact that I am expecting to identify many distinct systems in Fourier space as atoms complicates this slightly.
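As a point of reference for the time-domain route, here is a minimal sketch of plain linear prediction for a single resonance: estimate the autocorrelation, then solve the Yule-Walker normal equations directly. The function name `lpc_yule_walker`, the test signal and its pole positions are all assumptions made for illustration; a real gamelan note would need many such resonances fitted at once, which is where the trouble starts.

```python
import numpy as np

def lpc_yule_walker(x, order):
    """Estimate AR coefficients a[0..order-1] so that
    x[t] ~ a[0]*x[t-1] + ... + a[order-1]*x[t-order]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    # Symmetric Toeplitz matrix R[i, j] = r[|i - j|]
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx]
    # Direct solve; Levinson-Durbin would exploit the Toeplitz structure
    return np.linalg.solve(R, r[1:])

# Hypothetical test signal: one noise-driven resonance,
# poles at radius 0.9 and angle 0.3 radians per sample
rng = np.random.default_rng(0)
true_a = np.array([2 * 0.9 * np.cos(0.3), -0.9 ** 2])
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for t in range(2, len(e)):
    x[t] = true_a[0] * x[t - 1] + true_a[1] * x[t - 2] + e[t]

est_a = lpc_yule_walker(x, order=2)
```

With a single well-excited resonance and plenty of samples, `est_a` should land close to `true_a`. The sketch says nothing about how this behaves with sparse, impulsive excitation or many overlapping atoms.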

Thought: can I use harmonic-percussive source separation (HPSS) to do this with the purely harmonic components, and use the percussive components as priors for the activations? How do you enforce causality for triggering in the FFT-transformed domain?

We have activations and components: the activations form a K×T matrix, and the K components are the rows of a K×L matrix. We want the sum, over components, of each activation row convolved with its component to approximately recover the original signal under a chosen loss function.
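To make the shapes concrete, the reconstruction is x̂[t] = Σ_k Σ_l A[k, t−l] D[k, l], where A holds the activations and D the components. The sketch below implements that forward model and one possible objective. The squared-error loss with an L1 penalty on the activations is an assumption (the loss is left open above), and the names `synthesise`, `loss` and `sparsity` are illustrative only.

```python
import numpy as np

def synthesise(activations, atoms):
    """Sum over k of activations[k] convolved with atoms[k], truncated to T samples.

    activations: (K, T) array, atoms: (K, L) array.
    """
    K, T = activations.shape
    K_atoms, L = atoms.shape
    assert K == K_atoms, "one activation row per atom"
    out = np.zeros(T + L - 1)
    for k in range(K):
        out += np.convolve(activations[k], atoms[k])
    return out[:T]

def loss(signal, activations, atoms, sparsity=0.1):
    """Assumed objective: squared reconstruction error plus L1 penalty on activations."""
    resid = signal - synthesise(activations, atoms)
    return 0.5 * np.sum(resid ** 2) + sparsity * np.sum(np.abs(activations))

# Toy example: K=2 atoms of length L=3, T=8 samples, one strike per atom
A = np.zeros((2, 8))
A[0, 1] = 1.0
A[1, 4] = 2.0
D = np.array([[1.0, 0.5, 0.25],
              [1.0, -1.0, 0.0]])
x_hat = synthesise(A, D)
```

Learning would then alternate between, or jointly optimise, `A` and `D` against a recorded signal. This sketch only covers the forward model and the objective.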

Why gamelan? It’s tuned percussion, with a non-trivial tuning system, and no pitch bending.

Theory: TBD

Other questions: Infer chained biquads? Even restrict them to be bandpass? Or sparse, high-order filters of some description?
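For concreteness, here is what a chain of bandpass biquads looks like as a parameterisation: each section is described by just a centre frequency and a Q. This is a sketch only. The coefficient formulas are the widely used "Audio EQ Cookbook" constant-peak-gain bandpass, and the 440 Hz centre, Q of 30 and 44.1 kHz sample rate are arbitrary assumptions, not measurements of any instrument.

```python
import numpy as np

def bandpass_biquad(freq, q, fs):
    """Bandpass biquad with unit gain at the centre frequency.

    Returns (b, a) with a[0] normalised to 1.
    """
    w0 = 2 * np.pi * freq / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([alpha, 0.0, -alpha])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def biquad_filter(b, a, x):
    """Direct-form I second-order section applied to a 1-D signal."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

def cascade(sections, x):
    """Run the signal through each biquad in turn (a chain, not a parallel bank)."""
    for b, a in sections:
        x = biquad_filter(b, a, x)
    return x

fs = 44100
sections = [bandpass_biquad(440.0, 30.0, fs)] * 2  # two identical sections in series
impulse = np.zeros(8820)
impulse[0] = 1.0
h = cascade(sections, impulse)  # impulse response of the chain
```

Restricting sections to this form keeps every pole pair inside the unit circle whenever Q is positive and the centre frequency is below Nyquist, which is one argument for the bandpass restriction. Whether a series chain or a parallel bank of such resonators is the better model for a struck bar remains an open question here.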

RNN notes

Refs