The Living Thing / Notebooks:

Concatenative synthesis

Transferring timbre from one sound to another; synthesis by example. There are many ways this could be done, but the classic is the “talking orchestra” vocoder, which has always seemed ham-fisted to me. I think of it more in terms of basis expansions, though there are many framings. “Concatenative synthesis” or “audio mosaicing” usually means a granular synthesis method: chop a corpus of recordings into short grains, then reassemble them to approximate a target sound. This being the epoch of neural networks, someone will probably get style transfer for audio functioning soon.

The most comprehensive overview of the classic concatenative-style work, IMO, is Graham Coleman’s doctoral dissertation (Cole15), below, which frames it in terms of loss functions and descriptors.
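To make the descriptor/loss framing concrete, here is a minimal toy sketch of corpus-based mosaicing (my own illustration, not code from the dissertation): chop both target and corpus into grains, describe each grain by RMS energy and spectral centroid, and for each target grain overlap-add the corpus grain whose descriptor vector is nearest under a squared-error loss. Real systems use richer descriptors and concatenation costs.

```python
import numpy as np

def descriptors(grain, sr=22050):
    """Two toy per-grain descriptors: RMS energy and spectral centroid."""
    rms = np.sqrt(np.mean(grain ** 2))
    spectrum = np.abs(np.fft.rfft(grain))
    freqs = np.fft.rfftfreq(len(grain), 1.0 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    return np.array([rms, centroid])

def grains(signal, size=1024, hop=512):
    """Slice a signal into overlapping fixed-size grains."""
    return np.array([signal[i:i + size]
                     for i in range(0, len(signal) - size + 1, hop)])

def mosaic(target, corpus, size=1024, hop=512):
    """For each target grain, overlap-add the corpus grain whose
    (normalised) descriptor vector minimises squared-error distance."""
    tg, cg = grains(target, size, hop), grains(corpus, size, hop)
    td = np.array([descriptors(g) for g in tg])
    cd = np.array([descriptors(g) for g in cg])
    # Normalise each descriptor dimension so the loss weights them evenly.
    mu, sd = cd.mean(0), cd.std(0) + 1e-12
    td, cd = (td - mu) / sd, (cd - mu) / sd
    out = np.zeros(len(target))
    window = np.hanning(size)
    for i, d in enumerate(td):
        j = np.argmin(np.sum((cd - d) ** 2, axis=1))  # nearest corpus unit
        out[i * hop:i * hop + size] += window * cg[j]
    return out
```

For example, feeding a sine-wave target and a noise corpus into `mosaic` returns a signal with the target’s length and roughly its energy envelope, rendered from noise grains.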

There are a few classic implementations about:

Audio analogies

Related: analysis-resynthesis, learning gamelan.


Amatriain, X., Bonada, J., Loscos, A., Arcos, J. L., & Verfaille, V. (2003) Content-based Transformations. Journal of New Music Research, 32(1), 95–114. DOI.
Aucouturier, J.-J., & Pachet, F. (2006) Jamming with Plunderphonics: Interactive concatenative synthesis of music. Journal of New Music Research, 35(1), 35–50. DOI.
Blumensath, T., & Davies, M. (2004) On Shift-Invariant Sparse Coding. In C. G. Puntonet & A. Prieto (Eds.), Independent Component Analysis and Blind Signal Separation (Vol. 3195, pp. 1205–1212). Berlin, Heidelberg: Springer Berlin Heidelberg
Blumensath, T., & Davies, M. (2006) Sparse and shift-Invariant representations of music. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 50–57. DOI.
Coleman, G. (2015, December) Descriptor Control of Sound Transformations and Mosaicing Synthesis.
Collins, N. (2012) Even More Errant Sound Synthesis.
Collins, N., & Sturm, B. L. (n.d.) Sound cross-synthesis and morphing using dictionary-based methods. In International Computer Music Conference.
Cont, A., Dubnov, S., & Assayag, G. (2007) GUIDAGE: A Fast Audio Query Guided Assemblage. Presented at the Proceedings of the International Computer Music Conference (ICMC), ICMA.
Driedger, J., Müller, M., & Ewert, S. (2014) Improving time-scale modification of music signals using harmonic-percussive separation. IEEE Signal Processing Letters, 21(1), 105–109. DOI.
Ellis, D. P. W., Cotton, C. V., & Mandel, M. I. (2008) Cross-correlation of beat-synchronous representations for music similarity. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 57–60). DOI.
Forrester, A. I. J., & Keane, A. J. (2009) Recent advances in surrogate-based optimization. Progress in Aerospace Sciences, 45(1–3), 50–79. DOI.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [Cs, Q-Bio].
Green, D., & Bass, S. (1984) Representing periodic waveforms with nonorthogonal basis functions. IEEE Transactions on Circuits and Systems, 31(6), 518–534. DOI.
Kersten, S., & Purwins, H. (2010) Sound texture synthesis with hidden Markov tree models in the wavelet domain.
Kowalski, M., Siedenburg, K., & Dörfler, M. (2013) Social Sparsity! Neighborhood Systems Enrich Structured Shrinkage Operators. IEEE Transactions on Signal Processing, 61(10), 2498–2511. DOI.
Kronland-Martinet, R., Guillemain, P., & Ystad, S. (1997) Modelling of natural sounds by time–frequency and wavelet representations. Organised Sound, 2(03), 179–191. DOI.
Masri, P., Bateman, A., & Canagarajah, N. (1997a) A review of time–frequency representations, with application to sound/music analysis–resynthesis. Organised Sound, 2(03), 193–205.
Masri, P., Bateman, A., & Canagarajah, N. (1997b) The importance of the time–frequency representation for sound/music analysis–resynthesis. Organised Sound, 2(03), 207–214.
Mital, P. K., Grierson, M., & Smith, T. J. (2013) Corpus-based visual synthesis: an approach for artistic stylization (p. 51). ACM Press. DOI.
Neidinger, R. (2010) Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming. SIAM Review, 52(3), 545–563. DOI.
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T. (2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
Queipo, N. V., Haftka, R. T., Shyy, W., Goel, T., Vaidyanathan, R., & Kevin Tucker, P. (2005) Surrogate-based analysis and optimization. Progress in Aerospace Sciences, 41(1), 1–28. DOI.
Rebollo-Neira, L., & Lowe, D. (2002) Optimized orthogonal matching pursuit approach. IEEE Signal Processing Letters, 9(4), 137–140. DOI.
Schwarz, D. (2007) Corpus-based concatenative synthesis. IEEE Signal Processing Magazine, 24(2), 92–104. DOI.
Schwarz, D. (2011) State of the art in sound texture synthesis. In Proceedings of Digital Audio Effects (DAFx) (pp. 221–231).
Simon, I., Basu, S., Salesin, D., & Agrawala, M. (2005) Audio analogies: Creating new music from an existing performance by concatenative synthesis. In Proceedings of the 2005 International Computer Music Conference (pp. 65–72).
Sturm, B. L. (2009) Sparse Approximation and Atomic Decomposition: Considering Atom Interactions in Evaluating and Building Signal Representations (PhD thesis). University of California, Santa Barbara, CA.
Tachibana, H., Ono, N., & Sagayama, S. (2014) Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(1), 228–237. DOI.