The Living Thing / Notebooks : Concatenative synthesis

Transferring timbre from one sound to another; synthesis by example. There are many ways to do this. The classic is the “talking orchestra” vocoder, which has always seemed ham-fisted to me. I prefer to think about the problem in terms of basis expansions, although other framings are possible. The terms “concatenative synthesis” and “audio mosaic” usually refer to a granular synthesis method. This being the epoch of neural networks, someone will probably get style transfer for audio working soon.
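One way to make the basis-expansion view concrete is greedy matching pursuit: approximate a target frame as a sparse weighted sum of atoms drawn from the source sound. The following is only a minimal sketch under my own assumptions. The function name `matching_pursuit`, the `dictionary` layout (one unit-norm atom per column) and the `n_atoms` parameter are illustrative, not taken from any of the works cited here.

```python
import numpy as np

def matching_pursuit(target, dictionary, n_atoms=5):
    """Greedy sparse approximation of `target` by columns of `dictionary`.

    Assumes `dictionary` has shape (n_samples, n_dictionary_atoms) and that
    every column has unit Euclidean norm (hypothetical layout).
    Returns the coefficient vector and the final residual.
    """
    residual = np.asarray(target, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        # Inner product of every atom with what is still unexplained.
        correlations = dictionary.T @ residual
        # Pick the atom that explains the most residual energy.
        k = int(np.argmax(np.abs(correlations)))
        coeffs[k] += correlations[k]
        # Remove that atom's contribution from the residual.
        residual -= correlations[k] * dictionary[:, k]
    return coeffs, residual
```

In a timbre-transfer setting the columns of `dictionary` would be normalised grains cut from the source sound, and `dictionary @ coeffs` would be the resynthesised target frame.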

The most comprehensive overview of classic concatenative-style methods, IMO, is Graham Coleman’s doctoral dissertation (Cole15, listed below), which frames the problem in terms of loss functions and descriptors.
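To illustrate the descriptor-and-loss framing, here is a minimal mosaicing sketch. It is not Coleman's method. It assumes two crude descriptors (RMS energy and spectral centroid), a squared Euclidean loss on descriptors standardised by the corpus, and Hann-windowed overlap-add. All function names and parameters (`frames`, `descriptors`, `mosaic`, `size`, `hop`) are hypothetical.

```python
import numpy as np

def frames(signal, size, hop):
    """Cut a 1-D signal into overlapping grains of length `size`."""
    n = 1 + (len(signal) - size) // hop
    return np.stack([signal[i * hop : i * hop + size] for i in range(n)])

def descriptors(grains, sr):
    """Per-grain descriptors: RMS energy and spectral centroid in Hz."""
    window = np.hanning(grains.shape[1])
    mag = np.abs(np.fft.rfft(grains * window, axis=1))
    freqs = np.fft.rfftfreq(grains.shape[1], d=1.0 / sr)
    rms = np.sqrt(np.mean(grains ** 2, axis=1))
    centroid = (mag @ freqs) / (mag.sum(axis=1) + 1e-12)
    return np.column_stack([rms, centroid])

def mosaic(target, corpus, sr, size=1024, hop=512):
    """Rebuild `target` from grains of `corpus` chosen by descriptor match."""
    tgt = frames(target, size, hop)
    src = frames(corpus, size, hop)
    d_tgt = descriptors(tgt, sr)
    d_src = descriptors(src, sr)
    # Scale each descriptor by its spread over the corpus so that neither
    # dominates the loss purely because of its units.
    scale = d_src.std(axis=0) + 1e-12
    diff = (d_tgt[:, None, :] - d_src[None, :, :]) / scale
    cost = (diff ** 2).sum(axis=2)      # squared Euclidean loss
    choice = cost.argmin(axis=1)        # best corpus grain per target frame
    # Overlap-add the selected grains with a Hann window.
    window = np.hanning(size)
    out = np.zeros(hop * (len(tgt) - 1) + size)
    for i, k in enumerate(choice):
        out[i * hop : i * hop + size] += window * src[k]
    return out, choice
```

Swapping in other descriptors or a different loss only changes `descriptors` and the `cost` line; the selection and overlap-add steps stay the same.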

There are a few classic implementations about:

Audio analogies (SBSA05)

Related: analysis-resynthesis, learning gamelan.

Refs

ABLA03
Amatriain, X., Bonada, J., Loscos, A., Arcos, J. L., & Verfaille, V. (2003) Content-based Transformations. Journal of New Music Research, 32(1), 95–114.
AuPa06
Aucouturier, J.-J., & Pachet, F. (2006) Jamming with Plunderphonics: Interactive concatenative synthesis of music. Journal of New Music Research, 35(1), 35–50.
BlDa04
Blumensath, T., & Davies, M. (2004) On Shift-Invariant Sparse Coding. In C. G. Puntonet & A. Prieto (Eds.), Independent Component Analysis and Blind Signal Separation (Vol. 3195, pp. 1205–1212). Berlin, Heidelberg: Springer Berlin Heidelberg.
BlDa06
Blumensath, T., & Davies, M. (2006) Sparse and shift-invariant representations of music. IEEE Transactions on Audio, Speech and Language Processing, 14(1), 50–57.
Cole15
Coleman, G. (2015, December) Descriptor Control of Sound Transformations and Mosaicing Synthesis.
Coll12
Collins, N. (2012) Even More Errant Sound Synthesis.
CoSt00
Collins, N., & Sturm, B. L. (n.d.) Sound cross-synthesis and morphing using dictionary-based methods. In International Computer Music Conference.
CoDA07
Cont, A., Dubnov, S., & Assayag, G. (2007) GUIDAGE: A Fast Audio Query Guided Assemblage. In Proceedings of the International Computer Music Conference (ICMC). ICMA.
DrME14
Driedger, J., Müller, M., & Ewert, S. (2014) Improving time-scale modification of music signals using harmonic-percussive separation. IEEE Signal Processing Letters, 21(1), 105–109.
ElCM08
Ellis, D. P. W., Cotton, C. V., & Mandel, M. I. (2008) Cross-correlation of beat-synchronous representations for music similarity. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008 (pp. 57–60).
FoKe09
Forrester, A. I. J., & Keane, A. J. (2009) Recent advances in surrogate-based optimization. Progress in Aerospace Sciences, 45(1–3), 50–79.
GaEB15
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. arXiv:1508.06576 [Cs, Q-Bio].
GrBa84
Green, D., & Bass, S. (1984) Representing periodic waveforms with nonorthogonal basis functions. IEEE Transactions on Circuits and Systems, 31(6), 518–534.
KePu10
Kersten, S., & Purwins, H. (2010) Sound texture synthesis with hidden Markov tree models in the wavelet domain.
KoSD13
Kowalski, M., Siedenburg, K., & Dörfler, M. (2013) Social Sparsity! Neighborhood Systems Enrich Structured Shrinkage Operators. IEEE Transactions on Signal Processing, 61(10), 2498–2511.
KrGY97
Kronland-Martinet, R., Guillemain, P., & Ystad, S. (1997) Modelling of natural sounds by time–frequency and wavelet representations. Organised Sound, 2(03), 179–191.
MaBC97a
Masri, P., Bateman, A., & Canagarajah, N. (1997a) A review of time–frequency representations, with application to sound/music analysis–resynthesis. Organised Sound, 2(03), 193–205.
MaBC97b
Masri, P., Bateman, A., & Canagarajah, N. (1997b) The importance of the time–frequency representation for sound/music analysis–resynthesis. Organised Sound, 2(03), 207–214.
MiGS13
Mital, P. K., Grierson, M., & Smith, T. J. (2013) Corpus-based visual synthesis: an approach for artistic stylization. (p. 51). ACM Press.
Neid10
Neidinger, R. (2010) Introduction to Automatic Differentiation and MATLAB Object-Oriented Programming. SIAM Review, 52(3), 545–563.
OIMT15
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T. (2015) Visually Indicated Sounds. arXiv:1512.08512 [Cs].
QHSG05
Queipo, N. V., Haftka, R. T., Shyy, W., Goel, T., Vaidyanathan, R., & Tucker, P. K. (2005) Surrogate-based analysis and optimization. Progress in Aerospace Sciences, 41(1), 1–28.
ReLo02
Rebollo-Neira, L., & Lowe, D. (2002) Optimized orthogonal matching pursuit approach. IEEE Signal Processing Letters, 9(4), 137–140.
Schw07
Schwarz, D. (2007) Corpus-based concatenative synthesis. IEEE Signal Processing Magazine, 24(2), 92–104.
Schw11
Schwarz, D. (2011) State of the art in sound texture synthesis. In Proceedings of Digital Audio Effects (DAFx) (pp. 221–231).
SBSA05
Simon, I., Basu, S., Salesin, D., & Agrawala, M. (2005) Audio analogies: Creating new music from an existing performance by concatenative synthesis. In Proceedings of the 2005 International Computer Music Conference (pp. 65–72).
Stur09
Sturm, B. L. (2009) Sparse Approximation and Atomic Decomposition: Considering Atom Interactions in Evaluating and Building Signal Representations (PhD thesis). University of California, Santa Barbara, CA.
TaOS14
Tachibana, H., Ono, N., & Sagayama, S. (2014) Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(1), 228–237.