ISO 226:2003 Equal loudness contour image by Lindosland:
A quick incomplete reference to pascals, Bels, erbs, Barks, sones, Hertz, semitones, Mels and whatever else I happen to need.
The actual auditory system is atrociously complex and I'm not going into, e.g., complete perceptual models here, even if I did know a stirrup from a hammer or a cochlea from a cauliflower ear. Measuring what we can perceive with our sensory apparatus is itself a complex thing, involving masking effects and variable resolution in time, space and frequency, not to mention variation between individuals.
Nonetheless, when studying audio it is worthwhile using units other than the natural-to-a-physicist Hz and pascals, even without pretending that we have found the native units of the human ear. SI units are inconvenient when studying musical metrics or machine listening because they do not closely match human perceptual differences: 50 Hz is a significant difference at a base frequency of 100 Hz, but insignificant at 2000 Hz. But how big this difference is and what it means is a rather complex and contingent question. This means that we should not be too attached to getting this one “right”, and should feel free to take adequate simple approximations as the project demands.
Since my needs are machine listening features and thus computational speed and simplicity over perfection, I will wilfully and with malice ignore any fine distinctions I cannot be bothered with, regardless of how many articles have been published discussing said details. For example, I will not cover “salience”, “sonorousness” or cultural difference issues. I will also ignore issues of uncertainty principles in inferring such qualities.
Start point: physical units
SPL, Hertz, pascals.
First step: Logarithmic units
This innovation is nearly universal in music studies, because of its extreme simplicity. However, it is constantly surprising to machine listening researchers, who keep rediscovering it when they get frustrated with the FFT spectrogram. Bels/decibels, semitones/octaves… dBV.
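As a minimal sketch of what the logarithmic units buy us, here are the two usual conversions in Python. The function names are made up for illustration, and the 20 µPa reference is the usual convention for SPL in air rather than anything specified in these notes.

```python
import math

P_REF_PA = 20e-6  # assumed reference pressure for SPL in air, in pascals


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to 20 micropascals."""
    return 20.0 * math.log10(pressure_pa / P_REF_PA)


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Interval from f_low_hz up to f_high_hz in equal-tempered semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


if __name__ == "__main__":
    # The same 50 Hz step, measured on a log-frequency axis.
    print(semitones_between(100.0, 150.0))    # roughly a perfect fifth
    print(semitones_between(2000.0, 2050.0))  # well under half a semitone
    print(spl_db(1.0))                        # 1 Pa is about 94 dB SPL
```

The 50 Hz step from the introduction comes out at about 7 semitones from 100 Hz but less than half a semitone from 2000 Hz, which is the whole argument for the log axis.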
“Cambridge” and “Munich” frequency units
Bark and ERB measures; these seem to be more common in the acoustics and psychoacoustics community. An introduction to selected musically useful bits is given by Parncutt and Strasburger (PaSt94).
According to Moor14 the key references for Barks are Zwicker's “critical band” research (Zwic61), extended by Brian Moore et al. (e.g. in MoGl83).
Trau90 gives a simple rational formula to approximate the in-any-case-approximate lookup tables, as does MoGl83, and both relate these to ERBs.
Descriptions of Barks seem to start with the statement that above about 500 Hz this scale is near logarithmic in the frequency axis. Below 500 Hz the Bark scale approaches linearity. It is defined by an empirically derived table, but there are analytic approximations which seem just as good.
Traunmüller approximation for critical band rate in bark, with $f$ in Hz:

$$z(f) = \frac{26.81\, f}{1960 + f} - 0.53$$
Lach Lau amends the formula:
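A rough Python sketch of the approximation and its amendment. I am assuming the amendment meant here is the pair of end-of-range corrections usually quoted alongside Traunmüller's formula; the constants 0.15, 2.0, 0.22 and 20.1 are from memory and worth checking against Trau90. The function names are hypothetical.

```python
def hz_to_bark(f_hz: float, corrected: bool = True) -> float:
    """Critical-band rate in bark, after Traunmueller's rational approximation."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if corrected:
        # Assumed end-of-range corrections; the middle of the range is untouched.
        if z < 2.0:
            z += 0.15 * (2.0 - z)
        elif z > 20.1:
            z += 0.22 * (z - 20.1)
    return z


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of the uncorrected formula only."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


if __name__ == "__main__":
    for f in (100.0, 500.0, 1000.0, 4000.0, 12000.0):
        print(f, hz_to_bark(f, corrected=False), hz_to_bark(f))
```

Note that `bark_to_hz` does not undo the corrections, so it only round-trips with `corrected=False`.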
Hartmut Traunmüller's online unit conversion page can convert these for you, and Dik Hermes summarises some history of how we got this way.
ERBs are newer and work better at lower frequencies (but possibly not at very high frequencies?). They seem to be popular for analysing psychoacoustic masking effects?
Erbs are given different formulae and capitalisation depending on where you look. Here's one from PaSt94 for the “ERB-rate”, with $f$ in Hz:

$$H(f) = 11.17 \ln\left(\frac{f + 312}{f + 14675}\right) + 43.0$$
Erbs themselves (which are different from the ERB-rate at a given frequency?), with $f$ in Hz:

$$B(f) = 6.23 \times 10^{-6} f^2 + 0.09339\, f + 28.52$$
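To make the distinction concrete, here is a small Python sketch of both quantities, assuming the MoGl83-style constants rewritten for frequency in Hz (the function names are mine). The ERB is a bandwidth in Hz at a given centre frequency. The ERB-rate is a position on a scale, i.e. how many ERBs fit below that frequency, so its slope should be about one over the ERB.

```python
import math


def erb_rate(f_hz: float) -> float:
    """ERB-rate (number of ERBs below f_hz), MoGl83-style constants, f in Hz."""
    return 11.17 * math.log((f_hz + 312.0) / (f_hz + 14675.0)) + 43.0


def erb_bandwidth(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 6.23e-6 * f_hz ** 2 + 0.09339 * f_hz + 28.52


if __name__ == "__main__":
    f = 1000.0
    # Central-difference slope of the ERB-rate, in ERBs per Hz.
    slope = (erb_rate(f + 1.0) - erb_rate(f - 1.0)) / 2.0
    print(erb_rate(f), erb_bandwidth(f), slope, 1.0 / erb_bandwidth(f))
```

At 1 kHz the bandwidth is a little under 130 Hz and the numerical slope of the ERB-rate agrees with the reciprocal bandwidth to within a fraction of a percent, which is the sense in which the two formulae describe the same thing.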
Mels are credited by Traunmüller to Bera49 and by Parncutt to Stevens and Volkmann (StVo40).
The mel scale is not used as a metric for computing pitch distance in the present model, because it applies only to pure tones, whereas most of the tone sensations evoked by complex sonorities are of the complex variety (virtual rather than spectral pitches).
Certainly some of the ERB experiments are also done using pure tones, but maybe… Ach, I don't even care.
Mels are common in the machine listening community, mostly through MFCCs (Mel-frequency cepstral coefficients), a representation that has historically been popular for measuring the psychoacoustic similarity of sounds. (MeCh76, DaMe80)
Here's one formula, the “HTK” formula, with $f$ in Hz:

$$m(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
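A minimal Python sketch of the HTK-style mapping, $m = 2595 \log_{10}(1 + f/700)$, together with its algebraic inverse. The function names are illustrative and not taken from HTK itself.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    """HTK-style mel value for a frequency in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(mel: float) -> float:
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (100.0, 440.0, 1000.0, 8000.0):
        m = hz_to_mel_htk(f)
        print(f, m, mel_to_hz_htk(m))
```

The constants are chosen so that 1000 Hz lands very close to 1000 mel, which makes for a quick sanity check.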
There are others, such as the “Slaney” formula (Slan98), which is much more complicated and piecewise defined. I can't be bothered searching for details for now.
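For what it is worth, the piecewise version I know of is the one used as the default mel mapping in the librosa library, which credits it to the Auditory Toolbox. It is linear below 1 kHz and logarithmic above. The sketch below assumes those constants (200/3 Hz per mel below the break, and 27 mel steps spanning a factor of 6.4 in frequency above it). I have not checked them against Slan98 itself, and the names are my own.

```python
import math

F_SP = 200.0 / 3.0               # Hz per mel in the linear region (assumed)
BREAK_HZ = 1000.0                # start of the logarithmic region (assumed)
BREAK_MEL = BREAK_HZ / F_SP      # mel value at the break, i.e. 15
LOGSTEP = math.log(6.4) / 27.0   # log-frequency step per mel above the break


def hz_to_mel_slaney(f_hz: float) -> float:
    """Piecewise linear/log mel mapping, Auditory-Toolbox style."""
    if f_hz < BREAK_HZ:
        return f_hz / F_SP
    return BREAK_MEL + math.log(f_hz / BREAK_HZ) / LOGSTEP


def mel_to_hz_slaney(mel: float) -> float:
    """Inverse of hz_to_mel_slaney."""
    if mel < BREAK_MEL:
        return mel * F_SP
    return BREAK_HZ * math.exp(LOGSTEP * (mel - BREAK_MEL))


if __name__ == "__main__":
    for f in (500.0, 1000.0, 6400.0):
        print(f, hz_to_mel_slaney(f))
```

Note that these “mels” are on a different numerical scale from the HTK ones (1 kHz maps to 15 here, not to about 1000), so the two are not interchangeable without rescaling.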
Sones (StVN37) are a power-law intensity scale. Phons, ibid, are a logarithmic intensity scale, something like the dB level of the signal filtered to match the human ear, which is close to… dBA? Something like that. But you can get more sophisticated. Keyword: Fletcher-Munson curves.
For this level of precision, the coupling of frequency and amplitude into perceptual “loudness” becomes important: the same sound pressure is no longer equally loud at different frequencies. This coupling is described by equal-loudness contours, which you can get from an actively updated ISO standard at great expense, or try to reconstruct from journals. SMRM03 seems to be the accepted modern version, but their report only lists graphs and is missing values in the few equations. Table-based loudness contours are available under the MIT license from the Surrey git repo, under iso226.m. Closed-form approximations for an equal loudness contour at fixed SPL are given in SuTa04, equation 6.
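As a crude sketch of the two ideas in that paragraph, here are the textbook phon-to-sone rule (1 sone at 40 phon, doubling every 10 phon, usually only trusted above about 40 phon) and the standard analytic A-weighting curve. The A-weighting constants are the commonly published ones, which I am assuming match the current IEC standard. The function names are made up.

```python
import math


def phon_to_sone(phon: float) -> float:
    """Textbook rule: 40 phon = 1 sone, doubling per 10 phon (for >= ~40 phon)."""
    return 2.0 ** ((phon - 40.0) / 10.0)


def a_weighting_db(f_hz: float) -> float:
    """A-weighting gain in dB for f_hz > 0, normalised to about 0 dB at 1 kHz."""
    f2 = f_hz * f_hz
    numerator = 12194.0 ** 2 * f2 * f2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(numerator / denominator) + 2.00


if __name__ == "__main__":
    for phon in (40.0, 50.0, 60.0):
        print(phon, phon_to_sone(phon))
    for f in (100.0, 1000.0, 4000.0):
        print(f, a_weighting_db(f))
```

The A curve is a single fixed filter, so it is at best a stand-in for one equal-loudness contour at a moderate level, which is why the more sophisticated options exist.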
When the loudness of an $f$-Hz comparison tone is equal to the loudness of a reference tone at 1 kHz with a sound pressure of $p_r$, then the sound pressure of $p_f$ at the frequency of $f$ Hz is given by the following function:
AFAICT they don't define or anywhere, and I don't have enough free attention to find a simple expression for the frequency-dependent parameters, which I think are still spline-fit. (?)
There is an excellent explanation of the point of all this – with diagrams - by Joe Wolfe.
Onwards and upwards like a Shepard tone
At this point, where we are already combining frequency and loudness, things are getting weird; we are usually measuring people's reported subjective loudness levels for unnatural signals (pure tones), and with real signals we rapidly start running into temporal masking effects and phasing and so on.
Thankfully, we aren't in the business of exhaustive cochlear modeling, so we can all go home now. The unhealthily curious might read Moor07 or Hart97 and tell me the good bits, then move on to sensory neurology.
Psychoacoustic models in lossy audio compression
Pure link dump, sorry.
- Vorbis psychoacoustics (and the forum post that led me there)
- Radified Ogg psychoacoustic model explanation
- Hydrogen Audio's MDCT explanation
- MaOK01: Ken’ichiro Masaoka, Kazuho Ono, Setsu Komiyama (2001) A measurement of equal-loudness level contours for tone burst. Acoustical Science and Technology, 22(1), 35–39. DOI
- Neel93: Stephen T. Neely (1993) A model of cochlear mechanics with outer hair cell motility. Journal of the Acoustical Society of America, 94(1), 137–146. DOI
- StBP09: Charles Steele, Jacques Boutet de Monvel, Sunil Puria (2009) A multiscale model of the organ of Corti. Journal of Mechanics of Materials and Structures, 4(4), 755–778. DOI
- RoLM97: Jean Rouat, Yong Chun Liu, Daniel Morissette (1997) A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Communication, 21(3), 191–207.
- RoDa56: D. W. Robinson, R. S. Dadson (1956) A re-determination of the equal-loudness relations for pure tones. British Journal of Applied Physics, 7(5), 166. DOI
- StVN37: S. S. Stevens, J. Volkmann, E. B. Newman (1937) A Scale for the Measurement of the Psychological Magnitude Pitch. The Journal of the Acoustical Society of America, 8(3), 185–190. DOI
- LaNK87: M. Lahat, Russell J. Niederjohn, D. Krubsack (1987) A spectral autocorrelation method for measurement of the fundamental frequency of noise-corrupted speech. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(6), 741–750. DOI
- Bera49: Leo Leroy Beranek (1949) Acoustic Measurements.
- TFHL05: Alex Tarnopolsky, Neville Fletcher, Lloyd Hollenberg, Benjamin Lange, John Smith, Joe Wolfe (2005) Acoustics: The vocal tract and the sound of a didgeridoo. Nature, 436(7047), 39–39. DOI
- BrBr74: J. S. Bridle, M. D. Brown (1974) An experimental automatic word recognition system. JSRU Report, 1003(5).
- HuPa93: David Huron, Richard Parncutt (1993) An improved model of tonality perception incorporating pitch salience and echoic memory. Psychomusicology: A Journal of Research in Music Cognition, 12(2), 154–171. DOI
- Trau90: Hartmut Traunmüller (1990) Analytical expressions for the tonotopic sensory scale. The Journal of the Acoustical Society of America, 88(1), 97–100. DOI
- PaSt94: Richard Parncutt, Hans Strasburger (1994) Applying Psychoacoustics in Composition: “Harmonic” Progressions of “Nonharmonic” Sonorities. Perspectives of New Music, 32(2), 88–129. DOI
- SkKr10: Erika Skoe, Nina Kraus (2010) Auditory brainstem response to complex sounds: a tutorial. Ear and Hearing, 31(3), 302–324. DOI
- Slan98: Malcolm Slaney (1998) Auditory toolbox. Interval Research Corporation, Tech. Rep, 10, 1998.
- NoFa88: Jan Nordmark, Lennart E. Fahlen (1988) Beat theories of musical consonance. Speech Transmission Laboratory, Quarterly Progress and Status Report.
- Lerd96: Fred Lerdahl (1996) Calculating tonal tension. Music Perception: An Interdisciplinary Journal, 13(3), 319–363. DOI
- Brow91: Judith C. Brown (1991) Calculation of a constant Q spectral transform. The Journal of the Acoustical Society of America, 89(1), 425–434. DOI
- Moor07: Brian C. J. Moore (2007) Cochlear hearing loss: physiological, psychological and technical issues. Chichester: Wiley
- Cart87: G. Clifford Carter (1987) Coherence and time delay estimation. Proceedings of the IEEE, 75(2), 236–255. DOI
- DaMe80: S. Davis, P. Mermelstein (1980) Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357–366. DOI
- FePa04: Sean Ferguson, Richard Parncutt (2004) Composing In the Flesh: Perceptually-Informed Harmonic Syntax. In Proceedings of Sound and Music Computing.
- Moor14: Brian C. J. Moore (2014) Development and Current Status of the “Cambridge” Loudness Models. Trends in Hearing, 18. DOI
- Helm63: Heinrich Helmholtz (1863) Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Braunschweig: J. Vieweg
- MeCh76: Paul Mermelstein, CH Chen (1976) Distance measures for speech recognition: psychological and instrumental. In Pattern Recognition and Artificial Intelligence, (Vol. 101, pp. 374–388).
- SmLe06: Evan C. Smith, Michael S. Lewicki (2006) Efficient auditory coding. Nature, 439(7079), 978–982. DOI
- SuTa04: Yôiti Suzuki, Hisashi Takeshima (2004) Equal-loudness-level contours for pure tones. The Journal of the Acoustical Society of America, 116(2), 918. DOI
- GóHe04: Emilia Gómez, Perfecto Herrera (2004) Estimating The Tonality Of Polyphonic Audio Files: Cognitive Versus Machine Learning Modelling Strategies. In ISMIR.
- UmCN99: S. Umesh, L. Cohen, D. Nelson (1999) Fitting the Mel scale. In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258) (Vol. 1, pp. 217–220). DOI
- NTRR98: S. Shyamla Narayan, Andrei N. Temchin, Alberto Recio, Mario A. Ruggero (1998) Frequency Tuning of Basilar Membrane and Auditory Nerve Fibers in the Same Cochleae. Science, 282(5395), 1882–1884. DOI
- Stol15: Frieder Stolzenburg (2015) Harmony perception by periodicity detection. Journal of Mathematics and Music, 9(3), 215–238. DOI
- Guin12: John J. Guinan Jr. (2012) How are inner hair cells stimulated? Evidence for multiple mechanical drives. Hearing Research, 292(1–2), 35–50. DOI
- KXGC04: Ananthanarayan Krishnan, Yisheng Xu, Jackson T. Gandour, Peter A. Cariani (2004) Human frequency-following response: representation of pitch contours in Chinese tones. Hearing Research, 189(1–2), 1–12. DOI
- Olso01: Elizabeth S. Olson (2001) Intracochlear pressure measurements related to cochlear tuning. The Journal of the Acoustical Society of America, 110(1), 349–367. DOI
- CaSo03: Ramon Ferrer i Cancho, Ricard V. Solé (2003) Least effort and the origins of scaling in human language. Proceedings of the National Academy of Sciences, 100(3), 788–791. DOI
- Iriz01: Rafael A Irizarry (2001) Local harmonic estimation in musical sound signals. Journal of the American Statistical Association, 96(454), 357–367. DOI
- BiGT67: Christopher Bingham, M. Godfrey, John W. Tukey (1967) Modern techniques of power spectrum estimation. Audio and Electroacoustics, IEEE Transactions On, 15(2), 56–66.
- BiKr09: Gavin M. Bidelman, Ananthanarayan Krishnan (2009) Neural correlates of consonance, dissonance, and the hierarchy of musical pitch in the human brainstem. Journal of Neuroscience, 29(42), 13165–13171. DOI
- CaDe96a: P. A. Cariani, B. Delgutte (1996a) Neural correlates of the pitch of complex tones I Pitch and pitch salience. Journal of Neurophysiology, 76(3), 1698–1716. DOI
- CaDe96b: P. A. Cariani, B. Delgutte (1996b) Neural correlates of the pitch of complex tones II Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch. Journal of Neurophysiology, 76(3), 1717–1734. DOI
- CaGP99: Julyan H. E. Cartwright, Diego L. González, Oreste Piro (1999) Nonlinear Dynamics of the Perceived Pitch of Complex Sounds. Physical Review Letters, 82(26), 5389–5392. DOI
- ThPa97: William Forde Thompson, Richard Parncutt (1997) Perceptual judgments of triads and dyads: Assessment of a psychoacoustic model. Music Perception, 263–280.
- Herm07: Irving P. Herman (2007) Physics of the human body. Berlin ; New York: Springer
- Terh74: Ernst Terhardt (1974) Pitch, consonance, and harmony. The Journal of the Acoustical Society of America, 55(5), 1061–1069. DOI
- CeDe05: Leonardo Cedolin, Bertrand Delgutte (2005) Pitch of complex tones: Rate-place and interspike interval representations in the auditory nerve. Journal of Neurophysiology, 94(1), 347–362. DOI
- SMRM03: Yôiti Suzuki, Volker Mellert, Utz Richter, Henrik Møller, Leif Nielsen, Rhona Hellman, … Hisashi Takeshima (2003) Precise and Full-range Determination of Two-dimensional Equal Loudness Contours.
- Parn05: Richard Parncutt (2005) Psychoacoustics and music perception. Musikpsychologie–Das Neue Handbuch.
- FaZw07: H. Fastl, Eberhard Zwicker (2007) Psychoacoustics: facts and models. Berlin ; New York: Springer
- Ball99: Philip Ball (1999) Pump up the bass. Nature News. DOI
- BaTo66: B. Bauer, E. Torick (1966) Researches in loudness measurement. IEEE Transactions on Audio and Electroacoustics, 14(3), 141–151. DOI
- Ball14: Philip Ball (2014) Rhythm is heard best in the bass. Nature. DOI
- Hart97: William M. Hartmann (1997) Signals, sound, and sensation. Woodbury, N.Y: American Institute of Physics
- SmCh11: Sonya T. Smith, Richard S. Chadwick (2011) Simulation of the Response of the Inner Hair Cell Stereocilia Bundle to an Acoustical Stimulus. PLoS ONE, 6(3), e18161. DOI
- Seth97: William A. Sethares (1997) Specifying spectra for musical scales. The Journal of the Acoustical Society of America, 102(4), 2422–2431. DOI
- Slep96: Norma B. Slepecky (1996) Structure of the Mammalian Cochlea. In The Cochlea (pp. 44–129). Springer New York
- Zwic61: E. Zwicker (1961) Subdivision of the Audible Frequency Range into Critical Bands (Frequenzgruppen). The Journal of the Acoustical Society of America, 33(2), 248–248. DOI
- MoGl83: Brian C. J. Moore, Brian R. Glasberg (1983) Suggested formulae for calculating auditory‐filter bandwidths and excitation patterns. The Journal of the Acoustical Society of America, 74(3), 750–753. DOI
- HMBT14: Michael J. Hove, Céline Marie, Ian C. Bruce, Laurel J. Trainor (2014) Superior time perception for lower musical pitch explains why bass-ranged instruments lay down musical rhythms. Proceedings of the National Academy of Sciences, 111(28), 10383–10388. DOI
- Zwis80: J. J. Zwislocki (1980) Symposium on cochlear mechanics: Where do we stand after 50 years of research? The Journal of the Acoustical Society of America, 67(5), 1679–1679. DOI
- CoMP12: Marion Cousineau, Josh H. McDermott, Isabelle Peretz (2012) The basis of musical consonance as revealed by congenital amusia. Proceedings of the National Academy of Sciences, 109(48), 19858–19863. DOI
- Tymo06: Dmitri Tymoczko (2006) The Geometry of Musical Chords. Science, 313(5783), 72–74. DOI
- YEGH02: Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying Liu, … Dan Povey (2002) The HTK book (p. 384).
- HFFH11: Holger Hennig, Ragnar Fleischmann, Anneke Fredebohm, York Hagmayer, Jan Nagler, Annette Witt, … Theo Geisel (2011) The Nature and Perception of Fluctuations in Human Musical Rhythms. PLoS ONE, 6(10), e26457. DOI
- RaPl99: Rudolf Rasch, Reinier Plomp (1999) The perception of musical tones. The Psychology of Music, 2, 89–112.
- StVo40: S. S. Stevens, J. Volkmann (1940) The relation of pitch to frequency: a revised scale. The American Journal of Psychology, 53(3), 329–353. DOI
- PlLe65: Reinier Plomp, Willem JM Levelt (1965) Tonal consonance and critical bandwidth. The Journal of the Acoustical Society of America, 38(4), 548–560. DOI