Automated composition, music theory and tools therefor.
Colin Morris' SongSim visualises lyrics, in fact, not notes, but wow, don't they look nice?
Where my audio software frameworks page does more DSP, this is mostly about MIDI: choosing notes. A specialisation of Generative art with machine learning.
Sometimes you don't want to measure a chord, or see a chord, you just want to write a chord.
See also machine listening, musical corpora, musical metrics, synchronisation. The discrete symbolic cousin to the analysis/resynthesis project.
Related projects: How I would do generative art with neural networks and learning gamelan.
To understand
Dmitri Tymoczko claims that music data is most naturally regarded as existing on an orbifold (“quotient manifold”), which I'm sure you could do some clever regression upon, but I can't yet see how. Orbifolds are, AFAICT, something like what you get when you have a bag of regressors instead of a tuple, and are reminiscent of the string bag models of the natural language information retrieval people, except there is not as much hustle for music as there is for NLP. Nonetheless manifold regression is a thing, and regression on manifolds also, so there is probably some stuff done there, as documented at arpeggiate by numbers.
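A minimal sketch of the “bag instead of a tuple” idea, under my own simplifying assumptions: 12-tone equal temperament, MIDI note numbers, and a taxicab metric. The function names are hypothetical. A chord is identified with its multiset of pitch classes (quotienting out octaves and voice order), and the distance between two chords is the cheapest way of moving one onto the other.

```python
from itertools import permutations

def chord_class(midi_notes):
    """Canonical representative: sorted pitch classes, ignoring octave and order."""
    return tuple(sorted(n % 12 for n in midi_notes))

def pitch_class_distance(x, y):
    """Shortest distance in semitones around the pitch-class circle."""
    d = abs(x - y) % 12
    return min(d, 12 - d)

def voice_leading_distance(chord_a, chord_b):
    """Smallest total semitone motion between two same-size chords,
    minimised over every assignment of voices (brute force, fine for small chords)."""
    a = [n % 12 for n in chord_a]
    b = [n % 12 for n in chord_b]
    if len(a) != len(b):
        raise ValueError("this sketch only compares chords of equal size")
    return min(
        sum(pitch_class_distance(x, y) for x, y in zip(a, perm))
        for perm in permutations(b)
    )

# Two voicings of C major land on the same point of the quotient space.
same_point = chord_class([60, 64, 67]) == chord_class([76, 55, 48])
# C major and A minor are close neighbours: only one voice moves.
c_to_am = voice_leading_distance([60, 64, 67], [57, 60, 64])
```

Any regression on this space would have to respect that identification, which is exactly the part I can't yet see how to do cleverly.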
Also it's not a single scalar (which note) we are predicting here, nor just the distribution of a single output (the probability of each note). At the very least it's the co-occurrence of several notes.
More generally, it's the joint distribution of the evolution of the harmonics and the noise and all that other stuff that our ear can resolve and which can be simultaneously extracted. And we know from psychoacoustics that these will be coupled: the dissonance of two pure tones depends on the frequency and amplitude of each of those components, for example.
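To make that coupling concrete, here is a rough sketch of a pure-tone dissonance function in the style of Sethares' parameterisation of the Plomp-Levelt curves. The constants are the commonly quoted fit and the amplitude weighting is a plain product; treat both as assumptions, and the function name as hypothetical.

```python
import math

# Assumed constants: a commonly quoted fit to the Plomp-Levelt curves.
D_STAR, S1, S2 = 0.24, 0.0207, 18.96
DECAY_SLOW, DECAY_FAST = 3.5, 5.75

def pure_tone_dissonance(f1, a1, f2, a2):
    """Sensory dissonance of two sine tones with frequencies in Hz and linear amplitudes."""
    f_low, f_high = sorted((f1, f2))
    # The critical bandwidth grows with register, so rescale the frequency gap.
    scale = D_STAR / (S1 * f_low + S2)
    x = scale * (f_high - f_low)
    # Zero at unison, peaks at a small gap, then decays for wide intervals.
    return a1 * a2 * (math.exp(-DECAY_SLOW * x) - math.exp(-DECAY_FAST * x))

a4 = 440.0
semitone = pure_tone_dissonance(a4, 1.0, a4 * 2 ** (1 / 12), 1.0)
fifth = pure_tone_dissonance(a4, 1.0, a4 * 1.5, 1.0)
```

Under these assumptions the semitone comes out far rougher than the fifth, and halving either amplitude halves the result, so neither frequency nor amplitude can be modelled on its own.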
In any case, these wrinkles aside, if I could predict the conditional distribution of the sequence in a way that produced recognisably musical sound, then simulate from it, I would be happy for a variety of reasons.
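The crudest possible version of “predict the conditional distribution, then simulate from it” is a trigram model over note numbers. This is only a toy sketch with made-up function names and a made-up melody, ignoring rhythm, polyphony and everything else that makes the problem hard.

```python
import random
from collections import Counter, defaultdict

def fit_trigrams(melody):
    """Count how often each note follows each pair of preceding notes."""
    counts = defaultdict(Counter)
    for a, b, c in zip(melody, melody[1:], melody[2:]):
        counts[(a, b)][c] += 1
    return counts

def simulate(counts, seed, length, rng=None):
    """Sample a melody note by note from the empirical conditional distribution."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    out = list(seed)
    while len(out) < length:
        options = counts.get((out[-2], out[-1]))
        if not options:
            break  # context never seen in training: stop rather than guess
        notes, weights = zip(*options.items())
        out.append(rng.choices(notes, weights=weights)[0])
    return out

training = [60, 62, 64, 60, 62, 64, 65, 67, 65, 64, 62, 60]  # hypothetical melody
model = fit_trigrams(training)
sample = simulate(model, seed=(60, 62), length=8)
```

Every three-note window of the sample is one that occurred in the training melody, which is both the charm and the limitation of the approach.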
So I guess this page is “nonparametric vector regression on an orbifold”. Hmm.
Interesting examples
DeepBach (paper HaPa16, code) seems to be doing a related thing. A similar set of authors (HaSP16) has some other related work:
Modeling polyphonic music is a particularly challenging task because of the intricate interplay between melody and harmony. A good model should satisfy three requirements: statistical accuracy (capturing faithfully the statistics of correlations at various ranges, horizontally and vertically), flexibility (coping with arbitrary user constraints), and generalization capacity (inventing new material, while staying in the style of the training corpus). Models proposed so far fail on at least one of these requirements. We propose a statistical model of polyphonic music, based on the maximum entropy principle. This model is able to learn and reproduce pairwise statistics between neighboring note events in a given corpus. The model is also able to invent new chords and to harmonize unknown melodies. We evaluate the invention capacity of the model by assessing the amount of cited, rediscovered, and invented chords on a corpus of Bach chorales. We discuss how the model enables the user to specify and enforce user-defined constraints, which makes it useful for style-based, interactive music generation.
Random ideas

How to reconstruct a piece from its recurrence matrix, or at least constrain pieces by their recurrence matrix;

Composition path dependence: If everything were ordered by equilibrium, then orchestras would tend toward a Pareto-optimal distribution of French horn. How to capture time dependence? How to quantify “motifs”?

Can I use a chain graph to do this?

Evan Chow represents for team non-deep-learning with jazzml:
Computer jazz improvisation powered by machine learning, specifically trigram modeling, K-Means clustering, and chord inference with SVMs.

There are a whole bunch of neural-network-based approaches; see generative art & neural networks.
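On the recurrence matrix idea from the top of this list: if the matrix is the binary “same symbol at positions i and j” kind (my assumption, in the spirit of SongSim), then reconstruction is easy but only up to relabelling. A sketch, with hypothetical function names:

```python
def recurrence_matrix(seq):
    """Binary self-similarity matrix: entry [i][j] is 1 when seq[i] == seq[j]."""
    return [[int(a == b) for b in seq] for a in seq]

def reconstruct(target):
    """Return integer labels whose recurrence matrix equals `target`,
    or None when no sequence can produce that matrix."""
    labels = []
    for i, row in enumerate(target):
        earlier = [j for j in range(i) if row[j]]
        if earlier:
            labels.append(labels[earlier[0]])  # reuse the label of the first match
        else:
            labels.append(max(labels, default=-1) + 1)  # unseen symbol: fresh label
    # Verify, because an arbitrary 0/1 matrix need not be a valid recurrence matrix.
    return labels if recurrence_matrix(labels) == target else None

motif = [60, 62, 60, 64, 62]  # hypothetical note sequence
skeleton = reconstruct(recurrence_matrix(motif))
```

The skeleton tells you which positions repeat but not which notes they are, so the matrix works better as a constraint on a piece than as a recipe for one.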
Helpful software for the musically vexed

Fabrizio Poce's J74 progressive and J74 bassline are chord progression generators from his library of clever chord generators linked in to Ableton Live's scripting engine, so if you are using Ableton they might be handy. They are cheap (EUR12 + EUR15). I use them myself, but they DO make Ableton crash a wee bit, so they are not really suited for live performance, which is a pity because that would be a wonderful unique selling point. The real-time-oriented J74 HarmoTools from the same guy are less sophisticated but worth trying, especially since they are free, and he has a lot of other clever hacks there too. Basically, just go to this guy's site and try his stuff out. You don't have to stop there.

Odesi (USD49) has been doing lots of advertising and has a poptastic interface to pop music. It's like Synfire-lite with a library of top 40 tricks and rhythms. The desktop version tries to install gigabytes of synths of meagre merit on your machine, which is a giant waste of space and time if you are using a computer with synths on, which you are because this is not the 90s.

Helio is free and cross platform and totally worth a shot. There is a chord model in there and version control (!) but you might not notice the chord thing if you aren't careful, because the UI is idiosyncratic.

Mixtikl / Noatikl are granddaddy apps for this. Although the creators doubtless put much effort into the sleek user interfaces, their complete inability to explain their app or provide compelling demonstrations or use cases leaves me cold. I get the feeling they had high-art aspirations but have ended up basically doing ambient noodles in order to sell product. Maybe I'm not being fair. Not rich enough to find out. (USD25/USD40)

The makers of Rapid Compose (USD99/USD249) might make decent software, but they can't really explain why their app is nice or provide a demo version.

synfire explains how it uses music theory to do large-scale scoring etc. Get the string section to behave itself or you'll replace them with MIDI bots. (EUR996, so I won't be buying it, but great demo video.)

harmony builder does classical music theory for you. USD39–USD219 depending on heinously complex pricing schemes. Will pass your conservatorium finals.

You can't resist rolling your own? sharp11 is a node.js music theory library for JavaScript with a demo application to create jazz improv.

Supercollider of course does this and everything else, but designing user interfaces for it will take years off your life. OTOH, if you are happy with text, this might be a goer.
Arpeggiators

Bluearp VST does 2-note chord extrapolation (free)

Hypercyclic is an LFOable arpeggiator (free)

kirnu (free) and kirnu cream
Constraint Composition
All of that too mainstream? Try a weird alternative formalism! How about constraint composition? That is, declarative musical composition by defining constraints on the relations which the notes must satisfy. Sounds fun in the abstract but the practice doesn't grab me especially as a creative tool.
The reference here is strasheela, built on an obscure, unpopular, and apparently discontinued Prolog-like language called “Oz” or “Mozart”, because using popular languages is not as grand a gesture as claiming none of them are quite Turing complete enough, in the right way, for your special thingy.
That language is a bit of a ghost town, which means headaches if you wish to use it in practice. If you wanted to actually do this, you'd probably use overtone + minikanren (Prolog-for-Lisp), as with the composing schemer, or, to be even more mainstream, just use a conventional constraint solver in a popular language. I am fond of python and ncvx, but there are many choices.
Anyway, Prolog fans can read on: see Anders and Miranda (AnMi10, AnMi11)
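For a taste of the declarative style without any exotic language, here is a tiny backtracking search in plain Python. It is a toy sketch, not how strasheela or any real solver works, and the rules are ones I made up for illustration: start on middle C, end on the C above, no repeated notes, no leap wider than a perfect fourth.

```python
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # one octave of MIDI note numbers

def allowed(partial, length):
    """Check the newest note of a partial melody against the (made-up) rules."""
    i = len(partial) - 1
    note = partial[i]
    if i == 0:
        return note == 60  # must start on middle C
    step = abs(note - partial[i - 1])
    if step == 0 or step > 5:
        return False  # no repeated notes, no leap beyond a perfect fourth
    if i == length - 1:
        return note == 72  # must end on the C above
    return True

def melodies(length, partial=()):
    """Yield every melody of `length` notes that satisfies all constraints."""
    if len(partial) == length:
        yield partial
        return
    for note in C_MAJOR:
        candidate = partial + (note,)
        if allowed(candidate, length):  # prune as soon as a rule breaks
            yield from melodies(length, candidate)

four_note_solutions = list(melodies(4))
```

You state what a valid melody looks like and the search does the rest; swapping in a proper solver changes the speed, not the idea.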
Refs
 ReYE12: K. Reese, R. Yampolskiy, A. Elmaghraby (2012) A framework for interactive generation of music for games. In 2012 17th International Conference on Computer Games (CGAMES) (pp. 131–137). Washington, DC, USA: IEEE Computer Society DOI
 KoSW18: Filip Korzeniowski, David R. W. Sears, Gerhard Widmer (2018) A Large-Scale Study of Language Models for Chord Prediction. ArXiv:1804.01849 [Cs, Eess, Stat].
 WiTH09: Daniela M. Witten, Robert Tibshirani, Trevor Hastie (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, kxp008. DOI
 DiMS10: A Di Lillo, G. Motta, J.A Storer (2010) A rotation and scale invariant descriptor for shape recognition. In 2010 17th IEEE International Conference on Image Processing (ICIP) (pp. 257–260). DOI
 Bod02a: Rens Bod (2002a) A unified model of structural organization in language and music. Journal of Artificial Intelligence Research, 17(2002), 289–308.
 BCCZ14: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Yufei Zhao (2014) An L^p theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions. ArXiv:1401.2906 [Math].
 BiGS11: Louis Bigo, Jean-Louis Giavitto, Antoine Spicher (2011) Building Topological Spaces for Musical Objects. In Proceedings of the Third International Conference on Mathematics and Computation in Music (pp. 13–28). Berlin, Heidelberg: Springer-Verlag DOI
 MaQW16: Sephora Madjiheurem, Lizhen Qu, Christian Walder (2016) Chord2Vec: Learning Musical Chord Embeddings.
 YaMu17: Anna K. Yanchenko, Sayan Mukherjee (2017) Classical Music Composition Using State Space Models. ArXiv:1708.03822 [Cs].
 Mont15: Andrea Montanari (2015) Computational implications of reducing data to sufficient statistics. Electronic Journal of Statistics, 9(2), 2370–2390. DOI
 EiPa13: Arne Eigenfeldt, Philippe Pasquier (2013) Considering vertical and horizontal context in corpus-based generative electronic dance music. In Proceedings of the fourth international conference on computational creativity (Vol. 72).
 AnMi10: Torsten Anders, Eduardo R. Miranda (2010) Constraint Application with Higher-Order Programming for Modeling Music Theories. Computer Music Journal, 34(2), 25–38. DOI
 AnMi11: Torsten Anders, Eduardo R. Miranda (2011) Constraint programming systems for modeling music theories and composition. ACM Computing Surveys, 43(4), 1–38. DOI
 YeFW05: Jonathan S. Yedidia, W.T. Freeman, Y. Weiss (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7), 2282–2312. DOI
 CoDu02: Michael Collins, Nigel Duffy (2002) Convolution Kernels for Natural Language. In Advances in Neural Information Processing Systems 14 (pp. 625–632). MIT Press
 Haus99: David Haussler (1999) Convolution kernels on discrete structures. Technical report, UC Santa Cruz
 BrHP17: Jean-Pierre Briot, Gaëtan Hadjeres, François Pachet (2017) Deep Learning Techniques for Music Generation - A Survey. ArXiv:1709.01620 [Cs].
 HaPa16: Gaëtan Hadjeres, François Pachet (2016) DeepBach: a Steerable Model for Bach chorales generation. ArXiv:1612.01010 [Cs].
 LeGK06: Su-In Lee, Varun Ganapathi, Daphne Koller (2006) Efficient Structure Learning of Markov Networks using Regularization. In Advances in Neural Information Processing Systems (pp. 817–824). MIT Press
 WaKR13: Larry Wasserman, Mladen Kolar, Alessandro Rinaldo (2013) Estimating Undirected Graphs Under Weak Assumptions. ArXiv:1309.6933 [Cs, Math, Stat].
 Poss86: Antonio Possolo (1986) Estimation of binary Markov random fields
 Rath96: Stephen L. Rathbun (1996) Estimation of Poisson intensity using partially observed concomitant variables. Biometrics, 226–242.
 PPRS15: Alexandre Papadopoulos, François Pachet, Pierre Roy, Jason Sakellariou (2015) Exact Sampling for Regular and Markov Constraints with Belief Propagation. In Principles and Practice of Constraint Programming (pp. 341–350). Switzerland: Springer, Cham DOI
 WiTi09: Daniela M Witten, Robert J. Tibshirani (2009) Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology, 8(1), 1–27. DOI
 Tymo09: Dmitri Tymoczko (2009) Generalizing Musical Intervals. Journal of Music Theory, 53(2), 227–254. DOI
 Grav13: Alex Graves (2013) Generating Sequences With Recurrent Neural Networks. ArXiv:1308.0850 [Cs].
 ElWA17: Andrew J. Elmsley, Tillman Weyde, Newton Armstrong (2017) Generating Time: Rhythmic Perception, Prediction and Production with Recurrent Neural Networks. Journal of Creative Music Systems, 1(2).
 Bret08: Romain Brette (2008) Generation of Correlated Spike Trains. Neural Computation, 0(0), 080804143617793–28. DOI
 KrSh09: Michael Krumin, Shy Shoham (2009) Generation of Spike Trains with Controlled Auto- and Cross-Correlation Functions. Neural Computation, 21(6), 1642–1664. DOI
 Dean17: Roger Dean (2017) Generative Live Music-making Using Autoregressive Time Series Models: Melodies and Beats. Journal of Creative Music Systems, 1(2).
 TNIY17: Hiroaki Tsushima, Eita Nakamura, Katsutoshi Itoyama, Kazuyoshi Yoshii (2017) Generative Statistical Models with Self-Emergent Grammar of Chord Sequences. ArXiv:1708.02255 [Cs].
 Hall08: Rachel Wells Hall (2008) Geometrical Music Theory. Science, 320(5874), 328–329. DOI
 LCWL10: Han Liu, Xi Chen, Larry Wasserman, John D. Lafferty (2010) Graph-Valued Regression. In Advances in Neural Information Processing Systems 23 (pp. 1423–1431). Curran Associates, Inc.
 Poll04: Dave Pollard (2004) Hammersley-Clifford theorem for Markov random fields
 MeBü06: Nicolai Meinshausen, Peter Bühlmann (2006) High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436–1462. DOI
 RaWL10: Pradeep Ravikumar, Martin J. Wainwright, John D. Lafferty (2010) High-dimensional Ising model selection using ℓ1-regularized logistic regression. The Annals of Statistics, 38(3), 1287–1319. DOI
 TiBB00: Barbara Tillmann, Jamshed J. Bharucha, Emmanuel Bigand (2000) Implicit learning of tonality: a self-organizing approach. Psychological Review, 107(4), 885.
 Huro94: David Huron (1994) Interval-Class Content in Equally Tempered Pitch-Class Sets: Common Scales Exhibit Optimum Tonal Consonance. Music Perception: An Interdisciplinary Journal, 11(3), 289–305. DOI
 KoCM08: Leonid (Aryeh) Kontorovich, Corinna Cortes, Mehryar Mohri (2008) Kernel methods for learning languages. Theoretical Computer Science, 405(3), 223–236. DOI
 MoSF13: Karim Abou Moustafa, Dale Schuurmans, Frank Ferrie (2013) Learning a Metric Space for Neighbourhood Topology Estimation: Application to Manifold Learning. In Journal of Machine Learning Research (pp. 341–356).
 HiOB05: Geoffrey E. Hinton, Simon Osindero, Kejie Bao (2005) Learning causally linked markov random fields. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (pp. 128–135). Citeseer
 GiTK10: Jon Gillick, Kevin Tang, Robert M. Keller (2010) Machine Learning of Jazz Grammars. Computer Music Journal, 34(3), 56–66. DOI
 RiKe77: B. D. Ripley, F. P. Kelly (1977) Markov Point Processes. Journal of the London Mathematical Society, s2-15(1), 188–192. DOI
 BaVM96: A. J. Baddeley, Marie-Colette NM Van Lieshout, J. Møller (1996) Markov Properties of Cluster Processes. Advances in Applied Probability, 28(2), 346–355. DOI
 Bod02b: Rens Bod (2002b) Memory-based models of melodic analysis: Challenging the Gestalt principles. Journal of New Music Research, 31(1), 27–36. DOI
 HeCh17: Dorien Herremans, Ching-Hua Chuan (2017) Modeling Musical Context with Word2vec. In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017.
 BoBV12: Nicolas Boulanger-Lewandowski, Yoshua Bengio, Pascal Vincent (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
 MøWa07: Jesper Møller, Rasmus P. Waagepetersen (2007) Modern Statistics for Spatial Point Processes. Scandinavian Journal of Statistics, 34(4), 643–684. DOI
 GoKa04: V. Gontis, B. Kaulakys (2004) Multiplicative point process as a model of trading activity. Physica A: Statistical Mechanics and Its Applications, 343, 505–514. DOI
 BaMW00: A. J. Baddeley, J. Møller, R. Waagepetersen (2000) Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3), 329–350. DOI
 BTCC18: Tijn Borghuis, Alessandro Tibo, Simone Conforti, Luca Canciello, Lorenzo Brusci, Paolo Frasconi (2018) Off the Beaten Track: Using Deep Learning to Interpolate Between Music Genres. ArXiv:1804.09808 [Cs, Eess].
 Lies96: Marie-Colette N. M. van Lieshout (1996) On likelihoods for Markov random sets and Boolean models. In Proceedings of the International Symposium.
 BoRo90: Paul T. Boggs, Janet E. Rogers (1990) Orthogonal distance regression. Contemporary Mathematics, 112, 183–194.
 HaDr13: Naftali Harris, Mathias Drton (2013) PC Algorithm for Nonparanormal Graphical Models. Journal of Machine Learning Research, 14(1), 3365–3383.
 KaGA05: B. Kaulakys, V. Gontis, M. Alaburda (2005) Point process model of 1/f noise vs a sum of Lorentzians. Physical Review E, 71(5), 051105. DOI
 JoWe02: Michael I. Jordan, Yair Weiss (2002) Probabilistic inference in graphical models. Handbook of Neural Networks and Brain Theory.
 GaMa12: Mike Gashler, Tony Martinez (2012) Robust manifold learning with CycleCut. Connection Science, 24(1), 57–69. DOI
 LaWa08: John Lafferty, Larry Wasserman (2008) Rodeo: Sparse, greedy nonparametric regression. The Annals of Statistics, 36(1), 28–63. DOI
 RLLW07: Pradeep D. Ravikumar, Han Liu, John D. Lafferty, Larry A. Wasserman (2007) SpAM: Sparse Additive Models. In NIPS.
 KrBo13: Dirk P. Kroese, Zdravko I. Botev (2013) Spatial process generation. ArXiv:1308.0399 [Stat].
 Seth97: William A. Sethares (1997) Specifying spectra for musical scales. The Journal of the Acoustical Society of America, 102(4), 2422–2431. DOI
 SMTP09: William A. Sethares, Andrew J. Milne, Stefan Tiedje, Anthony Prechtl, James Plamondon (2009) Spectral Tools for Dynamic Tonality and Audio Morphing. Computer Music Journal, 33(2), 71–84. DOI
 LiRW10: Han Liu, Kathryn Roeder, Larry Wasserman (2010) Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models. In Advances in Neural Information Processing Systems 23 (pp. 1432–1440). Curran Associates, Inc.
 MeBü10: Nicolai Meinshausen, Peter Bühlmann (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. DOI
 HaSP16: Gaëtan Hadjeres, Jason Sakellariou, François Pachet (2016) Style Imitation and Chord Invention in Polyphonic Music with Exponential Families. ArXiv:1609.05152 [Cs].
 Hutc17: P. Hutchings (2017) Talking Drums: Generating drum grooves with neural networks. In arXiv:1706.09558 [cs].
 GaMa11: Mike Gashler, Tony Martinez (2011) Tangent space guided intelligent neighbor finding. (pp. 2617–2624). IEEE DOI
 LSSC02: Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, Chris Watkins (2002) Text Classification Using String Kernels. Journal of Machine Learning Research, 2, 419–444.
 VeRo15: Victor Veitch, Daniel M. Roy (2015) The Class of Random Graphs Arising from Exchangeable Random Measures. ArXiv:1512.03099 [Cs, Math, Stat].
 Tymo06: Dmitri Tymoczko (2006) The Geometry of Musical Chords. Science, 313(5783), 72–74. DOI
 LiLW09: Han Liu, John Lafferty, Larry Wasserman (2009) The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs. Journal of Machine Learning Research, 10, 2295–2328.
 LHYL12: Han Liu, Fang Han, Ming Yuan, John Lafferty, Larry Wasserman (2012) The Nonparanormal SKEPTIC. ArXiv:1206.6488 [Cs, Stat].
 BuSe14: Ryan Budney, William Sethares (2014) Topology of Musical Data. Journal of Mathematics and Music, 8(1), 73–92. DOI
 GBTE14: Ross Goroshin, Joan Bruna, Jonathan Tompson, David Eigen, Yann LeCun (2014) Unsupervised Learning of Spatiotemporally Coherent Metrics. ArXiv:1412.6056 [Cs].
 Bod01: Rens Bod (2001) What is the Minimal Set of Fragments That Achieves Maximal Parse Accuracy? In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics (pp. 66–73). Stroudsburg, PA, USA: Association for Computational Linguistics DOI