Speech recognition

Turning audio into text.

Not much here; I just needed to do some speech recognition for a friend. I care more about a different machine-listening niche: musical machine listening.

You might try CMU Sphinx.

The warping sub-problem does sound interesting, though.


Connectionist Temporal Classification is a loss function useful for performing supervised learning on sequence data, without needing an alignment between input data and labels. For example, CTC can be used to train end-to-end systems for speech recognition, which is how we have been using it at Baidu’s Silicon Valley AI Lab.
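
To make the "no alignment needed" part concrete, here is a minimal pure-Python sketch of the CTC forward recursion. It is not Baidu's implementation, just the textbook dynamic program under some assumptions of mine: `probs[t][k]` is a per-frame probability table (each row sums to 1), symbol `0` is the blank, and the function names are invented for illustration. It works with raw probabilities for readability; a practical version would work in log space to avoid underflow on long utterances. The brute-force function enumerates every frame-level path and is only there as a sanity check on tiny inputs.

```python
import math
from itertools import product


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Return -log p(labels | probs), summing over all alignments.

    probs[t][k] is the probability of symbol k at frame t (assumed layout).
    labels is the target sequence without blanks.
    """
    # Extended target: a blank before, between, and after the labels.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # alpha[s]: total probability of all partial paths that have
    # consumed the frames so far and currently sit at position s of ext.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping the blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    p = alpha[-1] + (alpha[-2] if S > 1 else 0.0)
    return -math.log(p) if p > 0 else math.inf


def collapse(path, blank=0):
    """Merge repeated symbols, then drop blanks (the CTC output mapping)."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def brute_force_neg_log_likelihood(probs, labels, blank=0):
    """Same quantity by enumerating every path; exponential, tiny inputs only."""
    T, K = len(probs), len(probs[0])
    total = 0.0
    for path in product(range(K), repeat=T):
        if collapse(path, blank) == list(labels):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return -math.log(total) if total > 0 else math.inf


if __name__ == "__main__":
    # Three frames, three symbols (0 = blank); the numbers are made up.
    probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
    print(ctc_neg_log_likelihood(probs, [1, 2]))
    print(brute_force_neg_log_likelihood(probs, [1, 2]))
```

The two printed values should agree up to floating-point error. The forward recursion costs on the order of T times the label length, whereas the enumeration grows exponentially in T, which is why the loss is usable for training at all.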