The Living Thing / Notebooks : Classification (labelling losses, fitting them etc)

Multi-label

Precision, recall and F-scores all extend to multi-label classification, although they can be misleading when the classes are unbalanced.
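To make the imbalance problem concrete, here is a minimal NumPy sketch of micro- and macro-averaged precision, recall and F1 over a binary indicator matrix. The function name `multilabel_prf` and the convention of scoring an undefined ratio as 0 are my own assumptions, not any particular toolkit's API.

```python
import numpy as np


def _safe_div(num, den):
    """Elementwise num / den, returning 0 where den is 0 (an assumed convention)."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)


def multilabel_prf(y_true, y_pred, average="macro"):
    """Precision, recall, F1 for indicator matrices of shape (n_samples, n_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label confusion counts.
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    if average == "micro":
        # Pool the counts over labels before forming the ratios.
        tp, fp, fn = tp.sum(), fp.sum(), fn.sum()
    elif average != "macro":
        raise ValueError("average must be 'micro' or 'macro'")

    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * tp, 2 * tp + fp + fn)
    # For 'macro' this averages the per-label scores; for 'micro' it is a no-op.
    return float(precision.mean()), float(recall.mean()), float(f1.mean())


# One common label and one rare label that the predictor never emits.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
print(multilabel_prf(y_true, y_pred, average="micro"))
print(multilabel_prf(y_true, y_pred, average="macro"))
```

In this toy case the micro-averaged F1 is 8/9 even though the rare label is never predicted, while the macro-averaged F1 drops to 1/2. Which average is appropriate depends on whether the rare labels matter.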

Unbalanced class problems

Matthews correlation coefficient.

Due to Matthews, it seems (Matt75). Worth explaining.
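As a starting point for that explanation: in the binary case the coefficient is usually written in terms of the confusion-matrix counts as \(\mathrm{MCC} = (TP\cdot TN - FP\cdot FN)/\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}\). A minimal sketch follows. The function name `mcc` is hypothetical, and returning 0 when the denominator vanishes is a common convention that I am assuming here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an empty row or column of the confusion matrix gives 0.
    return num / den if den > 0 else 0.0


# 95 positives, 5 negatives, and a classifier that always says "positive":
# accuracy is 0.95, but the coefficient reports no correlation at all.
print(mcc(tp=95, tn=0, fp=5, fn=0))

# A classifier that recovers one of the five negatives.
print(mcc(tp=90, tn=1, fp=4, fn=5))
```

The always-positive case is the point of the exercise: accuracy, precision and recall all look respectable there, while the coefficient is 0.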

ROC/AUC

HaMc83 discuss the AUC in radiology. Supposedly Spac89 introduced it to machine learning, but I haven’t read the article in question.
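For orientation, the AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as one half. A brute-force sketch of that reading is below. The name `auc_pairwise` is hypothetical, and the quadratic pairwise comparison is chosen for clarity rather than efficiency.

```python
import numpy as np


def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs that are ranked correctly."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    # All pairwise score differences, shape (n_pos, n_neg).
    diff = pos[:, None] - neg[None, :]
    # Correctly ordered pairs count 1, tied pairs count 1/2.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


print(auc_pairwise([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Here three of the four positive/negative pairs are ordered correctly, so the result is 0.75.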

Binary cross entropy

I’d better write the form for this, since most ML toolkits are curiously shy about it.

Let \(x\) be the estimated probability and \(z\) be the supervised class label. Then the binary cross entropy loss is

\begin{equation*} \ell(x,z) = -z\log(x) - (1-z)\log(1-x) \end{equation*}

If we are given the logit \(y=\operatorname{logit}(x)\) rather than the probability \(x\) itself, then the numerically stable version is

\begin{equation*} \ell(y,z) = \max\{y,0\} - yz + \log(1+\exp(-|y|)) \end{equation*}
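A minimal NumPy sketch of both forms, as a sanity check. Substituting \(x = 1/(1+e^{-y})\) into the probability form gives \(y - yz + \log(1+e^{-y})\), and rearranging so that the exponent is never positive gives \(\max\{y,0\} - yz + \log(1+\exp(-|y|))\), which is what the second function computes. The function names are my own, not any toolkit's API.

```python
import numpy as np


def bce_from_probs(x, z):
    """Binary cross entropy from probabilities x in (0, 1) and labels z in {0, 1}."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    return -z * np.log(x) - (1.0 - z) * np.log1p(-x)


def bce_from_logits(y, z):
    """Same loss computed from logits y, without ever forming exp of a large number."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.maximum(y, 0.0) - y * z + np.log1p(np.exp(-np.abs(y)))


y = np.array([-3.0, 0.0, 2.0])
z = np.array([0.0, 1.0, 1.0])
x = 1.0 / (1.0 + np.exp(-y))  # sigmoid, fine for moderate logits

# The two forms agree where the naive one is well behaved.
print(np.allclose(bce_from_probs(x, z), bce_from_logits(y, z)))

# The logit form stays finite for extreme logits.
print(bce_from_logits(np.array([1000.0, -1000.0]), np.array([0.0, 1.0])))
```

For the extreme logits in the last line, the sigmoid rounds to exactly 0 or 1 in floating point, so the probability form would take the log of zero, while the logit form simply returns a loss of about 1000 for each confidently wrong prediction.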

Refs

FlHF11
Flach, P., Hernández-Orallo, J., & Ferri, C. (2011) A Coherent Interpretation of AUC as a Measure of Aggregated Classification Performance. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 657–664).
Goro04
Gorodkin, J. (2004) Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry, 28(5–6), 367–374.
Hand09
Hand, D. J. (2009) Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123.
HaMc83
Hanley, J. A., & McNeil, B. J. (1983) A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 148(3), 839–843.
LoJR08
Lobo, J. M., Jiménez-Valverde, A., & Real, R. (2008) AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17(2), 145–151.
Matt75
Matthews, B. W. (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442–451.
Powe07
Powers, D. M. (2007) Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation.
ReWi11
Reid, M. D., & Williamson, R. C. (2011) Information, Divergence and Risk for Binary Experiments. Journal of Machine Learning Research, 12(Mar), 731–817.
Spac89
Spackman, K. A. (1989) Signal Detection Theory: Valuable Tools for Evaluating Inductive Learning. In Proceedings of the Sixth International Workshop on Machine Learning (pp. 160–163). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.