The meeting point of differential privacy, accountability and interpretability:
the *tank detection* story, *clever horses* in machine learning,
and especially the pertinent modern question of working out whether models treat humans fairly when fairness was not a criterion in training them.

To put it another way, there are really two related problems:

- How can I work out what my model is using to tell me what it just told me?
- How can I ensure that my model is “fair” in what it does use?

## Understanding black box models

Much work here; I understand little of it at the moment, but I keep needing to refer to papers here.

- Most frequently I need the link to LIME, a neat method that uses penalised regression to do local model explanations (RiSG16). See their blog post.
- The deep dream “activation maximisation” images could sort of be classified as a type of model explanation, e.g. Multifaceted Feature Visualization (NgYC16).
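
To make the “penalised regression for local explanations” idea concrete, here is a minimal sketch in the spirit of LIME, not the authors' implementation. It perturbs the input around a point of interest, weights each perturbation by its proximity to that point, and fits a weighted linear surrogate with a ridge (L2) penalty. The penalty choice, the Gaussian sampling, the exponential kernel, and the names `local_surrogate`, `scale`, `width` and `ridge` are all assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_surrogate(predict, x0, n_samples=500, scale=0.5, width=1.0,
                    ridge=1e-3, seed=0):
    """Fit a proximity-weighted ridge regression to `predict` around x0.

    Returns (intercept, coefficients); the coefficients are the local
    "explanation" of how each feature moves the prediction near x0.
    """
    rng = random.Random(seed)
    d = len(x0)
    # Accumulate the weighted normal equations (Z'WZ + ridge*I) beta = Z'Wy,
    # where each row of Z is [1, offset of the sample from x0].
    lhs = [[0.0] * (d + 1) for _ in range(d + 1)]
    rhs = [0.0] * (d + 1)
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in range(d)]
        y = predict([xi + di for xi, di in zip(x0, delta)])
        # Exponential kernel: samples near x0 count for more.
        w = math.exp(-sum(di * di for di in delta) / width ** 2)
        z = [1.0] + delta
        for i in range(d + 1):
            rhs[i] += w * z[i] * y
            for j in range(d + 1):
                lhs[i][j] += w * z[i] * z[j]
    for i in range(1, d + 1):  # penalise slopes only, not the intercept
        lhs[i][i] += ridge
    beta = solve_linear(lhs, rhs)
    return beta[0], beta[1:]


def black_box(x):
    """Hypothetical opaque model standing in for a real classifier score."""
    return math.sin(x[0]) + x[1] ** 2


if __name__ == "__main__":
    intercept, coefs = local_surrogate(black_box, [0.0, 1.0])
    print("local intercept:", intercept)
    print("local feature weights:", coefs)
```

The surrogate is only trustworthy near the chosen point; shrinking `scale` and `width` makes it more local at the cost of noisier estimates.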

## Think pieces on unfair models in practice

A visualisation of ML discrimination, by Google staffers (HaPS16).
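
The criterion in HaPS16, equality of opportunity, asks that the true positive rate be the same across groups. A minimal sketch of checking that on held-out predictions follows; the function names and the toy data are hypothetical, and a real audit would also need uncertainty estimates for small groups.

```python
from collections import defaultdict


def true_positive_rates(y_true, y_pred, group):
    """Per-group true positive rate: P(pred = 1 | true = 1, group)."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, group):
        if t == 1:
            positives[g] += 1
            hits[g] += int(p == 1)
    # Groups with no positive examples are omitted rather than guessed at.
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference in true positive rate between any two groups."""
    rates = true_positive_rates(y_true, y_pred, group)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy, made-up data: 1 = favourable outcome, groups "a" and "b".
    y_true = [1, 1, 1, 1, 0, 0, 1, 1]
    y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
    group = ["a", "a", "a", "a", "a", "b", "b", "b"]
    print(true_positive_rates(y_true, y_pred, group))
    print(equal_opportunity_gap(y_true, y_pred, group))
```

A gap of zero means the model satisfies this one criterion on this sample; it says nothing about other notions of fairness.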

Homework problem: What can the following model tell us?

*Automated Inference on Criminality using Face Images* (WuZh16):

> […] we find some discriminating structural features for predicting criminality, such as lip curvature, eye inner corner distance, and the so-called nose-mouth angle. Above all, the most important discovery of this research is that criminal and non-criminal face images populate two quite distinctive manifolds. The variation among criminal faces is significantly greater than that of the non-criminal faces. The two manifolds consisting of criminal and non-criminal faces appear to be concentric, with the non-criminal manifold lying in the kernel with a smaller span, exhibiting a law of normality for faces of non-criminals. In other words, the faces of general law-biding public have a greater degree of resemblance compared with the faces of criminals, or criminals have a higher degree of dissimilarity in facial appearance than normal people.

Oh, and what would you be happy with your local law enforcement authority taking home from this?

## Refs

- AgYu08: Aggarwal, C. C., & Yu, P. S. (2008) A General Survey of Privacy-Preserving Data Mining Models and Algorithms. In C. C. Aggarwal & P. S. Yu (Eds.), Privacy-Preserving Data Mining (pp. 11–52). Springer US.
- AlBe16: Alain, G., & Bengio, Y. (2016) Understanding intermediate layers using linear classifier probes. *arXiv:1610.01644 [Cs, Stat]*.
- Burr16: Burrell, J. (2016) How the machine “thinks”: Understanding opacity in machine learning algorithms. *Big Data & Society*, 3(1), 2053951715622512.
- ChGu05: Chipman, H. A., & Gu, H. (2005) Interpretable dimension reduction. *Journal of Applied Statistics*, 32(9), 969–987.
- DHPR12: Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012) Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 214–226). New York, NY, USA: ACM.
- FFMS15: Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., & Venkatasubramanian, S. (2015) Certifying and Removing Disparate Impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 259–268). New York, NY, USA: ACM.
- HaPS16: Hardt, M., Price, E., & Srebro, N. (2016) Equality of Opportunity in Supervised Learning.
- LLSR16: Lash, M. T., Lin, Q., Street, W. N., Robinson, J. G., & Ohlmann, J. (2016) Generalized Inverse Classification. *arXiv:1610.01675 [Cs, Stat]*.
- Lipt16: Lipton, Z. C. (2016) The Mythos of Model Interpretability. *arXiv:1606.03490 [Cs, Stat]*.
- MFFF16: Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., & Frossard, P. (2016) Universal adversarial perturbations. *arXiv:1610.08401 [Cs, Stat]*.
- NgYC16: Nguyen, A., Yosinski, J., & Clune, J. (2016) Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks. *arXiv Preprint arXiv:1602.03616*.
- RiSG16: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. (pp. 1135–1144). ACM Press.
- Swee13: Sweeney, L. (2013) Discrimination in Online Ad Delivery. *Queue*, 11(3), 10:10–10:29.
- WuZh16: Wu, X., & Zhang, X. (2016) Automated Inference on Criminality using Face Images. *arXiv:1611.04135 [Cs]*.
- ZWSP13: Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013) Learning Fair Representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 325–333).