The meeting point of differential privacy, accountability, interpretability, the *tank detection* story and *clever horses* in machine learning; especially the pertinent modern connection to working out whether models are treating humans fairly when fairness was not a criterion in training them.

To put it another way, there are really two related problems:

- How can I work out what my model is using to tell me what it just told me?
- How can I ensure that my model is “fair” in what it does use?

## Understanding black box models

There is much work here; I understand little of it at the moment, but I keep needing to refer to the papers.

Frequently I need the link to LIME (Local Interpretable Model-agnostic Explanations), a neat method that uses penalised regression to produce local model explanations (RiSG16). See their blog post.
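To make the LIME idea concrete, here is a minimal sketch, assuming a black-box `predict_fn` that maps an `(n, d)` array to `n` scores: perturb the instance with Gaussian noise, weight each perturbation by an exponential kernel on its distance from the instance, and fit a weighted ridge regression whose coefficients serve as the local explanation. This is not the API of the `lime` package; `explain_locally`, `scale`, `kernel_width` and `alpha` are hypothetical names, and RiSG16 additionally select a sparse subset of features, which plain ridge here simplifies away.

```python
import numpy as np


def explain_locally(predict_fn, x, n_samples=500, scale=0.5,
                    kernel_width=0.75, alpha=1e-3, seed=0):
    """Simplified LIME-style local surrogate (hypothetical helper).

    Returns (intercept, coefficients) of a weighted ridge regression fitted
    to predict_fn's outputs on Gaussian perturbations around x.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Sample perturbations around the instance being explained.
    Z = x + scale * rng.standard_normal((n_samples, d))
    y = predict_fn(Z)
    # Exponential kernel: perturbations close to x get more weight.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-dist ** 2 / kernel_width ** 2)
    # Design matrix centred at x, with an intercept column.
    X = np.hstack([np.ones((n_samples, 1)), Z - x])
    # Ridge penalty on the slopes only; the intercept is left unpenalised.
    penalty = alpha * np.eye(d + 1)
    penalty[0, 0] = 0.0
    # Closed-form weighted ridge: (X^T W X + P) beta = X^T W y.
    A = X.T @ (w[:, None] * X) + penalty
    beta = np.linalg.solve(A, X.T @ (w * y))
    return beta[0], beta[1:]
```

For a black box such as `sigmoid(3*z0 - 2*z1)` explained at the origin, the surrogate should give a positive weight to the first feature, a negative weight to the second and roughly zero to any feature the model ignores; the intercept approximates the model's output at the explained point.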

Here’s a thing that was so obvious I assumed it had already been done (KRPH17):

> Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively.
>
> Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from “What is the right fairness criterion?” to “What do we want to assume about the causal data generating process?” Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them.
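What “observational” means in practice: criteria such as demographic parity and equalized odds (the latter from HaPS16) can be computed from samples of prediction, protected attribute and outcome alone, with no reference to the process that generated the data. That is precisely the limitation the abstract points at. A minimal sketch, assuming binary predictions, binary labels and a binary protected attribute; the function names are hypothetical.

```python
import numpy as np


def demographic_parity_gap(y_pred, a):
    """|P(Yhat=1 | A=0) - P(Yhat=1 | A=1)| estimated from samples."""
    y_pred, a = np.asarray(y_pred), np.asarray(a)
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())


def equalized_odds_gaps(y_pred, y_true, a):
    """Absolute gaps in true-positive and false-positive rates between groups."""
    y_pred, y_true, a = map(np.asarray, (y_pred, y_true, a))

    def positive_rate(group, label):
        # P(Yhat=1 | A=group, Y=label)
        mask = (a == group) & (y_true == label)
        return y_pred[mask].mean()

    tpr_gap = abs(positive_rate(0, 1) - positive_rate(1, 1))
    fpr_gap = abs(positive_rate(0, 0) - positive_rate(1, 0))
    return tpr_gap, fpr_gap


a = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(demographic_parity_gap(y_pred, a))       # 0.5
print(equalized_odds_gaps(y_pred, y_true, a))  # TPR gap 0.5, FPR gap 0.5
```

Nothing in either function could tell apart two data-generating processes that produce the same joint distribution, which is why KRPH17 move to causal criteria.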

The DeepDream-style “activation maximisation” images can arguably be classed as a type of model explanation, e.g. multifaceted neuron visualisation (NgYC16).
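The core of activation maximisation is gradient ascent on the *input* rather than the weights: find an input that makes a chosen unit fire strongly, with a regulariser keeping the input bounded. Below is a minimal sketch for a single hypothetical `tanh` layer with a hand-derived gradient and a plain L2 penalty; NgYC16 use much richer priors and also recover several distinct “facets” per neuron, none of which is attempted here. `activation_maximisation`, `lr` and `l2` are illustrative names.

```python
import numpy as np


def activation_maximisation(W, b, unit, steps=200, lr=0.1, l2=0.01, seed=0):
    """Gradient ascent on x to maximise tanh(W @ x + b)[unit] - l2 * ||x||^2.

    W: (n_units, n_inputs) weights of a hypothetical single tanh layer.
    """
    rng = np.random.default_rng(seed)
    # Start from a small random input.
    x = 0.01 * rng.standard_normal(W.shape[1])
    for _ in range(steps):
        pre = W[unit] @ x + b[unit]
        # d/dx tanh(w.x + b) = (1 - tanh^2(w.x + b)) * w;
        # the L2 penalty contributes -2 * l2 * x.
        grad = (1.0 - np.tanh(pre) ** 2) * W[unit] - 2.0 * l2 * x
        x = x + lr * grad
    return x
```

For a single linear-then-`tanh` unit the optimum simply points along that unit's weight vector, which makes this a useful sanity check; with deep networks the same loop runs through automatic differentiation and the resulting images are where the interesting (and contestable) “explanations” come from.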

## Think pieces on unfair models in practice

A visualisation of ML discrimination, by Google staffers (HaPS16).
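HaPS16’s *equality of opportunity* asks for equal true-positive rates across groups, and one way to reach it is post-processing an existing score with group-specific thresholds. A simplified sketch, assuming a real-valued score, binary labels and `0 < target_tpr <= 1`; the paper’s general solution may randomise between thresholds, which this deterministic version omits, and `equal_opportunity_thresholds` is a hypothetical name.

```python
import numpy as np


def equal_opportunity_thresholds(scores, y_true, a, target_tpr=0.8):
    """Per-group score thresholds giving each group a TPR of at least target_tpr.

    Predict positive when score >= thresholds[group]. Ties in scores can push
    the achieved TPR above the target.
    """
    scores, y_true, a = map(np.asarray, (scores, y_true, a))
    thresholds = {}
    for g in np.unique(a):
        # Scores of the truly positive members of group g, highest first.
        pos = np.sort(scores[(a == g) & (y_true == 1)])[::-1]
        if len(pos) == 0:
            raise ValueError(f"group {g!r} has no positive examples")
        # Smallest k with k / n_pos >= target_tpr; threshold at the k-th best score.
        k = max(int(np.ceil(target_tpr * len(pos))), 1)
        thresholds[g.item()] = pos[k - 1]
    return thresholds


scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.6, 0.5, 0.4, 0.3, 0.1]
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(equal_opportunity_thresholds(scores, y_true, a, target_tpr=0.75))
```

Here both groups end up with a true-positive rate of 3/4, but at different cut-offs (0.7 and 0.4). Note that this requires the protected attribute at decision time, which is itself a policy choice.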

Homework problem: What can the following model tell us?

*Automated Inference on Criminality using Face Images* (WuZh16):

> […] we find some discriminating structural features for predicting criminality, such as lip curvature, eye inner corner distance, and the so-called nose-mouth angle. Above all, the most important discovery of this research is that criminal and non-criminal face images populate two quite distinctive manifolds. The variation among criminal faces is significantly greater than that of the non-criminal faces. The two manifolds consisting of criminal and non-criminal faces appear to be concentric, with the non-criminal manifold lying in the kernel with a smaller span, exhibiting a law of normality for faces of non-criminals. In other words, the faces of general law-biding public have a greater degree of resemblance compared with the faces of criminals, or criminals have a higher degree of dissimilarity in facial appearance than normal people.

Oh, and what would you be happy with your local law enforcement authority taking home from this?

## Refs

- AgYu08: Aggarwal, C. C., & Yu, P. S. (2008) A General Survey of Privacy-Preserving Data Mining Models and Algorithms. In C. C. Aggarwal & P. S. Yu (Eds.), *Privacy-Preserving Data Mining* (pp. 11–52). Springer US.
- AlBe16: Alain, G., & Bengio, Y. (2016) Understanding intermediate layers using linear classifier probes. *ArXiv:1610.01644 [Cs, Stat]*.
- BaSe16: Barocas, S., & Selbst, A. D. (2016) Big Data’s Disparate Impact (SSRN Scholarly Paper No. ID 2477899). Rochester, NY: Social Science Research Network.
- Burr16: Burrell, J. (2016) How the machine ‘thinks’: Understanding opacity in machine learning algorithms. *Big Data & Society*, 3(1), 2053951715622512.
- ChGu05: Chipman, H. A., & Gu, H. (2005) Interpretable dimension reduction. *Journal of Applied Statistics*, 32(9), 969–987.
- ChFS16: Choi, K., Fazekas, G., & Sandler, M. (2016) Explaining Deep Convolutional Neural Networks on Music Classification. *ArXiv:1607.02444 [Cs]*.
- DHPR12: Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012) Fairness Through Awareness. In *Proceedings of the 3rd Innovations in Theoretical Computer Science Conference* (pp. 214–226). New York, NY, USA: ACM.
- FFMS15: Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., & Venkatasubramanian, S. (2015) Certifying and Removing Disparate Impact. In *Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining* (pp. 259–268). New York, NY, USA: ACM.
- HaPS16: Hardt, M., Price, E., & Srebro, N. (2016) Equality of opportunity in supervised learning. In *Advances in Neural Information Processing Systems* (pp. 3315–3323).
- KRPH17: Kilbertus, N., Rojas-Carulla, M., Parascandolo, G., Hardt, M., Janzing, D., & Schölkopf, B. (2017) Avoiding Discrimination through Causal Reasoning. *ArXiv:1706.02744 [Cs, Stat]*.
- LLSR16: Lash, M. T., Lin, Q., Street, W. N., Robinson, J. G., & Ohlmann, J. (2016) Generalized Inverse Classification. *ArXiv:1610.01675 [Cs, Stat]*.
- Lipt16: Lipton, Z. C. (2016) The Mythos of Model Interpretability. *ArXiv:1606.03490 [Cs, Stat]*.
- MFFF16: Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., & Frossard, P. (2016) Universal adversarial perturbations. *ArXiv:1610.08401 [Cs, Stat]*.
- NgYC16: Nguyen, A., Yosinski, J., & Clune, J. (2016) Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks. *ArXiv Preprint ArXiv:1602.03616*.
- RiSG16: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. (pp. 1135–1144). ACM Press.
- Swee13: Sweeney, L. (2013) Discrimination in Online Ad Delivery. *Queue*, 11(3), 10:10–10:29.
- WPPA16: Wisdom, S., Powers, T., Pitton, J., & Atlas, L. (2016) Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery. In *Advances in Neural Information Processing Systems 29*.
- WuZh16: Wu, X., & Zhang, X. (2016) Automated Inference on Criminality using Face Images. *ArXiv:1611.04135 [Cs]*.
- ZWSP13: Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013) Learning Fair Representations. In *Proceedings of the 30th International Conference on Machine Learning (ICML-13)* (pp. 325–333).