Model interpretation, fairness and trust

The meeting point of differential privacy, accountability and interpretability; of the tank-detection story and clever-Hans effects in machine learning especially; and of the pressing modern question of working out whether models are treating humans fairly when fairness was not a criterion in training them.

To put it another way, there are really two related problems:

  1. How can I work out what my model is using to tell me what it just told me?
  2. How can I ensure that my model is “fair” in what it does use? (A toy audit sketch follows the list.)
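
A minimal sketch of what checking problem 2 can look like, assuming only access to a model's predictions and a protected-group label. The function names and toy data are made up for illustration; the metrics are demographic parity and an equalized-odds-style gap in the spirit of HaPS16, not a serious fairness audit.

    import numpy as np

    def demographic_parity_gap(y_pred, group):
        """Difference in positive-prediction rates between groups."""
        rates = [y_pred[group == g].mean() for g in np.unique(group)]
        return max(rates) - min(rates)

    def equalized_odds_gap(y_true, y_pred, group):
        """Largest between-group gap in true- or false-positive rates."""
        gaps = []
        for label in (0, 1):  # condition on the true outcome
            mask = y_true == label
            rates = [y_pred[mask & (group == g)].mean() for g in np.unique(group)]
            gaps.append(max(rates) - min(rates))
        return max(gaps)

    # Toy data: a predictor that is slightly more generous to group 1.
    rng = np.random.default_rng(0)
    group = rng.integers(0, 2, size=1000)
    y_true = rng.integers(0, 2, size=1000)
    y_pred = (rng.random(1000) < 0.4 + 0.2 * group).astype(int)

    print("demographic parity gap:", demographic_parity_gap(y_pred, group))
    print("equalized odds gap:", equalized_odds_gap(y_true, y_pred, group))

Measuring such gaps is the easy part; DHPR12, FFMS15 and ZWSP13 below are about building fairness into the model or the representation rather than merely auditing for it after the fact.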

Understanding black box models

There is much work here; I understand little of it as yet, but I keep needing to refer to the papers collected below.
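
For a concrete flavour of one strand of this work, here is a rough sketch of the local-surrogate idea behind RiSG16 (LIME): sample perturbations around a point of interest, query the black box, and fit a proximity-weighted linear model whose coefficients act as a local explanation. The stand-in black_box function, the Gaussian sampling and the kernel width are simplifying assumptions of mine, not the actual LIME implementation.

    import numpy as np

    def black_box(X):
        # Stand-in for an opaque trained model: returns a score per row.
        return np.tanh(2.0 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 2])

    def explain_locally(f, x, n_samples=500, scale=0.3, seed=0):
        """Fit a proximity-weighted linear surrogate to f around the point x."""
        rng = np.random.default_rng(seed)
        X = x + scale * rng.standard_normal((n_samples, x.size))
        y = f(X)
        # Nearby perturbations count more.
        w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * scale ** 2))
        # Weighted least squares with an intercept column.
        A = np.hstack([np.ones((n_samples, 1)), X - x])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
        return coef[1:]  # local feature attributions; intercept dropped

    x0 = np.array([0.5, -1.0, 2.0])
    print("local attributions near x0:", explain_locally(black_box, x0))

The appeal of this family of methods is that they need nothing from the model beyond the ability to query it; the cost is that the explanation is only as good as the local surrogate.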

Think pieces on unfair models in practice

Refs

AgYu08
Aggarwal, C. C., & Yu, P. S. (2008) A General Survey of Privacy-Preserving Data Mining Models and Algorithms. In C. C. Aggarwal & P. S. Yu (Eds.), Privacy-Preserving Data Mining (pp. 11–52). Springer US DOI.
AlBe16
Alain, G., & Bengio, Y. (2016) Understanding intermediate layers using linear classifier probes. arXiv:1610.01644 [Cs, Stat].
Burr16
Burrell, J. (2016) How the machine “thinks”: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512. DOI.
ChGu05
Chipman, H. A., & Gu, H. (2005) Interpretable dimension reduction. Journal of Applied Statistics, 32(9), 969–987. DOI.
DHPR12
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012) Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 214–226). New York, NY, USA: ACM DOI.
FFMS15
Feldman, M., Friedler, S. A., Moeller, J., Scheidegger, C., & Venkatasubramanian, S. (2015) Certifying and Removing Disparate Impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 259–268). New York, NY, USA: ACM DOI.
HaPS16
Hardt, M., Price, E., & Srebro, N. (2016) Equality of Opportunity in Supervised Learning. In Advances in Neural Information Processing Systems 29.
LLSR16
Lash, M. T., Lin, Q., Street, W. N., Robinson, J. G., & Ohlmann, J. (2016) Generalized Inverse Classification. arXiv:1610.01675 [Cs, Stat].
Lipt16
Lipton, Z. C. (2016) The Mythos of Model Interpretability. arXiv:1606.03490 [cs, stat].
MFFF16
Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., & Frossard, P. (2016) Universal adversarial perturbations. arXiv:1610.08401 [Cs, Stat].
NgYC16
Nguyen, A., Yosinski, J., & Clune, J. (2016) Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks. arXiv Preprint arXiv:1602.03616.
RiSG16
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144). ACM Press DOI.
Swee13
Sweeney, L. (2013) Discrimination in Online Ad Delivery. Queue, 11(3), 10:10–10:29. DOI.
WuZh16
Wu, X., & Zhang, X. (2016) Automated Inference on Criminality using Face Images. arXiv:1611.04135 [Cs].
ZWSP13
Zemel, R., Wu, Y., Swersky, K., Pitassi, T., & Dwork, C. (2013) Learning Fair Representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 325–333).