The meeting point of differential privacy, accountability, interpretability, the tank-detection story, and clever horses in machine learning. Closely related: are the models what you would call fair?
There is much work here; I understand little of it at the moment, but I keep needing to refer to the papers.
Frequently I need the link to LIME (RiSG16), a neat method that fits a penalised regression locally to explain individual predictions of a black-box model. See their blog post.
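A minimal sketch of the LIME idea, assuming scikit-learn; `black_box` is a hypothetical stand-in for any trained classifier's `predict_proba`, not the authors' reference implementation:

```python
# Sketch of LIME-style local explanation: perturb the instance, query the
# black box, weight by proximity, fit a sparse linear surrogate.
import numpy as np
from sklearn.linear_model import Lasso

def lime_explain(black_box, x, n_samples=5000, kernel_width=0.75, alpha=0.01):
    """Explain black_box's prediction at x with a locally weighted sparse linear model."""
    rng = np.random.default_rng(0)
    # 1. Perturb the instance with Gaussian noise around it.
    Z = x + rng.normal(scale=0.5, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed points (positive-class probability).
    y = black_box(Z)[:, 1]
    # 3. Weight perturbations by proximity to x (RBF kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Fit a penalised regression locally; its coefficients are the explanation.
    surrogate = Lasso(alpha=alpha)
    surrogate.fit(Z, y, sample_weight=w)
    return surrogate.coef_  # per-feature local importance
```

The coefficients of the sparse surrogate are the "explanation": which features, locally, push the prediction up or down.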
Here’s a thing that was so obvious I assumed it had already been done: KRPH17. From the abstract:
> Recent work on fairness in machine learning has focused on various statistical discrimination criteria and how they trade off. Most of these criteria are observational: They depend only on the joint distribution of predictor, protected attribute, features, and outcome. While convenient to work with, observational criteria have severe inherent limitations that prevent them from resolving matters of fairness conclusively.
>
> Going beyond observational criteria, we frame the problem of discrimination based on protected attributes in the language of causal reasoning. This viewpoint shifts attention from “What is the right fairness criterion?” to “What do we want to assume about the causal data generating process?” Through the lens of causality, we make several contributions. First, we crisply articulate why and when observational criteria fail, thus formalizing what was before a matter of opinion. Second, our approach exposes previously ignored subtleties and why they are fundamental to the problem. Finally, we put forward natural causal non-discrimination criteria and develop algorithms that satisfy them.
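To make “observational” concrete: criteria such as demographic parity or equal opportunity (HaPS16) are functions of the joint distribution of prediction, protected attribute and outcome alone, so they can be estimated from a labelled sample without any causal assumptions. A toy illustration, with hypothetical variable names:

```python
# Two observational criteria, computed from nothing but the joint sample of
# prediction R, protected attribute A and outcome Y.
import numpy as np

def demographic_parity_gap(R, A):
    """|P(R=1 | A=0) - P(R=1 | A=1)| -- depends only on (R, A)."""
    return abs(R[A == 0].mean() - R[A == 1].mean())

def equal_opportunity_gap(R, A, Y):
    """|P(R=1 | A=0, Y=1) - P(R=1 | A=1, Y=1)| -- depends only on (R, A, Y)."""
    return abs(R[(A == 0) & (Y == 1)].mean() - R[(A == 1) & (Y == 1)].mean())

# Hypothetical sample: protected attribute, outcomes, and a prediction biased towards A=1.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=10_000)
Y = rng.integers(0, 2, size=10_000)
R = (rng.random(10_000) < 0.4 + 0.1 * A).astype(int)
print(demographic_parity_gap(R, A), equal_opportunity_gap(R, A, Y))
```

The point of KRPH17 is that two causally very different data-generating processes can produce the same joint distribution, so these quantities alone cannot settle whether the predictor discriminates.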
The deep dream “activation maximisation” images could sort of be classified as a type of model explanation, e.g. multifaceted feature visualisation (NgYC16).
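A minimal sketch of the activation-maximisation recipe behind such visualisations, assuming PyTorch and a tiny randomly initialised network as a stand-in for a trained model (real visualisations add image priors and stronger regularisers, see NgYC16):

```python
# Activation maximisation: gradient ascent on the input to maximise one unit's activation.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

unit = 3                                           # which output unit to visualise
img = torch.randn(1, 3, 64, 64, requires_grad=True)
optimiser = torch.optim.Adam([img], lr=0.05)

for _ in range(200):
    optimiser.zero_grad()
    activation = model(img)[0, unit]
    # Ascend the activation; the small L2 penalty keeps the image from blowing up.
    loss = -activation + 1e-4 * img.pow(2).sum()
    loss.backward()
    optimiser.step()
# `img` is now an input that (weakly) excites unit 3 -- a crude "explanation" of that unit.
```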
Refs
- AgYu08: Charu C. Aggarwal, Philip S. Yu (2008) A General Survey of Privacy-Preserving Data Mining Models and Algorithms. In Privacy-Preserving Data Mining (pp. 11–52). Springer US DOI
- WuZh16: Xiaolin Wu, Xi Zhang (2016) Automated Inference on Criminality using Face Images. ArXiv:1611.04135 [Cs].
- KRPH17: Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, Bernhard Schölkopf (2017) Avoiding Discrimination through Causal Reasoning. ArXiv:1706.02744 [Cs, Stat].
- BaSe16: Solon Barocas, Andrew D. Selbst (2016) Big Data’s Disparate Impact (SSRN Scholarly Paper No. ID 2477899). Rochester, NY: Social Science Research Network
- FFMS15: Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, Suresh Venkatasubramanian (2015) Certifying and Removing Disparate Impact. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 259–268). New York, NY, USA: ACM DOI
- Swee13: Latanya Sweeney (2013) Discrimination in Online Ad Delivery. Queue, 11(3), 10:10–10:29. DOI
- HaPS16: Moritz Hardt, Eric Price, Nati Srebro (2016) Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems (pp. 3315–3323).
- DHPR12: Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Richard Zemel (2012) Fairness Through Awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 214–226). New York, NY, USA: ACM DOI
- LLSR16: Michael T. Lash, Qihang Lin, W. Nick Street, Jennifer G. Robinson, Jeffrey Ohlmann (2016) Generalized Inverse Classification. ArXiv:1610.01675 [Cs, Stat].
- Burr16: Jenna Burrell (2016) How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512. DOI
- KlMR16: Jon Kleinberg, Sendhil Mullainathan, Manish Raghavan (2016) Inherent Trade-Offs in the Fair Determination of Risk Scores.
- ChGu05: Hugh A. Chipman, Hong Gu (2005) Interpretable dimension reduction. Journal of Applied Statistics, 32(9), 969–987. DOI
- WPPA16: Scott Wisdom, Thomas Powers, James Pitton, Les Atlas (2016) Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery. In Advances in Neural Information Processing Systems 29.
- ZWSP13: Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork (2013) Learning Fair Representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13) (pp. 325–333).
- NgYC16: Anh Nguyen, Jason Yosinski, Jeff Clune (2016) Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks. ArXiv Preprint ArXiv:1602.03616.
- Mico17: Thomas Miconi (2017) The impossibility of “fairness”: a generalized impossibility result for decisions.
- Lipt16: Zachary C. Lipton (2016) The Mythos of Model Interpretability. In arXiv:1606.03490 [cs, stat].
- AlBe16: Guillaume Alain, Yoshua Bengio (2016) Understanding intermediate layers using linear classifier probes. ArXiv:1610.01644 [Cs, Stat].
- MFFF16: Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard (2016) Universal adversarial perturbations. In arXiv:1610.08401 [cs, stat].
- RiSG16: Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144). New York, NY, USA: ACM DOI