The Trouble with Bias (Kate Crawford, NIPS 2017)

Kate Crawford gave a widely noted talk at the NIPS conference, and it is now online. It ties together data, bias, machine learning and the social sciences.

Here is Kate's announcement on Twitter:

My NIPS talk ‘The Trouble with Bias’ is now up on the 'tubes. About the politics of classification, and limits of focusing on allocative harms not representational harms. The desire for a quick ‘technical fix’ to bias could actually do more damage.

Thomas G. Dietterich wrote a useful summary of it in a few tweets:

@katecrawford makes many very important points. Here is my attempt to translate some of them into engineering terminology. 1/
As ML researchers and practitioners, we tend to formulate the problem as learning a mapping from inputs X to outputs Y. Crawford’s point about bias in the data is that we aren’t observing the real inputs or the real outputs 2/
For example, in crime prediction, X is not a sample from the population but instead from the subset of the population investigated by the police. 3/
Y is not the true outcome (e.g., did the person commit a crime), but the conclusion of the legal system. 4/
So a model predicting Y from X is predicting how the legal system will treat a person X that they have chosen to investigate. It is not predicting whether a person X’ drawn from the general population is guilty (Y’). 5/
To model X’->Y’, we need to model the sampling bias of X and the legal bias of Y. We have tools to help with this, so in some sense there is an “engineering solution”, but it requires good models. The experts on those kinds of models are social scientists 6/
Now let’s consider optimizing a policy to achieve some goal. One interpretation of @katecrawford's remarks on representational harm is that we need to think carefully about the optimization objective. 7/
In building a system for Extreme Vetting, for example, the proximal “customer” is the government, and we tend to focus on the costs to the government of false positive and false negative errors. 8/
But she is telling us to think more broadly about the other “customers”, namely the people being vetted and the broader society. We must consider the short- and long-term costs of the vetting process and the false positive and false negative errors on these customers, too. 9/
Modeling those is very challenging, and again, the people with the most expertise are social scientists. The indignity of being “vetted”, for example, is itself a cost. 10/
AI tools work by optimizing an objective. If the objective is wrong, the resulting system will be wrong. As with all software engineering efforts, it is important to engage with all of the stakeholders to ensure that we have the correct specifications before optimizing. 11/
Formulating the right specifications is the joint responsibility of the software engineers and the stakeholders. Computer scientists can’t expect that other people will do this for us nor that we can do it alone. end/
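
Tweets 3/ to 6/ can be made concrete with a toy simulation. The sketch below is my own illustration, not something from the talk or the thread. Every number in it is an assumption: two hypothetical groups A and B share the same true offence rate, but group B is investigated four times as often. It also assumes the legal system records investigated cases perfectly, so it only models the sampling bias of X, not the legal bias of Y.

```python
import random

# All numbers below are illustrative assumptions, not data from the talk.
TRUE_RATE = 0.05                          # same true offence rate in both groups
P_INVESTIGATE = {"A": 0.10, "B": 0.40}    # group B is investigated 4x as often


def simulate(n=100_000, seed=0):
    """Return one (group, investigated, recorded) tuple per simulated person."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        offended = rng.random() < TRUE_RATE                  # Y': never observed directly
        investigated = rng.random() < P_INVESTIGATE[group]   # selects X out of X'
        recorded = investigated and offended                 # Y: what ends up in the data
        people.append((group, investigated, recorded))
    return people


def recorded_rate(people, group):
    """Recorded offences per member of the group (the naive estimate)."""
    members = [p for p in people if p[0] == group]
    return sum(p[2] for p in members) / len(members)


def reweighted_rate(people, group):
    """Horvitz-Thompson style estimate: weight each record by 1 / P(investigated)."""
    members = [p for p in people if p[0] == group]
    weight = 1.0 / P_INVESTIGATE[group]
    return sum(weight for p in members if p[2]) / len(members)


people = simulate()
for g in ("A", "B"):
    print(g, round(recorded_rate(people, g), 4), round(reweighted_rate(people, g), 4))
```

In the recorded data, group B looks several times more "criminal" than group A even though the true rates are identical by construction. The reweighting recovers something close to the true rate only because the simulation hands us the investigation probabilities. In real life those probabilities are exactly the "good models" that Dietterich says we need social scientists for.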


Really interesting and important!
It puts the social sciences back at the heart of the reasoning, and it pushes us to break with a somewhat naive form of solutionism.
Something to keep in mind in our professional practice…