Sunday, August 6, 2017

Testing for discrimination in college admissions

Recently, the Trump administration’s investigation into racial discrimination in college admissions has brought the topic back into the news. But the claim that some races need higher GPAs or SAT scores to be admitted to colleges is, of course, an old one. This post discusses the statistical subtleties involved in proving such a claim: specifically, I examine some of the arguments that Asian applicants need higher SAT scores than white applicants. To be open about my beliefs at the outset, I think that colleges probably do discriminate against Asians, as they once discriminated against Jews, but the statistical arguments made to prove discrimination are often flawed. This also describes my beliefs about discrimination more broadly: while it is pervasive, quantifying it statistically is hard.

We’re going to use a hypothetical example where only whites and Asians apply for admission, Asians tend to have higher SAT scores than whites, and the only thing that actually affects whether you get admitted is your SAT score. So in this hypothetical example, there is no discrimination; your race does not affect your chances of admission.
On the left, I show the scores for Asian applicants and white applicants. On the right, I show how your probability of admission depends on your SAT score. So someone with an SAT score of 1400 has about a 50% chance of admission, regardless of whether they’re white or Asian. Given that there’s no discrimination in our hypothetical example, if a statistical argument implies there is discrimination, that argument is flawed. So let’s take a look at some arguments.

The most common argument I’ve seen that Asians are discriminated against is that the SAT scores of admitted Asians are higher than SAT scores of admitted whites. But Kirabo Jackson, an economist at Northwestern University, points out the flaw in this argument. In our hypothetical example, where there is no discrimination, admitted Asians will have an average score of about 1460, and admitted whites will have an average score of about 1310. This happens because the Asian distribution is shifted to the right: even though a kid with a 1500 is equally likely to get in regardless of whether they’re white or Asian, there are more Asians with 1500s.
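Here is a minimal sketch of that point in Python. It is not the code behind the plots in this post: the group means, the spread, and the steepness of the admission curve are all assumed for illustration, so the averages it prints will not exactly match the numbers quoted above. The only thing taken from the example is that admission depends on SAT score alone, with about a 50% chance at 1400.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (assumptions, not taken from the post).
N = 200_000                                   # applicants per group
ASIAN_MEAN, WHITE_MEAN, SD = 1400.0, 1300.0, 100.0
MIDPOINT, SCALE = 1400.0, 50.0                # 50% chance of admission at 1400


def p_admit(sat):
    """Race-blind admission probability: depends on SAT score only."""
    return 1.0 / (1.0 + np.exp(-(sat - MIDPOINT) / SCALE))


def simulate(mean):
    """Draw SAT scores for one group and apply the race-blind admission rule."""
    sat = np.clip(rng.normal(mean, SD, N), 400, 1600)
    admitted = rng.random(N) < p_admit(sat)
    return sat, admitted


asian_sat, asian_adm = simulate(ASIAN_MEAN)
white_sat, white_adm = simulate(WHITE_MEAN)

# The naive comparison: average score among admitted students, by group.
asian_mean_admitted = asian_sat[asian_adm].mean()
white_mean_admitted = white_sat[white_adm].mean()
print(f"mean SAT of admitted Asians: {asian_mean_admitted:.0f}")
print(f"mean SAT of admitted whites: {white_mean_admitted:.0f}")


def near_1500(sat):
    """Mask for applicants scoring within 10 points of 1500."""
    return np.abs(sat - 1500) < 10


# The fair comparison: admission rates for applicants with (nearly) equal scores.
asian_rate_1500 = asian_adm[near_1500(asian_sat)].mean()
white_rate_1500 = white_adm[near_1500(white_sat)].mean()
print(f"admit rate near 1500, Asians: {asian_rate_1500:.2f}")
print(f"admit rate near 1500, whites: {white_rate_1500:.2f}")
```

Under these assumptions the admitted-student averages differ noticeably between the groups, while the admission rates near 1500 agree up to sampling noise. The gap in averages comes entirely from the shifted score distribution.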

When I ran this argument by a friend, he said that the study which people often cite when claiming Asians are discriminated against is considerably more sophisticated. So I read the study, and it is more sophisticated; it’s worth reading. They fit a model where they simultaneously control for someone’s race and SAT score, which lets you see whether people of some races need higher scores to get in.

Here’s the subtlety. The paper doesn’t actually look at SAT scores, but SAT scores divided into bins from 1200 - 1300, 1300 - 1400, and so on. Within those bins, the paper’s model assumes all applicants should have an equal chance of admission (all else being equal). But that isn’t quite right: an applicant with a 1290 will have a higher chance of admission than an applicant with a 1210. And because Asians are right-shifted in our example, that means that Asians in the 1200 - 1300 bin will have higher scores, and a higher chance of admission, than whites in the 1200 - 1300 bin, even though the paper’s model assumes that applicants in that bin should be equal if there is no discrimination. Below is a plot which illustrates the idea. Within each score bin, Asians (red line) have a higher average SAT score (left plot), and thus a higher chance of admission (right plot), than whites in the same bin (blue line).


So what happens when we fit the paper’s model on our hypothetical data? Now we find discrimination against whites. This happens because the blue lines are below the red lines: whites in a bin have a lower chance of admission than Asians in a bin because they have lower average scores. So the paper’s model will incorrectly conclude that, controlling for SAT score, whites have about 20% lower odds of admission, a significant amount of discrimination. I should note it’s entirely possible that the authors fit other models that don’t bin SAT scores, although I couldn’t find those models mentioned in the paper [1]; please point me to anything I’ve missed.
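To make the binning artifact concrete, here is a rough sketch that fits two logistic regressions to simulated race-blind data: one with SAT scores cut into 100-point bins plus a race indicator, and one with the raw score plus a race indicator. This is my own illustration, not the paper’s specification. The distributions, the admission curve, the open-ended bins at either end, and the small hand-rolled `fit_logistic` helper are all assumptions, and the size of the spurious effect depends on them.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000  # applicants per group (hypothetical)


def p_admit(sat):
    """Race-blind admission probability (assumed logistic curve, 50% at 1400)."""
    return 1.0 / (1.0 + np.exp(-(sat - 1400.0) / 50.0))


# Assumed score distributions: Asians shifted 100 points to the right.
asian_sat = np.clip(rng.normal(1400, 100, N), 400, 1600)
white_sat = np.clip(rng.normal(1300, 100, N), 400, 1600)
sat = np.concatenate([asian_sat, white_sat])
white = np.concatenate([np.zeros(N), np.ones(N)])   # 1 = white, 0 = Asian
admitted = (rng.random(2 * N) < p_admit(sat)).astype(float)


def fit_logistic(X, y, iters=25):
    """Plain logistic regression fitted with Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        hessian = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta


# Model A: one dummy per SAT bin (<1200, 1200-1300, ..., >=1500) plus race.
edges = np.array([1200, 1300, 1400, 1500])
bin_idx = np.digitize(sat, edges)                   # values 0..4
X_binned = np.column_stack([np.eye(len(edges) + 1)[bin_idx], white])
beta_binned = fit_logistic(X_binned, admitted)
white_or_binned = np.exp(beta_binned[-1])

# Model B: intercept, continuous (rescaled) SAT score, and race.
X_cont = np.column_stack([np.ones_like(sat), (sat - 1400.0) / 100.0, white])
beta_cont = fit_logistic(X_cont, admitted)
white_or_cont = np.exp(beta_cont[-1])

print(f"odds ratio for whites, binned SAT:     {white_or_binned:.2f}")
print(f"odds ratio for whites, continuous SAT: {white_or_cont:.2f}")
```

With these assumed inputs, the binned model should report an odds ratio below 1 for whites even though the simulated admissions ignore race, while the continuous-score model should report an odds ratio close to 1.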

Okay. So we took hypothetical data that had no discrimination. One widely repeated statistical argument shows discrimination against Asians. Another widely repeated statistical argument shows discrimination against whites. This isn’t good. The basic mathematical takeaway is that when races have different distributions over a variable (like SAT score) and you divide that variable into bins, you can get misleading results. (See the literature on infra-marginality for interesting discussions of related phenomena in tests for police discrimination.)

The broader takeaway is that testing for discrimination is really hard. Which isn’t to say you should discount all evidence that it occurs; you should just be mindful of the caveats. Also, these statistical problems are tricky and fun to think about, so you should come work with me on them.

Footnotes:

[1] One of the authors went on to write a book on the topic, the one cited in the lawsuit against Harvard; I took a look at the relevant chapter, and it seems to use a similar binning strategy for SAT scores. To be clear, just because a model has caveats worth discussing doesn’t mean the work is bad or the conclusions are wrong; indeed, the book appears to be impressively comprehensive. Also, our hypothetical example actually suggests that this model might underestimate the amount of discrimination against Asians.

6 comments:

  1. If one did want to work on tricky and fun statistical problems what would one need to do?

  2. Isn't the solution to look at the minimum SAT score among the accepted Asians and Whites? If admissions were based on SAT score alone (with some threshold for determining acceptance), we'd expect it to be equal for both. If it is higher for Asians, it would imply discrimination against Asians (and vice versa).

    Of course, the minimum is highly sensitive to outliers, so we would have to look at average SAT score of the last five or something.

    1. The "SAT-only" model is deliberately oversimplified. Thus the possibility of getting in with a low score. Populations may (and do) differ in those other factors as well.

      Decades ago, I had an inside look at an admissions office, and the first pass was a*(Grade Point Average) + b*(test score). Except what should a and b be set to? In practice, b had been raised because males got better test scores, but worse grades, and if the two sources of information were weighted more equally, the class would be too skewed by gender. (Yes, this means that in practice, *men* were benefiting from affirmative action, at least compared to earlier years.) A statistical test might show how the two inputs were weighted, but wouldn't provide any insight into why, or whether those weights were reasonable.

      By the time you looked at race, the sample size for whites was still large enough that GPA+test told me who was "probably in", "on the edge", or "out unless they had a story to make them interesting". The sample size for minorities was small enough that it would have been unreasonable to assume the GPA+test model was more than a first pass.

      And if you assumed that those beneath the "white" line were *all* given special treatment (and ignore the whites who also got in with lower grades/test scores, but an interesting story or good recommendation), it was still less than the number of "extra" students that the profs put up with to get a more interesting/diverse class -- but would not have accepted for yet another generic coin-flip level student. So even if you declare them insufficiently qualified, they still weren't taking the place of another student. Offhand, I'm not sure how to model potential extra seats that appear *only* when they contradict the simpler model.

  3. Recently I read an article about our kids being taught at school to be perpetually sorry. That means that despite the fact that we are tolerant enough, we have to be even more tolerant and literally walk on eggshells sometimes. I resent this principle. I think that all people deserve respect, especially the young ones who are reaching towards knowledge. If someone is still discriminated against during the admission process, he or she would do better to find college essay online and consider studying online. Thanks for sharing the news.
