Essays Get Automated Scores... Why Not Report Them in the Test Center?

Right after you take the computer-based GRE, unofficial scores are available for Quant and Verbal but not Analytical Writing. You won’t learn how you did on the GRE’s essay section until your official scores come out about two weeks later.

Yet a mere two milliseconds is enough time to score your essays with e-rater, the essay evaluation software used by ETS. So why doesn’t ETS automatically calculate an unofficial score for Analytical Writing just as it does for Quant and Verbal? The answer likely has to do with the role e-rater plays in your official GRE writing score.

E-rater has been burning through GRE essays since 2008. As of 2015, the automated scoring software has yet to assign an official Analytical Writing score. Human readers retain that job. E-rater’s role is just to provide a “check score.” Here’s how a single essay is scored:

Using a half-point scale of 0 to 6, e-rater and a human score your essay. If the human score falls within a half-point of the e-rater score, you’re assigned the human score.
Otherwise, a second human evaluates your essay using the same 0 to 6 scale, and you’re assigned the average of the human scores, rounded to the nearest half-point. (In less than 5% of cases, scoring will require a third or even fourth human rater.)

Imagine Amy and Bibi are two humans who score GRE essays for ETS. Suppose Amy gives your Issue essay a 4, and e-rater generates a score from 3.5 to 4.5, inclusive. Amy’s score stands. But what if e-rater’s score falls below 3.5 or above 4.5? Bibi scores your Issue essay. Say she gives it a 5. Then your final Issue essay score is the average of 4 (Amy’s score) and 5 (Bibi’s score), which is 4.5; no rounding is needed here because the average already lands on a half-point. Your Argument essay is likewise scored by e-rater plus one or more humans. Your official Analytical Writing score is the average of the final scores for your two essays.
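For readers who think in code, the check-score procedure described above can be sketched in a few lines of Python. This is only an illustration of the rules as this post states them, not ETS's actual software. The function names (`round_to_half`, `essay_score`) are made up for this example, rounding ties upward is an assumption, and the rare cases that need a third or fourth human rater are not modeled.

```python
import math
from typing import Optional


def round_to_half(score: float) -> float:
    """Round to the nearest half-point (ties round up, by assumption)."""
    return math.floor(score * 2 + 0.5) / 2


def essay_score(first_human: float, e_rater: float,
                second_human: Optional[float] = None) -> float:
    """Final score for one essay under the check-score rules described above.

    Hypothetical helper: e-rater only checks the first human's score.
    """
    # Within a half-point of e-rater: the first human's score stands.
    if abs(first_human - e_rater) <= 0.5:
        return first_human
    # Otherwise a second human is required, and the two human scores
    # are averaged and rounded to the nearest half-point.
    if second_human is None:
        raise ValueError("e-rater disagrees; a second human score is needed")
    return round_to_half((first_human + second_human) / 2)


# Amy gives a 4 and e-rater gives a 4.5: Amy's score stands.
print(essay_score(4.0, 4.5))
# Amy gives a 4, e-rater gives a 5, so Bibi is called in and gives a 5.
print(essay_score(4.0, 5.0, second_human=5.0))
```

Notice that e-rater's number never enters the average. In this sketch it only decides whether a second human gets involved.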

Back to our question: Why doesn’t an e-rater score accompany the automated unofficial Quant and Verbal scores that you get in the test center? The short answer is that your official Quant and Verbal scores are automated, whereas your official Analytical Writing score is not.

The long answer, I suspect, is that thousands of official scores wouldn’t match the scores from e-rater. For about 5% of Argument essays, the e-rater score differs from the human score by more than 1 point, according to a 2012 report from ETS. Even if the e-rater score equals the human score for the Issue essay, the overall Analytical Writing score from e-rater may not equal the official human-assigned score. Now, imagine those scores weren’t equal for 5% of test takers, and ETS reported the unofficial e-rater score on test day. Around 28,000 examinees in 2013–2014 would’ve received an official Analytical Writing score that was higher or (*cringe*) lower than what was presented in the test center.

Still, let’s not ignore the bigger, better part of e-rater’s scoring record. About 95% of the time, e-rater’s score for the Argument essay is within 1 point of the human reader’s. That percentage rises to 97% for the Issue essay. Does that make e-rater good enough to use for unofficial scores? Maybe not. But how about good enough to use for practice scores? Maybe so. If you want to try e-rater, you can submit your PowerPrep II practice essays to ETS’s ScoreItNow! writing practice service. These two official resources are about as close as you’ll get to the real exam outside the test center.
