When LSAC added sections from the May 2020 LSAT-Flex administration to its Prep Plus package, it allowed us to see exactly how they score a Flex test. Although the Flex version of the test is no longer with us, this data is still relevant to the new LSAT. Let’s take a look at LSAC’s official scoring scale for the May 2020 Flex test:
*** There is no raw score that will produce this scaled score for this form.
Note: This scoring scale is specific to the May 2020 LSAT-Flex, not any other Flex exams.
This Flex scale has a number of intriguing features, so let’s take a closer look!
The Scale is Loose at the Top
The scale is fairly loose at the upper end. Answering 67 of 76 correctly (-9) earned a 170, which is the 4-section equivalent of missing 12. That’s far from extraordinary, but it is on the more generous end of possible curves. In fact, only two other tests since December 2016 have been that loose or looser. If you want to see all of the LSAT score conversions (from June ’91 to the most recently released scale), we have a comprehensive chart here: LSAT Raw Score Conversion.
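The 4-section equivalents quoted in this article appear to come from rescaling a Flex raw score proportionally onto a full-length test. Here is a minimal Python sketch of that conversion, assuming a 101-question regular test (the article cites a typical 101-102); the constants and function names are our own hypothetical ones, not anything LSAC publishes.

```python
FLEX_QUESTIONS = 76        # scored questions on the May 2020 LSAT-Flex
FULL_TEST_QUESTIONS = 101  # assumption: a typical four-section LSAT

def four_section_equivalent(flex_raw: int, full: int = FULL_TEST_QUESTIONS) -> float:
    """Rescale a Flex raw score proportionally onto a full-length test."""
    if not 0 <= flex_raw <= FLEX_QUESTIONS:
        raise ValueError(f"Flex raw score must be between 0 and {FLEX_QUESTIONS}")
    return flex_raw * full / FLEX_QUESTIONS

def missed_equivalent(flex_raw: int, full: int = FULL_TEST_QUESTIONS) -> float:
    """Questions missed on the Flex, rescaled to a full-length test."""
    return full - four_section_equivalent(flex_raw, full)

# 67 of 76 (-9) earned a 170 in May
print(f"-9 on the Flex ~ -{missed_equivalent(67):.0f} on a 4-section test")

# Other Flex raw scores cited in this article, against 101 and 102 questions
for raw in (16, 21, 33, 40, 46, 54):
    low, high = four_section_equivalent(raw), four_section_equivalent(raw, 102)
    print(f"{raw}/76 -> {low:.1f}-{high:.1f} of a full test")
```

Rescaling against 102 questions instead raises each equivalent by less than one question, which is why figures like these are best read as approximate ranges.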
The Scale is Extremely Loose in the Middle
The scale is unprecedentedly forgiving in the middle! Let’s look at some score markers through the mid-range for reference:
- 145: 33 correct answers produced a 145 on the May LSAT. That translates to a regular (traditional four-section) raw score of roughly 44 questions, one lower than any previously seen (a handful of past tests, mostly from the ’90s, allowed a 145 with 45 questions correct). On recent tests, a 145 typically requires 47-50 questions correct, so this Flex offered a 3-6 question advantage to 145 scorers.
- 150: 40 correct answers produced a 150 in May. Equated to a regular test, that comes to approximately 52-53 correct for the same result. How far from normal is that? 53 right for a 150 has happened only three times in history, and not once in the past 27 years. And 52 correct for a 150 has never happened! A far more typical range is 56-58, and it has often reached 59-61, so again test takers in the mid-range were allowed roughly three or four more missed questions for the same 150 result in May.
- 155: 46 correct answers produced a 155 in May. Converted to a four-section raw score, that means 61-62 right answers for the same 155 outcome, another historically low requirement that ties the record set by, and held since, the June 1992 LSAT! Test takers these days can expect to need 64-66 correct answers to reach a 155, a bar 2-5 questions higher than the one in May.
- 160: 54 correct answers produced a 160 in May. Translated, that’s approximately 71-72 correct on a regular LSAT, a number nearly matched only three times in the last ten years: December 2013, June 2014, and December 2017 each required 72 right for a 160. So, as with the 145-155 figures, the 160 is an outlier as well!
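To make the mid-range comparison concrete, here is a small Python sketch that takes the figures quoted in the list above (the 4-section equivalent and the typical recent requirement for each score) and computes the smallest and largest number of extra misses the May scale allowed. The data layout and function name are illustrative assumptions; the 160 marker is left out because no typical range is given for it.

```python
# Figures quoted above, per scaled score:
# (4-section equivalent low, high, typical recent requirement low, high)
MARKERS = {
    145: (44, 44, 47, 50),
    150: (52, 53, 56, 58),
    155: (61, 62, 64, 66),
}

def advantage(eq_low: int, eq_high: int, typ_low: int, typ_high: int) -> tuple[int, int]:
    """Smallest and largest number of extra misses the May scale allowed."""
    # Smallest gap: easiest typical test vs. the stricter end of the May equivalent
    # Largest gap: hardest typical test vs. the looser end of the May equivalent
    return typ_low - eq_high, typ_high - eq_low

for score, figures in MARKERS.items():
    fewest, most = advantage(*figures)
    print(f"{score}: {fewest}-{most} more questions could be missed")
```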
The Scale Tightens Up at the Bottom
The scale is historically unique at the low end, too, but oddly, by being tighter than ever! This clearly won’t affect most people, but requiring 16 correct in May to achieve a 121 is the equivalent of needing at least 21 right on a 4-section test, and that has never happened before (the most ever needed previously was 20, and these days it’s more commonly closer to 18). Looking at a 131, in May you needed 21 right, the equivalent of 28 on a typical full-length test, and only one test, June 2005, has ever required more correct answers for that score.
So as forgiving as the curve was for higher scorers, lower scorers were forced to answer more questions correctly than ever to get the same results.
Why Was the Scale More Forgiving Overall?
So what led to the overall looseness of the curve? Let’s explore some plausible factors.
- I’ll start by dispelling what I believe to be a myth: that LSAC was being charitable, lobbing out a softball for the first Flex. The test makers adhere to a strict curve built around percentiles, so while needing fewer correct answers for a given score feels like a gift, it isn’t. Something about this test, not its creators, made a favorable calibration necessary to satisfy those percentiles, shifting people up the scale.
- Softer scales reflect poorer test-taker performance, and traditionally that has meant more difficult test content. But the nature of the LSAT-Flex, especially the first-ever LSAT-Flex in May, shouldn’t be overlooked! For one, it seems reasonable to conclude that many test takers, particularly those in the 145-160 mid-range where the scale was loosest, were penalized by seeing 50% less Logical Reasoning than on prior exams. Perhaps they lost the LR content that would previously have given them a boost. Perhaps it was the increased emphasis on Reading Comp and Games. Maybe it was some combination of both. Regardless, the Flex construction clearly doesn’t play to everyone’s skill set. Second, the unfamiliar experience of digital testing at home could have hurt performance; after all, taking a test as difficult as the LSAT in a novel format is liable to be unsettling.
The takeaway, then, is that people answered fewer questions correctly than usual across score ranges from about 140 and up. The most severe performance drops occurred through the middle scores of 145-160, with a smaller dip, closer to a single raw point below average, at 165+. Fortunately, the curve appears to have functioned exactly as intended: it softened scoring requirements enough to offset any ill effects of the test itself.
So Many Missing Scores
A quick glance at the conversion chart shows that an unusually high number of 120-180 outcomes were impossible to achieve for the May LSAT-Flex, with nine total results missing from the scale. This too deserves discussion.
First, every LSAT-Flex is likely to exhibit this tendency. With only 77 possible raw scores (0 through 76 correct), versus a raw range of 0 to 101 or 102 on a typical test, there are simply fewer raw outcomes to convert into scaled scores from 120 to 180. And because many raw scores produce the same scaled score (see any score in the chart whose low and high raw numbers differ), some scaled results are naturally going to be left out. So no surprises there!
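Finding the missing and duplicated scores in a scale is mechanical. Here is a minimal Python sketch, assuming the conversion chart is available as a dictionary from raw score to scaled score; the May chart itself isn’t reproduced here, so the example uses a made-up, hypothetical 10-question scale purely for illustration, and `audit_scale` is our own name.

```python
from collections import Counter

SCALED_SCORES = range(120, 181)  # the 61 possible LSAT scaled scores

def audit_scale(conversion: dict[int, int]) -> tuple[list[int], dict[int, int]]:
    """Return (missing scaled scores, {scaled score: number of raw scores producing it})."""
    counts = Counter(conversion.values())
    # Counter returns 0 for absent keys, so unreachable scores show up here
    missing = [s for s in SCALED_SCORES if counts[s] == 0]
    # Scaled scores produced by more than one raw score
    duplicates = {s: n for s, n in sorted(counts.items()) if n > 1}
    return missing, duplicates

# Hypothetical 10-question scale, NOT the May 2020 chart
TOY_SCALE = {0: 120, 1: 120, 2: 120, 3: 123, 4: 135, 5: 150,
             6: 150, 7: 165, 8: 178, 9: 180, 10: 180}

missing, duplicates = audit_scale(TOY_SCALE)
print(len(missing), "missing; duplicates:", duplicates)
```

The same counting explains the May chart: once raw scores 0-15 all map to 120, the remaining 61 raw scores have only 60 scaled scores (121-180) to cover, so any more than one duplicate above 120 forces at least one skipped score.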
The Bell Curve At Work
As for where those absent results appear, all of the impossible scores occur below 138 or above 175, that is, in the bottom 7% and top 0.5% of scores. Why? There are so few people in those two score zones that differences of even a single question answered correctly or incorrectly are magnified when raw scores are converted. Right and wrong become increasingly meaningful when there is less competition; in other words, it’s easier to separate yourself from fewer people. And sometimes there are so few people that a single answer, right or wrong, separates them not by one point but by two, producing the skipped scores in question.
This is the inverse of why we see most duplicate scores—the same score produced by different raw question counts—in the mid-range: there are so many people piled up near the center of the bell curve that a single extra question right or wrong may not sufficiently distinguish them from one another, or at least not enough to warrant awarding them different final scores.
The Other Missing Score Imbalance…
The other notable trait of the impossible scores is their distribution: there are far fewer missing scores in the 160-180 zone (2 missing: all scores possible except 176 and 178) than in the 120-140 range (7 missing: 122, 124, 126, 128, 130, 133, and 137). Two facts reconcile this apparent discrepancy. The first is that far more people score in the 160-180 range than in the 120-140 range: about 25% of test takers score at or above 160, while only about 11% land in the 120-140 range. So there is a far larger pool to differentiate within the top 21 scores than within the bottom 21, and a better ability to distinguish individual scorers in a meaningful way.
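To put the sample-size point in numbers, a quick Python sketch can spread each band’s share of test takers across its 21 scaled scores. The cohort size below is a hypothetical round number chosen only for illustration; the 25% and 11% shares are the ones quoted above.

```python
def people_per_score(share: float, cohort: int, n_scores: int = 21) -> float:
    """Average number of test takers per scaled score in a band of n_scores."""
    return share * cohort / n_scores

COHORT = 10_000  # hypothetical number of test takers, for illustration only

top = people_per_score(0.25, COHORT)     # 160-180 band
bottom = people_per_score(0.11, COHORT)  # 120-140 band
print(f"{top:.0f} vs {bottom:.0f} test takers per score ({top / bottom:.2f}x)")
```

The ratio does not depend on the cohort size: the top band holds roughly 2.3 times as many test takers per scaled score as the bottom band.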
…Is Less Imbalanced Than It Looks
A further explanation, beyond sheer volume, is that every raw score from 0 to 15 resulted in a 120, turning the lowest 16 of the 77 raw outcomes into the same (duplicate) scaled outcome. At 160+, duplicate scaled scores occur only four times (161, 163, 165, and 180). Combining duplicates (like 0 through 15 all for a 120, or 61 and 62 for a 165) and doing some quick math tells us that we’re really looking at 14 distinct raw groupings for scores of 120-140 and 19 for scores of 160-180. That five-grouping surplus is what allows more scaled scores to be reached near the top, and it accounts exactly for the 7 vs. 2 missing-score tallies low vs. high.
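Because each band spans 21 scaled scores, the number of distinct raw groupings in a band equals the number of scaled scores reachable in it, which is 21 minus that band’s missing scores. Here is a short Python sketch of both ways to count; the function names are our own, and the direct count is demonstrated on a hypothetical toy mapping rather than the May chart.

```python
BAND_SIZE = 21  # e.g. 120-140 or 160-180, inclusive

def raw_groups_in_band(conversion: dict[int, int], low: int, high: int) -> int:
    """Count distinct scaled scores reached within [low, high].

    Each reachable scaled score corresponds to one group of (possibly
    duplicate) raw scores, so this is the number of raw groupings.
    """
    return len({s for s in conversion.values() if low <= s <= high})

def groups_from_missing(missing: int, band_size: int = BAND_SIZE) -> int:
    """Same count, derived only from how many scores in the band are missing."""
    return band_size - missing

# Missing-score tallies cited above: 7 in 120-140, 2 in 160-180
print(groups_from_missing(7), groups_from_missing(2))

# Hypothetical toy mapping, NOT the May chart
TOY = {0: 120, 1: 120, 2: 122, 3: 125, 4: 140, 5: 141}
print(raw_groups_in_band(TOY, 120, 140))
```

Since both bands have the same width, the gap in groupings between them is always equal to the gap in missing scores.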
How Would It Look As a Regular Scale?
Finally, a natural question is, “How would this Flex scale appear if it were drawn from a regular LSAT?” Or, “What would this Flex scale look like if applied to a past PT?” Below is this Flex scale side-by-side with its expanded, 101-question equivalent:
| Flex Low | Flex High | Scaled Score | Original Low | Original High |
|---|---|---|---|---|