Respected standardized tests are built upon three core pillars, one of which is reliability. We expect stability or consistency of measurement over time, so that similar raw scores produce similar scaled scores wherever and whenever a test is administered. Test-retest reliability has always been a bedrock attribute of the SAT. Colleges trust in a high level of consistency across administrations so that scores from one test date can be easily and fairly compared to scores from any other.
What, then, will colleges do with the scores from the June 2018 SAT?
In case you haven’t heard, the June SAT was easier than most other tests. A certain level of variation in difficulty from section to section and test to test occurs all the time on both the SAT and ACT, but seldom to this extent. The June test was so much easier across all sections that the exam was graded on a steep curve.
How steep was it? One of our students got 43 questions right on the Reading section, 32 right on Writing and Language, and 49 right on Math. On an average SAT, say the first official practice test on the College Board website, that student would have earned a 650 in Evidence-Based Reading and Writing and a 690 in Math, for a 1340 total. However, the June 2018 score report returned just 590 EBRW and 620 Math, a 1210 total. A 130-point swing on identical raw scores is not normal.
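Spelled out, the arithmetic looks like this. The scaled values are simply the student's scores quoted above; the lookup table itself is our illustration, not an official conversion chart:

```python
# The same raw scores (43 Reading, 32 Writing, 49 Math) passed through
# two different conversion curves. Scaled values are the student's
# results quoted above; the dictionary lookup is illustrative only.
typical_curve = {"EBRW": 650, "Math": 690}    # e.g., Official Practice Test 1
june_2018_curve = {"EBRW": 590, "Math": 620}  # June 2018 score report

swing = sum(typical_curve.values()) - sum(june_2018_curve.values())
print(swing)  # 130
```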
The score volatility was especially pronounced at the upper end of the scale. For example, one of our students answered 4 math questions incorrectly on the August 2017 SAT and earned a 770 Math score. On the June test, he missed only 1 math question and still got a 770; three fewer errors bought him nothing. No wonder students across the country are clamoring to #rescoreJuneSAT.
According to the College Board, everything went as planned: “On occasion there are some tests that can be easier or more difficult than usual. That is why we use a statistical process called equating. The equating process ensures fairness for all students.” However, the very fact that the College Board has to answer difficult questions about this particular administration suggests that its current system is failing. In fact, the College Board seems to have forgotten how true equating works.
Every SAT and ACT incorporates different levels of difficulty across test sections, but the way question difficulty is determined today is flawed. In the past, every official SAT included an equating or experimental section. This section could cover any of the standard content areas but was unscored. The design worked because the equating section was the same length as the scored sections, so students could not tell which section was unscored and gave full effort on every one. Consequently, the College Board received enough high-quality data to rank viable questions on five levels of difficulty and weed out the clunkers. Effective equating produced very reliable tests.
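For readers curious what equating actually does with that data, here is a minimal sketch of one textbook method, linear equating under a random-groups design, with entirely made-up numbers. This is standard psychometrics (see Kolen and Brennan's work on test equating), not the College Board's proprietary procedure; the function name and score samples are ours:

```python
# A minimal sketch of linear equating under a random-groups design.
# If two randomly equivalent groups take the old and new forms,
# differences in their raw-score distributions are attributed to form
# difficulty, not ability. All numbers below are hypothetical.

from statistics import mean, stdev

def linear_equate(raw_new, old_scores, new_scores):
    """Convert a raw score on the new (easier) form to its
    old-form equivalent by matching z-scores."""
    mu_old, sd_old = mean(old_scores), stdev(old_scores)
    mu_new, sd_new = mean(new_scores), stdev(new_scores)
    z = (raw_new - mu_new) / sd_new   # standing within the new form
    return mu_old + z * sd_old        # same standing on the old form

# Hypothetical raw-score samples from a 58-question math section:
old_form = [30, 35, 28, 40, 33, 38, 31, 36]  # harder form
new_form = [36, 41, 34, 46, 39, 44, 37, 42]  # easier form, ~6 points higher

print(round(linear_equate(49, old_form, new_form)))  # 43: a 49 on the
# easier form carries the same standing as a 43 on the harder one
```

The direction of the adjustment is the point: when a form runs easier, every raw score on it is worth less, which is exactly what June test takers experienced. The real question is whether today's pretesting gathers data good enough to make that adjustment gracefully.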
ACT, for some odd reason, traditionally eschews such stringent equating, which explains the disappointing score volatility ACT test takers have grown accustomed to. In adopting so many ACT features for the latest iteration of the SAT, the College Board has chosen a similar path. The precipitous scoring shift on the June 2018 SAT may have raised alarms, but neither test maker, despite protests to the contrary, incorporates sufficient equating measures to avoid such calamities on future administrations. The #rescoreJuneSAT campaign is not likely to result in a recalibration of scaled scores, but students, parents, and especially colleges should put pressure on both the College Board and ACT to introduce real equating measures for stronger test-retest reliability. Until that occurs, we should expect more scoring debacles like the one in June.