How is a test like a duck? Both may appear unruffled and serene on the surface, but underneath they are usually paddling frantically. Great tests, meaning those designed to be valid, fair, and reliable, require tremendous effort and insight to put together. Casual observers may see a random assemblage of items, but those in the know can glimpse the many hands–not to mention reams of data and decades of experience–that go into crafting entire tests, specific sections, and even individual questions for standardized exams like the SAT and ACT.
Basically, assessment design goes very, very deep. If you are going to spend weeks, months, or perhaps even years of your life analyzing test questions, some insight into testing terminology can’t hurt.
A test question–both the problem itself and any answer choices–is called an item. Tests can feature all sorts of objective and subjective items. The standardized tests used for admissions predominantly employ multiple choice items.
The problem posed by a test question is called the stem. An effective stem should present a definite problem in the form of a question or a partial sentence.
In a multiple choice test item, the stem is followed by a list of answer choices, known as alternatives or options. Some testing taxonomies also refer to alternatives as foils, though that term is sometimes reserved for wrong answers. Ideally, alternatives will be presented clearly and concisely.
The right answer choice is called the key. Sometimes, the key will be the best possible answer. In instances where facts, formulas, or equations are tested, the key will be the only correct answer.
Every choice apart from the key is considered a distractor or, sometimes, foil. These incorrect or inferior alternatives should all be plausible to some degree but not so similar to the right response that a reasonable case can be made for multiple choices.
Tests like the SAT and ACT follow many of the fundamentals of effective item design, except in those instances where deviations from best practices add complexity to the testing process. For example, one guideline suggests that a stem should be negatively stated only when significant learning outcomes require it. The SAT and ACT are not learning tests but rather assessments. These exams often include negatively phrased stems in order to increase difficulty or add extra time to the solution process. Even the inclusion of four or five alternatives, when research indicates that three choices will often do, illustrates how norm-referenced assessment design might deviate from that of criterion-referenced assessments.
A brilliant test item is a thing of complex beauty, even if most test takers lack either the time or inclination to appreciate its intricate design. Don’t let the tiny print and thin paper fool you. First, paper-and-pencil tests may not be around much longer anyway, if remote proctoring manages to earn the confidence of admissions offices. Second and more important, though, well-crafted test items are labors of love (ask a true test architect) floated by lots and lots of research and money. You may not like a test, but when you recognize a truly well-designed one, you can’t help but respect it.