Types of items
Many item formats are available for test construction, including multiple-choice, free-response, performance or simulation, true/false, and Likert-type items. There is no single "best" format; suitability depends on the purpose and content of the test. For example, a test on a complex psychomotor task is better served by a performance or simulation item than by a true/false item.
Multiple-choice items
Main article: Multiple choice items
A common type of test item is the multiple-choice question, in which the test author provides several possible answers (usually four or five) from which the test subjects must choose. There is one right answer, usually represented by a single answer option, though it is sometimes divided across two or more options, all of which subjects must identify correctly. Such a question may look like this:
The number of right angles in a square is:
a) 2
b) 3
c) 4
d) 5
Test authors generally create the incorrect response options, often referred to as distracters, to correspond with likely errors. For example, distracters may represent common misconceptions that arise during the developmental process. Constructing effective distracters is a key challenge in writing multiple-choice items with strong psychometric properties. Well-designed distracters, considered in combination, can attract considerably more than 25% of the weakest students, thereby reducing the effect of guessing on total scores. Constructing such items can require considerable skill and experience on the part of the item developer.
Figure 1: Multiple-choice distracter analysis with an item characteristic curve
A graph depicting the functioning of a multiple-choice question is shown in Figure 1. The x-axis represents an ability continuum and the y-axis the probability of a given choice being selected by an examinee at a given level of ability. The y-axis runs from 0 to 1, while the x-axis shows standardized scores with a mean of 0 and a standard deviation of 1, which can be based on either the items or the examinees.
The grey line maps ability to the probability of a correct response according to the Rasch model, which is a psychometric model used to analyse test data. The correct response in the example shown in Figure 1 is E. The proportion of students along the ability continuum who chose the correct response is highlighted in pink. The graph shows the proportion of students opting for other choices along the range of the ability continuum, as shown in the legend. The proportion of students at about −1.5 on the scale (i.e., of very low ability) who responded correctly to this item is approximately 0.1, which is below the proportion expected if students were purely guessing.
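The curve described above can be illustrated with a minimal sketch of the dichotomous Rasch model, under which the probability of a correct response depends only on the difference between examinee ability and item difficulty on a shared logit scale (function and variable names here are illustrative):

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model,
    where theta is examinee ability and b is item difficulty, both on the
    same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An examinee whose ability equals the item's difficulty answers
# correctly with probability 0.5.
print(rasch_probability(0.0, 0.0))  # 0.5
```

Plotting this probability across the ability range produces an S-shaped item characteristic curve like the grey line in Figure 1.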
An attractive feature of multiple-choice questions is that they are particularly easy to score. Scoring can be performed automatically and instantly, whether by machines such as the Scantron or by software in computer-based tests, which is particularly valuable when there are not enough graders available for a large class or a large-scale standardized test. Multiple-choice tests are also valuable when the test sponsor wants immediate score reporting to be available to the examinee; a score cannot be provided at the end of the test if the items are not actually scored until several weeks later.
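The automatic scoring described above reduces, in essence, to comparing each response against an answer key. A minimal sketch (the function name and data are hypothetical, not any real grading system's API):

```python
def score_responses(answer_key, responses):
    """Score a multiple-choice answer sheet against a key.
    answer_key and responses are parallel lists of option letters;
    each exact match earns one point."""
    return sum(1 for key, resp in zip(answer_key, responses) if key == resp)

key = ["c", "a", "d", "b"]
sheet = ["c", "a", "b", "b"]  # one wrong answer on the third item
print(score_responses(key, sheet))  # 3
```

Because the rule is purely mechanical, the result is identical no matter who (or what) applies it, which is what makes instant, large-scale scoring possible.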
This format is not, however, appropriate for assessing all types of skills and abilities. Poorly written multiple-choice questions often overemphasize simple memorization and de-emphasize process and comprehension. They also leave no room for disagreement or alternative interpretation, making them particularly unsuitable for subjects in the humanities, such as literature and philosophy.
Free response items
Students taking a test at the University of Vienna, June 2005
Producing free-response questions does not pose much of a challenge to the test author, but evaluating the responses is a different matter. Effective scoring involves reading the answer carefully and looking for the specific features, such as clarity and logic, that the item is designed to assess. Often, the best results are achieved by awarding scores according to explicit ordered categories reflecting increasing quality of response. Doing so may involve constructing marking criteria and support materials, such as training materials for markers and samples of work that exemplify the response categories. Typically, these questions are scored against a uniform grading rubric for greater consistency and reliability.
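The idea of explicit ordered categories can be sketched as a simple data structure: each category pairs a score with a criterion, and the marker awards the highest category the response reaches. All criteria and thresholds below are hypothetical placeholders, not a real rubric:

```python
# A minimal sketch of a uniform grading rubric with explicit ordered
# categories of increasing quality (criteria are illustrative only).
rubric = [
    (1, lambda text: len(text.split()) >= 10),    # attempts a substantive answer
    (2, lambda text: "because" in text.lower()),  # supports claims with reasons
    (3, lambda text: "however" in text.lower()),  # weighs alternative views
]

def score_response(text):
    """Award the score of the highest ordered category whose criterion
    the response satisfies, or 0 if none apply."""
    score = 0
    for points, criterion in rubric:
        if criterion(text):
            score = points
    return score
```

In practice the criteria are qualitative judgments made by trained markers rather than mechanical string checks; the point of the sketch is only the ordered-category structure.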
At the other end of the spectrum, scores may be awarded according to superficial qualities of the response, such as the presence of certain important terms. In this case, it is easy for test subjects to fool scorers by writing a stream of generalizations or non sequiturs that incorporate the terms that the scorers are looking for. This, along with other factors that limit their reliability and cost/measurement ratio, has caused the usefulness of this item type to be questioned.
While free-response items have disadvantages, they can offer more power to differentiate between examinees. This advantage may, however, be offset by item length: for example, a free-response item might provide twice as much measurement information as a multiple-choice item yet take as long to complete as three multiple-choice items.
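The trade-off above can be made concrete with the illustrative figures from the text (the numbers are hypothetical, chosen only to match the example):

```python
# Hypothetical information-per-time comparison: a free-response item
# yields twice the measurement information of a multiple-choice item
# but takes three times as long to complete.
mc_info, mc_time = 1.0, 1.0   # one multiple-choice item
fr_info, fr_time = 2.0, 3.0   # one free-response item

info_rate_mc = mc_info / mc_time  # information per unit of testing time
info_rate_fr = fr_info / fr_time

# In the time one free-response item takes, three multiple-choice items
# yield 3.0 units of information versus 2.0.
print(info_rate_mc > info_rate_fr)  # True
```

Under these assumed numbers, the multiple-choice format extracts more information per unit of testing time even though each individual free-response item is more informative.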