Cambridge ESOL exams and the CEFR
How the Cambridge ESOL Common Scale was constructed
The Common Scale was constructed in two main ways.
The ‘monitoring of exam difficulty’ project (Anchor Tests)
This long-running project (1993-2004) was the means by which the Common Scale was originally developed. It involved constructing a set of short tests. Test centres that agreed to participate were asked to administer these tests to candidates either on the day of the live examination or in the two weeks preceding it. Each test was allocated to more than one exam level, so that the returned response data provided a common-person link between live exam items at different levels.
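As a rough illustration of the common-person idea, the sketch below assumes a Rasch-style setup in which the same candidates have ability estimates (in logits) on two separately calibrated scales. The function names and the simple mean-shift method are assumptions made for illustration; they are not a description of the procedure Cambridge ESOL actually used.

```python
from statistics import mean


def common_person_shift(abilities_a, abilities_b):
    """Return the constant that maps scale B onto scale A.

    Both arguments are dicts of candidate id -> ability estimate in logits,
    each taken from a separate calibration. Only candidates who appear in
    both dicts (the common persons) contribute to the link.
    """
    common = sorted(set(abilities_a) & set(abilities_b))
    if not common:
        raise ValueError("no common candidates; the scales cannot be linked")
    # Assumption: the two scales differ only by a constant offset.
    return mean(abilities_a[c] - abilities_b[c] for c in common)


def rescale(difficulties_b, shift):
    """Express item difficulties from scale B on scale A."""
    return {item: d + shift for item, d in difficulties_b.items()}
```

With very few common candidates the shift is poorly estimated, so in practice the size and spread of the common group matter as much as the arithmetic.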
Pretesting
Pretesting refers to trialling newly written test items on learners who are preparing for a particular exam. The response data collected enables the difficulty of the items to be estimated (a process known as calibration). Pretesting was introduced for all Cambridge ESOL exams in the early 1990s. Anchor tests, which are administered alongside pretests, were developed and have been continually replaced over time. The existing linked design for pretest anchors enables cross-level anchoring to take place earlier in the test construction process.
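To make the idea of calibration concrete, here is a minimal sketch that turns pretest responses into rough item difficulties using the log-odds of an incorrect answer, centred so that the mean difficulty is zero. This is only a crude first approximation under assumed inputs (the function name and data layout are hypothetical); operational calibration would normally use a full Rasch or IRT estimation procedure, typically anchored on items of known difficulty.

```python
import math


def rough_difficulties(responses):
    """Estimate rough item difficulties in logits.

    responses: list of dicts, one per candidate, mapping item id -> 0 or 1.
    Items that every candidate got right (or wrong) have no finite
    log-odds estimate and are left out of the result.
    """
    correct, attempts = {}, {}
    for person in responses:
        for item, score in person.items():
            attempts[item] = attempts.get(item, 0) + 1
            correct[item] = correct.get(item, 0) + score

    logits = {}
    for item, n in attempts.items():
        right = correct[item]
        if right == 0 or right == n:
            continue  # extreme score: difficulty cannot be estimated
        logits[item] = math.log((n - right) / right)

    if not logits:
        return {}
    centre = sum(logits.values()) / len(logits)
    # Centre the scale so that the average item difficulty is zero.
    return {item: value - centre for item, value in logits.items()}
```

Harder items (fewer correct answers) receive higher values. The sketch ignores differences in candidate ability, which is one reason it should not be mistaken for a proper calibration.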
Two other sources of information have also proved useful for verifying the Common Scale.
Candidates taking two exams
In every session a number of candidates take exams at two levels (particularly FCE, CAE or CPE). Such candidates are routinely identified and their performance in the two exams is studied. Although by their nature unrepresentative, such cases provide a valuable additional common-person method for linking levels.
Computer-adaptive tests
Cambridge ESOL has developed several computer-adaptive tests (CATs) whose items are drawn from paper-based exams, or which share a single item bank with the paper-based format (as with BULATS). Because items are selected in real time in a CAT, many combinations of items occur, linking all the items in the bank into a single response matrix that can be re-calibrated. Although there are issues with comparing data from linear tests with data from adaptive tests, this is another useful source of information for verifying the Common Scale.
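Whether the items in a bank really are linked into a single response matrix can be checked mechanically. The sketch below assumes each CAT session is recorded as a list of item ids (a hypothetical data layout) and groups items that are connected, directly or indirectly, through candidates who saw them together. One group means the whole bank can be calibrated on one scale; more than one means some items are not yet linked.

```python
def linked_item_groups(sessions):
    """Group items connected through shared candidate sessions.

    sessions: iterable of item-id lists, one list per candidate session.
    Returns a list of sets, each a group of mutually linked items.
    """
    parent = {}

    def find(item):
        # Union-find lookup with path halving.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for items in sessions:
        items = list(items)
        for item in items:
            find(item)  # register every item, even in one-item sessions
        for other in items[1:]:
            union(items[0], other)

    groups = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())
```

Connectivity is only a necessary condition: a link that rests on a handful of responses will still give an unstable calibration.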

