Cambridge ESOL exams and the CEFR
- Empirical validation-internal: Item banking
Origins of the Cambridge ESOL Common Scale
The current system of Cambridge ESOL exam levels developed over the best part of a century, starting in 1913 with the most advanced level, Cambridge Proficiency, now associated with CEFR C2. The system evolved as the need for new exam levels was recognised – that is, in response to the needs of particular groups of learners. Up to the 1990s the exams were designed, developed and administered without much support from statistics. Item writers, teachers and publishers shared an understanding of the levels, rooted in their understanding of the learners, and the system worked well for practical purposes.
However, when Cambridge ESOL began at the beginning of the 1990s to address seriously the reliability and consistency of its assessments, it became clear that the levels system needed better statistical underpinning.
The methodology that was becoming more widely adopted at the time was item banking, an application of item response theory (IRT) (Bond and Fox 2001, Wright and Stone 1979). Item banking involves assembling a bank of calibrated items – that is, items of known difficulty. Designs for collecting response data ensure a link across items at all levels, so that a single measurement scale can be constructed. This scale relates different testing events within a single frame of reference, greatly facilitating the development and consistent application of standards (see Figure 1).
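To make the idea of a calibrated item concrete, here is a minimal sketch in Python. It assumes the one-parameter Rasch model, which is the subject of both works cited above; the text does not state which IRT model Cambridge ESOL actually uses. The function names, item identifiers and difficulty values are all hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    Ability and difficulty are expressed on the same logit scale, which is
    what allows learners and items from different exams to be compared.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical item bank: item id -> calibrated difficulty in logits.
ITEM_BANK = {"item_a": -1.0, "item_b": 0.0, "item_c": 1.5}


def expected_score(ability: float, item_ids: list[str]) -> float:
    """Expected number of correct answers on a test built from banked items."""
    return sum(rasch_probability(ability, ITEM_BANK[i]) for i in item_ids)


if __name__ == "__main__":
    # A learner whose ability equals an item's difficulty has an even chance.
    print(rasch_probability(0.0, ITEM_BANK["item_b"]))
    print(expected_score(0.0, ["item_a", "item_b", "item_c"]))
```

Because every item in the bank carries a difficulty on the same scale, a test assembled from any subset of items can be interpreted within the same frame of reference.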
Item banking is applicable to tests which are objectively marked, so that item response data can be collected – for Cambridge ESOL exams this means the Reading, Listening and Use of English papers. The Cambridge ESOL Common Scale – a single measurement scale covering all the Cambridge levels – was thus constructed with reference to these skills. Common Scales have also been published for writing and speaking, based on qualitative analysis of the features of these performance skills at different levels (see Hawkey and Barker 2004 for a discussion of this).
As Figure 1 shows, constructing a single measurement scale requires all the item response data to be linked in some way. Two ways of achieving this are common person linking, where a group of learners might, for example, take test papers at two different levels, and common item linking, where different tests contain some items in common. Common item linking is the basic approach used in pretesting, where each pretest is administered together with an anchor test of already calibrated material.
Figure 1 Item banking approach to scale construction
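The common item linking used in pretesting can be sketched as follows. This is an illustration only: it assumes a simple mean-shift method, in which new items are moved onto the bank scale by the average difference between the anchor items' banked difficulties and their difficulties as estimated in the pretest. The text does not say which linking method Cambridge ESOL applies, and all names and values here are hypothetical.

```python
from statistics import mean


def link_by_common_items(
    pretest_calibration: dict[str, float],
    bank: dict[str, float],
    anchor_ids: list[str],
) -> dict[str, float]:
    """Place new pretest items on the bank scale using shared anchor items.

    pretest_calibration: difficulties estimated from the pretest data alone,
        on the pretest's own arbitrary scale (anchors and new items).
    bank: difficulties of already calibrated items on the common scale.
    anchor_ids: items that appear in both the pretest and the bank.
    """
    # Average offset between the bank scale and the pretest scale.
    shift = mean([bank[i] - pretest_calibration[i] for i in anchor_ids])
    # Apply the same offset to every new (non-anchor) item.
    return {
        item: difficulty + shift
        for item, difficulty in pretest_calibration.items()
        if item not in anchor_ids
    }


if __name__ == "__main__":
    bank = {"anchor_1": -0.5, "anchor_2": 0.75}
    pretest = {"anchor_1": -1.0, "anchor_2": 0.25, "new_1": 0.0, "new_2": 1.0}
    print(link_by_common_items(pretest, bank, ["anchor_1", "anchor_2"]))
```

In this example the anchor items appear half a logit easier in the pretest than in the bank, so the new items are shifted up by the same amount before being added to the bank.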

Further information
How the Cambridge ESOL Common Scale was constructed.
Cambridge ESOL’s Common Scale Levels and the CEFR
References
Bond, T. G. and C. M. Fox (2001). Applying the Rasch model. NJ: Lawrence Erlbaum Associates.
Hawkey, R. and F. Barker (2004). Developing a common scale for the assessment of writing. Assessing Writing 9(2).
Wright, B. D. and M. H. Stone (1979). Best test design. Chicago, IL: MESA Press.

