This study is the second in the British Council Validation Series to investigate test comparability. These studies aim to contribute to the theoretical framework for comparability research by using the socio-cognitive model for language test validation to design a multi-method approach to data collection and analysis. In particular, both studies have used detailed content analysis to build a construct definition that informs the interpretation of other sources of evidence, particularly quantitative analyses. This research focus both draws on and contributes to the emphasis on localisation within the theoretical framework of test development and validation, which has been at the centre of the Aptis test development approach from the beginning (see O’Sullivan, 2015a; O’Sullivan and Dunlea, 2015).
Like Wu et al. (2016), this study compares two EFL proficiency tests that use an international proficiency framework, the Common European Framework of Reference for Languages (CEFR), as an important source of feedback for test takers. The two tests in this study are VSTEP, a pen-and-paper test in Vietnam targeting CEFR levels B1 to C1, and Aptis, an international computer-based test targeting CEFR levels A1 to B2. VSTEP is recognised by universities in Vietnam as certification of English proficiency for the purpose of meeting graduation requirements stipulated by the Ministry of Education. Aptis is used for a range of purposes in international settings, including by higher education institutions in EFL contexts.
This study reports first on a smaller-scale pilot phase at one university, which trialled the methodology, and then on the main phase, in which over 400 test takers at three universities in different regions took both tests. The socio-cognitive model for language test development and validation was used to design a multi-method approach. Statistical analysis of test scores includes factor analysis, as well as a concurrent Rasch analysis to place test items from both tests on a common scale. In addition, a comprehensive pro forma, utilising categories drawn from the socio-cognitive model and the growing body of CEFR alignment studies, was developed to evaluate the constructs targeted by both tests. Questionnaire data from test takers also offer insights into the attitudes of the university students towards aspects of the tests such as differences in delivery mode for the productive skills.