Until November 2022, automated speaking assessment had evolved incrementally over decades, progressing from simple acoustic models to sophisticated deep neural networks (Cai et al., 2025). Each generation improved at measuring fundamental constructs such as pronunciation, fluency, vocabulary and grammar, yet all operated within the same paradigm: statically analysing learner speech against fixed criteria, primarily through monologic tasks.

The release of ChatGPT marked a potential watershed moment. Unlike previous incremental improvements, generative AI introduced capabilities that simply did not exist before: dynamic conversation, spontaneous task creation, and real-time adaptation to individual learners. This shift extends beyond technology. Generative AI is already transforming how learners communicate daily, with learners routinely engaging with these tools for diverse purposes, including inquiry, practice, and academic tasks. Assessment that ignores these new realities risks measuring outdated constructs whilst missing the competencies learners actually need.

This transformation demands we expand our construct definitions beyond traditional competencies to include AI-mediated communication skills (Xi, 2025). We must reconceptualise validity and authenticity to accommodate personalised, interactive assessment rather than standardised measures (Goh & Aryadoust, 2025). The question is not whether to adapt, but how to ensure this evolution enhances rather than undermines fair assessment.

Navigating Evolution and Revolution

Current research reveals that not all changes are revolutionary. Vocabulary and grammar assessment, for instance, benefits from generative AI's computational power whilst maintaining established construct definitions, demonstrating evolutionary refinement that improves efficiency without altering fundamental measurement approaches.

Revolutionary change emerges where generative AI enables previously impossible capabilities. Interactive competence, once requiring human assessors, can now be evaluated through AI-mediated dialogues. The speaking construct itself expands to include AI interaction skills and the capacity to leverage AI feedback effectively.

Through analysis of previous research and the British Council's initiatives, we have identified eight critical decision points in the AI-based speaking assessment development process where professionals must weigh evolutionary adaptation against revolutionary reconceptualisation of the construct(s). These span construct definition, task design, scoring mechanisms, and deployment strategies. Each choice about technological integration directly affects what is measured and how validity is maintained, calling for human-centred decision making (British Council, 2025).

Fluency: A Case Study in Complexity

Oral fluency illustrates these challenges concretely. Fluency (i.e., the general ease, flow and continuity of speech), considered a key aspect of speaking assessment and a predictor of comprehensibility and proficiency, is a construct that has historically been assessed in most international tests of English. Yet, while traditionally valued for its pivotal role in demonstrating communicative competence and for the reliability with which it can be measured in L2 speaking assessment, fluency reveals unexpected complexity when AI enters the equation. Learners display distinct patterns in AI-mediated conversations: they speak more freely with reduced anxiety, yet pause differently when processing AI responses than when responding to human interlocutors. Changes in turn-taking behaviour, in how talk is punctuated with pauses, and in the repair strategies needed when interacting with AI are among the anticipated shifts in communication. These invite a reconsideration of oral fluency and a careful re-examination of this traditionally established and reliably measured construct in emerging generative AI contexts. While defining and operationalising the new construct is of prime significance, the main challenge at this transitional moment in the history of L2 speaking assessment is ensuring generative AI captures genuine communicative competence rather than AI-specific test-taking skills.
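To make the measurement side of this concrete, the sketch below (illustrative only, and not drawn from the webinar materials) shows how temporal fluency measures such as speech rate, pausing and mean length of run are commonly operationalised, assuming hypothetical word-level timestamps of the kind an automatic speech recogniser might return.

    # Illustrative sketch only: computes common utterance-fluency measures
    # (speech rate, pause behaviour, mean length of run) from hypothetical
    # word-level timestamps. Names and thresholds are assumptions, not a
    # description of any operational scoring system.

    from dataclasses import dataclass

    @dataclass
    class Word:
        text: str
        start: float  # start time in seconds
        end: float    # end time in seconds

    def fluency_measures(words: list[Word], pause_threshold: float = 0.25) -> dict:
        """Return simple temporal fluency measures for one speaking turn.

        A "pause" is any between-word silence longer than pause_threshold
        seconds (0.25-0.4 s is a common convention in fluency research).
        """
        if not words:
            return {}

        total_time = words[-1].end - words[0].start          # total response time
        speaking_time = sum(w.end - w.start for w in words)  # time spent talking

        # Silent gaps between consecutive words; keep only those long enough
        # to count as pauses.
        gaps = [b.start - a.end for a, b in zip(words, words[1:])]
        pauses = [g for g in gaps if g > pause_threshold]

        # Mean length of run: number of words produced between pauses.
        runs, run_len = [], 1
        for g in gaps:
            if g > pause_threshold:
                runs.append(run_len)
                run_len = 1
            else:
                run_len += 1
        runs.append(run_len)

        return {
            "speech_rate_wpm": 60 * len(words) / total_time,
            "articulation_rate_wpm": 60 * len(words) / speaking_time,
            "silent_pauses_per_min": 60 * len(pauses) / total_time,
            "mean_pause_duration_s": sum(pauses) / len(pauses) if pauses else 0.0,
            "mean_length_of_run_words": sum(runs) / len(runs),
        }

In an AI-mediated task, the same measures could be computed separately for turns that follow AI prompts and for turns that follow human prompts, so that any shift in pausing or run length is made visible rather than assumed.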

Join the Conversation

Our upcoming webinar provides frameworks and practical tools for navigating this transformation:

  • Evidence-based analysis of current assessment trends.
  • Eight-layer framework for identifying evolution versus revolution.
  • Hands-on AI tool demonstrations and prompt engineering.
  • Validity protocols for AI-enhanced assessment.
  • Fluency assessment exercises demonstrating construct complexity.
  • Ready-to-use templates for immediate implementation.

No prior AI experience is required: only curiosity about how generative AI is shaping speaking assessment.

References:

British Council. (2025). Human-centred AI: Lessons for English learning and assessment. https://www.britishcouncil.org/sites/default/files/human-centred_ai_lessons_for_english_learning_and_assessment.pdf

Cai, D., Naismith, B., Kostromitina, M., Teng, Z., Yancey, K. P., & LaFlair, G. T. (2025). Developing an automatic pronunciation scorer: Aligning speech evaluation models and applied linguistics constructs. Language Learning. https://doi.org/10.1111/lang.70000

Goh, C. C. M., & Aryadoust, V. (2025). Developing and assessing second language listening and speaking: Does AI make it better? Annual Review of Applied Linguistics, 1–21. https://doi.org/10.1017/S0267190525100111

Xi, X. (2025). Revisiting communicative competence in the age of AI: Expanding the construct for AI-mediated communication. Annual Review of Applied Linguistics. https://doi.org/10.1017/S0267190525000078 

Register

Join us on Friday 26 September at 09.00 BST (UK time, UTC+1) or Monday 29 September at 15.00 BST (UK time, UTC+1).

About the Presenters

Sha Liu, British Council

Sha Liu is a Test Development Researcher at the British Council and Co-Convenor of the EALTA AI for Language Assessment Special Interest Group. She holds a PhD in Language Assessment from the University of Bristol. Her research focuses on AI-powered automated speaking and writing evaluation, with particular emphasis on leveraging generative AI to generate personalised, diagnostic feedback, and on eye-tracking methodologies for understanding learner engagement. Her work has been recognised with multiple awards, and she serves on the editorial boards of Language Assessment Quarterly, Assessing Writing, and Research Synthesis in Applied Linguistics and Artificial Intelligence in Language Education.

Parvaneh Tavakoli, University of Reading

Parvaneh Tavakoli is Professor of Applied Linguistics at the University of Reading. Parvaneh's main research interest lies in the interface of second language acquisition and language assessment. She is specifically interested in performance across levels of proficiency and in different task designs. Parvaneh has led several international research projects investigating second language performance, acquisition, assessment, and policy in different contexts. She has disseminated her research in the form of articles in prestigious journals (e.g., The Modern Language Journal, SSLA and Language Learning), policy reports (e.g., Report to Welsh Government), and books by key publishers (e.g., Cambridge University Press and University of Toronto Press).