BEA 2026 Shared Task
Vocabulary Difficulty Prediction for English Learners
Motivation
Vocabulary is a crucial aspect of language knowledge, shaping what learners can understand and produce. Establishing the difficulty of vocabulary is therefore essential for creating level-appropriate content and developing valid, reliable assessment instruments. However, determining word difficulty still relies on labour-intensive processes involving expert judgment and costly pretesting, which limits scalability and slows innovation. As learning and assessment increasingly rely on digital platforms, the need for more efficient and scalable solutions is more pressing than ever.
While previous shared tasks have explored related problems such as Complex Word Identification (Paetzold and Specia, 2016; Yimam et al., 2018), Lexical Complexity Prediction (Shardlow et al., 2021) and Lexical Simplification (Shardlow et al., 2024), they were not designed with English language learners in mind and did not explore the influence of the learner’s L1 on L2 vocabulary difficulty. What is more, BEA has not hosted a language learning challenge since the Grammatical Error Correction shared task in 2019, leaving a significant gap at a time when advances in AI have transformed what is possible in educational NLP.
As a result, we believe the time is right for an L1-Aware Vocabulary Difficulty Prediction shared task, and BEA 2026 is the ideal venue to host it. This task would not only establish a common benchmark for researchers but also serve as a critical testbed for evaluating how well state-of-the-art NLP models perform on a problem that has traditionally required psychometric calibration methods. The findings from this shared task will play a crucial role in the development of AI-powered solutions for item writing, content generation, adaptive testing, and personalised vocabulary learning, laying the foundation for the next generation of language learning and assessment systems.
Task description
The BEA 2026 shared task aims to advance research into vocabulary difficulty prediction for learners of English with diverse L1 backgrounds, an essential step towards custom content creation, computer-adaptive testing and personalised learning. In a context where traditional item calibration methods have become a bottleneck for the implementation of digital learning and assessment systems, we believe predictive NLP models can provide a more scalable, cost-effective solution.
The goal of this shared task is to build regression models that predict the difficulty of English words given a learner's L1. We believe this shared task offers a novel, multidimensional perspective on vocabulary modelling that has not been explored in previous work. To this end, we will use the British Council's Knowledge-based Vocabulary Lists (KVL), a multilingual dataset with psychometrically calibrated difficulty scores. This unique dataset is not only an invaluable contribution to the NLP community but also a powerful resource that will enable in-depth investigations into how linguistic features, L1 background and contextual cues influence vocabulary difficulty.
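To make the task setup concrete, the sketch below shows what a trivial baseline for L1-aware difficulty regression could look like. It is a minimal illustration only: the column names ("word", "l1", "difficulty"), the toy rows, and the features are all assumptions for exposition, since the released KVL schema and the official baseline are not specified here.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Hypothetical training data: one row per (word, L1) pair with a
# psychometrically calibrated difficulty score as the regression target.
train = pd.DataFrame({
    "word": ["cat", "cat", "ubiquitous", "ubiquitous"],
    "l1": ["es", "zh", "es", "zh"],
    "difficulty": [0.10, 0.20, 0.90, 0.85],
})

# A single surface feature (word length) stands in for richer predictors
# such as corpus frequency, cognateness with the L1, or embeddings.
def word_length(words):
    return words.str.len().to_frame("length")

features = ColumnTransformer([
    ("word", FunctionTransformer(word_length), "word"),
    ("l1", OneHotEncoder(handle_unknown="ignore"), ["l1"]),
])

model = Pipeline([("features", features), ("regressor", Ridge())])
model.fit(train[["word", "l1"]], train["difficulty"])

# Predict a difficulty score for an unseen (word, L1) combination.
test = pd.DataFrame({"word": ["cat"], "l1": ["fr"]})
print(model.predict(test))

In this formulation the one-hot L1 feature lets a linear model learn only a per-L1 offset; capturing the interaction between word properties and L1 background (e.g. cognate effects) would require interaction features or a non-linear model, which is precisely the kind of modelling question the task is designed to probe.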