Establishing the difficulty of test items is an essential part of the language assessment development process. However, traditional item calibration methods are often time-consuming and difficult to scale. To address this, recent research has explored natural language processing (NLP) approaches for automatically predicting item difficulty from text. This paper investigates the use of transformer models to predict the difficulty of second language (L2) English vocabulary test items that have multilingual prompts. We introduce an extended version of the British Council’s Knowledge-based Vocabulary Lists (KVL), containing 6,768 English words paired with difficulty scores and question prompts written in Spanish, German, and Mandarin Chinese.
Using this dataset for fine-tuning, we explore various transformer-based architectures. Our findings show that a multilingual model jointly trained on all L1 subsets of the KVL achieves the best results, with analysis suggesting that the model is able to learn global patterns of cross-linguistic influence on target word difficulty. This study establishes a foundation for NLP-based item difficulty estimation using the KVL dataset, providing actionable insights for the development of multilingual test items.
Lucy Skidmore, Mariano Felice and Karen J. Dunn. (2025). Transformer Architectures for Vocabulary Test Item Difficulty Prediction. In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), Vienna, Austria. Association for Computational Linguistics.
Read the paper: Transformer architectures for vocabulary test item difficulty prediction (Adobe PDF)
View and download the dataset (ZIP file)