Use of metadiscourse markers by L2 writers across proficiency levels
How can we better assess discourse competence in second language writing? A recent study by Sathena Chan and colleagues explores this question by examining how English learners use metadiscourse markers (MDMs), linguistic features that help structure and connect ideas in writing, across different proficiency levels. The research is part of the Assessment Research Grant Series and offers insights into how automated scoring systems might better reflect learners’ discourse competence.
What are metadiscourse markers and why do they matter?
MDMs include features like hedges (might, perhaps), transition markers (however, in addition), and endophoric markers (as mentioned above). These language features are crucial for producing coherent and cohesive texts, yet they are often underrepresented in writing assessments. In human-rated tests, discourse competence is typically evaluated through basic cohesive devices, while machine-rated systems rely on models trained on those same human ratings, meaning MDMs are rarely considered in depth.
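To make this concrete, here is a minimal sketch of what identifying MDMs in a response involves, using a tiny lexicon built from the examples above; the marker lists and function are illustrative assumptions, not the study's coding scheme.

```python
# Minimal sketch: spotting metadiscourse markers with a tiny lexicon.
# The marker lists are illustrative examples only, not the study's inventory.
MDM_LEXICON = {
    "hedge": ["might", "perhaps"],
    "transition": ["however", "in addition"],
    "endophoric": ["as mentioned above"],
}

def find_mdms(text: str) -> list[tuple[str, str]]:
    """Return (category, marker) pairs found in the text."""
    lowered = text.lower()
    # Note: naive substring matching; real coding would respect word boundaries.
    return [(category, marker)
            for category, markers in MDM_LEXICON.items()
            for marker in markers
            if marker in lowered]

print(find_mdms("However, as mentioned above, this might change."))
# [('hedge', 'might'), ('transition', 'however'), ('endophoric', 'as mentioned above')]
```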
This study set out to explore whether MDMs could offer a more nuanced way to assess writing ability, especially in automated scoring systems.
What the research looked at and how the study was conducted
Using a corpus of 2,003 Aptis Writing test responses (Part 4: informal and formal emails), researchers manually identified and coded MDMs for accuracy. They then analysed:
1. whether MDM use varies across CEFR proficiency levels (A1 to C)
2. whether machine learning models can reliably identify MDMs (see the sketch after this list)
3. whether those models can assess the accuracy of MDM use.
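The paper's title points to transformer-based models, and MDM identification is naturally framed as token classification. The sketch below shows what such a setup could look like with the Hugging Face transformers library; the BIO label set and the choice of bert-base-uncased are assumptions for illustration, not the authors' exact configuration.

```python
# Sketch: MDM identification framed as token classification.
# The label set and base model are assumptions, not the study's exact setup.
from transformers import AutoTokenizer, AutoModelForTokenClassification

# BIO tags covering the three MDM categories discussed above
LABELS = ["O",
          "B-HEDGE", "I-HEDGE",
          "B-TRANSITION", "I-TRANSITION",
          "B-ENDOPHORIC", "I-ENDOPHORIC"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)
# Fine-tuning would then proceed on the manually coded responses, with each
# token labelled according to the MDM span (if any) it falls inside.
```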
Key findings
- MDMs are used even at beginner levels (A1 and A2), challenging assumptions that discourse competence emerges only at higher proficiency.
- Advanced learners used a wider range of MDMs, showing awareness of task type (formal vs informal) and adjusting their language accordingly.
- Accuracy of MDM use was generally high across all levels, especially in formal writing tasks where MDMs tend to be more formulaic.
- Machine learning models were able to detect MDMs, but struggled to assess their accuracy, largely because most MDMs were used correctly, leaving few examples of incorrect usage to learn from (a toy illustration follows this list).
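That last point is a classic class-imbalance problem. The toy numbers below (invented for illustration, not figures from the study) show why a model can post a high accuracy score while learning nothing about incorrect usage.

```python
# Toy illustration of the class imbalance behind the accuracy finding.
# The 97/3 split is invented for illustration, not a figure from the study.
from collections import Counter

labels = ["correct"] * 970 + ["incorrect"] * 30
majority = Counter(labels).most_common(1)[0][1] / len(labels)
print(f"Always-predict-'correct' baseline: {majority:.1%}")  # 97.0%
# A classifier that merely matches this baseline has learned nothing about
# what makes an MDM use incorrect: exactly the cases an assessor cares about.
```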
Implications for assessment
The findings suggest that automated scoring systems could benefit from incorporating MDM analysis, focusing on the presence and type of markers rather than their accuracy. For example, advanced writers might be expected to use hedges or nuanced transitions, while beginners may rely on simpler markers, even if both groups use them correctly.
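In practice, "presence and type rather than accuracy" could translate into simple count and diversity features feeding a scoring model. The function below is a hedged sketch along those lines, reusing the illustrative lexicon from the earlier sketch; it is not the study's feature set.

```python
# Sketch: presence/type features an automated scorer could extract.
# Marker lists are illustrative, not the study's inventory or features.
MDM_LEXICON = {
    "hedge": ["might", "perhaps"],
    "transition": ["however", "in addition"],
    "endophoric": ["as mentioned above"],
}

def mdm_profile(text: str) -> dict:
    """Count markers per category, plus how many categories appear at all."""
    lowered = text.lower()
    profile = {category: sum(m in lowered for m in markers)
               for category, markers in MDM_LEXICON.items()}
    profile["categories_used"] = sum(count > 0 for count in profile.values())
    return profile

print(mdm_profile("Perhaps we should meet; however, the date might change."))
# {'hedge': 2, 'transition': 1, 'endophoric': 0, 'categories_used': 2}
```

A scorer built on such features would reward the range and appropriateness of markers rather than penalising learners for the rare slip.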
Moreover, genre matters: different writing tasks require different MDMs. Scoring systems should be sensitive to these variations to better reflect learners’ discourse competence — a key but often overlooked component of writing ability.
To read the full study, see: Chan, S., Sathyamurthy, M., Inoue, C., Bax, M., Jones, J., & Oyekan, J. (2024). Integrating Metadiscourse Analysis with Transformer-Based Models for Enhancing Construct Representation and Discourse Competence Assessment in L2 Writing: A Systemic Multidisciplinary Approach. Journal of Measurement and Evaluation in Education and Psychology, 15, 318–347. https://doi.org/10.21031/epod.1531269