Europe shines in its diversity, which is expressed in part in its multilingualism. Under EU law, all 24 official EU languages have equal status. In the digital age, however, that equality does not fully hold: there are notable discrepancies in the field of Language Technology (LT). Back in 2012, the META-NET White Paper series “Europe’s Languages in the Digital Age” showed that languages with more speakers had better LT support. For example, Spanish had fairly strong LT support, though not quite on the same level as English. Among the less widely spoken languages, Estonian was slightly better equipped, though gaps remained, especially in Machine Translation.
Differences like these pose a challenge to preserving and nurturing Europe’s multilingualism. Considering the current LT landscape, every language has its own gaps and its own needs for the future. Therefore, it is necessary to address and support each language individually.
To that end, the EU-funded project European Language Equality (ELE) is re-examining the LT support of the 31 languages covered in the META-NET White Papers ten years ago, alongside previously unevaluated ones. In total, ELE’s efforts span the 24 official and 32 additional EU languages as well as 33 endangered minority languages. Over the course of the project, the results of this research will form the basis of a strategic agenda and roadmap towards ELE’s main goal: Digital Language Equality (DLE) by 2030.
DLE can come across as a vague term, so a specific definition is imperative in order to know what we are working towards. Our preliminary definition describes DLE as all relevant languages having the necessary support to “continue to exist and prosper as living languages in the digital age”.
This necessary support involves two categories of factors, which overlap to some extent. First, there are technological factors. Examples include tools and services (e.g., grammar checkers), corpora (e.g., audio transcripts) and projects or organizations active in the LT community. The second category comprises contextual factors: essentially the political, social and economic situation in the region where a language is spoken.
For this definition to be useful when examining the current LT support of a language, these factors need to be accurately quantifiable. So far, no such score exists, which is why ELE is creating the “DLE metric”. As of now, the metric consists of a comprehensive list of the aforementioned factors that make up a language’s LT support. Details such as how to score and weight the individual factors, including the possible introduction of penalties, will be worked out over the course of the project.
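To make the idea of scoring and weighting factors more concrete, here is a minimal sketch in Python. It is not the DLE metric, whose scoring scheme has not yet been defined. It assumes that each factor is normalized to a value between 0 and 1, that the overall score is a weighted mean, and that a penalty is simply subtracted. The names `Factor` and `dle_score`, the weights and all numbers are hypothetical and serve only as illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """One hypothetical factor contributing to a language's LT support."""
    name: str
    category: str   # "technological" or "contextual"
    value: float    # assumed to be normalized to the range [0, 1]
    weight: float   # assumed relative importance; not defined by the project


def dle_score(factors: list[Factor], penalty: float = 0.0) -> float:
    """Weighted mean of the factor values, minus a penalty, clamped to [0, 1]."""
    total_weight = sum(f.weight for f in factors)
    if total_weight <= 0:
        raise ValueError("at least one factor with a positive weight is required")
    weighted_mean = sum(f.value * f.weight for f in factors) / total_weight
    return max(0.0, min(1.0, weighted_mean - penalty))


# Made-up values for an imaginary language, purely for illustration.
example = [
    Factor("grammar checkers", "technological", value=0.5, weight=2.0),
    Factor("audio transcript corpora", "technological", value=0.25, weight=1.0),
    Factor("economic situation", "contextual", value=0.75, weight=1.0),
]

print(dle_score(example))                # 0.5
print(dle_score(example, penalty=0.25))  # 0.25
```

A weighted mean is only one possible design. Its appeal is that scores stay on the same scale for every language, which is what a direct comparison requires.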
Once complete, the metric will enable a direct comparison of the technology support of our languages. Because it is based on empirical data, it will allow us to identify current problem areas as well as future priorities. Additionally, the metric will enable us to track the development of the LT landscape for each individual language over time, creating a long-term overview.
The ability to measure the level of LT support in a way that is precise and consistent across languages will form an important step towards our primary goal – establishing Digital Language Equality by 2030.