Developing an agenda and a roadmap for achieving full digital language equality in Europe by 2030

“A matter of life and death”: Inaki Irazabalbeitia on language equality and assuring a future for the Basque Language

Inaki Irazabalbeitia is an important figure in the socio-political world of the Basque language. This includes the field of language technology, political parties and foundations as well as the Basque Language Academy. Mr. Irazabalbeitia was a Member of the European Parliament from 2013 to 2014 and continues to advocate for equality of the Basque language, for instance as mayor of the Basque village of Alkiza.

Can you please tell us a bit about yourself: What projects are you currently working on with regard to the Basque language?

I was born in Donostia-San Sebastian in 1957 into a bilingual family. I studied chemistry at the University of the Basque Country and got a Ph.D. in 1986. I spent the majority of my professional career working for the normalization of the Basque language at the Elhuyar Foundation, where I was CEO from 1995 to 2003 and general manager of Eleka Ingeniaritza Linguistikoa, the language engineering branch of Elhuyar (2006-2011).

In 2012 I entered politics full time. I was a Member of the European Parliament (2013-2014). Currently I’m retired, but I’m still in politics as mayor of my adoptive village, Alkiza. Furthermore, I continue to be attached to the Ezkerraberri foundation, a socio-political Basque organization. I am still actively involved in LT-related initiatives. I helped out in the drafting of MEP Jill Evans’s report on language equality in the digital age. For the last four years, I’ve been advising the department of Language Policy of the Basque Government on IT and LT policies.

What is your involvement with the European Language Equality project?

I helped Georg Rehm and Olga Perez (advisor for the Greens/EFA parliamentary group on topics related to education, culture and media) promote the idea that an agenda and roadmap for achieving full digital language equality is needed, and that it could be developed through a pilot project of the EP. I also took part in the preliminary steps of defining the project.

From your perspective, how has the situation of the Basque language developed over the last years and how do you view these developments?

I would highlight different aspects. First, the Basque LT community has made a great R&D investment in order to improve and develop tools and resources. New machine translation tools, based on AI and neural nets, have dramatically improved the quality of the output of commercial MT systems. This is probably the most remarkable achievement from the point of view of language professionals and the general public. Second, the creation in 2010 of Langune, the Basque Association of Language Industries, represented a step forward in the cohesion, visibility and impact of the sector.

Third, the position of the Basque Government towards language technologies has changed positively in the last 5 years. Although LTs were one of the pillars of the Science, Technology and Innovation Plan in 2001-2004, they lost relevance in the following plans. For instance, in the 2010 plan, all references to LT disappeared and the language industry was mentioned only a couple of times. The government fell for the charm of big foreign LT actors and considered the local LT community subsidiary. The people currently in charge of Language Policy in the Basque Government firmly believe that a strong local LT community is one of the keys to assuring a future for the Basque Language in the digital world.

In your opinion, how well is the Basque language represented in the digital world?

I would say it is similar to that of other non-hegemonic languages. In STOA’s report Language equality in the digital age – Towards a Human Language Project (2017), Basque appeared in the group of languages with fragmentary support together with many of the official EU languages such as Danish, Greek or Polish. Clearly, English and other hegemonic languages such as French, German or Spanish are better represented. In the case of Basque, that is simultaneously an opportunity and a threat. A threat because it is enclosed by two of the biggest languages of the world, French and Spanish, and almost all Basque speakers are bilingual – be it Basque-Spanish or Basque-French. But, at the same time, that situation is an opportunity, because the Basque LT community has the linguistic knowledge to work in French and Spanish as well. In other words, it opens the door to a bigger market.

Some areas such as MT have developed enormously, but there is still a long way to go to reach a fair representation of Basque in the digital world. Political support and investments in R&T and education, as well as the participation of our LT community in European networks and projects like the European Language Equality project, are crucial.

What does Digital Language Equality mean to you and what could it mean for the Basque language?

I think that in the European context we can say that there is digital language equality once all languages, regardless of the number of speakers, can offer similar levels of tools and resources to their speakers. In the case of Basque, French and Spanish are our mirrors.

What do you consider the most important requirement for languages to become equal in a digitized society?

Going back to my previous answer, all languages should have the possibility of offering their communities resources and tools similar to those offered by hegemonic languages.

In 2020, the Ezkerraberri Foundation published a book about the lack of representation of non-hegemonic languages in audiovisual media. If you had to sum up the key message of the book, what would it be?

In the case of regional languages like Basque or Catalan, access to media platforms is a matter of life and death for the survival of the language, since the majority of speakers are bilingual. In the case of Basque in Spain, if Basque speakers do not have access to or cannot enjoy audiovisual content in Basque but they can do so in Spanish, the smaller language, Basque, suffers tremendously due to the diglossic situation it falls into. We know where that leads the weakest language … That’s why states should set up legislation to secure the presence of non-hegemonic languages in the audiovisual offer. Unfortunately, you can’t take it for granted even in those so-called plurilingual states.

Currently the Spanish government is preparing a new audiovisual law to transpose EU Directive 2018/1808, amending the Audiovisual Media Services Directive. The document that the government is to send to Parliament for approval strongly defends the presence of Spanish in those services, but it doesn’t do the same for Basque, Catalan or Galician.

Do you think language technology could contribute to the support of the Basque language and others like it in the field of audiovisual media?

For sure! Let me give you an example. The production of subtitles is cheaper and faster if you can use voice-to-text technologies, followed by MT. I have no doubt that non-hegemonic languages need LT to ensure a sustainable audiovisual media offer and production.

Do you have a favourite digital Basque language tool or application?

I’m in love with the ELIA machine translation tool. It makes translation easier!

Is there one you would like to see for the Basque language that doesn’t exist yet?

Yes. A voice recognition tool that could identify and properly transcribe the different accents, dialects and speech varieties of Basque.

How does the European Language Grid strengthen linguistic diversity?

Europe consists of more than 40 different countries and even more cultures. Everyone brings something unique to the table, languages being one of the more obvious aspects. Although it is possible to encounter five different languages within a fifteen-minute train ride, this diversity is less well represented in the digital world, and especially in language technology. As was shown in the META-NET White Paper Series in 2012, tools like machine translation, text-to-speech applications and text summarisation work predominantly in English, with languages like German, French and Spanish following closely behind. Languages with weaker support include Icelandic, Latvian, Welsh and Irish.

In order to preserve and strengthen Europe’s unique linguistic diversity, languages that are less widespread need to be equally supported and represented. Welsh serves as a fitting example here: although the overall use of the language was declining, the last few decades have been marked by revitalisation efforts – governmental, scientific and social – that work towards making bilingualism more common in Wales. One of the key aspects of this is strengthening bilingual communication and representation online.

For many, English is the go-to language of the internet. Not only is it used in communication; a lot of websites also default to English even though versions in other languages are available. Looking at the big picture, this risks smaller languages falling by the wayside. On an individual level, there is another reason for this to be an issue: not everyone speaks English, and for some of those that do, it can be a chore to get through a paragraph they would much more comfortably read in their own language. Once again regarding Welsh, there is a tool that provides a start in overcoming this issue: The Welshify Widget. The plugin lets users know when a Welsh version of a website is available and guides them through the process of changing their browser settings to set Welsh as their preferred language.
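How a check of this kind might work can be sketched in a few lines of Python. This is only an illustration and not the Welshify Widget's actual implementation, which is not described here: it assumes that a website announces its language versions with `<link rel="alternate" hreflang="...">` elements, and the names `AlternateLinkFinder` and `find_language_version` are invented for this example.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects the first <link rel="alternate"> that matches a language code."""

    def __init__(self, lang):
        super().__init__()
        self.lang = lang.lower()
        self.href = None  # stays None if no matching version is declared

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        # Attribute values can be None (attribute without a value).
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        hreflang = attributes.get("hreflang", "").lower()
        # Accept "cy" as well as regional variants such as "cy-GB".
        matches = hreflang == self.lang or hreflang.startswith(self.lang + "-")
        if "alternate" in rels and matches:
            self.href = attributes.get("href") or None


def find_language_version(html, lang="cy"):
    """Return the URL of the declared `lang` version of a page, or None."""
    finder = AlternateLinkFinder(lang)
    finder.feed(html)
    return finder.href


# Hypothetical page that declares an English and a Welsh version.
page = (
    '<html><head>'
    '<link rel="alternate" hreflang="en" href="https://example.org/en/">'
    '<link rel="alternate" hreflang="cy" href="https://example.org/cy/">'
    '</head><body></body></html>'
)
print(find_language_version(page))  # https://example.org/cy/
```

A real tool would also have to cope with sites that do not declare their language versions in this way, which is why this is a sketch under stated assumptions rather than a complete solution.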

By highlighting Welsh versions of websites, the widget fosters an online environment that is more inclusive towards Welsh native speakers. There are a variety of digital language tools that have similar effects for a wide range of European languages, by making smaller languages available in the digital world and supporting their usage. Each one of them contributes towards strengthening linguistic diversity and equality among European languages.

In an effort to reach those goals, it is necessary to know where each European language has gaps in digital support. The European Language Equality (ELE) project examines 70+ European languages individually, analysing where sufficient support exists and where more is needed. The results of this research will be presented in a strategic agenda and roadmap, detailing what needs to be done to reach digital language equality by 2030.

In order to make that equality a reality, language resources need to reach their intended user base. Potential consumers need to know what is available. The European Language Grid (ELG) aims to facilitate this, among other things. The ELG is a platform that hosts European language technologies with the goal of becoming their primary hub. Companies and research facilities can upload and link their projects on ELG. Having one centralised hub like the ELG will enable developers to get the word out about their products, while users have an easier time finding and downloading the type of tool they want.

ELG also allows developers to test their tools or services, which in turn makes them easier and faster to finalize. This is also aided by the communication that is made possible through the ELG. Language technology developers are able to learn from and collaborate with each other, which, among other things, opens the door to potential translations of existing tools into other European languages. Faster development of tools and communication within the language technology community will quickly create more available technologies and resources. The heightened number and visibility of these resources will not only boost individual languages but also strengthen the linguistic diversity that already exists in Europe.

Tools like the Welshify Widget make the online experience more inclusive for non-English speakers and help revitalize the language of a European culture. The ELG as the main hub for European language technology aims to provide the platform for projects like these to reach their full potential and work towards digital language equality.

How do ELE and ELG work together towards Digital Language Equality?

Europe’s diversity in terms of culture and communication sets it apart from other major players in the global field of Language Technology (LT) that usually concentrate on single languages. The number of European languages provides a unique opportunity to work together and to learn from each other in the process of developing digital language tools. In order to access this potential, it is crucial that every official, unofficial and minority language is equally represented in the digital world and LT landscape. This is one of the reasons why Digital Language Equality (DLE) is an important goal that needs to be actively worked towards. One of the main aspects of this work is handling the fragmentation of the European LT landscape that is still prevalent. The ELG project addresses this issue by building a platform that aims to host all European LT resources: the European Language Grid (ELG). Having one unified hub will support the LT community greatly. Developers will have an easier time getting the word out about a product, while consumers are more likely to hear of and be able to use it. Furthermore, a centralised platform will give LT creators a broader reach and encourage collaboration, communication and learning from one another.

While ELG is bringing the LT landscape together, it is also necessary to combat the existing language inequality actively and directly. This is where ELE comes in: over the runtime of the project, 70+ European languages are being researched and analysed to find out where exactly the inequalities lie. This effort spans Europe’s official languages as well as many unofficial and minority languages. By the end of the project, a strategic agenda and roadmap detailing the best approach to the existing discrepancies will be presented. This research will lay important groundwork for a long-term funding program that will be able to provide support based on the discovered disparities.

ELE and ELG are working in tandem to combat the digital inequality found among European languages. ELG is establishing a platform and marketplace to bring the LT community together. Meanwhile, ELE is dedicated to understanding which aspects need to be focused on to reach DLE by 2030 as well as establishing a funding program as a tool to ease the way.

The goal of DLE is only possible with the ELE research as a blueprint and the ELG platform as the facilitator of that vision. These combined efforts aim to create an environment where the barriers that currently fragment the European LT landscape fall and languages are able to flourish alongside and in interaction with each other.

What is Digital Language Equality?

Europe shines in its diversity, which is expressed in part in its multilingualism. Under the EU treaties, all 24 official EU languages are equal. Unfortunately, in the digital age, that is not entirely the case, as there are notable discrepancies in the field of Language Technology (LT). Back in 2012, the META-NET White Paper series Europe’s Languages in the Digital Age showed that languages with more speakers had better support through Language Technology. For example, Spanish had fairly strong LT support, though not quite on the same level as English. Among the lesser-spoken languages, Estonian was slightly better equipped, though Machine Translation in particular showed some gaps.

Differences like these pose a challenge to preserving and nurturing Europe’s multilingualism. Considering the current LT landscape, every language has its own gaps and its own needs for the future. Therefore, it is necessary to address and support each language individually.

To that end, the EU-funded project European Language Equality (ELE) is re-examining the LT support of the 31 languages covered in the META-NET White Papers ten years ago, alongside previously unevaluated ones. In total, ELE’s efforts span the 24 official and 32 additional EU languages as well as 33 endangered minority languages. Over the course of the project, the results of this research will create the basis of a strategic agenda and roadmap towards ELE’s main goal: Digital Language Equality (DLE) by 2030.

DLE can come across as a vague term, so a specific definition is imperative in order to know what we are working towards. Our preliminary definition describes DLE as all relevant languages having the necessary support to “continue to exist and prosper as living languages in the digital age”.

This necessary support involves two categories of factors, though they are not without overlap. First, there are technological factors. Some examples are tools and services (e.g., grammar checkers), corpora (e.g., audio transcripts) and projects or organizations active in the LT community. The second category involves contextual factors, which are essentially the political, social and economic situation in the region where a language is spoken.

In order for this definition to be useful when examining the current LT support of a language, these factors need to be accurately quantifiable. So far, no such score exists, which is why ELE is creating the “DLE metric”. As of now, the metric consists of a comprehensive list of the aforementioned factors that make up a language’s LT support. Aspects like the scoring and weighting of the individual factors (including the introduction of potential penalties) will be worked out over the course of the project.

Once complete, the metric will enable the direct comparison of the technology support of our languages. Because it is based on empirical data, it will allow for the identification of current problem areas as well as future priorities. Additionally, the metric will enable us to track the development of the LT landscape for each individual language over time, creating a long-term overview.
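To make the idea of scoring, weighting and penalties more tangible, here is a minimal Python sketch of how a list of factors could be combined into a single comparable number. It is purely illustrative and not the DLE metric itself, whose scoring and weighting are still to be worked out: the `Factor` fields, the weighted-average formula, the way penalties are subtracted, and all example values are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    """One hypothetical factor contributing to a language's LT support."""
    name: str
    category: str         # "technological" or "contextual"
    value: float          # assumed to be normalised to the range 0..1
    weight: float = 1.0   # assumed relative importance of the factor
    penalty: float = 0.0  # assumed deduction, e.g. for outdated resources


def dle_score(factors):
    """Weighted average of penalised factor values (illustrative formula only)."""
    total_weight = sum(f.weight for f in factors)
    if total_weight == 0:
        return 0.0
    # A penalty lowers a factor's value, but never below zero.
    weighted = sum(f.weight * max(f.value - f.penalty, 0.0) for f in factors)
    return weighted / total_weight


# Invented example values for an unnamed language.
example = [
    Factor("grammar checkers", "technological", value=0.5, weight=2.0),
    Factor("audio transcripts", "technological", value=1.0, weight=1.0),
    Factor("policy support", "contextual", value=0.5, weight=1.0, penalty=0.5),
]
print(dle_score(example))  # 0.5
```

Computing the same score for several languages, or for one language at several points in time, would give the kind of direct comparison and long-term tracking described above.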

The ability to measure the level of LT support in a way that is precise and consistent across languages will form an important step towards our primary goal – establishing Digital Language Equality by 2030.