Newsletter #23 – January 2023

Dear reader,

We wish you a happy, healthy and successful 2023!

This first edition of the year starts with an announcement about the launch of the Common European Language Data Space (LDS), a European platform and marketplace for the collection, creation, sharing and re-use of multilingual and multimodal language data.

We’re also taking a look at the results of our ELE Open Call for SRIA Contribution Projects and introducing one of the selected projects, a Basic Language Resource Kit (BLARK) for the Galician language.

Our featured ELG tool of the month is Finto AI, an indexing tool that suggests subjects for any given text and is available in Finnish, Swedish and English.

In our section ‘From the SRIA’, we’re introducing the Implementation Recommendations for the programme.

Lastly, we’d like to draw your attention to the ELE SRIA endorsement form and encourage anyone interested to take a look and show their support.

With best regards

Georg Rehm

Common European Language Data Space (LDS) project to be launched in January 2023

On 19 January 2023, the service contract between the European Commission and a consortium of four partners, coordinated by DFKI, on the creation and implementation of a Common European Language Data Space (LDS) will come into force. The goal of this project is to establish a European platform and marketplace for the collection, creation, sharing and re-use of multilingual and multimodal language data.

With the LDS, participants will be able to share and monetise their language data and other language resources, such as language models, through a single platform on which they can exchange language data in full accordance with EU values and rules. As a result, the LDS will significantly increase the availability of much-needed quality data to support the creation and deployment of large language models (LLMs) and other AI-based language technology services for a range of businesses. Existing repositories such as ELRC-SHARE and the European Language Grid (ELG) will be interconnected with the LDS.

The Language Data Space will be part of a connected and competitive European data economy, supporting the monetisation and re-use of language resources within the European Data Spaces Ecosystem. The targeted stakeholder groups are industry (including the language technology and media industries), research, public administrations, cultural associations, NGOs and European citizens.

The project will run for three years, with an optional one-year extension.
More details about the project – including the project website and social media channels – will be made available soon.

Consortium: German Research Center for Artificial Intelligence (DFKI), Evaluations and Language Resources Distribution Agency (ELDA), Athena Research and Innovation Center in Information, Communication and Knowledge Technologies (ILSP), SIA Tilde. Multiple subcontractors in essentially all European countries will be involved, reinforcing structures established in previous EU projects.

Contact: Georg Rehm <georg.rehm@dfki.de>

Language Technology and NLP in the news

“Artificial Intelligence Act: Council calls for promoting safe AI that respects fundamental rights” – Council of the European Union, 6 December 2022
“The College Essay Is Dead” – The Atlantic, 6 December 2022
“ChatGPT, Galactica, and the Progress Trap” – Wired, 9 December 2022
“DeepMind created an AI tool that can help generate rough film and stage scripts” – Engadget, 9 December 2022
“Large language model expands natural language understanding, moves beyond English” – VentureBeat, 12 December 2022
“Meet Ghostwriter, a haunted AI-powered typewriter that talks to you” – Ars Technica, 14 December 2022
“Exclusive: ChatGPT owner OpenAI projects $1 billion in revenue by 2024” – Reuters, 15 December 2022
“How to spot AI-generated text” – MIT Technology Review, 19 December 2022
“How AI-generated text is poisoning the internet” – MIT Technology Review, 20 December 2022
“Is DeepMind Deliberately Avoiding ‘Shallow’ Generative AI?” – Analytics India Mag, 21 December 2022
“What we learned about AI and deep learning in 2022” – VentureBeat, 27 December 2022
“Europe Is Lagging Behind In Developing Large AI Models” – Forbes, 28 December 2022
“2023 could be the year for large language models” – VentureBeat, 2 January 2023
“Deepfake Text Detector Tool GPTZero Spots AI Writing” – Voicebot.ai, 4 January 2023
“Pressure on Google as Microsoft plans to add ChatGPT to Bing” – VentureBeat, 4 January 2023
“Microsoft Unveils VALL-E, A Voice DALL-E” – Analytics India Mag, 6 January 2023

Social media highlights

If you don’t know how to have your AI technology generate revenue, perhaps an idea would be to ask it to figure out a way for you.
Some creative liberties were taken in this comparison of GPT parameters and brain synapses, but it is nonetheless an interesting thought experiment.

Selected new tools and resources on the European Language Grid

Finto AI: Subject Indexing FI is a tool that suggests subjects for a given text in Finnish; it is also available for Swedish and English. The Docker container built from its repository acts as a proxy that routes ELG-compatible requests to the finto.ai open API and returns ELG-compatible responses.

The Finto API responds with subject suggestions for a given text. It is based on Annif, an automated subject indexing toolkit developed mainly at the National Library of Finland and released under the Apache License 2.0. This ELG proxy is published under the same license.

General news

The Open Call for SRIA Contribution Projects ended on 29 November 2022, having received a total of 36 project proposals from 24 different applicants in 15 countries. All proposals were evaluated by two independent experts from the ELE FSTP Project Board. The Project Board selected nine projects with a total budget of EUR 185,000. The selected projects will receive funding amounting to 90% of the eligible costs, i.e., EUR 166,500 in total.

Congratulations to all successful applicants and many thanks to all who have submitted a proposal!

Over the next few months, the projects chosen for funding will be introduced in this section. This month, we’re starting with the project “A BLARK for minority languages in the era of deep learning: expertise from academia and industry” by imaxin|software. Its goal is to create the first Basic Language Resource Kit for Galician in the deep learning era and to make it a reference for other European minority languages. As a collaboration between the software company imaxin|software and the Nós Project, an initiative by the Universidade de Santiago de Compostela aimed at providing the Galician language with openly licensed resources and tools in the area of intelligent technologies, the project will combine academic and industry perspectives. Since Galician is one of the least supported languages in Europe, the project can serve as an example for similar efforts focused on other languages of the world, in particular those covered by the ELE SRIA.

From the SRIA

Implementation Recommendations

To implement the programme properly and reach digital language equality in Europe by 2030, the agenda includes several recommendations. First, the ELE programme’s span of approximately nine years should be divided into three phases of three years each. Discussions between the EU/EC and the participating countries should be facilitated. The participating countries should also be encouraged to invest in tools and technologies for their own language(s). The EU should establish legislation that encourages or perhaps even ensures participation, and it should invest in pan-European coordination of all language-specific projects. Finally, the ELE programme should be structured into six themes: Language Modelling, Data and Knowledge, Machine Translation, Text Understanding, Speech, and Infrastructure, with each theme supported by coordination actions, research actions, and actions for innovation and deployment.

If you would like to voice your support for the ELE Programme and its goal and vision to achieve digital language equality in Europe by 2030, please consider filling out the endorsement form by clicking the button below to become a listed supporter on the ELE website:

Click here to endorse the ELE SRIA

Upcoming Events

January 20, 2023: Microservices Project Workshop (Finland)
April 21-23, 2023: 10th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (Poznań)

If you have an event that you think the European language technology community should know about, get in touch with us to have it featured in this newsletter.

Next edition

The next ELT newsletter will be sent out on 7 February 2023. Until then, follow our ELT social media accounts (linked below) for the latest news!

Want to learn more? Visit https://european-language-technology.eu or contact us directly.

The European Language Grid is an initiative funded by the European Union’s Horizon 2020 programme under grant agreement № 825627 (ELG).

The European Language Equality Project has received funding from the European Union under the grant agreement № LC-01641480 – 101018166 (ELE).
