Newsletter #29 – July 2023
Dear reader,

This edition of our newsletter marks several special occasions! META-FORUM 2023 took place last week and we consider it a great success. We would like to thank all attendees, speakers, and everyone who made the event possible.

With the release of the ELE book, which was presented at META-FORUM 2023, this month also marks the end of the ELE 2 project, which started on 1 July 2022. Revisit the project’s results on the ELE website and in the open access ELE book.

Since the end of the project also coincides with the anniversary of the end of the ELG project, we’re also taking a look at the current numbers of the European Language Grid and its resources.

Our curated news section is overflowing this month with European AI news, not least due to the topic of AI regulation and the AI Act.

Finally, the European Parliament’s Panel for the Future of Science and Technology published a technical feasibility study for the development of a European streaming platform for European national news accessible in all EU languages. The study was prepared by a team of 21 researchers from six different research labs of the German Research Centre for Artificial Intelligence (DFKI).


With best regards


Georg Rehm
 
Subscribe to the Common European Language Data Space (LDS) Newsletter

The European Language Data Space initiative that was started back in January 2023 recently launched its monthly newsletter, providing information on the latest developments in secure, privacy-preserving language data sharing and use across Europe. 

We’d like to invite you to subscribe to the newsletter for updates on LDS implementation, success stories, events, and more!

In case you missed CONNECT University’s and LDS’s workshop on Large Language Models on 6 June or want to revisit it, you can watch the entire stream on the DigitalEU YouTube channel:

  • Introduction to LLMs – Philippe Gelin (Head of Sector "Multilingualism", European Commission)

  • EC policies landscape on Language Technologies – Yvo Volman (Director G Data, European Commission)

  • Overview, Basic Concepts, Capabilities – Georg Rehm, Pedro Ortiz, Malte Ostendorff (German Research Centre for Artificial Intelligence, DFKI, Germany)

  • Limitations and Evaluation – Ondrej Bojar (Charles University, Czech Republic)

  • Legal Aspects in a Nutshell – Mickael Rigault (Evaluations and Language Resources Distribution Agency, ELDA, France)

  • Data Governance – Anna Rogers (IT University of Copenhagen, Denmark)

  • Ethical Aspects and Biases – Karën Fort (LORIA, Sorbonne University, France)

  • Enabling New and Boosting Existing Applications – Peter Sarlin (Silo.AI, Sweden)

Language Technology and NLP in the news
Social media highlights
General News

A year ago, on 1 July 2022, the ELG project ended. For this anniversary, we want to take a look at its current numbers. The selection of tools and resources has been growing since, and currently features a total of:

  • 7,985 Corpora

  • 3,875 Tools & Services

  • 2,806 Conceptual Resources

  • 512 Models & Grammars

  • 1,774 Organizations

  • 513 Projects

Furthermore, we’re happy to report that ELG has more than 1,000 users – the number is constantly growing. 

We’re excited to see the continued use of the platform as an important space for Language Technologies in Europe.

Selected new tools and resources on the European Language Grid
DaMuEL 1.0: A Large Multilingual Dataset for Entity Linking – This month’s resource is DaMuEL, a large multilingual dataset for entity linking that covers 53 languages. DaMuEL consists of two components: a knowledge base that contains language-agnostic information about entities, including their claims from Wikidata and named entity types (PER, ORG, LOC, EVENT, BRAND, WORK_OF_ART, MANUFACTURED); and Wikipedia texts with entity mentions linked to the knowledge base, along with language-specific text from Wikidata such as labels, aliases, and descriptions, stored separately for each language. The Wikidata QID serves as a persistent, language-agnostic identifier, so the knowledge base can be combined with the language-specific texts and information for each entity. The dataset contains 27.9 million named entities in the knowledge base and 12.3 billion tokens from Wikipedia texts. It is published under the CC BY-SA licence.
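For readers who want a feel for the QID-based design described above, here is a minimal Python sketch. It is not DaMuEL's actual file format or API: the class names, field names, and the `describe` helper are hypothetical, and the single sample entity is illustrative data only. It simply shows how a language-agnostic knowledge base and per-language texts can be joined through a shared Wikidata QID.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory layout; the real DaMuEL files may be organised differently.
@dataclass
class Entity:
    qid: str            # Wikidata QID, the language-agnostic key
    entity_type: str    # one of the named entity types, e.g. "LOC"
    claims: dict = field(default_factory=dict)  # Wikidata claims (property -> value)


@dataclass
class LanguageText:
    label: str
    aliases: list
    description: str


# Language-agnostic knowledge base, keyed by QID (illustrative sample entry).
knowledge_base = {
    "Q64": Entity(qid="Q64", entity_type="LOC", claims={"P17": "Q183"}),
}

# Language-specific texts, stored separately per language and keyed by the same QID.
texts_by_language = {
    "en": {"Q64": LanguageText("Berlin", ["Berlin, Germany"], "capital of Germany")},
    "cs": {"Q64": LanguageText("Berlín", [], "hlavní město Německa")},
}


def describe(qid: str, lang: str) -> str:
    """Combine the shared knowledge-base entry with the text for one language."""
    entity = knowledge_base[qid]
    text = texts_by_language[lang][qid]
    return f"{text.label} ({entity.entity_type}, {qid}): {text.description}"


if __name__ == "__main__":
    print(describe("Q64", "en"))
    print(describe("Q64", "cs"))
```

Because both structures share the QID as key, adding a further language in this sketch only requires a new entry in `texts_by_language`; the knowledge base itself stays unchanged.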
META-FORUM 2023

On 27 June, META-FORUM 2023 took place in Brussels, Belgium, with over 120 attendees, bringing together experts, stakeholders, and decision makers from academia, research, public administration and industry. At the conference, the ELE consortium members showcased key aspects of the project, placing emphasis on its outcomes such as the revised Strategic Research and Innovation Agenda (SRIA) and the roadmap aimed at achieving full digital language equality in Europe by 2030.

We want to thank everyone for attending, contributing, and participating in fruitful discussions at this year’s event. 

Videos of all presentations and panels will be uploaded to YouTube in the coming weeks, so stay tuned!

In the meantime, if you want to take another look at the full programme, featured topics, and speakers for all the sessions, check out the META-FORUM 2023 section on the ELE website.

Photo: The ELE2 consortium and representatives of the European Commission
(27 June 2023, Brussels)
 
From the SRIA
Infrastructure Recommendations

The SRIA presents several infrastructure recommendations to pave the way for achieving digital language equality in Europe. One crucial recommendation is to strengthen existing research infrastructures and develop new ones, including language technology platforms. These infrastructures will support research and development activities, facilitate collaboration among stakeholders, foster knowledge sharing, and promote open access to data and technologies. Another important aspect is ensuring sufficient operational capacity, particularly for LLMs. It is crucial to allocate resources and investments to meet the computational requirements of these models effectively. Additionally, addressing the identified gaps in data, language resources, and knowledge graphs is essential. Europe should establish a clear trajectory toward comprehensive and interlinked data infrastructures that can bridge these gaps, promoting accessibility and integration.

To support the infrastructure requirements, flexible access to GPU-based high-performance computing facilities and a more suitable computing infrastructure are needed. These resources will ensure researchers and practitioners have the necessary computational power to develop and deploy advanced language technologies.

Finally, it is necessary to create a European network of centres of excellence in LT. Such a network would serve multiple purposes, including increasing industry visibility, shaping national research agendas, and implementing a European Data Strategy.


You can read more about all SRIA recommendations here or take a look at the full document.
 
Upcoming Events

If you have an event that you think the European language technology community should know about, get in touch with us to have it featured in this newsletter.
 

Next edition

The ELT newsletter will take a short summer break now. In the meantime, follow our ELT social media accounts (as linked below) for the latest news!


Want to learn more? Visit https://european-language-technology.eu 
or contact us directly.
Website
YouTube
Twitter
LinkedIn
Copyright © 2022 ELE and ELG Consortium, All rights reserved.
The European Language Grid is an initiative funded by the European Union’s Horizon 2020 programme under grant agreement № 825627 (ELG).
The European Language Equality project has received funding from the European Union under the grant agreement № LC-01641480 – 101018166 (ELE).